Abstract:
The world is facing a new era in which social media communication plays a fundamental role in people’s lives. Along with proven benefits, several collateral drawbacks have arisen, one being the widespread dissemination of false information with malicious intent, oftentimes under anonymous or false identities. Fighting this problem is challenging, especially given the nature of the text found on social media platforms: a sea of small messages and a myriad of users. Attributing the authorship of such messages is an ambitious endeavor; nevertheless, it is one way to fight this disinformation. In this work, we tackle the problem of authorship attribution of tiny messages but, unlike what has been done with longer texts, we rely upon data-driven approaches, avoiding handcrafted features and harnessing recent advances in deep neural networks for pattern recognition. By modeling the small texts used in social media as unidimensional signals, we propose a deep learning model that projects these messages onto a manifold suitable for the task of authorship attribution. We provide two state-of-the-art solutions tailored for different setups and strategies for the scenario of authorship verification. These advances were made possible by three additional contributions: an updated dataset based on the Twitter® platform, new sanitization techniques to improve the quality of the training data, and novel visual analytics techniques to support the development of authorship attribution solutions.