Abstract:
Misogyny is a serious social problem that harms the mental and physical health of women and can even lead to femicide. It is visible and widespread in many communication channels, such as music and social networks, where its presence encourages and reinforces this harmful behavior. Consequently, the automatic detection of misogynistic content on social networks has attracted increasing interest. Most current computational approaches follow a supervised machine learning strategy, and their main challenge is to capture the diversity and complexity of offensive language directed at women. The size and quality of the training data therefore play a fundamental role in the performance of these methods. In this paper, we propose a novel data augmentation approach that uses song lyrics to improve the generalization capability and overall performance of misogyny classifiers. To this end, we present a methodology for automatically compiling a corpus of song phrases containing explicit and abusive language directed at women. We evaluated the proposed approach on English and Spanish benchmark datasets. It outperformed conventional transfer learning techniques and was highly competitive with state-of-the-art methods.