Text Based Hate-Speech Analysis

Abstract:

As per Oxford, the term “hate speech” refers to speech that may involve abusive or threatening words and that expresses prejudice against a particular community or group. Such prejudice is generally based on ethnicity, race, religion, disability, gender, caste, or sexual orientation. The internet and social media have become powerful tools for propagandists to spread hate and reach new audiences. The anonymity and flexibility that the internet offers allow them to propagate hate easily and without fear of consequences. The lack of regulation and legal policy worsens the situation. There is therefore a need for automated, state-of-the-art, and scalable methods for hate speech detection and classification. This paper introduces two ensemble-based models: the first combines Linear SVC, Logistic Regression, and Random Forest, and the second combines Random Forest, KNN, and Logistic Regression. A few deep learning models using self-trained and pre-trained word embeddings are also introduced for Twitter hate speech classification. The proposed work classifies each tweet into one of three categories: Racist, Sexist, or None.
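
As an illustration of the two ensembles named in the abstract, the following is a minimal sketch using scikit-learn. Several details are assumptions made here for illustration and are not stated in the abstract: the TF-IDF features, the hard (majority) voting scheme, the hyperparameters, the hypothetical helper name `build_ensemble`, and the toy corpus, which consists of meaningless placeholder tokens and not real tweets.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# The three target categories named in the abstract.
LABELS = ("racist", "sexist", "none")


def build_ensemble(variant):
    """Return a text-classification pipeline for ensemble 1 or 2.

    Hypothetical helper: the member classifiers follow the abstract,
    but the features, voting scheme, and hyperparameters are assumed.
    """
    if variant == 1:
        members = [
            ("svc", LinearSVC()),
            ("logreg", LogisticRegression(max_iter=1000)),
            ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ]
    elif variant == 2:
        members = [
            ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=3)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ]
    else:
        raise ValueError("variant must be 1 or 2")

    # Hard voting is assumed here; it also suits LinearSVC, which does
    # not provide the class probabilities that soft voting requires.
    voter = VotingClassifier(estimators=members, voting="hard")
    return make_pipeline(TfidfVectorizer(), voter)


# Placeholder corpus (assumption): synthetic tokens stand in for tweets.
TOY_TEXTS = [
    "aaa aaa xx", "aaa yy aaa", "zz aaa",
    "bbb bbb xx", "bbb yy", "zz bbb bbb",
    "ccc xx ccc", "ccc yy", "zz ccc",
]
TOY_LABELS = ["racist"] * 3 + ["sexist"] * 3 + ["none"] * 3

if __name__ == "__main__":
    for variant in (1, 2):
        model = build_ensemble(variant).fit(TOY_TEXTS, TOY_LABELS)
        print(variant, list(model.predict(["aaa xx", "ccc yy"])))
```

Both variants share the same feature extraction step, so any difference in their predictions comes only from the choice of member classifiers.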