Hate Speech Detection Using Text Mining and Machine Learning

Automatic hate speech detection on social media is becoming a pressing concern in modern countries. Hate speech directed at people can lead to violent acts and social chaos; it is therefore prohibited by law and carries moral and legal implications. It is crucial to distinguish hate speech from non-hate speech automatically and precisely, because this makes it easier to identify both the people who pose a real threat to society and those who are wrongly regarded as hateful speakers. In this paper, we applied a complete text mining process and the Naïve Bayes machine learning classification algorithm to two different data sets (tweets_Num1 and tweets_Num2) taken from Twitter, to better classify tweets. The results show that our model performed well on several metrics derived from the confusion matrix, including accuracy, which reached 87.23% on the first data set and 93.06% on the second.
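The pipeline described in this abstract (text preprocessing, Naïve Bayes classification, confusion-matrix metrics) can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the Laplace smoothing, the toy sentences, and the names `tokenize`, `NaiveBayes`, `confusion_counts`, and `accuracy` are all assumptions made for illustration, and the toy data has nothing to do with tweets_Num1 or tweets_Num2.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed preprocessing: lowercase, drop URLs and @mentions, keep word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_counts(y_true, y_pred, positive="hate"):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Accuracy = correctly classified items / all items."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy training data (illustrative only).
train_texts = [
    "you people are disgusting go away",
    "i hate them they are disgusting",
    "what a lovely day at the park",
    "i love this park and the weather",
]
train_labels = ["hate", "hate", "not_hate", "not_hate"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Precision and recall follow from the same four counts, which is why the confusion matrix is a convenient basis for reporting several metrics at once.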


Sinhala Hate Speech Detection in Social Media using Text Mining and Machine learning

2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), 2019, DOI 10.1109/icter48817.2019.9023655

Author(s): H.M.S.T Sandaruwan, S.A.S Lorensuhewa, M.A.L Kalyani

Keyword(s): Machine Learning, Social Media, Text Mining, Hate Speech, Speech Detection


A Web Interface for Analyzing Hate Speech

Future Internet, 2021, Vol 13 (3), pp. 80, DOI 10.3390/fi13030080

Author(s): Lazaros Vrysis, Nikolaos Vryzas, Rigas Kotsakis, Theodora Saridou, Maria Matsiola, ...

Keyword(s): Machine Learning, Social Media, Graphical User Interface, Hate Speech, Web Interface, Learning Models, Speech Detection, Media Services, The Web, Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinions publicly. In this context, large numbers of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database of hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform, and contribute new content to the database. Furthermore, hate speech detection and sentiment analysis of texts are provided, making use of novel methods and machine learning models. The interface can be accessed online through a graphical user interface compatible with modern internet browsers. To evaluate the interface, a multifactor questionnaire was formulated to record the users’ opinions about the web interface and the corresponding functionality.


YouTube based religious hate speech and extremism detection dataset with machine learning baselines

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-9, DOI 10.3233/jifs-219264

Author(s): Noman Ashraf, Abid Rafiq, Sabur Butt, Hafiz Muhammad Faisal Shehzad, Grigori Sidorov, ...

Keyword(s): Machine Learning, Social Networking, Social Networking Sites, Nearest Neighbor, Hate Speech, Support Vector, K Nearest Neighbor, Speech Detection, Supervised Learning Algorithms, YouTube Videos

On YouTube, billions of videos are watched online and millions of short messages are posted each day. YouTube, along with other social networking sites, is used by individuals and extremist groups to spread hatred among users. In this paper, we consider religion as the most targeted domain for spreading hate speech among people of different religions. We present a methodology for the detection of religion-based hate videos on YouTube. Messages posted on YouTube videos generally express users’ opinions about that video. We provide a novel dataset for religious hate speech detection on YouTube comments. The proposed methodology applies data mining techniques to comments extracted from religious videos in order to filter religion-oriented messages and detect the videos that are used for spreading hate. Three supervised learning algorithms, Support Vector Machine (SVM), Logistic Regression (LR), and k-Nearest Neighbor (k-NN), are used for baseline results.
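Of the three baselines named in this abstract, k-NN is the simplest to sketch. The example below is a hedged illustration only: the abstract does not specify the features or distance measure, so raw bag-of-words counts and cosine similarity are assumptions, as are the function names `vectorize`, `cosine`, and `knn_predict` and the toy comments, which are not drawn from the paper's dataset.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Assumed feature extraction: lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a comment by majority vote among its k most similar training comments."""
    query = vectorize(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled comments (illustrative only).
train_comments = [
    ("people of that faith are evil", "hate"),
    ("that faith is evil and its followers are liars", "hate"),
    ("followers of that faith are liars", "hate"),
    ("the sermon in this video was peaceful", "not_hate"),
    ("thank you for this peaceful video", "not_hate"),
    ("beautiful recitation in this video", "not_hate"),
]
```

Calling `knn_predict(train_comments, "this video was beautiful and peaceful")` votes among the three closest training comments. In a setting like the one described, per-comment predictions could then be aggregated per video to flag videos that attract a high share of hateful comments; that aggregation step is likewise an assumption, not something the abstract details.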
