Bangla hate speech detection on social media using attention-based recurrent neural network

Hate speech has spread more rapidly with the everyday use of technology, most notably through people sharing negative opinions or feelings on social media. Although numerous works have addressed hate speech detection in English, German, and other languages, very few have done so for Bengali, even though millions of people communicate on social media in Bengali. The few existing works need improvements in both accuracy and interpretability. This article proposes an encoder–decoder-based machine learning model, a popular tool in NLP, to classify users' Bengali comments from Facebook pages. A dataset of 7,425 Bengali comments, covering seven distinct categories of hate speech, was used to train and evaluate the model. 1D convolutional layers were used to extract and encode local features from the comments. Finally, attention-mechanism-, LSTM-, and GRU-based decoders were used to predict the hate speech categories. Among the three encoder–decoder variants, the attention-based decoder achieved the best accuracy (77%).
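As a rough illustration of a 1D convolutional encoder followed by an attention step, the following is a minimal, untrained forward-pass sketch in pure Python. It is not the authors' model: the embedding size (8), filter count (16), kernel width (3), and all weights are hypothetical, and the attention decoder is simplified here to dot-product attention pooling with a single query vector (`query`), followed by a linear layer and a softmax over the seven hate speech categories mentioned in the abstract.

```python
import math
import random


def conv1d_relu(seq, kernels, width):
    """Valid 1D convolution followed by ReLU.

    seq: list of T token vectors, each of length d
    kernels: list of F filters, each a flat list of width * d weights
    Returns T - width + 1 feature vectors, each of length F.
    """
    feats = []
    for t in range(len(seq) - width + 1):
        # Flatten the window of `width` consecutive token vectors.
        window = [x for vec in seq[t:t + width] for x in vec]
        feats.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return feats


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(feats, query):
    """Dot-product attention: score each time step, normalize, take a weighted sum."""
    scores = [sum(q * f for q, f in zip(query, vec)) for vec in feats]
    alphas = softmax(scores)
    context = [sum(a * vec[j] for a, vec in zip(alphas, feats)) for j in range(len(query))]
    return context, alphas


def classify(seq, params):
    """Return (class probabilities, attention weights) for one comment."""
    feats = conv1d_relu(seq, params["kernels"], params["width"])
    context, alphas = attention_pool(feats, params["query"])
    logits = [sum(w * c for w, c in zip(row, context)) + b
              for row, b in zip(params["W_out"], params["b_out"])]
    return softmax(logits), alphas


def random_params(d, n_filters, width, n_classes, seed=0):
    """Hypothetical untrained weights drawn from N(0, 0.1^2)."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.gauss(0.0, 0.1) for _ in range(n)]

    return {
        "width": width,
        "kernels": [vec(width * d) for _ in range(n_filters)],
        "query": vec(n_filters),
        "W_out": [vec(n_filters) for _ in range(n_classes)],
        "b_out": [0.0] * n_classes,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # A hypothetical 12-token comment with 8-dimensional embeddings.
    comment = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
    params = random_params(d=8, n_filters=16, width=3, n_classes=7)
    probs, alphas = classify(comment, params)
    print([round(p, 3) for p in probs])
    print(len(alphas))  # one attention weight per convolution window
```

The attention weights `alphas` give one score per convolution window, which is one common, if imperfect, way to inspect which parts of a comment drove a prediction. In practice the weights would be learned by training on labeled comments rather than sampled at random.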

A Web Interface for Analyzing Hate Speech

Future Internet, 2021, Vol 13 (3), pp. 80. DOI: 10.3390/fi13030080

Author(s): Lazaros Vrysis, Nikolaos Vryzas, Rigas Kotsakis, Theodora Saridou, Maria Matsiola, ...

Keyword(s): Machine Learning; Social Media; Graphical User Interface; Hate Speech; Web Interface; Learning Models; Speech Detection; Media Services; The Web; Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinions publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims to monitor and model hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database of hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform, and contribute new content to the database. It also provides hate speech detection and sentiment analysis of texts, using novel methods and machine learning models. The interface can be accessed online through a graphical user interface compatible with modern web browsers. To evaluate the interface, a multifactor questionnaire was designed to record users' opinions about the web interface and its functionality.

Neural Network Applications in Hate Speech Detection

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing, 2020, pp. 188-204. DOI: 10.4018/978-1-7998-1159-6.ch012

Author(s): Brian Tuan Khieu, Melody Moh

Keyword(s): Neural Network; Social Media; Hate Speech; Speech Detection; Network Applications; Current State; New Directions; Key Techniques; Neural Network Applications; Positive Results

This chapter presents a literature survey of the current state of hate speech detection models, with a focus on neural network applications in the area. The growth and freedom of social media have facilitated the dissemination of both positive and negative ideas. Proponents of hate speech are among the key abusers of the privileges afforded by social media, and the companies behind these networks have a vested interest in identifying such speech. Manual moderation is too cumbersome and slow to keep up with the torrent of content generated on these sites, which is why many have turned to machine learning. Neural network applications in this area have been very promising and have yielded positive results. However, there are newly discovered and unaddressed problems with the current state of hate speech detection. The authors' survey identifies the key techniques and methods used to identify hate speech and discusses promising new directions for the field as well as newly identified issues.

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

2020 RIVF International Conference on Computing and Communication Technologies (RIVF), 2020. DOI: 10.1109/rivf48685.2020.9140745. Cited by ~2

Author(s): Son T. Luu, Hung P. Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Keyword(s): Neural Network; Machine Learning; Hate Speech; Network Models; Learning Models; Neural Network Models; Speech Detection; Machine Learning Models

Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review

IEEE Access, 2021, pp. 1-1. DOI: 10.1109/access.2021.3089515

Author(s): Nanlir Sallau Mullah, Wan Mohd Nazmee Wan Zainon

Keyword(s): Machine Learning; Social Media; Hate Speech; Learning Algorithms; Machine Learning Algorithms; Speech Detection

Sinhala Hate Speech Detection in Social Media using Text Mining and Machine learning

2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), 2019. DOI: 10.1109/icter48817.2019.9023655

Author(s): H.M.S.T Sandaruwan, S.A.S Lorensuhewa, M.A.L Kalyani

Keyword(s): Machine Learning; Social Media; Text Mining; Hate Speech; Speech Detection

Analyzing Social Media Opinions Using Hybrid Machine Learning Model Based on Artificial Neural Network Optimized by Particle Swarm Optimization

Advances in Intelligent Systems and Computing - Advanced Intelligent Systems for Sustainable Development (AI2SD’2019), 2020, pp. 123-131. DOI: 10.1007/978-3-030-36674-2_13

Author(s): Youness Khourdifi, Mohamed Bahaj

Keyword(s): Neural Network; Machine Learning; Social Media; Artificial Neural Network; Particle Swarm Optimization; Particle Swarm; Learning Model; Swarm Optimization; Machine Learning Model; Hybrid Machine

Lexicon-Based Indonesian Local Language Abusive Words Dictionary to Detect Hate Speech in Social Media

Journal of Information Systems Engineering and Business Intelligence, 2020, Vol 6 (1), pp. 9. DOI: 10.20473/jisebi.6.1.9-17

Author(s): Mardhiya Hayaty, Sumarni Adi, Anggit Dwi Hartanto

Keyword(s): Machine Learning; Social Media; Random Sampling; Hate Speech; Sampling Technique; Stratified Random Sampling; Speech Detection; Or Groups; Machine Learning Approach; Local Languages

Background: Hate speech is an expression directed at a person or group of people that conveys feelings of hate and/or anger toward people or groups. On social media, users are free to express themselves by writing harsh words and sharing them with groups of people, which can trigger divisions and conflicts between groups. Several researchers have developed methods to detect hate speech on social media, namely machine learning-based and lexicon-based approaches, but the machine learning approach has a weakness: manual labelling by annotators to separate positive, negative, or neutral opinions is time-consuming and tiring.

Objective: This study aims to produce a dictionary of abusive words from local languages in Indonesia. The lexicon-based approach depends heavily on the language of the words in its dictionary. Indonesia has thousands of tribes with 2,500 local languages, and 80% of the population use local languages to communicate, which makes detecting hate speech on social media a significant challenge.

Methods: Abusive-word surveys were conducted using proportionate stratified random sampling in four major tribes on the island of Java, namely Betawi, Sundanese, Javanese, and Madurese.

Results: The experiments produced a dictionary of 250 abusive words from the four major Indonesian tribes for detecting hate speech on Indonesian social media using the lexicon-based approach.

Conclusion: Stratified random sampling was conducted in four major Indonesian tribes to produce 250 abusive words for hate speech detection using the lexicon-based approach.
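A lexicon-based detector of the kind this study targets can be sketched as a token lookup against the dictionary. The sketch below is hypothetical: the entries in `ABUSIVE_LEXICON` are placeholders (the study's 250-word dictionary is not reproduced here), each entry is tagged with an example source language, and flagging a text once it contains at least one dictionary word (`threshold=1`) is an assumed decision rule, not one taken from the study.

```python
import re

# Hypothetical placeholder entries mapping an abusive word to its source language.
# The real dictionary from the study would be loaded here instead.
ABUSIVE_LEXICON = {
    "badword1": "Betawi",
    "badword2": "Sundanese",
    "badword3": "Javanese",
    "badword4": "Madurese",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexicon_matches(text, lexicon=ABUSIVE_LEXICON):
    """Return (word, language) pairs for every token found in the lexicon."""
    return [(tok, lexicon[tok]) for tok in tokenize(text) if tok in lexicon]


def is_hate_speech(text, lexicon=ABUSIVE_LEXICON, threshold=1):
    """Flag the text if it contains at least `threshold` lexicon words (assumed rule)."""
    return len(lexicon_matches(text, lexicon)) >= threshold


if __name__ == "__main__":
    print(lexicon_matches("You are such a BADWORD2!"))
    print(is_hate_speech("Have a nice day"))
```

Because the approach needs no labeled training data, its coverage depends entirely on the dictionary, which is why the study focuses on collecting abusive words from local languages.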

Dynamics of online hate and misinformation

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-01487-w

Author(s): Matteo Cinelli, Andraž Pelicon, Igor Mozetič, Walter Quattrociocchi, Petra Kralj Novak, ...

Keyword(s): Machine Learning; Hate Speech; Learning Model; Large Set; Speech Detection; Machine Learning Model; Echo Chamber; Youtube Videos

Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making the development of appropriate countermeasures necessary. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos using a machine learning model trained and fine-tuned on a large set of hand-annotated data. Our analysis shows no evidence of "pure haters", meaning active users who post exclusively hateful comments. Moreover, consistent with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents' community. Interestingly, users loyal to reliable sources use, on average, more toxic language than their counterparts. Finally, we find that the overall toxicity of a discussion increases with its length, measured both in number of comments and in time. Our results show that, consistent with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.

Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques

Applied Sciences, 2021, Vol 11 (18), pp. 8575. DOI: 10.3390/app11188575

Author(s): Sudhir Kumar Mohapatra, Srinivas Prasad, Dwiti Krishna Bebarta, Tapan Kumar Das, Kathiravan Srinivasan, ...

Keyword(s): Machine Learning; Social Media; Hate Speech; Learning Algorithm; Machine Learning Techniques; Mixed Data; Support Vector; Speech Detection; Detection Model; Feature Based

Hate speech on social media may spread quickly among online users and may subsequently escalate into local violence and heinous crimes. This paper proposes a hate speech detection model based on machine learning and text mining feature extraction techniques. The authors collected English-Odia code-mixed hate speech data from a public Facebook page and manually organized them into three classes. To build both ternary and binary datasets, the data were additionally converted into binary classes. Hate speech is modeled by combining a machine learning algorithm with feature extraction. Support vector machine (SVM), naïve Bayes (NB), and random forest (RF) models were trained on the whole dataset, with features based on word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec, for both datasets. Using the two datasets, two kinds of models were developed for each feature: binary models and ternary models. The SVM models with word2vec features outperformed the NB and RF models for both the binary and ternary categories. The results reveal that the ternary models produced less confusion between hate and non-hate speech than the binary models.
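The "combined n-grams weighted by TF-IDF" feature set mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative assumption, not the authors' pipeline: whitespace tokenization, n-grams up to trigrams, term frequency as count divided by the number of n-grams in the document, and the idf formula `log(N / df) + 1` are choices made here (library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed idf by default). The resulting sparse vectors would then be fed to a classifier such as an SVM.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def combined_ngrams(text, max_n=3):
    """Unigrams through max_n-grams of a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats


def tfidf(docs, max_n=3):
    """Return one {ngram: weight} dict per document.

    tf  = raw count / number of n-grams in the document
    idf = log(N / df) + 1   (an assumed, unsmoothed variant)
    """
    bags = [Counter(combined_ngrams(d, max_n)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in bag.items()
        })
    return vectors


if __name__ == "__main__":
    docs = ["this is bad", "this is good"]
    for vec in tfidf(docs, max_n=2):
        print(sorted(vec.items()))
```

N-grams shared by every document (such as "this is" above) keep only their tf weight, while n-grams unique to one document are boosted by the idf term.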

Multi-channel Convolutional Neural Network for Hate Speech Detection in Social Media

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advances of Science and Technology, 2022, pp. 603-618. DOI: 10.1007/978-3-030-93709-6_41

Author(s): Zeleke Abebaw, Andreas Rauber, Solomon Atnafu

Keyword(s): Neural Network; Social Media; Convolutional Neural Network; Hate Speech; Speech Detection
