Analysis Text of Hate Speech Detection Using Recurrent Neural Network

Author(s): Arum Sucia Saksesi, Muhammad Nasrun, Casi Setianingsih

2020
Author(s): Surafel Getachew Tesfaye, Kula Kakeba

Abstract
During the last few years, social activity on the internet, especially on social media platforms, has increased drastically. Unfortunately, social networks have also become a place where hate speech proliferates, disturbing most people's social lives through hate speech posts and the conflicts those posts trigger. Studies confirm that online hate speech has various offline consequences. Although there is a great deal of research on automated hate speech detection, most of it targets other languages, and labeled data for applying automated analysis and detection methods to Amharic is scarce. These gaps motivated our research on the automatic detection of hate speech posts. To address them, this research prepared a large labeled Amharic dataset by collecting posts and comments from selected Facebook pages of actively participating activists. The Facebook data were labeled manually as hate or free, following guidelines given by the researcher, and pre-processed with data cleaning and normalization techniques. Recurrent neural network models for automatically detecting hate speech in Amharic Facebook posts were then developed using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures, with word n-grams for feature extraction and word2vec to represent each unique word as a vector. Both models were trained on 80% of the dataset, with a further 10% used for validation to select the best hyper-parameter combination; the remaining 10% was held out to test the models after training. As a result, an LSTM-based RNN with a batch size of 128, a learning rate of 0.001, the RMSProp optimizer, and 0.5 dropout, trained for 100 epochs, achieved an accuracy of 97.9% in classifying posts as hate speech or free. This result was confirmed by testing the models with performance tests and with inference on user-generated data.
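The 80/10/10 train/validation/test protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_80_10_10`, the fixed seed, and the toy `posts` list are hypothetical, and the abstract does not say whether the split was shuffled or stratified by label.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `items`, then cut it into 80% train, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded RNG keeps the split reproducible
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids floating-point rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]  # used to pick hyper-parameters
    test = shuffled[n_train + n_val:]  # held out until training is finished
    return train, val, test


# Hypothetical labelled posts: (text, label) with label "hate" or "free"
posts = [(f"post {i}", "hate" if i % 3 == 0 else "free") for i in range(100)]
train, val, test = split_80_10_10(posts)
print(len(train), len(val), len(test))  # 80 10 10
```

Any remainder from the integer division falls into the test set, so no example is silently dropped.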


Author(s): Brian Tuan Khieu, Melody Moh

This chapter presents a literature survey of the current state of hate speech detection models, with a focus on neural network applications in the area. The growth and freedom of social media have facilitated the dissemination of both positive and negative ideas. Proponents of hate speech are among the key abusers of the privileges afforded by social media, and the companies behind these networks have a vested interest in identifying such speech. Manual moderation is too cumbersome and slow to keep up with the torrent of content generated on these sites, which is why many have turned to machine learning. Neural network applications in this area have been very promising and have yielded positive results. However, the current state of hate speech detection also has newly discovered problems that remain unaddressed. The authors' survey identifies the key techniques and methods used to identify hate speech and discusses promising new directions for the field as well as newly identified issues.


2020, Vol 13 (4), pp. 485-525
Author(s): Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu, Idowu Ademola Osinuga

Purpose: Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for predicting and monitoring abusive behavior. Hate speech detection with social media data has received particular research attention in recent studies, hence the need for a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach: This study proposes hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The hybrid embeddings combine Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction with Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted by the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language, or neither.

Findings: The proposed method showed better results than other related methods when tested on the collected Twitter datasets. To validate its performance, a t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method against other related methods for hate speech detection. A paired-sample t-test was also conducted for the same purpose.

Research limitations/implications: The evaluation results showed that the proposed method outperforms other related methods, with a mean F1-score of 91.3.

Originality/value: The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.
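A minimal sketch of the word-level TF-IDF step, assuming the common textbook weighting (raw term count multiplied by ln(N/df)). The abstract does not specify which TF-IDF variant or tokenization the authors used, and the sentence-level LSTM features, topic inference, and cuckoo search network are not shown. The function `tf_idf` and the example tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document.

    tf  = raw count of the term in the document
    idf = ln(N / df), where N is the number of documents and df is the
          number of documents that contain the term
    """
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


# Hypothetical, already-tokenized tweets
tweets = [
    "you people are the worst".split(),
    "have a great day people".split(),
    "the worst day ever".split(),
]
for vec in tf_idf(tweets):
    print({term: round(weight, 3) for term, weight in sorted(vec.items())})
```

A term that occurs in every tweet gets ln(1) = 0 and contributes nothing, which is the intended effect of the IDF factor: words common to the whole corpus carry little discriminative information.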


2021
Author(s): Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee

2020, Vol 10 (23), pp. 8614
Author(s): Raghad Alshalan, Hend Al-Khalifa

With the rise of hate speech in the Twittersphere, significant research efforts have been undertaken to provide automatic solutions for detecting hate speech, ranging from simple machine learning models to more complex deep neural network models. Despite this, research on the hate speech problem in Arabic is still limited. This paper therefore aimed to investigate several neural network models based on convolutional neural networks (CNN) and recurrent neural networks (RNN) for detecting hate speech in Arabic tweets. It also evaluated the recent language representation model Bidirectional Encoder Representations from Transformers (BERT) on the task of Arabic hate speech detection. To conduct our experiments, we first built a new hate speech dataset containing 9316 annotated tweets. We then conducted a set of experiments on two datasets to evaluate four models: CNN, gated recurrent units (GRU), CNN + GRU, and BERT. Our experimental results on our dataset and on an out-of-domain dataset showed that the CNN model gave the best performance, with an F1-score of 0.79 and an area under the receiver operating characteristic curve (AUROC) of 0.89.
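The two metrics reported above can be computed as in the following sketch, which uses the standard definitions: F1 as the harmonic mean of precision and recall for the positive class, and AUROC as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The labels, scores, 0.5 threshold, and function names are hypothetical, and the abstract does not say how the authors averaged F1 (for example, per class or macro-averaged).

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (hypothetical encoding: 1 = hate, 0 = not hate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gold labels and model scores for six tweets
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5
print(round(f1_score(y_true, y_pred), 3), round(auroc(y_true, scores), 3))  # 0.667 0.889
```

F1 depends on the chosen threshold, while AUROC uses only the ranking of the scores, which is why the two numbers can tell different stories about the same model. The pairwise loop is O(P·N) and is meant only to make the definition concrete; a rank-based computation is preferable for large test sets.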


IEEE Access, 2020, Vol 8, pp. 204951-204962
Author(s): Pradeep Kumar Roy, Asis Kumar Tripathy, Tapan Kumar Das, Xiao-Zhi Gao
