Automated Amharic Hate Speech Posts and Comments Detection Model Using Recurrent Neural Network

Author(s):  
Surafel Getachew Tesfaye ◽  
Kula Kakeba

Abstract: Over the last few years, social activity on the internet, especially on social media platforms, has increased drastically. Unfortunately, social networks have also become a place where hate speech proliferates, and many people’s social lives are disturbed by hate speech posts and the conflicts those posts trigger. Studies confirm that online hate speech has offline consequences. Although there is a lot of research on automated hate speech detection, most of it targets other languages, and labeled data for applying automated analysis and detection methods to Amharic is scarce. This gap motivated our research on automatic detection of hate speech posts. To address it, this research prepared a large labeled Amharic dataset by collecting posts and comments from selected Facebook pages of actively participating activists. The Facebook data were labeled manually as hate or free, following guidelines provided by the researcher, and pre-processed with data cleaning and normalization techniques. Recurrent neural network models for automated detection of hate speech in Amharic Facebook posts were developed using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), with word n-grams for feature extraction and word2vec to represent each unique word as a vector. Experiments on the two models used 80% of the dataset for training and 10% for validation, to train the models and select the best hyper-parameter combination. The remaining 10% of the dataset was used to test the models after training. The LSTM-based RNN with batch size 128, learning rate 0.001, the RMSProp optimizer, and 0.5 dropout achieved an accuracy of 97.9% in classifying posts as hate speech or free after training for 100 epochs. This result was confirmed by testing the models with a model performance test and inference on user-generated data.
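The 80/10/10 split and the reported best hyper-parameters can be illustrated with a minimal sketch. The function name `split_dataset`, the seed, the placeholder posts, and the `BEST_LSTM_CONFIG` dictionary are hypothetical and introduced only for illustration; the abstract does not say how the authors shuffled or partitioned their data.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: an assumption, for repeatability
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, test


# Values reported in the abstract for the best LSTM model, collected here
# in a hypothetical configuration dictionary.
BEST_LSTM_CONFIG = {
    "batch_size": 128,
    "learning_rate": 0.001,
    "optimizer": "RMSProp",
    "dropout": 0.5,
    "epochs": 100,
}

posts = [f"post_{i}" for i in range(1000)]  # placeholder data, not the real corpus
train, val, test = split_dataset(posts)
print(len(train), len(val), len(test))
```

Keeping the test portion untouched until training and hyper-parameter selection are finished is what makes the final accuracy figure an estimate of performance on unseen posts.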

2019 ◽  
Vol 8 (4) ◽  
pp. 3152-3158

With digitization, the importance of content writing is increasing. This is due to the large improvement in accessibility and the major impact of digital content on people. Given the veracity of and huge demand for digital content, author profiling becomes necessary to identify the right person for a particular piece of content writing. This paper applies deep neural network models to identify the gender of the author of a given piece of content. The analysis was carried out on the corpus dataset using artificial neural networks with different numbers of layers, a long short-term memory (LSTM) based recurrent neural network (RNN), a bidirectional LSTM-based RNN, and attention-based RNN models, with mean absolute error, root mean square error, accuracy, and loss as the evaluation measures. The results at different epochs show the significance of each model.
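Three of the evaluation measures named above have standard definitions, sketched below for a binary label such as author gender encoded as 0 or 1. The example labels, predicted probabilities, and the 0.5 decision threshold are assumptions made for illustration and do not come from the paper.

```python
import math


def mean_absolute_error(y_true, y_prob):
    """Average absolute difference between labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def root_mean_square_error(y_true, y_prob):
    """Square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_prob) if (p >= threshold) == (t == 1))
    return hits / len(y_true)


y_true = [1, 0, 1, 1]          # hypothetical labels
y_prob = [0.9, 0.2, 0.6, 0.3]  # hypothetical model outputs

mae = mean_absolute_error(y_true, y_prob)
rmse = root_mean_square_error(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(mae, rmse, acc)
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.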


Author(s):  
Baoquan Wang ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Fan Zhao ◽  
...  

Supervised methods for anomaly detection in time series data require labeled data. Existing semi-supervised methods have their own drawbacks: the range of the outlier factors they use varies with data, model, and time, so the threshold for determining abnormality is difficult to obtain, and computing outlier factors from the other data points in the data set is computationally expensive. These issues make such methods difficult to apply in practice. This paper proposes a framework named LSTM-VE, which uses clustering combined with a visualization method to roughly label normal data and then uses the normal data to train a long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the normal data category classification probability sequence is used as the outlier factor. The framework makes deep-learning-based anomaly detection practical to apply, and using VE avoids the shortcomings of existing outlier factors and yields better performance. In addition, the framework is easy to extend because the LSTM neural network can be replaced with other classification models. Experiments on labeled data sets and real unlabeled data sets show that the framework outperforms replicator neural networks with reconstruction error (RNN-RS) and has good scalability and practicability.
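The abstract does not give the formula for the variance error, so the following is only one plausible reading, stated as an assumption: take the sequence of probabilities that a classifier assigns to the normal category over a window, and use the population variance of that sequence as the outlier factor. The function name `variance_error` and both example sequences are hypothetical.

```python
from statistics import pvariance


def variance_error(normal_probs):
    """Population variance of a sequence of normal-class probabilities.

    Assumption: this stands in for the paper's VE, whose exact
    definition is not stated in the abstract.
    """
    return pvariance(normal_probs)


# Hypothetical classifier outputs for two windows of a time series.
steady = [0.97, 0.98, 0.96, 0.97, 0.98]   # consistently classified as normal
erratic = [0.95, 0.40, 0.88, 0.15, 0.70]  # unstable classification

ve_steady = variance_error(steady)
ve_erratic = variance_error(erratic)
print(ve_steady, ve_erratic)
```

Under this reading, a window whose classification probabilities fluctuate strongly receives a larger outlier factor than a window the classifier labels consistently.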


2020 ◽  
Vol 13 (4) ◽  
pp. 485-525
Author(s):  
Femi Emmanuel Ayo ◽  
Olusegun Folorunso ◽  
Friday Thomas Ibharalu ◽  
Idowu Ademola Osinuga

Purpose: Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has witnessed special research attention in recent studies, hence, the need to design a generic metadata architecture and efficient feature extraction technique to enhance hate speech detection.
Design/methodology/approach: This study proposes a hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction and Long Short Term Memory (LSTM) which is a variant of recurrent neural networks architecture for sentence-level feature extraction. The extracted features from the hybrid embeddings then serve as input into the improved cuckoo search neural network for the prediction of a tweet as hate speech, offensive language or neither.
Findings: The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods. In order to validate the performances of the proposed method, t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection. Furthermore, Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.
Research limitations/implications: Finally, the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.
Originality/value: The main novelty of this study is the use of an automatic topic spotting measure based on naïve Bayes model to improve features representation.
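The word-level TF-IDF features mentioned above can be sketched in a few lines. This uses one common variant (term frequency normalized by document length, inverse document frequency as the natural log of N over document frequency); the abstract does not state which weighting the authors used, and the function name `tf_idf` and the example tweets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents each term appears in at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, whitespace-tokenized example texts.
tweets = [
    "we welcome everyone here".split(),
    "everyone here is kind".split(),
    "kind words matter".split(),
]
scores = tf_idf(tweets)
print(scores[0])
```

A term that occurs in only one document, such as "welcome" here, gets a higher weight than a term shared across documents, which is what makes TF-IDF useful for picking out distinctive words.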


2020 ◽  
Vol 10 (23) ◽  
pp. 8614 ◽  
Author(s):  
Raghad Alshalan ◽  
Hend Al-Khalifa

With the rise of hate speech phenomena in the Twittersphere, significant research efforts have been undertaken to provide automatic solutions for detecting hate speech, varying from simple machine learning models to more complex deep neural network models. Despite this, research investigating the hate speech problem in Arabic is still limited. This paper, therefore, aimed to investigate several neural network models based on convolutional neural network (CNN) and recurrent neural network (RNN) to detect hate speech in Arabic tweets. It also evaluated the recent language representation model bidirectional encoder representations from transformers (BERT) on the task of Arabic hate speech detection. To conduct our experiments, we first built a new hate speech dataset that contained 9316 annotated tweets. Then, we conducted a set of experiments on two datasets to evaluate four models: CNN, gated recurrent units (GRU), CNN + GRU, and BERT. Our experimental results on our dataset and an out-domain dataset showed that the CNN model gave the best performance, with an F1-score of 0.79 and area under the receiver operating characteristic curve (AUROC) of 0.89.
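The two reported metrics can be computed as in the following minimal sketch for a binary hate/not-hate task. The labels, scores, and 0.5 threshold are made up for illustration, and the abstract does not say which F1 averaging scheme the authors used, so plain binary F1 is an assumption here. AUROC is computed as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as half.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]               # hypothetical labels (1 = hate)
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]   # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
print(f1, auc)
```

F1 depends on the chosen decision threshold, while AUROC summarizes ranking quality across all thresholds, which is why papers often report both.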


Water ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 865 ◽  
Author(s):  
Di Zhang ◽  
Qidong Peng ◽  
Junqiang Lin ◽  
Dongsheng Wang ◽  
Xuefei Liu ◽  
...  

Reservoirs are an important hydraulic engineering measure for human utilization and management of water resources, and a reasonable and effective reservoir operating plan is essential for realizing reservoir function. To explore the application of deep learning algorithms in the field of reservoir operations, a recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) are employed to predict outflows for the Xiluodu (XLD) reservoir. This paper also summarizes how parameter settings affect model performance, compares the simulation performance of the three models, and analyzes the main factors that affect reservoir operation, to provide a reference for future research on model application. Results show that (1) the number of iterations and the number of hidden nodes mainly influence model precision, with the former having more effect than the latter, while the batch size mainly affects computation speed; (2) all three models can predict the reservoir outflow accurately and efficiently; (3) the operating decisions generated by the three models can achieve the flood control and power generation goals of the reservoir and comply with the operating regulations; and (4) under different hydrological periods, the factors influencing reservoir operation and their importance differ.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Zhi-Ying Xie ◽  
Yuan-Rong He ◽  
Chih-Cheng Chen ◽  
Qing-Quan Li ◽  
Chia-Chun Wu

Accurate predictions of bus arrival times help passengers arrange their trips easily and flexibly and improve travel efficiency. Thus, it is important to manage and schedule the arrival times of buses for the efficient deployment of buses and to ease traffic congestion, which improves the service quality of the public transport system. However, because many variables disturb scheduled transportation, accurate prediction is challenging. To predict bus arrival times accurately, this research adopted a recurrent neural network (RNN). The variables affecting the bus arrival time were investigated using a data set containing route, driver, weather, and schedule information. Then, a stacked multilayer RNN model was created with the variables categorized into four groups. The RNN model with a separate multi-input and spatiotemporal sequence model was applied to the arrival and departure times of buses along an entire bus route in Linyi, Shandong. The model simulation revealed that the convolutional long short-term memory (ConvLSTM) model showed the highest accuracy among the tested models. The propagation of error and the number of prediction steps influenced the prediction accuracy.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Karun Thanjavur ◽  
Arif Babul ◽  
Brandon Foran ◽  
Maya Bielecki ◽  
Adam Gilchrist ◽  
...  

Abstract: Concussion is a global health concern. Despite its high prevalence, a sound understanding of the mechanisms underlying this type of diffuse brain injury remains elusive. It is, however, well established that concussions cause significant functional deficits; that children and youths are disproportionately affected and have longer recovery time than adults; and that individuals suffering from a concussion are more prone to experience additional concussions, with each successive injury increasing the risk of long term neurological and mental health complications. Currently, the most significant challenge in concussion management is the lack of objective, clinically-accepted, brain-based approaches for determining whether an athlete has suffered a concussion. Here, we report on our efforts to address this challenge. Specifically, we introduce a deep learning long short-term memory (LSTM)-based recurrent neural network that is able to distinguish between non-concussed and acute post-concussed adolescent athletes using only short (i.e. 90 s long) samples of resting state EEG data as input. The athletes were neither required to perform a specific task nor expected to respond to a stimulus during data collection. The acquired EEG data were neither filtered, cleaned of artefacts, nor subjected to explicit feature extraction. The LSTM network was trained and validated using data from 27 male, adolescent athletes with sports related concussion, benchmarked against 35 non-concussed adolescent athletes. During rigorous testing, the classifier consistently identified concussions with an accuracy of > 90% and achieved an ensemble median Area Under the Receiver Operating Characteristic Curve (ROC/AUC) equal to 0.971. This is the first instance of a high-performing classifier that relies only on easy-to-acquire resting state, raw EEG data. Our concussion classifier represents a promising first step towards the development of an easy-to-use, objective, brain-based, automatic classification of concussion at an individual level.

