To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

Cognitive Computation ◽

10.1007/s12559-021-09826-9 ◽

2021 ◽

Author(s):

Kristian Miok ◽

Blaž Škrlj ◽

Daniela Zaharie ◽

Marko Robnik-Šikonja

Keyword(s):

Monte Carlo ◽

Hate Speech ◽

Classification Performance ◽

Reliability Estimation ◽

Superior Performance ◽

Speech Detection ◽

Attention Networks ◽

Reliability Estimates ◽

Viable Mechanism ◽

Affective Dimensions

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
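As a rough illustration of the Monte Carlo dropout idea described in this abstract, the sketch below keeps dropout active at prediction time, repeats the forward pass many times, and reads the spread of the outputs as a reliability signal. It is a toy stand-in under stated assumptions, not the authors' implementation: a hand-weighted logistic classifier replaces BERT, dropout is applied to input features rather than to attention layers, and all names (`stochastic_forward`, `mc_dropout_predict`) and numbers are hypothetical.

```python
import math
import random


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass of a toy logistic classifier with dropout left ON.

    Assumption: a linear model stands in for the transformer; in the paper
    the dropout sits inside the attention layers instead.
    """
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept inputs so the expected sum is unchanged.
            total += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features, weights, bias, drop_prob=0.3, passes=200, seed=0):
    """Return (mean probability, variance) over many stochastic passes."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    return mean, variance


# Hypothetical feature vector and weights, for illustration only.
mean, variance = mc_dropout_predict([1.0, 2.0, -1.0], [0.8, -0.5, 1.2], bias=0.1)
print(f"mean P(hate) = {mean:.3f}, variance = {variance:.4f}")
```

A larger variance across passes marks a prediction that could be treated as less trusted, for example by routing it to a human moderator.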


Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

10.20944/preprints202011.0646.v1 ◽

2020 ◽

Author(s):

Neeraj Vashistha ◽

Arkaitz Zubiaga

Keyword(s):

Social Media ◽

Hate Speech ◽

Model Performance ◽

Academic Community ◽

Human Interaction ◽

Superior Performance ◽

Competitive Performance ◽

Speech Detection ◽

Improve Model ◽

Use Of The Internet

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. While the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community in investigating automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classifying them into three classes: abusive, hateful or neither. We create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool which identifies and scores a page with an effective metric in near-real time and uses the same as feedback to re-train our model. We prove the competitive performance of our multilingual model on two languages, English and Hindi, leading to comparable or superior performance to most monolingual models.


Time of Your Hate: The Challenge of Time in Hate Speech Detection on Social Media

Applied Sciences ◽

10.3390/app10124180 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4180 ◽

Cited By ~ 2

Author(s):

Komal Florio ◽

Valerio Basile ◽

Marco Polignano ◽

Pierpaolo Basile ◽

Viviana Patti

Keyword(s):

Social Media ◽

Hate Speech ◽

Time Window ◽

Classification Performance ◽

Fine Tuning ◽

Classification Model ◽

Temporal Distance ◽

Speech Detection ◽

Highly Sensitive

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackling the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark in non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show that AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.


Ensemble Method for Indonesian Twitter Hate Speech Detection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i1.pp294-299 ◽

2018 ◽

Vol 11 (1) ◽

pp. 294 ◽

Cited By ~ 8

Author(s):

M. Ali Fauzi ◽

Anny Yuniarti

Keyword(s):

Social Media ◽

Hate Speech ◽

Ensemble Methods ◽

Classification Performance ◽

Ensemble Method ◽

Support Vector ◽

Web Content ◽

Speech Detection ◽

Social Media Networks ◽

Nearest Neighbours

Due to the massive increase of user-generated web content, in particular on social media networks where anyone can post statements freely without any limitations, the amount of hateful activity is also increasing. Social media and microblogging web services, such as Twitter, allow user tweets to be read and analyzed in near real time. Twitter is a logical source of data for hate speech analysis, since Twitter users are more likely to express their emotions about an event by posting tweets. This analysis can help with early identification of hate speech so that it can be prevented from spreading widely. Manually classifying hateful content on Twitter is costly and not scalable. Therefore, automatic hate speech detection needs to be developed for tweets in the Indonesian language. In this study, we used an ensemble method for hate speech detection in the Indonesian language. We employed five stand-alone classification algorithms, including Naïve Bayes, K-Nearest Neighbours, Maximum Entropy, Random Forest, and Support Vector Machines, and two ensemble methods, hard voting and soft voting, on a Twitter hate speech dataset. The experimental results showed that using an ensemble method can improve classification performance. The best result is achieved when using soft voting, with an F1 measure of 79.8% on the unbalanced dataset and 84.7% on the balanced dataset. Although the improvement is not truly remarkable, using an ensemble method can reduce the risk of choosing a poor classifier for deciding whether new tweets are hate speech or not.
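The difference between the two ensemble rules named in this abstract can be shown with a minimal sketch. It assumes a binary task where each base classifier outputs a probability that a tweet is hate speech; the function names and the example probabilities are hypothetical and do not come from the paper.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over each classifier's thresholded decision.

    A tie (possible with an even number of classifiers) is resolved to 0 here;
    that tie rule is an assumption of this sketch.
    """
    votes_for_hate = sum(1 for p in probabilities if p >= threshold)
    return 1 if votes_for_hate * 2 > len(probabilities) else 0


def soft_vote(probabilities, threshold=0.5):
    """Average the classifiers' probabilities, then threshold the average."""
    mean_probability = sum(probabilities) / len(probabilities)
    return 1 if mean_probability >= threshold else 0


# Hypothetical outputs of three base classifiers for a single tweet:
# one is very confident, two lean slightly the other way.
tweet_probabilities = [0.9, 0.45, 0.4]
print("hard voting:", hard_vote(tweet_probabilities))
print("soft voting:", soft_vote(tweet_probabilities))
```

In this example the two rules disagree: hard voting counts only one of three votes for the hateful class, while soft voting lets the single confident classifier pull the average above the threshold.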


Methods to split cognitive task data for estimating split-half reliability: A comprehensive review and systematic assessment

Psychonomic Bulletin & Review ◽

10.3758/s13423-021-01948-3 ◽

2021 ◽

Author(s):

Thomas Pronk ◽

Dylan Molenaar ◽

Reinout W. Wiers ◽

Jaap Murre

Keyword(s):

Monte Carlo ◽

Cognitive Task ◽

Splitting Method ◽

R Package ◽

Task Design ◽

Reliability Estimation ◽

Systematic Assessment ◽

Reliability Estimates ◽

Non Linear ◽

And Task

Estimating the reliability of cognitive task datasets is commonly done via split-half methods. We review four methods that differ in how the trials are split into parts: a first-second half split, an odd-even trial split, a permutated split, and a Monte Carlo-based split. Additionally, each splitting method could be combined with stratification by task design. These methods are reviewed in terms of the degree to which they are confounded with four effects that may occur in cognitive tasks: effects of time, task design, trial sampling, and non-linear scoring. Based on the theoretical review, we recommend Monte Carlo splitting (possibly in combination with stratification by task design) as being the most robust method with respect to the four confounds considered. Next, we estimated the reliabilities of the main outcome variables from four cognitive task datasets, each (typically) scored with a different non-linear algorithm, by systematically applying each splitting method. Differences between methods were interpreted in terms of confounding effects inflating or attenuating reliability estimates. For three task datasets, our findings were consistent with our model of confounding effects. Evidence for confounding effects was strong for time and task design and weak for non-linear scoring. When confounding effects occurred, they attenuated reliability estimates. For one task dataset, findings were inconsistent with our model but they may offer indicators for assessing whether a split-half reliability estimate is appropriate. Additionally, we make suggestions on further research of reliability estimation, supported by a compendium R package that implements each of the splitting methods reviewed here.
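To make the splitting methods more concrete, the sketch below implements three of them (first-second, odd-even, and a random permutated split) for a simple mean score. Several things here are assumptions rather than details from the abstract: the helper names are hypothetical, scoring is a plain mean instead of the non-linear algorithms the paper studies, the Spearman-Brown correction is added as the usual adjustment for half-length parts, and the Monte Carlo split and stratification are left out because the abstract does not specify them. It is also unrelated to the authors' R package.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r):
    """Adjust a half-length correlation to full test length."""
    return 2 * r / (1 + r)


def split_indices(n_trials, method, rng=None):
    """Return two lists of trial indices for the chosen splitting method."""
    indices = list(range(n_trials))
    half = n_trials // 2
    if method == "first_second":
        return indices[:half], indices[half:]
    if method == "odd_even":
        return indices[0::2], indices[1::2]
    if method == "permutated":
        rng = rng or random.Random(0)
        rng.shuffle(indices)  # random assignment without replacement
        return indices[:half], indices[half:]
    raise ValueError(f"unknown method: {method}")


def split_half_reliability(data, method, rng=None):
    """data: one list of trial scores per participant, all the same length."""
    part_a, part_b = split_indices(len(data[0]), method, rng)

    def mean_score(row, ids):
        return sum(row[i] for i in ids) / len(ids)

    scores_a = [mean_score(row, part_a) for row in data]
    scores_b = [mean_score(row, part_b) for row in data]
    return spearman_brown(pearson(scores_a, scores_b))
```

A time effect such as practice or fatigue would affect the first-second split differently from the odd-even split, which is the kind of confound the review compares across methods.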


Ensemble-based Semi-Supervised Learning for Hate Speech Detection

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128427 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Safa Alsafari

Keyword(s):

Social Media ◽

Supervised Learning ◽

Hate Speech ◽

Classification Performance ◽

Media Content ◽

Learning Approach ◽

Classification Methods ◽

Speech Detection ◽

Speech Classification

Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach to leverage the availability of abundant social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers that are then used to label a corpus of one million tweets. Next, we investigate several strategies to select the most confident labels from the obtained pseudo labels. We assess these strategies by re-training all the classifiers with the seed dataset augmented with the trusted pseudo-labeled data. Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.
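One way the step of selecting the most confident pseudo labels could look is sketched below. The selection rule (all classifiers agree and the averaged confidence passes a cut-off) is an assumption chosen for illustration; the abstract does not say which strategies the paper investigates, and the function name, threshold, and probabilities are hypothetical.

```python
def select_pseudo_labels(ensemble_probs, min_confidence=0.9):
    """Pick unlabeled items whose ensemble label looks trustworthy.

    ensemble_probs: for each unlabeled text, a list with one P(hate) per classifier.
    Returns a list of (item_index, pseudo_label) pairs.
    """
    selected = []
    for index, probs in enumerate(ensemble_probs):
        mean_prob = sum(probs) / len(probs)
        label = 1 if mean_prob >= 0.5 else 0
        # Confidence is measured for the predicted class, whichever it is.
        confidence = mean_prob if label == 1 else 1.0 - mean_prob
        # Require every classifier to land on the same side as the ensemble.
        all_agree = all((p >= 0.5) == (label == 1) for p in probs)
        if all_agree and confidence >= min_confidence:
            selected.append((index, label))
    return selected


# Hypothetical probabilities from three classifiers on four unlabeled tweets.
unlabeled = [
    [0.95, 0.97, 0.92],  # confident and unanimous
    [0.60, 0.40, 0.55],  # classifiers disagree
    [0.02, 0.05, 0.08],  # confident and unanimous for the other class
    [0.85, 0.80, 0.90],  # unanimous but below the cut-off
]
print(select_pseudo_labels(unlabeled))
```

The selected pairs would then be added to the seed dataset before the classifiers are re-trained.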


BERT-BU12 Hate Speech Detection using Bidirectional Encoder-Decoder

International Journal of System Dynamics Applications ◽

10.4018/ijsda.20220801oa04 ◽

2022 ◽

Vol 11 (2) ◽

pp. 0-0

Keyword(s):

Text Classification ◽

Question Answering ◽

Hate Speech ◽

State Of The Art ◽

Learning Models ◽

Speech Detection ◽

Attention Networks ◽

Attention Model ◽

Proposed Model ◽

Novel Method

In recent times, transfer learning models have been shown to give good results in text classification for question answering, summarization, and next-word prediction, but these models have not yet been used extensively for the problem of hate speech detection. We anticipate that these networks may give better results in another text classification task, namely hate speech detection. This paper introduces a novel method of hate speech detection based on the concept of attention networks using the BERT attention model. We have conducted exhaustive experiments and evaluation over publicly available datasets using various evaluation metrics (precision, recall and F1 score). We show that our model outperforms all the state-of-the-art methods by almost 4%. We also discuss in detail the technical challenges faced during the implementation of the proposed model.


Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

Information ◽

10.3390/info12010005 ◽

2020 ◽

Vol 12 (1) ◽

pp. 5

Author(s):

Neeraj Vashistha ◽

Arkaitz Zubiaga

Keyword(s):

Social Media ◽

Hate Speech ◽

Model Performance ◽

Academic Community ◽

Human Interaction ◽

Superior Performance ◽

Competitive Performance ◽

Speech Detection ◽

Improve Model ◽

Use Of The Internet

The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not manageable by humans. The interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset. Having classified them into three classes, abusive, hateful or neither, we create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real-time and uses the same feedback to re-train our model. We prove the competitive performance of our multilingual model in two languages, English and Hindi. This leads to comparable or superior performance to most monolingual models.
