scholarly journals Challenges of Hate Speech Detection in Social Media

2021 ◽  
Vol 2 (2) ◽  
Author(s):  
György Kovács ◽  
Pedro Alonso ◽  
Rajkumar Saini

AbstractThe detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society, and severely harm marginalized people or groups. A major arena for spreading hate speech online is social media. This significantly contributes to the difficulty of automatic detection, as social media posts include paralinguistic signals (e.g. emoticons, and hashtags), and their linguistic content contains plenty of poorly written text. Another difficulty is presented by the context-dependent nature of the task, and the lack of consensus on what constitutes as hate speech, which makes the task difficult even for humans. This makes the task of creating large labeled corpora difficult, and resource consuming. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that have the capacity to efficiently learn various features. For this reason, we proposed a deep natural language processing (NLP) model—combining convolutional and recurrent layers—for the automatic detection of hate speech in social media data. We have applied our model on the HASOC2019 corpus, and attained a macro F1 score of 0.63 in hate speech detection on the test set of HASOC. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting. Particularly, with limited training data available (as was the case for HASOC). For this reason, we investigated different methods for expanding resources used. We have explored various opportunities, such as leveraging unlabeled data, similarly labeled corpora, as well as the use of novel models. Our results showed that by doing so, it was possible to significantly increase the classification score attained.

Author(s):  
Fabio Poletto ◽  
Valerio Basile ◽  
Manuela Sanguinetti ◽  
Cristina Bosco ◽  
Viviana Patti

Abstract Hate Speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an important role as well for the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and venues for improvement.


2021 ◽  
Vol 5 (7) ◽  
pp. 34
Author(s):  
Konstantinos Perifanos ◽  
Dionysis Goutsos

Hateful and abusive speech presents a major challenge for all online social media platforms. Recent advances in Natural Language Processing and Natural Language Understanding allow for more accurate detection of hate speech in textual streams. This study presents a new multimodal approach to hate speech detection by combining Computer Vision and Natural Language processing models for abusive context detection. Our study focuses on Twitter messages and, more specifically, on hateful, xenophobic, and racist speech in Greek aimed at refugees and migrants. In our approach, we combine transfer learning and fine-tuning of Bidirectional Encoder Representations from Transformers (BERT) and Residual Neural Networks (Resnet). Our contribution includes the development of a new dataset for hate speech classification, consisting of tweet IDs, along with the code to obtain their visual appearance, as they would have been rendered in a web browser. We have also released a pre-trained Language Model trained on Greek tweets, which has been used in our experiments. We report a consistently high level of accuracy (accuracy score = 0.970, f1-score = 0.947 in our best model) in racist and xenophobic speech detection.


2021 ◽  
Vol 13 (3) ◽  
pp. 80
Author(s):  
Lazaros Vrysis ◽  
Nikolaos Vryzas ◽  
Rigas Kotsakis ◽  
Theodora Saridou ◽  
Maria Matsiola ◽  
...  

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creation and the query of a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search in the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, the functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated, targeting to record the users’ opinions about the web interface and the corresponding functionality.


2021 ◽  
Author(s):  
Lucas Rodrigues ◽  
Antonio Jacob Junior ◽  
Fábio Lobato

Posts with defamatory content or hate speech are constantly foundon social media. The results for readers are numerous, not restrictedonly to the psychological impact, but also to the growth of thissocial phenomenon. With the General Law on the Protection ofPersonal Data and the Marco Civil da Internet, service providersbecame responsible for the content in their platforms. Consideringthe importance of this issue, this paper aims to analyze the contentpublished (news and comments) on the G1 News Portal with techniquesbased on data visualization and Natural Language Processing,such as sentiment analysis and topic modeling. The results showthat even with most of the comments being neutral or negative andclassified or not as hate speech, the majority of them were acceptedby the users.


Author(s):  
Neeraj Vashistha ◽  
Arkaitz Zubiaga

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. While the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classify them into three classes, abusive, hateful or neither. We create a baseline model and we improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool which identifies and scores a page with effective metric in near-real time and uses the same as feedback to re-train our model. We prove the competitive performance of our multilingual model on two langauges, English and Hindi, leading to comparable or superior performance to most monolingual models.


Author(s):  
Ainhoa Serna ◽  
Jon Kepa Gerrikagoitia

In recent years, digital technology and research methods have developed natural language processing for better understanding consumers and what they share in social media. There are hardly any studies in transportation analysis with TripAdvisor, and moreover, there is not a complete analysis from the point of view of sentiment analysis. The aim of study is to investigate and discover the presence of sustainable transport modes underlying in non-categorized TripAdvisor texts, such as walking mobility in order to impact positively in public services and businesses. The methodology follows a quantitative and qualitative approach based on knowledge discovery techniques. Thus, data gathering, normalization, classification, polarity analysis, and labelling tasks have been carried out to obtain sentiment labelled training data set in the transport domain as a valuable contribution for predictive analytics. This research has allowed the authors to discover sustainable transport modes underlying the texts, focused on walking mobility but extensible to other means of transport and social media sources.


PLoS ONE ◽  
2020 ◽  
Vol 15 (8) ◽  
pp. e0237861
Author(s):  
Marzieh Mozafari ◽  
Reza Farahbakhsh ◽  
Noël Crespi

2020 ◽  
Vol 10 (12) ◽  
pp. 4180 ◽  
Author(s):  
Komal Florio ◽  
Valerio Basile ◽  
Marco Polignano ◽  
Pierpaolo Basile ◽  
Viviana Patti

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackle the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark on non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show how AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.


Sign in / Sign up

Export Citation Format

Share Document