Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model

Wassen Aldjanabi; Abdelghani Dahou; Mohammed A. A. Al-qaness; Mohamed Abd Elaziz; Ahmed Mohamed Helmi; Robertas Damaševičius

doi:10.3390/informatics8040069

Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model

Informatics ◽

10.3390/informatics8040069 ◽

2021 ◽

Vol 8 (4) ◽

pp. 69

Author(s):

Wassen Aldjanabi ◽

Abdelghani Dahou ◽

Mohammed A. A. Al-qaness ◽

Mohamed Abd Elaziz ◽

Ahmed Mohamed Helmi ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

Detection System ◽

Language Model ◽

Arabic Language ◽

Social Phenomena ◽

Speech Detection ◽

Task Learning ◽

Opinion Expression ◽

Social Media Platforms

As social media platforms offer a medium for opinion expression, social phenomena such as hatred, offensive language, racism, and all forms of verbal violence have increased sharply. These behaviors are not confined to specific countries, groups, or communities; they extend into people’s everyday lives. This study investigates offensive and hate speech on Arab social media in order to build an accurate offensive and hate speech detection system. More precisely, we develop a classification system for identifying offensive and hate speech using a multi-task learning (MTL) model built on top of a pre-trained Arabic language model. We train the MTL model on the same task across several corpora that vary in their offensive and hate speech context, so that it learns both global and dataset-specific contextual representations. The developed MTL model performed strongly and outperformed existing models in the literature on three out of four datasets for Arabic offensive and hate speech detection tasks.
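
The split between global and dataset-specific representations described in this abstract can be illustrated with a rough, framework-free sketch. It is not the authors' implementation: the pre-trained Arabic language model is replaced by a hypothetical stand-in `shared_encoder` (a fixed random projection over bag-of-words counts), the corpus names, vocabulary and sizes are invented, and the round-robin batching is only one common way to mix corpora, assumed here rather than taken from the abstract.

```python
import math
import random

random.seed(0)

VOCAB = ["w0", "w1", "w2", "w3", "w4"]   # hypothetical toy vocabulary
HIDDEN = 4                                # size of the shared representation
CORPORA = ["corpus_a", "corpus_b"]        # hypothetical dataset names

# Shared parameters: stand-in for the pre-trained language model encoder.
shared_weights = [[random.uniform(-1, 1) for _ in VOCAB] for _ in range(HIDDEN)]

# Dataset-specific parameters: one binary classification head per corpus.
heads = {name: [random.uniform(-1, 1) for _ in range(HIDDEN)] for name in CORPORA}


def shared_encoder(tokens):
    """Map tokens to a shared hidden vector (bag-of-words counts -> tanh(Wx))."""
    counts = [tokens.count(word) for word in VOCAB]
    return [math.tanh(sum(w * c for w, c in zip(row, counts)))
            for row in shared_weights]


def predict(tokens, corpus):
    """Probability of the offensive/hate class under the head for `corpus`."""
    hidden = shared_encoder(tokens)
    logit = sum(w * h for w, h in zip(heads[corpus], hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def round_robin_batches(batches_by_corpus):
    """Interleave batches so that each corpus takes its turn (assumed schedule)."""
    iterators = {name: iter(b) for name, b in batches_by_corpus.items()}
    while iterators:
        for name in list(iterators):
            try:
                yield name, next(iterators[name])
            except StopIteration:
                del iterators[name]
```

In a trained version of this layout, every batch would update the shared encoder, while only the head belonging to the batch's corpus would be updated; the same input can therefore receive different scores from `predict(tokens, "corpus_a")` and `predict(tokens, "corpus_b")`.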

Detecting hate speech against politicians in Arabic community on social media

International Journal of Web Information Systems ◽

10.1108/ijwis-08-2019-0036 ◽

2020 ◽

Vol 16 (3) ◽

pp. 295-313

Author(s):

Imane Guellil ◽

Ahsan Adeel ◽

Faical Azouaou ◽

Sara Chennoufi ◽

Hanene Maafi ◽

...

Keyword(s):

Social Media ◽

Deep Learning ◽

Hate Speech ◽

Short Term Memory ◽

Arabic Language ◽

Short Term ◽

Speech Corpus ◽

Term Memory ◽

Content Type ◽

Speech Detection

Purpose: This paper proposes an approach for detecting hate speech against politicians in the Arabic community on social media (e.g. YouTube). Similar work has been presented in the literature for other languages such as English. However, to the best of the authors’ knowledge, little work has been conducted for the Arabic language.

Design/methodology/approach: The approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian NB (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For deep learning classification, four algorithms are applied: convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM). For feature extraction, the authors use both Word2vec and FastText with their two implementations, namely Skip Gram (SG) and Continuous Bag of Words (CBOW).

Findings: Simulation results show that LSVC, Bi-LSTM and MLP perform best, achieving an accuracy of up to 91% when combined with the SG model. The results also show that classification on a balanced corpus is more accurate than classification on an unbalanced corpus.

Originality/value: The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), annotated by three different annotators. The corpus covers the three languages used by Arabic speakers: Arabic, French and English. For Arabic, it contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). A further contribution is the use of both shallow and deep learning classification with different feature extraction models, namely Word2vec and FastText with their two implementations, SG and CBOW.
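
For readers unfamiliar with the two training schemes named in this abstract, the sketch below shows how Skip Gram and CBOW frame their training examples from the same token sequence. It is a minimal illustration only: the function names and the `window` parameter are hypothetical, and it covers example generation, not the embedding training performed by Word2vec or FastText.

```python
def skip_gram_pairs(tokens, window=2):
    """Skip Gram: each (center, context) pair predicts one context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: each (context_words, center) example predicts the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token inputs that have no context
            examples.append((context, center))
    return examples
```

With `tokens = ["a", "b", "c"]` and `window=1`, Skip Gram produces four pairs (one per center/context combination), whereas CBOW produces three examples (one per position).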

Monitoring Users’ Behavior: Anti-Immigration Speech Detection on Twitter

Machine Learning and Knowledge Extraction ◽

10.3390/make2030011 ◽

2020 ◽

Vol 2 (3) ◽

pp. 192-215 ◽

Cited By ~ 1

Author(s):

Nikolaos Pitropakis ◽

Kamil Kokot ◽

Dimitra Gkatzia ◽

Robert Ludwiniak ◽

Alexios Mylonas ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

Political Campaigns ◽

Canadian English ◽

Speech Detection ◽

Social Media Platforms ◽

Privacy Breaches ◽

Future Work ◽

Different Sources ◽

Media Data

The proliferation of social media platforms has changed the way people interact online. However, engagement with social media comes at a price: users’ privacy. Breaches of users’ privacy, such as the Cambridge Analytica scandal, reveal how users’ data can be weaponized in political campaigns, which often trigger hate speech and anti-immigration views. Hate speech detection is a challenging task because the different sources of hate affect the language used and because relevant annotated data are scarce. To tackle this, we collected and manually annotated an immigration-related dataset of publicly available tweets in UK, US, and Canadian English. In an empirical study, we explored anti-immigration speech detection using various language features (word n-grams, character n-grams) and measured their impact on a number of trained classifiers. Our work demonstrates that word n-grams yield higher precision, recall, and F-score than character n-grams. Finally, we discuss the implications of these results for future work on hate speech detection and social media data analysis in general.
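
The two feature families compared in this abstract differ only in the unit that is windowed. The sketch below is a minimal illustration under simple assumptions (lowercasing and whitespace tokenization, which the abstract does not specify); the function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Contiguous sequences of n whitespace-separated, lowercased tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Contiguous sequences of n characters, after collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_features(text, n, level="word"):
    """Bag-of-n-grams counts that a classifier could consume as features."""
    extractor = word_ngrams if level == "word" else char_ngrams
    return Counter(extractor(text, n))
```

For example, `word_ngrams("Go back home", 2)` gives `["go back", "back home"]`, while `char_ngrams("ab c", 3)` gives `["ab ", "b c"]`.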

Multimodal Hate Speech Detection in Greek Social Media

Multimodal Technologies and Interaction ◽

10.3390/mti5070034 ◽

2021 ◽

Vol 5 (7) ◽

pp. 34

Author(s):

Konstantinos Perifanos ◽

Dionysis Goutsos

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Hate Speech ◽

Language Model ◽

Fine Tuning ◽

Accuracy Score ◽

Speech Detection ◽

Online Social Media

Hateful and abusive speech presents a major challenge for all online social media platforms. Recent advances in Natural Language Processing and Natural Language Understanding allow for more accurate detection of hate speech in textual streams. This study presents a new multimodal approach to hate speech detection by combining Computer Vision and Natural Language Processing models for abusive context detection. Our study focuses on Twitter messages and, more specifically, on hateful, xenophobic, and racist speech in Greek aimed at refugees and migrants. In our approach, we combine transfer learning and fine-tuning of Bidirectional Encoder Representations from Transformers (BERT) and Residual Neural Networks (ResNet). Our contribution includes the development of a new dataset for hate speech classification, consisting of tweet IDs, along with the code to obtain their visual appearance, as they would have been rendered in a web browser. We have also released a pre-trained Language Model trained on Greek tweets, which has been used in our experiments. We report a consistently high level of accuracy (accuracy score = 0.970, f1-score = 0.947 in our best model) in racist and xenophobic speech detection.

Automatic Hate Speech Detection: A Literature Review

International Journal of Engineering and Management Research ◽

10.31033/ijemr.11.2.17 ◽

2021 ◽

Vol 11 (2) ◽

pp. 116-121

Author(s):

Mohiyaddeen ◽

Dr. Shifaulla Siddiqui

Keyword(s):

Social Media ◽

Review Paper ◽

Hate Speech ◽

Detection System ◽

Hybrid Approach ◽

The Internet ◽

Rule Based ◽

Speech Detection ◽

Social Media Platform ◽

Media Platform

Hate speech has been an ongoing problem on the Internet for many years. Social media, especially Facebook and Twitter, has given it a global stage on which it can spread far more rapidly. Every social media platform needs to implement an effective hate speech detection system to remove offensive content in real time. There are various approaches to identifying hate speech, such as rule-based, machine learning based, deep learning based and hybrid approaches. As this is a review paper, we summarize the work of various authors who have studied hate speech identification using these approaches.

A Web Interface for Analyzing Hate Speech

Future Internet ◽

10.3390/fi13030080 ◽

2021 ◽

Vol 13 (3) ◽

pp. 80

Author(s):

Lazaros Vrysis ◽

Nikolaos Vryzas ◽

Rigas Kotsakis ◽

Theodora Saridou ◽

Maria Matsiola ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Graphical User Interface ◽

Hate Speech ◽

Web Interface ◽

Learning Models ◽

Speech Detection ◽

Media Services ◽

The Web ◽

Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims to monitor and model hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated to record the users’ opinions about the web interface and its functionality.

Emotionally Informed Hate Speech Detection: A Multi-target Perspective

Cognitive Computation ◽

10.1007/s12559-021-09862-5 ◽

2021 ◽

Author(s):

Patricia Chiril ◽

Endang Wahyu Pamungkas ◽

Farah Benamara ◽

Véronique Moriceau ◽

Viviana Patti

Keyword(s):

Hate Speech ◽

Binary Classification ◽

Online Communication ◽

Vulnerable Groups ◽

Speech Detection ◽

Social Media Platforms ◽

Sentic Computing ◽

Specific Manifestation ◽

The Impact ◽

First Time

Hate speech and harassment are widespread in online communication, due to users' freedom and anonymity and the lack of regulation provided by social media platforms. Hate speech is topically focused (misogyny, sexism, racism, xenophobia, homophobia, etc.), and each specific manifestation of hate speech targets different vulnerable groups based on characteristics such as gender (misogyny, sexism), ethnicity, race, religion (xenophobia, racism, Islamophobia), sexual orientation (homophobia), and so on. Most automatic hate speech detection approaches cast the problem into a binary classification task without addressing either the topical focus or the target-oriented nature of hate speech. In this paper, we propose to tackle, for the first time, hate speech detection from a multi-target perspective. We leverage manually annotated datasets to investigate the problem of transferring knowledge from different datasets with different topical focuses and targets. Our contribution is threefold: (1) we explore the ability of hate speech detection models to capture common properties from topic-generic datasets and transfer this knowledge to recognize specific manifestations of hate speech; (2) we experiment with the development of models to detect both topics (racism, xenophobia, sexism, misogyny) and hate speech targets, going beyond standard binary classification, to investigate how to detect hate speech at a finer level of granularity and how to transfer knowledge across different topics and targets; and (3) we study the impact of affective knowledge encoded in sentic computing resources (SenticNet, EmoSenticNet) and in semantically structured hate lexicons (HurtLex) in determining specific manifestations of hate speech. We experimented with different neural models including multitask approaches.
Our study shows that: (1) training a model on a combination of training sets from several topic-specific datasets is more effective than training a model on a topic-generic dataset; (2) the multi-task approach outperforms a single-task model when detecting both the hatefulness of a tweet and its topical focus in the context of a multi-label classification approach; and (3) the models incorporating EmoSenticNet emotions, the first-level emotions of SenticNet, a blend of SenticNet and EmoSenticNet emotions, or affective features based on HurtLex obtained the best results. Our results demonstrate that multi-target hate speech detection from existing datasets is feasible, which is a first step towards hate speech detection for a specific topic/target when dedicated annotated data are missing. Moreover, we show that domain-independent affective knowledge, injected into our models, helps finer-grained hate speech detection.

Social Media Toxicity Classification Using Deep Learning: Real-World Application UK Brexit

Electronics ◽

10.3390/electronics10111332 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1332

Author(s):

Hong Fan ◽

Wu Du ◽

Abdelghani Dahou ◽

Ahmed A. Ewees ◽

Dalia Yousri ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

Modern Society ◽

User Generated Content ◽

Real World Application ◽

Proposed Model ◽

Social Media Platforms ◽

Public Dataset ◽

The UK ◽

Harmful Side Effect

Social media has become an essential facet of modern society, wherein people share their opinions on a wide variety of topics. Social media is quickly becoming indispensable for a majority of people, and many cases of social media addiction have been documented. Social media platforms such as Twitter have demonstrated over the years the value they provide, such as connecting people from all over the world with different backgrounds. However, they have also shown harmful side effects that can have serious consequences. One such harmful side effect of social media is the immense toxicity that can be found in various discussions. The word toxic has become synonymous with online hate speech, internet trolling, and sometimes outrage culture. In this study, we build an efficient model to detect and classify toxicity in social media from user-generated content using the Bidirectional Encoder Representations from Transformers (BERT). The BERT pre-trained model and three of its variants have been fine-tuned on a well-known labeled toxic comment dataset, the Kaggle public dataset (Toxic Comment Classification Challenge). Moreover, we test the proposed models on two datasets collected from Twitter in two different periods to detect toxicity in user-generated content (tweets) using hashtags related to the UK Brexit. The results showed that the proposed model can efficiently classify and analyze toxic tweets.

Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

10.20944/preprints202011.0646.v1 ◽

2020 ◽

Author(s):

Neeraj Vashistha ◽

Arkaitz Zubiaga

Keyword(s):

Social Media ◽

Hate Speech ◽

Model Performance ◽

Academic Community ◽

Human Interaction ◽

Superior Performance ◽

Competitive Performance ◽

Speech Detection ◽

Improve Model ◽

Use Of The Internet

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. While the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community in investigating automated means of hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classifying them into three classes: abusive, hateful or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool which identifies and scores a page with an effective metric in near-real time and uses the result as feedback to re-train our model. We prove the competitive performance of our multilingual model on two languages, English and Hindi, leading to comparable or superior performance to most monolingual models.
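
Combining several corpora into one homogeneous three-class dataset, as this abstract describes, mainly requires mapping each corpus's own label vocabulary onto the shared scheme. The sketch below is a hedged illustration only: the dataset names, their original labels and the mappings in `LABEL_MAPS` are hypothetical, and the duplicate filtering is an added assumption rather than a step reported in the abstract.

```python
UNIFIED_LABELS = ("abusive", "hateful", "neither")

# Hypothetical per-dataset label vocabularies mapped onto the unified scheme.
LABEL_MAPS = {
    "dataset_one": {"offensive": "abusive", "hate": "hateful", "normal": "neither"},
    "dataset_two": {"1": "hateful", "0": "neither"},
}


def merge_datasets(datasets):
    """Merge {name: [(text, label), ...]} into one list with unified labels."""
    merged = []
    seen = set()
    for name, rows in datasets.items():
        mapping = LABEL_MAPS[name]
        for text, label in rows:
            key = text.strip().lower()
            # Drop exact duplicates (case-insensitive) and unmappable labels.
            if key in seen or label not in mapping:
                continue
            seen.add(key)
            merged.append({"text": text.strip(),
                           "label": mapping[label],
                           "source": name})
    return merged
```

Keeping the `source` field makes it possible to check later whether performance differs by originating corpus.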

Platform Politics: The Emergence of Alternative Social Media in India

Asia Pacific Media Educator ◽

10.1177/1326365x211056699 ◽

2021 ◽

Vol 31 (2) ◽

pp. 269-276

Author(s):

Prashanth Bhat

Keyword(s):

Social Media ◽

Public Sphere ◽

Hate Speech ◽

Right Wing ◽

Bharatiya Janata Party ◽

The World ◽

Social Media Platforms ◽

Media Platform ◽

Corporate Social ◽

Widespread Dissemination

Widespread dissemination of hate speech on corporate social media platforms such as Twitter, Facebook, and YouTube has compelled technology companies to moderate content on their platforms. At the receiving end of these content moderation efforts are supporters of right-wing populist parties, who have gained notoriety for harassing journalists, spreading disinformation, and vilifying liberal activists. In recent months, several prominent right-wing figures across the world were removed from social media for violating platform policies, a phenomenon known as ‘deplatforming’. Prominent among such right-wing groups are online supporters of the Hindu nationalist Bharatiya Janata Party (BJP) in India, who have begun accusing corporate social media of pursuing a ‘liberal agenda’ and ‘curtailing free speech.’ In response to deplatforming, the BJP-led Government of India has aggressively promoted and embraced Koo, an indigenously developed social media platform. This commentary examines the implications of this alternative social platform for the online communicative environment in the Indian public sphere.

Automatic Hate Speech Detection on Social Media: A Brief Survey

2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA) ◽

10.1109/aiccsa47632.2019.9035228 ◽

2019 ◽

Author(s):

Ahlam Alrehili

Keyword(s):

Social Media ◽

Hate Speech ◽

Speech Detection
