Hate Speech and Offensive Language Detection from Social Media

Author(s):  
Vildan Mercan ◽  
Akhtar Jamil ◽  
Alaa Ali Hameed ◽  
Irfan Ahmed Magsi ◽  
Sibghatullah Bazai ◽  
...  

2020 ◽  
Author(s):  
Mahen Herath ◽  
Thushari Atapattu ◽  
Hoang Anh Dung ◽  
Christoph Treude ◽  
Katrina Falkner

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-7

Author(s):  
Nauman Ul Haq ◽  
Mohib Ullah ◽  
Rafiullah Khan ◽  
Arshad Ahmad ◽  
Ahmad Almogren ◽  
...  

The use of slang, abusive, and offensive language has become common practice on social media. Although social media companies have censorship policies against slang, abusive, vulgar, and offensive language, limited resources and limited research on automatic abusive-language detection for languages other than English mean this condemnable practice continues. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework for detecting abusive and slang words in Perso-Arabic-scripted Urdu tweets. Furthermore, because no standard dataset is available, we also design and annotate a dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu as our second significant contribution for future research. The results show that the proposed USAD model correctly identifies 72.6% of tweets as abusive or non-abusive. Additionally, we identify some key factors that can help researchers improve their abusive-language detection models.
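The core of a lexicon-based framework like the one described is a dictionary lookup over tokenized tweets. The sketch below illustrates that idea only; the lexicon entries and tokenizer are hypothetical placeholders (the actual USAD lexicon contains Perso-Arabic Urdu words, and its matching rules are not specified in the abstract).

```python
# Minimal sketch of lexicon-based abusive-tweet flagging.
# The lexicon entries below are illustrative placeholders; USAD's actual
# lexicon is a curated list of Perso-Arabic-scripted Urdu slang/abusive words.
import re

ABUSIVE_LEXICON = {"badword1", "badword2", "slangword"}  # placeholder entries

def tokenize(text):
    """Lowercase the text and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def is_abusive(tweet, lexicon=ABUSIVE_LEXICON):
    """Flag a tweet as abusive if any token appears in the lexicon."""
    return any(token in lexicon for token in tokenize(tweet))

print(is_abusive("this contains badword1 here"))  # True
print(is_abusive("a perfectly clean tweet"))      # False
```

A pure lookup like this cannot catch misspellings or obfuscated words, which is one reason the abstract's 72.6% accuracy is plausible for a lexicon-only approach.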


2021 ◽  
Author(s):  
Nobal B. Niraula ◽  
Saurab Dulal ◽  
Diwa Koirala

2020 ◽  
Author(s):  
Hammad Rizwan ◽  
Muhammad Haroon Shakeel ◽  
Asim Karim

2021 ◽  
Author(s):  
Sünje Paasch-Colberg ◽  
Joachim Trebbe ◽  
Christian Strippel ◽  
Martin Emmer

In the past decade, the public discourse on immigration in Germany has been strongly affected by right-wing populist, racist, and Islamophobic positions. This becomes evident especially in the comment sections of news websites and social media platforms, where user discussions often escalate and trigger hate comments against refugees and immigrants and also against journalists, politicians, and other groups. In view of the threatening consequences such sentiments can have for groups who are targeted by right-wing extremist violence, we take a closer look into such user discussions to gain detailed insights into the various forms of hate speech and offensive language against these groups. Using a modularized framework that goes beyond the common “hate/no-hate” dichotomy in the field, we conducted a structured text annotation of 5,031 user comments posted on German news websites and social media in March 2019. Most of the hate speech we found was directed against refugees and immigrants, while other groups were mostly exposed to various forms of offensive language. In comments containing hate speech, refugees and Muslims were frequently stereotyped as criminals, whereas extreme forms of hate speech, such as calls for violence, were rare in our data. These findings are discussed with a focus on their potential consequences for public discourse on immigration in Germany.


2020 ◽  
Vol 34 (01) ◽  
pp. 881-889
Author(s):  
Anthony Rios

Hate speech and offensive language are rampant on social media. Machine learning has provided a way to moderate foul language at scale. However, much of the current research focuses on overall performance. Models may perform poorly on text written in a minority dialectal language. For instance, a hate speech classifier may produce more false positives on tweets written in African-American Vernacular English (AAVE). To measure these problems, we need text written in both AAVE and Standard American English (SAE). Unfortunately, it is challenging to curate data for all linguistic styles in a timely manner, especially when we are constrained to specific problems, social media platforms, or by limited resources. In this paper, we answer the question, “How can we evaluate the performance of classifiers across minority dialectal languages when they are not present within a particular dataset?” Specifically, we propose an automated fairness fuzzing tool called FuzzE to quantify the fairness of text classifiers applied to AAVE text using a dataset that only contains text written in SAE. Overall, we find that the fairness estimates returned by our technique moderately correlate with those obtained on real ground-truth AAVE text. Warning: Offensive language is displayed in this manuscript.
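The unfairness the abstract describes is typically quantified as a gap in false-positive rates between dialect groups. The sketch below shows only that evaluation metric with a toy classifier and toy data; it is not FuzzE itself, whose contribution is synthesizing AAVE-like text from SAE data when no labeled AAVE corpus exists.

```python
# Sketch: comparing false-positive rates of a hate-speech classifier on two
# groups of *benign* texts (e.g., SAE vs. AAVE). Classifier and data are
# illustrative stand-ins, not the paper's actual tool or corpora.

def false_positive_rate(classifier, benign_texts):
    """Fraction of benign texts the classifier wrongly flags as hateful."""
    flagged = sum(1 for t in benign_texts if classifier(t))
    return flagged / len(benign_texts)

def fairness_gap(classifier, benign_group_a, benign_group_b):
    """Absolute difference in false-positive rates between the two groups."""
    return abs(false_positive_rate(classifier, benign_group_a)
               - false_positive_rate(classifier, benign_group_b))

# Toy classifier that over-flags texts containing the harmless token "finna"
toy_clf = lambda text: "finna" in text
sae_texts = ["I am going to the store", "see you later"]
aave_texts = ["I'm finna head to the store", "see you later"]
print(fairness_gap(toy_clf, sae_texts, aave_texts))  # 0.5
```

A gap of 0.5 here means the toy classifier flags half of the benign AAVE-style texts but none of the SAE ones, which is exactly the disparity pattern the paper warns about.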


Author(s):  
Dr. Sweeta Bansal

As the number of social media users grows day by day, so does the hatred expressed among them online. This hatred gives rise to hate speech and hateful comments that users exchange with one another. Recently, hate speech has increased so much that we need a way to stop it, or at least contain it to a minimum. With this problem in mind, we introduce a way to detect the class of comments posted online and stop their spread if they belong to the hateful category. We use Natural Language Processing methods and the Logistic Regression algorithm to achieve this goal.
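The pipeline this abstract names, text features plus logistic regression, can be sketched end to end with the standard library alone. The tiny dataset, bag-of-words featurization, and gradient-descent trainer below are illustrative assumptions; in practice one would use a library such as scikit-learn, and the article's actual features and data are not given.

```python
# Stdlib-only sketch: bag-of-words features + logistic regression trained by
# gradient descent on log-loss. Dataset and vocabulary are toy examples.
import math
import re

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def featurize(text, vocab):
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train(texts, labels, epochs=200, lr=0.5):
    vocab = {t: i for i, t in
             enumerate(sorted({w for s in texts for w in tokenize(s)}))}
    w, b = [0.0] * len(vocab), 0.0
    X = [featurize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(X, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(hateful)
            g = p - y                        # gradient of log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return vocab, w, b

def predict(text, vocab, w, b):
    z = sum(wi * xi for wi, xi in zip(w, featurize(text, vocab))) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0  # 1 = hateful

texts = ["I hate you idiots", "have a lovely day",
         "you are an idiot", "what a lovely idea"]
labels = [1, 0, 1, 0]
vocab, w, b = train(texts, labels)
print(predict("idiot", vocab, w, b))    # 1
print(predict("lovely", vocab, w, b))   # 0
```

Classifying a comment as hateful then amounts to one dot product and a sigmoid, which is what makes logistic regression a common baseline for this task.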

