Hate or Non-hate: Translation based hate speech identification in Code-Mixed Hinglish data set

Author(s): Shankar Biradar, Sunil Saumya, Arun Chauhan

Author(s): Edward Ombui, Lawrence Muchemi, Peter Wagacha

Presidential campaign periods are a major trigger for hate speech on social media in almost every country. A systematic review of previous studies indicates a shortage of publicly available annotated datasets and hardly any evidence of theoretical underpinning for the annotation schemes used in hate speech identification. This situation stifles the development of empirically useful data for research, especially in supervised machine learning. This paper describes the methodology used to develop a multidimensional hate speech framework based on the components of the duplex theory of hate [1], which include distance, passion, commitment to hate, and hate as a story. An annotation scheme based on this framework was then used to annotate a random sample of ~51k tweets drawn from ~400k tweets collected during the August and October 2017 presidential campaign period in Kenya. The result is a gold-standard code-switched dataset that can be used for comparative and empirical studies in supervised machine learning. Classifiers trained on this dataset could provide real-time monitoring of hate speech spikes on social media and inform data-driven decision-making by relevant security agencies in government.
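The abstract describes a sampling-and-annotation pipeline rather than a specific implementation. The following Python sketch (not the authors' code) only illustrates how a reproducible random sample of tweets might be drawn for manual labeling under a multidimensional scheme with fields for distance, passion, commitment, and hate-as-a-story; the file name, column names, and field types are assumptions.

```python
# Illustrative sketch only: sampling tweets for annotation and representing
# a multidimensional scheme inspired by the duplex theory of hate.
# File name and field names below are assumptions, not the authors' schema.
import csv
import random
from dataclasses import dataclass, asdict

@dataclass
class HateAnnotation:
    tweet_id: str
    text: str
    distance: int = 0     # negation-of-intimacy (distance) component
    passion: int = 0      # passion component
    commitment: int = 0   # commitment-to-hate component
    hate_story: str = ""  # "hate as a story" narrative label

def sample_for_annotation(path="tweets_400k.csv", k=51_000, seed=42):
    """Draw a reproducible random sample of tweets for manual annotation."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    random.seed(seed)
    sample = random.sample(rows, k=min(k, len(rows)))
    return [HateAnnotation(r["id"], r["text"]) for r in sample]

if __name__ == "__main__":
    batch = sample_for_annotation()
    print(len(batch), "tweets queued for annotation")
    print(asdict(batch[0]))
```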


2020, pp. 1-1
Author(s): Muhammad Usman Shahid Khan, Assad Abbas, Attiqa Rehman, Raheel Nawaz

Speech classification plays an important role in many domains, including medicine, voice synthesis, hate speech classification, and other custom applications. Conventional speech processing and classification techniques work on small datasets and yield lower classification accuracy. This paper introduces a neural network (NN) learning model for training on large datasets and classifying speech, based on critical feature analysis of speech spectrogram patterns and waveforms. The performance of the proposed training model was evaluated on a single CPU; it achieved 12-82% accuracy in just 5 epochs while continuously decreasing the loss over successive epochs. The method provides a learning-model framework for speech processing and classification on very large datasets.
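As a rough illustration of the kind of pipeline described above, the sketch below converts waveforms to log-spectrogram features and trains a small neural network for a handful of epochs on a CPU. The architecture, feature settings, and synthetic data are assumptions, not the paper's actual model.

```python
# Hedged sketch: spectrogram features + small NN classifier, a few epochs on CPU.
import numpy as np
from scipy.signal import spectrogram
from sklearn.neural_network import MLPClassifier

def waveform_to_features(wave, fs=16_000):
    """Log-magnitude spectrogram flattened into a fixed-length feature vector."""
    _, _, sxx = spectrogram(wave, fs=fs, nperseg=256, noverlap=128)
    return np.log1p(sxx).flatten()

# Synthetic stand-in data: 100 half-second clips, 4 speech classes.
rng = np.random.default_rng(0)
waves = rng.standard_normal((100, 8_000))
labels = rng.integers(0, 4, size=100)
X = np.stack([waveform_to_features(w, fs=16_000) for w in waves])

# max_iter here plays the role of the 5 training epochs mentioned in the abstract.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=5, verbose=True)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```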


Sensors, 2021, Vol 21 (23), pp. 7859
Author(s): Fernando H. Calderón, Namrita Balani, Jherez Taylor, Melvyn Peignon, Yen-Hao Huang, ...

The permanent transition to online activity has brought with it a surge in hate speech discourse. This has prompted increased calls for automatic detection methods, most of which currently rely on a dictionary of hate speech words and supervised classification. This approach often falls short when dealing with newer words and phrases produced by online extremist communities. These code words are used with the aim of evading automatic detection systems. Code words are frequently used and have benign meanings in regular discourse; for instance, “skypes”, “googles”, “bing”, and “yahoos” are all examples of words that carry a hidden hate speech meaning. Such overlap presents a challenge to the traditional keyword approach of collecting data specific to hate speech. In this work, we first introduced a word embedding model that learns the hidden hate speech meaning of words. With this insight into code words, we developed a classifier that leverages linguistic patterns to reduce the impact of individual words. The proposed method was evaluated across three different datasets to test its generalizability. The empirical results show that the linguistic patterns approach outperforms the baselines and enables further analysis of hate speech expressions.
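A minimal sketch of the embedding idea, assuming gensim Word2Vec and toy corpora: train separate embeddings on an extremist community's posts and on neutral posts, then compare the nearest neighbours of candidate code words such as "skypes" and "googles". This is not the authors' released code; the corpora, hyperparameters, and comparison heuristic are illustrative assumptions.

```python
# Hedged sketch: community-specific embeddings expose the hidden usage of code words.
from gensim.models import Word2Vec

# Placeholder corpora; in practice these would be tokenized posts from an
# extremist community and from a neutral reference community.
hate_community_posts = [
    ["the", "skypes", "control", "the", "media"],
    ["ban", "the", "googles", "from", "our", "country"],
]
neutral_posts = [
    ["call", "me", "on", "skypes", "later"],
    ["just", "googles", "the", "answer"],
]

hate_model = Word2Vec(hate_community_posts, vector_size=50, window=3, min_count=1, epochs=50)
ref_model = Word2Vec(neutral_posts, vector_size=50, window=3, min_count=1, epochs=50)

# Diverging neighbourhoods across the two corpora flag a candidate code word.
for word in ["skypes", "googles"]:
    print(word, "in hate corpus:", hate_model.wv.most_similar(word, topn=3))
    print(word, "in reference corpus:", ref_model.wv.most_similar(word, topn=3))
```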


2021, Vol 11 (3), pp. 1294
Author(s): Krzysztof Fiok, Waldemar Karwowski, Edgar Gutierrez, Tameika Liciaga, Alessandro Belmonte, ...

Volcanoes of hate and disrespect erupt in societies, often with fatal consequences. To address this negative phenomenon, scientists have struggled to understand and analyze its roots and its linguistic expression, described as hate speech. As a result, it is now possible to automatically detect and counter hate speech in textual data that spreads rapidly, for example, in social media. Recently, however, another approach to tackling the roots of disrespect was proposed: promoting positive behavior instead of only penalizing hate and disrespect. In our study, we followed this approach and discovered that it is hard to find any textual datasets or studies discussing the automatic detection of respectful behaviors and their textual expressions. Therefore, we contribute probably one of the first human-annotated datasets that allows for supervised training of text analysis methods for the automatic detection of respectful messages. By choosing a dataset of tweets that already possessed sentiment annotations, we were also able to discuss the correlation of sentiment and respect. Finally, we provide a comparison of recent machine and deep learning text analysis methods and their performance, which allowed us to demonstrate that automatic detection of respectful messages in social media is feasible.
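To make the setup concrete, here is a hedged baseline sketch, assuming a CSV with hypothetical columns "text", "sentiment" (numeric score), and "respectful" (binary label): it measures the sentiment-respect correlation and trains a simple TF-IDF plus logistic regression classifier for respectful messages. The actual study compares a range of machine and deep learning methods; this is only a minimal stand-in.

```python
# Hedged sketch: sentiment-respect correlation and a baseline respect classifier.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("respect_annotated_tweets.csv")  # hypothetical file and columns

# Correlation between the sentiment score and the binary respect label.
r, p = pearsonr(df["sentiment"], df["respectful"])
print(f"sentiment vs. respect: r={r:.2f}, p={p:.3g}")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["respectful"], test_size=0.2, random_state=0, stratify=df["respectful"]
)
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```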


Author(s): Edward Ombui, Lawrence Muchemi, Peter Wagacha

This study examines the problem of hate speech identification in code-switched text from social media using a natural language processing approach. It explores different features in training nine models and empirically evaluates their predictiveness in identifying hate speech in a ~50k human-annotated dataset. The study espouses a novel, hierarchical approach to this challenge, employing Latent Dirichlet Allocation to generate topic models that help build a high-level psychosocial feature set that we refer to by the acronym PDC. PDC groups words with similar meanings into word families, which is significant in capturing code-switching during the preprocessing stage for supervised learning models. The high-level PDC features generated are based on a hate speech annotation framework [1] that is largely informed by the duplex theory of hate [2]. Results obtained from frequency-based models using the PDC features on the dataset, comprising tweets generated during the 2012 and 2017 presidential elections in Kenya, indicate an f-score of 83% (precision: 81%, recall: 85%) in identifying hate speech. The study is significant in that, first, it publicly shares a unique code-switched hate speech dataset that is valuable for comparative studies. Second, it provides a methodology for building a novel PDC feature set to identify nuanced forms of hate speech, camouflaged in code-switched data, that conventional methods could not adequately identify.
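A simplified sketch of the topic-model step, assuming scikit-learn's LatentDirichletAllocation: words are assigned to their dominant topic to form crude "word families", and each tweet is represented by family-level counts fed to a frequency-based classifier. The authors' actual PDC construction is not reproduced here; the toy code-switched examples, topic count, and classifier are illustrative assumptions.

```python
# Hedged sketch: LDA topics as word families feeding a frequency-based classifier.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [                                # toy code-switched placeholders
    "wale watu ni wezi kabisa",
    "great rally today turnout imefika",
    "hatutaki hao kabila hapa",
]
labels = [1, 0, 1]                        # 1 = hate, 0 = non-hate (toy labels)

vec = CountVectorizer()
counts = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Assign each vocabulary word to its dominant topic -> a crude "word family".
family_of_word = lda.components_.argmax(axis=0)

def family_features(doc_term_matrix):
    """Tweet-level features: token counts aggregated per word family."""
    dense = doc_term_matrix.toarray()
    feats = np.zeros((dense.shape[0], lda.n_components))
    for fam in range(lda.n_components):
        feats[:, fam] = dense[:, family_of_word == fam].sum(axis=1)
    return feats

clf = LogisticRegression().fit(family_features(counts), labels)
print(clf.predict(family_features(counts)))
```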

