Sinhala Hate Speech Detection in Social Media using Text Mining and Machine learning

Author(s):  
H.M.S.T Sandaruwan ◽  
S.A.S Lorensuhewa ◽  
M.A.L Kalyani
2021 ◽  
Vol 13 (3) ◽  
pp. 80
Author(s):  
Lazaros Vrysis ◽  
Nikolaos Vryzas ◽  
Rigas Kotsakis ◽  
Theodora Saridou ◽  
Maria Matsiola ◽  
...  

Social media services make it possible for an increasing number of people to express their opinions publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims to monitor and model hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database of hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform, and contribute new content to the database. Furthermore, functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online through a graphical user interface compatible with modern internet browsers. To evaluate the interface, a multifactor questionnaire was formulated to record users’ opinions about the web interface and its functionality.


2022 ◽  
Vol 14 (1) ◽  
pp. 0-0

Automatic hate speech detection on social media is becoming a major concern in modern countries. Hate speech directed at people can lead to violent acts and social chaos; hence the law prohibits it, and it carries moral and legal implications. It is crucial to distinguish hate speech from non-hate speech automatically and precisely, since this makes it easier to identify people who represent a real threat to society as well as those who are wrongly regarded as hateful speakers. In this paper, we applied a complete text mining process and the Naïve Bayes machine learning classification algorithm to two different datasets (tweets_Num1 and tweets_Num2) taken from Twitter, to better classify tweets. The results demonstrate that our model performed well on several metrics based on the confusion matrix, including accuracy, which reached 87.23% on the first dataset and 93.06% on the second.
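As a rough illustration of the kind of pipeline this abstract describes, the sketch below trains a multinomial Naïve Bayes classifier with Laplace smoothing and derives accuracy from confusion-matrix counts. It is not the authors' implementation: the whitespace tokenizer, the class labels "hate" and "ok", and the four toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_counts(y_true, y_pred, positive="hate"):
    # Returns (TP, FP, FN, TN) for a binary problem.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy data, for illustration only.
train_texts = ["i hate you people", "you people are awful",
               "have a nice day", "what a lovely day"]
train_labels = ["hate", "hate", "ok", "ok"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict(...)` on held-out tweets and passing the true and predicted labels to `confusion_counts` gives the four counts from which accuracy, precision, and recall can be computed.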


Author(s):  
Mardhiya Hayaty ◽  
Sumarni Adi ◽  
Anggit Dwi Hartanto

Background: Hate speech is an expression directed at a person or a group of people that conveys hatred and/or anger toward them. On social media, users are free to express themselves by writing harsh words and sharing them with a group of people, which can trigger division and conflict between groups. Several researchers have studied hate speech detection in social media using machine learning-based and lexicon-based approaches, but the machine learning approach has a weakness: the manual labelling process, in which an annotator separates positive, negative, or neutral opinions, is time-consuming and tiring. Objective: This study aims to produce a dictionary containing abusive words from local languages in Indonesia. The lexicon-based approach depends heavily on the language of the words in the dictionary. Indonesia has thousands of tribes with 2500 local languages, and 80% of the population of Indonesia use local languages in communication, which makes detecting hate speech on social media a significant challenge. Methods: Surveys of abusive words were conducted using proportionate stratified random sampling in 4 major tribes on the island of Java, namely Betawi, Sundanese, Javanese, and Madurese. Results: The experiment produced a dictionary of 250 abusive words from 4 major Indonesian tribes for detecting hate speech in Indonesian social media using the lexicon-based approach. Conclusion: A stratified random sampling technique was applied in 4 major Indonesian tribes to produce 250 abusive words for hate speech detection using the lexicon-based approach.
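A lexicon-based detector of the kind this abstract refers to can be reduced to a dictionary lookup. The sketch below is a minimal illustration under stated assumptions: the lexicon entries are placeholder tokens (not words from the authors' 250-word dictionary), the tokenizer is a simple regular expression, and the one-match threshold is arbitrary.

```python
import re

# Placeholder entries; a real lexicon would hold the surveyed abusive words.
EXAMPLE_LEXICON = {"badword1", "badword2"}


def find_abusive_terms(text, lexicon):
    # Lowercase the text, split it into word tokens, and keep lexicon hits.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok in lexicon]


def is_abusive(text, lexicon, min_hits=1):
    # Flag a text when it contains at least `min_hits` lexicon words.
    return len(find_abusive_terms(text, lexicon)) >= min_hits
```

Because matching is exact, coverage depends entirely on which languages and spellings the dictionary contains, which is the dependency the abstract points out.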


2021 ◽  
Vol 11 (18) ◽  
pp. 8575
Author(s):  
Sudhir Kumar Mohapatra ◽  
Srinivas Prasad ◽  
Dwiti Krishna Bebarta ◽  
Tapan Kumar Das ◽  
Kathiravan Srinivasan ◽  
...  

Hate speech on social media may spread quickly among online users and may subsequently even escalate into local violence and heinous crimes. This paper proposes a hate speech detection model based on machine learning and text mining feature extraction techniques. In this study, the authors collected English-Odia code-mixed hate speech data from a public Facebook page and manually organized them into three classes. To build both binary and ternary datasets, the data were further converted into binary classes. The modeling of hate speech combines a machine learning algorithm with feature extraction. Support vector machine (SVM), naïve Bayes (NB), and random forest (RF) models were trained using the whole dataset, with features based on word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec for both datasets. Using the two datasets, we developed two kinds of models with each feature: binary models and ternary models. The models based on SVM with word2vec achieved better performance than the NB and RF models for both the binary and ternary categories. The results reveal that the ternary models showed less confusion between hate and non-hate speech than the binary models.
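The "combined n-grams weighted by TF-IDF" feature mentioned above can be sketched as follows. This is an illustrative version only: the abstract does not say which TF-IDF variant was used, so the formula here (relative term frequency multiplied by log(N / df)) and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=3):
    # All word n-grams for n in [n_min, n_max], joined with spaces.
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs, n_min=1, n_max=3):
    # One sparse vector (dict: n-gram -> weight) per document.
    doc_grams = [Counter(word_ngrams(d.lower().split(), n_min, n_max))
                 for d in docs]
    doc_freq = Counter()
    for grams in doc_grams:
        doc_freq.update(grams.keys())
    n_docs = len(docs)
    vectors = []
    for grams in doc_grams:
        total = sum(grams.values())
        vectors.append({g: (count / total) * math.log(n_docs / doc_freq[g])
                        for g, count in grams.items()})
    return vectors
```

Under this variant an n-gram that occurs in every document receives weight zero, so only n-grams that help separate documents contribute to the vectors passed to a classifier such as SVM, NB, or RF.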


2021 ◽  
Vol 06 (12) ◽  
Author(s):  
Dr Ramakrishna Hegde ◽  

This is a review paper on the topic “Hate Speech Detection”. One of the main disadvantages of social media is the way it is used to spread hate. This hate can affect an individual or a group in different ways, such as degrading their mental health and leading to anxiety and depression, which in turn can lead to suicide or homicide. It is therefore very important to control how a platform can be used to spread a particular message. To do this, hate speech content has to be identified automatically, which can be done with machine learning and deep learning techniques. We have reviewed a few papers that deal with different methodologies for detecting hate speech in a given text.


2021 ◽  
Vol 30 (1) ◽  
pp. 578-591
Author(s):  
Amit Kumar Das ◽  
Abdullah Al Asif ◽  
Anik Paul ◽  
Md. Nur Hossain

Hate speech has spread more rapidly through the daily use of technology and, most notably, through people sharing negative opinions or feelings on social media. Although numerous works have addressed hate speech detection in English, German, and other languages, very few have been carried out for the Bengali language, even though millions of people communicate on social media in Bengali. The few existing works need improvement in both accuracy and interpretability. This article proposes an encoder-decoder-based machine learning model, a popular tool in NLP, to classify users’ Bengali comments from Facebook pages. A dataset of 7,425 Bengali comments, covering seven distinct categories of hate speech, was used to train and evaluate the model. 1D convolutional layers were used to extract and encode local features from the comments. Finally, attention mechanism, LSTM, and GRU-based decoders were used to predict hate speech categories. Among the three encoder-decoder algorithms, the attention-based decoder obtained the best accuracy (77%).
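The abstract does not specify which attention variant the decoder uses. As a generic illustration only, the sketch below computes plain dot-product attention over a list of encoder states: each state is scored against a query vector, the scores are normalised with softmax, and the context vector is the weighted sum of the states. The function names and vector shapes are assumptions for this example.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(query, encoder_states):
    # Score each encoder state against the query, then normalise.
    weights = softmax([dot(query, state) for state in encoder_states])
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

In a classifier, the context vector would typically be fed to a final layer that predicts the hate speech category, and the weights indicate which encoded positions influenced the prediction most.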


Author(s):  
Noman Ashraf ◽  
Abid Rafiq ◽  
Sabur Butt ◽  
Hafiz Muhammad Faisal Shehzad ◽  
Grigori Sidorov ◽  
...  

On YouTube, billions of videos are watched online and millions of short messages are posted each day. YouTube, along with other social networking sites, is used by individuals and extremist groups to spread hatred among users. In this paper, we consider religion as the most targeted domain for spreading hate speech among people of different religions. We present a methodology for the detection of religion-based hate videos on YouTube. Messages posted on YouTube videos generally express users’ opinions about that video. We provide a novel dataset for religious hate speech detection on YouTube comments. The proposed methodology applies data mining techniques to comments extracted from religious videos in order to filter religion-oriented messages and detect videos that are used for spreading hate. The supervised learning algorithms Support Vector Machine (SVM), Logistic Regression (LR), and k-Nearest Neighbor (k-NN) are used for baseline results.


Author(s):  
Neeraj Vashistha ◽  
Arkaitz Zubiaga

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. Because the volume of harmful content online, such as hate speech, is not manageable by humans, interest within the academic community in investigating automated means of hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classify them into three classes: abusive, hateful, or neither. We create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool which identifies and scores a page with an effective metric in near-real time and uses the result as feedback to re-train our model. We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, with performance comparable or superior to most monolingual models.

