Hate Speech Detection on Indonesian Long Text Documents Using Machine Learning Approach

Background: Hate speech is an expression directed at a person or group that conveys hatred and/or anger toward them. On social media, users are free to express themselves, including by writing harsh words and sharing them with groups of people, which can trigger division and conflict between groups. Existing research on detecting hate speech in social media follows two approaches, machine learning-based and lexicon-based. The machine learning approach has a weakness: manual labelling by an annotator, who must separate positive, negative, and neutral opinions, is time-consuming and tiring. Objective: This study aims to produce a dictionary of abusive words from local languages in Indonesia. The lexicon-based approach depends heavily on the languages covered by the words in its dictionary. Indonesia has thousands of tribes with 2500 local languages, and 80% of its population uses local languages in communication, which makes detecting hate speech on social media a significant challenge. Methods: Surveys of abusive words were conducted using proportionate stratified random sampling in 4 major tribes on the island of Java: Betawi, Sundanese, Javanese, and Madurese. Results: The experiments produced a dictionary of 250 abusive words from 4 major Indonesian tribes for detecting hate speech in Indonesian social media using the lexicon-based approach. Conclusion: Stratified random sampling in 4 major Indonesian tribes yielded 250 abusive words for hate speech detection using the lexicon-based approach.
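The lexicon-based approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries below are hypothetical placeholders (the study's 250-word dictionary is not reproduced here), and the function names, the tokenizer, and the `threshold` parameter are assumptions made for illustration only.

```python
import re

# Hypothetical placeholder entries mapping a term to its source language.
# These are NOT words from the study's dictionary.
ABUSIVE_LEXICON = {
    "badword1": "Javanese",
    "badword2": "Sundanese",
    "badword3": "Betawi",
    "badword4": "Madurese",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def find_abusive_terms(text, lexicon):
    """Return (term, language) pairs for every token found in the lexicon."""
    return [(tok, lexicon[tok]) for tok in tokenize(text) if tok in lexicon]


def is_flagged(text, lexicon, threshold=1):
    """Flag a text when it contains at least `threshold` lexicon terms."""
    return len(find_abusive_terms(text, lexicon)) >= threshold


if __name__ == "__main__":
    post = "You are a BadWord1, really badword3!"
    print(find_abusive_terms(post, ABUSIVE_LEXICON))
    print(is_flagged("hello there", ABUSIVE_LEXICON))
```

A plain lookup like this needs no labelled training data, which is the advantage the abstract attributes to the lexicon-based approach, but it can only match words that are actually present in the dictionary.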

Detection of Hate Speech Text in Afan Oromo Social Media using Machine Learning Approach

Indian Journal of Science and Technology ◽

10.17485/ijst/v14i31.1019 ◽

2021 ◽

Vol 14 (31) ◽

pp. 2567-2578

Author(s):

Naol Bakala Defersha ◽

Kula Kekeba Tune

Keyword(s):

Machine Learning ◽

Social Media ◽

Hate Speech ◽

Learning Approach ◽

Machine Learning Approach

Machine Learning approach for Content Mining System

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/06469 ◽

2021 ◽

Vol 23 (06) ◽

pp. 1569-1576

Author(s):

Dr.A. Mekala ◽

Dr.A. Prakash

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Text Classification ◽

Text Categorization ◽

Learning Approach ◽

Mining System ◽

Text Documents ◽

Categorization Task ◽

Machine Learning Approach ◽

Content Mining

Text Classification (TC), also known as Text Categorization, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label categorization task; otherwise, it is a multi-label categorization task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most appropriate documents.
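The single-label KNN categorization described above can be sketched as follows. This is an illustrative sketch, not the paper's system: the bag-of-words representation, the cosine similarity measure, the toy training set, and all function names are assumptions, since the abstract does not state which features or distance the authors used.

```python
import math
import re
from collections import Counter

# Toy labelled documents, invented for illustration only.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the football players", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
    ("investors sold bank shares", "finance"),
]


def vectorize(text):
    """Represent a text as a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest(query, labeled_docs, k=3):
    """Return the k most similar (score, text, label) triples, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), text, label) for text, label in labeled_docs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def knn_classify(query, labeled_docs, k=3):
    """Assign the majority label among the k nearest documents."""
    votes = Counter(label for _, _, label in nearest(query, labeled_docs, k))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    query = "football players scored a goal"
    print(knn_classify(query, TRAIN))
    for score, text, label in nearest(query, TRAIN):
        print(f"{score:.2f}  {label}  {text}")
```

The same neighbour ranking serves both steps named in the abstract: the majority vote gives the category, and the ranked neighbours are one possible reading of "the most appropriate documents" to return.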
