Prototype Application Hate Speech Detection Website Using String Matching and Searching Algorithm

Nuning Kurniasih; Leon Andretti Abdillah; I Ketut Sudarsana; I Wayan Lali Yogantara; I Nyoman Temon Astawa; Ricardo Freedom Nanuru; Aveanty Miagina; Jefrey Oxianus Sabarua; Mohamad Jamil; Johana Tandisalla; Electronita Duan; Frits Gerit John Rupilele; Mutiara Dara Utama; Maya Laisila; Ansari Saleh Ahmar; Robbi Rahim

doi:10.14419/ijet.v7i2.5.13952

Prototype Application Hate Speech Detection Website Using String Matching and Searching Algorithm

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.5.13952 ◽

2018 ◽

Vol 7 (2.5) ◽

pp. 62 ◽

Cited By ~ 3

Author(s):

Nuning Kurniasih ◽

Leon Andretti Abdillah ◽

I Ketut Sudarsana ◽

I Wayan Lali Yogantara ◽

I Nyoman Temon Astawa ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

String Matching ◽

Media Content ◽

Speech Detection

Hate speech is now a problem for users of social media such as Facebook, Twitter, WhatsApp, and Telegram. Social media users post and share a great deal of content, both consciously and unconsciously, across various platforms. Some hate speech posts are shared by irresponsible parties to profit from the chaos they create, to denigrate religion, to vilify certain individuals, or as an act of provocation. A prototype hate speech detection application was created to detect hate speech on Facebook and to notify users, so that they are more aware of social media content and more careful when reading and sharing content that can trigger unpleasant actions.
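The abstract does not say which string matching algorithm the prototype uses, so the following is only a minimal sketch of the general idea: normalize a post, then look for whole-word occurrences of terms from a lexicon. The names `FLAGGED_TERMS`, `normalize`, `find_flagged_terms`, and `should_notify` are hypothetical, and the lexicon entries are neutral placeholders rather than anything taken from the paper.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual term list is not given.
FLAGGED_TERMS = ["badword", "slur example"]


def normalize(text):
    """Lowercase the text and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_flagged_terms(post, terms=FLAGGED_TERMS):
    """Return the lexicon terms that occur in the post as whole words or phrases."""
    # Padding with spaces makes a plain substring test respect word boundaries.
    padded = " " + normalize(post) + " "
    return [t for t in terms if " " + normalize(t) + " " in padded]


def should_notify(post):
    """Assumed notification rule: warn the user if any flagged term is found."""
    return bool(find_flagged_terms(post))
```

Exact matching of this kind is easy to explain to users but misses misspellings and context, which is one reason the other entries in this listing turn to learned classifiers.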

Ensemble-based Semi-Supervised Learning for Hate Speech Detection

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128427 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Safa Alsafari

Keyword(s):

Social Media ◽

Supervised Learning ◽

Hate Speech ◽

Classification Performance ◽

Media Content ◽

Learning Approach ◽

Classification Methods ◽

Speech Detection ◽

Speech Classification

Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach to leverage the availability of abundant social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers that are then used to label a corpus of one million tweets. Next, we investigate several strategies to select the most confident labels from the obtained pseudo labels. We assess these strategies by re-training all the classifiers with the seed dataset augmented with the trusted pseudo-labeled data. Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.
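The abstract mentions several strategies for selecting confident pseudo labels without detailing them. As an illustration only, the sketch below shows one plausible strategy: keep an unlabeled text when every classifier in the ensemble agrees on the label and the average confidence clears a threshold. The function name, the callable-classifier interface, and the default threshold of 0.9 are assumptions, not the paper's method.

```python
from statistics import mean


def select_confident_pseudo_labels(texts, classifiers, threshold=0.9):
    """Return (text, label) pairs the ensemble labels unanimously and confidently.

    Each classifier is assumed to be a callable mapping a text to P(hate) in [0, 1].
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    selected = []
    for text in texts:
        probs = [clf(text) for clf in classifiers]
        votes = [p >= 0.5 for p in probs]
        # Skip texts on which the classifiers disagree.
        if not (all(votes) or not any(votes)):
            continue
        label = 1 if votes[0] else 0
        # Confidence is the probability assigned to the agreed label.
        confidence = mean(p if label == 1 else 1 - p for p in probs)
        if confidence >= threshold:
            selected.append((text, label))
    return selected
```

The selected pairs would then be appended to the seed dataset before re-training, which is the augmentation step the abstract describes.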

A Web Interface for Analyzing Hate Speech

Future Internet ◽

10.3390/fi13030080 ◽

2021 ◽

Vol 13 (3) ◽

pp. 80

Author(s):

Lazaros Vrysis ◽

Nikolaos Vryzas ◽

Rigas Kotsakis ◽

Theodora Saridou ◽

Maria Matsiola ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Graphical User Interface ◽

Hate Speech ◽

Web Interface ◽

Learning Models ◽

Speech Detection ◽

Media Services ◽

The Web ◽

Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creation and the query of a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search in the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, the functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated to record the users’ opinions about the web interface and the corresponding functionality.

Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

10.20944/preprints202011.0646.v1 ◽

2020 ◽

Author(s):

Neeraj Vashistha ◽

Arkaitz Zubiaga

Keyword(s):

Social Media ◽

Hate Speech ◽

Model Performance ◽

Academic Community ◽

Human Interaction ◽

Superior Performance ◽

Competitive Performance ◽

Speech Detection ◽

Improve Model ◽

Use Of The Internet

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. Because the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community in investigating automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classifying the content into three classes: abusive, hateful, or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real time and uses the result as feedback to re-train our model. We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, with comparable or superior performance to most monolingual models.

Automatic Hate Speech Detection on Social Media: A Brief Survey

2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA) ◽

10.1109/aiccsa47632.2019.9035228 ◽

2019 ◽

Author(s):

Ahlam Alrehili

Keyword(s):

Social Media ◽

Hate Speech ◽

Speech Detection

Hate speech detection and racial bias mitigation in social media based on BERT model

PLoS ONE ◽

10.1371/journal.pone.0237861 ◽

2020 ◽

Vol 15 (8) ◽

pp. e0237861

Author(s):

Marzieh Mozafari ◽

Reza Farahbakhsh ◽

Noël Crespi

Keyword(s):

Social Media ◽

Hate Speech ◽

Racial Bias ◽

Speech Detection

Time of Your Hate: The Challenge of Time in Hate Speech Detection on Social Media

Applied Sciences ◽

10.3390/app10124180 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4180 ◽

Cited By ~ 2

Author(s):

Komal Florio ◽

Valerio Basile ◽

Marco Polignano ◽

Pierpaolo Basile ◽

Viviana Patti

Keyword(s):

Social Media ◽

Hate Speech ◽

Time Window ◽

Classification Performance ◽

Fine Tuning ◽

Classification Model ◽

Temporal Distance ◽

Speech Detection ◽

Highly Sensitive

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackle the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark on non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show how AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.
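The "time window" idea in this abstract can be pictured as restricting the fine-tuning set to examples dated shortly before the test period. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name, the `"date"` key, and the half-open window convention are all hypothetical.

```python
from datetime import date, timedelta


def select_training_window(examples, test_start, window_days):
    """Keep examples dated within `window_days` before `test_start`.

    Each example is assumed to be a dict with a `date` field holding a
    datetime.date; the test start date itself is excluded.
    """
    earliest = test_start - timedelta(days=window_days)
    return [ex for ex in examples if earliest <= ex["date"] < test_start]
```

Varying `window_days` and re-training on each selection is one way to observe how sensitive a model is to the temporal distance between training and test data.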

Hate Speech Detection in Hindi-English Code-Mixed Social Media Text

Proceedings of the ACM India Joint International Conference on Data Science and Management of Data - CoDS-COMAD '19 ◽

10.1145/3297001.3297048 ◽

2019 ◽

Cited By ~ 8

Author(s):

T. Y.S.S. Santosh ◽

K. V.S. Aravind

Keyword(s):

Social Media ◽

Hate Speech ◽

Speech Detection ◽

Social Media Text

A Large-Scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts

Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices - Lecture Notes in Computer Science ◽

10.1007/978-3-030-79457-6_35 ◽

2021 ◽

pp. 415-426

Author(s):

Son T. Luu ◽

Kiet Van Nguyen ◽

Ngan Luu-Thuy Nguyen

Keyword(s):

Social Media ◽

Large Scale ◽

Hate Speech ◽

Speech Detection ◽

Large Scale Dataset ◽

Media Texts

LOCAL CULTURE-BASED SOCIAL MEDIA LITERATION: LOCAL CULTURE CONTENT ON SOCIAL MEDIA AS STRENGTHENING SOCIAL INTEGRATION

Profetik Jurnal Komunikasi ◽

10.14421/pjk.v13i1.1742 ◽

2020 ◽

Vol 13 (1) ◽

pp. 74

Author(s):

Joko Sutarso

Keyword(s):

Social Media ◽

Social Integration ◽

Hate Speech ◽

Local Culture ◽

Media Content ◽

Negative Effects ◽

Educational Backgrounds ◽

Positive Side ◽

The Social ◽

Cultural Content

Social media use has been increasing from year to year. However, not all social media content has a positive side. Negative content such as hoaxes, hate speech, and cyberbullying are forms of social media abuse. They are a public concern because they have entered the social, political, economic, and even religious spheres. This is inseparable from the capitalization of social media corporations, which continue to grow with increasingly widespread exposure across national borders and to enter the lives of various generations, socio-economic strata, education levels, and educational backgrounds and experiences. This paper uses a qualitative theoretical approach based on observations of social media content and on theoretical studies. It seeks to explain the influence of media content on people's media behavior, as enrichment material for literacy activists conducting social media literacy activities in society. The theoretical explanations cover positive and negative aspects seen from social, political, psychological, educational, and cultural perspectives. The results show that local cultural content has the opportunity to fill the social media space, and that selective, creative, educative, and entertaining local cultural content can be used to minimize the negative effects of globalization and social media capitalism. Another benefit of promoting local culture on social media is increased social integration, because local culture contains local wisdom values that are national or even universal in character.

Hate Speech Detection for Indonesia Tweets Using Word Embedding And Gated Recurrent Unit

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.40125 ◽

2019 ◽

Vol 13 (1) ◽

pp. 43

Author(s):

Junanda Patihullah ◽

Edi Winarko

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Extraction ◽

Deep Learning ◽

Hate Speech ◽

Support Vector ◽

Speech Detection ◽

The People ◽

Gated Recurrent Unit ◽

Rule Out

Social media has changed how people express their thoughts and moods. As the activity of social media users increases, hate speech can spread quickly and widely, so detecting it manually is not feasible. The gated recurrent unit (GRU) is a deep learning method that can learn relations between information from earlier time steps and the current one. In this research, word2vec is used for feature extraction because it can learn semantic relations between words. The GRU's performance is compared with other supervised methods: support vector machine, naive Bayes, decision tree, and logistic regression. The results show that the best accuracy, 92.96%, is obtained by the GRU model with word2vec feature extraction. For the comparison supervised methods, word2vec features performed worse than tf and tf-idf features.
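For readers unfamiliar with the tf and tf-idf baselines mentioned above, here is a minimal sketch of one common unsmoothed tf-idf variant. The abstract does not state which weighting formula the authors used, so the formula and the function name `tf_idf` are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    tf is the term count divided by document length; idf is
    log(N / document frequency), with no smoothing (an assumed variant).
    Returns one dict of term -> weight per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight zero under this variant, which is why tf-idf emphasizes words that distinguish one tweet from the rest of the corpus.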
