Prototype Application Hate Speech Detection Website Using String Matching and Searching Algorithm

2018 ◽  
Vol 7 (2.5) ◽  
pp. 62 ◽  
Author(s):  
Nuning Kurniasih ◽  
Leon Andretti Abdillah ◽  
I Ketut Sudarsana ◽  
I Wayan Lali Yogantara ◽  
I Nyoman Temon Astawa ◽  
...  

Hate speech is now a problem for users of social media platforms such as Facebook, Twitter, WhatsApp, and Telegram. Many users post and share content across platforms, both consciously and unconsciously, and some hate speech posts are shared by irresponsible parties to profit from the chaos they create, to denigrate religion, to vilify individuals, or as an act of provocation. The prototype hate speech detection application was created to detect hate speech on Facebook and to notify users, so that they become more aware of social media content and more careful when reading or sharing content that can trigger unpleasant actions.
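The abstract does not say which string matching algorithm the prototype uses. As a minimal sketch only, assuming a hand-maintained list of flagged terms and case-insensitive whole-word matching, the detect-and-notify idea might look like the following. The function name `find_flagged_terms` and the placeholder terms are hypothetical and are not taken from the paper.

```python
import re

def find_flagged_terms(text, flagged_terms):
    """Return the flagged terms that occur in text as whole words.

    Matching is case-insensitive. Both the function and the term list
    are illustrative assumptions, not the prototype's actual method.
    """
    found = []
    for term in flagged_terms:
        # \b anchors keep "bad" from matching inside "badge".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

# Placeholder terms; a real deployment would need a curated lexicon.
FLAGGED = ["slur_a", "slur_b"]

post = "This post contains SLUR_A and nothing else."
hits = find_flagged_terms(post, FLAGGED)
if hits:
    print("Warning: this post may contain hate speech:", hits)
```

Plain keyword matching of this kind is cheap but ignores context, so it can both miss rephrased content and flag harmless uses of a listed term.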

Author(s):  
Safa Alsafari

Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach to leverage the availability of abundant social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers that are then used to label a corpus of one million tweets. Next, we investigate several strategies to select the most confident labels from the obtained pseudo labels. We assess these strategies by re-training all the classifiers with the seed dataset augmented with the trusted pseudo-labeled data. Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.
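The abstract mentions several strategies for selecting confident pseudo-labels without describing them. One plausible strategy, sketched here purely as an assumption, keeps an example only when every classifier in the ensemble votes for the same class and the average confidence in that class clears a threshold. The function name, the 0.9 threshold, and the sample probabilities are all hypothetical.

```python
def select_confident_labels(probabilities, threshold=0.9):
    """Pick pseudo-labels that every classifier agrees on with high confidence.

    probabilities: one list per unlabeled example, holding each
    classifier's predicted probability of the "hate" class.
    Returns (example_index, label) pairs, label 1 = hate, 0 = not hate.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        votes = [1 if p >= 0.5 else 0 for p in probs]
        if len(set(votes)) != 1:
            continue  # classifiers disagree: discard this example
        label = votes[0]
        # Confidence in the agreed label, averaged over the ensemble.
        confidence = sum(p if label == 1 else 1 - p for p in probs) / len(probs)
        if confidence >= threshold:
            selected.append((index, label))
    return selected

# Hypothetical outputs of three classifiers on four unlabeled tweets.
ensemble_probs = [
    [0.97, 0.97, 0.97],  # unanimous, confident "hate"
    [0.80, 0.30, 0.60],  # classifiers disagree
    [0.03, 0.03, 0.03],  # unanimous, confident "not hate"
    [0.60, 0.70, 0.55],  # unanimous but not confident
]
print(select_confident_labels(ensemble_probs))
```

The selected pairs would then be appended to the seed dataset before re-training, as the abstract describes.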


2021 ◽  
Vol 13 (3) ◽  
pp. 80
Author(s):  
Lazaros Vrysis ◽  
Nikolaos Vryzas ◽  
Rigas Kotsakis ◽  
Theodora Saridou ◽  
Maria Matsiola ◽  
...  

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database of hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform, and contribute new content to the database. Furthermore, functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online through a graphical user interface compatible with modern internet browsers. To evaluate the interface, a multifactor questionnaire was formulated to record users’ opinions about the web interface and its functionality.


Author(s):  
Neeraj Vashistha ◽  
Arkaitz Zubiaga

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but it has also brought risks and harms. Because the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community in automated means of hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classifying its content into three classes: abusive, hateful, or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real time and uses the result as feedback to re-train our model. We show the competitive performance of our multilingual model on two languages, English and Hindi, with performance comparable or superior to most monolingual models.
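Combining datasets with different annotation schemes requires mapping each source's labels onto the shared three classes. The abstract does not list the source labels, so the following is only a sketch under assumed names: `dataset_a`, `dataset_b`, and their label vocabularies are hypothetical stand-ins for the six real datasets.

```python
# Hypothetical per-dataset label names mapped to the three shared classes.
LABEL_MAP = {
    "dataset_a": {"offensive": "abusive", "hate": "hateful", "none": "neither"},
    "dataset_b": {"abuse": "abusive", "hateful": "hateful", "normal": "neither"},
}

def harmonise(records):
    """Merge (dataset, text, label) records into one list of (text, class)."""
    merged = []
    for dataset, text, label in records:
        # A KeyError here signals an unmapped dataset or label.
        merged.append((text, LABEL_MAP[dataset][label]))
    return merged

records = [
    ("dataset_a", "example text one", "offensive"),
    ("dataset_b", "example text two", "normal"),
    ("dataset_b", "example text three", "hateful"),
]
for text, shared_class in harmonise(records):
    print(shared_class, "|", text)
```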


PLoS ONE ◽  
2020 ◽  
Vol 15 (8) ◽  
pp. e0237861
Author(s):  
Marzieh Mozafari ◽  
Reza Farahbakhsh ◽  
Noël Crespi

2020 ◽  
Vol 10 (12) ◽  
pp. 4180 ◽  
Author(s):  
Komal Florio ◽  
Valerio Basile ◽  
Marco Polignano ◽  
Pierpaolo Basile ◽  
Viviana Patti

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackle the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark on non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show how AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.
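The idea of a fine-tuning set restricted to a time window before the test period can be illustrated with a small date filter. This is a sketch under assumptions, not the authors' procedure: the function name `training_window`, the 30-day window, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta

def training_window(examples, test_start, window_days):
    """Keep examples dated within window_days before test_start.

    examples: list of (date, text, label) tuples.
    Examples on or after test_start are excluded to avoid leakage.
    """
    earliest = test_start - timedelta(days=window_days)
    return [ex for ex in examples if earliest <= ex[0] < test_start]

# Hypothetical annotated tweets with their posting dates.
annotated = [
    (date(2018, 10, 5), "older tweet", 0),
    (date(2019, 2, 10), "recent tweet", 1),
    (date(2019, 2, 25), "very recent tweet", 0),
    (date(2019, 3, 2), "tweet from the test period", 1),
]

window = training_window(annotated, test_start=date(2019, 3, 1), window_days=30)
print([text for _, text, _ in window])
```

Varying `window_days` and re-training on each resulting subset is one way to measure how sensitive a classifier is to the temporal distance between fine-tuning and test data.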


2020 ◽  
Vol 13 (1) ◽  
pp. 74
Author(s):  
Joko Sutarso

Abstract. Social media use has been increasing from year to year, but not all social media content is positive. Negative effects such as the spread of hoaxes, hate speech, cyberbullying, and other negative content are forms of social media abuse that have become a public concern because they have entered the social, political, economic, and even religious spheres. This is inseparable from the capitalization of social media corporations, which continue to grow with exposure that spreads across national borders and enters the lives of different generations, socio-economic strata, education levels, educational backgrounds, and experiences. The method used in this paper is a qualitative theoretical approach based on observations of social media content and on theoretical studies that seek to explain the influence of media content on people's media behavior, as enrichment material for literacy activists conducting social media literacy activities in the community. The theoretical explanations cover positive and negative aspects from social, political, psychological, educational, and cultural perspectives. The results show that local cultural content has the opportunity to fill the social media space, and that selective, creative, educational, and entertaining local cultural content can be used to minimize the negative effects of globalization and social media capitalism. Another benefit of socializing and promoting local culture on social media is increased social integration, because local culture contains local wisdom values that are national or even universal in character.


Author(s):  
Junanda Patihullah ◽  
Edi Winarko

Social media has changed how people express their thoughts and moods. As social media activity increases, hate speech can spread quickly and widely, so detecting it manually is not feasible. The gated recurrent unit (GRU) is a deep learning method that can learn how information from previous time steps relates to the current one. In this research, word2vec is used for feature extraction because it can learn semantic relations between words. GRU performance is compared with other supervised methods: support vector machine, naive Bayes, decision tree, and logistic regression. The best accuracy, 92.96%, is obtained by the GRU model with word2vec feature extraction. For the comparison methods, word2vec features perform worse than tf and tf-idf features.
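How a GRU carries information from previous time steps to the current one can be seen in a single update step. The sketch below uses the standard GRU gate equations with a scalar input and scalar hidden state for readability; it is not the authors' model, and the parameter names and zero-valued weights are hypothetical. Published formulations differ on whether the update gate or its complement multiplies the previous state; this sketch uses one common convention.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU update for scalar input x and scalar hidden state h_prev.

    p is a dict of scalar weights (hypothetical names).
    """
    # Update gate: how much of the new candidate to let in.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state built from the input and the reset-scaled old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate according to the update gate.
    return (1.0 - z) * h_prev + z * h_cand

# With all weights zero, both gates equal 0.5 and the candidate is 0,
# so each step halves the previous hidden state.
zero_params = {k: 0.0 for k in
               ["wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh"]}

h = 0.8
for x in [1.0, -0.5, 0.25]:  # stand-ins for word embedding values
    h = gru_step(x, h, zero_params)
    print(h)
```

In a real classifier the input at each step would be a word2vec vector and the weights would be learned matrices, with the final hidden state passed to an output layer.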

