A Telegram Corpus for Hate Speech, Offensive Language, and Online Harm

2021 ◽  
Vol 7 ◽  
Author(s):  
Veronika Solopova ◽  
Tatjana Scheffler ◽  
Mihaela Popa-Wyatt



Author(s):  
Vildan Mercan ◽  
Akhtar Jamil ◽  
Alaa Ali Hameed ◽  
Irfan Ahmed Magsi ◽  
Sibghatullah Bazai ◽  
...  


2019 ◽  
Author(s):  
Rahmadsyah Rangkuti ◽  
Zulfan ◽  
Andi Pratama Lubis

A guarantee of freedom of opinion and expression for every citizen is a characteristic inherent in a democratic state. However, this freedom must not be misused to express ideas or views in ways that turn it into a tool for attacking the human rights and freedom of others in the form of hate speech. Acts of hate speech are receiving increasing attention, not only from law enforcers and practitioners, politicians, and information and communication technology experts, but also from the Indonesian government, for which drafting regulations on the handling of hate speech is a very serious concern. Moreover, preserving diversity and harmony within diversity in the era of globalized information technology is the biggest challenge today. This study uses phenomenology as its research design and collects data through purposive sampling from online media. The aim is to maintain unity in a multicultural community such as Batu Bara. At the same time, the emergence of discussions about hate speech has given linguistics a new object of study. From a linguistic perspective, hate speech is a phenomenon of offensive language that yields linguistic data and can be analyzed linguistically. This article therefore conceptually describes the role of linguistics and linguists in understanding and explaining hate speech.







2021 ◽  
Vol 9 (1) ◽  
pp. 171-180
Author(s):  
Sünje Paasch-Colberg ◽  
Christian Strippel ◽  
Joachim Trebbe ◽  
Martin Emmer

In recent debates on offensive language in participatory online spaces, the term ‘hate speech’ has become especially prominent. Originating from a legal context, the term usually refers to violent threats or expressions of prejudice against particular groups on the basis of race, religion, or sexual orientation. However, due to its explicit reference to the emotion of hate, it is also used more colloquially as a general label for any kind of negative expression. This ambiguity leads to misunderstandings in discussions about hate speech and challenges its identification. To meet this challenge, this article provides a modularized framework to differentiate various forms of hate speech and offensive language. On the basis of this framework, we present a text annotation study of 5,031 user comments on the topic of immigration and refuge posted in March 2019 on three German news sites, four Facebook pages, 13 YouTube channels, and one right-wing blog. An in-depth analysis of these comments identifies various types of hate speech and offensive language targeting immigrants and refugees. By exploring typical combinations of labeled attributes, we empirically map the variety of offensive language in the subject area ranging from insults to calls for hate crimes, going beyond the common ‘hate/no-hate’ dichotomy found in similar studies. The results are discussed with a focus on the grey area between hate speech and offensive language.



Author(s):  
S.E. VISWAPRIYA ◽  
AJAY GOUR ◽  
BOLLOJU GOPI CHAND


2020 ◽  
Author(s):  
Hammad Rizwan ◽  
Muhammad Haroon Shakeel ◽  
Asim Karim


Author(s):  
Wyatt Dorris ◽  
Ruijia (Roger) Hu ◽  
Nishant Vishwamitra ◽  
Feng Luo ◽  
Matthew Costello


Author(s):  
Tharindu Ranasinghe ◽  
Marcos Zampieri

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partly because most annotated datasets available contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 F1 macro for Bengali in the TRAC-2 shared task [23], 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020 [58], 0.8568 F1 macro for Hindi in the HASOC 2019 shared task [27], and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of the OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.


