Monitoring Users’ Behavior: Anti-Immigration Speech Detection on Twitter

The proliferation of social media platforms has changed the way people interact online. However, engagement with social media comes at a price: the users’ privacy. Breaches of users’ privacy, such as the Cambridge Analytica scandal, can reveal how users’ data can be weaponized in political campaigns, which often triggers hate speech and anti-immigration views. Hate speech detection is a challenging task because of the diverse sources of hate, which can affect the language used, and because relevant annotated data are scarce. To tackle this, we collected and manually annotated an immigration-related dataset of publicly available tweets in UK, US, and Canadian English. In an empirical study, we explored anti-immigration speech detection using various language features (word n-grams and character n-grams) and measured their impact on a number of trained classifiers. Our work demonstrates that word n-grams yield higher precision, recall, and F-score than character n-grams. Finally, we discuss the implications of these results for future work on hate speech detection and on social media data analysis in general.
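The two feature families compared in this abstract differ only in the unit that is sliced into n-grams. The following is a minimal sketch, assuming lowercasing, whitespace tokenization, and the hypothetical helper names `word_ngrams`, `char_ngrams`, and `ngram_features`; it is not the authors' preprocessing pipeline, and any classifier would consume the resulting sparse counts.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> Counter:
    """Count contiguous word n-grams after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text: str, n: int) -> Counter:
    """Count contiguous character n-grams over the lowercased string (spaces included)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_features(text: str, word_n=(1, 2), char_n=(3,)) -> Counter:
    """Merge word and character n-gram counts into one sparse feature map.

    The "w:" and "c:" prefixes keep the two feature families distinct,
    so a word unigram "ab" never collides with a character bigram "ab".
    """
    feats = Counter()
    for n in word_n:
        feats.update({f"w:{g}": c for g, c in word_ngrams(text, n).items()})
    for n in char_n:
        feats.update({f"c:{g}": c for g, c in char_ngrams(text, n).items()})
    return feats


if __name__ == "__main__":
    example = "immigration policy debate today"  # neutral placeholder text
    print(word_ngrams(example, 2))
    print(char_ngrams(example, 3).most_common(3))
```

Word n-grams keep whole lexical units (useful when particular phrases carry the hostility), while character n-grams are more robust to misspellings and obfuscation; feeding either map to the same classifier is what makes the two comparable.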

Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model

Informatics ◽

10.3390/informatics8040069 ◽

2021 ◽

Vol 8 (4) ◽

pp. 69

Author(s):

Wassen Aldjanabi ◽

Abdelghani Dahou ◽

Mohammed A. A. Al-qaness ◽

Mohamed Abd Elaziz ◽

Ahmed Mohamed Helmi ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

Detection System ◽

Language Model ◽

Arabic Language ◽

Social Phenomena ◽

Speech Detection ◽

Task Learning ◽

Opinion Expression ◽

Social Media Platforms

As social media platforms offer a medium for opinion expression, social phenomena such as hatred, offensive language, racism, and all forms of verbal violence have increased spectacularly. These behaviors are not confined to specific countries, groups, or communities; they extend into people’s everyday lives. This study investigates offensive and hate speech on Arab social media with the aim of building an accurate offensive and hate speech detection system. More precisely, we develop a classification system for offensive and hate speech using a multi-task learning (MTL) model built on top of a pre-trained Arabic language model. We train the MTL model on the same task across multiple corpora that represent variations in offensive and hateful context, so that it learns both global and dataset-specific contextual representations. The developed MTL model achieved strong performance and outperformed existing models in the literature on three out of four datasets for Arabic offensive and hate speech detection.
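The idea of learning "global and dataset-specific" representations can be illustrated with a much simpler model than the one in this abstract. The sketch below is an assumption-laden stand-in, not the authors' architecture (which fine-tunes a pre-trained Arabic language model): a logistic regression whose weights are the sum of one vector shared by all corpora and one vector per corpus, trained jointly on shuffled examples from every corpus. The class name `SharedPlusSpecificLR` and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SharedPlusSpecificLR:
    """Toy multi-task logistic regression (hypothetical illustration).

    Effective weights for corpus t are shared + specific[t]: the shared part
    is updated by every corpus, the specific part only by corpus t.
    """

    def __init__(self, dim: int, tasks, lr: float = 0.5):
        self.lr = lr
        self.shared = [0.0] * dim
        self.specific = {t: [0.0] * dim for t in tasks}
        self.bias = {t: 0.0 for t in tasks}

    def score(self, task, x):
        w_s, w_t = self.shared, self.specific[task]
        return sum((a + b) * xi for a, b, xi in zip(w_s, w_t, x)) + self.bias[task]

    def predict_proba(self, task, x):
        return sigmoid(self.score(task, x))

    def fit(self, corpora, epochs: int = 200, seed: int = 0):
        """corpora: dict mapping corpus name -> list of (feature_vector, label in {0, 1})."""
        rng = random.Random(seed)
        examples = [(t, x, y) for t, data in corpora.items() for x, y in data]
        for _ in range(epochs):
            rng.shuffle(examples)  # interleave corpora so shared weights see all of them
            for task, x, y in examples:
                grad = self.predict_proba(task, x) - y  # d(log-loss)/d(score)
                for i, xi in enumerate(x):
                    self.shared[i] -= self.lr * grad * xi
                    self.specific[task][i] -= self.lr * grad * xi
                self.bias[task] -= self.lr * grad
        return self
```

In this toy setting a feature that signals offensiveness in every corpus ends up mostly in the shared weights, while a feature whose meaning differs between corpora is absorbed by the corpus-specific weights, which mirrors the motivation given in the abstract.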

Citizen Engagement and Social Media

International Journal of E-Politics ◽

10.4018/ijep.2019070103 ◽

2019 ◽

Vol 10 (2) ◽

pp. 24-43

Author(s):

Rodrigo Sandoval-Almazan ◽

Juan Carlos Montes de Oca Lopez

Keyword(s):

Social Media ◽

Political Campaigns ◽

Election Campaigns ◽

Weekly Basis ◽

Social Media Data ◽

Social Media Platforms ◽

Use Of Social Media ◽

High Level ◽

Media Data

Social media has transformed election campaigns around the world. While it is difficult to determine to what extent social media influences voters' decisions, there is no doubt that social media platforms affect candidate advertising and public debate during elections. This research, methodologically based on a case study, investigates the use of social media during political campaigns to collect signatures of support. In the 2018 elections, aspiring presidential candidates were required to obtain a certain number of signatures of support in order to register as official candidates. We collected social media data on a weekly basis from the Twitter, Facebook, and YouTube accounts of seven candidates and compared these data with the number of signatures validated by the electoral authority. For none of the candidates did we find a relationship between the level of support received and their use of social media. However, we observed that some candidates achieved the required number of signatures and received official presidential candidate status as a result of their high level of visibility. This research contributes methodologically to the current literature and provides empirical evidence regarding independent candidates in Mexico.

Citizen Engagement and Social Media

10.4018/978-1-6684-3706-3.ch051 ◽

2022 ◽

pp. 945-966

Author(s):

Rodrigo Sandoval-Almazan ◽

Juan Carlos Montes de Oca Lopez

Keyword(s):

Social Media ◽

Presidential Election ◽

Political Campaigns ◽

Election Campaigns ◽

The World ◽

Social Media Platforms ◽

Use Of Social Media ◽

High Level ◽

Media Data

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

Social media is becoming a primary medium for discussing what is happening around the world. Therefore, the data generated by social media platforms contain rich information describing ongoing events. Further, the timeliness of these data can facilitate immediate insights. However, given the dynamic nature and high volume of data production in social media data streams, it is impractical to filter events manually; automated event detection mechanisms are therefore invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in the data and has lacked the involvement of underlying semantics, which are important for effective information retrieval from text because they represent the connections between words and their meanings. In this paper, we propose a novel method, termed Embed2Detect, for event detection in social media that combines the characteristics of word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and to overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets, representing the sports and political domains, and compared the results with several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set the increase was 29%.
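One of the two ingredients named in this abstract, hierarchical agglomerative clustering of word embeddings, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the word vectors are fixed toy values, the linkage is average linkage over cosine distance, and merging stops at a hypothetical distance `threshold`. It does not reproduce how Embed2Detect trains embeddings per time window or compares windows to flag events.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def agglomerative_clusters(vectors: dict, threshold: float):
    """Average-linkage hierarchical agglomerative clustering of word vectors.

    Starts with one cluster per word and repeatedly merges the closest pair
    of clusters until the closest pair is farther apart than `threshold`.
    Returns the clusters as sorted lists of words, themselves sorted.
    """
    clusters = [[w] for w in vectors]

    def linkage(a, b):
        # average pairwise cosine distance between members of the two clusters
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid after deletion
        clusters[i] = merged
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    toy = {"goal": [1.0, 0.1], "score": [0.9, 0.2], "vote": [0.1, 1.0], "ballot": [0.2, 0.9]}
    print(agglomerative_clusters(toy, threshold=0.3))
```

Because the embeddings place semantically related words close together, the clusters group words by meaning rather than by surface co-occurrence counts, which is the limitation of earlier statistical approaches that the abstract highlights.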

A Web Interface for Analyzing Hate Speech

Future Internet ◽

10.3390/fi13030080 ◽

2021 ◽

Vol 13 (3) ◽

pp. 80

Author(s):

Lazaros Vrysis ◽

Nikolaos Vryzas ◽

Rigas Kotsakis ◽

Theodora Saridou ◽

Maria Matsiola ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Graphical User Interface ◽

Hate Speech ◽

Web Interface ◽

Learning Models ◽

Speech Detection ◽

Media Services ◽

The Web ◽

Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinions publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. To this end, a web interface for creating and querying a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search the existing database, scrape social media using keywords, annotate records through a dedicated platform, and contribute new content to the database. Furthermore, functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online through a graphical user interface compatible with modern web browsers. To evaluate the interface, a multifactor questionnaire was formulated to record users’ opinions about the web interface and its functionality.

Emotionally Informed Hate Speech Detection: A Multi-target Perspective

Cognitive Computation ◽

10.1007/s12559-021-09862-5 ◽

2021 ◽

Author(s):

Patricia Chiril ◽

Endang Wahyu Pamungkas ◽

Farah Benamara ◽

Véronique Moriceau ◽

Viviana Patti

Keyword(s):

Hate Speech ◽

Binary Classification ◽

Online Communication ◽

Vulnerable Groups ◽

Speech Detection ◽

Social Media Platforms ◽

Sentic Computing ◽

Specific Manifestation ◽

The Impact ◽

First Time

Hate speech and harassment are widespread in online communication, due to users' freedom and anonymity and the lack of regulation provided by social media platforms. Hate speech is topically focused (misogyny, sexism, racism, xenophobia, homophobia, etc.), and each specific manifestation of hate speech targets different vulnerable groups based on characteristics such as gender (misogyny, sexism), ethnicity, race, religion (xenophobia, racism, Islamophobia), sexual orientation (homophobia), and so on. Most automatic hate speech detection approaches cast the problem as a binary classification task without addressing either the topical focus or the target-oriented nature of hate speech. In this paper, we propose to tackle, for the first time, hate speech detection from a multi-target perspective. We leverage manually annotated datasets to investigate the problem of transferring knowledge across datasets with different topical focuses and targets. Our contribution is threefold: (1) we explore the ability of hate speech detection models to capture common properties from topic-generic datasets and transfer this knowledge to recognize specific manifestations of hate speech; (2) we experiment with the development of models to detect both topics (racism, xenophobia, sexism, misogyny) and hate speech targets, going beyond standard binary classification, to investigate how to detect hate speech at a finer level of granularity and how to transfer knowledge across different topics and targets; and (3) we study the impact of affective knowledge encoded in sentic computing resources (SenticNet, EmoSenticNet) and in semantically structured hate lexicons (HurtLex) in determining specific manifestations of hate speech. We experimented with different neural models, including multi-task approaches.
Our study shows that: (1) training a model on a combination of training sets from several topic-specific datasets is more effective than training a model on a topic-generic dataset; (2) the multi-task approach outperforms a single-task model when detecting both the hatefulness of a tweet and its topical focus in a multi-label classification setting; and (3) the models incorporating EmoSenticNet emotions, the first-level emotions of SenticNet, a blend of SenticNet and EmoSenticNet emotions, or affective features based on HurtLex obtained the best results. Our results demonstrate that multi-target hate speech detection from existing datasets is feasible, which is a first step towards hate speech detection for a specific topic or target when dedicated annotated data are missing. Moreover, we show that domain-independent affective knowledge, injected into our models, helps finer-grained hate speech detection.

Social Media Toxicity Classification Using Deep Learning: Real-World Application UK Brexit

Electronics ◽

10.3390/electronics10111332 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1332

Author(s):

Hong Fan ◽

Wu Du ◽

Abdelghani Dahou ◽

Ahmed A. Ewees ◽

Dalia Yousri ◽

...

Keyword(s):

Social Media ◽

Hate Speech ◽

Modern Society ◽

User Generated Content ◽

Real World Application ◽

Proposed Model ◽

Social Media Platforms ◽

Public Dataset ◽

The Uk ◽

Harmful Side Effect

Social media has become an essential facet of modern society, wherein people share their opinions on a wide variety of topics. Social media is quickly becoming indispensable for a majority of people, and many cases of social media addiction have been documented. Over the years, social media platforms such as Twitter have demonstrated their value, for example by connecting people with different backgrounds from all over the world. However, they have also shown harmful side effects that can have serious consequences. One such harmful side effect of social media is the immense toxicity that can be found in various discussions. The word “toxic” has become synonymous with online hate speech, internet trolling, and sometimes outrage culture. In this study, we build an efficient model to detect and classify toxicity in user-generated social media content using Bidirectional Encoder Representations from Transformers (BERT). The pre-trained BERT model and three of its variants were fine-tuned on a well-known labeled toxic comment dataset, the Kaggle public dataset from the Toxic Comment Classification Challenge. Moreover, we test the proposed models on two datasets collected from Twitter in two different periods, using hashtags related to the UK's Brexit, to detect toxicity in user-generated content (tweets). The results showed that the proposed model can efficiently classify and analyze toxic tweets.

Google Plus as a Contentious Field of Revolutionary Identity

Comparative Sociology ◽

10.1163/15691330-bja10036 ◽

2021 ◽

Vol 20 (3) ◽

pp. 402-416

Author(s):

Amirhossein Teimouri

Keyword(s):

Social Media ◽

Iranian Revolution ◽

Social Media Data ◽

Social Media Platforms ◽

New Generation ◽

Media Data

Social media platforms have increasingly been reinvigorating extreme movements, especially rightist movements. Using unique Google Plus data, the author shows the rise and fall of the 2015 rightist anti-Nuclear Deal movement in Iran. He argues that the Google Plus platform in 2015 provided the new generation of revolutionary Islamist rightist activists with a contentious space of mobilization, enabling them to develop a new revolutionary rightist identity. This revolutionary identity and its corresponding language and discourse did not fully unfold in Iranian mainstream rightist media, even though rightist groups, unlike liberal groups, are not censored and repressed. The new generation of rightist activists perceived the Nuclear Deal as an existential threat to the revolutionary principles of the country and thus played out their outrage and identity anxieties on Google Plus. The author contends, however, that because of the activists’ identity bond with the regime and the 1979 Iranian Revolution, this online outrage did not translate into any massive offline mobilization against the Nuclear Deal. He also discusses the methodological implications of using social media data, especially given the discontinuation of Google Plus.

IMPACT OF SOCIAL MEDIA ON SALES FUNNELS IN B2C AND B2B SEGMENTS IN THE REPUBLIC OF NORTH MACEDONIA

Proceedings of the Faculty of Economics in East Sarajevo (Зборник радова Економског факултета у Источном Сарајеву) ◽

10.7251/zrefis2122051k ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Martin Kiselicki ◽

Saso Josimovski ◽

Lidija Pulevska Ivanovska ◽

Mijalce Santa

Keyword(s):

Social Media ◽

New Technologies ◽

Secondary Data ◽

Primary Data ◽

World Population ◽

Social Media Platforms ◽

The Republic ◽

New Generation ◽

Sales Funnel ◽

Media Data

The research focuses on introducing social media platforms as either a complementary or the main channel in a company's sales funnel. Internet technologies and Web 2.0 continue to provide innovations in digital marketing, the latest iteration being lead generation services through social media. Data show that almost half of the world's population is active on social media, and the new Generation Alpha is projected to be entirely dependent on being online and proficient in the use of new technologies. The paper provides an overview of the digitalization of sales funnels, as well as the benefits that social media platforms can offer if implemented correctly. Secondary data provide the basis for transforming sales funnels with social media, while previous research offers limited data on the effectiveness of such efforts. Primary data demonstrate that introducing social media platforms can yield improvements of up to three to four times in the analyzed case studies, as well as shorter purchase-decision times in the use case scenarios. Social media advertising can also be used to shorten the sales funnel process and to serve as a unified point of entry and exit in the first few stages.

Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

10.20944/preprints202011.0646.v1 ◽

2020 ◽

Author(s):

Neeraj Vashistha ◽

Arkaitz Zubiaga

Keyword(s):

Social Media ◽

Hate Speech ◽

Model Performance ◽

Academic Community ◽

Human Interaction ◽

Superior Performance ◽

Competitive Performance ◽

Speech Detection ◽

Improve Model ◽

Use Of The Internet

The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but it has also brought risks and harms. As the volume of harmful content online, such as hate speech, has become unmanageable for humans, interest in the academic community in investigating automated means of hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset and classifying the content into three classes: abusive, hateful, or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page using an effective metric in near-real time and uses this score as feedback to re-train our model. We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, with performance comparable or superior to that of most monolingual models.
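Combining several datasets into "a single homogeneous dataset" with three classes requires mapping each source's label scheme onto the shared one. The sketch below assumes hypothetical corpus names (`corpus_x`, `corpus_y`, `corpus_z`) and hypothetical raw labels; the abstract does not specify the six datasets or their schemes. Records with unmapped labels are dropped rather than guessed, and near-duplicate texts (same words after lowercasing and whitespace normalization) are kept only once.

```python
from collections import Counter

TARGET_CLASSES = ("abusive", "hateful", "neither")

# Hypothetical source corpora and label schemes (illustration only).
LABEL_MAPS = {
    "corpus_x": {"abuse": "abusive", "hate": "hateful", "normal": "neither"},
    "corpus_y": {0: "hateful", 1: "abusive", 2: "neither"},
    "corpus_z": {"hateful": "hateful", "offensive": "abusive", "clean": "neither"},
}


def harmonize(records):
    """Map (corpus, text, raw_label) triples to (text, unified_label) pairs.

    Records from unknown corpora or with unmapped labels are dropped.
    Texts that are identical after lowercasing and whitespace normalization
    are kept once; the first occurrence wins.
    """
    seen = set()
    unified = []
    for corpus, text, raw in records:
        label = LABEL_MAPS.get(corpus, {}).get(raw)
        if label is None:
            continue
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        unified.append((text, label))
    return unified


def class_distribution(unified):
    """Count examples per target class, reporting zero for absent classes."""
    counts = Counter(label for _, label in unified)
    return {c: counts.get(c, 0) for c in TARGET_CLASSES}
```

Letting the first occurrence win is the simplest policy; when duplicates across corpora carry conflicting labels, flagging them for re-annotation may be preferable, and checking `class_distribution` after merging shows whether the combined dataset is heavily imbalanced.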
