A Web Interface for Analyzing Hate Speech

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creation and the query of a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search in the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, the functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated, targeting to record the users’ opinions about the web interface and the corresponding functionality.

Download Full-text

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

2020 RIVF International Conference on Computing and Communication Technologies (RIVF) ◽

10.1109/rivf48685.2020.9140745 ◽

2020 ◽

Cited By ~ 2

Author(s):

Son T. Luu ◽

Hung P. Nguyen ◽

Kiet Van Nguyen ◽

Ngan Luu-Thuy Nguyen

Keyword(s):

Neural Network ◽

Machine Learning ◽

Hate Speech ◽

Network Models ◽

Learning Models ◽

Neural Network Models ◽

Speech Detection ◽

Machine Learning Models

Download Full-text

Utilizing Blockchain Technology in Social Media Bot Identification

10.36227/techrxiv.12049374 ◽

2020 ◽

Author(s):

Shreya Reddy ◽

Lisa Ewen ◽

Pankti Patel ◽

Prerak Patel ◽

Ankit Kundal ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Gold Standard ◽

The Internet ◽

Learning Models ◽

Current Time ◽

Machine Learning Methods ◽

Blockchain Technology ◽

Modern Age ◽

Machine Learning Models

<p>As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has dictated that machine learning methods are accurate and the gold standard of bot identification on social media. Unfortunately, machine learning models do not come without their negative aspects such as lengthy training times, difficult feature selection, and overwhelming pre-processing tasks. To overcome these difficulties, we are proposing a blockchain framework for bot identification. At the current time, it is unknown how this method will perform, but it serves to prove the existence of an overwhelming gap of research under this area.<i></i></p>

Download Full-text

Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.

Download Full-text

Telugu News Data Classification Using Machine Learning Approach

10.4018/978-1-7998-7685-4.ch014 ◽

2022 ◽

pp. 181-194

Author(s):

Bala Krishna Priya G. ◽

Jabeen Sultana ◽

Usha Rani M.

Keyword(s):

Machine Learning ◽

Social Media ◽

Research Work ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Machine Learning Classifiers ◽

Proposed Model ◽

Machine Learning Approach ◽

Machine Learning Models

Mining Telugu news data and categorizing based on public sentiments is quite important since a lot of fake news emerged with rise of social media. Identifying whether news text is positive, negative, or neutral and later classifying the data in which areas they fall like business, editorial, entertainment, nation, and sports is included throughout this research work. This research work proposes an efficient model by adopting machine learning classifiers to perform classification on Telugu news data. The results obtained by various machine-learning models are compared, and an efficient model is found, and it is observed that the proposed model outperformed with reference to accuracy, precision, recall, and F1-score.

Download Full-text

ML-CB: Machine Learning Canvas Block

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0056 ◽

2021 ◽

Vol 2021 (3) ◽

pp. 453-473

Author(s):

Nathan Reitinger ◽

Michelle L. Mazurek

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Semantic Representation ◽

Source Code ◽

Online Privacy ◽

Learning Approach ◽

Learning Models ◽

One Step ◽

The Web ◽

Machine Learning Models

Abstract With the aim of increasing online privacy, we present a novel, machine-learning based approach to blocking one of the three main ways website visitors are tracked online—canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.

Download Full-text

Building High Performance Explainable Machine Learning Models for Social Media-based Substance Use Prediction

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821302060009x ◽

2020 ◽

Vol 29 (03n04) ◽

pp. 2060009

Author(s):

Tao Ding ◽

Fatema Hasan ◽

Warren K. Bickel ◽

Shimei Pan

Keyword(s):

Machine Learning ◽

Social Media ◽

Substance Use ◽

Human Behavior ◽

High Performance ◽

Supervised Machine Learning ◽

Learning Models ◽

Wide Range ◽

And Behavior ◽

Machine Learning Models

Social media contain rich information that can be used to help understand human mind and behavior. Social media data, however, are mostly unstructured (e.g., text and image) and a large number of features may be needed to represent them (e.g., we may need millions of unigrams to represent social media texts). Moreover, accurately assessing human behavior is often difficult (e.g., assessing addiction may require medical diagnosis). As a result, the ground truth data needed to train a supervised human behavior model are often difficult to obtain at a large scale. To avoid overfitting, many state-of-the-art behavior models employ sophisticated unsupervised or self-supervised machine learning methods to leverage a large amount of unsupervised data for both feature learning and dimension reduction. Unfortunately, despite their high performance, these advanced machine learning models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important to behavior scientists and public health providers, we explore new methods to build machine learning models that are not only accurate but also interpretable. We evaluate the effectiveness of the proposed methods in predicting Substance Use Disorders (SUD). We believe the methods we proposed are general and applicable to a wide range of data-driven human trait and behavior analysis applications.

Download Full-text

Politicians in the line of fire: Incivility and the treatment of women on social media

Research & Politics ◽

10.1177/2053168018816228 ◽

2019 ◽

Vol 6 (1) ◽

pp. 205316801881622 ◽

Cited By ~ 11

Author(s):

Ludovic Rheault ◽

Erica Rayment ◽

Andreea Musulan

Keyword(s):

Machine Learning ◽

Social Media ◽

Digital Age ◽

Women In Politics ◽

Learning Models ◽

Media Reports ◽

Machine Learning Models

A seemingly inescapable feature of the digital age is that people choosing to devote their lives to politics must now be ready to face a barrage of insults and disparaging comments targeted at them through social media. This article represents an effort to document this phenomenon systematically. We implement machine learning models to predict the incivility of about 2.2 m messages addressed to Canadian politicians and US Senators on Twitter. Specifically, we test whether women in politics are more heavily targeted by online incivility, as recent media reports suggested. Our estimates indicate that roughly 15% of public messages sent to Senators can be categorized as uncivil, whereas the proportion is about four points lower in Canada. We find evidence that women are more heavily targeted by uncivil messages than men, although only among highly visible politicians.

Download Full-text

Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review

IEEE Access ◽

10.1109/access.2021.3089515 ◽

2021 ◽

pp. 1-1

Author(s):

Nanlir Sallau Mullah ◽

Wan Mohd Nazmee Wan Zainon

Keyword(s):

Machine Learning ◽

Social Media ◽

Hate Speech ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Speech Detection

Download Full-text

Sentiment analysis of Social Media Text-Emoticon Post with Machine learning Models Contribution Title

Journal of Physics Conference Series ◽

10.1088/1742-6596/2070/1/012079 ◽

2021 ◽

Vol 2070 (1) ◽

pp. 012079

Author(s):

V Jagadishwari ◽

A Indulekha ◽

Kiran Raghu ◽

P Harshini

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Online Social Networks ◽

Data Sets ◽

Learning Models ◽

Twitter Data ◽

The Social ◽

Social Media Text ◽

Machine Learning Models

Abstract Social Media is an arena in recent times for people to share their perspectives on a variety of topics. Most of the social interactions are through the Social Media. Though all the Online Social Networks allow users to express their views and opinions in many forms like audio, video, text etc, the most popular form of expression is text, Emoticons and Emojis. The work presented in this paper aims at detecting the sentiments expressed in the Social Media posts. The Machine Learning Models namely Bernoulli Bayes, Multinomial Bayes, Regression and SVM were implemented. All these models were trained and tested with Twitter Data sets. Users on Twitter express their opinions in the form of tweets with limited characters. Tweets also contain Emoticons and Emojis therefore Twitter data sets are best suited for the sentiment analysis. The effect of emoticons present in the tweet is also analyzed. The models are first trained only with the text and then they are trained with text and emoticon in the tweet. The performance of all the four models in both cases are tested and the results are presented in the paper.

Download Full-text

Sinhala Hate Speech Detection in Social Media using Text Mining and Machine learning

2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer) ◽

10.1109/icter48817.2019.9023655 ◽

2019 ◽

Author(s):

H.M.S.T Sandaruwan ◽

S.A.S Lorensuhewa ◽

M.A.L Kalyani

Keyword(s):

Machine Learning ◽

Social Media ◽

Text Mining ◽

Hate Speech ◽

Speech Detection

Download Full-text