Exploring fake news identification using word and sentence embeddings

The widespread use of social media like Facebook, Twitter, Whatsapp, etc. has changed the way News is created and published; accessing news has become easy and inexpensive. However, the scale of usage and inability to moderate the content has made social media, a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase the readership or disrupt the order in the society for political and commercial benefits. It is of paramount importance to identify and filter out fake news especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning which has been quite ineffective. In this paper, we are analyzing word embedding features that can tell apart fake news from true news. We use the LIAR and ISOT data set. We churn out highly correlated news data from the entire data set by using cosine similarity and other such metrices, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.

Download Full-text

Towards Automatic Fake News Detection in Digital Platforms: Properties, Limitations, and Applications

10.5753/ctd.2021.15754 ◽

2021 ◽

Author(s):

Julio C. S. Reis ◽

Fabrício Benevenuto

Keyword(s):

Machine Learning ◽

Social Media ◽

Prediction Performance ◽

Supervised Machine Learning ◽

Learning Approaches ◽

Fake News ◽

Digital Platforms ◽

Worldwide Phenomenon ◽

Fact Checking ◽

Media Systems

Digital platforms, including social media systems and messaging applications, have become a place for campaigns of misinformation that affect the credibility of the entire news ecosystem. The emergence of fake news in these environments has quickly evolved into a worldwide phenomenon, where the lack of scalable fact-checking strategies is especially worrisome. In this context, this thesis aim at investigating practical approaches for the automatic detection of fake news disseminated in digital platforms. Particularly, we explore new datasets and features for fake news detection to assess the prediction performance of current supervised machine learning approaches. We also propose an unbiased framework for quantifying the informativeness of features for fake news detection, and present an explanation of factors contributing to model decisions considering data from different scenarios. Finally, we propose and implement a new mechanism that accounts for the potential occurrence of fake news within the data, significantly reducing the number of content pieces journalists and fact-checkers have to go through before finding a fake story.

Download Full-text

Fake News Detection using Machine Learning

ITM Web of Conferences ◽

10.1051/itmconf/20214003003 ◽

2021 ◽

Vol 40 ◽

pp. 03003

Author(s):

Prasad Kulkarni ◽

Suyash Karwande ◽

Rhucha Keskar ◽

Prashant Kale ◽

Sumitra Iyer

Keyword(s):

Machine Learning ◽

Social Media ◽

Data Extraction ◽

Fake News ◽

Detection Model ◽

News Websites ◽

Social Media Platforms ◽

Use Of Social Media ◽

Short Time ◽

The Given

Everyone depends upon various online resources for news in this modern age, where the internet is pervasive. As the use of social media platforms such as Facebook, Twitter, and others has increased, news spreads quickly among millions of users in a short time. The consequences of Fake news are far-reaching, from swaying election outcomes in favor of certain candidates to creating biased opinions. WhatsApp, Instagram, and many other social media platforms are the main source for spreading fake news. This work provides a solution by introducing a fake news detection model using machine learning. This model requires prerequisite data extracted from various news websites. Web scraping technique is used for data extraction which is further used to create datasets. The data is classified into two major categories which are true dataset and false dataset. Classifiers used for the classification of data are Random Forest, Logistic Regression, Decision Tree, KNN and Gradient Booster. Based on the output received the data is classified either as true or false data. Based on that, the user can find out whether the given news is fake or not on the webserver.

Download Full-text

Automatic detection of cyberbullying and threatening in Saudi tweets using machine learning

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.003 ◽

2021 ◽

Vol 8 (10) ◽

pp. 17-25

Author(s):

Alghamdi et al. ◽

Keyword(s):

Machine Learning ◽

Social Media ◽

Automatic Detection ◽

Arabic Language ◽

Supervised Machine Learning ◽

Psychological State ◽

Support Vector ◽

Feature Extraction Method ◽

Efficient Detection ◽

Use Of Social Media

Social media has become a major factor in people's lives, which affects their communication and psychological state. The widespread use of social media has formed new types of violence, such as cyberbullying. Manual detection and reporting of violent texts in social media applications are challenging due to the increasing number of social media users and the huge amounts of generated data. Automatic detection of violent texts is language-dependent, and it requires an efficient detection approach, which considers the unique features and structures of a specific language or dialect. Only a few studies have focused on the automatic detection and classification of violent texts in the Arabic Language. This paper aims to build a two-level classifier model for classifying Arabic violent texts. The first level classifies text into violent and non-violent. The second level classifies violent text into either cyberbullying or threatening. The dataset used to build the classifier models is collected from Twitter, using specific keywords and trending hashtags in Saudi Arabia. Supervised machine learning is used to build two classifier models, using two different algorithms, which are Support Vector Machine (SVM), and Naive Bayes (NB). Both models are trained in different experimental settings of varying the feature extraction method and whether stop-word removal is applied or not. The performances of the proposed SVM-based and NB-based models have been compared. The SVM-based model outperforms the NB-based model with F1 scores of 76.06%, and 89.18%, and accuracy scores of 73.35% and 87.79% for the first and second levels of classification, respectively.

Download Full-text

Linguistic drivers of misinformation diffusion on social media during the COVID-19 pandemic

Italian Journal of Marketing ◽

10.1007/s43039-021-00026-9 ◽

2021 ◽

Author(s):

Giandomenico Di Domenico ◽

Annamaria Tuan ◽

Marco Visentin

Keyword(s):

Machine Learning ◽

Social Media ◽

New Technologies ◽

Fake News ◽

Conspiracy Theories ◽

Social Media Platform ◽

The Social ◽

Media Platform ◽

Textual Cues ◽

Twitter Users

AbstractIn the wake of the COVID-19 pandemic, unprecedent amounts of fake news and hoax spread on social media. In particular, conspiracy theories argued on the effect of specific new technologies like 5G and misinformation tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but not necessarily they understand or critically evaluate it before deciding to share it on the social media platform.

Download Full-text

Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Symmetry ◽

10.3390/sym13040556 ◽

2021 ◽

Vol 13 (4) ◽

pp. 556

Author(s):

Thaer Thaher ◽

Mahmoud Saheb ◽

Hamza Turabieh ◽

Hamouda Chantar

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Language Processing ◽

User Profile ◽

Vital Role ◽

Classification Model ◽

Fake News ◽

False Information ◽

Social Media Platforms

Fake or false information on social media platforms is a significant challenge that leads to deliberately misleading users due to the inclusion of rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This drew the attention of researchers to provide a safe online environment free of misleading information. This paper aims to propose a smart classification model for the early detection of fake news in Arabic tweets utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. Arabic Twitter corpus composed of 1862 previously annotated tweets was utilized by this research to assess the efficiency of the proposed model. The Bag of Words (BoW) model is utilized using different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and words-features. Reported results showed that the Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) model scores the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model can yield a better enhancement of 5% compared with previous works on the same dataset.

Download Full-text

Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.

Download Full-text

How to foster employee satisfaction by means of coaching, motivation, emotional salary and social media skills in the agri-food value chain

New Medit ◽

10.30682/nm2101c ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Keyword(s):

Social Media ◽

Value Chain ◽

Employee Satisfaction ◽

Data Set ◽

Business Skills ◽

A Value ◽

The Social ◽

Use Of Social Media ◽

Food Value ◽

Insight Into

Most employee satisfaction studies do not consider the current digital transformation of the social world. The aim of this research is to provide insight into employee satisfaction in agribusiness by means of coaching, motivation, emotional salary and social media with a value chain methodology. The model is tested empirically by analysing a survey data set of 381 observations in Spanish agribusiness firms of the agri-food value chain. The results show flexible remunerations of emotional salary are determinants of employee satisfaction. Additionally, motivation is relevant in the production within commercialisation link and coaching in the production within transformation link. Whole-of-chain employees showed the greatest satisfaction with the use of social media in personnel management. Findings also confirmed that employees will stay when a job is satisfying. This study contributes to the literature by investigating the effect of current social and digital business skills on employee satisfaction in the agri-food value chain.

Download Full-text

An Experimental Study of Spammer Detection on Chinese Microblogs

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s021819402040029x ◽

2020 ◽

Vol 30 (11n12) ◽

pp. 1759-1777

Author(s):

Jialing Liang ◽

Peiquan Jin ◽

Lin Mu ◽

Jie Zhao

Keyword(s):

Machine Learning ◽

Social Media ◽

User Behavior ◽

Real Data ◽

User Profile ◽

Data Set ◽

Sina Weibo ◽

Factors Affecting ◽

The Government ◽

Hot Event

With the development of Web 2.0, social media such as Twitter and Sina Weibo have become an essential platform for disseminating hot events. Simultaneously, due to the free policy of microblogging services, users can post user-generated content freely on microblogging platforms. Accordingly, more and more hot events on microblogging platforms have been labeled as spammers. Spammers will not only hurt the healthy development of social media but also introduce many economic and social problems. Therefore, the government and enterprises must distinguish whether a hot event on microblogging platforms is a spammer or is a naturally-developing event. In this paper, we focus on the hot event list on Sina Weibo and collect the relevant microblogs of each hot event to study the detecting methods of spammers. Notably, we develop an integral feature set consisting of user profile, user behavior, and user relationships to reflect various factors affecting the detection of spammers. Then, we employ typical machine learning methods to conduct extensive experiments on detecting spammers. We use a real data set crawled from the most prominent Chinese microblogging platform, Sina Weibo, and evaluate the performance of 10 machine learning models with five sampling methods. The results in terms of various metrics show that the Random Forest model and the over-sampling method achieve the best accuracy in detecting spammers and non-spammers.

Download Full-text

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.

Download Full-text

A Supervised Machine Learning Approach to Fake News Identification

Intelligent Data Communication Technologies and Internet of Things - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-34080-3_22 ◽

2019 ◽

pp. 197-204

Author(s):

Anisha Datta ◽

Shukrity Si

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Learning Approach ◽

Fake News ◽

Machine Learning Approach

Download Full-text