Fighting the COVID-19 Infodemic with Supervised Machine Learning, Computational Linguistics, and Network Science (Preprint)

2020
Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

BACKGROUND The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. OBJECTIVE This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles. Specifically, we present a computational approach that predicts whether a news article is COVID-19 safe or suspicious. METHODS Here, we present a novel supervised machine learning and computational linguistics approach that analyzes the content of a given news article and assigns a label to it. In particular, we designed an algorithm, which we call NeoNet, that is trained on a network of noun-phrases selected from a trustworthy COVID-19 news dataset. Noun-phrases are known to capture facts and eliminate subjectivity. Once trained, the algorithm predicts a label for new articles and decides whether an article is suspicious. RESULTS The results show that the NeoNet algorithm predicts the label of an article with 98.8% precision using a non-pruned model and 95.8% precision using a pruned model. In five different comparisons, NeoNet surpassed NaiveBayes three times, while the other two were too close to call in a pruned setting. When compared without pruning, NeoNet outperformed NaiveBayes in all five experiments. CONCLUSIONS The infodemic that has accompanied the COVID-19 pandemic presents a significant challenge because of the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories. However, machine learning combined with powerful computational linguistic methods can provide the necessary tools to inform the general public of whether a news article is COVID-19 SAFE or DISPUTED (when it contains suspicious content). CLINICALTRIAL N/A
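The noun-phrase selection step the abstract describes can be illustrated with a minimal chunker. This is an illustrative sketch, not the authors' implementation: the part-of-speech tags are hand-supplied here, whereas a real pipeline would obtain them from a tagger such as spaCy or NLTK.

```python
# Minimal noun-phrase chunking over pre-tagged tokens: group runs of
# adjectives and nouns that end in a noun into candidate phrases.

def chunk_noun_phrases(tagged):
    """Extract noun-phrases from a list of (word, POS-tag) pairs."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("ADJ", "NOUN"):
            current.append((word, tag))
        else:
            if current and current[-1][1] == "NOUN":
                phrases.append(" ".join(w for w, _ in current))
            current = []
    if current and current[-1][1] == "NOUN":
        phrases.append(" ".join(w for w, _ in current))
    return phrases

# Hand-tagged toy sentence (tags would normally come from a POS tagger).
tagged = [("social", "ADJ"), ("distancing", "NOUN"), ("reduces", "VERB"),
          ("viral", "ADJ"), ("transmission", "NOUN"), ("rates", "NOUN")]
print(chunk_noun_phrases(tagged))  # ['social distancing', 'viral transmission rates']
```

Because the chunks keep only adjective-noun material, subjective verbs and modifiers ("reduces", "allegedly") are dropped, which is the property the abstract attributes to noun-phrases.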

Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. Here, we present a novel supervised machine learning text mining algorithm that analyzes the content of a given news article and assigns a label to it. The NeoNet algorithm is trained on noun-phrase features, which contribute to a network model. The algorithm was tested on a real-world dataset, predicting the labels of never-seen articles and flagging those that are suspicious or disputed. In five different fold comparisons, NeoNet surpassed prominent contemporary algorithms such as Neural Networks, SVM, and Random Forests. The analysis shows that the NeoNet algorithm predicts the label of an article with 100% precision using a non-pruned model. This highlights the promise of detecting disputed online content that may contribute negatively to the COVID-19 pandemic. Indeed, machine learning combined with powerful text mining and network science provides the necessary tools to counter the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories associated with the COVID-19 infodemic.


Author(s):
V.T Priyanga
J.P Sanjanasri
Vijay Krishna Menon
E.A Gopalakrishnan
K.P Soman

The widespread use of social media like Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell apart fake news from true news. We use the LIAR and ISOT datasets. We churn out highly correlated news data from the entire dataset by using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
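The cosine-similarity filtering step described above can be sketched with plain bag-of-words vectors. This is a simplified illustration, not the paper's pipeline: real word-embedding vectors would replace the raw term counts, and the documents and threshold here are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_correlated(docs, query, threshold=0.3):
    """Keep documents whose similarity to the query topic exceeds the threshold."""
    q = Counter(query.lower().split())
    return [d for d in docs if cosine(Counter(d.lower().split()), q) > threshold]

docs = ["election fraud claims spread online",
        "new vaccine trial results published",
        "claims of election fraud debunked"]
kept = filter_correlated(docs, "election fraud")
print(kept)  # the two election-related documents
```

Filtering to one central topic in this way keeps the true/fake comparison within a single domain, which is the motivation the abstract gives for the step.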


Author(s):
Giandomenico Di Domenico
Annamaria Tuan
Marco Visentin

Abstract: In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories about the effects of new technologies such as 5G, along with misinformation, tarnished the reputation of brands such as Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.
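The traditional logistic regression the study mentions can be sketched as follows. The cue features and the tiny dataset are invented for illustration; the paper's actual tweet panel and cue set are not reproduced here.

```python
import math

# Toy logistic regression relating tweet cues to retweet likelihood,
# fit by plain stochastic gradient descent on the log-loss.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias of a logistic model to binary labels."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Invented cue vectors: [contains_misinformation, has_url, has_media]
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y = [1, 1, 0, 0, 1, 0]  # 1 = tweet was retweeted
w, b = fit(X, y)
pred = sigmoid(sum(wj * xj for wj, xj in zip(w, [1, 0, 0])) + b)
```

The fitted weights are directly interpretable as cue effects on the log-odds of being retweeted, which is why logistic regression suits this kind of cue analysis.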


Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles and false medical publications. Here, we present NeoNet, a novel supervised machine learning text mining algorithm that analyzes the content of a document (a news article or a medical publication) and assigns a label to it. The algorithm is trained on TF-IDF bigram features, which contribute to a network training model. The algorithm is tested on two different real-world datasets: CBC news articles and COVID-19 publications. In five different fold comparisons, the algorithm predicted the label of an article with a precision of 97-99%. When compared with prominent algorithms such as Neural Networks, SVM, and Random Forests, NeoNet surpassed them. The analysis highlights the promise of NeoNet in detecting disputed online content that may contribute negatively to the COVID-19 pandemic.
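The TF-IDF bigram features the abstract names can be computed as in this minimal stdlib sketch; a production pipeline would more likely use something like scikit-learn's TfidfVectorizer with ngram_range=(2, 2). The example documents are invented.

```python
import math
from collections import Counter

def bigrams(text):
    """Word bigrams of a document, e.g. 'a b c' -> ['a b', 'b c']."""
    toks = text.lower().split()
    return [" ".join(pair) for pair in zip(toks, toks[1:])]

def tfidf(docs):
    """One {bigram: tf * idf} dict per document (raw tf, natural-log idf)."""
    counts = [Counter(bigrams(d)) for d in docs]
    n = len(docs)
    df = Counter(g for c in counts for g in c)
    return [{g: tf * math.log(n / df[g]) for g, tf in c.items()} for c in counts]

docs = ["covid vaccine trial results", "covid vaccine conspiracy theory"]
vectors = tfidf(docs)
```

Note how the bigram "covid vaccine", present in every document, gets weight zero, while document-specific bigrams such as "conspiracy theory" are up-weighted; those discriminative weights are what a downstream classifier learns from.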


2021
Author(s):
Julio C. S. Reis
Fabrício Benevenuto

Digital platforms, including social media systems and messaging applications, have become a place for campaigns of misinformation that affect the credibility of the entire news ecosystem. The emergence of fake news in these environments has quickly evolved into a worldwide phenomenon, where the lack of scalable fact-checking strategies is especially worrisome. In this context, this thesis aims to investigate practical approaches for the automatic detection of fake news disseminated on digital platforms. In particular, we explore new datasets and features for fake news detection to assess the prediction performance of current supervised machine learning approaches. We also propose an unbiased framework for quantifying the informativeness of features for fake news detection, and present an explanation of factors contributing to model decisions considering data from different scenarios. Finally, we propose and implement a new mechanism that accounts for the potential occurrence of fake news within the data, significantly reducing the number of content pieces journalists and fact-checkers have to go through before finding a fake story.


2022
Author(s):
Suben Kumer Saha
Khandaker Tabin Hasan

Abstract: Online news media, which is more accessible, cheaper, and faster to consume, is also of questionable quality, as there is less moderation. Anybody with a computing device and an internet connection can take part in creating, contributing, and spreading news on online portals. Social media has intensified the problem further. Due to its high volume, velocity, and veracity, online news content is beyond traditional moderation, that is, moderation by human experts. Thus, different machine learning methods are being tested and used to spot fake news. One of the main challenges for fake-news classification is obtaining labeled instances for this high volume of real-time data. In this study, we examined how semi-supervised machine learning can decrease the need for labeled instances with an acceptable drop in accuracy. The accuracy difference between the supervised and semi-supervised classifiers is around 0.05 while using only five percent of the labeled instances of the supervised classifier. We tested with logistic regression, SVM, and random forest classifiers to support our hypothesis.
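One common semi-supervised scheme matching this setup is self-training: fit on the small labeled set, then iteratively pseudo-label the unlabeled points the model is most confident about. The sketch below uses a one-dimensional nearest-centroid classifier as a stand-in for the study's logistic regression, SVM, or random forest; the data are invented.

```python
# Self-training sketch: grow the labeled set by pseudo-labeling, one
# most-confident unlabeled point per round.

def self_train(labeled, unlabeled, rounds=3):
    """labeled: list of (x, label); unlabeled: list of x (1-D features)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Class centroids from the current (partly pseudo-) labeled data.
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        cents = {y: sum(xs) / len(xs) for y, xs in groups.items()}

        # Confidence = margin between the two nearest centroids.
        def margin(x):
            d = sorted(abs(x - c) for c in cents.values())
            return d[1] - d[0]

        x = max(pool, key=margin)          # most confident point
        pool.remove(x)
        y = min(cents, key=lambda k: abs(x - cents[k]))
        labeled.append((x, y))             # adopt the pseudo-label
    return labeled

seed = [(0.0, "real"), (10.0, "fake")]     # the small labeled set
grown = self_train(seed, [1.0, 9.0, 5.5])
```

Ambiguous points (like 5.5 above) are labeled last, after the confident points have refined the centroids; scikit-learn's SelfTrainingClassifier wraps the same idea around any probabilistic base estimator.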


