Fighting the COVID-19 Infodemic with Supervised Machine Learning, Computational Linguistics, and Network Science (Preprint)

2020
Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

BACKGROUND The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. OBJECTIVE This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles. Specifically, we present a computational approach that predicts whether a news article is COVID-19 safe or suspicious. METHODS Here, we present a novel supervised machine learning and computational linguistics approach that analyzes the content of a given news article and assigns a label to it. In particular, we designed an algorithm, which we call NeoNet, that is trained on a network of noun-phrases selected from a trustworthy COVID-19 news dataset. Noun-phrases are known to capture facts and eliminate subjectivity. Once trained, the algorithm predicts a label for new articles and decides whether an article is suspicious. RESULTS The results show that the NeoNet algorithm predicts the label of an article with 98.8% precision using a non-pruned model and 95.8% precision using a pruned model. In five different comparisons, NeoNet surpassed NaiveBayes three times, while the other two were too close to call in a pruned setting. When compared without pruning, NeoNet outperformed NaiveBayes in all five experiments. CONCLUSIONS The infodemic that has accompanied the COVID-19 pandemic presents a significant challenge because of the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories. However, machine learning combined with powerful computational linguistic methods can provide the necessary tools to inform the general public of whether a news article is COVID-19 SAFE or DISPUTED (when it contains suspicious content). CLINICALTRIAL N/A
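The noun-phrase selection step the abstract describes can be illustrated with a minimal chunker. This is an illustrative sketch, not the authors' implementation: the part-of-speech tags are hand-supplied here, whereas a real pipeline would obtain them from a tagger such as spaCy or NLTK.

```python
# Minimal noun-phrase chunking over pre-tagged tokens: group runs of
# adjectives and nouns that end in a noun into candidate phrases.

def chunk_noun_phrases(tagged):
    """Extract noun-phrases from a list of (word, POS-tag) pairs."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("ADJ", "NOUN"):
            current.append((word, tag))
        else:
            if current and current[-1][1] == "NOUN":
                phrases.append(" ".join(w for w, _ in current))
            current = []
    if current and current[-1][1] == "NOUN":
        phrases.append(" ".join(w for w, _ in current))
    return phrases

# Hand-tagged toy sentence (tags would normally come from a POS tagger).
tagged = [("social", "ADJ"), ("distancing", "NOUN"), ("reduces", "VERB"),
          ("viral", "ADJ"), ("transmission", "NOUN"), ("rates", "NOUN")]
print(chunk_noun_phrases(tagged))  # ['social distancing', 'viral transmission rates']
```

Because the chunks keep only adjective-noun material, subjective verbs and modifiers ("reduces", "allegedly") are dropped, which is the property the abstract attributes to noun-phrases.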

Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. Here, we present a novel supervised machine learning text mining algorithm that analyzes the content of a given news article and assigns a label to it. The NeoNet algorithm is trained on noun-phrase features, which contribute to a network model. The algorithm was tested on a real-world dataset, predicting the labels of never-seen articles and flagging those that are suspicious or disputed. In five different fold comparisons, NeoNet surpassed prominent contemporary algorithms such as Neural Networks, SVM, and Random Forests. The analysis shows that the NeoNet algorithm predicts the label of an article with 100% precision using a non-pruned model. This highlights the promise of detecting disputed online content that may contribute negatively to the COVID-19 pandemic. Indeed, machine learning combined with powerful text mining and network science provides the necessary tools to counter the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories associated with the COVID-19 infodemic.


Author(s):
V.T Priyanga
J.P Sanjanasri
Vijay Krishna Menon
E.A Gopalakrishnan
K.P Soman

The widespread use of social media like Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell apart fake news from true news. We use the LIAR and ISOT datasets. We churn out highly correlated news data from the entire dataset by using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
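The cosine-similarity filtering step described above can be sketched with plain bag-of-words vectors. This is a simplified illustration, not the paper's pipeline: real word-embedding vectors would replace the raw term counts, and the documents and threshold here are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_correlated(docs, query, threshold=0.3):
    """Keep documents whose similarity to the query topic exceeds the threshold."""
    q = Counter(query.lower().split())
    return [d for d in docs if cosine(Counter(d.lower().split()), q) > threshold]

docs = ["election fraud claims spread online",
        "new vaccine trial results published",
        "claims of election fraud debunked"]
kept = filter_correlated(docs, "election fraud")
print(kept)  # the two election-related documents
```

Filtering to one central topic in this way keeps the true/fake comparison within a single domain, which is the motivation the abstract gives for the step.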


Author(s):
Giandomenico Di Domenico
Annamaria Tuan
Marco Visentin

Abstract: In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories about the effects of new technologies such as 5G, along with misinformation, tarnished the reputation of brands such as Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.
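The traditional logistic regression the study mentions can be sketched as follows. The cue features and the tiny dataset are invented for illustration; the paper's actual tweet panel and cue set are not reproduced here.

```python
import math

# Toy logistic regression relating tweet cues to retweet likelihood,
# fit by plain stochastic gradient descent on the log-loss.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias of a logistic model to binary labels."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Invented cue vectors: [contains_misinformation, has_url, has_media]
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y = [1, 1, 0, 0, 1, 0]  # 1 = tweet was retweeted
w, b = fit(X, y)
pred = sigmoid(sum(wj * xj for wj, xj in zip(w, [1, 0, 0])) + b)
```

The fitted weights are directly interpretable as cue effects on the log-odds of being retweeted, which is why logistic regression suits this kind of cue analysis.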


Author(s):
Mohammad AR Abdeen
Ahmed Abdeen Hamed
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people's ability to access safety information and follow proper procedures to mitigate the risks. This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles and false medical publications. Here, we present NeoNet, a novel supervised machine learning text mining algorithm that analyzes the content of a document (a news article or a medical publication) and assigns a label to it. The algorithm is trained on TF-IDF bigram features, which contribute to a network training model. The algorithm is tested on two different real-world datasets: CBC news articles and COVID-19 publications. In five different fold comparisons, the algorithm predicted the label of an article with a precision of 97-99%. When compared with prominent algorithms such as Neural Networks, SVM, and Random Forests, NeoNet surpassed them. The analysis highlights the promise of NeoNet in detecting disputed online content that may contribute negatively to the COVID-19 pandemic.
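The TF-IDF bigram features the abstract names can be computed as in this minimal stdlib sketch; a production pipeline would more likely use something like scikit-learn's TfidfVectorizer with ngram_range=(2, 2). The example documents are invented.

```python
import math
from collections import Counter

def bigrams(text):
    """Word bigrams of a document, e.g. 'a b c' -> ['a b', 'b c']."""
    toks = text.lower().split()
    return [" ".join(pair) for pair in zip(toks, toks[1:])]

def tfidf(docs):
    """One {bigram: tf * idf} dict per document (raw tf, natural-log idf)."""
    counts = [Counter(bigrams(d)) for d in docs]
    n = len(docs)
    df = Counter(g for c in counts for g in c)
    return [{g: tf * math.log(n / df[g]) for g, tf in c.items()} for c in counts]

docs = ["covid vaccine trial results", "covid vaccine conspiracy theory"]
vectors = tfidf(docs)
```

Note how the bigram "covid vaccine", present in every document, gets weight zero, while document-specific bigrams such as "conspiracy theory" are up-weighted; those discriminative weights are what a downstream classifier learns from.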


2021
Author(s):
Julio C. S. Reis
Fabrício Benevenuto

Digital platforms, including social media systems and messaging applications, have become a place for campaigns of misinformation that affect the credibility of the entire news ecosystem. The emergence of fake news in these environments has quickly evolved into a worldwide phenomenon, where the lack of scalable fact-checking strategies is especially worrisome. In this context, this thesis aims to investigate practical approaches for the automatic detection of fake news disseminated on digital platforms. In particular, we explore new datasets and features for fake news detection to assess the prediction performance of current supervised machine learning approaches. We also propose an unbiased framework for quantifying the informativeness of features for fake news detection, and present an explanation of factors contributing to model decisions considering data from different scenarios. Finally, we propose and implement a new mechanism that accounts for the potential occurrence of fake news within the data, significantly reducing the number of content pieces journalists and fact-checkers have to go through before finding a fake story.


2022
Author(s):
Suben Kumer Saha
Khandaker Tabin Hasan

Abstract: Online news media, which is more accessible, cheaper, and faster to consume, is also of questionable quality, as there is less moderation. Anybody with a computing device and an internet connection can take part in creating, contributing, and spreading news on online portals. Social media has intensified the problem further. Due to its high volume, velocity, and veracity, online news content is beyond traditional moderation, that is, moderation by human experts. Thus, different machine learning methods are being tested and used to spot fake news. One of the main challenges for fake-news classification is obtaining labeled instances for this high volume of real-time data. In this study, we examined how semi-supervised machine learning can decrease the need for labeled instances with an acceptable drop in accuracy. The accuracy difference between the supervised and semi-supervised classifiers is around 0.05 while using only five percent of the labeled instances of the supervised classifier. We tested with logistic regression, SVM, and random forest classifiers to support our hypothesis.
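One common semi-supervised scheme matching this setup is self-training: fit on the small labeled set, then iteratively pseudo-label the unlabeled points the model is most confident about. The sketch below uses a one-dimensional nearest-centroid classifier as a stand-in for the study's logistic regression, SVM, or random forest; the data are invented.

```python
# Self-training sketch: grow the labeled set by pseudo-labeling, one
# most-confident unlabeled point per round.

def self_train(labeled, unlabeled, rounds=3):
    """labeled: list of (x, label); unlabeled: list of x (1-D features)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Class centroids from the current (partly pseudo-) labeled data.
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        cents = {y: sum(xs) / len(xs) for y, xs in groups.items()}

        # Confidence = margin between the two nearest centroids.
        def margin(x):
            d = sorted(abs(x - c) for c in cents.values())
            return d[1] - d[0]

        x = max(pool, key=margin)          # most confident point
        pool.remove(x)
        y = min(cents, key=lambda k: abs(x - cents[k]))
        labeled.append((x, y))             # adopt the pseudo-label
    return labeled

seed = [(0.0, "real"), (10.0, "fake")]     # the small labeled set
grown = self_train(seed, [1.0, 9.0, 5.5])
```

Ambiguous points (like 5.5 above) are labeled last, after the confident points have refined the centroids; scikit-learn's SelfTrainingClassifier wraps the same idea around any probabilistic base estimator.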


