Fighting the COVID-19 Infodemic with NeoNet: A Text-Based Supervised Machine Learning Algorithm

Author(s):  
Mohammad AR Abdeen ◽  
Ahmed Abdeen Hamed ◽  
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people’s ability to access safety information and follow proper procedures to mitigate the risks. Here, we present a novel supervised machine learning text mining algorithm that analyzes the content of a given news article and assigns a label to it. The NeoNet algorithm is trained on noun-phrase features, which contribute to a network model. The algorithm was tested on a real-world dataset, predicted the labels of never-seen articles, and flagged ones that are suspicious or disputed. In five different fold comparisons, NeoNet surpassed prominent contemporary algorithms such as Neural Networks, SVM, and Random Forests. The analysis shows that the NeoNet algorithm predicts the label of an article with 100% precision using a non-pruned model. This highlights the promise of detecting disputed online content that may contribute negatively to the COVID-19 pandemic. Indeed, machine learning combined with powerful text mining and network science provides the necessary tools to counter the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories associated with the COVID-19 infodemic.
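The abstract describes a model trained on noun-phrase features that contribute to a network model. A minimal sketch of one way such a network could be assembled from pre-extracted noun phrases is shown below; the phrase lists and function name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: build a co-occurrence network from noun phrases,
# the kind of structure a network-based training model can be built on.
from collections import defaultdict
from itertools import combinations

def build_phrase_network(documents):
    """Link every pair of noun phrases that co-occur in a document;
    edge weights count how often each pair appears together."""
    edges = defaultdict(int)
    for phrases in documents:
        for a, b in combinations(sorted(set(phrases)), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Invented example phrases standing in for extracted noun phrases
docs = [
    ["social distancing", "face masks", "infection rate"],
    ["face masks", "infection rate"],
]
network = build_phrase_network(docs)
# ("face masks", "infection rate") co-occurs in both documents
```

Edge weights like these give a classifier relational evidence that plain bag-of-words counts do not capture.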

2020 ◽  
Author(s):  
Mohammad AR Abdeen ◽  
Ahmed Abdeen Hamed ◽  
Xindong Wu

BACKGROUND: The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people’s ability to access safety information and follow proper procedures to mitigate the risks.

OBJECTIVE: This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles. Specifically, we present a computational approach that predicts whether a news article falls under the COVID-19 safe or suspicious category.

METHODS: Here, we present a novel supervised machine learning and computational linguistic approach that analyzes the content of a given news article and assigns a label to it. In particular, we designed an algorithm, which we call NeoNet, that is trained on a network of noun phrases selected from a trustworthy COVID-19 news dataset. Noun phrases are known to capture facts and eliminate subjectivity. Once trained, the algorithm predicts a label for new articles and decides whether an article is suspicious.

RESULTS: The results show that the NeoNet algorithm predicts the label of an article with 98.8% precision using a non-pruned model and 95.8% precision using a pruned model. In five different comparisons, NeoNet surpassed Naive Bayes three times, while the other two comparisons were too close to call in the pruned setting. Without pruning, NeoNet outperformed Naive Bayes in all five experiments.

CONCLUSIONS: The infodemic that has accompanied the COVID-19 pandemic presents a significant challenge because of the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories. However, machine learning combined with powerful computational linguistic methods can provide the necessary tools to inform the general public of whether a news article is COVID-19 SAFE or DISPUTED (when it contains suspicious content).

CLINICALTRIAL: N/A
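The evaluation protocol above compares classifiers under five-fold cross-validation against a Naive Bayes baseline. A hedged sketch of that baseline setup, using a tiny invented corpus rather than the paper's dataset, might look like this:

```python
# Illustrative five-fold evaluation of a Naive Bayes text classifier.
# The corpus and labels below are synthetic stand-ins, not the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

safe = ["vaccine trial reports peer reviewed data"] * 10
suspicious = ["miracle cure secret conspiracy rumor"] * 10
X = safe + suspicious
y = [0] * 10 + [1] * 10  # 0 = SAFE, 1 = DISPUTED

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(model, X, y, cv=5)  # one accuracy score per fold
print(scores.mean())
```

Each fold trains on 80% of the articles and scores on the held-out 20%, which is the "five different comparisons" pattern the abstract reports.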


Author(s):  
Mohammad AR Abdeen ◽  
Ahmed Abdeen Hamed ◽  
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people’s ability to access safety information and follow proper procedures to mitigate the risks. This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles and false medical publications. Here, we present NeoNet, a novel supervised machine learning text mining algorithm that analyzes the content of a document (a news article or a medical publication) and assigns a label to it. The algorithm is trained on TF-IDF bigram features, which contribute to a network training model. The algorithm is tested on two different real-world datasets: CBC news articles and COVID-19 publications. In five different fold comparisons, the algorithm predicted the label of an article with a precision of 97–99%. When compared with prominent algorithms such as Neural Networks, SVM, and Random Forests, NeoNet surpassed them. The analysis highlighted the promise of NeoNet in detecting disputed online content which may contribute negatively to the COVID-19 pandemic.



2021 ◽  
Vol 11 (16) ◽  
pp. 7265
Author(s):  
Mohammad A. R. Abdeen ◽  
Ahmed Abdeen Hamed ◽  
Xindong Wu

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic affects people’s ability to access safety information and follow proper procedures to mitigate the risks. This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles and false medical publications. Here, we present NeoNet, a novel supervised machine learning algorithm that analyzes the content of a document (a news article or a medical publication) and assigns a label to it. The algorithm was trained on Term Frequency Inverse Document Frequency (TF-IDF) bigram features, which contribute to a network training model. The algorithm was tested on two different real-world datasets from the CBC news network and COVID-19 publications. In five different fold comparisons, the algorithm predicted the label of an article with a precision of 97–99%. When compared with prominent algorithms such as Neural Networks, SVM, and Random Forests, NeoNet surpassed them. The analysis highlighted the promise of NeoNet in detecting disputed online content, which may contribute negatively to the COVID-19 pandemic.


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in recent years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model to predict employee attrition and give organizations opportunities to address issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the human resource databases of three IT companies in India, including each employee’s employment status (the response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not.
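A minimal sketch of the SVM setup described above, evaluated through a confusion matrix, is shown below; the synthetic data stands in for the 22 employee features, and this is not the study's actual model.

```python
# Hedged sketch: SVM classifier on 22 synthetic features, scored with
# a confusion matrix as in the study's evaluation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in for archival employee records (22 input features)
X, y = make_classification(n_samples=300, n_features=22, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_tr, y_tr)
pred = clf.predict(X_te)
cm = confusion_matrix(y_te, pred)  # rows: true class, cols: predicted class
print(cm, accuracy_score(y_te, pred))
```

The off-diagonal counts of the confusion matrix reveal exactly the asymmetry the study reports: how often leavers versus stayers are misclassified.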


Author(s):  
V.T Priyanga ◽  
J.P Sanjanasri ◽  
Vijay Krishna Menon ◽  
E.A Gopalakrishnan ◽  
K.P Soman

The widespread use of social media like Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell fake news apart from true news. We use the LIAR and ISOT datasets. We extract highly correlated news data from the entire dataset using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ autoencoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
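The cosine-similarity filtering step mentioned above can be sketched as follows; the texts are invented examples, and this is only one plausible form of the preprocessing, not the authors' code.

```python
# Hedged sketch: score how close news items are in TF-IDF space with
# cosine similarity, the metric used to group topically related articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "election results announced by officials",
    "officials announced the election results",
    "new smartphone released this week",
]
tfidf = TfidfVectorizer().fit_transform(corpus)
sim = cosine_similarity(tfidf)  # pairwise similarity matrix
# sim[0, 1] (same topic) should exceed sim[0, 2] (different topic)
print(sim[0, 1], sim[0, 2])
```

Thresholding such a matrix yields the highly correlated subsets that the autoencoders are then trained on.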


Author(s):  
Giandomenico Di Domenico ◽  
Annamaria Tuan ◽  
Marco Visentin

In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories about the effects of specific new technologies like 5G, and misinformation, tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users who share misinformation, as people extract meaning from information based on their discursive resources and their skill set. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether a tweet is retweeted and, if so, how much. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on whether a tweet is retweeted and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet across the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.
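The logistic-regression modeling idea above can be sketched on invented features; the cue names, data, and outcome below are all illustrative assumptions, not the study's panel.

```python
# Hedged sketch: logistic regression predicting a binary retweet outcome
# from hypothetical textual and non-textual tweet cues.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(10, 280, n),   # tweet length in characters (textual cue)
    rng.integers(0, 2, n),      # contains a hashtag (non-textual cue)
    rng.integers(0, 2, n),      # flagged as misinformation (invented cue)
])
# Toy outcome loosely coupled to the misinformation flag
y = X[:, 2] | rng.integers(0, 2, n)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:1]))  # [P(not retweeted), P(retweeted)]
```

The fitted coefficients then play the role of the "relative influence of the cues" that the study interprets.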


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge that deliberately misleads users through the inclusion of rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This drew the attention of researchers to providing a safe online environment free of misleading information. This paper aims to propose a smart classification model for the early detection of fake news in Arabic tweets utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus composed of 1862 previously annotated tweets was used in this research to assess the efficiency of the proposed model. The Bag of Words (BoW) model is applied with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. Reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) scores the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of 5% compared with previous works on the same dataset.
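The wrapper-based feature selection idea above can be sketched with a simple random search standing in for the Harris Hawks Optimizer; the data, fitness function, and search budget are invented for illustration.

```python
# Hedged sketch of wrapper-based feature selection: candidate binary masks
# are scored by the downstream classifier's cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=5, random_state=1)
rng = np.random.default_rng(1)

def fitness(mask):
    """Wrapper fitness: accuracy of LR trained on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=500),
                           X[:, mask], y, cv=3).mean()

best_mask, best_score = None, -1.0
for _ in range(10):  # HHO would explore this space far more intelligently
    mask = rng.integers(0, 2, 15).astype(bool)
    score = fitness(mask)
    if score > best_score:
        best_mask, best_score = mask, score
print(best_mask.sum(), round(best_score, 3))
```

The binary HHO replaces the random sampling loop with a population-based search, but the fitness evaluation, i.e. re-training the classifier on each candidate feature subset, is what makes the approach "wrapper-based".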

