Data mining applied in fake news classification through textual patterns

Fake news has been around for a long time, but the growth of social media and internet access has made it a bigger problem. Because it spreads rapidly through social media and instant messaging applications, fake news can reach more people in less time, directly influencing democratic processes and raising security issues that sometimes lead to tragic ends. To promote a fast and automated method of fake news identification, in this study we analyzed false Brazilian news, identifying writing patterns through natural language processing and machine learning.

Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Symmetry ◽ 10.3390/sym13040556 ◽ 2021 ◽ Vol 13 (4) ◽ pp. 556

Author(s): Thaer Thaher ◽ Mahmoud Saheb ◽ Hamza Turabieh ◽ Hamouda Chantar

Keyword(s): Machine Learning ◽ Social Media ◽ Feature Selection ◽ Language Processing ◽ User Profile ◽ Vital Role ◽ Classification Model ◽ Fake News ◽ False Information ◽ Social Media Platforms

Fake or false information on social media platforms is a significant challenge: it deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn the attention of researchers seeking to provide a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets using Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is used with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with the Term Frequency-Inverse Document Frequency (TF-IDF) model ranks best. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Notably, the proposed BHHO-LR model yields an improvement of 5% over previous works on the same dataset.
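As a rough illustration of the TF-IDF term weighting mentioned in this abstract, the sketch below computes weights over a bag-of-words representation using only the Python standard library. The abstract does not state which TF-IDF variant the authors used, so the formula here (term count divided by document length, multiplied by log(N / document frequency)) is an assumption, and the toy English corpus is a hypothetical placeholder rather than the Arabic dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "breaking news vaccine rumor".split(),
    "official news report".split(),
    "vaccine report published".split(),
]
weights = tfidf(docs)
```

In this toy corpus, "breaking" occurs in one document and "news" in two, so "breaking" receives the higher weight in the first document. The resulting weight vectors are the kind of input a classifier such as logistic regression would consume.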

Ternion: An Autonomous Model for Fake News Detection

Applied Sciences ◽ 10.3390/app11199292 ◽ 2021 ◽ Vol 11 (19) ◽ pp. 9292

Author(s): Noman Islam ◽ Asadullah Shaikh ◽ Asma Qaiser ◽ Yousef Asiri ◽ Sultan Almakdi ◽ ...

Keyword(s): Machine Learning ◽ Social Media ◽ Support Vector Machine ◽ Logistic Regression ◽ Language Processing ◽ Negative Impact ◽ Machine Learning Techniques ◽ Support Vector ◽ Fake News ◽ Processing Techniques

In recent years, consuming social media content to keep up with global news, and verifying its authenticity, has become a considerable challenge. Social media enables us to access news easily anywhere, anytime, but it also gives rise to the spread of fake news and thereby delivers false information. This has a negative impact on society. It is therefore necessary to determine whether news spreading over social media is real. Doing so helps avoid confusion among social media users and is important for ensuring positive social development. This paper proposes a novel solution that detects the authenticity of news through natural language processing techniques. Specifically, it proposes a scheme comprising three steps, namely stance detection, author credibility verification, and machine learning-based classification, to verify the authenticity of news. In the last stage of the proposed pipeline, several machine learning techniques are applied, such as decision trees, random forest, logistic regression, and support vector machine (SVM) algorithms. For this study, the fake news dataset was taken from Kaggle. The experimental results show an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15% for the support vector machine algorithm. The SVM is better than the second-best classifier, logistic regression, by 6.82%.
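The reported metrics are related: the F1-score is the harmonic mean of precision and recall. The short sketch below, with hypothetical function names and hypothetical confusion-matrix counts, shows the standard definitions and checks that the precision and recall quoted in this abstract give an F1 of about 0.9416, which agrees with the reported 94.15% to within roughly 0.01 percentage points.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
example = precision_recall_f1(tp=8, fp=2, fn=2)

# Precision and recall as reported in the abstract for the SVM.
reported_f1 = f1_from(0.9265, 0.9571)
```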

Automatic Fake News Detector in Social Media Using Machine Learning and Natural Language Processing Approaches

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies ◽ 10.1007/978-981-16-1502-3_30 ◽ 2021 ◽ pp. 295-305

Author(s): J. Srinivas ◽ K. Venkata Subba Reddy ◽ G. J. Sunny Deol ◽ P. VaraPrasada Rao

Keyword(s): Machine Learning ◽ Social Media ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Fake News

Automatic Identification and Filtration of COVID-19 Misinformation

Computer and Information Science ◽ 10.5539/cis.v14n4p57 ◽ 2021 ◽ Vol 14 (4) ◽ pp. 57

Author(s): Paras Gulati ◽ Abiodun Adeyinka. O. ◽ Saritha Ramkumar

Keyword(s): Social Media ◽ Language Processing ◽ Automatic Identification ◽ Fake News ◽ Global Pandemic ◽ Rapid Spread ◽ Social Media Platforms ◽ Data Explosion ◽ Real Time Identification ◽ The Right

The spread of online fake news through media platforms has increased over the last decade. Misinformation and disinformation of every kind are extensively propagated through social media platforms, some of the most popular being Facebook and Twitter. With the present global pandemic ravaging the world and killing hundreds of thousands, fake news from these platforms can exacerbate the situation. Unfortunately, there has been a great deal of misinformation and disinformation about the COVID-19 virus, the implications of which have been disastrous for various people, countries, and economies. The right information is crucial in the fight against this pandemic and, in this age of data explosion, where terabytes of data are generated every minute, near-real-time identification and tagging of misinformation is essential to minimize its consequences. In this paper, the authors use a two-step approach based on Natural Language Processing (NLP) to classify a tweet as potentially misinforming or not. First, COVID-19 tagged tweets were filtered based on the presence of keywords formulated from a list of common misinformation spread about the virus. Second, a deep neural network (RNN) trained on an openly available real and fake news dataset was used to predict whether the keyword-filtered tweets were factual or misinformed.
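The two-step structure described in this abstract can be sketched as a keyword filter followed by a classifier call. Everything specific below is an assumption made for illustration: the keyword list is hypothetical (the abstract does not give the authors' list), and `placeholder_classifier` is a trivial stand-in for the trained RNN, not a model of it.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical keyword list; the paper's actual list is not given in the abstract.
MISINFO_KEYWORDS = {"5g", "bleach", "hoax", "miracle cure"}


def mentions_keyword(tweet: str, keywords=MISINFO_KEYWORDS) -> bool:
    """Step 1: keep only tweets that mention a misinformation keyword."""
    text = tweet.lower()
    return any(keyword in text for keyword in keywords)


def two_step_flag(
    tweets: Iterable[str], classifier: Callable[[str], bool]
) -> List[Tuple[str, bool]]:
    """Step 2: run the classifier only on the keyword-filtered tweets."""
    return [(tweet, classifier(tweet)) for tweet in tweets if mentions_keyword(tweet)]


def placeholder_classifier(tweet: str) -> bool:
    """Stand-in for the trained model: flags any tweet containing 'cure'."""
    return "cure" in tweet.lower()


tweets = [
    "5G towers spread the virus",
    "This miracle cure stops COVID-19",
    "Wash your hands regularly",
]
flagged = two_step_flag(tweets, placeholder_classifier)
```

Running the cheap filter first means the more expensive model only sees the subset of tweets that touch a known misinformation topic; the third tweet never reaches the classifier.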

An Effecient Fake News Detection System Using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.j9453.0881019 ◽ 2019 ◽ Vol 8 (10) ◽ pp. 3125-3129 ◽ Cited By ~ 1

Keyword(s): Machine Learning ◽ Social Media ◽ Language Processing ◽ Negative Impact ◽ Detection System ◽ Vital Role ◽ Machine Learning Algorithms ◽ Easy Access ◽ Fake News ◽ K Nearest Neighbors

Social media plays a major role in many aspects of our lives. It helps us find important news at low cost and provides easy access in less time. But social media also allows fake news to spread quickly, so low-quality news containing false information can circulate widely. This has a negative impact on a number of people, and sometimes on society as a whole. The detection of fake news is therefore of great importance. Machine learning algorithms play a vital role in fake news detection; NLP (Natural Language Processing) algorithms in particular are very useful for this task. In this paper, we employed the machine learning classifiers SVM, K-Nearest Neighbors, Decision Tree, and Random Forest. Using these classifiers, we successfully built a model to detect fake news in the given dataset. Python was used for the experiments.
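Of the classifiers listed in this abstract, K-Nearest Neighbors is the simplest to show in a few lines. The sketch below is a minimal standard-library version with majority voting over Euclidean distance; the two features (exclamation-mark count and share of uppercase words) and all data points are hypothetical, since the abstract does not describe the features or dataset used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features: (exclamation marks, share of uppercase words).
train = [
    ((5.0, 0.60), "fake"),
    ((4.0, 0.50), "fake"),
    ((6.0, 0.70), "fake"),
    ((0.0, 0.05), "real"),
    ((1.0, 0.10), "real"),
    ((0.0, 0.00), "real"),
]

prediction_a = knn_predict(train, (4.5, 0.55))
prediction_b = knn_predict(train, (0.5, 0.05))
```

In practice the features would be scaled to comparable ranges first, because Euclidean distance is otherwise dominated by whichever feature has the largest numeric range.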

Thai Fake News Detection Based on Information Retrieval, Natural Language Processing and Machine Learning

SN Computer Science ◽ 10.1007/s42979-021-00775-6 ◽ 2021 ◽ Vol 2 (6)

Author(s): Phayung Meesad

Keyword(s): Machine Learning ◽ Information Retrieval ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Fake News

Exploring fake news identification using word and sentence embeddings

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-189865 ◽ 2021 ◽ pp. 1-8

Author(s): V.T Priyanga ◽ J.P Sanjanasri ◽ Vijay Krishna Menon ◽ E.A Gopalakrishnan ◽ K.P Soman

Keyword(s): Machine Learning ◽ Social Media ◽ Network Analysis ◽ Supervised Machine Learning ◽ Breeding Ground ◽ Fake News ◽ Data Set ◽ Highly Correlated ◽ Use Of Social Media ◽ The Liar

The widespread use of social media such as Facebook, Twitter, and WhatsApp has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt social order for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell fake news apart from true news. We use the LIAR and ISOT data sets. We extract highly correlated news data from the entire data set using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news, while also exploring their separability through network analysis.
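Cosine similarity, which this abstract names as one of the measures used to group related news items, compares the direction of two embedding vectors regardless of their length. A minimal standard-library sketch follows; the vectors are hypothetical placeholders, not embeddings from the LIAR or ISOT data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-dimensional "embeddings", for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1 and orthogonal vectors score 0, so a threshold on this value is one simple way to select highly related items.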

Social Media Content Categorization Using Supervised Based Machine Learning Methods and Natural Language Processing in Bangla Language

2020 11th International Conference on Electrical and Computer Engineering (ICECE) ◽ 10.1109/icece51571.2020.9393095 ◽ 2020

Author(s): Md. Rejaul Alam ◽ Afsana Akter ◽ Minhajul Abedin Shafin ◽ Md. Mehedi Hasan ◽ Antara Mahmud

Keyword(s): Machine Learning ◽ Social Media ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Media Content ◽ Learning Methods ◽ Machine Learning Methods

Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review

BMJ Health & Care Informatics ◽ 10.1136/bmjhci-2020-100262 ◽ 2021 ◽ Vol 28 (1) ◽ pp. e100262

Author(s): Mustafa Khanbhai ◽ Patrick Anyadi ◽ Joshua Symons ◽ Kelsey Flott ◽ Ara Darzi ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Social Media ◽ Natural Language Processing ◽ Natural Language ◽ Patient Experience ◽ Language Processing ◽ Performance Metrics ◽ Free Text ◽ Patient Feedback

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.

Linguistic drivers of misinformation diffusion on social media during the COVID-19 pandemic

Italian Journal of Marketing ◽ 10.1007/s43039-021-00026-9 ◽ 2021

Author(s): Giandomenico Di Domenico ◽ Annamaria Tuan ◽ Marco Visentin

Keyword(s): Machine Learning ◽ Social Media ◽ New Technologies ◽ Fake News ◽ Conspiracy Theories ◽ Social Media Platform ◽ The Social ◽ Media Platform ◽ Textual Cues ◽ Twitter Users

In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories circulated about the effects of specific new technologies like 5G, and misinformation tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.
