Fake News Detection using Machine Learning

2021 ◽  
Vol 40 ◽  
pp. 03003
Author(s):  
Prasad Kulkarni ◽  
Suyash Karwande ◽  
Rhucha Keskar ◽  
Prashant Kale ◽  
Sumitra Iyer

Everyone depends on various online resources for news in this modern age, where the internet is pervasive. As the use of social media platforms such as Facebook, Twitter, and others has increased, news spreads quickly among millions of users in a short time. The consequences of fake news are far-reaching, from swaying election outcomes in favor of certain candidates to creating biased opinions. WhatsApp, Instagram, and many other social media platforms are major channels for spreading fake news. This work provides a solution by introducing a fake news detection model using machine learning. The model requires prerequisite data extracted from various news websites; web scraping is used for data extraction, and the scraped data is used to create datasets. The data is classified into two major categories: a true dataset and a false dataset. The classifiers used are Random Forest, Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), and Gradient Boosting. Based on the output received, the data is classified as either true or false, and on that basis the user can find out on the web server whether a given news item is fake or not.
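As an illustration of the classifier comparison described above, the following minimal sketch (Python with scikit-learn) trains the five listed classifiers on TF-IDF features and reports their accuracy; the file name and column names are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of the classifier comparison described in the abstract.
# "scraped_news.csv", "text", and "label" are hypothetical names.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("scraped_news.csv")  # hypothetical dataset built by web scraping
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vec = TfidfVectorizer(stop_words="english", max_features=5000)
X_train_vec = vec.fit_transform(X_train)
X_test_vec = vec.transform(X_test)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Gradient Boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_train_vec, y_train)                      # train each classifier
    preds = model.predict(X_test_vec)                    # label held-out articles
    print(f"{name}: accuracy = {accuracy_score(y_test, preds):.3f}")
```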

Author(s):  
Prof. B. J. Deokate

Abstract: Fake news detection is an interesting topic for computer scientists and social scientists. The recent growth of fake news on online social media has had a great impact on society. A huge amount of information from disparate sources flows among users around the world. Social media platforms such as Facebook, WhatsApp, and Twitter are among the most popular applications able to deliver appealing data in a timely manner. Developing a technique that can detect fake news from these platforms is becoming a necessary and challenging task. This project proposes a machine learning method that can assess the credibility of an article extracted from the Uniform Resource Locator (URL) entered by the user on the front end of a website. The project uses five widely used machine learning methods: Long Short-Term Memory (LSTM), Random Forest (random tree), Random Forest (decision tree), Decision Tree, and Neural Network to tell the user about the credibility of that news. Our initial definition of reliable and unreliable relies on the human-curated data from http://opensources.co, which lists about 20 credible news websites and over 700 fake news websites. The proposed model works well, achieving an accuracy of up to 87.45%. Keywords: Data Pre-processing, Fake news datasets, ML algorithms, Prediction.
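A minimal sketch of the URL-driven front end this abstract describes might look as follows; the scraping logic and the pre-trained `vectorizer`/`model` objects are illustrative assumptions, not the authors' implementation.

```python
# Sketch: fetch the article behind a user-supplied URL and score it with a
# previously trained classifier. The helper names are hypothetical.
import requests
from bs4 import BeautifulSoup

def extract_article_text(url: str) -> str:
    """Download the page and keep only paragraph text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))

def credibility(url: str, vectorizer, model) -> str:
    """Return a human-readable credibility verdict for the article at `url`."""
    text = extract_article_text(url)
    features = vectorizer.transform([text])          # same features used at training time
    return "reliable" if model.predict(features)[0] == 1 else "unreliable"
```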


Author(s):  
V.T Priyanga ◽  
J.P Sanjanasri ◽  
Vijay Krishna Menon ◽  
E.A Gopalakrishnan ◽  
K.P Soman

The widespread use of social media such as Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell fake news apart from true news. We use the LIAR and ISOT datasets. We extract highly correlated news data from the entire dataset using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ autoencoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
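As a rough illustration of the cosine-similarity filtering step (not the authors' code), the sketch below keeps only the articles whose TF-IDF vectors lie close to the corpus centroid; the threshold value is an assumption.

```python
# Illustrative sketch: keep articles whose TF-IDF vectors are highly
# correlated (cosine-wise) with the corpus centroid.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def filter_by_topic(texts, threshold=0.3):
    """Return indices of articles close to the central topic of the corpus."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
    centroid = np.asarray(vectors.mean(axis=0))       # mean TF-IDF vector, shape (1, n_features)
    sims = cosine_similarity(vectors, centroid).ravel()
    return [i for i, s in enumerate(sims) if s >= threshold]
```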


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge that deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn the attention of researchers to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus composed of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is applied with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) achieves the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of 5% compared with previous works on the same dataset.
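A baseline sketch of the TF-IDF plus Logistic Regression combination the paper reports as strongest is shown below; the binary HHO wrapper feature selection is not reproduced here, and the usage lines assume hypothetical `tweets` and `labels` variables.

```python
# Baseline sketch of the TF-IDF + LR pipeline (HHO feature selection omitted).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def build_tfidf_lr_pipeline():
    """Bag-of-words features with TF-IDF weighting, classified with LR."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),               # BoW + TF-IDF term weighting
        ("clf", LogisticRegression(max_iter=1000)), # binary fake/genuine classifier
    ])

# Hypothetical usage with annotated tweets and binary labels:
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(build_tfidf_lr_pipeline(), tweets, labels, cv=5, scoring="f1")
```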


2019 ◽  
Vol 6 (2) ◽  
pp. 205316801984855 ◽  
Author(s):  
Hunt Allcott ◽  
Matthew Gentzkow ◽  
Chuan Yu

In recent years, there has been widespread concern that misinformation on social media is damaging societies and democratic institutions. In response, social media platforms have announced actions to limit the spread of false content. We measure trends in the diffusion of content from 569 fake news websites and 9540 fake news stories on Facebook and Twitter between January 2015 and July 2018. User interactions with false content rose steadily on both Facebook and Twitter through the end of 2016. Since then, however, interactions with false content have fallen sharply on Facebook while continuing to rise on Twitter, with the ratio of Facebook engagements to Twitter shares decreasing by 60%. In comparison, interactions with other news, business, or culture sites have followed similar trends on both platforms. Our results suggest that the relative magnitude of the misinformation problem on Facebook has declined since its peak.


2021 ◽  
Vol 13 (10) ◽  
pp. 244
Author(s):  
Mohammed N. Alenezi ◽  
Zainab M. Alqenaei

Social media platforms such as Facebook, Instagram, and Twitter are an inevitable part of our daily lives. These platforms are effective tools for disseminating news, photos, and other types of information. Besides the convenience they offer, they are often used for propagating malicious data or information. Such misinformation may misguide users and even have a dangerous impact on society’s culture, economics, and healthcare. The propagation of this enormous amount of misinformation is difficult to counter; in particular, the spread of misinformation related to the COVID-19 pandemic, its treatment, and vaccination may pose severe challenges for each country’s frontline workers. Therefore, it is essential to build an effective machine-learning (ML) misinformation-detection model for identifying misinformation regarding COVID-19. In this paper, we propose three effective misinformation detection models: a long short-term memory (LSTM) network, which is a special type of recurrent neural network (RNN); a multichannel convolutional neural network (MC-CNN); and k-nearest neighbors (KNN). Simulations were conducted to evaluate the performance of the proposed models in terms of various evaluation metrics. The proposed models obtained superior results to those from the literature.
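A minimal Keras sketch of an LSTM classifier of the kind described above is given below; the vocabulary size, embedding dimension, and layer width are illustrative assumptions rather than the authors' settings.

```python
# Minimal sketch of a binary LSTM misinformation classifier (assumed hyperparameters).
from tensorflow.keras import layers, models

def build_lstm_classifier(vocab_size=20_000, embed_dim=64):
    """Binary classifier over tokenized posts: misinformation vs. genuine."""
    model = models.Sequential([
        layers.Embedding(vocab_size, embed_dim),   # token ids -> dense vectors
        layers.LSTM(64),                           # recurrent sequence encoder
        layers.Dense(1, activation="sigmoid"),     # probability of misinformation
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```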


2021 ◽  
Vol 23 (4) ◽  
pp. 1-21
Author(s):  
Nureni Ayofe AZEEZ ◽  
Sanjay Misra ◽  
Omotola Ifeoluwa LAWAL ◽  
Jonathan Oluranti

The use of social media platforms such as Facebook, Twitter, Instagram, WhatsApp, etc. has enabled many people to communicate effectively and frequently with each other, and this has also allowed cyberbullying to occur more frequently on these networks. Cyberbullying is known to cause serious health issues among social media users, so creating a way to identify and detect it holds significant importance. This paper examines unique features extracted from a Facebook dataset and develops a model that identifies and detects cyberbullying posts by applying machine learning algorithms (the Naïve Bayes algorithm and K-Nearest Neighbors). The project also uses a feature selection algorithm, the χ² (Chi-Square) test, to select important features, which can improve the performance of the classifiers and decrease classification time. The result is a model that detects cyberbullying on Facebook with a high degree of accuracy while improving the performance of the machine learning classifiers.
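The sketch below illustrates (but does not reproduce) the described pipeline: chi-square feature selection feeding the two named classifiers; the number of selected features and the vectorizer settings are assumptions.

```python
# Sketch: chi-square feature selection ahead of Naive Bayes and KNN classifiers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

def make_pipeline(classifier, k_features=1000):
    """Vectorize posts, keep the k most informative terms, then classify."""
    return Pipeline([
        ("vectorize", CountVectorizer(stop_words="english")),
        ("select", SelectKBest(chi2, k=k_features)),   # chi-square feature selection
        ("clf", classifier),
    ])

pipelines = {
    "Naive Bayes": make_pipeline(MultinomialNB()),
    "KNN": make_pipeline(KNeighborsClassifier(n_neighbors=5)),
}
# Each pipeline is then fit on labeled posts, e.g. pipelines["KNN"].fit(posts, labels)
```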


ICR Journal ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 189-212
Author(s):  
Talat Zubair ◽  
Amana Raquib ◽  
Junaid Qadir

The growing trend of sharing and acquiring news through social media platforms and the World Wide Web has impacted individuals as well as societies, spreading misinformation and disinformation. This trend—along with rapid developments in the field of machine learning, particularly with the emergence of techniques such as deep learning that can be used to generate data—has grave political, social, ethical, security, and privacy implications for society. This paper discusses the technologies that have led to the rise of problems such as fake news articles, filter bubbles, social media bots, and deep-fake videos, and their implications, while providing insights from the Islamic ethical tradition that can aid in mitigating them. We view these technologies and artifacts through the Islamic lens, concluding that they violate the commandment of spreading truth and countering falsehood. We present a set of guidelines, with reference to Qur‘anic and Prophetic teachings and the practices of the early Muslim scholars, on countering deception, putting forward ideas on developing these technologies while keeping Islamic ethics in perspective.


2021 ◽  
Vol 58 (1) ◽  
pp. 1932-1939
Author(s):  
Alim Al Ayub Ahmed et al.

The Internet is one of the most important inventions, and a large number of people use it for different purposes. Various social media platforms are accessible to these users, and any user can make a post or spread news through these online platforms. These platforms do not verify users or their posts, so some users try to spread fake news through them. Such fake news may be propaganda against an individual, society, organization, or political party. A human being cannot detect all of this fake news, so there is a need for machine learning classifiers that can detect fake news automatically. The use of machine learning classifiers for detecting fake news is described in this systematic literature review.


Author(s):  
Claudia Mellado ◽  
Luis Cárcamo-Ulloa ◽  
Amaranta Alfaro ◽  
Daria Inai ◽  
José Isbej

This study analyzes the use of social media sources by nine news outlets in Chile in regard to Covid-19. We identified the most frequently used types of sources, their evolution over time, and the differences between the various social media platforms used by the Chilean media during the pandemic. Specifically, we extracted 838,618 messages published by Chilean media on Facebook, Instagram, and Twitter between January and December 2020. An initial machine learning (ML) process was applied to automatically identify 168,250 messages that included keywords linking their content to Covid-19. Based on a list of 2,130 entities, another ML process was used to apply a set of rules based on the appearance of declarative verbs or common expressions used by the media when citing a source, and the use of colons or quotation marks, to detect the presence of different types of sources in the news content. The results reveal that Chilean media outlets’ use of different voices on social media broadly favored political sources, followed by health, citizen, academic-scientific, and economic ones. Although the hierarchy of the most important sources used to narrate the public health crisis tended to remain stable, there were nuances over time, and its variation depended on key historic milestones. An analysis of the use of sources by each platform revealed that Twitter was the least pluralist, giving space to a more restricted group of voices and intensifying the presence of political sources over the others, particularly citizen sources. Finally, our study revealed significant differences across media types in the use of political, health, and citizen sources, with television showing a greater presence of these sources than other types of media.
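A toy sketch of the rule-based source detection step (a known entity followed by a declarative verb, a colon, or quoted speech) is given below; the entity and verb lists are tiny hypothetical samples, not the study's 2,130-entity list or its actual rule set.

```python
# Illustrative sketch of rule-based source detection in a social media message.
import re

ENTITIES = ["Ministerio de Salud", "OMS"]                    # hypothetical sample entities
DECLARATIVE_VERBS = ["dijo", "afirmó", "aseguró", "anunció"]  # hypothetical sample verbs

def find_sources(message: str):
    """Return entities that appear as attributed or quoted sources in a message."""
    sources = []
    for entity in ENTITIES:
        # entity followed by a colon, a declarative verb, or an opening quotation mark
        pattern = rf"{re.escape(entity)}\s*(?::|({'|'.join(DECLARATIVE_VERBS)})|[\"“])"
        if re.search(pattern, message, flags=re.IGNORECASE):
            sources.append(entity)
    return sources

# find_sources('Ministerio de Salud: "Se confirman nuevos casos de Covid-19"')
# -> ['Ministerio de Salud']
```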


Author(s):  
Dinusha Vatsalan ◽  
Nalin A.G. Arachchilage

Social media giants like Facebook are struggling to keep up with fake news, given that disinformation diffuses at lightning speed. For example, the COVID-19 (i.e., Coronavirus) pandemic is testing citizens’ ability to distinguish real news from falsified facts (i.e., disinformation). Cyber-criminals take advantage of this inability to cope with fake news diffusion on social media platforms. Fake news is created as a means to manipulate readers into performing various malicious IT activities, such as clicking on fraudulent links associated with the fake news or posts. However, no previous study has investigated the strategies used to create fake news on social media. Therefore, we analysed five datasets containing online news articles (i.e., both fake and legitimate news) using Machine Learning (ML) to investigate the strategies for creating fake news on social media platforms. Our findings reveal a threat model describing the strategies for crafting fake news that is most likely to diffuse on social media platforms.

