Reducing Fraudulent News Proliferation using Classification Techniques

The expansion of misleading information in easily accessible media outlets such as social media channels, news blogs, and online newspapers has made it difficult to identify dependable news sources, thus increasing the need for computational tools able to provide insight into the reliability of online content. This paper applies Natural Language Processing techniques to detect fake news, that is, misleading news stories that come from non-reputable sources. Building a model based only on word counts or a Term Frequency-Inverse Document Frequency (TF-IDF) matrix can only get you so far. Is it possible to build a model that can differentiate between “real” news and “fake” news? Our proposed work therefore gathers a dataset of both fake and real news and uses a Naïve Bayes classifier to build a model that classifies an article as fake or real based on its words and phrases.
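As a rough illustration of the kind of pipeline this abstract describes, the sketch below trains a Naïve Bayes classifier on TF-IDF features with scikit-learn. The file name news.csv and the text/label column names are assumptions for illustration, not the paper's actual data.

```python
# Minimal sketch of a TF-IDF + Naive Bayes fake-news classifier.
# "news.csv" and the "text"/"label" columns are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("news.csv")  # columns: text, label ("REAL"/"FAKE")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)  # learn vocabulary on training data only
X_test_tfidf = vectorizer.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_tfidf, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_tfidf)))
```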

The expansion of misleading information in everyday-access news sources such as social media channels, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insight into the reliability of online content. In this paper, we focus on the automatic identification of fake content in news articles. First, we introduce a dataset for the task of fake news detection. We describe the pre-processing, feature extraction, classification, and prediction steps in detail. We use Logistic Regression and natural language processing techniques to classify fake news. The pre-processing functions perform operations such as tokenizing and stemming, along with exploratory data analysis such as inspecting the response variable distribution and checking data quality (e.g., null or missing values). Simple bag-of-words, n-grams, and TF-IDF are used as feature extraction techniques. A logistic regression model is used as the classifier for fake news detection, outputting a probability of truth.
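A minimal sketch of the described pipeline, assuming NLTK and scikit-learn; the two example texts and labels are invented placeholders, and the exploratory data analysis step is omitted.

```python
# Sketch of the described pipeline: tokenize + stem, TF-IDF over word n-grams,
# then logistic regression with a probability-of-truth output.
# The texts and labels below are illustrative placeholders, not the paper's data.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

stemmer = PorterStemmer()

def stem_tokenize(doc):
    # lowercase, keep word tokens, stem each token
    return [stemmer.stem(t) for t in re.findall(r"[a-z']+", doc.lower())]

texts = ["the senate passed the bill today", "aliens secretly run the government"]
labels = [1, 0]  # 1 = real, 0 = fake

vec = TfidfVectorizer(tokenizer=stem_tokenize, ngram_range=(1, 2), token_pattern=None)
X = vec.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba(vec.transform(["breaking: bill passed"]))[:, 1])  # P(real)
```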


Author(s):  
Saud Altaf
Sofia Iqbal
Muhammad Waseem Soomro

This paper focuses on capturing the meaning of Natural Language Understanding (NLU) text features to detect duplicates in an unsupervised fashion. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is utilized to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets, Bosch bug reports and Wikipedia articles. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results demonstrate the Term Frequency-Inverse Document Frequency (TF-IDF) feature results on both datasets with a reasonable vocabulary size. They indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.
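For context, a lexical baseline of the kind compared against NLU features might look like the following TF-IDF cosine-similarity sketch; the sentence pair is an invented example.

```python
# A lexical similarity baseline: TF-IDF vectors compared with cosine similarity,
# the kind of approach NLU features are measured against on STS-style tasks.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pair = ["the app crashes when the camera opens",
        "opening the camera makes the application crash"]

X = TfidfVectorizer().fit_transform(pair)
print("lexical similarity:", cosine_similarity(X[0], X[1])[0, 0])
```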


2020
Author(s):
Amir Bidgoly
Hossein Amirkhani
Fariba Sadeghi

Abstract: Fake news detection is a challenging problem in online social media, with considerable social and political impacts. Several methods have already been proposed for the automatic detection of fake news, often based on the statistical features of the content or context of news items. In this paper, we propose a novel fake news detection method based on a Natural Language Inference (NLI) approach. Instead of using only statistical features of the content or context of the news, the proposed method exploits a human-like approach based on inferring veracity from a set of reliable news. In this method, related and similar news published in reputable news sources are used as auxiliary knowledge to infer the veracity of a given news item. We also collect and publish the first inference-based fake news detection dataset, called FNID, in two formats: the two-class version (FNID-FakeNewsNet) and the six-class version (FNID-LIAR). We use the NLI approach to boost several classical and deep machine learning models including Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, BiGRU, and BiLSTM, along with different word embedding methods including Word2vec, GloVe, fastText, and BERT. The experiments show that the proposed method achieves 85.58% and 41.31% accuracy on the FNID-FakeNewsNet and FNID-LIAR datasets, respectively, which are absolute improvements of 10.44% and 13.19%.
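The following sketch illustrates the paper's general idea rather than its implementation: retrieve the most related item from a reliable-news corpus, then score the claim against it. The tiny corpus is invented, and a trivial lexical-overlap function stands in for a real NLI model.

```python
# Sketch of the idea (not the paper's implementation): retrieve related items
# from a reliable-news corpus, then judge a claim against the best match.
# A real NLI model would score entailment/contradiction; a crude lexical-overlap
# stand-in marks that step here, purely to keep the sketch runnable.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reliable_news = [
    "The central bank raised interest rates by 25 basis points on Tuesday.",
    "The city council approved the new transit budget after a public hearing.",
]  # invented corpus standing in for reputable sources

claim = "The central bank cut interest rates this week."

vec = TfidfVectorizer().fit(reliable_news + [claim])
sims = cosine_similarity(vec.transform([claim]), vec.transform(reliable_news))[0]
evidence = reliable_news[sims.argmax()]  # most related reliable item

def nli_score(premise, hypothesis):
    # Placeholder for an NLI model (e.g. a BERT-style entailment classifier).
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h)

print("evidence:", evidence)
print("support score:", nli_score(evidence, claim))
```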


Reading Comprehension (RC) plays an important role in Natural Language Processing (NLP), as it involves reading and understanding text written in natural language. Reading Comprehension systems comprehend a given document and answer questions in the context of that document. This paper proposes a Reading Comprehension system for Kannada documents. The RC system analyses text in the Kannada script and allows users to pose questions to it in Kannada. The system is aimed at people whose primary language is Kannada, who would otherwise have difficulty parsing through vast Kannada documents for the information they require. This paper discusses the proposed model, built using Term Frequency - Inverse Document Frequency (TF-IDF), and its performance in extracting answers from the context document. The proposed model captures the grammatical structure of Kannada to provide the most accurate answers to the user.
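A minimal sketch of TF-IDF answer-sentence retrieval of the kind such an RC system might use; English placeholder text stands in for the Kannada content, and the sentence splitting is deliberately naive.

```python
# Sketch of TF-IDF answer-sentence retrieval: score each sentence of the
# document against the question and return the best match.
# English placeholder text stands in for Kannada content here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document = ("Kannada is a Dravidian language. It is spoken mainly in Karnataka. "
            "The script used to write it is the Kannada script.")
question = "Where is Kannada mainly spoken?"

sentences = [s.strip() for s in document.split(".") if s.strip()]
vec = TfidfVectorizer().fit(sentences + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(sentences))[0]
print("answer sentence:", sentences[scores.argmax()])
```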


2019
Vol 3 (4)
pp. 53
Author(s):  
Ahmad Hawalah

Text classification is the process of classifying textual content into a set of predefined classes and categories. As enormous numbers of documents and contextual contents are introduced every day on the Internet, it becomes essential to use text classification techniques for purposes such as enhancing search retrieval and recommendation systems. A lot of work has been done to study different aspects of English text classification techniques. However, little attention has been devoted to Arabic text classification due to the difficulty of processing the Arabic language. Consequently, in this paper, we propose an enhanced Arabic topic-discovery architecture (EATA) that uses ontology to provide an effective Arabic topic classification mechanism. We introduce a semantic enhancement model to improve Arabic text classification and the topic discovery technique by utilizing the rich semantic information in Arabic ontology. In this study, we rely on the vector space model (term frequency-inverse document frequency (TF-IDF)) as well as the cosine similarity approach to classify new Arabic textual documents.
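A small sketch of the vector-space step the paper relies on, assigning a new document to the class whose TF-IDF centroid is most cosine-similar; the tiny English documents are placeholders for the Arabic corpus.

```python
# Vector-space classification sketch: represent documents with TF-IDF and
# assign a new document to the class whose centroid is most cosine-similar.
# Tiny English placeholder documents stand in for the Arabic corpus.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["the match ended with a late goal",    # sports
              "the striker scored twice",             # sports
              "parliament passed the finance law",    # politics
              "the minister announced a new policy"]  # politics
train_labels = np.array(["sports", "sports", "politics", "politics"])

vec = TfidfVectorizer()
X = vec.fit_transform(train_docs).toarray()
centroids = {c: X[train_labels == c].mean(axis=0) for c in set(train_labels)}

new_doc = vec.transform(["the coach praised the goalkeeper"]).toarray()
scores = {c: cosine_similarity(new_doc, m.reshape(1, -1))[0, 0]
          for c, m in centroids.items()}
print(max(scores, key=scores.get))  # -> "sports"
```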


2019
Vol 7 (1)
pp. 1831-1840
Author(s):
Bern Jonathan
Jay Idoan Sihotang
Stanley Martin

Introduction: Natural Language Processing is a branch of Artificial Intelligence and Machine Learning concerned with the interactions between computers and human (natural) languages. Sentiment analysis is one part of Natural Language Processing, often used to analyze the patterns in people's writing and classify it as expressing positive, negative, or neutral sentiment. Sentiment analysis is useful for knowing whether users like something or not. Zomato is an application for rating restaurants. Each rating carries a review of the restaurant, which can be used for sentiment analysis. Based on this, the authors predict the sentiment of the reviews. Method: The preprocessing of each review lowercases all words, tokenizes, removes numbers and punctuation, removes stop words, and lemmatizes. After that, we convert words to vectors with term frequency-inverse document frequency (TF-IDF). The data we process consist of 150,000 reviews. Reviews with a rating above 3 are labelled positive, those below 3 negative, and those with a rating of exactly 3 neutral. The authors use a split test: 80% training data and 20% testing data. The metrics used to evaluate the random forest classifier are precision, recall, and accuracy. The accuracy of this research is 92%. Result: The precision of positive, negative, and neutral sentiment is 92%, 93%, and 96%, respectively. The recall of positive, negative, and neutral sentiment is 99%, 89%, and 73%. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: “bad”, “good”, “average”, “best”, “place”, “love”, “order”, “food”, “try”, and “nice”.
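A minimal sketch of the described preprocessing and classification steps, assuming the NLTK stop word and WordNet resources are installed; the two reviews and labels are invented placeholders for the 150,000-review dataset.

```python
# Sketch of the described pipeline: lowercase, strip digits/punctuation, remove
# stop words, lemmatize, vectorize with TF-IDF, and train a random forest.
# Assumes NLTK's "stopwords" and "wordnet" data have been downloaded.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

lemmatizer = WordNetLemmatizer()
stops = set(stopwords.words("english"))

def preprocess(review):
    review = re.sub(r"[^a-z\s]", " ", review.lower())  # lowercase, drop digits/punct
    tokens = [lemmatizer.lemmatize(t) for t in review.split() if t not in stops]
    return " ".join(tokens)

reviews = ["The food was great, loved the place!", "Terrible service and cold food."]
labels = ["positive", "negative"]  # rating > 3 / rating < 3

X = TfidfVectorizer().fit_transform(preprocess(r) for r in reviews)
clf = RandomForestClassifier(random_state=42).fit(X, labels)
```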


Author(s):  
Maria del Mar Ramirez-Alvarado

This chapter analyses the concept of post-truth, the circumstance in which objective facts are less influential in shaping public opinion than emotional appeals and personal beliefs, and the projection of this phenomenon onto social media, where various studies have demonstrated that some fake news stories generate more engagement from users than vetted reporting from reliable news sources. The chapter starts from a general introduction and an associated theoretical reflection, then focuses on the case of Venezuela and its recent historical circumstances in order to examine how fake news circulates in the country, stimulated by a context of widespread disinformation.


2020
pp. 016555152097744
Author(s):
Yongcong Luo
Jing Ma
Chai Kiat Yeo

Online social media (OSM) has become a hotbed for the rapid dissemination of disinformation and fake news. In order to recognise fake news and guide OSM users, we focus on the stance recognition of comments posted on OSM about fake news-related items. In this article, we propose a framework for the recognition of rumour stances (we set four categories: ‘agree’, ‘disagree’, ‘neutral’ and ‘query’), combining network topology and comment semantic enhancement (CSE). We first construct a vector matrix of comments via a novel optimised term frequency-inverse document frequency (OTI). To better recognise stances, we employ another vector matrix with novel attributes, comprising the network topology of the OSM users derived from the random walk with restart (RWR) method. In addition, we set a weight parameter for each word in the comments to enhance the comment semantic representation, where these parameters are tuned based on sentiment score, topology features and question-format words. These vector matrices are optimised and combined into an integrated matrix whose transpose is fed into a neural network (NN) for final rumour stance recognition. Experimental evaluations show that our approach achieves a high precision of 93.96% and F1-score of 92.02%, which are superior to baselines and other existing methods.
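The sketch below illustrates two of the building blocks the article combines, a TF-IDF matrix over comments and random-walk-with-restart scores over a small user graph, concatenated and fed to a small neural network. The graph, comments, and labels are invented, and the paper's OTI weighting and tuned word weights are omitted.

```python
# Two building blocks, sketched: TF-IDF features over comments plus RWR scores
# over the user graph, concatenated into one feature matrix for a small NN.
# The tiny 4-user graph, comments, and stance labels are invented examples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

comments = ["I agree, this is true", "totally false, do not share",
            "is there any source for this?", "no opinion either way"]
stances = ["agree", "disagree", "query", "neutral"]

# Row-normalized adjacency of a tiny 4-user graph (who interacts with whom).
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
W = A / A.sum(axis=1, keepdims=True)

def rwr(W, seed, restart=0.15, iters=50):
    # Iterate r <- (1 - c) * W^T r + c * e toward the RWR stationary vector.
    e = np.zeros(len(W)); e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - restart) * W.T @ r + restart * e
    return r

topology = np.vstack([rwr(W, u) for u in range(len(W))])  # one RWR vector per user
text = TfidfVectorizer().fit_transform(comments).toarray()
features = np.hstack([text, topology])                    # integrated matrix

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(features, stances)
```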


2019
Author(s):  
Matthew J. Lavin

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
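As a concrete companion to the lesson, here is a hand-rolled tf-idf computation using one common variant (raw term frequency times inverse document frequency); the three toy documents are invented.

```python
# A hand-rolled tf-idf calculation: a term's frequency within a document,
# scaled by how rare the term is across the corpus.
import math

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "meowed"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)          # term frequency within the document
    df = sum(term in d for d in corpus)      # number of documents containing term
    idf = math.log(len(corpus) / df)         # idf = ln(N / df)
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # distinctive term -> positive score
print(tf_idf("the", docs[0], docs))  # appears everywhere -> idf = 0
```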

