Reducing Fraudulent News Proliferation using Classification Techniques

The expansion of misleading information in easily accessible media outlets such as social media channels, news blogs, and online newspapers has made it difficult to identify dependable news sources, thus increasing the need for computational tools able to provide insight into the reliability of online content. This paper applies Natural Language Processing techniques to detect fake news, that is, misleading news stories that come from non-reputable sources. Building a model based only on word counts or a Term Frequency-Inverse Document Frequency (TF-IDF) matrix can only get you so far. Is it possible to build a model that can differentiate between “real” news and “fake” news? Our proposed work therefore gathers a dataset of both fake and real news and uses a Naïve Bayes classifier to build a model that classifies an article as fake or real based on its words and phrases.
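As a rough illustration of the kind of pipeline this abstract describes, the sketch below trains a Naïve Bayes classifier on TF-IDF features with scikit-learn. The file name news.csv and the text/label column names are assumptions for illustration, not the paper's actual data.

```python
# Minimal sketch of a TF-IDF + Naive Bayes fake-news classifier.
# "news.csv" and the "text"/"label" columns are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("news.csv")  # columns: text, label ("REAL"/"FAKE")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)  # learn vocabulary on training data only
X_test_tfidf = vectorizer.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_tfidf, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_tfidf)))
```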

The expansion of misleading information in everyday-access news sources such as social media channels, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insight into the reliability of online content. In this paper, we focus on the automatic identification of fake content in news articles. First, we introduce a dataset for the task of fake news detection. We describe the pre-processing, feature extraction, classification, and prediction steps in detail. We use Logistic Regression and natural language processing techniques to classify fake news. The pre-processing functions perform operations such as tokenizing and stemming, along with exploratory data analysis such as inspecting the response variable distribution and checking data quality (e.g., null or missing values). Simple bag-of-words, n-grams, and TF-IDF are used as feature extraction techniques. A logistic regression model is used as the classifier for fake news detection, outputting a probability of truth.
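A minimal sketch of the described pipeline, assuming NLTK and scikit-learn; the two example texts and labels are invented placeholders, and the exploratory data analysis step is omitted.

```python
# Sketch of the described pipeline: tokenize + stem, TF-IDF over word n-grams,
# then logistic regression with a probability-of-truth output.
# The texts and labels below are illustrative placeholders, not the paper's data.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

stemmer = PorterStemmer()

def stem_tokenize(doc):
    # lowercase, keep word tokens, stem each token
    return [stemmer.stem(t) for t in re.findall(r"[a-z']+", doc.lower())]

texts = ["the senate passed the bill today", "aliens secretly run the government"]
labels = [1, 0]  # 1 = real, 0 = fake

vec = TfidfVectorizer(tokenizer=stem_tokenize, ngram_range=(1, 2), token_pattern=None)
X = vec.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba(vec.transform(["breaking: bill passed"]))[:, 1])  # P(real)
```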


Author(s):  
Saud Altaf
Sofia Iqbal
Muhammad Waseem Soomro

This paper focuses on capturing the meaning of Natural Language Understanding (NLU) text features to detect duplicates in an unsupervised fashion. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is utilized to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets, Bosch bug reports and Wikipedia articles. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results demonstrate the Term Frequency-Inverse Document Frequency (TF-IDF) feature results on both datasets with a reasonable vocabulary size. They indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.
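For context, a lexical baseline of the kind compared against NLU features might look like the following TF-IDF cosine-similarity sketch; the sentence pair is an invented example.

```python
# A lexical similarity baseline: TF-IDF vectors compared with cosine similarity,
# the kind of approach NLU features are measured against on STS-style tasks.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pair = ["the app crashes when the camera opens",
        "opening the camera makes the application crash"]

X = TfidfVectorizer().fit_transform(pair)
print("lexical similarity:", cosine_similarity(X[0], X[1])[0, 0])
```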


2020
Author(s):
Amir Bidgoly
Hossein Amirkhani
Fariba Sadeghi

Abstract: Fake news detection is a challenging problem in online social media, with considerable social and political impacts. Several methods have already been proposed for the automatic detection of fake news, often based on the statistical features of the content or context of news items. In this paper, we propose a novel fake news detection method based on a Natural Language Inference (NLI) approach. Instead of using only statistical features of the content or context of the news, the proposed method exploits a human-like approach based on inferring veracity from a set of reliable news. In this method, related and similar news published in reputable news sources are used as auxiliary knowledge to infer the veracity of a given news item. We also collect and publish the first inference-based fake news detection dataset, called FNID, in two formats: the two-class version (FNID-FakeNewsNet) and the six-class version (FNID-LIAR). We use the NLI approach to boost several classical and deep machine learning models including Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, BiGRU, and BiLSTM, along with different word embedding methods including Word2vec, GloVe, fastText, and BERT. The experiments show that the proposed method achieves 85.58% and 41.31% accuracy on the FNID-FakeNewsNet and FNID-LIAR datasets, respectively, which are absolute improvements of 10.44% and 13.19%.
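The following sketch illustrates the paper's general idea rather than its implementation: retrieve the most related item from a reliable-news corpus, then score the claim against it. The tiny corpus is invented, and a trivial lexical-overlap function stands in for a real NLI model.

```python
# Sketch of the idea (not the paper's implementation): retrieve related items
# from a reliable-news corpus, then judge a claim against the best match.
# A real NLI model would score entailment/contradiction; a crude lexical-overlap
# stand-in marks that step here, purely to keep the sketch runnable.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reliable_news = [
    "The central bank raised interest rates by 25 basis points on Tuesday.",
    "The city council approved the new transit budget after a public hearing.",
]  # invented corpus standing in for reputable sources

claim = "The central bank cut interest rates this week."

vec = TfidfVectorizer().fit(reliable_news + [claim])
sims = cosine_similarity(vec.transform([claim]), vec.transform(reliable_news))[0]
evidence = reliable_news[sims.argmax()]  # most related reliable item

def nli_score(premise, hypothesis):
    # Placeholder for an NLI model (e.g. a BERT-style entailment classifier).
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h)

print("evidence:", evidence)
print("support score:", nli_score(evidence, claim))
```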


Reading Comprehension (RC) plays an important role in Natural Language Processing (NLP), as it involves reading and understanding text written in natural language. Reading Comprehension systems comprehend a given document and answer questions in the context of that document. This paper proposes a Reading Comprehension system for Kannada documents. The RC system analyses text in the Kannada script and allows users to pose questions to it in Kannada. The system is aimed at people whose primary language is Kannada, who would otherwise have difficulty parsing through vast Kannada documents for the information they require. This paper discusses the proposed model, built using Term Frequency - Inverse Document Frequency (TF-IDF), and its performance in extracting answers from the context document. The proposed model captures the grammatical structure of Kannada to provide the most accurate answers to the user.
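A minimal sketch of TF-IDF answer-sentence retrieval of the kind such an RC system might use; English placeholder text stands in for the Kannada content, and the sentence splitting is deliberately naive.

```python
# Sketch of TF-IDF answer-sentence retrieval: score each sentence of the
# document against the question and return the best match.
# English placeholder text stands in for Kannada content here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document = ("Kannada is a Dravidian language. It is spoken mainly in Karnataka. "
            "The script used to write it is the Kannada script.")
question = "Where is Kannada mainly spoken?"

sentences = [s.strip() for s in document.split(".") if s.strip()]
vec = TfidfVectorizer().fit(sentences + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(sentences))[0]
print("answer sentence:", sentences[scores.argmax()])
```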


2019
Vol 3 (4)
pp. 53
Author(s):  
Ahmad Hawalah

Text classification is the process of classifying textual content into a set of predefined classes and categories. As enormous numbers of documents and contextual contents are introduced every day on the Internet, it becomes essential to use text classification techniques for purposes such as enhancing search retrieval and recommendation systems. A lot of work has been done to study different aspects of English text classification techniques. However, little attention has been devoted to Arabic text classification due to the difficulty of processing the Arabic language. Consequently, in this paper, we propose an enhanced Arabic topic-discovery architecture (EATA) that uses ontology to provide an effective Arabic topic classification mechanism. We introduce a semantic enhancement model to improve Arabic text classification and the topic discovery technique by utilizing the rich semantic information in Arabic ontology. In this study, we rely on the vector space model (term frequency-inverse document frequency (TF-IDF)) as well as the cosine similarity approach to classify new Arabic textual documents.
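A small sketch of the vector-space step the paper relies on, assigning a new document to the class whose TF-IDF centroid is most cosine-similar; the tiny English documents are placeholders for the Arabic corpus.

```python
# Vector-space classification sketch: represent documents with TF-IDF and
# assign a new document to the class whose centroid is most cosine-similar.
# Tiny English placeholder documents stand in for the Arabic corpus.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["the match ended with a late goal",    # sports
              "the striker scored twice",             # sports
              "parliament passed the finance law",    # politics
              "the minister announced a new policy"]  # politics
train_labels = np.array(["sports", "sports", "politics", "politics"])

vec = TfidfVectorizer()
X = vec.fit_transform(train_docs).toarray()
centroids = {c: X[train_labels == c].mean(axis=0) for c in set(train_labels)}

new_doc = vec.transform(["the coach praised the goalkeeper"]).toarray()
scores = {c: cosine_similarity(new_doc, m.reshape(1, -1))[0, 0]
          for c, m in centroids.items()}
print(max(scores, key=scores.get))  # -> "sports"
```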


2019
Vol 7 (1)
pp. 1831-1840
Author(s):
Bern Jonathan
Jay Idoan Sihotang
Stanley Martin

Introduction: Natural Language Processing is a branch of Artificial Intelligence and Machine Learning concerned with the interactions between computers and human (natural) languages. Sentiment analysis is one part of Natural Language Processing, often used to analyze the patterns in people's writing and classify it as expressing positive, negative, or neutral sentiment. Sentiment analysis is useful for knowing whether users like something or not. Zomato is an application for rating restaurants. Each rating carries a review of the restaurant, which can be used for sentiment analysis. Based on this, the authors predict the sentiment of the reviews. Method: The preprocessing of each review lowercases all words, tokenizes, removes numbers and punctuation, removes stop words, and lemmatizes. After that, we convert words to vectors with term frequency-inverse document frequency (TF-IDF). The data we process consist of 150,000 reviews. Reviews with a rating above 3 are labelled positive, those below 3 negative, and those with a rating of exactly 3 neutral. The authors use a split test: 80% training data and 20% testing data. The metrics used to evaluate the random forest classifier are precision, recall, and accuracy. The accuracy of this research is 92%. Result: The precision of positive, negative, and neutral sentiment is 92%, 93%, and 96%, respectively. The recall of positive, negative, and neutral sentiment is 99%, 89%, and 73%. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: “bad”, “good”, “average”, “best”, “place”, “love”, “order”, “food”, “try”, and “nice”.
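A minimal sketch of the described preprocessing and classification steps, assuming the NLTK stop word and WordNet resources are installed; the two reviews and labels are invented placeholders for the 150,000-review dataset.

```python
# Sketch of the described pipeline: lowercase, strip digits/punctuation, remove
# stop words, lemmatize, vectorize with TF-IDF, and train a random forest.
# Assumes NLTK's "stopwords" and "wordnet" data have been downloaded.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

lemmatizer = WordNetLemmatizer()
stops = set(stopwords.words("english"))

def preprocess(review):
    review = re.sub(r"[^a-z\s]", " ", review.lower())  # lowercase, drop digits/punct
    tokens = [lemmatizer.lemmatize(t) for t in review.split() if t not in stops]
    return " ".join(tokens)

reviews = ["The food was great, loved the place!", "Terrible service and cold food."]
labels = ["positive", "negative"]  # rating > 3 / rating < 3

X = TfidfVectorizer().fit_transform(preprocess(r) for r in reviews)
clf = RandomForestClassifier(random_state=42).fit(X, labels)
```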


Author(s):  
Maria del Mar Ramirez-Alvarado

This chapter analyses the concept of post-truth, the circumstance in which objective facts are less influential in shaping public opinion than emotional appeals and personal beliefs, and the projection of this phenomenon onto social media, where various studies have demonstrated that some fake news stories generate more engagement from users than vetted reporting from reliable news sources. The chapter starts from a general introduction and an associated theoretical reflection, then focuses on the case of Venezuela and its recent historical circumstances in order to examine how fake news circulates in the country, stimulated by a context of widespread disinformation.


2020
pp. 016555152097744
Author(s):
Yongcong Luo
Jing Ma
Chai Kiat Yeo

Online social media (OSM) has become a hotbed for the rapid dissemination of disinformation and fake news. In order to recognise fake news and guide OSM users, we focus on the stance recognition of comments posted on OSM about fake news-related items. In this article, we propose a framework for the recognition of rumour stances (we set four categories: ‘agree’, ‘disagree’, ‘neutral’ and ‘query’), combining network topology and comment semantic enhancement (CSE). We first construct a vector matrix of comments via a novel optimised term frequency-inverse document frequency (OTI). To better recognise stances, we employ another vector matrix with novel attributes, comprising the network topology of the OSM users derived from the random walk with restart (RWR) method. In addition, we set a weight parameter for each word in the comments to enhance the comment semantic representation, where these parameters are tuned based on sentiment score, topology features and question-format words. These vector matrices are optimised and combined into an integrated matrix whose transpose is fed into a neural network (NN) for final rumour stance recognition. Experimental evaluations show that our approach achieves a high precision of 93.96% and F1-score of 92.02%, which are superior to baselines and other existing methods.
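The sketch below illustrates two of the building blocks the article combines, a TF-IDF matrix over comments and random-walk-with-restart scores over a small user graph, concatenated and fed to a small neural network. The graph, comments, and labels are invented, and the paper's OTI weighting and tuned word weights are omitted.

```python
# Two building blocks, sketched: TF-IDF features over comments plus RWR scores
# over the user graph, concatenated into one feature matrix for a small NN.
# The tiny 4-user graph, comments, and stance labels are invented examples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

comments = ["I agree, this is true", "totally false, do not share",
            "is there any source for this?", "no opinion either way"]
stances = ["agree", "disagree", "query", "neutral"]

# Row-normalized adjacency of a tiny 4-user graph (who interacts with whom).
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
W = A / A.sum(axis=1, keepdims=True)

def rwr(W, seed, restart=0.15, iters=50):
    # Iterate r <- (1 - c) * W^T r + c * e toward the RWR stationary vector.
    e = np.zeros(len(W)); e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - restart) * W.T @ r + restart * e
    return r

topology = np.vstack([rwr(W, u) for u in range(len(W))])  # one RWR vector per user
text = TfidfVectorizer().fit_transform(comments).toarray()
features = np.hstack([text, topology])                    # integrated matrix

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(features, stances)
```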


2019
Author(s):  
Matthew J. Lavin

This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
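As a concrete companion to the lesson, here is a hand-rolled tf-idf computation using one common variant (raw term frequency times inverse document frequency); the three toy documents are invented.

```python
# A hand-rolled tf-idf calculation: a term's frequency within a document,
# scaled by how rare the term is across the corpus.
import math

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "meowed"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)          # term frequency within the document
    df = sum(term in d for d in corpus)      # number of documents containing term
    idf = math.log(len(corpus) / df)         # idf = ln(N / df)
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # distinctive term -> positive score
print(tf_idf("the", docs[0], docs))  # appears everywhere -> idf = 0
```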

