Reputation Scoring Fake News Using Text Mining

2017 ◽  
Vol 4 (1) ◽  
pp. 12-17
Author(s):  
Ahmad Firdaus

The classification of hoax news, or news containing incorrect information, is one application of text categorization. Like machine-based text categorization in general, this system consists of preprocessing and the execution of classification models. In this study, experiments were conducted to select the best technique in each sub-process using 1200 hoax articles and 600 non-hoax articles collected manually. The research experimented to determine the best preprocessing stages between stop-word removal and stemming, and found that the Decision Tree algorithm achieved an accuracy of 100%, while Naive Bayes showed a more stable level of accuracy across the numbers of datasets used for all candidates. With Information Gain, TF-IDF, and GGA applied using the Naive Bayes, Support Vector Machine, and Decision Tree algorithms, no significant percentage change occurred for any candidate. After using GGA (Optimize Generation) feature selection, however, the accuracy level increased. In the comparison of the Naive Bayes, Decision Tree, and Support Vector Machine classification algorithms combined with feature selection, the best result was produced by GGA + Decision Tree on candidate 2 (Paslon2) at 100%, and the lowest accuracy by Information Gain + Decision Tree on candidate 3 at 36.67%. Overall, accuracy improved for all algorithms after feature selection, and Naive Bayes showed a more stable level of accuracy across the numbers of datasets used for all candidates.
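The preprocessing and TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the stop-word list, the whitespace tokenizer, and the helper names `preprocess` and `tf_idf` are assumptions made for illustration, stemming is omitted, and the three documents are invented toy data.

```python
import math
from collections import Counter

# Toy stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Invented example documents, not taken from the study's dataset.
docs = [
    "the president is a robot",
    "the election is in april",
    "a robot won the election",
]
weights = tf_idf(docs)
```

In this toy corpus "president" appears in one document and "robot" in two, so "president" receives the larger weight in the first document. Such weights would then feed a classifier like Naive Bayes, a Decision Tree, or an SVM.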

2014 ◽  
Vol 2014 ◽  
pp. 1-17 ◽  
Author(s):  
Jieming Yang ◽  
Zhaoyang Qu ◽  
Zhiying Liu

Filter-based feature selection is an important approach to dimensionality reduction in text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category on the assumption of a balanced dataset and do not consider the imbalance factor of the dataset. In this paper, a new scheme is proposed that weakens the adverse effect caused by the imbalance factor in the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.
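As a point of reference for the filter methods listed above, the following sketch computes standard Information Gain for a term on a small imbalanced corpus. It shows only the unmodified textbook measure, not the improved scheme proposed in the paper, whose details the abstract does not give. The function names and the five toy documents are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(term, docs, labels):
    """IG(term) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    with_term = [y for d, y in zip(docs, labels) if term in d.lower().split()]
    without_term = [y for d, y in zip(docs, labels) if term not in d.lower().split()]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


# Invented, deliberately imbalanced toy corpus: one "spam" vs four "ham".
docs = [
    "cheap pills offer",
    "meeting agenda notes",
    "project meeting today",
    "lunch meeting notes",
    "budget notes today",
]
labels = ["spam", "ham", "ham", "ham", "ham"]

ig_cheap = information_gain("cheap", docs, labels)
ig_meeting = information_gain("meeting", docs, labels)
```

Here "cheap" separates the minority class perfectly, so its gain equals the full class entropy, while "meeting" scores lower. Note that the class entropy itself shrinks as the corpus becomes more skewed, which is one way imbalance can affect scores of this kind.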


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional character of text data and explains why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training, and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. The text feature vector space can be built by different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the best-performing set of methods is adopted.
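The three subsystems named in this abstract (text representation, classifier training, text classification) can be sketched end to end as follows. This is an illustrative toy, not the system described in the paper: it uses raw term counts as the representation and a linear SVM without a bias term, trained by plain sub-gradient descent on the regularized hinge loss. All function names, hyperparameters, and training documents are assumptions.

```python
from collections import Counter


# --- Subsystem 1: text representation (term-count vectors) ---
def build_vocabulary(docs):
    """Map each distinct lowercase token to a vector index."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}


def to_vector(doc, vocab):
    """Represent a document as term counts over the known vocabulary."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(doc.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


# --- Subsystem 2: classifier training (linear SVM, hinge loss) ---
def train_linear_svm(vectors, labels, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on lam/2 * |w|^2 + hinge loss; labels are +1/-1."""
    w = [0.0] * len(vectors[0])
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:
                # Margin violated: regularization step plus hinge step.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
            else:
                # Margin satisfied: regularization step only.
                w = [wi - lr * lam * wi for wi in w]
    return w


# --- Subsystem 3: text classification ---
def classify(doc, vocab, w):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, to_vector(doc, vocab)))
    return 1 if score >= 0 else -1


# Invented training data: +1 = sports, -1 = technology.
train_docs = ["goal match team", "team win match", "cpu chip code", "code chip bug"]
train_labels = [1, 1, -1, -1]

vocab = build_vocabulary(train_docs)
w = train_linear_svm([to_vector(d, vocab) for d in train_docs], train_labels)
```

After training, `classify("team goal", vocab, w)` falls on the sports side and `classify("chip bug", vocab, w)` on the technology side. Swapping `to_vector` for another representation, without touching the trainer, mirrors the abstract's point that representation choices can be tested independently of classifier training.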


2014 ◽  
Vol 618 ◽  
pp. 573-577 ◽  
Author(s):  
Yu Qiang Qin ◽  
Yu Dong Qi ◽  
Hui Ying

The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit rating for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines (SVM) against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.

