An Improved Naive Bayesian Classification Algorithm for Sentiment Classification of Microblogs

2014 ◽  
Vol 543-547 ◽  
pp. 3614-3620
Author(s):  
Zhi Qiang Li ◽  
De Quan Yang ◽  
Yuan Tan ◽  
Yuan Ping Zou

For attribute-weighted naive Bayesian classification algorithms, the choice of weights directly affects the classification results. On this basis, the paper analyzes the drawbacks of TFIDF-based feature selection for sentiment classification of microblogs and proposes an improved algorithm named TF-D(t)-CHI, which uses statistical calculation to obtain the degree of correlation between feature words and classes. It represents the distribution of each feature item across classes by its variance. This addresses a problem with short texts: they contain few feature words, so high-frequency feature words receive excessively high weights. Experimental results indicate that naive Bayesian classification using TF-D(t)-CHI for feature selection and weight calculation achieves better classification results in sentiment classification of microblogs.
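The abstract does not give the TF-D(t)-CHI formula, so the following is only a minimal sketch of the general idea under stated assumptions. It is a multinomial naive Bayes classifier in which each term's log-likelihood is multiplied by its term frequency in the document and by a per-term weight. The weight is a hypothetical stand-in for TF-D(t)-CHI: the standard 2x2 chi-square statistic between term and class (maximum over classes), multiplied by D(t), which is taken here as the variance of the term's per-class document ratio. Both that definition of D(t) and the product form are assumptions, not the paper's method. The toy documents are made up.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """2x2 chi-square statistic: a = docs in class with term, b = docs outside class
    with term, c = docs in class without term, d = docs outside class without term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def train(docs, labels):
    """docs: list of token lists; labels: one class label per document."""
    classes = sorted(set(labels))
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    n_c = Counter(labels)
    prior = {c: n_c[c] / n_docs for c in classes}
    tf_c = {c: Counter() for c in classes}  # term counts per class
    df_c = {c: Counter() for c in classes}  # document frequency per class
    for doc, c in zip(docs, labels):
        tf_c[c].update(doc)
        df_c[c].update(set(doc))
    # Laplace-smoothed multinomial estimates P(t | c)
    cond = {}
    for c in classes:
        total = sum(tf_c[c].values())
        cond[c] = {t: (tf_c[c][t] + 1) / (total + len(vocab)) for t in vocab}
    weights = {}
    for t in vocab:
        chis = []
        for c in classes:
            a = df_c[c][t]
            b = sum(df_c[k][t] for k in classes if k != c)
            chis.append(chi_square(a, b, n_c[c] - a, n_docs - n_c[c] - b))
        # Hypothetical D(t): variance of the term's per-class document ratio
        ratios = [df_c[c][t] / n_c[c] for c in classes]
        mean = sum(ratios) / len(ratios)
        d_t = sum((r - mean) ** 2 for r in ratios) / len(ratios)
        weights[t] = d_t * max(chis)  # assumed combination, not the paper's formula
    return classes, prior, cond, weights


def predict(model, doc):
    classes, prior, cond, weights = model
    tf = Counter(doc)
    scores = {}
    for c in classes:
        score = math.log(prior[c])
        for t, n in tf.items():
            if t in weights:  # out-of-vocabulary terms are ignored
                score += n * weights[t] * math.log(cond[c][t])
        scores[c] = score
    return max(scores, key=scores.get)


docs = [["good", "great"], ["great", "happy"], ["bad", "awful"], ["awful", "sad"]]
labels = ["pos", "pos", "neg", "neg"]
model = train(docs, labels)
print(predict(model, ["great"]))  # → pos
```

A term that appears evenly in every class gets D(t) = 0 and therefore contributes nothing to the score, which is one plausible way a variance term could damp frequent but non-discriminative words.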

2016 ◽  
Vol 25 (03) ◽  
pp. 1650012 ◽  
Author(s):  
Hongmei Chen ◽  
Weiyi Liu ◽  
Lizhen Wang

The potential applications and challenges of uncertain data mining have recently attracted interest from researchers. Most uncertain data mining algorithms consider the aleatory (random) uncertainty of data, i.e., they require exact probability distributions or confidence values to be attached to uncertain data. However, in the case of epistemic (incomplete) uncertainty, knowledge about the uncertainty may itself be incomplete, i.e., probabilities of uncertain data may be imprecise, coarse, or missing in some applications. The paper focuses on uncertain data with missing probabilities, specifically value-uncertain discrete objects whose probabilities are missing (uncertain objects, for short). Classification, meanwhile, is one of the most important tasks in data mining, yet, to the best of our knowledge, there is no method for learning a Naïve Bayesian classifier from uncertain objects. The paper therefore studies Naïve Bayesian classification of uncertain objects. First, the paper defines interval probabilities of uncertain objects from the point of view of probabilistic cardinality, and bridges the gap between uncertain objects and the theory of interval probability by proving that these interval probabilities are F-probabilities. Second, based on the theory of interval probability, the paper defines conditional interval probabilities, including the intuitive concept and the canonical concept, as well as conditional independence under the intuitive concept. It further gives a formula for computing the intuitive concept effectively. Third, the paper presents a Naïve Bayesian classifier with interval probability parameters that can handle both uncertain and certain objects. Finally, experiments on uncertain objects based on UCI data show satisfactory performance.
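To illustrate what interval probabilities for uncertain objects can look like, the sketch below assumes each object's attribute value is a set of possible values with no probabilities attached. The lower probability of a value v is taken as the fraction of objects whose set is exactly {v}, and the upper probability as the fraction whose set contains v. This is the familiar belief/plausibility construction from random-set theory and may differ from the paper's definition based on probabilistic cardinality. The helper `is_reachable` (a hypothetical name) checks the reachability conditions for probability intervals on single values. This is a simpler, value-level check, not the paper's F-probability proof.

```python
def interval_probabilities(objects):
    """objects: list of sets, each holding the possible values of one uncertain
    discrete attribute (no probabilities attached). Returns (lower, upper) dicts."""
    n = len(objects)
    values = set().union(*objects)
    # Lower bound: objects that certainly take value v (their set is exactly {v})
    lower = {v: sum(1 for s in objects if s == {v}) / n for v in values}
    # Upper bound: objects that possibly take value v (v is in their set)
    upper = {v: sum(1 for s in objects if v in s) / n for v in values}
    return lower, upper


def is_reachable(lower, upper, tol=1e-12):
    """Reachability conditions for probability intervals on single values."""
    total_l = sum(lower.values())
    total_u = sum(upper.values())
    if total_l > 1 + tol or total_u < 1 - tol:
        return False
    for v in lower:
        # Each bound must be attainable by some distribution inside the intervals
        if lower[v] + (total_u - upper[v]) < 1 - tol:
            return False
        if upper[v] + (total_l - lower[v]) > 1 + tol:
            return False
    return True


# Made-up uncertain objects for illustration
colours = [{"red"}, {"red", "blue"}, {"blue"}, {"green"}, {"blue", "green"}]
lower, upper = interval_probabilities(colours)
print(lower["blue"], upper["blue"])  # → 0.2 0.6
print(is_reachable(lower, upper))  # → True
```

Under this construction, certain objects (singleton sets) give lower equal to upper, so ordinary data is recovered as a special case. Class-conditional intervals for a naive Bayesian classifier could be estimated the same way from the objects of each class. However, the paper's intuitive and canonical conditional concepts may not coincide with that simple per-class estimate.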


2004 ◽  
Vol 57 (3) ◽  
pp. 233-269 ◽  
Author(s):  
Peter A. Flach ◽  
Nicolas Lachiche

2014 ◽  
Vol 1070-1072 ◽  
pp. 2066-2072
Author(s):  
Jing Zhang ◽  
Jia Jia Bi ◽  
Ning Sun ◽  
Xue Gang Hu

Multi-relational classification has become a research and application hotspot in data mining. Compared with a single table with a simple structure, multiple relational tables are more complicated, and not all of the information they contain benefits classification: adding irrelevant relations may decrease the classification accuracy of the algorithm. In this article, we optimize the multi-relational tables by evaluating the usefulness of the background relations and removing those relations that have little effect on classification. The results show that this method is effective.
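The abstract does not say how the usefulness of a background relation is measured. As a hypothetical illustration only, the sketch below propagates a value from each background relation to the target tuples: the most frequent joined value, or None when a tuple has no join partner. It then scores the relation by the information gain of the class label given that propagated value, and drops relations whose gain falls below a threshold. The relation names, the aggregation rule, and the threshold of 0.1 are all assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def usefulness(target, relation):
    """target: {tuple_id: class_label}; relation: list of (tuple_id, value) join rows.
    Hypothetical score: information gain of the class given a propagated value."""
    joined = {}
    for tid, value in relation:
        joined.setdefault(tid, []).append(value)
    # Aggregate one-to-many joins to the most frequent joined value per target tuple
    feature = {tid: Counter(vals).most_common(1)[0][0] for tid, vals in joined.items()}
    groups = {}
    for tid, label in target.items():
        groups.setdefault(feature.get(tid), []).append(label)  # None = no join partner
    n = len(target)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(list(target.values())) - conditional


def prune_relations(target, relations, threshold=0.1):
    """Keep only background relations whose usefulness reaches the threshold."""
    return {name: rows for name, rows in relations.items()
            if usefulness(target, rows) >= threshold}


# Made-up target table and background relations
target = {1: "yes", 2: "yes", 3: "no", 4: "no"}
relations = {
    "loan": [(1, "A"), (2, "A"), (3, "B"), (4, "B")],
    "noise": [(1, "x"), (2, "y"), (3, "x"), (4, "y")],
}
print(sorted(prune_relations(target, relations)))  # → ['loan']
```

Here "loan" perfectly separates the classes (gain 1 bit) and is kept, while "noise" carries no class information (gain 0) and is removed before the classifier is trained.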


2011 ◽  
Vol 36 (4) ◽  
pp. 51-66 ◽  
Author(s):  
Hemanta Saikia ◽  
Dibyojyoti Bhattacharjee

An all-rounder can play a vital role in any format of cricket, whether a Test match or a limited-overs game. The study classifies the performance of all-rounders who participated in the IPL based on their strike rate and economy rate. Based on these factors, the all-rounders can be divided into four non-overlapping classes, viz., Performer, Batting All-rounder, Bowling All-rounder, and Under-performer. Several predictor variables that are supposed to influence the performance of all-rounders are considered. Step-wise multinomial logistic regression (SMLR) is used to identify the significant predictors. A sample of six incumbent all-rounders who had not participated in the first three seasons of the IPL is considered. The significant predictors were then used to predict the expected class of each incumbent all-rounder using a naive Bayesian classification model. The relevant data were collected from the websites www.cricinfo.org and www.cricketnirvana.com. The key points of this study are as follows:

The training sample consists of 35 all-rounders who had performed in the first three seasons of the IPL.

Two variables, viz., strike rate (number of runs scored per 100 balls faced) and economy rate (average number of runs conceded per over bowled), are used to classify the all-rounders as follows:
- Performer: strike rate above the median and economy rate below the median.
- Batting All-rounder: strike rate above the median and economy rate above the median.
- Bowling All-rounder: strike rate below the median and economy rate below the median.
- Under-performer: strike rate below the median and economy rate above the median.

SMLR was used to identify the significant variables that are actually responsible for the classification of the all-rounders.
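The four-class labelling rule above can be written as a small function. This sketch assumes strike rate and economy rate are available per player as a (strike_rate, economy_rate) pair. The abstract does not say how values exactly equal to a median are handled. Here, a strike rate equal to the median counts as not above it, and an economy rate equal to the median counts as not below it. The player names and numbers are made up for illustration.

```python
from statistics import median


def classify_all_rounders(players):
    """players: {name: (strike_rate, economy_rate)} -> {name: class label}."""
    sr_median = median(sr for sr, _ in players.values())
    er_median = median(er for _, er in players.values())
    classes = {}
    for name, (sr, er) in players.items():
        high_sr = sr > sr_median  # assumption: equal to median counts as "not above"
        low_er = er < er_median   # assumption: equal to median counts as "not below"
        if high_sr and low_er:
            classes[name] = "Performer"
        elif high_sr:
            classes[name] = "Batting All-rounder"
        elif low_er:
            classes[name] = "Bowling All-rounder"
        else:
            classes[name] = "Under-performer"
    return classes


# Made-up (strike rate, economy rate) values for illustration only
players = {"P1": (150.0, 6.5), "P2": (140.0, 9.0), "P3": (110.0, 6.0), "P4": (100.0, 9.5)}
print(classify_all_rounders(players))
```

With these values, the strike-rate median is 125 and the economy-rate median is 7.75, so each hypothetical player lands in a different class.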
The ODI strike rate, Twenty-20 strike rate, ODI economy rate, Twenty-20 economy rate, and bowling type (spin or fast) of the all-rounders are found to be significant in determining the class of an all-rounder. Based on these significant predictors, the naive Bayesian classification model is used to forecast the expected class of six incumbent all-rounders who played only in the fourth season of the IPL. The predictions made before IPL IV were then compared with the actual outcomes at the end of the tournament: four of the six predictions were correct. This model could help participating teams' management decide how much to bid for an all-rounder in upcoming IPL seasons, according to their requirements.
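The abstract does not specify how the naive Bayesian model treats continuous predictors. The sketch below makes the following assumptions:

- Gaussian class-conditional densities for the four rate variables.
- Laplace-smoothed frequencies for bowling type.

The feature layout, the variance floor, and the toy training rows are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def fit(rows, labels):
    """rows: list of (numeric_features, bowling_type); labels: class per row."""
    categories = {cat for _, cat in rows}
    model = {}
    for c in set(labels):
        subset = [row for row, label in zip(rows, labels) if label == c]
        stats = []
        for column in zip(*(nums for nums, _ in subset)):
            mu = sum(column) / len(column)
            # Small floor keeps the variance positive for constant columns
            var = sum((x - mu) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mu, var))
        counts = Counter(cat for _, cat in subset)
        cat_prob = {k: (counts[k] + 1) / (len(subset) + len(categories)) for k in categories}
        model[c] = (len(subset) / len(rows), stats, cat_prob)
    return model


def predict(model, row):
    nums, cat = row
    best, best_score = None, -math.inf
    for c, (prior, stats, cat_prob) in model.items():
        score = math.log(prior)
        for x, (mu, var) in zip(nums, stats):
            # Log of the Gaussian density N(x; mu, var)
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        score += math.log(cat_prob.get(cat, 1e-9))  # unseen bowling type
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical rows: (ODI SR, T20 SR, ODI economy, T20 economy), bowling type
rows = [((95.0, 150.0, 4.5, 6.8), "Spin"), ((92.0, 145.0, 4.7, 7.0), "Fast"),
        ((70.0, 110.0, 5.8, 8.5), "Fast"), ((72.0, 115.0, 6.0, 8.8), "Fast")]
labels = ["Performer", "Performer", "Under-performer", "Under-performer"]
model = fit(rows, labels)
print(predict(model, ((94.0, 148.0, 4.6, 6.9), "Spin")))  # → Performer
```

Working in log space avoids numerical underflow when several small density values are multiplied. A real application would train on all four classes rather than the two shown here.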

