A Novel Hybrid Approach: Instance Weighted Hidden Naive Bayes

Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2982
Author(s):  
Liangjun Yu ◽  
Shengfeng Gan ◽  
Yu Chen ◽  
Dechun Luo

Naive Bayes (NB) is easy to construct but surprisingly effective, and it is one of the top ten classification algorithms in data mining. The conditional independence assumption of NB ignores the dependencies between attributes, so its probability estimates are often suboptimal. Hidden naive Bayes (HNB) adds a hidden parent to each attribute, which can reflect dependencies from all the other attributes; compared with other Bayesian network algorithms, it offers significant improvements in classification performance while avoiding structure learning. However, HNB assumes that every instance contributes equally to probability estimation, which is not always true in real-world applications. To reflect the different influences of different instances, the HNB model is modified into an improved HNB model. This paper proposes a novel hybrid approach called instance-weighted hidden naive Bayes (IWHNB), which combines instance weighting with the improved HNB model in one uniform framework: instance weights are incorporated into the improved HNB model when calculating probability estimates. Extensive experimental results show that IWHNB obtains significant improvements in classification performance compared with NB, HNB, and other state-of-the-art competitors, while maintaining the low time complexity that characterizes HNB.
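
As a hedged illustration of the instance-weighting ingredient, the sketch below shows weighted, Laplace-smoothed frequency estimates for a naive Bayes-style model in Python. It is a minimal sketch of the general technique, not the authors' exact IWHNB formulation, which embeds the weights in the improved HNB model.

```python
import numpy as np

def weighted_nb_estimates(X, y, w, n_values, n_classes, alpha=1.0):
    """Laplace-smoothed P(c) and P(a_i = v | c) using per-instance weights w.

    X: (n, d) integer attribute values; y: (n,) class labels;
    n_values[i]: number of distinct values of attribute i.
    """
    d = X.shape[1]
    prior = np.full(n_classes, alpha)
    cond = [np.full((n_classes, n_values[i]), alpha) for i in range(d)]
    for x, c, wt in zip(X, y, w):
        prior[c] += wt
        for i, v in enumerate(x):
            cond[i][c, v] += wt      # the weight replaces the usual +1 count
    prior /= prior.sum()
    cond = [t / t.sum(axis=1, keepdims=True) for t in cond]
    return prior, cond
```

With all weights equal to 1 this reduces to ordinary smoothed frequency estimation, which makes the role of the weights easy to see.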

Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2378
Author(s):  
Shengfeng Gan ◽  
Shiqi Shao ◽  
Long Chen ◽  
Liangjun Yu ◽  
Liangxiao Jiang

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its conditional independence assumption over features is often violated, which reduces its classification performance. Among the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers; to the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators and is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. To train HMNB, we propose a simple but effective learning algorithm that avoids a high-complexity structure-learning process. The same idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA); the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
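
For context, the following is a compact log-space multinomial naive Bayes scorer, the baseline that HMNB extends with a hidden parent per feature. It is a minimal sketch of standard MNB, not the HMNB learning algorithm itself.

```python
import numpy as np

def train_mnb(X, y, n_classes, alpha=1.0):
    """X: (n_docs, vocab_size) term-count matrix; returns log P(c), log P(w|c)."""
    log_prior = np.log((np.bincount(y, minlength=n_classes) + 1)
                       / (len(y) + n_classes))
    counts = np.vstack([X[y == c].sum(axis=0) for c in range(n_classes)])
    smoothed = counts + alpha                       # Laplace smoothing
    log_cond = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return log_prior, log_cond

def predict_mnb(X, log_prior, log_cond):
    # log P(c) + sum over words of count(w) * log P(w|c), argmax over classes
    return np.argmax(X @ log_cond.T + log_prior, axis=1)
```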


2019 ◽  
Vol 8 (4) ◽  
pp. 10385-10389

Chronic kidney disease (CKD) is a condition in which kidney function worsens over time depending on contributing factors. If it continues to worsen, dialysis is required, and in the worst case it leads to kidney failure (end-stage renal disease). Detecting CKD at an early stage helps limit complications and damage. In previous work, the classifiers used were SVM and Naïve Bayes; the results showed that Naïve Bayes required less execution time than SVM, while SVM misclassified fewer instances, giving Naïve Bayes slightly lower classification accuracy. This can be mitigated by using a smaller number of attributes. Naïve Bayes is a probabilistic classifier with simple computation, applying Bayes' theorem under a conditional independence assumption. The main aim of this work is to increase diagnostic accuracy and decrease diagnosis time. An attempt is made to develop a model that evaluates CKD data gathered from a particular set of people, from which identification can be made. This work focuses on developing a system based on three classification methods: SVM, Naïve Bayes, and KNN.
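
A minimal sketch of the kind of classifier comparison described above, using scikit-learn. The file path, label column name, and default hyperparameters are placeholders, not the authors' actual pipeline.

```python
import time
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

df = pd.read_csv("ckd.csv")                    # placeholder path to a CKD dataset
X, y = df.drop(columns="class"), df["class"]   # placeholder label column

# compare accuracy and wall-clock time across the three classifiers
for name, clf in [("SVM", SVC()), ("Naive Bayes", GaussianNB()),
                  ("KNN", KNeighborsClassifier())]:
    start = time.time()
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: accuracy = {acc:.3f}, time = {time.time() - start:.2f}s")
```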


2012 ◽  
Vol 21 (01) ◽  
pp. 1250007 ◽  
Author(s):  
LIANGXIAO JIANG ◽  
DIANHONG WANG ◽  
ZHIHUA CAI

Many approaches have been proposed to improve naive Bayes by weakening its conditional independence assumption. In this paper, we pursue instance weighting and propose an improved naive Bayes algorithm based on discriminative instance weighting, which we call Discriminatively Weighted Naive Bayes. In each iteration, training instances are discriminatively assigned different weights according to their estimated conditional probability loss. Experimental results on a large number of UCI datasets validate its effectiveness in terms of classification accuracy and AUC. In addition, running-time results show that Discriminatively Weighted Naive Bayes is almost as efficient as the state-of-the-art Discriminative Frequency Estimate learning method and significantly more efficient than Boosted Naive Bayes. Finally, we apply the idea of discriminatively weighted learning to some state-of-the-art naive Bayes text classifiers, such as multinomial naive Bayes, complement naive Bayes, and the one-versus-all-but-one model, and achieve remarkable improvements.
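
The sketch below illustrates one plausible reading of such a weighting loop, reusing the weighted_nb_estimates helper from the IWHNB sketch above. The update rule shown (weight grows with the conditional probability loss 1 - P(c_i | x_i)) is an assumption; the paper's exact schedule may differ.

```python
import numpy as np

def dwnb_train(X, y, n_values, n_classes, iters=10):
    """Iteratively reweight instances by their conditional probability loss."""
    w = np.ones(len(y))
    for _ in range(iters):
        prior, cond = weighted_nb_estimates(X, y, w, n_values, n_classes)
        for i, (x, c) in enumerate(zip(X, y)):
            post = prior.copy()
            for j, v in enumerate(x):        # NB posterior of each class
                post = post * cond[j][:, v]
            post /= post.sum()
            w[i] += 1.0 - post[c]            # assumed rule: raise weight with loss
    return prior, cond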


Author(s):  
HARRY ZHANG

Naïve Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive classification performance is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naïve Bayes in classification? In this paper, we propose a novel explanation: essentially, the dependence distribution plays a crucial role. Here, dependence distribution means how the local dependence of an attribute distributes in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naïve Bayes can still be optimal if the dependences distribute evenly in the classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of Naïve Bayes. Further, we investigate the optimality of Naïve Bayes under the Gaussian distribution, presenting and proving a sufficient condition for its optimality in which dependences among attributes exist; this provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence-based (CI-based), which is essentially dependence-based. We design a dependence distribution-based algorithm by extending the Chow–Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow–Liu algorithm, which also provides empirical evidence supporting our new explanation.
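
For readers unfamiliar with the setting, the two-class ratio form of the problem can be written as follows (standard notation, not copied from the paper):

```latex
\[
  f_b(E) = \frac{P(c = + \mid E)}{P(c = - \mid E)}, \qquad
  f_{nb}(E) = \frac{P(c = +)}{P(c = -)}
              \prod_{i=1}^{n} \frac{P(a_i \mid c = +)}{P(a_i \mid c = -)} .
\]
```

Naïve Bayes classifies an example $E$ optimally exactly when $f_{nb}(E) \ge 1 \iff f_b(E) \ge 1$; the paper's point is that attribute dependences can be arbitrarily strong yet preserve this sign agreement when their effects distribute evenly across the classes or cancel each other out.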


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches that improve NB by weakening its attribute independence assumption. Theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variants lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption further by generalizing tree-augmented naive Bayes (TAN) from a 1-dependence Bayesian network classifier (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent a specific conditional dependence relationship, may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
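
A small sketch of the edge-scoring quantity involved: the empirical conditional mutual information I(A_i; A_j | C) that ranks candidate augmenting edges in TAN-style learners. This is the standard estimate from discrete data; the authors' full k-dependence construction is more involved.

```python
import numpy as np
from collections import Counter

def cond_mutual_info(xi, xj, y):
    """Empirical I(A_i; A_j | C) from three parallel sequences of discrete values."""
    n = len(y)
    n_abc = Counter(zip(xi, xj, y))
    n_ac, n_bc, n_c = Counter(zip(xi, y)), Counter(zip(xj, y)), Counter(y)
    cmi = 0.0
    for (a, b, c), count in n_abc.items():
        # sum over observed triples of p(a,b,c) * log[ p(a,b|c) / (p(a|c) p(b|c)) ]
        cmi += (count / n) * np.log(count * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)]))
    return cmi
```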


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Qingchao Liu ◽  
Jian Lu ◽  
Shuyan Chen ◽  
Kangjia Zhao

This study presents the applicability of a Naïve Bayes classifier ensemble to traffic incident detection. Standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of a practically implemented NB depends on the choice of an optimal threshold, which is determined mathematically using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes with a decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments performed to evaluate three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that, under five combination rules, the NB classifier ensemble performs significantly better than standard NB and slightly better than NBTree on some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
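
A hedged sketch of an NB ensemble of this flavor: bagged Gaussian NB members combined by two of the usual rules, average of posteriors and majority vote. The synthetic data, member count, and rules shown are illustrative stand-ins, not the paper's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# stand-in binary data; the study uses real traffic-flow features
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ens = BaggingClassifier(GaussianNB(), n_estimators=25,
                        random_state=0).fit(X_tr, y_tr)

probas = np.stack([m.predict_proba(X_te) for m in ens.estimators_])
avg_rule = probas.mean(axis=0).argmax(axis=1)            # average-of-posteriors rule
maj_rule = (np.stack([m.predict(X_te) for m in ens.estimators_])
            .mean(axis=0) > 0.5).astype(int)             # majority vote (binary labels)
print((avg_rule == y_te).mean(), (maj_rule == y_te).mean())
```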


Author(s):  
Faruk Bulut

In this chapter, local conditional probabilities around a query point are used in classification, rather than consulting a generalized framework containing a single global conditional probability. In the proposed locally adaptive naïve Bayes (LANB) learning style, a set of local instances close to the test point is used to construct an adaptive probability estimate. In empirical studies on 53 benchmark UCI datasets, more accurate classification performance was obtained: a total increase of 8.2% in classification accuracy was gained with LANB compared to the conventional naïve Bayes model. According to statistical paired t-test comparisons, the presented LANB method outperformed with 31 wins, 14 ties, and 8 losses across all UCI sets.
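
A minimal sketch of the locally adaptive idea, assuming a k-nearest-neighbour definition of the local region (the chapter's neighbourhood construction and choice of k may differ):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def lanb_predict(X_train, y_train, x_query, k=50):
    """Fit NB only on the k training instances nearest the query point."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(x_query.reshape(1, -1), return_distance=False)[0]
    local_nb = GaussianNB().fit(X_train[idx], y_train[idx])
    return local_nb.predict(x_query.reshape(1, -1))[0]
```

The trade-off is the usual one for lazy learners: no global model is built, but each prediction pays the cost of a neighbour search and a small local fit.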


2018 ◽  
Vol 7 (4.38) ◽  
pp. 955
Author(s):  
M. Bakri C. Haron ◽  
Siti Z. Z. Abidin ◽  
N. Azmina M. Zamani

Facebook has become a popular platform for communicating information. People can express their opinions using texts, symbols, pictures, and emoticons via Facebook posts and comments. These expressions allow sentiment analysis to be performed by collecting the data to obtain the public’s opinions and emotions toward certain issues. Due to the huge amount of data obtained from Facebook, proper approaches are required to handle the texts and symbols used in the comments. There is also a limited number of dictionaries for Malay texts, which makes it more challenging to process and classify the positive and negative words used in the comments. Thus, a hybrid approach is applied during data processing, and the results are visualized; in this work, a combination of a lexicon-based approach and Naïve Bayes is used. This study focuses on analyzing the public’s sentiments on crime news on Facebook using word cloud visualization, which displays the important words in the comments as a word cloud. Moreover, the percentages of positive and negative words present in the comments are shown as part of the visualization results.
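
A toy sketch of the lexicon side of such a hybrid: counting positive and negative lexicon hits in comments and reporting their percentages. The word lists here are tiny English stand-ins, not the Malay lexicon actually used.

```python
POS = {"good", "safe", "agree"}       # placeholder positive lexicon
NEG = {"crime", "bad", "afraid"}      # placeholder negative lexicon

def lexicon_percentages(comments):
    """Return (% positive hits, % negative hits) across all comment tokens."""
    tokens = [w.lower() for c in comments for w in c.split()]
    pos = sum(w in POS for w in tokens)
    neg = sum(w in NEG for w in tokens)
    total = pos + neg or 1            # avoid division by zero
    return 100 * pos / total, 100 * neg / total

print(lexicon_percentages(["Crime is bad", "I agree it is safe here"]))
```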


Author(s):  
Alaa Khudhair Abbas ◽  
Ali Khalil Salih ◽  
Harith A. Hussein ◽  
Qasim Mohammed Hussein ◽  
Saba Alaa Abdulwahhab

Twitter social media data generally uses ambiguous text, which can make it difficult to identify positive or negative sentiment. More than one billion social media messages need to be stored in a proper database and processed correctly in order to analyze them. In this paper, an ensemble majority-vote classifier is proposed to enhance sentiment classification performance and accuracy. The proposed classification model combines four classifiers (naive Bayes, decision trees, multilayer perceptron, and logistic regression) into a single ensemble classifier. In addition, a comparison is drawn among the four classifiers to evaluate their individual performance. The results show that, among the individual classifiers, naive Bayes is the best. However, when the proposed ensemble majority-vote classifier is compared with the four individual classifiers, the results illustrate that the ensemble performs better than each independent one.
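
A hedged sketch of the described ensemble using scikit-learn's VotingClassifier with the four named members; the hyperparameters and feature representation are assumptions, not the paper's settings.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[("nb", MultinomialNB()),
                ("dt", DecisionTreeClassifier()),
                ("mlp", MLPClassifier(max_iter=500)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",                    # majority vote over predicted labels
)
# usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)
# where X_train/X_test would be e.g. bag-of-words count features from tweets
```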

