Locally-Adaptive Naïve Bayes Framework Design via Density-Based Clustering for Large Scale Datasets

Author(s):  
Faruk Bulut

In this chapter, the local conditional probabilities around a query point are used for classification rather than a single generalized model of conditional probability. In the proposed locally adaptive naïve Bayes (LANB) learning scheme, a certain number of local instances close to the test point construct an adaptive probability estimate. In empirical studies over 53 benchmark UCI datasets, more accurate classification performance was obtained: LANB gained a total of 8.2% in classification accuracy compared to the conventional naïve Bayes model. The presented LANB method also prevailed in statistical paired t-test comparisons, with 31 wins, 14 ties, and 8 losses across the UCI sets.
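
The core idea can be sketched in a few lines: rather than fitting one global model, fit a small naïve Bayes estimator on only the training points nearest the query. This is a minimal illustrative sketch, not the paper's exact procedure; the function name, the choice of Euclidean distance, and the Gaussian likelihoods are all assumptions.

```python
import math
from collections import Counter

def lanb_predict(X, y, query, k=5):
    """Locally adaptive NB sketch: fit a Gaussian naive Bayes on the k
    training points nearest to the query instead of on the whole set.
    Illustrative only; the paper's actual locality construction may differ."""
    # 1. rank training points by Euclidean distance to the query
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    order = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    Xl, yl = [X[i] for i in order], [y[i] for i in order]
    # 2. estimate class priors and per-feature Gaussian likelihoods locally
    classes = Counter(yl)
    def log_post(c):
        lp = math.log(classes[c] / k)  # local class prior
        rows = [x for x, lab in zip(Xl, yl) if lab == c]
        for j in range(len(query)):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # smoothed
            lp += -0.5 * math.log(2 * math.pi * var) \
                  - (query[j] - mu) ** 2 / (2 * var)
        return lp
    # 3. return the class with the highest local posterior
    return max(classes, key=log_post)
```

Because the priors and likelihoods are re-estimated per query, the decision boundary adapts to local density, at the cost of a neighbour search per prediction.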

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. A theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variations lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from 1-dependence Bayesian network classifiers (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent a specific conditional dependence relationship, may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
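
The quantity behind TAN's augmenting edges is the conditional mutual information I(Xi; Xj | C), estimated from attribute counts. The sketch below computes the empirical estimate from three discrete columns; it is a minimal illustration of that score, not the paper's full structure-learning algorithm.

```python
from collections import Counter
from math import log

def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C), the score
    TAN-style learners use to rank candidate augmenting edges.
    xi, xj, c are equal-length lists of discrete values."""
    n = len(c)
    pc = Counter(c)                 # class counts
    pijc = Counter(zip(xi, xj, c))  # joint (xi, xj, class) counts
    pic = Counter(zip(xi, c))       # (xi, class) counts
    pjc = Counter(zip(xj, c))       # (xj, class) counts
    total = 0.0
    for (a, b, k), nabk in pijc.items():
        # p(a,b,k) * log[ p(a,b|k) / (p(a|k) p(b|k)) ], in counts:
        total += (nabk / n) * log((nabk * pc[k]) / (pic[(a, k)] * pjc[(b, k)]))
    return total
```

A perfectly dependent attribute pair scores log 2 per bit of shared information, while conditionally independent attributes score zero, which is what makes the measure a natural edge-selection criterion.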


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Qingchao Liu ◽  
Jian Lu ◽  
Shuyan Chen ◽  
Kangjia Zhao

This study presents the applicability of a Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of a practically implemented NB depends on the choice of the optimal threshold, which is determined mathematically using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes and decision trees (NBTree) to detect incidents. In this paper, we discuss extensive experiments that were performed to evaluate the performance of three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that the five combination rules of the NB classifier ensemble perform significantly better than standard NB and slightly better than NBTree on some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
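
An ensemble's members can be fused under several classical combination rules. The sketch below implements five common ones (majority vote, average, product, max, min) over the members' class-probability vectors; the rule names are the standard fusion rules from the ensemble literature, and the paper's exact five rules may differ.

```python
from collections import Counter
from math import prod

def combine(probas, rule="majority"):
    """Fuse member classifiers' outputs under one of five classical rules.
    probas: list of dicts {class_label: P(class | x)}, one per member.
    Illustrative sketch; tie-breaking and rule details are assumptions."""
    classes = probas[0].keys()
    if rule == "majority":
        # each member votes for its most probable class
        votes = Counter(max(p, key=p.get) for p in probas)
        return votes.most_common(1)[0][0]
    score = {
        "average": lambda c: sum(p[c] for p in probas) / len(probas),
        "product": lambda c: prod(p[c] for p in probas),
        "max":     lambda c: max(p[c] for p in probas),
        "min":     lambda c: min(p[c] for p in probas),
    }[rule]
    return max(classes, key=score)
```

Note that the rules can disagree: one confident dissenter can flip the average or product rule even when the majority rule does not change.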


2012 ◽  
Vol 490-495 ◽  
pp. 460-464 ◽  
Author(s):  
Xiao Dan Zhu ◽  
Jin Song Su ◽  
Qing Feng Wu ◽  
Huai Lin Dong

The Naive Bayes classification algorithm is an effective, simple classification algorithm. Most research on traditional Naive Bayes classification focuses on improving the classification algorithm itself, ignoring the selection of training data, which has a great effect on classifier performance. This paper therefore proposes a method to optimize the selection of training data. With this method, noisy instances in the training data are eliminated according to a user-defined effectiveness threshold, improving classifier performance. Experimental results on large-scale data show that our approach significantly outperforms the baseline classifier.
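
One simple way to realize such threshold-based filtering is to score each training instance by how well its neighbourhood agrees with its label and drop low scorers. This is an illustrative sketch only: the abstract does not define the paper's "effectiveness" measure, so the neighbour-agreement score used here is an assumption.

```python
import math

def filter_training_data(X, y, k=3, threshold=0.5):
    """Noise-filtering sketch: score each instance by the fraction of its
    k nearest neighbours sharing its label, then drop instances whose
    score falls below a user-defined threshold. The paper's actual
    effectiveness measure may differ."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # nearest neighbours of instance i, excluding itself
        order = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: dist(xi, X[j]))[:k]
        agreement = sum(1 for j in order if y[j] == yi) / k
        if agreement >= threshold:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Any classifier, Naive Bayes included, is then trained on the filtered set, so the filter acts as a preprocessing step rather than a change to the learner.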


Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2378
Author(s):  
Shengfeng Gan ◽  
Shiqi Shao ◽  
Long Chen ◽  
Liangjun Yu ◽  
Liangxiao Jiang

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated, which reduces its classification performance. Among the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators; it is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. For HMNB to learn, we propose a simple but effective learning algorithm that avoids a high-computational-complexity structure-learning process. Our idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA), and the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
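
For context, the base model all of these variants extend is plain multinomial NB with Laplace smoothing. The sketch below shows that baseline only; the hidden-parent machinery of HMNB is not reproduced here, and the tiny corpus is invented for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train_mnb(docs, labels, alpha=1.0):
    """Plain multinomial naive Bayes with Laplace smoothing -- the base
    model that HMNB extends. docs: list of token lists. Returns a
    predict(doc) function."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    def predict(doc):
        def log_post(c):
            total = sum(word_counts[c].values())
            lp = log(class_docs[c] / len(docs))          # class prior
            for w in doc:                                # smoothed likelihoods
                lp += log((word_counts[c][w] + alpha)
                          / (total + alpha * len(vocab)))
            return lp
        return max(class_docs, key=log_post)
    return predict
```

HMNB keeps this multinomial form but replaces each feature's single conditional distribution with one conditioned on a synthesized hidden parent, which is where the independence assumption is relaxed.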


Author(s):  
Alaa Khudhair Abbas ◽  
Ali Khalil Salih ◽  
Harith A. Hussein ◽  
Qasim Mohammed Hussein ◽  
Saba Alaa Abdulwahhab

Twitter social media data generally uses ambiguous text, which can make it difficult to identify positive or negative sentiments. More than one billion social media messages need to be stored in a proper database and processed correctly for analysis. In this paper, an ensemble majority-vote classifier is proposed to enhance sentiment classification performance and accuracy. The proposed classification model combines four classifiers, each using a different technique—naive Bayes, decision trees, multilayer perceptron, and logistic regression—into a single ensemble classifier. In addition, a comparison is drawn among the four classifiers to evaluate their individual performance. The results show that, as an individual classifier, naive Bayes performs best. However, when the proposed ensemble majority-vote classifier is compared with the four individual classifiers, its performance is better than that of any individual one.
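
A hard majority vote over heterogeneous, already-trained classifiers is mechanically simple. In this sketch each member is represented as a plain predict function; the tie-breaking rule (favour the earliest-listed member's vote) is an assumption, since the abstract does not specify one.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Hard majority vote over heterogeneous classifiers, each supplied
    as a predict(x) callable (e.g. wrapping NB, a decision tree, an MLP,
    and logistic regression). Ties go to the earliest member's vote."""
    votes = [clf(x) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # first vote that attains the top count wins ties
    return next(v for v in votes if counts[v] == best)
```

Because the combiner only sees labels, members trained with entirely different algorithms and feature pipelines can be mixed freely.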


2018 ◽  
Vol 7 (3.4) ◽  
pp. 13
Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Data mining is one of the most researched fields in computer science. Several studies have been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering, and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions, and other communications result in Big data: data at a scale beyond the competency of traditional data mining techniques. It is observed that traditional data mining algorithms are not capable of storing and processing large-scale data, and where some algorithms are capable, their response time is very high. Big data holds hidden information which, if analysed intelligently, can be highly beneficial for business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within one research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM, and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used for implementing Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup, and response time are used to compare traditional mining with Big data mining.
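
The Bayes'-theorem-and-conditional-probability core mentioned above fits in a few lines: P(c | x) ∝ P(c) · ∏ P(x_j | c), with each factor estimated from counts. This is a minimal worked example without smoothing, chosen for clarity only; the toy data is invented.

```python
from collections import Counter

def naive_bayes_posterior(rows, labels, query):
    """Bayes' theorem as used in naive Bayes over categorical features:
    P(c | x) proportional to P(c) * product of P(x_j | c), all estimated
    by counting. No smoothing, for clarity; returns normalized posteriors."""
    n = len(labels)
    post = {}
    for c, nc in Counter(labels).items():
        p = nc / n                                  # prior P(c)
        for j, v in enumerate(query):               # likelihoods P(x_j | c)
            p *= sum(1 for r, l in zip(rows, labels)
                     if l == c and r[j] == v) / nc
        post[c] = p
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}      # normalize via Bayes' rule
```

In production, Laplace smoothing would be added so an unseen feature value does not zero out a class, but the unsmoothed form keeps the conditional-probability arithmetic visible.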


2013 ◽  
Vol 303-306 ◽  
pp. 1609-1612
Author(s):  
Huai Lin Dong ◽  
Xiao Dan Zhu ◽  
Qing Feng Wu ◽  
Juan Juan Huang

The Naïve Bayes classification algorithm based on validity (NBCABV) optimizes the training data by using validity to eliminate noise samples, improving classification, but it ignores the associations between properties. Taking these associations into account, an improved method, a classification algorithm for Naïve Bayes based on validity and correlation (CANBBVC), is proposed to delete more noise samples using both validity and correlation, resulting in better classification performance. Experimental results show that this model has higher classification accuracy than the one based on validity alone.


2018 ◽  
Vol 5 (7) ◽  
pp. 172108 ◽  
Author(s):  
Ling Xiao Li ◽  
Siti Soraya Abdul Rahman

Students are characterized by their own distinct learning styles. Discovering students' learning styles is important in educational systems in order to provide adaptivity. Past research has proposed various approaches to detect students' learning styles. Among them, the Bayesian network has emerged as a widely used method for automatically detecting students' learning styles. The tree-augmented naive Bayesian network, in turn, can improve on the naive Bayesian network in terms of classification accuracy. In this paper, we evaluate the performance of the tree-augmented naive Bayesian network in automatically detecting students' learning styles in an online learning environment. The experimental results are promising, as the tree-augmented naive Bayes network achieves higher detection accuracy than the Bayesian network.

