Locally-Adaptive Naïve Bayes Framework Design via Density-Based Clustering for Large Scale Datasets

Author(s):  
Faruk Bulut

In this chapter, the local conditional probabilities around a query point are used for classification rather than a single generalized model of conditional probability. In the proposed locally adaptive naïve Bayes (LANB) learning scheme, a certain number of local instances close to the test point construct an adaptive probability estimate. In empirical studies over 53 benchmark UCI datasets, more accurate classification performance was obtained: LANB gained a total of 8.2% in classification accuracy compared to the conventional naïve Bayes model. The presented LANB method also prevailed in statistical paired t-test comparisons, with 31 wins, 14 ties, and 8 losses across the UCI sets.
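
The core idea can be sketched in a few lines: rather than fitting one global model, fit a small naïve Bayes estimator on only the training points nearest the query. This is a minimal illustrative sketch, not the paper's exact procedure; the function name, the choice of Euclidean distance, and the Gaussian likelihoods are all assumptions.

```python
import math
from collections import Counter

def lanb_predict(X, y, query, k=5):
    """Locally adaptive NB sketch: fit a Gaussian naive Bayes on the k
    training points nearest to the query instead of on the whole set.
    Illustrative only; the paper's actual locality construction may differ."""
    # 1. rank training points by Euclidean distance to the query
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    order = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    Xl, yl = [X[i] for i in order], [y[i] for i in order]
    # 2. estimate class priors and per-feature Gaussian likelihoods locally
    classes = Counter(yl)
    def log_post(c):
        lp = math.log(classes[c] / k)  # local class prior
        rows = [x for x, lab in zip(Xl, yl) if lab == c]
        for j in range(len(query)):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # smoothed
            lp += -0.5 * math.log(2 * math.pi * var) \
                  - (query[j] - mu) ** 2 / (2 * var)
        return lp
    # 3. return the class with the highest local posterior
    return max(classes, key=log_post)
```

Because the priors and likelihoods are re-estimated per query, the decision boundary adapts to local density, at the cost of a neighbour search per prediction.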

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. A theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variations lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from 1-dependence Bayesian network classifiers (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent a specific conditional dependence relationship, may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
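
The quantity behind TAN's augmenting edges is the conditional mutual information I(Xi; Xj | C), estimated from attribute counts. The sketch below computes the empirical estimate from three discrete columns; it is a minimal illustration of that score, not the paper's full structure-learning algorithm.

```python
from collections import Counter
from math import log

def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C), the score
    TAN-style learners use to rank candidate augmenting edges.
    xi, xj, c are equal-length lists of discrete values."""
    n = len(c)
    pc = Counter(c)                 # class counts
    pijc = Counter(zip(xi, xj, c))  # joint (xi, xj, class) counts
    pic = Counter(zip(xi, c))       # (xi, class) counts
    pjc = Counter(zip(xj, c))       # (xj, class) counts
    total = 0.0
    for (a, b, k), nabk in pijc.items():
        # p(a,b,k) * log[ p(a,b|k) / (p(a|k) p(b|k)) ], in counts:
        total += (nabk / n) * log((nabk * pc[k]) / (pic[(a, k)] * pjc[(b, k)]))
    return total
```

A perfectly dependent attribute pair scores log 2 per bit of shared information, while conditionally independent attributes score zero, which is what makes the measure a natural edge-selection criterion.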


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Qingchao Liu ◽  
Jian Lu ◽  
Shuyan Chen ◽  
Kangjia Zhao

This study presents the applicability of a Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of a practically implemented NB depends on the choice of the optimal threshold, which is determined mathematically using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes and decision trees (NBTree) to detect incidents. In this paper, we discuss extensive experiments that were performed to evaluate the performance of three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that the five combination rules of the NB classifier ensemble perform significantly better than standard NB and slightly better than NBTree on some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
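
An ensemble's members can be fused under several classical combination rules. The sketch below implements five common ones (majority vote, average, product, max, min) over the members' class-probability vectors; the rule names are the standard fusion rules from the ensemble literature, and the paper's exact five rules may differ.

```python
from collections import Counter
from math import prod

def combine(probas, rule="majority"):
    """Fuse member classifiers' outputs under one of five classical rules.
    probas: list of dicts {class_label: P(class | x)}, one per member.
    Illustrative sketch; tie-breaking and rule details are assumptions."""
    classes = probas[0].keys()
    if rule == "majority":
        # each member votes for its most probable class
        votes = Counter(max(p, key=p.get) for p in probas)
        return votes.most_common(1)[0][0]
    score = {
        "average": lambda c: sum(p[c] for p in probas) / len(probas),
        "product": lambda c: prod(p[c] for p in probas),
        "max":     lambda c: max(p[c] for p in probas),
        "min":     lambda c: min(p[c] for p in probas),
    }[rule]
    return max(classes, key=score)
```

Note that the rules can disagree: one confident dissenter can flip the average or product rule even when the majority rule does not change.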


2012 ◽  
Vol 490-495 ◽  
pp. 460-464 ◽  
Author(s):  
Xiao Dan Zhu ◽  
Jin Song Su ◽  
Qing Feng Wu ◽  
Huai Lin Dong

The Naive Bayes classification algorithm is an effective, simple classification algorithm. Most research on traditional Naive Bayes classification focuses on improving the classification algorithm itself, ignoring the selection of training data, which has a great effect on classifier performance. This paper therefore proposes a method to optimize the selection of training data. With this method, noisy instances in the training data are eliminated according to a user-defined effectiveness threshold, improving classifier performance. Experimental results on large-scale data show that our approach significantly outperforms the baseline classifier.
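
One simple way to realize such threshold-based filtering is to score each training instance by how well its neighbourhood agrees with its label and drop low scorers. This is an illustrative sketch only: the abstract does not define the paper's "effectiveness" measure, so the neighbour-agreement score used here is an assumption.

```python
import math

def filter_training_data(X, y, k=3, threshold=0.5):
    """Noise-filtering sketch: score each instance by the fraction of its
    k nearest neighbours sharing its label, then drop instances whose
    score falls below a user-defined threshold. The paper's actual
    effectiveness measure may differ."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # nearest neighbours of instance i, excluding itself
        order = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: dist(xi, X[j]))[:k]
        agreement = sum(1 for j in order if y[j] == yi) / k
        if agreement >= threshold:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

Any classifier, Naive Bayes included, is then trained on the filtered set, so the filter acts as a preprocessing step rather than a change to the learner.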


Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2378
Author(s):  
Shengfeng Gan ◽  
Shiqi Shao ◽  
Long Chen ◽  
Liangjun Yu ◽  
Liangxiao Jiang

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated, which reduces its classification performance. Among the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators; it is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. For HMNB to learn, we propose a simple but effective learning algorithm that avoids a high-computational-complexity structure-learning process. Our idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA), and the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
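
For context, the base model all of these variants extend is plain multinomial NB with Laplace smoothing. The sketch below shows that baseline only; the hidden-parent machinery of HMNB is not reproduced here, and the tiny corpus is invented for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train_mnb(docs, labels, alpha=1.0):
    """Plain multinomial naive Bayes with Laplace smoothing -- the base
    model that HMNB extends. docs: list of token lists. Returns a
    predict(doc) function."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    def predict(doc):
        def log_post(c):
            total = sum(word_counts[c].values())
            lp = log(class_docs[c] / len(docs))          # class prior
            for w in doc:                                # smoothed likelihoods
                lp += log((word_counts[c][w] + alpha)
                          / (total + alpha * len(vocab)))
            return lp
        return max(class_docs, key=log_post)
    return predict
```

HMNB keeps this multinomial form but replaces each feature's single conditional distribution with one conditioned on a synthesized hidden parent, which is where the independence assumption is relaxed.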


Author(s):  
Alaa Khudhair Abbas ◽  
Ali Khalil Salih ◽  
Harith A. Hussein ◽  
Qasim Mohammed Hussein ◽  
Saba Alaa Abdulwahhab

Twitter social media data generally uses ambiguous text, which can make it difficult to identify positive or negative sentiments. More than one billion social media messages need to be stored in a proper database and processed correctly for analysis. In this paper, an ensemble majority-vote classifier is proposed to enhance sentiment classification performance and accuracy. The proposed classification model combines four classifiers, each using a different technique—naive Bayes, decision trees, multilayer perceptron, and logistic regression—into a single ensemble classifier. In addition, a comparison is drawn among the four classifiers to evaluate their individual performance. The results show that, as an individual classifier, naive Bayes performs best. However, when the proposed ensemble majority-vote classifier is compared with the four individual classifiers, its performance is better than that of any individual one.
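
A hard majority vote over heterogeneous, already-trained classifiers is mechanically simple. In this sketch each member is represented as a plain predict function; the tie-breaking rule (favour the earliest-listed member's vote) is an assumption, since the abstract does not specify one.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Hard majority vote over heterogeneous classifiers, each supplied
    as a predict(x) callable (e.g. wrapping NB, a decision tree, an MLP,
    and logistic regression). Ties go to the earliest member's vote."""
    votes = [clf(x) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # first vote that attains the top count wins ties
    return next(v for v in votes if counts[v] == best)
```

Because the combiner only sees labels, members trained with entirely different algorithms and feature pipelines can be mixed freely.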


2018 ◽  
Vol 7 (3.4) ◽  
pp. 13
Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Data mining is one of the most researched fields in computer science. Several studies have been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering, and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions, and other communications result in Big data: data at a scale beyond the competency of traditional data mining techniques. It is observed that traditional data mining algorithms are not capable of storing and processing large-scale data, and where some algorithms are capable, their response time is very high. Big data holds hidden information which, if analysed intelligently, can be highly beneficial for business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within one research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM, and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used for implementing Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup, and response time are used to compare traditional mining with Big data mining.
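
The Bayes'-theorem-and-conditional-probability core mentioned above fits in a few lines: P(c | x) ∝ P(c) · ∏ P(x_j | c), with each factor estimated from counts. This is a minimal worked example without smoothing, chosen for clarity only; the toy data is invented.

```python
from collections import Counter

def naive_bayes_posterior(rows, labels, query):
    """Bayes' theorem as used in naive Bayes over categorical features:
    P(c | x) proportional to P(c) * product of P(x_j | c), all estimated
    by counting. No smoothing, for clarity; returns normalized posteriors."""
    n = len(labels)
    post = {}
    for c, nc in Counter(labels).items():
        p = nc / n                                  # prior P(c)
        for j, v in enumerate(query):               # likelihoods P(x_j | c)
            p *= sum(1 for r, l in zip(rows, labels)
                     if l == c and r[j] == v) / nc
        post[c] = p
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}      # normalize via Bayes' rule
```

In production, Laplace smoothing would be added so an unseen feature value does not zero out a class, but the unsmoothed form keeps the conditional-probability arithmetic visible.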


2013 ◽  
Vol 303-306 ◽  
pp. 1609-1612
Author(s):  
Huai Lin Dong ◽  
Xiao Dan Zhu ◽  
Qing Feng Wu ◽  
Juan Juan Huang

The Naïve Bayes classification algorithm based on validity (NBCABV) optimizes the training data by using validity to eliminate noise samples, improving classification, but it ignores the associations between properties. Taking these associations into account, an improved method, a classification algorithm for Naïve Bayes based on validity and correlation (CANBBVC), is proposed to delete more noise samples using both validity and correlation, resulting in better classification performance. Experimental results show that this model has higher classification accuracy than the one based on validity alone.


2018 ◽  
Vol 5 (7) ◽  
pp. 172108 ◽  
Author(s):  
Ling Xiao Li ◽  
Siti Soraya Abdul Rahman

Students are characterized by their own distinct learning styles. Discovering students' learning styles is important in educational systems in order to provide adaptivity. Past research has proposed various approaches to detect students' learning styles. Among them, the Bayesian network has emerged as a widely used method for automatically detecting students' learning styles. The tree-augmented naive Bayesian network, in turn, can improve on the naive Bayesian network in terms of classification accuracy. In this paper, we evaluate the performance of the tree-augmented naive Bayesian network in automatically detecting students' learning styles in an online learning environment. The experimental results are promising, as the tree-augmented naive Bayes network achieves higher detection accuracy than the Bayesian network.

