Efficient Heuristics for Structure Learning of k-Dependence Bayesian Classifier

Entropy ◽  
2018 ◽  
Vol 20 (12) ◽  
pp. 897 ◽  
Author(s):  
Yang Liu ◽  
Limin Wang ◽  
Minghui Sun

The rapid growth in data makes the quest for highly scalable learners a popular one. To achieve a trade-off between structure complexity and classification accuracy, the k-dependence Bayesian classifier (KDB) can represent different numbers of interdependencies for different data sizes. In this paper, we propose two methods to improve the classification performance of KDB. First, we use minimal-redundancy-maximal-relevance analysis, which sorts the predictive features to identify redundant ones. Then, we propose an improved discriminative model selection to select an optimal sub-model by removing redundant features and arcs in the Bayesian network. Experimental results on 40 UCI datasets demonstrate that these two techniques are complementary and that the proposed algorithm achieves competitive classification performance and requires less classification time than other state-of-the-art Bayesian network classifiers such as tree-augmented naive Bayes and averaged one-dependence estimators.
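The feature-sorting step can be illustrated with a minimal sketch of greedy minimal-redundancy-maximal-relevance (mRMR) ranking for discrete features. This is not the authors' implementation: the function names `mutual_information` and `mrmr_order`, the difference form of the score (relevance minus mean redundancy), and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_order(features, labels):
    """Greedy mRMR ranking (hypothetical helper, difference form).

    At each step, pick the feature that maximizes
    I(X;C) minus the mean I(X;S) over the already selected features S.
    """
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    remaining = dict(features)
    selected = []
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)  # ties go to the first feature listed
        selected.append(best)
        del remaining[best]
    return selected


# Toy data (assumed): the class is determined by a and d, b duplicates a, e is noise.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "d": [0, 0, 1, 1, 0, 0, 1, 1],
    "e": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr_order(features, labels))  # prints ['a', 'd', 'b', 'e']
```

In this toy case the duplicate feature `b` is ranked after `d` even though both are equally relevant to the class, which is the kind of redundancy the sorting is meant to expose.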

2021 ◽  
Vol 25 (3) ◽  
pp. 641-667
Author(s):  
Limin Wang ◽  
Sikai Qi ◽  
Yang Liu ◽  
Hua Lou ◽  
Xin Zuo

Bagging has attracted much attention due to its simple implementation and the popularity of bootstrapping. By learning diverse classifiers from resampled datasets and averaging the outcomes, bagging investigates the possibility of achieving substantial classification performance of the base classifier. Diversity has been recognized as a very important characteristic in bagging. This paper presents an efficient and effective bagging approach that learns a set of independent Bayesian network classifiers (BNCs) from disjoint data subspaces. The number of bits needed to describe the data is measured in terms of log likelihood, and redundant edges are identified to optimize the topologies of the learned BNCs. Our extensive experimental evaluation on 54 publicly available datasets from the UCI machine learning repository reveals that the proposed algorithm achieves competitive classification performance compared with state-of-the-art BNCs that use or do not use bagging procedures, such as tree-augmented naive Bayes (TAN), the k-dependence Bayesian classifier (KDB), bagging NB, and bagging TAN.
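For readers unfamiliar with the baseline procedure, the following is a minimal sketch of generic bagging: bootstrap resampling followed by a majority vote. It does not reproduce the paper's variant (which learns BNCs from disjoint data subspaces). The toy lookup-table base learner and the names `train_lookup`, `bagging_fit`, and `bagging_predict` are hypothetical stand-ins.

```python
import random
from collections import Counter


def train_lookup(rows, labels):
    """Toy base learner (assumed): majority label per feature tuple,
    falling back to the overall majority label for unseen tuples."""
    by_key = {}
    for row, y in zip(rows, labels):
        by_key.setdefault(tuple(row), Counter())[y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    return lambda row: table.get(tuple(row), default)


def bagging_fit(rows, labels, n_models=11, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_lookup([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the base learners by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (assumed): the single feature determines the class.
rows = [(0,)] * 10 + [(1,)] * 10
labels = [0] * 10 + [1] * 10
models = bagging_fit(rows, labels)
print(bagging_predict(models, (0,)), bagging_predict(models, (1,)))
```

Voting is used here for simplicity; averaging predicted class probabilities, as the abstract describes, would require a base learner that outputs probabilities.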


2021 ◽  
Vol 25 (1) ◽  
pp. 35-55
Author(s):  
Limin Wang ◽  
Peng Chen ◽  
Shenglei Chen ◽  
Minghui Sun

Bayesian network classifiers (BNCs) have proved their effectiveness and efficiency in the supervised learning framework. Numerous variations of the conditional independence assumption have been proposed to address the issue of NP-hard structure learning of BNCs. However, researchers focus on identifying conditional dependence rather than conditional independence, and information-theoretic criteria cannot identify the diversity in conditional (in)dependencies for different instances. In this paper, the maximum correlation criterion and minimum dependence criterion are introduced to sort attributes and identify conditional independencies, respectively. A heuristic search strategy is applied to find a possible global solution that achieves the trade-off between significant dependency relationships and the independence assumption. Our extensive experimental evaluation on widely used benchmark data sets reveals that the proposed algorithm achieves competitive classification performance compared to state-of-the-art single model learners (e.g., TAN, KDB, KNN and SVM) and ensemble learners (e.g., ATAN and AODE).


Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 665
Author(s):  
Yang Zhang ◽  
Limin Wang ◽  
Zhiyi Duan ◽  
Minghui Sun

Direct dependencies and conditional dependencies in restricted Bayesian network classifiers (BNCs) are two basic kinds of dependencies. Traditional approaches, such as filter and wrapper, have proved to be beneficial for identifying non-significant dependencies one by one, whereas their high computational overheads make them inefficient, especially for BNCs with high structural complexity. Study of the distributions of information-theoretic measures provides a feasible approach to identifying non-significant dependencies in batch, which may help increase the structure reliability and avoid overfitting. In this paper, we investigate two extensions to the k-dependence Bayesian classifier: mutual information (MI)-based feature selection and conditional mutual information (CMI)-based dependence selection. These two techniques apply a novel adaptive thresholding method to filter out redundancy and can work jointly. Experimental results on 30 datasets from the UCI machine learning repository demonstrate that adaptive thresholds can help distinguish between dependencies and independencies and that the proposed algorithm achieves competitive classification performance compared to several state-of-the-art BNCs in terms of 0–1 loss, root mean squared error, bias, and variance.


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 489 ◽  
Author(s):  
Limin Wang ◽  
Yang Liu ◽  
Musa Mammadov ◽  
Minghui Sun ◽  
Sikai Qi

Over recent decades, the rapid growth in data has made ever more urgent the quest for highly scalable Bayesian networks that have better classification performance and expressivity (that is, the capacity to describe dependence relationships between attributes in different situations). To reduce the search space of possible attribute orders, the k-dependence Bayesian classifier (KDB) simply applies mutual information to sort attributes. This sorting strategy is very efficient, but it neglects the conditional dependencies between attributes and is sub-optimal. In this paper, we propose a novel sorting strategy and extend KDB from a single restricted network to unrestricted ensemble networks, i.e., the unrestricted Bayesian classifier (UKDB), in terms of Markov blanket analysis and target learning. Target learning is a framework that takes each unlabeled testing instance P as a target and builds a specific Bayesian network classifier (BNC), BNC_P, to complement BNC_T learned from training data T. UKDB introduces UKDB_P and UKDB_T to flexibly describe, respectively, the change in dependence relationships for different testing instances and the robust dependence relationships implicated in training data. They both use UKDB as the base classifier by applying the same learning strategy while modeling different parts of the data space, and thus they are complementary in nature. The extensive experimental results on the Wisconsin breast cancer database as a case study and on 10 other datasets, involving classifiers with different structure complexities, such as Naive Bayes (0-dependence), Tree augmented Naive Bayes (1-dependence) and KDB (arbitrary k-dependence), demonstrate the effectiveness and robustness of the proposed approach.
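The baseline KDB procedure that this abstract builds on can be sketched as follows, as a rough illustration only. It follows the commonly described recipe: sort attributes by mutual information with the class, then give each attribute up to k parents among the earlier attributes, chosen by conditional mutual information given the class. The function names `mi`, `cmi`, and `kdb_structure` and the toy data are assumptions; the sketch covers neither the novel sorting strategy nor UKDB.

```python
from collections import Counter
from math import log2


def mi(xs, cs):
    """Empirical mutual information I(X;C) in bits."""
    n = len(xs)
    px, pc, pxc = Counter(xs), Counter(cs), Counter(zip(xs, cs))
    return sum(k / n * log2(k * n / (px[x] * pc[c])) for (x, c), k in pxc.items())


def cmi(xs, ys, cs):
    """Empirical conditional mutual information I(X;Y|C) in bits."""
    n = len(xs)
    pc = Counter(cs)
    pxc, pyc = Counter(zip(xs, cs)), Counter(zip(ys, cs))
    pxyc = Counter(zip(xs, ys, cs))
    return sum(
        k / n * log2(k * pc[c] / (pxc[(x, c)] * pyc[(y, c)]))
        for (x, y, c), k in pxyc.items()
    )


def kdb_structure(features, labels, k):
    """Return (attribute order, attribute parents) for a KDB-style network.

    The class variable is an implicit parent of every attribute and is
    therefore not listed in the returned parent sets.
    """
    order = sorted(features, key=lambda f: mi(features[f], labels), reverse=True)
    parents = {}
    for i, name in enumerate(order):
        earlier = order[:i]
        ranked = sorted(
            earlier,
            key=lambda p: cmi(features[name], features[p], labels),
            reverse=True,
        )
        parents[name] = ranked[:k]  # at most k attribute parents
    return order, parents


# Toy data (assumed): x1 is relevant to the class; x3 duplicates x2.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "x1": [0, 0, 0, 0, 1, 1, 1, 0],
    "x2": [0, 1, 0, 1, 0, 1, 0, 1],
    "x3": [0, 1, 0, 1, 0, 1, 0, 1],
}
order, parents = kdb_structure(features, labels, k=1)
print(order)    # prints ['x1', 'x2', 'x3']
print(parents)  # prints {'x1': [], 'x2': ['x1'], 'x3': ['x2']}
```

Because the order depends only on I(X;C), `x2` and `x3` are sorted without regard to how strongly they depend on each other, which is the limitation the abstract points out.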


2015 ◽  
Vol 24 (04) ◽  
pp. 1550012
Author(s):  
Yanying Li ◽  
Youlong Yang ◽  
Wensheng Wang ◽  
Wenming Yang

It is well known that Bayesian network structure learning from data is an NP-hard problem. Learning a correct skeleton of a DAG is the foundation of dependency analysis algorithms for this problem. Considering the unreliability of high-order conditional independence (CI) tests and the aim of improving the efficiency of a dependency analysis algorithm, the key steps are to use fewer CI tests and to reduce the sizes of the conditioning sets as much as possible. Based on these analyses and inspired by the HPC algorithm, we present an algorithm, named efficient hybrid parents and child (EHPC), for learning the adjacent neighbors of every variable. We prove the validity of the algorithm. Compared with state-of-the-art algorithms, the experimental results show that EHPC can handle large networks and achieves better accuracy with fewer conditional independence tests and smaller conditioning sets.


Author(s):  
Sepehr Eghbali ◽  
Mohammad Hassan Zokaei Ashtiani ◽  
Majid Nili Ahmadabadi ◽  
Babak Nadjar Araabi

Author(s):  
Duc Truong Pham ◽  
Gonzalo A. Ruz

This paper presents a new approach to the unsupervised training of Bayesian network classifiers. Three models have been analysed: the Chow and Liu (CL) multinets; the tree-augmented naive Bayes; and a new model called the simple Bayesian network classifier, which is more robust in its structure learning. To perform the unsupervised training of these models, the classification maximum likelihood criterion is used. The maximization of this criterion is derived for each model under the classification expectation–maximization (EM) algorithm framework. To test the proposed unsupervised training approach, 10 well-known benchmark datasets have been used to measure their clustering performance. Also, for comparison, the results for the k-means and the EM algorithm, as well as those obtained when the three Bayesian network classifiers are trained in a supervised way, are analysed. A real-world image processing application is also presented, dealing with clustering of wood board images described by 165 attributes. Results show that the proposed learning method, in general, outperforms traditional clustering algorithms and, in the wood board image application, the CL multinets obtained a 12 per cent increase, on average, in clustering accuracy when compared with the k-means method and a 7 per cent increase, on average, when compared with the EM algorithm.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Gonzalo A. Ruz ◽  
Pamela Araya-Díaz

Bayesian networks are useful machine learning techniques that are able to combine quantitative modeling, through probability theory, with qualitative modeling, through graph theory for visualization. We apply Bayesian network classifiers to the facial biotype classification problem, an important stage during orthodontic treatment planning. For this, we present adaptations of classical Bayesian network classifiers to handle continuous attributes; we also propose an incremental tree construction procedure for tree-like Bayesian network classifiers. We evaluate the performance of the proposed adaptations and compare them with other continuous Bayesian network classifier approaches as well as support vector machines. The results under the classification performance measures, accuracy and kappa, showed the effectiveness of the continuous Bayesian network classifiers, especially when a reduced number of attributes was used. Additionally, the resulting networks allowed visualizing the probability relations amongst the attributes under this classification problem, a useful tool for decision-making for orthodontists.

