Discriminative Structure Learning of Bayesian Network Classifiers from Training Dataset and Testing Instance

Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 489 ◽  
Author(s):  
Limin Wang ◽  
Yang Liu ◽  
Musa Mammadov ◽  
Minghui Sun ◽  
Sikai Qi

Over recent decades, the rapid growth in data has made ever more urgent the quest for highly scalable Bayesian networks with better classification performance and expressivity (that is, the capacity to describe dependence relationships between attributes in different situations). To reduce the search space of possible attribute orders, the k-dependence Bayesian classifier (KDB) simply applies mutual information to sort attributes. This sorting strategy is very efficient, but it neglects the conditional dependencies between attributes and is sub-optimal. In this paper, we propose a novel sorting strategy and extend KDB from a single restricted network to unrestricted ensemble networks, i.e., the unrestricted Bayesian classifier (UKDB), in terms of Markov blanket analysis and target learning. Target learning is a framework that takes each unlabeled testing instance P as a target and builds a specific Bayesian network classifier (BNC), BNC_P, to complement the BNC_T learned from the training data T. UKDB introduces UKDB_P and UKDB_T, respectively, to flexibly describe the change in dependence relationships for different testing instances and the robust dependence relationships implicit in the training data. Both use UKDB as the base classifier, applying the same learning strategy while modeling different parts of the data space, so they are complementary in nature. Extensive experimental results on the Wisconsin breast cancer database as a case study and on 10 other datasets, involving classifiers of different structural complexity such as naive Bayes (0-dependence), tree-augmented naive Bayes (1-dependence) and KDB (arbitrary k-dependence), demonstrate the effectiveness and robustness of the proposed approach.
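KDB's attribute-sorting step described above can be sketched as follows; the toy data, the `mutual_information` helper, and the natural-log units are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats from two paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Toy data: two discrete attributes plus a class label per row.
data = [(1, 0, 'a'), (1, 1, 'a'), (0, 0, 'b'),
        (0, 1, 'b'), (1, 0, 'a'), (0, 0, 'b')]
labels = [row[-1] for row in data]

# KDB's sorting strategy: rank attributes by decreasing I(X_i; C).
order = sorted(range(2), key=lambda i: -mutual_information(
    [row[i] for row in data], labels))
```

In this toy example attribute 0 determines the class perfectly while attribute 1 is independent of it, so attribute 0 is ranked first; the paper's criticism is precisely that this ranking ignores dependencies *between* attributes.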

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches that improve NB by weakening its attribute independence assumption. Theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variants lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from a 1-dependence Bayesian network classifier (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent a specific conditional dependence relationship, may "best match" the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
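The conditional mutual information that weights TAN's augmenting edges can be estimated empirically as below; the helper name and the toy checks are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def cond_mutual_information(xs, ys, cs):
    """Empirical I(X;Y|C) in nats: the weight TAN assigns to a candidate
    augmenting edge X-Y when building its maximum spanning tree."""
    n = len(cs)
    pc = Counter(cs)
    pxc, pyc = Counter(zip(xs, cs)), Counter(zip(ys, cs))
    pxyc = Counter(zip(xs, ys, cs))
    return sum((k / n) * math.log(k * pc[c] / (pxc[(x, c)] * pyc[(y, c)]))
               for (x, y, c), k in pxyc.items())
```

When Y copies X given the class, I(X;Y|C) reaches H(X|C); when X and Y are conditionally independent, it is zero, so the edge carries no weight.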


Entropy ◽  
2018 ◽  
Vol 20 (12) ◽  
pp. 897 ◽  
Author(s):  
Yang Liu ◽  
Limin Wang ◽  
Minghui Sun

The rapid growth in data makes the quest for highly scalable learners a popular one. To achieve a trade-off between structure complexity and classification accuracy, the k-dependence Bayesian classifier (KDB) can represent different numbers of interdependencies for different data sizes. In this paper, we propose two methods to improve the classification performance of KDB. First, we use minimal-redundancy-maximal-relevance analysis, which sorts the predictive features to identify redundant ones. Second, we propose an improved discriminative model selection that selects an optimal sub-model by removing redundant features and arcs from the Bayesian network. Experimental results on 40 UCI datasets demonstrate that these two techniques are complementary and that the proposed algorithm achieves competitive classification performance and lower classification time than other state-of-the-art Bayesian network classifiers, such as tree-augmented naive Bayes and averaged one-dependence estimators.
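The minimal-redundancy-maximal-relevance sorting step can be sketched as a greedy ranking; the scoring form (relevance minus mean redundancy, both as empirical mutual information) and the toy columns are assumptions for illustration.

```python
import math
from collections import Counter

def mi(xs, ys):
    """Empirical mutual information in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_order(columns, labels):
    """Greedily rank features: maximal relevance to the class minus
    mean redundancy with the features already selected."""
    remaining, selected = list(range(len(columns))), []
    while remaining:
        def score(i):
            redundancy = (sum(mi(columns[i], columns[j]) for j in selected)
                          / len(selected)) if selected else 0.0
            return mi(columns[i], labels) - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the test below, feature 1 duplicates feature 0, so despite its high relevance it is pushed behind the weakly relevant but non-redundant feature 2.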


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Fayroz F. Sherif ◽  
Nourhan Zayed ◽  
Mahmoud Fakhr

Single nucleotide polymorphisms (SNPs) contribute most of the genetic variation in the human genome. SNPs are associated with many complex and common diseases, such as Alzheimer's disease (AD). Discovering SNP biomarkers at different loci can improve early diagnosis and treatment of these diseases. Bayesian networks provide a comprehensible and modular framework for representing interactions between genes or single SNPs. Here, different Bayesian network structure learning algorithms have been applied to whole genome sequencing (WGS) data to detect causal AD SNPs and gene-SNP interactions. We focused on polymorphisms in the top ten genes associated with AD and identified by genome-wide association (GWA) studies. New SNP biomarkers were observed to be significantly associated with Alzheimer's disease: rs7530069, rs113464261, rs114506298, rs73504429, rs7929589, rs76306710, and rs668134. The results demonstrate the effectiveness of using Bayesian networks to identify AD causal SNPs with acceptable accuracy, and indicate that the SNP set detected by Markov blanket based methods has a strong association with AD and achieves better performance than both naïve Bayes and tree-augmented naïve Bayes. The minimal augmented Markov blanket reaches an accuracy of 66.13% and a sensitivity of 88.87%, versus 61.58% and 59.43%, respectively, for naïve Bayes.
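The Markov blanket idea underlying these methods is simple to state once a network structure is known; the dictionary encoding of the DAG below is an illustrative assumption, not the paper's representation.

```python
def markov_blanket(node, parents):
    """Markov blanket of a node in a DAG encoded as {child: set_of_parents}:
    its parents, its children, and its children's other parents (spouses).
    Conditioned on this set, the node is independent of the rest of the network,
    which is why blanket members are strong candidate biomarkers."""
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c] if p != node}
    return parents.get(node, set()) | children | spouses

# Example DAG: X -> Y, Y -> Z, W -> Z.
dag = {'X': set(), 'W': set(), 'Y': {'X'}, 'Z': {'Y', 'W'}}
```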


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Subhajit Dey Sarkar ◽  
Saptarsi Goswami ◽  
Aman Agarwal ◽  
Javed Aktar

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available; naïve Bayes remains one of the oldest and most popular. On one hand, the implementation of naïve Bayes is simple; on the other hand, it also requires less training data. The literature shows, however, that naïve Bayes performs poorly compared to other classifiers in text classification, which makes it unattractive in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based first on univariate feature selection and then on feature clustering: the univariate selection reduces the search space, and clustering then selects relatively independent feature sets. We demonstrate the effectiveness of our method with a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.
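A minimal sketch of the two-step idea, assuming mutual information as the univariate score and a simple threshold-based grouping in place of a full clustering algorithm:

```python
import math
from collections import Counter

def mi(xs, ys):
    """Empirical mutual information in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def two_step_select(columns, labels, top_k, threshold):
    """Step 1: keep the top_k features by univariate relevance to the class.
    Step 2: greedily drop any survivor too dependent (MI > threshold) on a
    higher-ranked survivor, keeping relatively independent features."""
    ranked = sorted(range(len(columns)),
                    key=lambda i: -mi(columns[i], labels))[:top_k]
    kept = []
    for i in ranked:
        if all(mi(columns[i], columns[j]) <= threshold for j in kept):
            kept.append(i)
    return kept
```

With a duplicated feature in the candidate set, the clustering step discards the copy while retaining an independent, weaker feature.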


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that share features or characteristics into classes. Automatic document classification uses the frequencies of the words appearing in the training data as features. A large number of documents causes the number of words appearing as features to increase, so summaries are used to reduce the number of words involved in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which has a good reputation in classification. This research tests the effect of summarization as feature selection for document classification, with summaries reducing each text to 50% of its length. The results show that summarization did not affect the classification accuracy of SVM, but it did improve the accuracy of the Simple Logistic classifier. The classification tests also show that the accuracy of Naïve Bayes Multinomial (NBM) is better than that of SVM.


2020 ◽  
Vol 17 (1) ◽  
pp. 37-42
Author(s):  
Yuris Alkhalifi ◽  
Ainun Zumarniansyah ◽  
Rian Ardianto ◽  
Nila Hardi ◽  
Annisa Elfina Augustia

Non-Cash Food Assistance, or Bantuan Pangan Non-Tunai (BPNT), is food assistance from the government given to each Beneficiary Family (KPM) every month through an electronic account mechanism that can only be used to buy food at the Electronic Shop Mutual Assistance Joint Business Group Hope Family Program (e-Warong KUBE PKH) outlets or from food traders working with Bank Himbara. In its distribution, BPNT still presents problems for village officials, especially those of Desa Wanasari, in deciding who is eligible to receive it (poor) and who is not (not poor). One way to support this decision is through data mining. This study compares two algorithms, the Naive Bayes Classifier and Decision Tree C4.5, on a sample of 200 head-of-household records, split 90% training data and 10% test data. The proposed models were built in the RapidMiner application and evaluated using a confusion matrix to determine which of the two methods achieves the higher accuracy. The results show an accuracy of 98.89% for the Naive Bayes Classifier and 95.00% for Decision Tree C4.5, so the algorithm with the highest accuracy in this study is the Naive Bayes Classifier, by a margin of 3.89%.
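The confusion-matrix evaluation mentioned above reduces to a trace-over-total computation; the 2×2 layout and the counts below are illustrative, not the study's actual results.

```python
def accuracy(confusion):
    """Accuracy from a square confusion matrix confusion[actual][predicted]:
    correctly classified instances (the diagonal) over all instances."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Illustrative 2x2 matrix for a poor / not-poor decision:
# rows = actual class, columns = predicted class.
cm = [[95, 5],
      [4, 96]]
```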


Repositor ◽  
2020 ◽  
Vol 2 (5) ◽  
pp. 675
Author(s):  
Muhammad Athaillah ◽  
Yufiz Azhar ◽  
Yuda Munarko

Hoax news classification is an application of text categorization. Hoax news must be classified because it can influence readers' actions and thinking patterns. The classification process in this research uses several stages, namely preprocessing, feature extraction, feature selection, and classification. This research compares two algorithms, Naïve Bayes and Multinomial Naïve Bayes, to determine which of the two is more effective at classifying hoax news. The data come from turnbackhoax.id for the hoax news (100 articles) and from kompas.com and detik.com for the non-hoax news (100 articles); 140 articles are used for training and 60 articles for testing. In the comparison, the Naïve Bayes algorithm achieves an F1-score of 0.93 and Multinomial Naïve Bayes an F1-score of 0.92.
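The F1-score used for the comparison is the harmonic mean of precision and recall; a minimal helper computed from classification counts (the counts in the test are invented for illustration):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from
    true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```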


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Patricio Wolff ◽  
Manuel Graña ◽  
Sebastián A. Ríos ◽  
Maria Begoña Yarza

Background. Hospital readmission prediction in pediatric hospitals has received little attention. Studies have focused on readmission frequency analysis stratified by disease and demographic/geographic characteristics, but there are no predictive modeling approaches, which may be useful to identify the preventable readmissions that constitute a major portion of the cost attributed to readmissions. Objective. To assess the all-cause readmission predictive performance achieved by machine learning techniques in the emergency department of a pediatric hospital in Santiago, Chile. Materials. An all-cause admissions dataset was collected over six consecutive years in a pediatric hospital in Santiago, Chile. The variables collected are the same as those used to determine the administrative cost of the child's treatment. Methods. Retrospective predictive analysis of 30-day readmission was formulated as a binary classification problem. We report classification results achieved with various model-building approaches after data curation and preprocessing for correction of class imbalance. We compute repeated cross-validation (RCV) with a decreasing number of folds to assess performance and sensitivity to the effect of imbalance in the test set and training set size. Results. The increase in recall due to SMOTE class imbalance correction is large and statistically significant. The naive Bayes (NB) approach achieves the best AUC (0.65); however, the shallow multilayer perceptron has the best PPV and f-score (5.6 and 10.2, resp.). NB and support vector machines (SVM) give comparable results if we consider AUC, PPV, and f-score rankings across all RCV experiments. The high recall of the deep multilayer perceptron is due to a high false-positive ratio. There is no detectable effect of the number of folds in the RCV on the predictive performance of the algorithms. Conclusions. We recommend naive Bayes (NB) with a Gaussian distribution model as the most robust modeling approach for pediatric readmission prediction, achieving the best results across all training dataset sizes. The results show that the approach could be applied to detect preventable readmissions.
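The SMOTE correction credited with the recall increase interpolates synthetic minority samples between existing ones; the sketch below is a naive stand-in for the standard algorithm, with the sample points and parameters invented for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Naive SMOTE sketch: each synthetic point is a random interpolation
    between a minority sample and one of its k nearest minority neighbours,
    so the minority class is oversampled without exact duplicates."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest minority neighbours of a, by squared Euclidean distance.
        neighbours = sorted((p for p in minority if p != a),
                            key=lambda p: sum((u - v) ** 2
                                              for u, v in zip(a, p)))[:k]
        b = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(u + t * (v - u) for u, v in zip(a, b)))
    return synthetic
```

Because each synthetic point lies on a segment between two minority samples, it stays inside the minority class's convex hull.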


2018 ◽  
Vol 7 (4.44) ◽  
pp. 131
Author(s):  
Ridwan Rismanto ◽  
Dimas Wahyu Wibowo ◽  
Arie Rachmad Syulistyo

Books are an important medium for teaching in higher education, provided through a library or reading room that enables students and teachers to find references for teaching and learning activities. For easy searching, each book is classified by category. In our institution, the Information Technology Major of the State Polytechnic of Malang, these categories are specific to computer science topics. Every book entry needs to be classified accordingly, and to perform this task one needs to understand the major keywords of the book title. The problem is that not all librarians have such knowledge, so manually classifying hundreds or even thousands of books is exhausting work. This research focuses on automatic book classification based on the title, using a Naive Bayes classifier with log probabilities. The log-probability implementation solves the problem of probability products becoming too small to be represented in a programming language's floating-point variable type. The algorithm was implemented in a web application using PHP and a MySQL database. Evaluation using the holdout method on 240 training instances and 80 testing instances resulted in 75% accuracy; K-fold cross-validation resulted in 66.25% accuracy.
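The log-probability trick described above replaces a product of many small likelihoods with a sum of logs, which cannot underflow the same way. A minimal sketch in Python rather than the paper's PHP, with an invented toy corpus and a multinomial model with Laplace smoothing assumed:

```python
import math
from collections import Counter

def train(docs, labels):
    """Multinomial naive Bayes with Laplace smoothing, storing every
    parameter as a log so scoring is a sum, never a vanishing product."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter(w for d, l in zip(docs, labels) if l == c for w in d)
              for c in classes}
    loglik = {c: {w: math.log((counts[c][w] + 1) /
                              (sum(counts[c].values()) + len(vocab)))
                  for w in vocab} for c in classes}
    return prior, loglik

def predict(model, doc):
    prior, loglik = model
    # Words outside the training vocabulary are skipped in this sketch.
    return max(prior, key=lambda c: prior[c] +
               sum(loglik[c].get(w, 0.0) for w in doc))

# Invented toy book-title corpus for illustration.
model = train([["java", "programming"], ["python", "code"],
               ["cooking", "recipes"], ["baking", "recipes"]],
              ["cs", "cs", "food", "food"])
```

For a title of n words the raw product of per-word probabilities shrinks roughly exponentially in n, while the log-space sum stays well within floating-point range.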

