Correlation Analysis in Classifiers

Author(s):  
Vincent Lemaire ◽  
Carine Hue ◽  
Olivier Bernier

This chapter presents a new method to analyze the link between the probabilities produced by a classification model and the variation of its input values. The goal is to increase the predictive probability of a given class by exploring the possible values of the input variables, each taken independently. The proposed method is presented in a general framework and then detailed for naive Bayesian classifiers. We also demonstrate the importance of “lever variables”, variables that can plausibly be acted upon to influence the class probabilities and can therefore be the target of specific policies. The application of the proposed method to several data sets shows that such an approach can lead to useful indicators.
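
As an illustration of the idea (a minimal sketch, not the chapter's exact procedure), the Python snippet below varies one input variable at a time over a grid of candidate values and reports, for each variable, the value that maximizes the predicted probability of a target class under a trained naive Bayes model. All function names, variable names, and data here are illustrative assumptions.

```python
# Minimal sketch (assumed names): for a trained classifier exposing
# predict_proba, vary one input variable at a time over candidate values
# and keep the value that maximizes the predicted probability of a target
# class, holding the other variables fixed.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def best_value_per_variable(model, x, candidate_values, target_class):
    """candidate_values: dict mapping variable index -> iterable of values."""
    class_idx = list(model.classes_).index(target_class)
    results = {}
    for j, values in candidate_values.items():
        probs = []
        for v in values:
            x_mod = x.copy()
            x_mod[j] = v
            probs.append(model.predict_proba(x_mod.reshape(1, -1))[0, class_idx])
        best = int(np.argmax(probs))
        results[j] = (values[best], probs[best])
    return results

# Toy usage: train a naive Bayes model on synthetic data, then explore
# candidate values for variables 0 and 1 of one instance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
nb = GaussianNB().fit(X, y)
print(best_value_per_variable(nb, X[0], {0: np.linspace(-2, 2, 9),
                                         1: np.linspace(-2, 2, 9)}, target_class=1))
```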

Entropy ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. 857 ◽  
Author(s):  
Khalil El Hindi ◽  
Hussien AlSalman ◽  
Safwan Qasem ◽  
Saad Al Ahmadi

Text classification is one domain in which the naive Bayesian (NB) learning algorithm performs remarkably well. However, further improving its performance using ensemble-building techniques has proved to be a challenge because NB is a stable algorithm. This work shows that, while an ensemble of NB classifiers achieves little or no improvement in classification accuracy, an ensemble of fine-tuned NB classifiers can achieve a remarkable improvement in accuracy. We propose a fine-tuning algorithm for text classification that is both more accurate and less stable than the NB algorithm and the fine-tuning NB (FTNB) algorithm. This lower stability makes it more suitable than the FTNB algorithm for building ensembles of classifiers using bagging. Our empirical experiments, using 16 benchmark text-classification data sets, show significant improvement for most data sets.
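
For context, a minimal scikit-learn sketch of bagging naive Bayes text classifiers follows. The fine-tuned NB (FTNB) variant described in the paper is not a standard library component, so plain MultinomialNB stands in as the base learner; the abstract's point is precisely that bagging plain NB yields little gain because the base learner is stable. The toy corpus and labels are invented for demonstration.

```python
# Illustrative sketch only: bagging naive Bayes text classifiers with
# scikit-learn. FTNB is not a library component, so MultinomialNB stands in;
# with a stable base learner like plain NB, bagging is expected to help little.
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "cheap meds buy now", "limited offer click here", "win a cash prize now",
    "free pills huge discount", "claim your prize today", "exclusive deal expires soon",
    "meeting agenda attached", "project status update", "lunch at noon today",
    "quarterly report draft", "schedule the review call", "notes from the standup",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = spam-like, 0 = work-like

single = make_pipeline(TfidfVectorizer(), MultinomialNB())
bagged = make_pipeline(
    TfidfVectorizer(),
    # 'estimator=' is the scikit-learn >= 1.2 name (older releases: 'base_estimator=')
    BaggingClassifier(estimator=MultinomialNB(), n_estimators=20, random_state=0),
)
print("single NB:", cross_val_score(single, texts, labels, cv=3).mean())
print("bagged NB:", cross_val_score(bagged, texts, labels, cv=3).mean())
```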


Author(s):  
CHANG-HWAN LEE

In spite of its simplicity, naive Bayesian learning has been widely used in many data mining applications. However, its unrealistic assumption that all features are equally important negatively impacts its performance. In this paper, we propose a new method that uses a Kullback–Leibler measure to calculate the weights of the features used in naive Bayesian learning. Its performance is compared to that of other state-of-the-art methods over a number of datasets.
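
A hedged sketch of KL-based feature weighting for naive Bayes on discrete features follows (the exact weighting and normalization used in the paper may differ). Each feature's weight is taken here as the expected Kullback–Leibler divergence between the class distribution conditioned on the feature's value and the prior class distribution, and prediction raises each per-feature likelihood to its weight.

```python
# Hedged sketch of KL-based feature weighting for naive Bayes on discrete
# features (the paper's exact weighting/normalization may differ). A feature's
# weight is the expected KL divergence between P(C | x_j = v) and P(C);
# prediction uses log P(c) + sum_j w_j * log P(x_j | c).
import numpy as np
from collections import Counter

def kl_feature_weights(X, y):
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = counts / n
    weights = np.zeros(d)
    for j in range(d):
        for v, count in Counter(X[:, j]).items():
            cond = np.array([(y[X[:, j] == v] == c).mean() for c in classes])
            kl = sum(p * np.log(p / q) for p, q in zip(cond, prior) if p > 0)
            weights[j] += (count / n) * kl
    return classes, prior, weights

def weighted_nb_log_scores(x, X, y, classes, prior, weights, alpha=1.0):
    """Laplace-smoothed, feature-weighted naive Bayes log scores per class."""
    scores = np.log(prior)
    for ci, c in enumerate(classes):
        Xc = X[y == c]
        for j, v in enumerate(x):
            n_vals = len(np.unique(X[:, j]))
            p = (np.sum(Xc[:, j] == v) + alpha) / (len(Xc) + alpha * n_vals)
            scores[ci] += weights[j] * np.log(p)
    return scores

# Toy usage on discrete data.
X = np.array([[1, 0, 2], [1, 1, 2], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]])
y = np.array([1, 1, 0, 0, 1, 0])
classes, prior, w = kl_feature_weights(X, y)
print("feature weights:", w)
print("log scores for [1, 0, 2]:", weighted_nb_log_scores(np.array([1, 0, 2]),
                                                          X, y, classes, prior, w))
```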


2010 ◽  
Vol 105 (4) ◽  
pp. 435-466 ◽  
Author(s):  
Tayeb Kenaza ◽  
Karim Tabia ◽  
Salem Benferhat

2009 ◽  
Vol 2 (1) ◽  
pp. 1174-1185 ◽  
Author(s):  
Barzan Mozafari ◽  
Carlo Zaniolo
