Comparison of Naïve Bayes Algorithm and Decision Tree C4.5 for Hospital Readmission Diabetes Patients using HbA1c Measurement

2019 · Vol 2 (2) · pp. 58
Author(s): Utomo Pujianto, Asa Luki Setiawan, Harits Ar Rosyid, Ali M. Mohammad Salah

Diabetes is a metabolic disorder in which the pancreas does not produce enough insulin or the body cannot use the insulin it produces effectively. The HbA1c test, which measures a patient's average glucose level over the preceding 2-3 months, has become an important step in assessing the condition of diabetic patients. Knowledge of the patient's condition can help medical staff predict the likelihood of readmission, i.e., the patient requiring hospitalization again. The ability to predict readmissions ultimately helps the hospital measure and manage the quality of patient care. This study compares the performance of the Naïve Bayes method and the C4.5 decision tree in predicting readmission of diabetic patients, especially those who have undergone an HbA1c examination. As part of this study we also compare classification performance across several scenarios that combine preprocessing methods, namely the Synthetic Minority Over-sampling Technique (SMOTE) and a wrapper feature selection method, with both classifiers. The scenario combining C4.5 with SMOTE and feature selection produced the best performance in classifying readmissions of diabetic patients, with an accuracy of 82.74%, a precision of 87.1%, and a recall of 82.7%.
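The SMOTE step used in the study above can be illustrated in a few lines: each synthetic sample is an interpolation between a minority-class point and one of its nearest minority-class neighbours. This is a toy sketch, not the authors' implementation, and the `minority` points are hypothetical:

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Toy SMOTE sketch: for each new sample, pick a random minority point,
    find its k nearest minority neighbours (Euclidean), and interpolate
    toward one of them by a random gap in [0, 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D minority-class points
minority = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)]
new_points = smote_oversample(minority, n_new=4)
print(len(new_points))  # 4
```

Because every synthetic point lies on a segment between two real minority points, the oversampled data stays inside the minority class's convex hull, which is the property that makes SMOTE gentler than plain duplication.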

Author(s): I Guna Adi Socrates, Afrizal Laksita Akbar, Mohammad Sonhaji Akbar, Agus Zainal Arifin, Darlis Herumurti

Naïve Bayes is a data mining method commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, Naïve Bayes has a weakness: its assumption that features are independent does not always hold, which affects the accuracy of the calculation. Naïve Bayes therefore needs to be optimized by assigning weights to its features using Gain Ratio. However, weighting Naïve Bayes features causes problems when calculating the probability of each document, because many features in a document do not represent the tested class, so weighted Naïve Bayes alone is still not optimal. This paper proposes an optimization of the Naïve Bayes method using Gain Ratio weighting combined with a feature selection method for text classification. The results of this study indicate that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
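The Gain Ratio weight mentioned above is information gain normalized by the feature's split information, which penalizes features that merely have many distinct values. A minimal sketch with made-up spam/ham data (not the paper's weighting pipeline):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of discrete values, in bits."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature over the labels,
    divided by the feature's split information."""
    total = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / total * entropy(ys) for ys in by_value.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical data: the feature perfectly predicts the class
labels  = ["spam", "spam", "ham", "ham"]
feature = ["free", "free", "plain", "plain"]
print(round(gain_ratio(feature, labels), 3))  # 1.0
```

In a weighted Naïve Bayes, each feature's log-likelihood contribution would then be scaled by its gain ratio, so uninformative features (ratio near 0) barely influence the posterior.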


2014 · Vol 2014 · pp. 1-10
Author(s): Subhajit Dey Sarkar, Saptarsi Goswami, Aman Agarwal, Javed Aktar

With the proliferation of unstructured data, text classification (or text categorization) has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available; naïve Bayes remains one of the oldest and most popular. On one hand, its implementation is simple; on the other, it requires relatively little training data. The literature, however, reports that naïve Bayes performs poorly compared to other classifiers in text classification, which makes it unattractive in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: first a univariate feature selection to reduce the search space, then feature clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
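The two-step idea, filter univariately, then keep only mutually uncorrelated survivors, can be sketched as follows. This uses Pearson correlation as a stand-in for both steps' scores; the paper's actual scoring and clustering choices may differ, and the feature data here is invented:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def two_step_select(features, labels, top_m=3, corr_cap=0.9):
    """Step 1: keep the top_m features by |correlation with the label|.
    Step 2: greedily drop any survivor too correlated with one already kept,
    so the final set is both relevant and relatively independent."""
    ranked = sorted(features,
                    key=lambda f: -abs(pearson(features[f], labels)))[:top_m]
    kept = []
    for f in ranked:
        if all(abs(pearson(features[f], features[g])) < corr_cap for g in kept):
            kept.append(f)
    return kept

labels = [0, 0, 1, 1]
features = {
    "f1": [1.0, 1.1, 3.0, 3.2],  # informative
    "f2": [1.0, 1.1, 3.0, 3.2],  # exact duplicate of f1 -> redundant
    "f3": [0.5, 0.4, 0.6, 0.5],  # weaker but independent signal
}
print(two_step_select(features, labels))  # ['f1', 'f3']
```

The redundant duplicate `f2` survives step 1 (it is as relevant as `f1`) but is eliminated in step 2, which is exactly the failure mode a purely univariate filter cannot catch.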


Healthcare · 2021 · Vol 9 (7) · pp. 884
Author(s): Antonio García-Domínguez, Carlos E. Galván-Tejada, Ramón F. Brena, Antonio A. Aguileta, Jorge I. Galván-Tejada, ...

Children's healthcare is an important issue, especially the prevention of domestic accidents, which has even been defined as a global health problem. Children's activity classification generally relies on sensors embedded in children's clothing, which can produce erroneous measurements due to damage or mishandling. A non-invasive data source for a children's activity classification model adds reliability to the monitoring system in which it is applied. This work proposes environmental sound as a data source for building children's activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on recognizing activities that can potentially trigger domestic accidents, applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Models were generated with three classifiers: naive Bayes, semi-naive Bayes, and tree-augmented naive Bayes. Most of the generated models, combining the feature selection methods and classifiers used, achieve accuracy greater than 97%, from which we conclude that the proposed approach is effective at recognizing activities that can potentially trigger domestic accidents.
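The simplest of the three classifiers named above, naive Bayes with continuous inputs, fits a per-class Gaussian to each feature. Here is a minimal Gaussian naive Bayes sketch; the "sound features" (loudness, dominant frequency) and activity labels are hypothetical, not the paper's dataset:

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class mean/variance per feature,
    prediction by maximum log-posterior."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats, self.priors = {}, {}
        for label, rows in groups.items():
            n = len(rows)
            self.priors[label] = n / len(X)
            feats = []
            for col in zip(*rows):
                mu = sum(col) / n
                var = sum((v - mu) ** 2 for v in col) / n + 1e-9  # smoothed
                feats.append((mu, var))
            self.stats[label] = feats
        return self

    def predict(self, row):
        best, best_lp = None, -math.inf
        for label, stats in self.stats.items():
            lp = math.log(self.priors[label])
            for x, (mu, var) in zip(row, stats):
                # Gaussian log-likelihood, features assumed independent
                lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Hypothetical sound features: (loudness, dominant frequency in kHz)
X = [(0.2, 1.0), (0.3, 1.1), (0.9, 5.0), (1.0, 5.2)]
y = ["playing", "playing", "climbing", "climbing"]
model = GaussianNB().fit(X, y)
print(model.predict((0.95, 5.1)))  # climbing
```

Semi-naive and tree-augmented variants relax the independence assumption by letting selected feature pairs share a conditional dependency, at the cost of estimating more parameters.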


2021 · Vol 5 (3) · pp. 527-533
Author(s): Yoga Religia, Amali Amali

The quality of an airline's services cannot be measured from the company's point of view; it must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its assumption of feature independence is rarely addressed. Some literature suggests attribute weighting to relax the independence assumption, which can be done with particle swarm optimization (PSO) and genetic algorithms (GA) through feature selection. This study compares PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction dataset taken from www.kaggle.com. After testing, the best performance was obtained by the Naïve Bayes model with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.
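PSO-based feature selection typically encodes each particle as a 0/1 feature mask and evolves it toward the mask with the best wrapper fitness. The sketch below is an illustrative binary PSO with a made-up fitness function (in the study it would be the cross-validated accuracy of Naïve Bayes on the selected features); all names and parameters here are assumptions:

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=6, iters=20, seed=1):
    """Toy binary PSO: each particle is a 0/1 feature mask; velocities are
    squashed by a sigmoid into per-bit probabilities of setting the bit."""
    rng = random.Random(seed)
    particles = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocity = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in particles]          # personal bests
    gbest = max(pbest, key=fitness)[:]         # global best
    for _ in range(iters):
        for i, p in enumerate(particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (velocity[i][d] + 2 * r1 * (pbest[i][d] - p[d])
                                    + 2 * r2 * (gbest[d] - p[d]))
                velocity[i][d] = max(-4.0, min(4.0, v))  # clamp (Vmax)
                prob = 1 / (1 + math.exp(-velocity[i][d]))
                p[d] = 1 if rng.random() < prob else 0
            if fitness(p) > fitness(pbest[i]):
                pbest[i] = p[:]
                if fitness(p) > fitness(gbest):
                    gbest = p[:]
    return gbest

# Hypothetical wrapper fitness: reward masks close to selecting features 0 and 2
target = [1, 0, 1, 0, 0]
fitness = lambda mask: -sum(abs(m - t) for m, t in zip(mask, target))
best = binary_pso(5, fitness)
print(best, fitness(best))
```

A GA variant replaces the velocity update with crossover and mutation over the same mask representation, which is why the two methods are natural competitors in this study.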


2021 · Vol 12
Author(s): Fahad Humayun, Fatima Khan, Nasim Fawad, Shazia Shamas, Sahar Fazal, ...

Accurate and fast characterization of the subtype sequences of avian influenza A virus (AIAV) hemagglutinin (HA) and neuraminidase (NA) is essential for expanding diagnostic services and is embedded in molecular epidemiological studies. A new approach is proposed for classifying AIAV HA and NA gene sequences into subtypes using DNA sequence data and physicochemical properties. The method requires only unaligned, full-length or partial HA or NA DNA sequences as input, and allows quick and highly accurate assignment of HA sequences to subtypes H1-H16 and NA sequences to subtypes N1-N9. For feature extraction, k-grams, discrete wavelet transformation, and multivariate mutual information were used, and different classifiers were trained for prediction. Four classifiers, Naïve Bayes, Support Vector Machine (SVM), K-nearest neighbor (KNN), and Decision Tree, were compared using our feature selection method. The comparison is based on the 30% of the dataset held out from the original dataset for testing. Among the four classifiers, Decision Tree performed best, with precision, recall, F1 score, and accuracy of 0.9514, 0.9535, 0.9524, and 0.9571, respectively, showing considerable improvement over the other three classifiers under our method. The results show that the proposed feature selection method, when trained with a Decision Tree classifier, gives the best results for accurate prediction of the AIAV subtype.
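The k-gram feature extraction named above turns a variable-length DNA sequence into a fixed vocabulary of overlapping substring counts, which is what lets unaligned sequences feed a standard classifier. A minimal sketch with a made-up sequence:

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping k-grams (k-mers) in a DNA sequence,
    producing a bag-of-substrings feature representation."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ATGATGCC", k=3)
print(counts["ATG"])  # 2
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the example's 8-base sequence produces 6 counts in total; the resulting count vector (over the 4^k possible k-mers) has the same dimensionality for every input length.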


2010 · Vol 44-47 · pp. 1130-1134
Author(s): Sheng Li, Pei Lin Zhang, Bing Li

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to the classification model, and some are highly correlated with other features; such features harm the classification model. To solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting the representative and optimal features. The K-nearest neighbor algorithm (KNN) is used to diagnose and classify hydraulic system faults. To demonstrate the performance of GA-PLS, data from a model engineering hydraulic system are used, and the results of GA-PLS are compared with using all features and with plain GA. The experimental results show that the proposed feature selection method can diagnose and classify hydraulic system faults more efficiently while using fewer features.
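The KNN classifier used downstream of GA-PLS is straightforward: classify a query by majority vote among its k nearest training points. A self-contained sketch; the pressure/flow feature values and fault labels are hypothetical, not the paper's data:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training
    points, using squared Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (pressure, flow) features for two fault classes
train = [((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
         ((0.9, 0.8), "leak"), ((0.8, 0.9), "leak"), ((0.85, 0.85), "leak")]
print(knn_predict(train, (0.9, 0.9)))  # leak
```

Because KNN distances weight every feature equally, irrelevant or redundant features distort the neighbourhoods, which is precisely why the GA-PLS selection step ahead of it matters.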

