Evaluation of feature selection using information gain and gain ratio on bank marketing classification using Naïve Bayes

2021 ◽  
Vol 1918 (4) ◽  
pp. 042153
Author(s):  
B Prasetiyo ◽  
Alamsyah ◽  
M A Muslim ◽  
N Baroroh
2021 ◽  
Vol 5 (1) ◽  
pp. 332
Author(s):  
Kurniabudi Kurniabudi ◽  
Abdul Harris ◽  
Albertus Edward Mintaria

Large data dimensionality is one of the challenges in anomaly detection. One approach to high-dimensional data is feature selection. An effective feature selection technique yields the most relevant features and can improve a classification algorithm's ability to detect attacks. Many studies have examined feature selection techniques, each using different methods and strategies to find the best, most relevant features. This study compares the Information Gain, Gain Ratio, CFs-BestFirst and CFs-PSO Search techniques. The features selected by the four techniques were then validated with the Naive Bayes, k-NN and J48 classification algorithms. The study uses the ISCX CICIDS-2017 dataset. The test results show that the feature selection technique affects the performance of Naive Bayes, k-NN and J48, and that more relevant and important features can improve detection performance. The results also show that the number of features influences processing time. CFs-BestFirst produces fewer features than CFs-PSO Search, Information Gain and Gain Ratio, so it requires less processing time. In addition, k-NN requires more processing time than Naive Bayes and J48.
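As a rough illustration of the two ranking criteria compared in this abstract, the sketch below computes information gain and gain ratio for discrete features using the standard entropy-based definitions. The toy data and the names `feature_a` and `feature_id` are assumptions made for illustration and are not taken from the study.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself (split information)."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info

# Hypothetical toy data: four records, two classes.
labels = ["attack", "attack", "normal", "normal"]
feature_a = ["x", "x", "y", "y"]    # two values, aligned with the class
feature_id = ["1", "2", "3", "4"]   # unique per record, like an identifier

for name, feature in [("feature_a", feature_a), ("feature_id", feature_id)]:
    print(name, information_gain(feature, labels), gain_ratio(feature, labels))
```

On this toy data both features separate the classes perfectly, so they tie on information gain, but the identifier-like feature receives a lower gain ratio because its split information is larger. That penalty on many-valued features is the usual reason for considering gain ratio alongside information gain.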


2019 ◽  
Vol 17 (1) ◽  
pp. 1
Author(s):  
Muqorobin Muqorobin ◽  
Kusrini Kusrini ◽  
Emha Taufiq Luthfi

The cost of education is an important input component in delivering education, because funding is the main requirement for achieving educational goals. SMK Al-Islam Surakarta is a private educational institution that requires students to pay school fees in the form of Education Development Donations, a routine fee paid every month. According to last year's TU report, around 60% of students were late in paying Education Development Donations, which is a major problem. The purpose of this study is to build a predictive system using the Naïve Bayes method, because the method can classify school fee payments as on time or late. The data were taken from the school's 2017/2018 dapodik data, with a test dataset of 30 records. To determine the level of accuracy, the study used the Naïve Bayes method with the Information Gain method for feature selection. Accuracy was tested with the confusion matrix method. The results showed that the highest accuracy, 90%, was obtained by combining the Naïve Bayes method with the Information Gain method.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Assessing consumer sentiment about culinary food from social media provides useful information for anyone who wants it, especially migrants and tourists; that information is also valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the weight values in the data preprocessing stage are not optimal. This paper proposes a hybrid feature selection model that combines information gain (IG) and a genetic algorithm (GA) to address the suboptimal selection of feature attributes. The experiments showed that, compared with other algorithms, the proposed model produced the best accuracy, 93%.


Author(s):  
I Guna Adi Socrates ◽  
Afrizal Laksita Akbar ◽  
Mohammad Sonhaji Akbar ◽  
Agus Zainal Arifin ◽  
Darlis Herumurti

Naïve Bayes is a data mining method commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, Naïve Bayes has a weakness: its assumption that features are independent does not always hold, which affects the accuracy of the calculation. Naïve Bayes therefore needs to be optimized by assigning weights to its features using Gain Ratio. However, weighting the features causes problems in calculating the probability of each document, because many features in a document do not represent the tested class. Weighted Naïve Bayes is therefore still not optimal. This paper proposes optimizing Naïve Bayes with Gain Ratio weighting and a feature selection method for text classification. The results indicate that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
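The abstract does not give the weighting formula, so the following is only a minimal sketch of one common form of feature-weighted Naïve Bayes, in which each feature's log-likelihood is multiplied by its weight. The function names, the toy documents, and the weights are hypothetical. In practice the weights could come from a gain ratio computation, and a weight of zero has the same effect as removing the feature.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row, weights):
    """Pick the class maximizing log P(c) + sum_i w_i * log P(x_i | c)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(x_i = value | cls)
            p = (value_counts[(i, cls)][value] + 1) / (n_cls + len(values_seen[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical toy data: (keyword, length) pairs with a class label.
rows = [("free", "short"), ("free", "long"), ("meeting", "short"), ("meeting", "long")]
labels = ["spam", "spam", "ham", "ham"]
weights = [1.0, 0.0]  # assumed weights: the uninformative second feature is switched off

model = train(rows, labels)
print(predict(model, ("free", "long"), weights))
```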


2020 ◽  
Vol 4 (3) ◽  
pp. 486
Author(s):  
Bintang Peryoga ◽  
Adiwijaya Adiwijaya ◽  
Widi Astuti

Cancer is a deadly disease that was responsible for 9.6 million deaths in 2018 according to WHO data, so early detection is needed so that it can be treated immediately and cancer deaths can be reduced. Microarray technology can monitor and analyze the expression of cancer genes, but microarray data have high dimensionality and small sample sizes, so dimension reduction is needed for an optimal classification process. Dimension reduction lowers the number of features used in classification by selecting the most influential ones. A hybrid method is one form of dimension reduction that combines a Filter method with a Wrapper method to gain the advantages of both. In this study, the researchers combined Naïve Bayes with Hybrid Feature Selection (Information Gain - Genetic Algorithm) on microarray cancer data for Lung Cancer, Ovarian Cancer, Breast Cancer, Colon Tumors, and Prostate Tumors. The data were obtained from the Kent-Ridge Biomedical Dataset. Of the 5 datasets used, 4 achieved an accuracy between 87% and 100%, while the prostate tumor data achieved the lowest accuracy, 61.14%. The feature selection and classification of the 5 cancer datasets used fewer than 63 features to obtain these accuracies.


2020 ◽  
Vol 11 (1) ◽  
pp. 1
Author(s):  
Riska Wibowo ◽  
Henny Indriyawati

Abstract. Hepatitis, one of the world's public health problems, is an inflammatory liver disease caused by viruses, bacterial infection, and chemical substances including drugs and alcohol. In this research, the weight of each attribute in a high-dimensional hepatitis dataset was calculated using the weight information gain method. Attributes were then selected using the top-k method and classified using the Naïve Bayes algorithm. The research showed that 9 of the 20 attributes were selected as the top 9, with an accuracy of 85.57%. This research can serve as a consideration in decision making in various fields related to feature selection and the Naïve Bayes algorithm, and in predicting hepatitis.
Keywords: data mining, weight information gain, Naïve Bayes algorithm
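The top-k step described in this abstract can be sketched as a simple ranking of attributes by weight. The attribute names and weight values below are hypothetical placeholders, not values from the hepatitis study, and the helper name `top_k_features` is an assumption.

```python
def top_k_features(weights, k):
    """Return the k attribute names with the largest weights, best first."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical attribute names and information gain weights, for illustration only.
weights = {"attr_a": 0.42, "attr_b": 0.05, "attr_c": 0.31, "attr_d": 0.17}

selected = top_k_features(weights, k=2)
print(selected)
```

Only the selected attributes would then be passed to the classifier; the value of k is a tuning choice that is typically settled by comparing accuracy across several candidates.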


Many fraudulent transactions in the online world affect financial institutions, and credit card fraud is the most frequently occurring problem. Credit card fraud is the situation in which fraudsters misuse credit cards for illegal purposes. Hence, detection of fraudulent transactions is essential. Several researchers have worked on detecting fraudulent transactions and proposed solutions, which are surveyed in this paper. This study contributes to research on the detection of credit card fraud through Machine Learning algorithms such as Decision Tree and Naïve Bayes. The data were taken from Kaggle and split into training (80%) and testing (20%) sets. The whole experiment was performed in Jupyter Notebook, for which Anaconda Navigator was installed. A heatmap is used to visualize the data in color. The main aim of this work is to balance the dataset with the Near-Miss Under-sampling Method. The information gain method is applied for feature selection. The best algorithm found in this paper is Decision Tree, with 97% accuracy compared to 90% for Naïve Bayes. The results are evaluated on Accuracy, Recall, Precision, and F1-score. The paper also shows the ROC Curve and Precision-Recall Curve of the algorithm.
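For reference, the four evaluation measures named in this abstract follow directly from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and do not reproduce the paper's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: fraud is the positive class.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as fraud detection, accuracy alone can look high even when many fraudulent cases are missed, which is why recall, precision and F1-score are usually reported alongside it.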


Customer Relationship Management (CRM) analyzes datasets to find insights that help frame business strategy for improving enterprises. Analyzing CRM data requires computationally intensive models, and Machine Learning (ML) algorithms help in analyzing such high-dimensional datasets. In most real-world datasets, the strong independence assumption that Naive Bayes (NB) makes between attributes is violated, and other drawbacks such as irrelevant, partially irrelevant and redundant data lead to poor prediction performance. Feature selection is a preprocessing method applied to enhance the prediction of the NB model. Empirical experiments are conducted on NB with feature selection and NB without feature selection. This paper presents an empirical study of attribute selection using five different filter-based feature selection methods: Relief-F, Pearson correlation (PCC), Symmetrical Uncertainty (SU), Gain Ratio (GR) and Information Gain (IG).


2019 ◽  
Vol 15 (2) ◽  
pp. 211-218
Author(s):  
Bobby Suryo Prakoso ◽  
Didi Rosiyadi ◽  
Dedi Aridarma ◽  
Heru Sukma Utama ◽  
Fariz Fauzi ◽  
...  

This study concerns news classification optimized by combining algorithms. The dataset was taken from online news sites. The algorithms used are the Naive Bayes Classifier and Random Forest, with Information Gain feature selection weighting. The dataset contains 615 records in 3 news categories or themes. Six modeling scenarios were compared to determine which achieves the best score; according to the results, the best score was obtained by the model combining Remove Useless Attributes, Naive Bayes Classifier-Multinomial, and Random Forest-Feature Selection Information Gain. The evaluation yielded an accuracy of 85.67%, a recall of 85.67%, and a precision of 86.23

