Evaluation of feature selection using information gain and gain ratio on bank marketing classification using naïve bayes

High data dimensionality is one of the main issues in anomaly detection. One approach to reducing it is feature selection. An effective feature selection technique yields the most relevant features and can improve a classification algorithm's ability to detect attacks. Many studies have examined feature selection techniques, each using different methods and strategies to find the best and most relevant features. This study compares the Information Gain, Gain Ratio, CFs-BestFirst and CFs-PSO Search techniques. The features selected by the four techniques were then validated with the Naive Bayes, k-NN and J48 classification algorithms. The study uses the ISCX CICIDS-2017 dataset. The test results show that the feature selection technique affects the performance of Naive Bayes, k-NN and J48, and that more relevant and important features improve detection performance. The results also show that the number of features influences the processing time. CFs-BestFirst produces fewer features than CFs-PSO Search, Information Gain and Gain Ratio, so it requires less processing time. In addition, k-NN requires more processing time than Naive Bayes and J48.
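
As a minimal sketch of the two ranking criteria named in the abstract, assuming discrete-valued features: information gain can be computed as the drop in class entropy after splitting on a feature, and gain ratio as that gain divided by the feature's own entropy (its split information). The function names and the toy data below are illustrative assumptions and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = ["attack", "attack", "benign", "benign"]
feature_a = ["x", "x", "y", "y"]
feature_b = ["p", "q", "p", "q"]

scores = {
    "feature_a": (information_gain(feature_a, labels), gain_ratio(feature_a, labels)),
    "feature_b": (information_gain(feature_b, labels), gain_ratio(feature_b, labels)),
}
```

Ranking features by either score and keeping the highest-scoring ones gives a filter-style selection; gain ratio penalizes features that split the data into many small groups.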

Optimization of the Naive Bayes Method with Information Gain Feature Selection for Predicting Late Payment of School Tuition Fees (SPP)

Jurnal Ilmiah SINUS ◽

10.30646/sinus.v17i1.378 ◽

2019 ◽

Vol 17 (1) ◽

pp. 1

Author(s):

Muqorobin Muqorobin ◽

Kusrini Kusrini ◽

Emha Taufiq Luthfi

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Confusion Matrix ◽

Education Institution ◽

Naïve Bayes ◽

Bayes Method ◽

Education Development ◽

Main Requirement ◽

Naive Bayes Method

The cost of education is a very important input component in providing education, because funding is the main requirement for achieving educational goals. SMK Al-Islam Surakarta is a private education institution that requires students to pay school fees in the form of Education Development Donations, a routine fee paid every month. According to last year's TU report, around 60% of students were late in paying the Education Development Donation, which is a major problem. The purpose of this study is to build a predictive system using the Naïve Bayes method, because the method can classify payments of school fees as on time or late. The data were taken from the school's dapodik data for 2017/2018, with a test dataset of 30 records. To determine the level of accuracy, the study used the Naive Bayes method together with the Information Gain method for feature selection. Accuracy was tested using the Confusion Matrix method. The results showed that the highest accuracy, 90%, was obtained by combining the Naive Bayes method with the Information Gain method.
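
The confusion-matrix accuracy test mentioned in the abstract can be sketched as follows. The abstract reports only the test-set size (30 records) and the final accuracy (90%); the individual labels below are invented so that 27 of 30 predictions are correct, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive="late"):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == positive and p == positive for a, p in pairs),
        "tn": sum(a != positive and p != positive for a, p in pairs),
        "fp": sum(a != positive and p == positive for a, p in pairs),
        "fn": sum(a == positive and p != positive for a, p in pairs),
    }


def accuracy(counts):
    """Share of all predictions that fall on the matrix diagonal."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Hypothetical 30-record test set (labels invented for illustration only).
actual = ["late"] * 18 + ["on_time"] * 12
predicted = ["late"] * 16 + ["on_time"] * 2 + ["on_time"] * 11 + ["late"] * 1

counts = confusion_counts(actual, predicted)
acc = accuracy(counts)  # 27 correct out of 30
```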

Opinion Mining on Culinary Food Customer Satisfaction Using Naïve Bayes Based-on Hybrid Feature Selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i1.pp468-475 ◽

2019 ◽

Vol 15 (1) ◽

pp. 468 ◽

Cited By ~ 3

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Feature Selection ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Classification Model ◽

Consumer Ratings ◽

Bayes Algorithm ◽

Restaurant Owners

Assessing consumer sentiment about culinary food from social media gives useful information to anyone who wants it, especially migrants and tourists; on the other hand, that information is very valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the weights of the values in the data preprocessing process are not optimal. This paper proposes a hybrid feature selection model that combines information gain (IG) and a genetic algorithm (GA) to overcome the problems in selecting feature attributes. The experimental results showed that, compared with other algorithms, the proposed model produced the best accuracy, 93%.

Naive Bayes Optimization with Feature Selection and Gain Ratio Weighting

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2016.v07.i01.p03 ◽

2016 ◽

pp. 22

Author(s):

I Guna Adi Socrates ◽

Afrizal Laksita Akbar ◽

Mohammad Sonhaji Akbar ◽

Agus Zainal Arifin ◽

Darlis Herumurti

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Feature Selection Method ◽

Simple Algorithm ◽

Selection Method ◽

Naïve Bayes ◽

Computation Complexity ◽

Bayes Method ◽

Gain Ratio ◽

Bayes Methods

Naïve Bayes is a data mining method commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, its weakness is that the independence assumption among features does not always hold, which affects the accuracy of the calculation. Naïve Bayes therefore needs to be optimized by assigning Gain Ratio weights to its features. However, weighting the features causes problems in calculating the probability of each document, because many features in a document do not represent the class being tested. Weighted Naïve Bayes alone is therefore still not optimal. This paper proposes optimizing the Naïve Bayes method with Gain Ratio weighting and a feature selection method for text classification. The results show that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
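
One common way to write a feature-weighted Naïve Bayes decision rule is shown below as a sketch. It assumes each feature's log-likelihood is scaled by a weight $w_i$ set to that feature's gain ratio; the abstract does not give the exact formulation used in the paper, so the symbols here are illustrative.

```latex
\hat{c} \;=\; \arg\max_{c} \Big[ \log P(c) \;+\; \sum_{i=1}^{n} w_i \,\log P(x_i \mid c) \Big],
\qquad w_i = \mathrm{GainRatio}(x_i)
```

With all $w_i = 1$ this reduces to standard Naïve Bayes; feature selection corresponds to dropping terms from the sum entirely rather than merely down-weighting them.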

Cancer Detection Based on Microarray Data Using the Naïve Bayes Method and Hybrid Feature Selection

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v4i3.2096 ◽

2020 ◽

Vol 4 (3) ◽

pp. 486

Author(s):

Bintang Peryoga ◽

Adiwijaya Adiwijaya ◽

Widi Astuti

Keyword(s):

Feature Selection ◽

Dimension Reduction ◽

Naive Bayes ◽

Information Gain ◽

Feature Selection Method ◽

Naïve Bayes ◽

Small Sample ◽

Filter Method ◽

Cancer Genes ◽

Cancer Data

Cancer is a deadly disease that was responsible for 9.6 million deaths in 2018 according to WHO data, so early detection is needed so that it can be treated immediately and cancer deaths can be reduced. Microarray technology can monitor and analyze the expression of cancer genes, but microarray data have high dimensionality and few samples, so dimension reduction is needed for an optimal classification process. Dimension reduction lowers the number of features used in classification by selecting the most influential ones. The hybrid method is one form of dimension reduction that combines the Filter method with the Wrapper method to obtain the advantages of both. In this study, the researchers combined Naïve Bayes with Hybrid Feature Selection (Information Gain - Genetic Algorithm) on microarray data for Lung Cancer, Ovarian Cancer, Breast Cancer, Colon Tumors, and Prostate Tumors. The data were obtained from the Kent-Ridge Biomedical Dataset. The results showed that 4 of the 5 datasets obtained an accuracy between 87% and 100%, while the prostate tumor data obtained the lowest accuracy, 61.14%. The feature selection and classification of the 5 cancer datasets used fewer than 63 features to obtain these accuracies.

Iterative Feature Selection Using Information Gain & Naïve Bayes for Document Classification

2018 21st International Conference of Computer and Information Technology (ICCIT) ◽

10.1109/iccitechn.2018.8631971 ◽

2018 ◽

Author(s):

Chowdhury Mofizur Rahman ◽

Lameya Afroze ◽

Naznin Sultana Refath ◽

Nafin Shawon

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Document Classification

Top-k Feature Selection for Hepatitis Detection Using the Naïve Bayes Algorithm

Jurnal Buana Informatika ◽

10.24002/jbi.v11i1.2456 ◽

2020 ◽

Vol 11 (1) ◽

pp. 1

Author(s):

Riska Wibowo ◽

Henny Indriyawati

Keyword(s):

Data Mining ◽

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Decision Making Process ◽

Accuracy Rate ◽

Chemical Substances ◽

Drugs And Alcohol ◽

Bayes Algorithm

Abstract. Hepatitis, one of the world's public health problems, is an inflammatory liver disease caused by viruses, bacterial infection, or chemical substances including drugs and alcohol. In this research, the weight of each attribute in the high-dimensional hepatitis dataset was calculated using the weight information gain method. The attributes were then selected using the top-k method and classified using the Naïve Bayes algorithm. The research showed that, out of 20 attributes, the top 9 were selected, giving an accuracy rate of 85.57%. This research can serve as a consideration in decision-making processes in various fields related to feature selection and the Naïve Bayes algorithm, and in predicting hepatitis.

Keywords: data mining, weight information gain, Naïve Bayes algorithm
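
The top-k step described in the abstract can be sketched as a simple ranking over per-attribute weights. The attribute names and weight values below are hypothetical placeholders, not the paper's results, and `top_k_features` is an illustrative helper name.

```python
def top_k_features(weights, k):
    """Return the names of the k attributes with the largest weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical information-gain weights for five attributes.
weights = {
    "attr_a": 0.12,
    "attr_b": 0.31,
    "attr_c": 0.05,
    "attr_d": 0.27,
    "attr_e": 0.19,
}

selected = top_k_features(weights, k=3)
```

In the study's setting, k would be 9 out of 20 attributes, and only the selected columns would be passed on to the Naïve Bayes classifier.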

Credit Card Fraud Detection using Imbalance Resampling Method with Feature Selection

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/811032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 2061-2071

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Credit Card ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Near Miss ◽

Credit Card Fraud ◽

Under Sampling

Many fraudulent transactions occur online and affect various financial institutions, but credit card fraud is the most frequent problem. Credit card fraud is the situation in which fraudsters misuse credit cards for illegal purposes. Hence, detection of fraudulent transactions is essential. Several researchers have worked on detecting fraudulent transactions and proposed solutions, which are surveyed in this paper. This study contributes to research on the detection of credit card fraud through Machine Learning Algorithms such as Decision Tree and Naive Bayes. The data were taken from Kaggle and split into training (80%) and testing (20%) sets. The whole experiment was performed in Jupyter Notebook, installed through Anaconda Navigator. A heatmap is used to visualize the data. The main aim of this work is to balance the dataset with the Near-Miss Under-sampling Method. The information gain method is applied for feature selection. The best algorithm found in this paper is Decision Tree, with 97% accuracy, compared with 90% for Naïve Bayes. The results are evaluated on Accuracy, Recall, Precision, and F1-score. The paper also shows the ROC Curve and Precision-Recall Curve of the algorithm.
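
A minimal sketch of Near-Miss under-sampling follows, assuming the NearMiss-1 variant: keep the majority-class points whose mean distance to their k nearest minority-class points is smallest, until both classes have the same size. The abstract does not say which variant or library the authors used, so the function name, the value of k, and the toy points are assumptions.

```python
from math import dist


def near_miss_1(majority, minority, k=3):
    """Keep len(minority) majority points closest (on average) to the minority class."""

    def mean_nearest_distance(point):
        # Mean distance from this majority point to its k nearest minority points.
        nearest = sorted(dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=mean_nearest_distance)[: len(minority)]


# Hypothetical 2-D points: five legitimate transactions, two fraudulent ones.
majority = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
minority = [(0.0, 1.0), (1.0, 0.0)]

kept = near_miss_1(majority, minority, k=2)
```

After under-sampling, the kept majority points and all minority points would form the balanced dataset that is then split into training and testing sets.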

Performance Evaluation of Naive Bayes Classifier with and without Filter Based Feature Selection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9376.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 2154-2158

Keyword(s):

Feature Selection ◽

Business Strategy ◽

Naive Bayes ◽

Information Gain ◽

Pearson Correlation ◽

Poor Performance ◽

Naïve Bayes ◽

Customer Relationship ◽

Attribute Selection ◽

Redundant Data

Customer Relationship Management (CRM) analyzes datasets to find insights that help frame business strategy for the improvement of enterprises. Analyzing CRM data requires computationally intensive models. Machine Learning (ML) algorithms help in analyzing such high-dimensional datasets. In most real-time datasets, the strong independence assumption of Naive Bayes (NB) between attributes is violated, and other drawbacks such as irrelevant, partially irrelevant and redundant data lead to poor prediction performance. Feature selection is a preprocessing method applied to enhance the prediction of the NB model. Empirical experiments are conducted on NB with and without feature selection. This paper presents an empirical study of attribute selection using five different filter-based feature selection methods: Relief-F, Pearson correlation (PCC), Symmetrical Uncertainty (SU), Gain Ratio (GR) and Information Gain (IG).
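
Of the five filters listed, Symmetrical Uncertainty is perhaps the least familiar. A minimal sketch for discrete variables, using the standard definition SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) with mutual information obtained from the joint entropy, is given below. The function names and toy data are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU in [0, 1]: 0 for independent variables, 1 for fully dependent ones."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Mutual information from marginal and joint entropies.
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)


# Hypothetical attribute/class pairs.
su_dependent = symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1])
su_independent = symmetrical_uncertainty([0, 1, 0, 1], [0, 0, 1, 1])
```

Because SU is normalized by both entropies, it is symmetric in its arguments and less biased toward many-valued attributes than raw information gain.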

Optimizing News Classification Using Information Gain Features for the Naive Bayes Algorithm Combined with Random Forest

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.684 ◽

2019 ◽

Vol 15 (2) ◽

pp. 211-218

Author(s):

Bobby Suryo Prakoso ◽

Didi Rosiyadi ◽

Dedi Aridarma ◽

Heru Sukma Utama ◽

Fariz Fauzi ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Feature Information

This study concerns news classification optimized by combining algorithms. The dataset was taken from online news sites. The algorithms used are the Naive Bayes Classifier and Random Forest, with Information Gain feature-selection weighting. The dataset contains 615 records in 3 news categories or themes. Six model scenarios were compared to determine which achieves the best result; the best result was obtained by the model combining Remove Useless Attributes, Naive Bayes Classifier-Multinomial, and Random Forest-Feature Selection Information Gain. The evaluation yielded an accuracy of 85.67%, a recall of 85.67%, and a precision of 86.23
