Attribute Selection Using Information Gain and Naïve Bayes for Traffic Classification

2019 ◽  
Vol 1196 ◽  
pp. 012021
Author(s):  
Ahmad Fali Oklilas ◽  
Tasmi ◽  
Sri Desy Siswanti ◽  
Mira Afrina ◽  
Herri Setiawan

Customer Relationship Management (CRM) analyzes datasets to find insights that in turn help frame business strategy for the improvement of enterprises. Analyzing CRM data requires highly intensive models, and Machine Learning (ML) algorithms help in analyzing such high-dimensional datasets. In most real-time datasets, the strong independence assumption that Naive Bayes (NB) makes between attributes is violated; together with other drawbacks such as irrelevant, partially irrelevant, and redundant data, this leads to poor prediction performance. Feature selection is a preprocessing method applied to enhance the prediction of the NB model. This paper presents an empirical study of attribute selection with five different filter-based feature selection methods: Relief-F, Pearson correlation (PCC), Symmetrical Uncertainty (SU), Gain Ratio (GR), and Information Gain (IG). Empirical experiments compare NB with feature selection against NB without feature selection.
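As a rough illustration of the Information Gain filter named in this abstract, the sketch below scores each discrete attribute by how much it reduces the entropy of the class label and ranks the attributes accordingly. It is not the authors' implementation: the function names and the toy CRM-style records (`plan`, `region`, `churn`/`stay`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG(class; attribute) = H(class) - H(class | attribute)."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted by information gain, highest first."""
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "plan" fully determines the label, "region" does not.
rows = [
    {"plan": "basic", "region": "north"},
    {"plan": "basic", "region": "south"},
    {"plan": "premium", "region": "north"},
    {"plan": "premium", "region": "south"},
]
labels = ["churn", "churn", "stay", "stay"]
ranking = rank_attributes(rows, labels)
```

A filter of this kind would typically keep only the top-ranked attributes before training the NB model; how many to keep is a tuning choice the abstract does not specify.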


2019 ◽  
Vol 15 (2) ◽  
pp. 247-254
Author(s):  
Heru Sukma Utama ◽  
Didi Rosiyadi ◽  
Dedi Aridarma ◽  
Bobby Suryo Prakoso

Sentiment analysis of the odd-even system on the Bekasi toll road using the Naïve Bayes algorithm is a process of automatically understanding, extracting, and processing textual data from social media. The purpose of this study was to determine the accuracy, recall, and precision of opinion mining with the Naïve Bayes algorithm, in order to provide information on community sentiment on social media towards the effectiveness of the odd-even system on the Bekasi toll road. The research method was text mining of comments on posts about the odd-even system on the Bekasi toll road on Twitter, Instagram, YouTube, and Facebook. The steps were preprocessing, transformation, data mining, and evaluation, followed by Information Gain feature selection, select by weight, and application of the NB model. The confusion matrix for the NB model gave an accuracy of 79.55%, a precision of 80.51%, and a sensitivity (recall) of 80.91%. The study concludes that the Support Vector Machine algorithm can analyze odd-even sentiment on the Bekasi toll road.
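The accuracy, precision, and recall figures quoted here are all derived from a confusion matrix. A minimal sketch of that arithmetic for the two-class case follows; the function name and the counts passed to it are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

With more than two sentiment classes, precision and recall are usually computed per class and then averaged; the abstract does not say which averaging was used.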


2019 ◽  
Vol 17 (1) ◽  
pp. 1
Author(s):  
Muqorobin Muqorobin ◽  
Kusrini Kusrini ◽  
Emha Taufiq Luthfi

The cost of education is a very important input component in implementing education, because funding is the main requirement for achieving educational goals. SMK Al-Islam Surakarta is a private educational institution that requires students to pay school fees in the form of Education Development Donations, a routine fee paid every month. According to last year's TU report, around 60% of students were late in paying Education Development Donations, which is a major problem. The purpose of this study is to build a predictive system using the Naïve Bayes method, since the method can classify school fee payments as on time or late. The data were taken from the school's Dapodik data for 2017/2018, with a test dataset of 30 records. The research used the Naive Bayes method with the Information Gain method for feature selection, and accuracy was tested with a confusion matrix. The results showed that the highest accuracy, 90%, was obtained by combining the Naive Bayes method with the Information Gain method.
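To make the classification step concrete, here is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-one) smoothing that labels a payment as late or on time. It is an assumption-laden illustration, not the system built in the study: the attribute names (`income`, `scholarship`), the records, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    domains = defaultdict(set)           # attribute -> set of values seen in training
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(label, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest log-posterior under the NB assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_label in class_counts.items():
        score = log(n_label / total)  # log prior
        for attr, value in row.items():
            count = value_counts[(label, attr)][value]
            # Laplace smoothing so unseen values do not zero out the product.
            score += log((count + 1) / (n_label + len(domains[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training records for illustration only.
rows = [
    {"income": "low", "scholarship": "no"},
    {"income": "low", "scholarship": "no"},
    {"income": "low", "scholarship": "yes"},
    {"income": "high", "scholarship": "no"},
    {"income": "high", "scholarship": "yes"},
]
labels = ["late", "late", "on_time", "on_time", "on_time"]
model = train_naive_bayes(rows, labels)
first = predict(model, {"income": "low", "scholarship": "no"})
second = predict(model, {"income": "high", "scholarship": "yes"})
```

Working in log space avoids numerical underflow when many attributes are multiplied together, which is the usual reason for summing log-probabilities instead of multiplying probabilities.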


2019 ◽  
Vol 3 (2) ◽  
pp. 227-232
Author(s):  
Bobby Suryo Prakoso ◽  
Didi Rosiyadi ◽  
Heru Sukma Utama ◽  
Dedi Aridarma

This research is part of text mining for classifying news content that has already been labeled by news category on the detik.com site. The process consists of modeling and data processing, from pre-processing and Information Gain feature selection to applying the Naive Bayes Classifier algorithm model with Bayesian Boosting. This model achieved evaluation values for accuracy, recall, and precision of 73.2%. Meanwhile, the more compact model, namely the Naive Bayes Classifier algorithm model with Bayesian Boosting, achieved the same evaluation value of 73.2%. The evaluation of the models concludes that applying Information Gain feature selection does not have a large effect on performance improvement under the Polynomial label condition.


2020 ◽  
Vol 4 (1) ◽  
pp. 76-85
Author(s):  
Dwi Yuni Utami ◽  
Elah Nurlelah ◽  
Noer Hikmah

Liver disease is an inflammatory disease of the liver that can leave the liver unable to function normally and can even cause death. According to WHO (World Health Organization) data, almost 1.2 million people per year, especially in Southeast Asia and Africa, die from liver disease. A common problem is the difficulty of recognizing liver disease early, even when the disease has already spread. This study compares and evaluates the Naive Bayes algorithm as the selected algorithm against the Naive Bayes algorithm based on Genetic Algorithm (GA) and Bagging, to find out which has the higher accuracy in predicting liver disease, using a dataset taken from the UCI (University of California, Irvine) Machine Learning Repository. Evaluation with both the confusion matrix and the ROC curve showed that Naive Bayes optimized with Genetic Algorithm and Bagging has a higher accuracy than the Naive Bayes algorithm alone. The accuracy of the Naive Bayes model is 66.66%, and the accuracy of the Naive Bayes model with attribute selection using Genetic Algorithm and Bagging is 72.02%, a difference of 5.36 percentage points.
Keywords: Liver Disease, Naïve Bayes, Genetic Algorithms, Bagging.


Author(s):  
Muqorobin Muqorobin ◽  
Kusrini Kusrini ◽  
Siti Rokhmah ◽  
Isnawati Muslihah

The Surakarta Al-Islam Vocational School is a private educational institution that requires all students to pay school tuition fees. Education is an obligation for all Indonesian citizens, and the cost of education is one of the most important input components in implementing it, because funding is the main requirement for achieving educational goals. School tuition (SPP) is a routine fee paid every month. According to last year's School Admin report, around 60% of students were late in paying school tuition fees. This is a very big problem because the school's funding comes from tuition. The purpose of this research is to build a prediction system using the best classification method, by comparing the accuracy of the Naïve Bayes method with that of the K-Nearest Neighbor (K-NN) method, since both methods can classify tuition payments as on time or late. The study uses 236 records of Dapodik data for 2017/2018. To improve accuracy, the researchers also apply feature selection with Information Gain, which is useful for selecting optimal parameters. System testing is carried out with a confusion matrix. The final results indicate that the Naïve Bayes method with Information Gain produces the highest accuracy, 95%, compared with 85% for the Naïve Bayes method alone and 81% for the K-NN method.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Assessing consumer sentiment about culinary food from social media gives useful information to anyone who wants it, especially migrants and tourists; on the other hand, the information is very valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the value weights in the data preprocessing step are not optimal. This paper proposes a hybrid feature selection model that addresses the not-yet-optimal selection of feature attributes by combining the information gain (IG) and genetic algorithm (GA) methods. The experimental results showed that, compared with other algorithms, the proposed model produced the best accuracy, 93%.


2020 ◽  
Vol 7 (3) ◽  
pp. 599
Author(s):  
Arif Bijaksana Putra Negara ◽  
Hafiz Muhardi ◽  
Indira Melinda Putri

Nowadays people tend to order airplane tickets through online booking sites. Pegipegi.com is a website that provides ticket reservations and a review section where visitors can express their opinions. Other visitors who read the reviews can get a more objective picture of the airlines. The number of user reviews on the pegipegi.com website is now so large that reading them all is difficult and time consuming. Sentiment analysis was therefore designed to help classify reviews into positive or negative categories, so that airline recommendations can be given based on the number of reviews in each category. The method applied for sentiment classification is Naïve Bayes with Information Gain feature selection. The purpose of this study was to determine the effect of Information Gain feature selection on classification accuracy and to show that the Naïve Bayes method with Information Gain can be used for sentiment classification. The test results show that the average value of accuracy, precision, and recall after adding Information Gain is better, at 0.865, than before adding Information Gain, when it was 0.81.

