Perbandingan Algoritma Naive Bayes dan C.45 dalam Klasifikasi Data Mining [Comparison of the Naive Bayes and C4.5 Algorithms in Data Mining Classification]

In this paper, the Naive Bayes and C4.5 algorithms (C4.5 is written "C.45" in the original) are applied to four case studies to find the better algorithm for each: eligibility for the "Kartu Indonesia Sehat" (Healthy Indonesia Card), approval of credit card applications at a bank, determination of birth age, and eligibility of prospective credit members at a cooperative (koperasi). The two algorithms are then compared on precision, recall, and accuracy for each given set of training and testing data. An application implementing Naive Bayes and C4.5 for the four cases was built; black-box testing and algorithm testing gave valid results, showing that it implements both algorithms correctly. The tests indicate that precision, recall, and accuracy increase as more training data is used. Neither algorithm is the absolute best in every case. For the Kartu Indonesia Sehat case, the two algorithms are equally effective. For credit card applications at a bank, C4.5 is better than Naive Bayes. For the determination of birth age, Naive Bayes is better than C4.5. For the eligibility of prospective credit members at the cooperative, Naive Bayes gives better precision, while C4.5 gives better recall and accuracy. The best algorithm for a given case therefore depends on the criteria, variables, and amount of data in that case.
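As a rough illustration of the kind of classifier being compared, here is a minimal sketch of a categorical Naive Bayes. It is not the paper's application: the attributes (`income`, `collateral`), the toy records, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class with the highest log-posterior under the naive assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing (an assumption here) avoids log(0) for unseen values.
            score += math.log((count + 1) / (n + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (income, collateral) -> decision
rows = [("high", "yes"), ("high", "no"), ("low", "yes"),
        ("low", "no"), ("high", "yes"), ("low", "no")]
labels = ["approve", "approve", "reject", "reject", "approve", "reject"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("high", "no"))` picks the class whose prior times smoothed attribute likelihoods is largest; on this toy data that is `"approve"`.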


Peringkasan dan Support Vector Machine pada Klasifikasi Dokumen [Summarization and Support Vector Machine in Document Classification]

JURNAL INFOTEL ◽ 10.20895/infotel.v9i4.312 ◽ 2017 ◽ Vol 9 (4) ◽ pp. 416 ◽ Cited By ~ 1

Author(s): Nelly Indriani Widiastuti ◽ Ednawati Rainarli ◽ Kania Evita Dewi

Keyword(s): Support Vector Machine ◽ Naive Bayes ◽ Naïve Bayes ◽ Training Data ◽ Support Vector ◽ Good Reputation ◽ Multiclass Support Vector Machine ◽ Simple Logistic ◽ Better Than

Classification is the process of grouping objects that share the same features or characteristics into classes. Automatic document classification uses the frequencies of words that appear in the training data as features. As the number of documents grows, so does the number of words used as features. Summaries are therefore used to reduce the number of words involved in classification. Classification is performed with a multiclass Support Vector Machine (SVM), a method considered to have a good reputation in classification. This research tests the effect of summarization as a feature selection step in document classification. The summaries reduce the text to 50%. The results show that summarization did not affect the accuracy of document classification with SVM, but it did improve the accuracy of the Simple Logistic classifier. The classification tests also show that Naïve Bayes Multinomial (NBM) achieves better accuracy than SVM.
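The idea of word-frequency features, and of a summary shrinking the feature set, can be sketched as follows. The abstract does not describe the summarizer, so `naive_summary` (keep the leading half of the sentences) is only a hypothetical stand-in, and the two sample documents are invented for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Every distinct word across the documents becomes one feature."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def to_frequency_vector(text, vocabulary):
    """Represent a text as word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def naive_summary(text, ratio=0.5):
    """Hypothetical stand-in summarizer: keep the leading share of sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    keep = max(1, round(len(sentences) * ratio))
    return ". ".join(sentences[:keep]) + "."

docs = [
    "The bank approved the loan. The customer paid on time. "
    "Rates fell again. Markets closed higher.",
    "The team won the match. Fans cheered loudly. "
    "The coach praised the defence. Tickets sold out.",
]
full_vocab = build_vocabulary(docs)
summary_vocab = build_vocabulary([naive_summary(d) for d in docs])
```

Comparing `len(full_vocab)` with `len(summary_vocab)` shows how summarizing before feature extraction reduces the number of word features a classifier has to handle.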


Data Mining Optimization Using Sample Bootstrapping and Particle Swarm Optimization in the Credit Approval Classification

Indonesian Journal of Artificial Intelligence and Data Mining ◽ 10.24014/ijaidm.v2i1.6299 ◽ 2019 ◽ Vol 2 (1)

Author(s): Andre Alvi Agustian ◽ Achmad Bisri

Keyword(s): Data Mining ◽ Particle Swarm Optimization ◽ Credit Card ◽ Naive Bayes ◽ Particle Swarm ◽ Class Imbalance ◽ Naïve Bayes ◽ Swarm Optimization ◽ The Status ◽ Auc Value

Credit approval is a process carried out by a bank or credit provider, based on credit requests and credit proposals from borrowers. It is often difficult for banks and credit providers because of the number of requests and the classification that must be made on the various data submitted. This study aims to enable banks or credit card issuers to carry out the credit approval process effectively and accurately when determining the status of submitted applications. It applies data mining techniques to the Credit Approval dataset from the UCI Machine Learning Repository, which exhibits class imbalance. 14 attributes are used as system inputs. The study uses the C4.5 and Naive Bayes algorithms, optimized with Sample Bootstrapping and Particle Swarm Optimization (PSO) so that the results reach good accuracy and fall into the good classification category. With optimization, C4.5 improves from an accuracy of 85.99% and an AUC of 0.904 to an accuracy of 94.44% and an AUC of 0.969, and Naive Bayes improves from an accuracy of 83.09% and an AUC of 0.916 to an accuracy of 90.10% and an AUC of 0.944.
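Bootstrap sampling, one of the two optimization steps named above, draws records with replacement. The sketch below shows the plain version and a per-class variant that could be used against class imbalance; the abstract does not say which variant the study used, so the per-class balancing, the function names, and the toy records are assumptions. PSO is not covered here.

```python
import random
from collections import Counter

def bootstrap_sample(records, sample_size, seed=None):
    """Draw sample_size records with replacement (plain bootstrap)."""
    rng = random.Random(seed)
    return rng.choices(records, k=sample_size)

def balanced_bootstrap(records, label_of, per_class, seed=None):
    """Assumed variant: bootstrap each class separately to equalize class sizes."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=per_class))
    return sample

# Hypothetical imbalanced toy data: 8 approved applications, 2 rejected
records = [(f"app{i}", "approved") for i in range(8)] + \
          [(f"app{i}", "rejected") for i in range(8, 10)]
plain = bootstrap_sample(records, sample_size=10, seed=42)
balanced = balanced_bootstrap(records, lambda r: r[1], per_class=5, seed=42)
class_sizes = Counter(label for _, label in balanced)
```

After the balanced draw, `class_sizes` holds five records for each class, whereas the plain bootstrap only preserves the original class proportions in expectation.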


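Analisis Sentimen Multi-Aspek Berbasis Konversi Ikon Emosi dengan Algoritme Naïve Bayes untuk Ulasan Wisata Kuliner Pada Web Tripadvisor [Multi-Aspect Sentiment Analysis Based on Emoticon Conversion with the Naïve Bayes Algorithm for Culinary Tourism Reviews on the TripAdvisor Website]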

Jurnal Teknologi Informasi dan Ilmu Komputer ◽ 10.25126/jtiik.2020731907 ◽ 2020 ◽ Vol 7 (4) ◽ pp. 737

Author(s): Sitti Aliyah Azzahra ◽ Arief Wibowo

Keyword(s): Data Mining ◽ Naive Bayes ◽ Naïve Bayes ◽ Emotional Expressions ◽ Tourist Attraction ◽ Test Results ◽ Tourist Attractions ◽ Bayes Algorithm ◽ Labeling Method ◽ The City

Tourists often look for information about attractions on websites such as TripAdvisor. The TripAdvisor website lets registered users review culinary attractions in various countries. Tourists can use these reviews as a consideration before visiting a culinary attraction. Comments or reviews on TripAdvisor can be analyzed to determine the sentiment toward the attraction being reviewed. The results of the analysis can be useful for managers of tourist attractions, culinary entrepreneurs, and other tourists. Sentiment analysis becomes challenging when review sentences contain emotion icons (emoticons), because the sentiment of the sentence may differ from that of the emotional expression. This study analyzes reviews of culinary attractions in the city of Bandung on TripAdvisor, classifying sentiment into three classes. It uses data mining classification with the Naïve Bayes algorithm, combined with a multi-aspect labeling method and the conversion of emoticons in the review text. In addition, reviews are weighted by the number of contributions their reviewers have made on TripAdvisor. The test results show that using all of these methods together in sentiment classification yields an accuracy of 98.67%.
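Converting emoticons into words before classification lets a text classifier treat them as ordinary tokens. The following is a minimal sketch of that preprocessing step only; the mapping in `EMOTICON_WORDS` and the sample review are hypothetical and not taken from the study.

```python
# Hypothetical emoticon-to-word mapping; the study's actual mapping is not given.
EMOTICON_WORDS = {":)": "happy", ":(": "sad", ":D": "delighted"}

def convert_emoticons(text, mapping=EMOTICON_WORDS):
    """Replace each known emoticon with a word token and normalize spacing."""
    # Longer emoticons first, so a short one never splits a longer one.
    for emoticon in sorted(mapping, key=len, reverse=True):
        text = text.replace(emoticon, f" {mapping[emoticon]} ")
    return " ".join(text.split())

review = "The noodles were great :D but the queue :("
converted = convert_emoticons(review)
```

Here `converted` becomes `"The noodles were great delighted but the queue sad"`, so the emotional cues survive tokenization and can contribute to the sentiment label.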


Comparison of Classification Data Mining C4.5 and Naïve Bayes Algorithms of EDM Dataset

TEM Journal ◽ 10.18421/tem104-34 ◽ 2021 ◽ pp. 1738-1744

Author(s): Joseph Teguh Santoso ◽ Ni Luh Wiwik Sri Rahayu Ginantra ◽ Muhammad Arifin ◽ R Riinawati ◽ Dadang Sudrajat ◽ ...

Keyword(s): Data Mining ◽ Naive Bayes ◽ Educational Data Mining ◽ Naïve Bayes ◽ T Test ◽ Bayes Method ◽ Average Accuracy ◽ Difference Test ◽ Naive Bayes Method ◽ Better Than

The purpose of this research is to choose the better of two data mining classification methods, C4.5 and Naïve Bayes, on Educational Data Mining, using student graduation data consisting of 79 records. Both methods are validated with 10-fold cross-validation, and a t-test of the difference is performed to produce a table ranking the methods. The two methods gave different results. The results depend strongly on the dataset, and the area under the curve (AUC) of Naïve Bayes is better than that of C4.5 on various datasets. Based on 10-fold cross-validation and the t-test, Naïve Bayes is better than C4.5, with an average accuracy of 73.41% and an AUC of 0.664.
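The evaluation protocol described here, k-fold cross-validation followed by a t-test on the difference, can be sketched as below. This is a simplified illustration, not the study's procedure: the fold assignment, the function names, and the per-fold accuracy values are assumptions, and only the paired t statistic is computed (no p-value).

```python
import math
import random
from statistics import mean, stdev

def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def paired_t_statistic(scores_a, scores_b):
    """t statistic of per-fold score differences (undefined if all differences are equal)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# 79 records, as in the abstract, split into 10 folds
folds = k_fold_indices(79, k=10)

# Hypothetical per-fold accuracies for two classifiers (illustrative values only)
scores_a = [0.75, 0.70, 0.80, 0.72]
scores_b = [0.70, 0.68, 0.74, 0.70]
t_value = paired_t_statistic(scores_a, scores_b)
```

Each fold serves once as the test set while the remaining folds form the training set; with 79 records, nine folds hold 8 records and one holds 7.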


Random Subclasses Ensembles by Using 1-Nearest Neighbor Framework

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001417500318 ◽ 2017 ◽ Vol 31 (10) ◽ pp. 1750031

Author(s): Amir Ahmad ◽ Hamza Abujabal ◽ C. Aswani Kumar

Keyword(s): Nearest Neighbor ◽ Naive Bayes ◽ Ensemble Methods ◽ Naïve Bayes ◽ Training Data ◽ Classifier Ensemble ◽ Base Classifier ◽ Decision Boundaries ◽ Better Than

A classifier ensemble is a combination of diverse and accurate classifiers. Generally, a classifier ensemble performs better than any single classifier in the ensemble. Naive Bayes classifiers are simple but popular classifiers for many applications. As it is difficult to create diverse naive Bayes classifiers, naive Bayes ensembles are not very successful. In this paper, we propose Random Subclasses (RS) ensembles for Naive Bayes classifiers. In the proposed method, new subclasses for each class are created by using 1-Nearest Neighbor (1-NN) framework that uses randomly selected points from the training data. A classifier considers each subclass as a class of its own. As the method to create subclasses is random, diverse datasets are generated. Each classifier in an ensemble learns on one dataset from the pool of diverse datasets. Diverse training datasets ensure diverse classifiers in the ensemble. New subclasses create easy to learn decision boundaries that in turn create accurate naive Bayes classifiers. We developed two variants of RS, in the first variant RS(2), two subclasses per class were created whereas in the second variant RS(4), four subclasses per class were created. We studied the performance of these methods against other popular ensemble methods by using naive Bayes as the base classifier. RS(4) outperformed other popular ensemble methods. A detailed study was carried out to understand the behavior of RS ensembles.
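One possible reading of the subclass-creation step is sketched below: for each class, a few training points are picked at random as prototypes, and every point of that class joins the subclass of its nearest prototype (a 1-NN assignment). The abstract does not give the exact procedure, so this interpretation, the function names, and the toy points are assumptions rather than the authors' algorithm.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def make_subclasses(points, labels, per_class=2, seed=None):
    """Relabel each point as (class, subclass index) via its nearest random prototype."""
    rng = random.Random(seed)
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)
    # Randomly chosen prototypes per class; a different seed gives a different dataset.
    prototypes = {label: rng.sample(members, min(per_class, len(members)))
                  for label, members in by_class.items()}
    sub_labels = []
    for point, label in zip(points, labels):
        candidates = prototypes[label]
        nearest = min(range(len(candidates)),
                      key=lambda j: squared_distance(point, candidates[j]))
        sub_labels.append((label, nearest))
    return sub_labels

def to_original_class(sub_label):
    """Map a predicted subclass back to its parent class."""
    return sub_label[0]

# Hypothetical 2-D toy data with two classes
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (5.0, 5.0), (5.2, 4.9), (6.0, 6.1)]
labels = ["a", "a", "a", "b", "b", "b"]
sub_labels = make_subclasses(points, labels, per_class=2, seed=1)
```

A base classifier would then be trained on `sub_labels` instead of `labels`, and its predictions mapped back with `to_original_class`; repeating this with different seeds yields the diverse training sets an ensemble needs.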


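Komparasi Data Mining Naive Bayes dan Neural Network memprediksi Masa Studi Mahasiswa S1 [Comparison of Naive Bayes and Neural Network Data Mining for Predicting the Study Period of Undergraduate Students]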

Jurnal Teknologi Informasi dan Ilmu Komputer ◽ 10.25126/jtiik.2020732093 ◽ 2020 ◽ Vol 7 (3) ◽ pp. 443

Author(s): Azahari Azahari ◽ Yulindawati Yulindawati ◽ Dewi Rosita ◽ Syamsuddin Mallala

Keyword(s): Neural Network ◽ Data Mining ◽ Naive Bayes ◽ Naïve Bayes ◽ Drop Out ◽ Training Data ◽ Data Mining Algorithm ◽ Mining Algorithm ◽ Testing Data ◽ Target Data

Graduation prediction is needed by university management to set preventive policies for the early prevention of drop-out cases. The length of each student's study period can be influenced by various factors. Using the Naive Bayes and neural network data mining algorithms, student graduation at STMIK Widya Cipta Dharma (WiCiDa) Samarinda can be predicted. The attributes used are age at admission, classification of the city of the student's high school, father's occupation, study program, class, number of siblings, and grade point average (GPA). Samples of students who graduated or dropped out between 2011 and 2019 were used as training and testing data, while the 2015 to 2018 cohorts were used as target data whose study period is to be predicted. Of 3229 students in total, 1769 were used as training data, 321 as testing data, and 1139 as target data. All data come from undergraduate (S1) students and exclude diploma (D3) and transfer students. On the testing data, Naive Bayes reached an accuracy of only 57.63%. The results reveal many weaknesses in the Naive Bayes predictions, because their accuracy is not very high. The neural network reached a prediction accuracy of 72.58%, making this alternative the better method. Evaluation and analysis were carried out to see where the study-period predictions were right and where they were wrong.


Prediksi Angka Kelahiran Bayi Pada Desa Tridaya Sakti Dengan Menggunakan Algoritma Naive Bayes [Prediction of Infant Birth Rates in Tridaya Sakti Village Using the Naive Bayes Algorithm]

Journal of Students’ Research in Computer Science ◽ 10.31599/jsrcs.v1i2.423 ◽ 2020 ◽ Vol 1 (2) ◽ pp. 77-88

Author(s): Nur Isnaini Parihah ◽ Sari Hartini ◽ Juarni Siregar

Keyword(s): Data Mining ◽ Population Growth ◽ Naive Bayes ◽ Large Population ◽ Naïve Bayes ◽ Training Data ◽ Birth Rates ◽ Testing Data ◽ Bayes Algorithm ◽ Infant Birth

The birth rate affects population growth, and a large population is a burden on development. According to Malthus's theory, large population growth brings poverty rather than welfare if the population is not well controlled. The number of births in Tridaya Sakti Village increases every year. Data mining with the Naive Bayes algorithm can therefore help predict infant birth rates in Tridaya Sakti Village. The aim is to determine the number of births in the coming year with the Naive Bayes algorithm, by examining the prediction patterns of each variable and testing the training data against the testing data. The Naive Bayes algorithm is expected to help Tridaya Sakti Village calculate infant birth rates and regulate population growth in the coming years. The results, computed from the collected data with the Naive Bayes algorithm, provide information that can serve as a reference for the number of births. Data processing becomes more effective and efficient in performance and time, and predictions of the number of births become more accurate.

Keywords: Naive Bayes, Birth of a Baby, Prediction


Use of Data Mining for Prediction of Customer Loyalty

CommIT (Communication and Information Technology) Journal ◽ 10.21512/commit.v10i1.1660 ◽ 2015 ◽ Vol 10 (1) ◽ pp. 41 ◽ Cited By ~ 3

Author(s): Andri Wijaya ◽ Abba Suganda Girsang

Keyword(s): Data Mining ◽ Customer Loyalty ◽ Classification Accuracy ◽ Nearest Neighbor ◽ Naive Bayes ◽ Naïve Bayes ◽ Training Data ◽ Training Set ◽ Use Of Data ◽ C4.5 Algorithm

This article analyzes customer loyalty using three data mining methods, C4.5, Naive Bayes, and Nearest Neighbor, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates the effect of training data size on classification accuracy. The results suggest that C4.5 produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes at 76% and Nearest Neighbor at 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.


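IMPLEMENTASI DATA MINING UNTUK MEMPREDIKSI PEMESANAN DRIVER GO-JEK ONLINE DENGAN MENGGUNAKAN METODE NAIVE BAYES (STUDI KASUS: PT. GO-JEK INDONESIA) [Implementation of Data Mining to Predict Online Go-Jek Driver Orders Using the Naive Bayes Method (Case Study: PT. Go-Jek Indonesia)]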

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽ 10.30865/komik.v2i1.972 ◽ 2018 ◽ Vol 2 (1)

Author(s): Delisman Laia ◽ Efori Buulolo ◽ Matias Julyus Fika Sirait

Keyword(s): Data Mining ◽ Naive Bayes ◽ Naïve Bayes ◽ Training Data ◽ Transportation Industry ◽ Data Set ◽ Data Mining Algorithms ◽ Taxi Service ◽ Bayes Algorithm ◽ Using Data

PT. Go-Jek Indonesia is a service company. Go-Jek is a technology-based online motorcycle taxi service that leads the revolution in the transportation industry. Data mining is used to address a problem the company faces: predicting the level of Go-Jek driver orders in order to determine busy and quiet times. The proposed method is Naive Bayes, an algorithm that classifies data into classes. The purpose of this study is to examine the prediction patterns of each attribute in the data set using the Naive Bayes algorithm and to test the training data against the testing data to see whether the data pattern is good. The prediction is based on previous driver order data, collected by day and time over one month. The Naive Bayes algorithm is used to predict daily Go-Jek driver orders by time of order, such as morning, afternoon, and evening. The results make it easier for the company to analyze Go-Jek driver order data when setting policies for both drivers and customers.

Keywords: Go-jek Driver, Data Mining, Naive Bayes


Prediction of User Loyalty Using the Naive Bayes Method in the "Goprint" Online Printing Marketplace

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽ 10.24843/jlk.2020.v08.i03.p03 ◽ 2020 ◽ Vol 8 (3) ◽ pp. 227

Author(s): Gede Widiastawan ◽ I Gusti Agung Gede Arya Kadyanan

Keyword(s): Decision Making ◽ Naive Bayes ◽ Naïve Bayes ◽ Training Data ◽ Test Results ◽ Bayes Method ◽ Testing Data ◽ Bayes Algorithm ◽ Naive Bayes Method

Goprint is an online printing marketplace that connects printing services with users who want to print documents quickly without queuing. Between April 2019 and September 2019, Goprint had 407 users, 24 partners, and 256 orders. Quite a few orders are canceled because of ineffective Goprint features or poor partner performance, which leaves users dissatisfied with the service the Goprint application provides. The Naive Bayes algorithm is used for classifying or grouping data, and can also support decision making. Using this algorithm, the authors built a system that predicts the loyalty of Goprint users, in order to anticipate users who leave Goprint because they are dissatisfied and to identify loyal users. Training used 20 records and testing used 10. The tests gave a precision of 80%, a recall of 100%, and an accuracy of 90%.
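The three reported figures follow from confusion-matrix counts. As a worked sketch, assuming a binary problem with a single positive class and all 10 test records scored, the counts below (4 true positives, 1 false positive, 0 false negatives, 5 true negatives) are the combination that reproduces the reported values; they are inferred for illustration and do not appear in the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all records classified correctly
    return precision, recall, accuracy

# Inferred counts for 10 test records (an assumption, see the note above)
precision, recall, accuracy = classification_metrics(tp=4, fp=1, fn=0, tn=5)
```

With these counts the function returns 0.8, 1.0, and 0.9: one negative record is wrongly predicted as positive, which lowers precision and accuracy but leaves recall at 100%.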
