COMPARISON OF DECISION TREE, NAÏVE BAYES, AND NEURAL NETWORK ALGORITHM FOR EARLY DETECTION OF DIABETES

Diabetes mellitus is among the three deadliest diseases in Indonesia. According to WHO data from 2013, diabetes accounted for 6.5% of deaths in the Indonesian population. Diabetes is a chronic disease characterized by blood sugar (glucose) levels above normal limits. In the health sector, historical medical data can be processed to extract new information that supports decision-making, such as disease prediction. This study classifies patient records for the early detection of diabetes in order to obtain accurate results for decision-making. The data are historical hospital patient records from Sylhet, Bangladesh, available as a diabetes dataset in the UCI Repository. Three algorithms, Decision Tree, Naive Bayes, and Neural Network, are compared using RapidMiner. The measured accuracy is 90% for Decision Tree, 80% for Naive Bayes, and 70% for Neural Network, so Decision Tree is the best of the three algorithms for early detection of diabetes. The rules produced by the decision tree can serve as input for diabetes-related decision-making in the health sector.

Identifying Key Fraud Indicators in the Automobile Insurance Industry Using SQL Server Analysis Services

Studia Universitatis Babeș-Bolyai Oeconomica ◽

10.2478/subboec-2019-0009 ◽

2019 ◽

Vol 64 (2) ◽

pp. 53-71

Author(s):

Botond Benedek ◽

Ede László

Keyword(s):

Neural Network ◽

Decision Tree ◽

Naive Bayes ◽

Insurance Industry ◽

Naïve Bayes ◽

Sql Server ◽

Categorical Variables ◽

Automobile Insurance ◽

Price Determination ◽

Mining Tool

Customer segmentation is a real challenge in the automobile insurance industry: datasets are large, multidimensional, and unbalanced, and the price must be determined individually from the customer's risk profile. Furthermore, the price of an insurance policy or the validity of a compensation claim must in most cases be decided instantly. The purpose of this research is therefore to identify an easy-to-use data mining tool capable of identifying key automobile insurance fraud indicators and thereby facilitating segmentation. In addition, the methods used by the tool should rely primarily on numerical and categorical variables, as there is no well-functioning text mining tool for Central and Eastern European languages. We therefore chose the SQL Server Analysis Services (SSAS) tool and compared the performance of the decision tree, neural network, and Naïve Bayes methods. The results suggest that decision tree and neural network are more suitable than Naïve Bayes; however, the best conclusions can be drawn when the decision tree and neural network are used together.

An analysis on business intelligence predicting business profitability model using Naive Bayes neural network algorithm

2017 7th IEEE International Conference on System Engineering and Technology (ICSET) ◽

10.1109/icsengt.2017.8123421 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mohd Taufik Mishan ◽

Albin Lemuel Kushan ◽

Ahmad Firdaus Ahmad Fadzil ◽

Aimi Liyana Binti Amir ◽

Nurhilyana Binti Anuar

Keyword(s):

Neural Network ◽

Business Intelligence ◽

Naive Bayes ◽

Naïve Bayes ◽

Network Algorithm ◽

Neural Network Algorithm ◽

Business Profitability

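Komparasi Algoritma Kasifikasi dengan Pendekatan Level Data Untuk Menangani Data Kelas Tidak Seimbang (Comparison of Classification Algorithms with a Data-Level Approach for Handling Imbalanced Class Data)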

JURNAL ILMIAH ILMU KOMPUTER ◽

10.35329/jiik.v3i1.60 ◽

2017 ◽

Vol 3 (1) ◽

pp. 1-6

Author(s):

Ahmad Ilham

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Linear Regression ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Level Data ◽

Under Sampling

Class imbalance degrades the accuracy of predictions on data. Many previous studies have used classification algorithms to handle imbalanced class data. This study presents under-sampling and over-sampling techniques for handling imbalanced class data. The techniques are applied at the preprocessing stage to balance the class distribution in the data. The experimental results show that the neural network (NN) outperforms the decision tree (DT), linear regression (LR), naïve Bayes (NB), and support vector machine (SVM).
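
The under-sampling and over-sampling mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest random variants (duplicate minority records, or discard majority records, at random); the abstract does not say which variants the author used, and the function names and toy data below are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(records, labels):
    """Collect records into a dict keyed by class label."""
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(label, []).append(record)
    return groups


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records of smaller classes until every
    class has as many records as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(records, labels)
    target = max(len(recs) for recs in groups.values())
    out_records, out_labels = [], []
    for label, recs in groups.items():
        extra = [rng.choice(recs) for _ in range(target - len(recs))]
        out_records.extend(recs + extra)
        out_labels.extend([label] * target)
    return out_records, out_labels


def random_undersample(records, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(records, labels)
    target = min(len(recs) for recs in groups.values())
    out_records, out_labels = [], []
    for label, recs in groups.items():
        out_records.extend(rng.sample(recs, target))
        out_labels.extend([label] * target)
    return out_records, out_labels


# Toy data (made up): 6 records of class "neg", 2 of class "pos".
records = [[i] for i in range(8)]
labels = ["neg"] * 6 + ["pos"] * 2

over_records, over_labels = random_oversample(records, labels)
under_records, under_labels = random_undersample(records, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

On the toy data, over-sampling yields 6 records per class and under-sampling yields 2 per class. Both functions act only on the data handed to them, which matches the abstract's description of resampling as a preprocessing step.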

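Komparasi Algoritma Klasifikasi dengan Pendekatan Level Data untuk Menangani Data Kelas Tidak Seimbang (Comparison of Classification Algorithms with a Data-Level Approach for Handling Imbalanced Class Data)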

10.31227/osf.io/xwefp ◽

2018 ◽

Author(s):

Ahmad Ilham

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Linear Regression ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Level Data ◽

Under Sampling

Real-world data from many sources today very often contain imbalanced classes. Class imbalance can degrade the prediction accuracy of classification methods. Many previous studies have used classification algorithms to handle imbalanced class data. This study presents under-sampling and over-sampling techniques for handling imbalanced class data. The techniques are applied at the preprocessing stage to balance the class distribution in the data. The experimental results show that the neural network (NN) outperforms the decision tree (DT), linear regression (LR), naïve Bayes (NB), and support vector machine (SVM).

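KOMPARASI NAÏVE BAYES, SUPPORT VECTOR MACHINE DAN K-NEAREST NEIGHBOR UNTUK MENGETAHUI AKURASI TERTINGGI PADA PREDIKSI KELANCARAN PEMBAYARAN TV KABEL (Comparison of Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor to Determine the Highest Accuracy in Predicting Timely Cable TV Payments)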

ILKOM Jurnal Ilmiah ◽

10.33096/ilkom.v11i1.408.11-16 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11-16

Author(s):

Mohamad Efendi Lasulika

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Types ◽

Neural Network Algorithm ◽

Bayes Algorithm

One cause of payment defaults is the lack of analysis in the new-customer acceptance process, which reviews only the form submitted at registration. This study aims to find the highest accuracy among the Naïve Bayes, SVM, and K-NN algorithms. Naïve Bayes has the highest accuracy at 96%; K-Nearest Neighbor reaches its highest accuracy, 92%, at K = 3; and Support Vector Machine achieves only 66%. The ROC curve results show that Naïve Bayes achieved the best AUC value, 0.99. In this comparison of the data mining classification algorithms Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine for predicting timely payment using multivariate data types, Naïve Bayes is the most accurate and clearly dominates the other methods. Based on accuracy, AUC, and t-tests, it falls into the best classification category.
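
For readers unfamiliar with the role of K in K-NN, the following is a minimal sketch of a nearest-neighbor vote with K = 3. It assumes numeric features and Euclidean distance; the abstract does not state the distance measure or the features used, and the toy points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among the k training
    points closest to it (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set (made up): two well-separated groups of customers.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["on_time"] * 3 + ["late"] * 3

near_first_group = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
near_second_group = knn_predict(train_points, train_labels, (5.0, 5.0), k=3)
print(near_first_group, near_second_group)
```

A larger K smooths the decision over more neighbors, while a smaller K follows local structure more closely, which is why studies such as this one report accuracy for a specific K.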

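KOMPARASI METODE KLASIFIKASI PADA ANALISIS SENTIMEN USAHA WARALABA BERDASARKAN DATA TWITTER (Comparison of Classification Methods for Sentiment Analysis of Franchise Businesses Based on Twitter Data)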

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.752 ◽

2019 ◽

Vol 15 (2) ◽

pp. 267-274

Author(s):

Tati Mardiana ◽

Hafiz Syahreva ◽

Tuslaela Tuslaela

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor

Franchise businesses in Indonesia currently hold relatively strong appeal. However, many franchise operators also fail. Anyone who wants to start a business needs to consider public sentiment toward franchise businesses. Sentiment analysis is not easy, however, because the Twitter conversations about franchise businesses are numerous and unstructured. This study compares the accuracy of the Neural Network, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, and Decision Tree methods in extracting attributes from documents or texts containing comments, in order to identify the expressions they contain and classify them as positive or negative comments. The study uses real-time data from tweets on Twitter. The data are first cleaned of noise using Python. Testing with a confusion matrix yielded accuracies of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. The study shows that Support Vector Machine and Neural Network are the best methods for classifying positive and negative comments about franchise businesses.
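
As a reminder of how accuracy is read off a binary confusion matrix, here is a small sketch. Precision and recall are included for completeness even though this abstract reports accuracy only. The labels and predictions are made-up toy values, not data from the study.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem, treating `positive`
    as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy sentiment labels (made up).
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="pos")
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```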

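Prediksi Hipertensi menggunakan Decision Tree, Naïve Bayes dan Artificial Neural Network pada software KNIME (Hypertension Prediction Using Decision Tree, Naïve Bayes, and Artificial Neural Network in KNIME Software)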

Techno Com ◽

10.33633/tc.v19i4.3872 ◽

2020 ◽

Vol 19 (4) ◽

pp. 353-363

Author(s):

Mayanda Mega Santoni ◽

Nurul Chamidah ◽

Nurhafifah Matondang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Decision Tree ◽

Cross Validation ◽

Naive Bayes ◽

Naïve Bayes ◽

Network Data ◽

Artificial Neural ◽

Fold Cross Validation

Hypertension is a non-communicable disease that can cause death because it raises the risk of conditions such as kidney failure, heart failure, and even stroke. The risk of hypertension is driven by several factors, such as age, heredity, diet and exercise, and smoking. Machine learning, a branch of artificial intelligence, is applied in health care, in particular to predict hypertension. This study implements three machine learning algorithms: decision tree, naïve Bayes, and artificial neural network. The data comprise 274 records obtained from a questionnaire with 26 questions, of which 25 are risk-factor variables and one is the class indicating whether or not the respondent has a history of hypertension. The data were processed on the KNIME data analysis platform. Before the classification models were built with decision tree, naïve Bayes, and artificial neural network, the data were preprocessed by imputing missing values, oversampling, and normalizing. The data were then split using 5-fold cross-validation. The resulting classification models were evaluated by accuracy, recall, and precision. The experiments show that the artificial neural network performs better than decision tree and naïve Bayes, with an accuracy of 94.7%, a recall of 91.5%, and a precision of 97.7%.
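
The 5-fold cross-validation split mentioned in this abstract can be sketched as follows. This is an illustrative stand-in, not the KNIME workflow the authors used. The record count of 274 is the raw questionnaire size from the abstract; the number of records after oversampling is not stated, so the fold sizes in the study may differ. The function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once; each record appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(274, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each model is trained on the training indices of a fold and scored on its test indices; the reported metric is then typically the average over the five folds.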

A COMPARISON OF ACCURACY BETWEEN TWO METHODS: NAÏVE BAYES ALGORITHM AND DECISION TREE-J48 TO PREDICT THE STOCK PRICE OF PT ASTRA INTERNATIONAL Tbk USING DATA FROM INDONESIA STOCK EXCHANGE

Abstract Proceedings International Scholars Conference ◽

10.35974/isc.v7i1.1872 ◽

2019 ◽

Vol 7 (1) ◽

pp. 1244-1258

Author(s):

Joan Yuliana Hutapea ◽

Yusran Timur Samuel ◽

Heima Sitorus

Keyword(s):

Decision Making ◽

Decision Tree ◽

Stock Prices ◽

Stock Price ◽

Naive Bayes ◽

Stock Exchange ◽

Naïve Bayes ◽

Bayes Method ◽

Testing Data ◽

Bayes Algorithm

The ability to predict stock prices is very important for market players, whether individual or organizational investors. Market players need to know how to predict prices to support decisions on whether to buy or sell shares, so that they can maximize profits and reduce potential losses from poor decisions. Accordingly, the authors conducted a study to analyze and compare the accuracy of two methods used to predict stock prices: the Naïve Bayes method and the Decision Tree J48 method. The study used 1,195 stock data records for PT Astra International Tbk, issued by the IDX, covering January 1, 2013 to November 30, 2017. It uses seven attributes: Previews, High, Low, Close, Volume, Value, and Frequency. Using the WEKA application with 20% of the data for testing, the Naïve Bayes method achieves an accuracy of 92.0502%, a precision of 0.920, and a recall of 0.961, while the Decision Tree J48 method achieves an accuracy of 98.7448%, a precision of 0.989, and a recall of 0.997. It can be concluded that the Decision Tree J48 algorithm is more accurate than the Naïve Bayes algorithm in predicting the stock price of PT Astra International Tbk.

Oversampling Method on Classifying Hypertension Using Naive Bayes, Decision Tree, and Artificial Neural Network

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i4.2015 ◽

2020 ◽

Vol 4 (4) ◽

pp. 635-641

Author(s):

Nurul Chamidah ◽

Mayanda Mega Santoni ◽

Nurhafifah Matondang

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Decision Tree ◽

Missing Values ◽

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Validation Data ◽

Artificial Neural

Oversampling is a technique that balances the number of records per class by generating additional records for the class with few records until its count matches that of the class with many records. In this study, oversampling is applied to a hypertension dataset in which the hypertensive class has few records compared with the non-hypertensive class. The study aims to evaluate the effect of oversampling on the classification of this dataset using Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN), and to find the best model among the three algorithms. The data are processed by imputing missing values, oversampling, and transforming the data into the same range; Naïve Bayes, Decision Tree, and ANN are then used to build classification models. With 80% of the data used as training data to build the models and 20% as validation data to test them, classification performance in terms of accuracy, precision, and recall improved on the oversampled data compared with the data without oversampling. The best performance in this study was obtained with ANN: accuracy 0.91, precision 0.86, and recall 0.99.
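
Two of the preprocessing steps named in this abstract, transforming data into the same range and the 80/20 training/validation split, can be sketched as below. The sketch assumes min-max scaling to [0, 1] as the range transform, which the abstract does not confirm; the function names and the age values are hypothetical.

```python
import random


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(value - lo) / (hi - lo) for value in column]


def holdout_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle record indices and split them into training and validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


ages = [25, 40, 55, 70]              # hypothetical feature values
scaled = min_max_scale(ages)
train_idx, val_idx = holdout_split(10)
print(scaled)
print(len(train_idx), len(val_idx))
```

Scaling puts features measured in different units on a comparable footing, which matters most for distance- and gradient-based models such as ANN.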

Perbandingan Metode Klasifikasi Multiclass untuk Pemetaan Zona Risiko COVID-19 di Pulau Jawa (Comparison of Multiclass Classification Methods for Mapping COVID-19 Risk Zones on Java Island)

Jurnal Komputer dan Informatika ◽

10.35508/jicon.v9i1.3602 ◽

2021 ◽

Vol 9 (1) ◽

pp. 98-107

Author(s):

Jesica Nauli Br. Siringo Ringo ◽

Wahyu Joko Mursalin ◽

Nisrina Citra Nurfadilah ◽

Dwiky Rachmat Ramadhan ◽

Wa Ode Zuhayeni Madjida

Keyword(s):

Neural Network ◽

Data Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Imbalanced Data ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Missing Value

The large increase in COVID-19 cases in Indonesia, particularly on Java, calls for a range of control efforts. One effective measure is prevention through providing information about the condition of a region. As a warning to the public and as input to regional policy-making, Indonesia publishes risk zones down to the regency/city level through the COVID-19 Task Force (Satgas Penanganan COVID-19). The risk-zone levels are determined with a conventional technique, namely weighted scoring based on information from three types of indicators. Considering that risk zones are important in setting COVID-19 policy, this study aims to build a classification model of regency/city risk zones on Java using several data mining classification techniques and to determine the best model based on the evaluation results. The techniques compared are naive Bayes, decision tree, k-nearest-neighbor, and neural network. Before modeling, the data were prepared in a preprocessing stage, where missing values and imbalanced data were identified. These problems were addressed with data imputation and oversampling. The results show that the k-nearest-neighbor model is the best of the four models. This finding is based on the evaluation measures of the four models: the k-NN model has the highest accuracy and the highest macro-averaged sensitivity, specificity, and F1 score.
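
The macro-averaged measures named in this abstract can be sketched for a multiclass problem as follows: each class is scored one-vs-rest, and the per-class values are averaged with equal weight. The zone labels and predictions are hypothetical toy values; the abstract does not list the actual risk-zone classes.

```python
def macro_metrics(y_true, y_pred):
    """Return macro-averaged sensitivity, specificity, and F1.

    Each class is treated one-vs-rest; the per-class scores are then
    averaged without weighting by class size.
    """
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    sens, spec, f1s = [], [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        tn = len(pairs) - tp - fp - fn
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        sens.append(recall)
        spec.append(specificity)
        f1s.append(f1)
    n = len(classes)
    return {
        "sensitivity": sum(sens) / n,
        "specificity": sum(spec) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (made up) for three risk levels.
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]

scores = macro_metrics(y_true, y_pred)
print(scores)
```

Because every class counts equally in a macro average, a model cannot score well by doing well only on the most frequent zone, which is relevant given the imbalanced data the abstract mentions.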
