PERBANDINGAN ALGORITMA K-NEAREST NEIGHBOR, DECISION TREE, DAN NAIVE BAYES UNTUK MENENTUKAN KELAYAKAN PEMBERIAN KREDIT

Granting credit to customers is a routine banking activity with far-reaching effects. In practice, non-performing (bad) loans often arise from poor credit analysis during the granting process or from unreliable customers. This study compares the accuracy of K-Nearest Neighbor (K-NN), Decision Tree, and Naive Bayes; the algorithm with the best accuracy will be used to determine creditworthiness. The study uses 11 attributes: marital status, number of dependents, age, last education, occupation, monthly income, home ownership, collateral, loan amount, and loan term, plus a description field as the result attribute. In evaluation and validation with 5-fold cross-validation in RapidMiner, the decision tree (C4.5) achieved the highest accuracy of the three algorithms, 98% in the 3rd test.
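
The 5-fold validation mentioned above can be sketched in plain Python. This is only an illustration of the general technique, not the RapidMiner process used in the study: the helper names (`k_fold_indices`, `cross_validate`, `majority_baseline`), the toy data, and the labels are all assumptions, and the placeholder classifier stands in for K-NN, Decision Tree, or Naive Bayes.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(fit_predict, X, y, k=5, seed=0):
    """Return per-fold accuracy; fit_predict(train_X, train_y, test_X) -> predictions."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies

def majority_baseline(train_X, train_y, test_X):
    """Placeholder classifier: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)

# Hypothetical toy data: 10 applicants, 7 labelled "eligible".
X = [[i] for i in range(10)]
y = ["eligible"] * 7 + ["not eligible"] * 3
scores = cross_validate(majority_baseline, X, y, k=5)
print(scores, sum(scores) / len(scores))
```

Each sample is held out exactly once, so the mean of the per-fold accuracies uses every record for testing while never testing on data the model was fitted on.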

IMPLEMENTASI DATA MINING UNTUK MENENTUKAN KELAYAKAN PEMBERIAN KREDIT DENGAN MENGGUNAKAN ALGORITMA K-NEAREST NEIGHBORS (K-NN)

Infotech: Journal of Technology Information ◽

10.37365/it.v6i1.78 ◽

2020 ◽

Vol 6 (1) ◽

pp. 43-48

Author(s):

Tupan Tri Muryono ◽

Irwansyah Irwansyah

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Home Ownership ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Analysis Process ◽

Status Number ◽

The Right ◽

Credit Analysis ◽

Loan Amount

Lending to customers is a routine banking activity that carries high risk. In practice, problem loans or bad loans often result from insufficiently careful credit analysis during the granting process, or from unreliable customers. This study applies data mining to support credit analysis, so that it yields accurate information on whether a customer applying for credit is eligible and indicates the customer's repayment potential. The study uses 11 attributes: marital status, number of dependents, age, last education, occupation, monthly income, home ownership, collateral, loan amount, and loan term, plus a description field as the result attribute. Data were collected through observation, interviews, and documentation. The method used is K-Nearest Neighbor (K-NN). In evaluation and validation with 5-fold cross-validation in RapidMiner, K-NN achieved its highest accuracy, 93.33%, in the 5th test.
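
As a rough illustration of how K-NN assigns a label, the sketch below votes among the nearest training rows. It is not the study's RapidMiner configuration: the function name `knn_predict`, the two numeric features, their scaling to [0, 1], and the toy rows are all assumptions, and the categorical attributes in the abstract would need encoding or a mixed distance measure first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row, nearest first.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, pre-scaled features: (monthly income, loan amount), both in [0, 1].
train_X = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.1), (0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]
train_y = ["eligible", "eligible", "eligible",
           "not eligible", "not eligible", "not eligible"]

high_income_small_loan = knn_predict(train_X, train_y, (0.85, 0.25))
low_income_large_loan = knn_predict(train_X, train_y, (0.15, 0.85))
print(high_income_small_loan, low_income_large_loan)
```

Scaling matters here: without it, an attribute measured in large units (such as loan amount in currency) would dominate the distance.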

Sentiment Analysis about E-Commerce from Tweets Using Decision Tree, K-Nearest Neighbor, and Naïve Bayes

2018 International Conference on Orange Technologies (ICOT) ◽

10.1109/icot.2018.8705796 ◽

2018 ◽

Cited By ~ 2

Author(s):

Achmad Bayhaqy ◽

Sfenrianto Sfenrianto ◽

Kaman Nainggolan ◽

Emil R. Kaburuan

Keyword(s):

Decision Tree ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor

Centroid Based Classifier With TF – IDF – ICF for Classfication of Student’s Complaint at Appliation E-Complaint in Muhammadiyah University of Sidoarjo

JEEE-U (Journal of Electrical and Electronic Engineering-UMSIDA) ◽

10.21070/jeee-u.v1i1.23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Mochamad Alfan Rosid ◽

Gunawan Gunawan ◽

Edwin Pramana

Keyword(s):

Text Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Base Classifier

Text mining refers to the process of extracting high-quality information from text. Such information is usually obtained by deriving patterns and trends through means such as statistical pattern learning. One important text mining task is text classification, or categorization. Available methods for text categorization include K-Nearest Neighbor, Naïve Bayes, the centroid-based classifier, and decision tree classification. In this study, student complaints are classified with a centroid-based classifier using TF-IDF-ICF features. Classification proceeds in five stages: collecting the complaint data; preprocessing, which prepares the unstructured data for later steps; splitting the data into training and test sets; training, which produces the classification model; and testing, which evaluates the trained model on the test data. The complaints used for testing are taken from the database of the e-complaint application at Universitas Muhammadiyah Sidoarjo. The experiments show that classifying complaints with the centroid-based classifier and TF-IDF-ICF features achieves a fairly high average accuracy of 79.5%. Accuracy rises as the amount of training data grows, while system efficiency falls.
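
A centroid-based classifier with TF-IDF-ICF weighting might look roughly like the sketch below. The abstract does not give the exact formula, so the weighting used here (term count times `1 + log(N/df)` times `1 + log(C/cf)`), the cosine similarity, the function names, and the toy complaints are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def unit_vector(tokens, weight):
    """Weight term counts by IDF*ICF and scale the vector to length 1."""
    vec = {t: n * weight[t] for t, n in Counter(tokens).items() if t in weight}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_centroids(docs, labels):
    """Build one TF-IDF-ICF centroid vector per class from tokenized documents."""
    n_docs, classes = len(docs), sorted(set(labels))
    doc_freq = Counter(t for d in docs for t in set(d))       # documents containing t
    class_terms = defaultdict(set)
    for d, c in zip(docs, labels):
        class_terms[c].update(d)
    class_freq = Counter(t for c in classes for t in class_terms[c])  # classes containing t
    # Assumed smoothing: add 1 so terms present everywhere keep a nonzero weight.
    weight = {t: (1 + math.log(n_docs / doc_freq[t])) *
                 (1 + math.log(len(classes) / class_freq[t])) for t in doc_freq}
    centroids = {c: defaultdict(float) for c in classes}
    for d, c in zip(docs, labels):
        for t, v in unit_vector(d, weight).items():
            centroids[c][t] += v / labels.count(c)            # mean of class vectors
    return centroids, weight

def classify(tokens, centroids, weight):
    """Return the class whose centroid has the highest cosine similarity."""
    query = unit_vector(tokens, weight)
    def cosine(centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in query.items())
        norm = math.sqrt(sum(v * v for v in centroid.values())) or 1.0
        return dot / norm
    return max(centroids, key=lambda c: cosine(centroids[c]))

# Hypothetical, already preprocessed complaints and categories.
docs = [["wifi", "slow", "library"], ["wifi", "down", "lab"],
        ["tuition", "fee", "late"], ["fee", "payment", "error"]]
labels = ["facility", "facility", "finance", "finance"]
centroids, weight = train_centroids(docs, labels)
print(classify(["wifi", "slow"], centroids, weight))
print(classify(["fee", "error"], centroids, weight))
```

The ICF factor lowers the weight of terms that occur in many classes, which is the part that distinguishes this scheme from plain TF-IDF.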

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study measures and compares the performance of five machine-learning text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison focuses on effectiveness, that is, the ability to assign documents to the correct category, and uses the holdout (percentage split) method. Effectiveness is measured by precision, recall, F-measure, and accuracy. The experiments show that for naïve Bayes, the larger the share of training documents, the higher the accuracy of the resulting model. Naïve Bayes reached its highest accuracy at a 90/10 split, SVM at 80/20, and decision tree at 70/30. The experiments also show that naïve Bayes had the highest effectiveness of the five algorithms tested and the fastest model-building time, 0.02 seconds. Decision tree classified text documents more accurately than SVM, but its model took longer to build. In terms of model-building time, k-NN was the fastest, but its accuracy was lower.
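
The holdout (percentage split) method and the per-class measures named above can be sketched as follows. The function names, the 80/20 split, and the toy labels are assumptions for illustration; the study's own tooling and data are not shown in the abstract.

```python
import random

def holdout_split(items, train_fraction=0.8, seed=0):
    """Shuffle items and split them once into training and test portions."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F-measure (harmonic mean of the two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# An 80/20 split of ten hypothetical document ids.
train, test = holdout_split(list(range(10)), train_fraction=0.8)

# Hypothetical true and predicted categories for five test documents.
y_true = ["sport", "sport", "tech", "tech", "sport"]
y_pred = ["sport", "tech", "tech", "tech", "sport"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="sport")
print(len(train), len(test), precision, recall, f1)
```

For multi-class documents the measures are computed once per category and then averaged; how the averaging is done (macro or weighted) is a separate choice the abstract does not specify.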

Analisis Komparatif Evaluasi Performa Algoritma Klasifikasi pada Readmisi Pasien Diabetes

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.770 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Mochammad Yusa ◽

Ema Utami ◽

Emha T. Luthfi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Kappa Statistic ◽

Naïve Bayes ◽

Validation Dataset ◽

K Nearest Neighbor ◽

Fold Cross Validation

Abstract. Readmission is associated with measures of the quality of patient care in hospitals. The many attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, tend to make calculating quality of care complicated. Data mining classification techniques can address this problem. This paper evaluates three classifiers, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, under various parameter settings using 10-fold cross-validation. Performance is evaluated in terms of accuracy, Mean Absolute Error (MAE), and the Kappa statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and the Kappa statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Indonesian abstract. Readmission is linked to calculating the quality of patient care in hospitals. Differences in the attributes related to diabetic patients, such as the medication process, ethnicity, race, lifestyle, and age, tend to make the quality calculation complicated. Data mining classification techniques can be a solution for this calculation. Classification is one of the data mining techniques that has developed quite significantly. In this study, the Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification models are evaluated under various parameter settings on accuracy, Mean Absolute Error (MAE), and the Kappa statistic using 10-fold cross-validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa statistic were obtained from the Naive Bayes model. Keywords: k-NN, naive bayes, diabetes, readmission
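
To show why the Kappa statistic is reported alongside accuracy, here is a minimal sketch of Cohen's kappa computed from true and predicted labels. The function name and the toy labels are assumptions; MAE is left out because it depends on how the evaluation tool scores predicted class probabilities, which the abstract does not describe.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return observed accuracy and Cohen's kappa (agreement beyond chance)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected if predictions were independent of the true labels.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa

# Hypothetical imbalanced case: 2 readmitted patients out of 10,
# and a classifier that always predicts "no".
y_true = ["yes"] * 2 + ["no"] * 8
y_pred = ["no"] * 10
accuracy, kappa = accuracy_and_kappa(y_true, y_pred)
print(accuracy, kappa)
```

In this toy case the accuracy is 0.8 while kappa is 0, because always predicting the majority class agrees with the truth no more often than chance would.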

Analisis Perbandingan Kinerja Algoritma Naïve Bayes, Decision Tree-J48 dan Lazy-IBK

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3055 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1038

Author(s):

Indra Rukmana ◽

Arvin Rasheda ◽

Faiz Fathulhuda ◽

Muh Rizky Cahyadi ◽

Fitriyani Fitriyani

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Thoracic Surgery ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Breast Cancer Dataset ◽

Decision Tree Algorithm ◽

K Nearest Neighbor ◽

Cancer Dataset

This research examines the performance of three classification algorithms: Naïve Bayes, Decision Tree-J48, and K-Nearest Neighbor. Speed and percentage accuracy serve as the performance benchmarks. The study uses the Breast Cancer and Thoracic Surgery datasets, downloaded from the UCI Machine Learning Repository website, and tests the algorithms with Weka version 3.8.5. The results show that the J48 Decision Tree algorithm has the best accuracy: 75.6% in cross-validation test mode for the Breast Cancer dataset and 84.5% for the Thoracic Surgery dataset.

KOMPARASI METODE KLASIFIKASI PADA ANALISIS SENTIMEN USAHA WARALABA BERDASARKAN DATA TWITTER

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.752 ◽

2019 ◽

Vol 15 (2) ◽

pp. 267-274

Author(s):

Tati Mardiana ◽

Hafiz Syahreva ◽

Tuslaela Tuslaela

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor

Franchise businesses in Indonesia are currently relatively attractive, yet many operators also fail. Anyone planning to start such a business should consider public sentiment toward franchises. Sentiment analysis is not easy, however, because Twitter conversations about franchise businesses are numerous and unstructured. This study compares the accuracy of Neural Network, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, and Decision Tree in extracting attributes from documents or texts containing comments, identifying the sentiment they express, and classifying them as positive or negative. The study uses real-time data from tweets on Twitter, first cleaning the data of noise with Python. Testing with a confusion matrix yielded accuracies of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. The study shows that Support Vector Machine and Neural Network are the best methods for classifying positive and negative comments about franchise businesses.

Performance Comparison between Naïve Bayes, Decision Tree and k-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2013.041105 ◽

2013 ◽

Vol 4 (11) ◽

Cited By ~ 22

Author(s):

Ahmad Ashari ◽

Iman Paryudi ◽

A Min

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Performance Comparison ◽

Energy Simulation ◽

Simulation Tool ◽

K Nearest Neighbor ◽

Alternative Design

Sistem Prediksi Penyakit Kanker Serviks Menggunakan CART, Naive Bayes, dan k-NN

Creative Information Technology Journal ◽

10.24076/citec.2017v4i2.100 ◽

2018 ◽

Vol 4 (2) ◽

pp. 83

Author(s):

Tutus Praningki ◽

Indra Budi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Pap Smear ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Regression Trees ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Classification And Regression

Health care institutions hold historical medical records of cervical cancer patients, but these data are not being extracted into knowledge or information. Data mining techniques have strong potential for use in a system that predicts cervical cancer. This study focuses on a dataset of medical diagnoses of patients about to undergo a Pap smear test. The algorithms used to classify cervical cancer are Classification And Regression Trees (CART), Naive Bayes, and k-Nearest Neighbor (k-NN). The CART decision tree, Naive Bayes, and k-NN were tested using a confusion matrix with a holdout split of the dataset. The tests show that Naive Bayes has the best accuracy, 94.44%, while CART and k-NN reach 88.89% and 85.04% respectively. The performance of each algorithm makes it feasible to use the cervical cancer prediction system to support clinical decisions for new patients.
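
For readers unfamiliar with Naive Bayes, the sketch below shows a categorical variant with add-one (Laplace) smoothing. It is an illustration only, not the system described in the abstract: the function names, the two made-up features (`feature_a`, `feature_b`), their values, and the labels are all hypothetical and carry no medical meaning.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)                     # log prior
        for j, value in enumerate(row):
            numerator = value_counts[(label, j)][value] + 1
            denominator = count + len(feature_values[j])
            score += math.log(numerator / denominator)      # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical categorical records: (feature_a, feature_b) with a binary label.
rows = [("low", "no"), ("low", "no"), ("high", "yes"),
        ("high", "yes"), ("high", "no"), ("low", "yes")]
labels = ["negative", "negative", "positive",
          "positive", "positive", "negative"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("high", "yes")))
print(predict_naive_bayes(model, ("low", "no")))
```

The smoothing keeps a feature value that never appeared with a class from forcing that class's probability to zero.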