Comparison Analysis of K-Nearest Neighbor and Naïve Bayes in Determining Talent of Adolescence

Adolescents are constantly searching for an identity that shapes their personality and character. This paper aims to use artificial intelligence analysis to determine the talents of adolescents. This study uses a sample of children aged 10-18 years, with testing data consisting of 100 respondents. The algorithms used for the analysis are K-Nearest Neighbor and Naive Bayes. The analysis results are the classification accuracies of both algorithms. To determine which algorithm is more accurate in identifying children's interests and talents, accuracy is measured with a confusion matrix in the RapidMiner software for the training data, the testing data, and the combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.
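
The accuracy figures above are read off a confusion matrix. The following minimal Python sketch shows how accuracy follows from one: correct predictions lie on the diagonal, and accuracy is the diagonal total divided by the number of classified records. The talent labels and predictions are invented for illustration, and RapidMiner's own implementation is not reproduced here.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(actual, predicted))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}

def accuracy(matrix):
    """Accuracy = correctly classified (diagonal) / all classified records."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0

# Hypothetical talent labels for illustration only (not the study's data).
labels = ["art", "sport", "science"]
actual = ["art", "sport", "sport", "science", "art", "science"]
predicted = ["art", "sport", "art", "science", "art", "sport"]

cm = confusion_matrix(actual, predicted, labels)
print(cm)
print(f"accuracy = {accuracy(cm):.2f}")  # 4 of 6 correct
```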

Performance comparison between naive bayes and k- nearest neighbor algorithm for the classification of Indonesian language articles

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp452-457 ◽

2021 ◽

Vol 10 (2) ◽

pp. 452

Author(s):

Titin Winarti ◽

Henny Indriyawati ◽

Vensy Vydia ◽

Febrian Wahyu Christanto

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Performance Comparison ◽

Training Data ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Bayes Algorithm

The match between the contents of an article and the article theme is the main factor in whether an article is accepted. Many people are still unsure how to determine the theme appropriate to their article. For that reason, a document classification algorithm is needed that can group articles automatically and accurately. Many classification algorithms can be used. This study uses naive Bayes, with the k-nearest neighbor algorithm as the baseline. Naive Bayes was chosen because it can achieve high accuracy with little training data, while k-nearest neighbor was chosen because it is robust against noisy data. The performance of the two algorithms is compared to determine which is better at classifying documents. The results show that naive Bayes performs much better, with an accuracy rate of 88%, while k-nearest neighbor has a fairly low accuracy rate of 60%.
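
The k-nearest neighbor baseline can be sketched for documents as below. This is a minimal illustration assuming bag-of-words vectors, cosine similarity, and a majority vote over k = 3 neighbors; the abstract does not state the paper's actual features, similarity measure, or k, and the tiny corpus is invented.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(doc, training, k=3):
    """Label doc by majority vote among the k most cosine-similar training docs.

    sorted() is stable, so equally similar docs keep their training order;
    vote ties go to the label encountered first.
    """
    query = bag_of_words(doc)
    ranked = sorted(training, key=lambda item: cosine(query, bag_of_words(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled articles (not the study's corpus).
training = [
    ("neural network training accuracy", "computing"),
    ("database query index optimization", "computing"),
    ("classifier accuracy on test data", "computing"),
    ("rice harvest and soil fertility", "agriculture"),
    ("pest control for rice plants", "agriculture"),
    ("irrigation schedule for paddy fields", "agriculture"),
]

print(knn_classify("accuracy of a neural classifier", training))  # → computing
print(knn_classify("pest damage in rice plants", training))  # → agriculture
```

Because k-NN compares every query with every stored document, its cost grows with the size of the training set, unlike naive Bayes, which only stores per-class counts.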

RB-Bayes algorithm for the prediction of diabetic in Pima Indian dataset

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i6.pp4866-4872 ◽

2019 ◽

Vol 9 (6) ◽

pp. 4866

Author(s):

Rajni Rajni ◽

Amandeep Amandeep

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Early Stage ◽

Human Life ◽

Naïve Bayes ◽

Support Vector ◽

Pima Indians ◽

K Nearest Neighbor ◽

Fast Pace ◽

Bayes Algorithm

Diabetes is a major concern all over the world, and it is increasing at a fast pace. People can avoid diabetes at an early stage without any test. The goal of this paper is to predict whether a person is at risk of diabetes at an early stage, which could have a great impact on their quality of life. The datasets are Pima Indians diabetes and Cleveland coronary illness and consist of 768 records. Although a number of solutions are available for extracting information from huge datasets and predicting the possibility of diabetes, the accuracy of their mining process is far from satisfactory. To achieve the highest accuracy, the zero-probability problem generally faced by naïve Bayes analysis needs to be addressed suitably. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy, overcome the zero-probability problem, and compare its accuracy with other methods such as Support Vector Machine, Naive Bayes, and K Nearest Neighbor. We calculated the mean to handle missing data and calculated the probabilities for yes (positive) and no (negative); the higher of the two values decides the class of the tuple. This approach is mostly used in text classification. The outcomes on the Pima Indian diabetes dataset demonstrate that the proposed methodology improves precision compared with other supervised procedures. The accuracy of the proposed methodology on the large dataset is 72.9%.
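
The zero-probability problem arises because a single attribute value never seen with a class makes the whole naive Bayes product zero for that class. The abstract does not specify how RB-Bayes avoids this, so the sketch below swaps in the standard Laplace (add-one) smoothing instead, together with mean imputation for missing numeric values and the "higher of yes and no wins" decision. The helper names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

def impute_mean(rows, col):
    """Replace None in column `col` with the mean of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    for r in rows:
        if r[col] is None:
            r[col] = mean
    return mean

class LaplaceNaiveBayes:
    """Categorical naive Bayes with add-alpha (Laplace) smoothing; not RB-Bayes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v] = how often feature j takes value v within class c
        self.counts = {c: defaultdict(Counter) for c in self.class_counts}
        self.values = defaultdict(set)  # distinct values seen for each feature
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def score(self, row, c):
        """Unnormalised posterior P(c) * prod_j P(x_j | c), smoothed."""
        p = self.class_counts[c] / self.n
        for j, v in enumerate(row):
            k = len(self.values[j])
            p *= (self.counts[c][j][v] + self.alpha) / (self.class_counts[c] + self.alpha * k)
        return p

    def predict(self, row):
        # The class ("yes" or "no") with the higher score decides the tuple.
        return max(self.class_counts, key=lambda c: self.score(row, c))

# Hypothetical numeric column with a missing value.
rows = [[1.0], [None], [3.0]]
impute_mean(rows, 0)  # rows[1][0] becomes 2.0

# Hypothetical, already-discretised records (not the Pima data).
X = [("high", "old"), ("high", "young"), ("low", "young"), ("low", "old"), ("high", "old")]
y = ["yes", "yes", "no", "no", "yes"]
model = LaplaceNaiveBayes().fit(X, y)
# "low" never occurs with "yes"; without smoothing its score would be exactly 0.
print(model.score(("low", "young"), "yes"))  # small but non-zero
print(model.predict(("low", "young")))  # → no
```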

Application of the Naive Bayes Method in Predicting Student Satisfaction with Lecturers' Teaching Methods

Prosiding Seminar Nasional Riset Information Science (SENARIS) ◽

10.30645/senaris.v1i0.34 ◽

2019 ◽

Vol 1 ◽

pp. 287

Author(s):

Desi Ratna Sari ◽

Dedy Hartama ◽

Irfan Sudahri Damanik ◽

Anjar Wanto

Keyword(s):

Higher Education ◽

Student Satisfaction ◽

Teaching Methods ◽

Naive Bayes ◽

Teaching Method ◽

Higher Education Institutions ◽

Naïve Bayes ◽

Training Data ◽

Testing Data ◽

Bayes Algorithm

This research aims to classify student satisfaction with lecturers' teaching methods at STIKOM Tunas Bangsa. Data were obtained from the 2015 and 2016 odd-semester student questionnaires, with a sample of 80 students. Four attributes are used: communication (C1), building a learning atmosphere (C2), assessment of students (C3), and delivery of material (C4). The method used is the Naïve Bayes algorithm, processed with RapidMiner Studio 5.3, to determine student satisfaction with teaching methods. The training data comprise 100 records, while 5 records are used as testing data in manual calculations. In the testing results, all five records express satisfaction with the way lecturers teach at STIKOM Tunas Bangsa, while the training data processed with RapidMiner yield an accuracy of 92.00%. This analysis is expected to help higher education institutions evaluate lecturer performance, especially teaching methods, which fall under one of the three pillars of Indonesian higher education (Tri Dharma Perguruan Tinggi).

COMPARISON OF NAÏVE BAYES, SUPPORT VECTOR MACHINE AND K-NEAREST NEIGHBOR TO DETERMINE THE HIGHEST ACCURACY IN PREDICTING ON-TIME CABLE TV PAYMENTS

ILKOM Jurnal Ilmiah ◽

10.33096/ilkom.v11i1.408.11-16 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11-16

Author(s):

Mohamad Efendi Lasulika

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Types ◽

Neural Network Algorithm ◽

Bayes Algorithm

One cause of payment default is the lack of analysis in the new-customer acceptance process, which relies only on the form completed at registration. The purpose of this study is to find the highest accuracy in a comparison of the Naïve Bayes, SVM, and K-NN algorithms. Naïve Bayes has the highest accuracy, 96%, while K-Nearest Neighbor reaches its highest accuracy, 92%, at K = 3, and Support Vector Machine achieves only 66%. The ROC curve results show that Naïve Bayes achieved the best AUC value, 0.99. In this comparison of data mining classification algorithms (Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine) for predicting on-time payment using multivariate data, Naïve Bayes is accurate and clearly dominates the other methods. Based on accuracy, AUC, and t-tests, this method falls into the best classification category.
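
AUC, reported above alongside accuracy, summarises the ROC curve as the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one. The sketch below computes it with that pairwise (Mann-Whitney) formulation on invented scores; the study's own tooling is not reproduced, and the quadratic pairwise loop is only meant for small examples.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels are 1 (positive) or 0 (negative); tied scores count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four customers (1 = pays on time).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # → 1.0
```

An AUC of 0.99, as reported for Naïve Bayes, means almost every positive/negative pair is ranked in the right order.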

COMPARISON OF CLASSIFICATION METHODS FOR SENTIMENT ANALYSIS OF FRANCHISE BUSINESSES BASED ON TWITTER DATA

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.752 ◽

2019 ◽

Vol 15 (2) ◽

pp. 267-274

Author(s):

Tati Mardiana ◽

Hafiz Syahreva ◽

Tuslaela Tuslaela

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor

Franchise businesses in Indonesia currently have relatively high appeal. However, many business owners also fail. Anyone who wants to start a business needs to consider public sentiment toward franchise businesses. Nevertheless, sentiment analysis is not easy because Twitter conversations about franchise businesses are numerous and unstructured. The purpose of this study is to compare the accuracy of the Neural Network, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, and Decision Tree methods in extracting attributes from documents or texts containing comments, in order to identify the expressions within them and classify them as positive or negative comments. This study uses real-time data from tweets on Twitter. The data are then processed, first by cleaning them of noise using Python. Testing with a confusion matrix gave accuracy values of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. This study shows that Support Vector Machine and Neural Network are the best methods for classifying positive and negative comments about franchise businesses.
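
The abstract says only that tweets were cleaned of noise with Python before classification. A minimal sketch of such a cleaning step, assuming that URLs, @mentions, the '#' sign, digits, and punctuation count as noise, might look like this; the exact rules the authors applied are not stated, and the example tweet is invented.

```python
import re

# Assumed noise patterns; the study's actual cleaning rules are not given.
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
HASHTAG_SIGN = re.compile(r"#")
NON_LETTER = re.compile(r"[^a-z\s]")  # note: also drops accented letters
SPACES = re.compile(r"\s+")

def clean_tweet(text):
    """Lowercase, then strip URLs, @mentions, '#' signs, digits and punctuation."""
    text = text.lower()
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = HASHTAG_SIGN.sub("", text)  # keep the hashtag word itself
    text = NON_LETTER.sub(" ", text)
    return SPACES.sub(" ", text).strip()

print(clean_tweet("Great franchise coffee!! @coffee_shop #franchise https://t.co/abc123"))
# → great franchise coffee franchise
```

URLs are removed before the letter filter so that fragments such as "t co abc" do not leak into the token stream.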

Cervical Cancer Prediction System Using CART, Naive Bayes, and k-NN

Creative Information Technology Journal ◽

10.24076/citec.2017v4i2.100 ◽

2018 ◽

Vol 4 (2) ◽

pp. 83

Author(s):

Tutus Praningki ◽

Indra Budi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Pap Smear ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Regression Trees ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Classification And Regression

The historical medical records of cervical cancer patients available at healthcare institutions are not being extracted into knowledge or information. Data mining techniques have strong potential to be implemented in a system that can predict cervical cancer. This study focuses on a medical diagnosis dataset of patients who will undergo a Pap smear test. The algorithms used to classify cervical cancer are Classification And Regression Trees (CART), Naive Bayes, and k-Nearest Neighbor (k-NN). The CART Decision Tree, Naive Bayes, and k-NN algorithms were evaluated with the confusion matrix formula, using the Holdout technique to split the dataset. The test results show that Naive Bayes has the best accuracy, 94.44%, while CART and k-NN achieve 88.89% and 85.04%, respectively. The performance of each algorithm makes it possible to use the cervical cancer prediction system to support clinical decisions for new patients.
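
The Holdout technique mentioned above reserves a fixed fraction of the records for testing and trains on the rest. A minimal sketch follows; the 70/30 ratio and the fixed seed are assumptions for illustration, since the abstract does not give the split used.

```python
import random

def holdout_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test).

    test_ratio and seed are illustrative defaults, not values from the study.
    """
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))  # stand-ins for patient records
train, test = holdout_split(records, test_ratio=0.3)
print(len(train), len(test))  # → 7 3
```

A single holdout split is cheap but its accuracy estimate depends on which records land in the test set; with small medical datasets, a stratified split or cross-validation gives a steadier estimate.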

DIAGNOSIS OF PESTS AND DISEASES IN RICE PLANTS USING THE NAIVE BAYES AND K-NEAREST NEIGHBOR METHODS

Jurnal Komputer dan Informatika ◽

10.35508/jicon.v8i2.2906 ◽

2020 ◽

Vol 8 (2) ◽

pp. 156-162

Author(s):

Restanti M Bianome ◽

Yelly Y Nabuasa ◽

Derwin R Sina

Keyword(s):

Test Data ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Case Based Reasoning ◽

K Nearest Neighbor ◽

Rice Plants ◽

Average Value ◽

Bayes Algorithm ◽

Degree Of Similarity

This study builds a Case-Based Reasoning (CBR) system to diagnose pests and diseases in rice plants using the Naïve Bayes and K-Nearest Neighbor algorithms. CBR is a problem-solving method in which decisions for new cases are based on the solutions of previous cases, by calculating their degree of similarity. The cases cover 13 pest species and 10 disease types of rice plants. The degree of similarity can be determined with indexing or non-indexing: indexing groups the cases by predetermined classes, while non-indexing processes the cases without grouping them. Cross-validation testing on 153 test cases gave an average accuracy of 92.88% with indexing and 89.63% with non-indexing.
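
The degree of similarity in CBR is commonly computed as a weighted average of per-feature similarities, sim = Σ wᵢ·sᵢ / Σ wᵢ, and the indexing/non-indexing distinction then decides which stored cases are compared. The sketch below assumes exact-match feature similarity and invented symptom weights and cases; in it the group is supplied directly, whereas the abstract does not say how the study assigns a new case to a class.

```python
def local_similarity(a, b):
    """1.0 if two symptom values match, else 0.0 (a simplifying assumption).

    A feature missing on both sides (None == None) counts as a match here.
    """
    return 1.0 if a == b else 0.0

def case_similarity(new_case, old_case, weights):
    """Weighted global similarity: sum(w_i * s_i) / sum(w_i)."""
    total = sum(weights.values())
    score = sum(w * local_similarity(new_case.get(f), old_case["features"].get(f))
                for f, w in weights.items())
    return score / total

def retrieve(new_case, case_base, weights, group=None):
    """Return the most similar stored case.

    Non-indexing: compare against every case (group=None).
    Indexing: compare only against cases in the given group.
    """
    candidates = [c for c in case_base if group is None or c["group"] == group]
    return max(candidates, key=lambda c: case_similarity(new_case, c, weights))

# Hypothetical symptom weights and case base (not the study's data).
weights = {"leaf_spots": 3, "yellowing": 2, "stem_holes": 1}
case_base = [
    {"group": "disease", "diagnosis": "blast",
     "features": {"leaf_spots": True, "yellowing": False, "stem_holes": False}},
    {"group": "disease", "diagnosis": "tungro",
     "features": {"leaf_spots": False, "yellowing": True, "stem_holes": False}},
    {"group": "pest", "diagnosis": "stem borer",
     "features": {"leaf_spots": False, "yellowing": False, "stem_holes": True}},
]
new_case = {"leaf_spots": True, "yellowing": False, "stem_holes": False}

print(retrieve(new_case, case_base, weights)["diagnosis"])  # → blast
print(retrieve(new_case, case_base, weights, group="pest")["diagnosis"])  # → stem borer
```

Indexing shrinks the search to one group, which is faster, but if the group is chosen wrongly the best-matching case can be excluded from the comparison.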

COMPARATIVE ANALYSIS OF THE NAÏVE BAYES CLASSIFIER AND K-NEAREST NEIGHBOR METHODS FOR DATA CLASSIFICATION

Sebatik ◽

10.46984/sebatik.v24i1.909 ◽

2020 ◽

Vol 24 (1) ◽

pp. 1-7

Author(s):

Aida Indriani

Keyword(s):

Text Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Naïve Bayes Classifier

Forums have been widely used by university students as a learning medium. A forum serves as a place where members discuss material according to a topic title, which is usually chosen to match the content being discussed. When a forum already contains too many topic titles, the wrong title may be chosen. One way to address this is to classify topic titles automatically according to the content of the material. Text classification can be performed with text mining techniques. In the classification process, the dataset is split into two parts: training data and testing data. Classification begins with pre-processing, which starts with tokenization, continues with filtering, and ends with stemming. Several methods can be used for text classification, including the naïve Bayes classifier (NBC), k-nearest neighbor (k-NN), Rocchio, weight-adjusted k-nearest neighbor (WA k-NN), and others. In this study, the author compares two methods, NBC and k-NN. The comparison shows that k-NN is more accurate than NBC, with an accuracy of 80% for k-NN and 73% for NBC, calculated with a confusion matrix.
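
The three pre-processing stages named above (tokenization, filtering, stemming) can be chained as in the sketch below. The stopword list and the suffix-stripping stemmer are toy assumptions for English text; the study worked on Indonesian forum posts, which would need a proper Indonesian stopword list and stemmer.

```python
import re

# Toy stopword and suffix lists: illustrative assumptions only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "are"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Tokenization: lowercase and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())

def filter_tokens(tokens):
    """Filtering: drop stopwords."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Naive suffix stripping; keeps at least 3 characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, then filter, then stem, in the order described above."""
    return [stem(t) for t in filter_tokens(tokenize(text))]

print(preprocess("Discussing the topics of classifying forum posts"))
# → ['discuss', 'topic', 'classify', 'forum', 'post']
```

The resulting token lists are what either classifier would then turn into feature vectors.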

Prediction of Infant Birth Rates in Tridaya Sakti Village Using the Naive Bayes Algorithm

Journal of Students' Research in Computer Science ◽

10.31599/jsrcs.v1i2.423 ◽

2020 ◽

Vol 1 (2) ◽

pp. 77-88

Author(s):

Nur Isnaini Parihah ◽

Sari Hartini ◽

Juarni Siregar

Keyword(s):

Data Mining ◽

Population Growth ◽

Naive Bayes ◽

Large Population ◽

Naïve Bayes ◽

Training Data ◽

Birth Rates ◽

Testing Data ◽

Bayes Algorithm ◽

Infant Birth

The birth rate is a factor that can increase population growth, and a large population is a burden on development. According to Malthus's theory, large population growth brings not welfare but poverty if the population is not well controlled. The number of infant births in Tridaya Sakti Village increases every year. Therefore, data mining with the Naive Bayes algorithm can help predict infant birth rates in Tridaya Sakti Village. The aim is to determine the number of infant births for the coming year using the Naive Bayes algorithm, by examining the prediction patterns of each variable and testing the training data against the testing data. It is hoped that the Naive Bayes algorithm can help Tridaya Sakti Village calculate infant birth rates and regulate population growth in the coming years. Applying the Naive Bayes algorithm to the collected data produces information that can be used as a reference for estimating the number of births. Data processing becomes more effective and efficient, and predictions of the number of infant births become more precise and accurate. Keywords: Naive Bayes, Birth of a Baby, Prediction

APPLICATION OF DATA MINING TO COVID-19 DATA USING CLASSIFICATION ALGORITHMS

Jurnal Informatika ◽

10.30873/ji.v21i1.2868 ◽

2021 ◽

Vol 21 (1) ◽

pp. 44-52

Author(s):

Rizka Dahlia ◽

Nanik Wuryani ◽

Sri Hadianti ◽

Windu Gata ◽

Arina Selawati

Keyword(s):

Data Mining ◽

South Korea ◽

Respiratory System ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

The World ◽

Bayes Algorithm

Coronavirus 2019, more commonly referred to as COVID-19, is a type of virus that attacks the respiratory system. The number of infections and deaths caused by this virus continues to increase. As of April 21, 2020, based on WHO data, the total number of infected cases worldwide reached 2,397,217, with 162 deaths. In South Korea, as of March 21, 2020, the total number of infected cases was 10,683, with 237 deaths. In this study, the researchers processed data on the spread of COVID-19 in South Korea with RapidMiner using three classification algorithms, Naïve Bayes, C4.5, and K-Nearest Neighbor, through the stages of selection, preprocessing, transformation, data mining, and interpretation or evaluation. The best accuracy, 80.79% with an AUC of 0.881, was achieved by the Naïve Bayes algorithm. The data distribution showed that the attribute influencing the patients' isolation class was sex, with more women experiencing isolation. Keywords: COVID-19, data mining, classification, C4.5, Naïve Bayes, K-NN
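
C4.5, one of the three algorithms compared above, chooses split attributes by gain ratio: information gain divided by the split information of the attribute. A minimal sketch of that criterion is shown on invented records that only mimic the sex and isolation attributes mentioned in the abstract; it is not the study's data, and the abstract does not say the influential attribute was identified this way.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """C4.5-style gain ratio of categorical attribute `attr` for class `target`."""
    n = len(rows)
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    # Information gain: class entropy minus weighted entropy after the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical patient records (not the South Korean dataset).
rows = [
    {"sex": "female", "state": "isolated"},
    {"sex": "female", "state": "isolated"},
    {"sex": "male", "state": "released"},
    {"sex": "male", "state": "isolated"},
]
print(round(gain_ratio(rows, "sex", "state"), 3))  # → 0.311
```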
