Comparison of Data Mining Methods Using the Naïve Bayes Algorithm and K-Nearest Neighbor in Predicting Immunotherapy Success

Tech-E ◽  
2019 ◽  
Vol 2 (2) ◽  
pp. 30
Author(s):  
Budi Harto ◽  
Rino Rino

Tumors and cancer are a growing health problem, with the number of sufferers increasing every year. The disease requires attention at both its early and late stages because sufferers face a high risk of death. With the rapid development of technology, technology can be applied in many fields, one of which is predicting the success of a therapy. In this study, the authors use data mining to test a dataset and determine which of the Naïve Bayes and K-Nearest Neighbor algorithms performs better, using the RapidMiner Studio application. The better algorithm is then applied in an application or expert system that is expected to help users predict the success of a therapy.

2021 ◽  
Vol 21 (1) ◽  
pp. 44-52
Author(s):  
Rizka Dahlia ◽  
Nanik Wuryani ◽  
Sri Hadianti ◽  
Windu Gata ◽  
Arina Selawati

Coronavirus 2019, more commonly referred to as COVID-19, is a type of virus that attacks the respiratory system. The spread of this virus and the number of deaths it causes continue to increase. As of April 21, 2020, based on data from the WHO, the total number of cases infected with this virus reached 2,397,217 with 162 deaths from all over the world. For South Korea itself, as of March 21, 2020, the total number of infected cases was 10,683 with a total of 237 deaths. In this study, the researchers processed data on the spread of COVID-19 in South Korea with RapidMiner using the classification algorithms Naïve Bayes, C4.5, and K-Nearest Neighbor, following the stages of selection, preprocessing, transformation, data mining, and interpretation or evaluation. The best accuracy, 80.79% with an AUC of 0.881, was achieved by the Naïve Bayes algorithm. The distribution of the data showed that the attribute influencing the isolated class of patients was sex, with more women experiencing isolation. Keywords: COVID-19, data mining, classification, C4.5, Naïve Bayes, K-NN


Author(s):  
Titin Winarti ◽  
Henny Indriyawati ◽  
Vensy Vydia ◽  
Febrian Wahyu Christanto

The match between an article's content and the article theme is the main factor in whether or not the article is accepted. Many people still find it difficult to determine the appropriate theme for their article. For that reason, a document classification algorithm is needed that can group articles automatically and accurately. Many classification algorithms can be used. The algorithm used in this study is naive Bayes, with the k-nearest neighbor algorithm as the baseline. The naive Bayes algorithm was chosen because it can produce high accuracy with little training data, while the k-nearest neighbor algorithm was chosen because it is robust against noisy data. The performance of the two algorithms is compared to see which is better at classifying documents. The results show that the naive Bayes algorithm performs better, with an accuracy rate of 88%, while the k-nearest neighbor algorithm has a fairly low accuracy rate of 60%.


2020 ◽  
Vol 12 (4) ◽  
pp. 151-159
Author(s):  
Irma Handayani ◽  
Ikrimach Ikrimach

In the medical field, there are many records of disease sufferers, one of which is data on breast cancer. The process of extracting previously unknown information from data is known as data mining. Data mining uses pattern recognition techniques from statistics and mathematics to find patterns in old data or cases. One of the main roles of data mining is classification. In a classification dataset, there is one target attribute, also called the label attribute. For new data, this attribute is predicted on the basis of the other attributes in past data. The number of attributes can affect the performance of an algorithm. As a result, if the classification process is inaccurate, the researcher needs to double-check each previous stage to look for errors. The best algorithm for one data type is not necessarily good for another data type. For this reason, the K-Nearest Neighbor and Naïve Bayes algorithms are used as a solution to this problem. The research method was to prepare data from the breast cancer dataset, train and test on the data, and then perform a comparative analysis. The research target is to identify the best algorithm for classifying breast cancer, so that patients with the existing parameters can be predicted as having malignant or benign breast cancer. This pattern can be used as a diagnostic measure so that the disease can be detected earlier, which is expected to reduce the mortality rate from breast cancer. In the comparison, K-Nearest Neighbor achieves 95.79% and Naïve Bayes achieves 93.39%.
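As a rough illustration of how a k-nearest-neighbor classifier assigns a label from past records, the sketch below votes among the k closest training points. The toy two-feature data, the Euclidean distance, and k=3 are assumptions made for illustration; the abstract does not state which distance, k value, or features the study used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the labels of the k closest samples
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data, NOT taken from the breast cancer dataset
train_X = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (1.0, 1.0), k=3))
print(knn_predict(train_X, train_y, (3.0, 3.0), k=3))
```

Because every prediction scans all training records, the cost grows with the dataset size, which is one reason the number of attributes and records matters when comparing it with Naïve Bayes.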


Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes into which the data are classified are predefined using existing datasets. Classifying medical records by their symptoms using a computerized method and storing the predicted information in digital format is of great importance in diagnosing various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the exact data according to the diagnosis report and on their execution rate, to identify how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), which are compared to determine which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented in tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. In the performance results, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.


Author(s):  
Rajni Rajni ◽  
Amandeep Amandeep

Diabetes is a major concern all over the world, and it is increasing at a fast pace. People can avoid diabetes at an early stage without any test. The goal of this paper is to predict, at an early stage, the probability that a person is at risk of diabetes. This would have a great impact on their quality of life. The datasets are Pima Indians diabetes and Cleveland coronary illness and consist of 768 records. Although a number of solutions are available for extracting information from huge datasets and predicting the possibility of having diabetes, the accuracy of their mining process is far from satisfactory. To achieve the highest accuracy, the zero-probability issue generally faced by naïve Bayes analysis needs to be addressed suitably. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy while withstanding the zero-probability problem, and its accuracy is compared with other methods such as Support Vector Machine, Naive Bayes, and K Nearest Neighbor. We used the mean to handle missing data and calculated the probability for yes (positive) and no (negative). The higher of the two values decides the class of the tuple. It is mostly used in text classification. The outcomes on the Pima Indian diabetes dataset demonstrate that the proposed methodology improves precision compared with other supervised techniques. The accuracy of the proposed methodology on the large dataset is 72.9%.
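The zero-probability problem mentioned above can be seen in a small categorical naive Bayes sketch. The code below uses standard Laplace (add-alpha) smoothing, which is one common remedy and is not claimed to be the RB-Bayes method; the feature values and labels are hypothetical toy data, and `train_nb` and `predict_nb` are names introduced here for illustration.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_nb(model, row, alpha=1.0):
    """Return (best label, scores); alpha=0 disables smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            n_values = len(feature_values[i])
            # Smoothed conditional probability P(value | label)
            score *= (count + alpha) / (n_label + alpha * n_values)
        scores[label] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy records: (glucose level, family history)
rows = [("high", "yes"), ("high", "no"), ("normal", "no"), ("normal", "no")]
labels = ["yes", "yes", "no", "no"]
model = train_nb(rows, labels)

print(predict_nb(model, ("normal", "yes"), alpha=0.0))  # every class scores zero
print(predict_nb(model, ("normal", "yes"), alpha=1.0))  # smoothing keeps scores positive
```

Without smoothing, a single unseen feature-value and class combination multiplies the whole score by zero, so the "highest value between yes and no" rule can no longer separate the classes.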


2019 ◽  
Vol 7 (1) ◽  
pp. 7-16
Author(s):  
Sidik Rahmatullah

A graduate is the status a student attains after completing the educational process in accordance with the graduation requirements set by the study program. As one of the direct outputs of the educational process carried out by the study program, quality graduates are characterized by mastery of academic competencies, including hard skills and soft skills, as stated in the quality objectives and demonstrated by the graduates' performance in society in accordance with their profession and field of knowledge. A quality study program has a good graduate management system, so that it can turn its graduates into human capital for the study program concerned. This study uses data mining to predict student graduation using two methods, Naive Bayes and K-Nearest Neighbor. The results of this study can predict whether students will graduate on time or late. The experiment used graduate data from the undergraduate (S1) Information Systems program at STMIK Dian Cipta Cendikia Kotabumi, with 600 records for training and 180 records for testing. The results show that Naive Bayes produced an accuracy of 85%, while the K-Nearest Neighbor algorithm produced an accuracy of 68.89%.


Tech-E ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 44
Author(s):  
Rino Rino

Heart disease is a condition in which fatty deposits in the coronary arteries of the heart change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; two methods often used in research are the C4.5 algorithm and Naive Bayes. The dataset in this research was obtained from the UCI Machine Learning Repository site and has 3546 records and 13 attributes. The Naïve Bayes algorithm has a higher accuracy of 81.40%, compared to the C4.5 algorithm, which has an accuracy of only 79.07%. Based on the calculation results, it can be concluded that the Naïve Bayes algorithm is a very good classifier because it has a value between 0.709 and 1.00. From the conclusion above, the Naïve Bayes algorithm has a higher accuracy than the C4.5 algorithm, so the researchers decided to use the Naïve Bayes algorithm in predicting heart disease.


2016 ◽  
Vol 7 (4) ◽  
Author(s):  
Mochammad Yusa ◽  
Ema Utami ◽  
Emha T. Luthfi

Abstract. Readmission is associated with quality measures on patients in hospitals. Different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, make the calculation of quality of care tend to be complicated. Classification techniques of data mining can solve this problem. In this paper, three different classifiers, i.e. Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, with various parameter settings, are evaluated using the 10-Fold Cross Validation technique. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The result shows that the k-NN classifier with k=100 has better performance in terms of accuracy and Kappa Statistic, but Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstract (Indonesian version, translated). Readmission is associated with measuring the quality of patient care in hospitals. Differences in the attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, make the quality calculation tend to be complicated. Data mining classification techniques can be a solution for this quality calculation. Classification is one of the data mining techniques whose development has been quite significant. In this study, the Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification models with various parameter settings are evaluated on Accuracy, Mean Absolute Error (MAE), and Kappa Statistic using the 10-Fold Cross Validation method. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa Statistic performance was obtained from the Naive Bayes model. Keywords: k-NN, naive bayes, diabetes, readmission
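For readers unfamiliar with the evaluation terms used in this abstract, the sketch below shows one way to build 10-fold cross-validation splits and to compute accuracy and Cohen's kappa from predictions. It is a minimal standard-library illustration with hypothetical labels, not the authors' evaluation setup; the function names are introduced here, and the folds are taken in order without shuffling, which real experiments usually add.

```python
from collections import Counter

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical labels, not from the readmission dataset
y_true = ["yes"] * 5 + ["no"] * 5
y_pred = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

print(accuracy(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
print([len(test) for _, test in k_fold_indices(25, k=10)])
```

Kappa corrects accuracy for the agreement expected by chance, which is why a classifier can rank differently on accuracy and on kappa when the classes are imbalanced.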


2018 ◽  
Vol 14 (2) ◽  
pp. 261
Author(s):  
Lila Dini Utami

Today it is very easy to express opinions freely, orally and in writing, about anything. Some businesses can use this activity to make decisions, especially service providers such as hotels, for which it is very useful in developing the business. However, the review data must be processed using the right algorithm. This study was therefore conducted to find out which algorithm is most suitable for achieving the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). The results show that Naïve Bayes achieves an accuracy of 71.50% with an AUC of 0.500, Support Vector Machine achieves 72.50% with an AUC of 0.936, and k-Nearest Neighbor achieves 75.00% with an AUC of 0.500. The use of the k-Nearest Neighbor algorithm can help in making more appropriate decisions based on hotel reviews.
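Accuracy and AUC measure different things, which is why the figures above can diverge. The sketch below computes AUC as the fraction of positive and negative pairs that a classifier's scores rank correctly, with ties counted as half. The labels and scores are hypothetical; how the study's tool computed AUC is not stated in the abstract.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical review labels (1 = positive review) and classifier scores
labels = [1, 1, 0, 0]
print(auc(labels, [0.9, 0.4, 0.6, 0.1]))  # three of four pairs ranked correctly
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # identical scores carry no ranking information
```

An AUC of 0.500 means the scores rank positives above negatives no better than chance, even when the thresholded predictions still reach a respectable accuracy.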

