Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

2021 ◽  
Vol 6 (2) ◽  
pp. 134-143
Author(s):  
Bijanto Bijanto ◽  
Ryan Yunus

Missing data can seriously affect the research process and classification results, leading to biased parameter estimates, loss of statistical information, decreased quality, increased standard error, and weak generalization of the findings. In this paper, we discuss a problem in one algorithm, the Naive Bayes Kernel algorithm, which has the disadvantage of not being able to process data with missing values. To process such data, we propose the mean imputation method. The data we use is public data from UCI, namely the HCV (Hepatitis C Virus) dataset. The imputation method fills each missing value with the average of the existing data. Before mean imputation, the dataset is first bootstrapped. The data corrected by mean imputation is then processed using the Naive Bayes Kernel algorithm. The tests carried out yield an accuracy of 96.05% with a data computation time of 1 second.
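
As a rough illustration of mean imputation as described in this abstract, the sketch below fills each missing entry with the mean of the observed values in its column. The function name `mean_impute` and the toy array are assumptions made for this example; they are not taken from the paper, and the HCV dataset is not used.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)       # per-column mean, ignoring NaN
    missing = np.isnan(X)                   # boolean mask of missing entries
    filled = X.copy()
    # np.where(missing)[1] lists the column index of every missing entry
    filled[missing] = np.take(col_means, np.where(missing)[1])
    return filled

# Hypothetical toy data with two missing entries
data = np.array([[1.0, 10.0],
                 [np.nan, 20.0],
                 [3.0, np.nan]])
imputed = mean_impute(data)
```

A column with no observed values has no mean to borrow, so it would stay NaN under this sketch and would need separate handling.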

2019 ◽  
Vol 5 (2) ◽  
pp. 85-90
Author(s):  
Taufiq Rizaldi ◽  
Fendik Eko Purnomo ◽  
Aji Seto Arifianto

Data loss in a dataset is common in surveys and is usually caused by unit or item non-response during data collection. Missing data can significantly influence the results of a study, and an inappropriate choice of solution can produce a suboptimal outcome that tends to be biased. Two widely used methods for this problem are K-Nearest Neighbor (K-NN) and Naïve Bayes; the main purpose of this study is to compare their performance. K-NN gave the better results, with a Mean Square Error (MSE) greater than 1 and a MAPE of around 10-16%, while Naïve Bayes had an MSE greater than 1 and a MAPE of around 26%.
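
The two error measures named in this abstract can be sketched as follows, using the standard definitions of MSE and MAPE (mean absolute percentage error). The function names and the sample values are hypothetical and do not come from the study.

```python
def mse(actual, predicted):
    """Mean square error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical true values and imputed estimates
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(mse(actual, predicted), mape(actual, predicted))
```

MSE depends on the scale of the data while MAPE is relative, which is one reason two methods can have similar MSE but clearly different MAPE.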


2019 ◽  
Vol 13 (1) ◽  
pp. 26-30
Author(s):  
Toni Arifin ◽  
Daniel Ariesta

Chronic kidney disease (CKD) is a global public health problem with rising prevalence and incidence of kidney failure, poor prognosis, and high cost. The average prevalence of kidney failure across Indonesia is approximately 0.2 percent. The first step in managing kidney disease is establishing an accurate diagnosis, so a method is needed to predict chronic kidney disease. Naïve Bayes has several advantages: it is fast to compute, algorithmically simple, and highly accurate. The Naïve Bayes classifier is better suited to large datasets, can handle incomplete data (missing values), and is robust to irrelevant attributes and noise in the data. To improve accuracy, Particle Swarm Optimization is used for attribute weighting. The results show that Particle Swarm Optimization-based Naive Bayes classification achieves a confusion-matrix accuracy of 98.75% and an AUC of 99%, whereas Naive Bayes alone achieves a confusion-matrix accuracy of 97.00% and an AUC of 99.8%.


2021 ◽  
Vol 9 (1) ◽  
pp. 98-107
Author(s):  
Jesica Nauli Br. Siringo Ringo ◽  
Wahyu Joko Mursalin ◽  
Nisrina Citra Nurfadilah ◽  
Dwiky Rachmat Ramadhan ◽  
Wa Ode Zuhayeni Madjida

The large increase in COVID-19 cases in Indonesia, particularly on the island of Java, calls for a range of control efforts. One effective measure is prevention through providing information about the condition of a region. As a warning to the public and as a basis for regional policymaking, Indonesia publishes risk zones down to the regency/city level through the COVID-19 Handling Task Force (Satgas Penanganan COVID-19). The risk zone levels are determined with a conventional technique, namely score weighting using information from three types of indicators. Considering that risk zones are important in setting COVID-19 policy, this study aims to build a risk zone classification model for regencies/cities on Java using several data mining classification techniques and to determine the best model based on evaluation results. The classification techniques compared are naive Bayes, decision tree, k-nearest-neighbor, and neural network. Before modeling, the data were prepared in a preprocessing stage, which identified missing value and imbalanced data problems. These were addressed with data imputation and oversampling. The results show that the k-nearest-neighbor model is the best of the four: the k-NN model has the highest accuracy and the highest macro-averaged sensitivity, specificity, and F1 score.
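
The evaluation measures mentioned in this abstract (accuracy and macro-averaged sensitivity, specificity, and F1) can be computed from a multi-class confusion matrix roughly as sketched below. The function name `macro_metrics` and the 3x3 matrix are hypothetical; rows are assumed to be true classes and columns predicted classes.

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy and macro-averaged metrics from a confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correctly predicted, per class
    fn = cm.sum(axis=1) - tp            # true class missed
    fp = cm.sum(axis=0) - tp            # wrongly assigned to class
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": sensitivity.mean(),   # macro average = unweighted class mean
        "specificity": specificity.mean(),
        "f1": f1.mean(),
    }

# Hypothetical confusion matrix for three risk-zone classes
cm = [[5, 1, 0],
      [1, 4, 1],
      [0, 1, 7]]
metrics = macro_metrics(cm)
```

Macro averaging weights every class equally, which matters when the classes are imbalanced, as the abstract reports for this dataset.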


2021 ◽  
Vol 21 (1) ◽  
pp. 259
Author(s):  
F Lia Dwi Cahyanti ◽  
Windu Gata ◽  
Fajar Sarasati

Skin cancer is a disease that grows in the skin tissue and is characterized by changes in the skin, such as the appearance of lumps, spots, or moles of abnormal size. One of its causes is exposure to ultraviolet rays from the sun. One treatment for skin cancer is immunotherapy, which treats disease by activating or suppressing the body's immune system. In this study, two data mining classification methods, Naïve Bayes and K-Nearest Neighbor, are compared for predicting the success rate of immunotherapy in curing skin cancer. The researchers use the Weka application to process the data and conduct the tests. The results show that the K-Nearest Neighbor model has the best accuracy, 91.1111%, while Naïve Bayes obtained a lower accuracy of 82.2222%. It can be concluded that the K-Nearest Neighbor method is more accurate in determining the success rate of immunotherapy.


Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2036
Author(s):  
Andreas Wichert

Probability theory is built around Kolmogorov’s axioms. To each event, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. Kolmogorov’s probabilities of events are additive, and the probabilities of all possible events sum to one. The numerical degrees of belief can be estimated from a sample by its true fraction: the frequency of an event in the sample is counted and normalized, resulting in a linear relation. We introduce quantum-like sampling. The resulting Kolmogorov’s probabilities are in a sigmoid relation. The sigmoid relation offers a better importability since it induces the bell-shaped distribution; it also leads to less uncertainty when computing Shannon’s entropy. Additionally, we conducted 100 empirical experiments by quantum-like sampling random training and validation sets 100 times out of the Titanic data set using the Naïve Bayes classifier. On average, the accuracy increased from 78.84% to 79.46%.
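
The classical estimate described in this abstract (count an event's frequency in a sample and normalize) and Shannon's entropy can be sketched as below. This covers only the linear, frequency-based estimate; the abstract does not specify how quantum-like sampling is computed, so it is not implemented here. The function names and the sample are hypothetical.

```python
from collections import Counter
from math import log2

def relative_frequencies(sample):
    """Estimate event probabilities by counting and normalizing."""
    counts = Counter(sample)
    n = len(sample)
    return {event: c / n for event, c in counts.items()}

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical sample of outcomes
sample = ["died", "died", "died", "survived"]
probs = relative_frequencies(sample)
entropy = shannon_entropy(probs.values())
```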


2018 ◽  
Vol 2 (1) ◽  
pp. 354-360
Author(s):  
Mohammad Guntur ◽  
Julius Santony ◽  
Yuhandri Yuhandri

The rise and fall of gold prices is influenced by many factors, such as economic conditions, the inflation rate, supply and demand, and more. The Naïve Bayes algorithm can generate a classification that is used to predict future opportunities. Using the Naïve Bayes Classifier algorithm, a prediction of gold prices is obtained that can help decision makers determine whether to sell or buy gold. Gold data are processed using Rapidminer software. The processing stages are reading the training data, calculating the mean and standard deviation, entering the test data, finding the Gaussian density value, and then computing the probability value. Based on the calculations, the Naïve Bayes Classifier method is able to predict the price of gold one day ahead, on a daily basis. The results are expected to help gold investors predict gold prices more accurately for decision making.
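
The processing stages listed in this abstract (per-class mean and standard deviation, Gaussian density, then class probability) correspond to a Gaussian Naïve Bayes classifier. The following is a minimal sketch under that assumption; it is not the authors' Rapidminer workflow, and the function names, features, and toy data are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

def gaussian_density(x, mu, sigma):
    """Gaussian probability density at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

def fit(rows, labels):
    """Per class: prior plus (mean, standard deviation) of each feature."""
    model = {}
    for label in set(labels):
        subset = [r for r, l in zip(rows, labels) if l == label]
        stats = [(mean(col), stdev(col)) for col in zip(*subset)]
        model[label] = (len(subset) / len(rows), stats)
    return model

def predict_proba(model, row):
    """Posterior class probabilities: prior times feature densities, normalized."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mu, sigma) in zip(row, stats):
            score *= gaussian_density(x, mu, sigma)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Hypothetical training data: two numeric features per day, label = decision
rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
        [3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
labels = ["sell", "sell", "sell", "buy", "buy", "buy"]
model = fit(rows, labels)
probs = predict_proba(model, [1.1, 2.1])
```

Each class needs at least two training rows with some spread per feature, since `stdev` is undefined for a single value and a zero standard deviation would divide by zero.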


Machine learning applications have been well accepted for various financial processes throughout the world. Supervised learning for objective classification with Naïve Bayes classifiers has supported many definitive segregation processes. Various banks in Bangladesh have found it challenging to identify financially and ethically qualified loan applicants. In this research, we confirmed a list of safe applicants using definitive variable measures obtained through identifiable questions. Our process successfully segregated the given applicants using a Naïve Bayes classifier, with evidence of lowering the loan default rate from an average of 23.26% to 11.76% and of developing financial ratios as performance indicators of these banks.


Author(s):  
Agung Eddy Suryo Saputro ◽  
Khairil Anwar Notodiputro ◽  
Indahwati A

In 2018, Indonesia held gubernatorial elections in 17 provinces. For several months before the elections, news and opinions about them were often trending topics on Twitter. This study aims to describe the results of sentiment mining and determine the best method for predicting sentiment classes. Sentiment mining is lexicon-based, while the methods used for sentiment analysis are Naive Bayes and C5.0. The results show that the percentage of positive sentiment in the 17 provinces was greater than that of negative and neutral sentiment. In addition, C5.0 produces better predictions than Naive Bayes.


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

