Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Bijanto Bijanto; Ryan Yunus

doi:10.33633/jais.v6i2.5288

Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Journal of Applied Intelligent System ◽

10.33633/jais.v6i2.5288 ◽

2021 ◽

Vol 6 (2) ◽

pp. 134-143

Author(s):

Bijanto Bijanto ◽

Ryan Yunus

Keyword(s):

Naive Bayes ◽

Research Process ◽

Naïve Bayes ◽

Imputation Method ◽

Parameter Estimates ◽

Process Data ◽

Missing Value ◽

Mean Imputation ◽

Public Data ◽

The Mean

Missing values can seriously affect the research process: they can distort classification results and lead to biased parameter estimates, loss of statistical information, decreased quality, increased standard errors, and weak generalization of the findings. In this paper, we discuss a problem in one algorithm, the Naive Bayes Kernel algorithm, which has the disadvantage of not being able to process data with missing values. To process such data, we propose the mean imputation method. The data we use is public data from UCI, namely the HCV (Hepatitis C Virus) dataset. The imputation method corrects the missing data by filling each gap with the average of the existing values. Before mean imputation, the dataset is first bootstrapped. The data corrected by mean imputation is then processed using the Naive Bayes Kernel algorithm. The tests carried out yielded an accuracy of 96.05% and a computation time of 1 second.
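The pipeline described in this abstract (fill gaps with the column mean, then classify with a kernel-based naive Bayes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, the fixed bandwidth of 1.0, and the toy data are all hypothetical, and the bootstrap step is omitted.

```python
import math

def mean_impute(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def kernel_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from one feature's training samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def predict(x, rows, labels, bandwidth=1.0):
    """Return the class maximising log prior plus the sum of per-feature log densities."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == label]
        score = math.log(len(members) / len(rows))
        for j, value in enumerate(x):
            density = kernel_density(value, [m[j] for m in members], bandwidth)
            score += math.log(density + 1e-300)  # small constant guards against log(0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data invented for illustration (not the HCV dataset); None marks a missing value.
raw = [[1.0, 2.0], [None, 2.5], [1.5, None], [6.0, 8.0], [6.5, None], [None, 9.0]]
labels = [0, 0, 0, 1, 1, 1]
filled = mean_impute(raw)
```

Here the mean is taken over the whole column regardless of class, which matches the abstract's "average of the existing values"; a class-conditional mean would be a different design choice.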

PASS Targets: Ligand-based multi-target computational system based on a public data and naïve Bayes approach

SAR and QSAR in Environmental Research ◽

10.1080/1062936x.2015.1078407 ◽

2015 ◽

Vol 26 (10) ◽

pp. 783-793 ◽

Cited By ~ 29

Author(s):

P.V. Pogodin ◽

A.A. Lagunin ◽

D.A. Filimonov ◽

V.V. Poroikov

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Computational System ◽

Public Data ◽

Bayes Approach

PERBANDINGAN METODE K-NN DAN BAYES PADA MISSING IMPUTATION

Jurnal Teknologi Informasi dan Terapan ◽

10.25047/jtit.v5i2.84 ◽

2019 ◽

Vol 5 (2) ◽

pp. 85-90

Author(s):

Taufiq Rizaldi ◽

Fendik Eko Purnomo ◽

Aji Seto Arifianto

Keyword(s):

Data Collection ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Loss ◽

Optimal Outcome ◽

K Nearest Neighbor ◽

Bayes Methods ◽

Data Collection Process ◽

The Mean

Data loss in a dataset commonly arises in surveys, usually because units or items do not respond during the data collection process. Missing data can significantly influence the results of a study, and a poorly chosen remedy can produce a suboptimal outcome that tends to be biased. Two methods widely used to address the problem are K Nearest Neighbor (K-NN) and Naïve Bayes; the main purpose of this study is to compare their performance. K-NN gave the better results, with a Mean Square Error (MSE) greater than 1 and a MAPE of around 10-16%, while Naïve Bayes had an MSE greater than 1 and a MAPE of around 26%.
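The two error measures used in this comparison can be computed as in the sketch below. It assumes the standard definitions of MSE and MAPE (mean absolute percentage error); the function names and the sample numbers are hypothetical and are not taken from the study.

```python
def mse(actual, imputed):
    """Mean square error between the true values and the imputed values."""
    return sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual)

def mape(actual, imputed):
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, imputed)) / len(actual)

# Invented values: the entries that were hidden, and what an imputer filled in.
actual = [10.0, 20.0, 40.0, 80.0]
imputed = [15.0, 10.0, 40.0, 80.0]

error_mse = mse(actual, imputed)
error_mape = mape(actual, imputed)
```

MSE depends on the scale of the variable while MAPE is relative, which is why two methods can both report an MSE above 1 and still differ clearly in MAPE.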

PREDIKSI PENYAKIT GINJAL KRONIS MENGGUNAKAN ALGORITMA NAIVE BAYES CLASSIFIER BERBASIS PARTICLE SWARM OPTIMIZATION

Jurnal Tekno Insentif ◽

10.36787/jti.v13i1.97 ◽

2019 ◽

Vol 13 (1) ◽

pp. 26-30

Author(s):

Toni Arifin ◽

Daniel Ariesta

Keyword(s):

Particle Swarm Optimization ◽

Naive Bayes ◽

Confusion Matrix ◽

Particle Swarm ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Swarm Optimization ◽

Missing Value

Chronic kidney disease (CKD) is a global public health problem with rising prevalence and incidence of kidney failure, poor prognosis, and high cost. Across Indonesia, the prevalence of kidney failure averages roughly 0.2 percent. The first step in managing kidney disease is establishing the correct diagnosis, so a method is needed to predict chronic kidney disease. Naïve Bayes has several advantages: it is fast to compute, algorithmically simple, and highly accurate. The Naïve Bayes Classifier is better suited to large data, can handle incomplete data (missing values), and is robust to irrelevant attributes and noise in the data. To improve accuracy, Particle Swarm Optimization is used for attribute weighting. In the results, Naive Bayes classification based on Particle Swarm Optimization achieved a confusion-matrix accuracy of 98.75% and an AUC of 99%, whereas Naive Bayes alone achieved a confusion-matrix accuracy of 97.00% and an AUC of 99.8%.

Perbandingan Metode Klasifikasi Multiclass untuk Pemetaan Zona Risiko COVID-19 di Pulau Jawa

Jurnal Komputer dan Informatika ◽

10.35508/jicon.v9i1.3602 ◽

2021 ◽

Vol 9 (1) ◽

pp. 98-107

Author(s):

Jesica Nauli Br. Siringo Ringo ◽

Wahyu Joko Mursalin ◽

Nisrina Citra Nurfadilah ◽

Dwiky Rachmat Ramadhan ◽

Wa Ode Zuhayeni Madjida

Keyword(s):

Neural Network ◽

Data Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Imbalanced Data ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Missing Value

The large increase in COVID-19 cases in Indonesia, particularly on the island of Java, calls for various control efforts. One effective measure is prevention by informing the public about the condition of a region. As a warning to the public and as a basis for regional policy-making, Indonesia publishes risk zones down to the regency/city level through the COVID-19 Handling Task Force (Satgas Penanganan COVID-19). These risk zone levels are determined with a conventional technique, namely score weighting using information from three types of indicators. Considering that risk zones are important in setting COVID-19 policy, this study aims to build a risk zone classification model for regencies/cities on Java using several data mining classification techniques and to determine the best model based on evaluation results. The techniques compared are naive Bayes, decision tree, k-nearest-neighbor, and neural network. Before modeling, the data was prepared in a preprocessing stage, where missing values and imbalanced data were identified; these problems were handled with data imputation and oversampling. The results show that k-nearest-neighbor is the best of the four models: k-NN has the highest accuracy and the highest macro-averaged sensitivity, specificity, and F1 score.
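The evaluation measures named in this abstract (accuracy plus macro-averaged sensitivity, specificity, and F1) can be derived from a multiclass confusion matrix roughly as follows. This is a sketch using the usual one-vs-rest definitions; the function name and the 3x3 matrix are hypothetical and do not come from the study.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged metrics.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    sens, spec, f1 = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(k)) - tp   # predicted c, true class otherwise
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        sens.append(recall)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return {
        "accuracy": sum(confusion[c][c] for c in range(k)) / total,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "f1": sum(f1) / k,
    }

# Invented confusion matrix for three risk-zone classes.
toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
scores = macro_metrics(toy)
```

Macro averaging gives each class equal weight, which matters when classes are imbalanced as the abstract reports.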

Implementasi Algoritma Naïve Bayes dan K-Nearest Neighbor Dalam Menentukan Tingkat Keberhasilan Immunotherapy Untuk Pengobatan Penyakit Kanker Kulit

Jurnal Ilmiah Universitas Batanghari Jambi ◽

10.33087/jiubj.v21i1.1189 ◽

2021 ◽

Vol 21 (1) ◽

pp. 259

Author(s):

F Lia Dwi Cahyanti ◽

Windu Gata ◽

Fajar Sarasati

Keyword(s):

Skin Cancer ◽

Success Rate ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

The Body ◽

Ultraviolet Rays ◽

Test Results ◽

K Nearest Neighbor ◽

Process Data

Skin cancer is a disease that grows in the skin tissue and is characterized by changes in the skin, such as the appearance of lumps, spots, or moles of abnormal size. One of its causes is exposure to ultraviolet rays from the sun. One treatment for skin cancer is immunotherapy, which treats disease by activating or suppressing the immune system in the body. This study compares two data mining classification methods, Naïve Bayes and K-Nearest Neighbor, for predicting the success rate of immunotherapy in curing skin cancer. In the testing process, the researchers use the Weka application to process data and conduct tests. The results show that the K-Nearest Neighbor model has the best accuracy, 91.1111%, while Naïve Bayes obtained a lower accuracy of 82.2222%. From the test results, it can be concluded that the K-Nearest Neighbor method is more accurate in determining the success rate of immunotherapy.

Quantum-Like Sampling

Mathematics ◽

10.3390/math9172036 ◽

2021 ◽

Vol 9 (17) ◽

pp. 2036

Author(s):

Andreas Wichert

Keyword(s):

Probability Theory ◽

Linear Relation ◽

Naive Bayes ◽

Naïve Bayes ◽

Degrees Of Belief ◽

Bayes Classifier ◽

Data Set ◽

Degree Of Belief ◽

The Mean ◽

Training Sets

Probability theory is built around Kolmogorov’s axioms. To each event, a numerical degree of belief between 0 and 1 is assigned, which provides a way of summarizing the uncertainty. Kolmogorov’s probabilities of events are added, and the sum over all possible events is one. The numerical degrees of belief can be estimated from a sample by its true fraction. The frequency of an event in a sample is counted and normalized, resulting in a linear relation. We introduce quantum-like sampling. The resulting Kolmogorov’s probabilities are in a sigmoid relation. The sigmoid relation offers a better importability since it induces the bell-shaped distribution; it also leads to less uncertainty when computing the Shannon’s entropy. Additionally, we conducted 100 empirical experiments, applying quantum-like sampling 100 times to random training sets and validation sets drawn from the Titanic data set and using the Naïve Bayes classifier. On average, the accuracy increased from 78.84% to 79.46%.

Prediksi Harga Emas dengan Menggunakan Metode Naïve Bayes dalam Investasi untuk Meminimalisasi Resiko

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v2i1.276 ◽

2018 ◽

Vol 2 (1) ◽

pp. 354-360

Author(s):

Mohammad Guntur ◽

Julius Santony ◽

Yuhandri Yuhandri

Keyword(s):

Naive Bayes ◽

Supply And Demand ◽

Naïve Bayes ◽

Decision Makers ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

The Mean ◽

Bayes Algorithm

The rise and fall of the gold price is influenced by many factors, such as economic conditions, the inflation rate, supply and demand, and much more. The Naïve Bayes algorithm is capable of generating a classification that is used to predict future opportunities. Using the Naïve Bayes Classifier algorithm yields a prediction of gold prices that can help decision makers determine whether to sell or buy gold. Gold data is processed using Rapidminer software. The processing stages are reading the training data, calculating the mean and standard deviation, entering the test data, finding the Gaussian density value, and then finding the probability value. Based on the calculations performed, the Naïve Bayes Classifier method is able to predict the price of gold 1 day ahead, or every day. The results of this calculation are expected to help gold investors predict gold prices more accurately for decision making.
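The processing stages listed in this abstract (mean and standard deviation per class, Gaussian density of the test data, then the probability value) can be sketched as below. This is an illustrative Gaussian naive Bayes written with the Python standard library, not the Rapidminer workflow used in the paper; the feature columns, the "buy"/"sell" labels, and the numbers are invented, and the sketch assumes each class has at least two rows with non-identical values so that the standard deviation is non-zero.

```python
import math
from statistics import mean, stdev

def fit(rows, labels):
    """Store, per class, the prior and the (mean, standard deviation) of every feature."""
    model = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        stats = [(mean(col), stdev(col)) for col in zip(*members)]
        model[label] = (len(members) / len(rows), stats)
    return model

def gaussian_density(x, mu, sigma):
    """Value of the normal probability density with mean mu and std sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(model, x):
    """Multiply the prior by each feature's density, then normalise across classes."""
    scores = {}
    for label, (prior, stats) in model.items():
        likelihood = prior
        for value, (mu, sigma) in zip(x, stats):
            likelihood *= gaussian_density(value, mu, sigma)
        scores[label] = likelihood
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Invented training data: two hypothetical numeric indicators per day.
rows = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5], [3.0, 20.0], [3.3, 21.0], [2.9, 19.0]]
labels = ["buy", "buy", "buy", "sell", "sell", "sell"]
model = fit(rows, labels)
probs = posterior(model, [1.1, 10.5])
```

For many features, summing log densities instead of multiplying raw densities avoids numerical underflow; the product form is kept here because it mirrors the stages as the abstract lists them.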

Machine Learning application for selecting efficient Loan Applicants in Private Banks of Bangladesh

International Journal of Management and Accounting ◽

10.34104/ijma.021.01140121 ◽

2021 ◽

pp. 114-121

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Research Process ◽

Naïve Bayes ◽

Financial Ratios ◽

Loan Default ◽

Private Banks ◽

Machine Learning Applications ◽

Objective Classification ◽

The Given

Machine Learning applications have been well accepted for various financial processes throughout the world. Supervised learning for objective classification by Naïve Bayes classifiers has supported many definitive segregation processes. Various banks in Bangladesh have found it challenging to identify financially and ethically qualified loan applicants. In this research process, we have confirmed the list of safe applicants using definitive variable measures through identifiable questions. Our research process has successfully segregated the given applicants using a Naïve Bayes classifier, with proof of lowering the loan default rate from an average of 23.26% to 11.76%, and has developed financial ratios as performance indicators of these banks.

Study of Sentiment of Governor's Election Opinion in 2018

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset21841124 ◽

2018 ◽

pp. 231-238

Author(s):

Agung Eddy Suryo Saputro ◽

Khairil Anwar Notodiputro ◽

Indahwati A

Keyword(s):

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Addition Method ◽

Sentiment Mining ◽

Positive Sentiment ◽

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The great number of received messages makes it difficult for humans to classify those messages as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate various data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
