Random Subclasses Ensembles by Using 1-Nearest Neighbor Framework

Author(s):  
Amir Ahmad ◽  
Hamza Abujabal ◽  
C. Aswani Kumar

A classifier ensemble is a combination of diverse and accurate classifiers. Generally, a classifier ensemble performs better than any single classifier in the ensemble. Naive Bayes classifiers are simple but popular classifiers for many applications. Because it is difficult to create diverse naive Bayes classifiers, naive Bayes ensembles are not very successful. In this paper, we propose Random Subclasses (RS) ensembles for naive Bayes classifiers. In the proposed method, new subclasses for each class are created by using a 1-Nearest Neighbor (1-NN) framework that uses randomly selected points from the training data. A classifier considers each subclass as a class of its own. Because the method used to create subclasses is random, diverse datasets are generated. Each classifier in an ensemble learns on one dataset from the pool of diverse datasets. Diverse training datasets ensure diverse classifiers in the ensemble, and the new subclasses create easy-to-learn decision boundaries that in turn produce accurate naive Bayes classifiers. We developed two variants of RS: in the first variant, RS(2), two subclasses per class were created, whereas in the second variant, RS(4), four subclasses per class were created. We studied the performance of these methods against other popular ensemble methods by using naive Bayes as the base classifier; RS(4) outperformed the other popular ensemble methods. A detailed study was carried out to understand the behavior of RS ensembles.
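The subclass-creation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_subclasses` is a hypothetical name, and Euclidean distance is assumed for the 1-NN rule.

```python
import numpy as np

def make_subclasses(X, y, n_sub, rng):
    """Relabel each class into up to n_sub subclasses with a 1-NN rule:
    draw n_sub random anchor points per class, then assign every point
    of that class the subclass label of its nearest anchor."""
    sub_y = np.empty(len(y), dtype=int)
    sub_to_class = {}            # subclass label -> original class
    next_label = 0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        pick = rng.choice(idx, size=min(n_sub, len(idx)), replace=False)
        anchors = X[pick]
        # Euclidean distance from every class member to every anchor
        d = np.linalg.norm(X[idx][:, None, :] - anchors[None, :, :], axis=2)
        sub_y[idx] = next_label + d.argmin(axis=1)
        for s in range(len(anchors)):
            sub_to_class[next_label + s] = c
        next_label += len(anchors)
    return sub_y, sub_to_class
```

A base classifier is then trained on `(X, sub_y)`; at prediction time the subclass posteriors are summed back into their parent classes via `sub_to_class`, and the randomness of the anchors is what yields the diversity across ensemble members.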

2021 ◽  
Vol 11 (10) ◽  
pp. 2529-2537
Author(s):  
C. Murale ◽  
M. Sundarambal ◽  
R. Nedunchezhian

Coronary heart disease (CHD) is one of the dominant sources of death and morbidity worldwide, and its identification in clinical review is considered one of the main problems. As the amount of data grows, interpretation and retrieval become even more complex, and ensemble-learning prediction models have become an important topic in this area of study. The prime aim of this paper is to forecast CHD accurately. It offers a modern paradigm for the prediction of cardiovascular disease using processes such as pre-processing, feature detection, feature selection, and classification. Pre-processing is initially performed using the ordinal encoding technique, and the statistical and higher-order features are extracted using the Fisher algorithm. Later, record and attribute minimization is performed, in which principal component analysis plays an extensive part in dealing with the "curse of dimensionality." Lastly, prediction is carried out by different ensemble models (SVM, Gaussian Naïve Bayes, random forest, K-nearest neighbor, logistic regression, decision tree, and multilayer perceptron) that take in the features with reduced dimensions. Finally, the reliability of the proposed work is compared on these success metrics and its superiority is confirmed. From the analysis, Naïve Bayes, at 98.4% accuracy, outperforms the other ensemble algorithms.


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Qingchao Liu ◽  
Jian Lu ◽  
Shuyan Chen ◽  
Kangjia Zhao

This study presents the applicability of the Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of a practical NB implementation depends on the choice of an optimal threshold, which is determined mathematically by using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of the NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes and decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments performed to evaluate the performance of three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that the performance of the five combination rules of the NB classifier ensemble is significantly better than that of standard NB and slightly better than that of NBTree on some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
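The abstract mentions five rules for combining the ensemble members without naming them. The five classic fixed fusion rules (sum, product, max, min, median) are a common choice in the classifier-combination literature; a sketch under that assumption, with `combine` as a hypothetical helper name:

```python
import numpy as np

def combine(probas, rule):
    """Fuse a list of (n_samples, n_classes) probability matrices from the
    ensemble members with one of the five classic fixed combination rules."""
    P = np.stack(probas)                    # (n_members, n_samples, n_classes)
    rules = {
        "sum":     P.mean(axis=0),          # average posterior
        "product": P.prod(axis=0),          # independence-style product
        "max":     P.max(axis=0),
        "min":     P.min(axis=0),
        "median":  np.median(P, axis=0),
    }
    return rules[rule]
```

Each member's predicted class-probability matrix is fused element-wise, and the final label for each sample is the argmax over the fused matrix.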


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that have the same features or characteristics into several classes. Automatic document classification uses the frequency of words appearing in the training data as features. A large number of documents causes the number of words appearing as features to increase; therefore, summaries are used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which has a good reputation in classification. This research tests the effect of using summaries for feature selection in document classification. The summaries reduce the text to 50% of its original length. The results show that the summaries did not affect the classification accuracy of documents using SVM, but they improved the accuracy of the Simple Logistic classifier. The classification tests also show that Naïve Bayes Multinomial (NBM) is more accurate than SVM.


Author(s):  
Mochamad Alfan Rosid ◽  
Gunawan Gunawan ◽  
Edwin Pramana

Text mining refers to the process of extracting high-quality information from text. High-quality information is usually obtained through the forecasting of patterns and trends by means such as statistical pattern learning. One of the important activities in text mining is text classification or categorization. Text categorization currently has various methods, including K-Nearest Neighbor, Naïve Bayes, Centroid-Based Classifier, and decision tree classification. In this study, student complaints were classified using the centroid-based classifier method with TF-IDF-ICF features. Five stages were carried out to obtain the classification results: collecting the complaint data; preprocessing, i.e., preparing the unstructured data so it is ready for the next process; splitting the data into two kinds, training data and test data; training to produce the classification model; and finally testing, i.e., evaluating the classification model built in the training stage against the test data. Complaints for testing were taken from the database of the e-complaint application of Universitas Muhammadiyah Sidoarjo. The experimental results show that complaint classification with the centroid-based classifier algorithm and TF-IDF-ICF features achieves a fairly high average accuracy of 79.5%. Accuracy increases as the training data grows, while system efficiency decreases as the training data grows.
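As one possible reading of the TF-IDF-ICF weighting and the centroid-based classifier (the exact formulation in the paper may differ, and the function names here are hypothetical), a minimal sketch: term weight = term frequency times inverse document frequency times inverse class frequency, with class centroids compared by cosine similarity.

```python
import numpy as np

def tfidficf(tf, y):
    """TF-IDF-ICF weighting: term frequency x inverse document frequency
    x inverse class frequency (terms concentrated in few classes score higher)."""
    N, C = tf.shape[0], len(set(y.tolist()))
    df = (tf > 0).sum(axis=0)                        # docs containing each term
    in_class = [(tf[y == c] > 0).any(axis=0) for c in sorted(set(y.tolist()))]
    cf = np.array(in_class).sum(axis=0)              # classes containing each term
    idf = np.log(N / np.maximum(df, 1))
    icf = np.log(C / np.maximum(cf, 1)) + 1          # +1 keeps terms present in all classes
    return tf * idf * icf

def centroid_classify(Xtr, ytr, Xte):
    """Assign each test vector to the class whose centroid is most
    cosine-similar to it."""
    classes = sorted(set(ytr.tolist()))
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    sim = (Xte @ cents.T) / (
        np.linalg.norm(Xte, axis=1, keepdims=True) * np.linalg.norm(cents, axis=1) + 1e-12)
    return [classes[i] for i in sim.argmax(axis=1)]
```

Here `tf` is a raw document-term count matrix; the centroid of each class is the mean of its weighted document vectors.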


2018 ◽  
Vol 5 (4) ◽  
pp. 455 ◽  
Author(s):  
Yogiek Indra Kurniawan

<p>In this paper, the <em>Naive Bayes</em> and <em>C4.5</em> methods were applied to four case studies: acceptance for the "Kartu Indonesia Sehat" program, credit card applications at a bank, determination of birth age, and the eligibility of prospective credit members at a cooperative (koperasi), in order to find the best algorithm for each case. The two methods were then compared in terms of <em>precision</em>, <em>recall</em>, and <em>accuracy</em> for each given set of training and testing data. From the implementation, an application was built that applies the Naive Bayes and C4.5 algorithms to the four cases. The application was tested with black-box and algorithm testing, with valid results, and implements both algorithms correctly. Based on the test results, the more training data used, the higher the precision, recall, and accuracy. In addition, the classification results of the Naive Bayes and C4.5 algorithms do not yield an absolute winner in every case. For the Kartu Indonesia Sehat acceptance case, the two algorithms are equally effective. For credit card applications at a bank, C4.5 is better than Naive Bayes. For determining birth age, Naive Bayes is better than C4.5. For the eligibility of prospective credit members at a cooperative, Naive Bayes gives better precision, but C4.5 gives better recall and accuracy. Thus, to determine the best algorithm for a given case, one must consider the criteria, the variables, and the amount of data in that case.</p>


2020 ◽  
Vol 4 (1) ◽  
pp. 28-36
Author(s):  
Azminuddin I. S. Azis ◽  
Budy Santoso ◽  
Serwin

The Naïve Bayes (NB) algorithm is still among the top ten data mining algorithms because of its simplicity, efficiency, and performance. To handle classification on numerical data, the Gaussian distribution and the kernel approach can be applied to NB (GNB and KNB). However, the NB classification process treats attributes as independent, though this assumption often does not hold. The Absolute Correlation Coefficient can determine correlations between attributes and works on numerical attributes, so it can be applied for attribute weighting in GNB (ACW-NB). Furthermore, because the performance of NB does not improve on large datasets, ACW-NB can serve as the classifier in a local learning model, where another classification method well known in local learning, such as K-Nearest Neighbor (K-NN), is used to obtain the sub-dataset for ACW-NB training. To reduce noise/bias, missing-value replacement and data normalization are also applied. The proposed method is termed LL-KNN ACW-NB (Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes), with the objective of improving the performance of NB (GNB and KNB) in handling classification on numerical data. The results of this study indicate that LL-KNN ACW-NB improves the performance of NB, with an average accuracy of 91.48%: 1.92% better than GNB and 2.86% better than KNB.
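A minimal sketch of the attribute-weighting idea, assuming the absolute Pearson correlation between each attribute and the numeric class label as the weight; the paper's exact ACW formulation, and the class name used here, are assumptions:

```python
import numpy as np

class ACWGaussianNB:
    """Gaussian naive Bayes whose per-attribute log-likelihoods are weighted
    by |Pearson correlation| between the attribute and the class label."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # attribute weights from |correlation with the (numeric) class label|
        self.w_ = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                            for j in range(X.shape[1])])
        self.theta_, self.var_, self.logprior_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.theta_.append(Xc.mean(axis=0))
            self.var_.append(Xc.var(axis=0) + 1e-9)   # variance smoothing
            self.logprior_.append(np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        scores = []
        for mu, var, lp in zip(self.theta_, self.var_, self.logprior_):
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
            scores.append(lp + (self.w_ * ll).sum(axis=1))  # weighted log-likelihood
        return self.classes_[np.argmax(np.stack(scores), axis=0)]
```

In the full LL-KNN ACW-NB pipeline, such a model would presumably be fit on the K-NN neighborhood of each query point rather than on the whole training set.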


Author(s):  
Andri Wijaya ◽  
Abba Suganda Girsang

This article discusses the analysis of customer loyalty using three data mining methods (the C4.5, Naive Bayes, and Nearest Neighbor algorithms) and real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia; the dataset contains 2269 records. The study also evaluates the effect of the size of the training data on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes at 76% and Nearest Neighbor at 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.


Author(s):  
Titin Winarti ◽  
Henny Indriyawati ◽  
Vensy Vydia ◽  
Febrian Wahyu Christanto

<span id="docs-internal-guid-210930a7-7fff-b7fb-428b-3176d3549972"><span>The match between the contents of an article and the article theme is the main factor in whether or not an article is accepted. Many people are still confused about determining the theme appropriate to the article they have. For that reason, we need a document-classification algorithm that can group articles automatically and accurately. Many classification algorithms can be used; the algorithm used in this study is naive Bayes, with the k-nearest neighbor algorithm as the baseline. The naive Bayes algorithm was chosen because it can produce maximum accuracy with little training data, while the k-nearest neighbor algorithm was chosen because it is robust against data noise. The performance of the two algorithms is compared, so it can be seen which algorithm is better at classifying documents. The results obtained show that the naive Bayes algorithm has much better performance, with an accuracy rate of 88%, while the k-nearest neighbor algorithm has a fairly low accuracy rate of 60%.</span></span>
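For reference, the multinomial naive Bayes text classifier compared in studies like this one can be written in a few lines of standard-library Python. This is the textbook algorithm with Laplace smoothing, not the authors' code, and the helper name is invented:

```python
import math
from collections import Counter

def train_mnb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing.
    docs: list of token lists; labels: list of class names."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, c in zip(docs, labels):
        counts[c].update(doc)   # per-class token counts
        vocab.update(doc)
    total = {c: sum(counts[c].values()) for c in classes}
    V = len(vocab)

    def predict(doc):
        # score each class by log prior + smoothed log likelihoods
        best, best_lp = None, -math.inf
        for c in classes:
            lp = math.log(prior[c])
            for tok in doc:
                lp += math.log((counts[c][tok] + alpha) / (total[c] + alpha * V))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    return predict
```

With tokenized documents as lists of words, `train_mnb(docs, labels)` returns a `predict` function mapping a new token list to the highest-scoring class.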


Author(s):  
Yessi Jusman ◽  
Widdya Rahmalina ◽  
Juni Zarman

Adolescents search for identity to shape their personality and character. This paper aims to use artificial-intelligence analysis to determine adolescents' talents. The study uses a sample of children aged 10-18 years, with testing data consisting of 100 respondents. The algorithms used for analysis are K-Nearest Neighbor and Naive Bayes, and the analysis compares the classification accuracy of the two. To determine which algorithm is more accurate at identifying children's interests and talents, accuracy is measured with a confusion matrix using the RapidMiner software on the training data, the testing data, and the combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.


2018 ◽  
Vol 5 (4) ◽  
pp. 427 ◽  
Author(s):  
Riri Nada Devita ◽  
Heru Wahyu Herwanto ◽  
Aji Prasetya Wibawa

<p class="Abstrak">The suitability of an article's content to a journal's theme is the main factor in whether or not the article is accepted. However, many students are still confused about choosing a journal that matches the article they have. A document-classification method is therefore needed that can group articles automatically and accurately. Many classification methods are available; the method used in this study is <em>Naive Bayes</em>, with the <em>K-Nearest Neighbor</em> method as the baseline. The Naive Bayes method was chosen because it can produce maximum accuracy with little training data, while the K-Nearest Neighbor method was chosen because it is robust to data noise. The performance of the two methods is compared, so it can be seen which method is better at classifying documents. The results show that the Naive Bayes method performs better, with an accuracy rate of 70%, whereas the K-Nearest Neighbor method has a fairly low accuracy rate of 40%.</p>

