Integrasi Metode Naive Bayes dengan K-Means dan K-Means-Smote untuk Klasifikasi Jurusan SMAN 3 Mataram

Hairani Hairani;  Muhammad Ridho Hansyah;  Lalu Zazuli Azhar Mardedi

doi:10.30864/jsi.v15i1.317

Jurnal Sistem dan Informatika (JSI) ◽

10.30864/jsi.v15i1.317 ◽

2020 ◽

Vol 15 (1) ◽

pp. 8-12

Author(s):

Hairani Hairani ◽

Muhammad Ridho Hansyah ◽

Lalu Zazuli Azhar Mardedi

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

F Measure

SMAN 3 Mataram has difficulty choosing the right major for its students because no system recommends a major that matches each student's interests and talents, and because every class has a limited quota. This study aims to integrate the Naive Bayes method with K-Means and K-Means-SMOTE to classify majors at SMAN 3 Mataram. The research methodology consists of student data collection, data processing, method testing, and performance evaluation of the proposed method. In the tests conducted, the proposed method outperformed previous research that used the C4.5 method, achieving accuracy of 99.16%, sensitivity of 99.58%, specificity of 98.77%, and F-measure of 99.16%. The proposed method can therefore be used to classify majors at SMAN 3 Mataram because it performs best.
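K-Means-SMOTE is generally described as clustering the data with k-means and then generating synthetic minority samples by interpolation inside selected clusters. The abstract gives no implementation details, so the following is only a simplified sketch of the interpolation step, not the authors' method. The function name `smote_within_cluster`, the sample points, and the choice of a random partner (instead of one of the k nearest neighbors used by standard SMOTE) are all assumptions made for illustration.

```python
import random

def smote_within_cluster(minority_points, n_new, rng):
    """Create n_new synthetic points by interpolating between pairs of
    minority points that belong to the same cluster (simplified SMOTE step)."""
    if len(minority_points) < 2:
        raise ValueError("need at least two minority points to interpolate")
    synthetic = []
    for _ in range(n_new):
        # Pick two distinct points; standard SMOTE would pick a nearest neighbor.
        a, b = rng.sample(minority_points, 2)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical 2-D minority samples that k-means assigned to one cluster.
cluster = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]
new_points = smote_within_cluster(cluster, n_new=4, rng=random.Random(0))
```

Each synthetic point lies on the segment between two existing minority points, so the oversampled class stays inside the region the cluster already covers.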

An Indonesian Hoax News Detection System Using Reader Feedback and Naïve Bayes Algorithm

Cybernetics and Information Technologies ◽

10.2478/cait-2020-0006 ◽

2020 ◽

Vol 20 (1) ◽

pp. 82-94

Author(s):

Badrus Zaman ◽

Army Justitia ◽

Kretawiweka Nuraga Sani ◽

Endah Purwanti

Keyword(s):

Performance Evaluation ◽

System Performance ◽

Naive Bayes ◽

Detection System ◽

Naïve Bayes ◽

Bayes Algorithm ◽

F Measure ◽

System Performance Evaluation

Hoax news in Indonesia is spreading at an alarming rate. To reduce this, a hoax news detection system needs to be created and put into practice. Such a system may use readers' feedback and the Naïve Bayes algorithm to verify news. Over time, readers' feedback keeps growing the database corpus, which could improve system performance. The current research aims to achieve this. System performance is evaluated under two conditions: with and without sources (URL). The system detects hoax news very well under both conditions. With the URL included, the highest precision, recall, and F-measure values are 0.91, 1, and 0.95, respectively. Without the URL, the highest precision, recall, and F-measure values are 0.88, 1, and 0.94, respectively.
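The reported F-measure values can be checked against the reported precision and recall, assuming the F-measure here is the usual harmonic mean of the two (F1). The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
with_url = f_measure(0.91, 1.0)
without_url = f_measure(0.88, 1.0)

print(round(with_url, 2), round(without_url, 2))
```

Under that assumption both results round to the values given in the abstract (0.95 and 0.94).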

Perbandingan Metode Klasifikasi Data Mining untuk Nasabah Bank Telemarketing

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v20i1.826 ◽

2020 ◽

Vol 20 (1) ◽

pp. 139-148

Author(s):

Pungkas Subarkah ◽

Enggar Pri Pambudi ◽

Septi Oktaviani Nur Hidayah

Keyword(s):

Data Mining ◽

Cross Validation ◽

Naive Bayes ◽

Confusion Matrix ◽

Regression Trees ◽

Classification And Regression Trees ◽

Naïve Bayes ◽

University Of California ◽

Classification And Regression ◽

F Measure

A bank is a company that holds large amounts of data, stored in databases and processed into interrelated information about its customers. A bank needs new ideas and breakthroughs to understand the obstacles faced by telemarketing customers who want to make deposits, so that it can avoid the threat of a financial crisis. This study tests the success of bank telemarketing by classifying customer decisions with data mining. The methods used are the Classification and Regression Trees (CART) and naive Bayes algorithms, applied to a dataset taken from the University of California Irvine (UCI) Machine Learning Repository. Validation and evaluation use 10-fold cross-validation and a confusion matrix. The CART algorithm achieved an accuracy of 89.51% with precision of 87%, recall of 89%, and F-measure of 88%, while the naive Bayes algorithm achieved an accuracy of 86.88% with precision of 87%, recall of 86%, and F-measure of 87%. These results indicate that CART is better at predicting whether telemarketing customers will accept a deposit offer.
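As a rough illustration of 10-fold cross-validation, the sketch below splits sample indices into ten folds so that every sample is used for testing exactly once. It is a minimal stdlib version with hypothetical names (`k_fold_indices`, `folds`); it does not shuffle or stratify, which a real experiment would normally do, and it is not taken from the paper.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

A model would be trained on `train_idx` and scored on `test_idx` for each fold, and the per-fold confusion matrices summed or averaged.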

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study aims to measure and compare the performance of five machine-learning-based text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), using multi-class text documents. The comparison covers the algorithms' effectiveness, that is, their ability to classify documents into the correct category, using the holdout (percentage split) method. The effectiveness measures used are precision, recall, F-measure, and accuracy. The experimental results show that for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. Naïve Bayes reached its highest accuracy at a 90/10 split, SVM at 80/20, and decision tree at 70/30. The results also show that naïve Bayes had the highest effectiveness among the five algorithms tested and the fastest classification model build time, at 0.02 seconds. The decision tree classified text documents with higher accuracy than SVM, but its model build time was slower. In terms of model build time, k-NN was the fastest, but its accuracy was lacking.
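The holdout (percentage split) procedure can be sketched as follows for the three ratios named in the abstract. The helper `holdout_split`, the fixed seed, and the placeholder document list are assumptions for illustration; the abstract does not say how the documents were shuffled or sampled.

```python
import random

def holdout_split(items, train_fraction, seed=0):
    """Shuffle a copy of items and cut it into (train, test) parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical corpus of 100 placeholder documents.
documents = [f"doc{i}" for i in range(100)]
splits = {frac: holdout_split(documents, frac) for frac in (0.9, 0.8, 0.7)}
```

Each classifier would then be trained on the first part and evaluated on the second, once per ratio.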

Sentiment Classification Using Text Embedding for Thai Teaching Evaluation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.886.221 ◽

2019 ◽

Vol 886 ◽

pp. 221-226 ◽

Cited By ~ 1

Author(s):

Kesinee Boonchuay

Keyword(s):

Naive Bayes ◽

Geometric Mean ◽

Nearest Neighbors ◽

Naïve Bayes ◽

Teaching Evaluation ◽

Sentiment Classification ◽

Teaching Skills ◽

K Nearest Neighbors ◽

Overall Performance ◽

F Measure

Sentiment classification is gaining a lot of attention. For a university, the knowledge obtained from classifying student sentiment about courses is highly valuable and can help teachers improve their teaching skills. In this research, sentiment classification based on text embedding is applied to enhance the performance of sentiment classification for Thai teaching evaluation. Text embedding techniques consider both syntactic and semantic elements of sentences, which can be used to improve classification performance. This research uses two approaches to apply text embedding for classification. The first approach uses fastText classification. According to the results, fastText provides the best overall performance; its highest F-measure was 0.8212. The second approach constructs text vectors for classification with traditional classifiers. This approach performs better than TF-IDF for k-nearest neighbors and naïve Bayes. For naïve Bayes, the second approach yields the best geometric mean, at 0.8961. With a decision tree, TF-IDF is better suited than the second approach. The benefit of this research is that it presents a workflow for using text embedding in Thai teaching evaluation to improve sentiment classification performance. With embedding techniques, similarity and analogy tasks on texts are established along with the classification.
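The abstract does not define its geometric mean. A common definition in classification with imbalanced classes is the square root of sensitivity times specificity, and the sketch below assumes that definition. The input values are hypothetical and are not taken from the paper.

```python
import math

def geometric_mean(sensitivity, specificity):
    """G-mean under the assumed definition: sqrt(sensitivity * specificity)."""
    return math.sqrt(sensitivity * specificity)

# Hypothetical rates, for illustration only.
g_mean = geometric_mean(sensitivity=0.9, specificity=0.8)
```

Because it multiplies the two rates, this score drops sharply when a classifier does well on one class and poorly on the other.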

Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.20079 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 248 ◽

Cited By ~ 1

Author(s):

Syed Muzamil Basha ◽

Dharmendra Singh Rajput ◽

Ravi Kumar Poluru ◽

S. Bharath Bhushan ◽

Shaik Abdul Khalandar Basha

Keyword(s):

Decision Tree ◽

Classification Accuracy ◽

Supervised Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Task ◽

Classification Models ◽

Target Variable ◽

Input Variables ◽

F Measure

The classification task is to predict the value of the target variable from the values of the input variables. If a target is provided as part of the dataset, then classification is a supervised task. It is important to analyze the performance of supervised classification models before using them in a classification task. In this research we propose a novel way to evaluate the performance of supervised classification models such as Decision Tree and Naïve Bayes using the KNIME Analytics Platform. Experiments are conducted on a multivariate dataset consisting of 58,000 instances and 9 columns intended for classification, collected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle)), and the performance of both models is compared in terms of Classification Accuracy (CA) and error rate. Finally, both models are validated using precision, recall, and F-measure. We found that the Decision Tree achieves a CA of 99.465%, whereas Naïve Bayes attains a CA of 90.358%. The F-measure of the Decision Tree is 0.984, whereas that of Naïve Bayes is 0.7045.

A Naïve Bayes Approach to Classifying Topics in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8945 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8945 ◽

Cited By ~ 9

Author(s):

Irena Spasić ◽

Pete Burnap ◽

Mark Greenwood ◽

Michael Arribas-Ayllon

Keyword(s):

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Suicide Notes ◽

Matching Rules ◽

F Measure

The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico–semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern–matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved the F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
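Micro-averaging pools the true-positive, false-positive, and false-negative counts over all topics before computing precision, recall, and F-measure. The sketch below shows this with hypothetical per-topic counts (the topic labels and numbers are invented for illustration), and separately checks that the reported 55% precision and 52% recall are consistent with the reported 53% F-measure, assuming the usual harmonic-mean definition.

```python
def micro_f_measure(per_topic_counts):
    """Pool counts across topics, then compute precision, recall and F1."""
    tp = sum(c["tp"] for c in per_topic_counts.values())
    fp = sum(c["fp"] for c in per_topic_counts.values())
    fn = sum(c["fn"] for c in per_topic_counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical counts for three placeholder topics.
counts = {
    "topic_a": {"tp": 30, "fp": 20, "fn": 25},
    "topic_b": {"tp": 20, "fp": 10, "fn": 15},
    "topic_c": {"tp": 50, "fp": 30, "fn": 40},
}
precision, recall, f1 = micro_f_measure(counts)

# Consistency check on the figures reported in the abstract.
reported_f = 2 * 0.55 * 0.52 / (0.55 + 0.52)
```

Frequent topics dominate a micro-averaged score, which matters here because the annotations are spread unevenly over the 15 topics.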

A Comparative Study of Bug Classification Algorithms

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194014500053 ◽

2014 ◽

Vol 24 (01) ◽

pp. 111-138 ◽

Cited By ~ 6

Author(s):

Naresh Kumar Nagwani ◽

Shrish Verma

Keyword(s):

Comparative Study ◽

Naive Bayes ◽

Rbf Neural Network ◽

Naïve Bayes ◽

Support Vector ◽

Adaptive Boosting ◽

Software Bugs ◽

The Comparative Study ◽

Bug Repositories ◽

F Measure

The performance of ten classic algorithms in classifying software bugs from different bug repositories is compared. The algorithms included in the study are Naïve Bayes, Naïve Bayes Multinomial, Discriminative Multinomial Naïve Bayes (DMNB), J48, Support Vector Machine, Radial Basis Function (RBF) Neural Network, Classification using Clustering, Classification using Regression, Adaptive Boosting (AdaBoost), and Bagging. These algorithms are applied to four open source bug repositories, namely Android, JBoss-Seam, Mozilla, and MySql. The classification is evaluated using 10-fold cross-validation. Accuracy and F-measure are compared for all of the algorithms. The concept of a software bug taxonomy hierarchy is also introduced, with eleven standard bug categories (classes). The comparative study also covers the effect of the number of categories on classifier performance in terms of accuracy and F-measure. The results are presented in tabular and graphical form.

Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Motor vehicle insurance is a type of coverage business against losses or damage risks that can arise from various potential events affecting a vehicle. Competition in the insurance business, particularly for motor vehicles, demands innovation and strategy to keep the business sustainable. One step a company can take is to predict the continuation status of vehicle insurance policies by analyzing customer profile and transaction data. Predicting policyholders' decisions is very important for the company because it can shape the marketing strategy that influences customers' decisions to renew their policies. This study proposes a model for predicting the continuation status of vehicle insurance policies using majority voting over the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine, and Decision Tree. Testing with a confusion matrix showed a best accuracy of 93.57%, with precision reaching 97.20%, recall of 95.20%, and F-measure of 95.30%. The best model evaluation scores were produced by the majority voting approach, which outperformed prediction models based on a single classifier.
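Majority voting over several classifiers can be sketched as below: each classifier predicts a label per policy, and the most frequent label wins. The prediction lists and the labels "renew" and "lapse" are hypothetical stand-ins for the outputs of Naive Bayes, Support Vector Machine, and Decision Tree models; the abstract does not describe the authors' implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per classifier, all of equal length.
    Returns the most common label for each sample."""
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical per-policy predictions from three classifiers.
naive_bayes = ["renew", "lapse", "renew", "renew"]
svm = ["renew", "renew", "lapse", "renew"]
decision_tree = ["lapse", "lapse", "renew", "renew"]

ensemble = majority_vote([naive_bayes, svm, decision_tree])
print(ensemble)
```

With three voters and two classes a tie cannot occur; other configurations would need an explicit tie-breaking rule.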

Algoritma Naïve Bayes Untuk Klasifikasi Penerima Bantuan Pangan Non Tunai (Studi Kasus Kelurahan Utama)

Techno Com ◽

10.33633/tc.v18i4.2587 ◽

2019 ◽

Vol 18 (4) ◽

pp. 321-331

Author(s):

Castaka Agus Sugianto ◽

Firdi Rizky Maulana

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

T Test ◽

Model Data ◽

F Measure

Kelurahan Utama is a government office in Cimahi Selatan. It runs the government's Bantuan Pangan Non Tunai (non-cash food assistance) program. Many residents have complained that they receive no assistance, while some residents considered well-off do receive it. Against this background, the authors processed the data with data mining to classify recipients and non-recipients of non-cash food assistance, using the Naïve Bayes algorithm with the Decision Tree algorithm as a comparison. The data produced by the data mining process is expected to serve as evaluation material for the government. The data mining model was built with RapidMiner. The probability for the class "PENERIMA" (recipient) was 0.481, rounded to 0.48, and the probability for the class "Bukan Penerima" (non-recipient) was 0.519, rounded to 0.52. The Naïve Bayes algorithm achieved accuracy of 58.29%, precision of 92.90%, recall of 21.84%, AUC of 0.765, and F-measure of 34.42%. The Decision Tree algorithm achieved accuracy of 73.97%, precision of 85.04%, recall of 61.92%, AUC of 0.746, and F-measure of 71.17%. A t-test between the Naïve Bayes and Decision Tree algorithms yielded alpha ≤ 0.000, so the difference between the two algorithms is concluded to be significant.

Designing a Data Mining System to Predict Treatment-Requiring Retinopathy of Prematurity in Neonates: A Pilot Study

Iranian Journal of Pediatrics ◽

10.5812/ijp.103094 ◽

2021 ◽

Vol In Press (In Press) ◽

Author(s):

Farshid Khorasani ◽

Ramak Roohi poor ◽

Afsar Dastjani Farahani ◽

Azam Orooji ◽

Mohammad Reza Zarkesh

Keyword(s):

Risk Factors ◽

Data Mining ◽

Predictive Value ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Mining Techniques ◽

Screening Programs ◽

Software Models ◽

Study Results ◽

F Measure

Background: With advances in NICUs, more preterm infants are surviving, with more risks related to ROP. Objectives: The aim of the present study was to collect ROP risk factors and apply data mining techniques to propose a model that predicts treatment-requiring ROP. Methods: A cross-sectional study was carried out in an Iranian hospital (2014 - 2018). The study population consisted of 76 preterm neonates with an ROP diagnosis. Of these, retinopathy was treated in 35 cases, and the others had not received any treatment associated with retinopathy. A pre-set questionnaire was used to extract the risk factors leading to treatment-requiring retinopathy. Specific software models were then designed to predict treatment-requiring ROP. To compare the performance of the data mining methods, several performance metrics were used: accuracy, precision, sensitivity, specificity, and F-measure. Results: Seventy neonates with ROP entered the study. Among four models, Naive Bayes had the best performance, with the highest accuracy (87.14), precision (96.43), sensitivity (77.14), and F-measure (85.71). The confusion matrix for the Naive Bayes classifier showed that the positive predictive value and negative predictive value were 0.7714 and 0.9714, respectively. Overall, 87.14% of all data were correctly classified. Moreover, of all the data mining techniques, the decision tree model gave understandable findings, as follows: if oxygen therapy continues for more than 16 days or blood infusion is > 6 units of packed cells, then the patient needs treatment. Conclusions: The results of the present study demonstrate that data mining techniques could be effectively implemented in ROP screening programs.
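The abstract does not give the confusion matrix itself. As a hedged illustration of how accuracy, precision, sensitivity, specificity, and F-measure relate, the sketch below uses hypothetical counts for 70 neonates, assuming 35 treated and 35 untreated. These counts are an assumption, chosen only because they reproduce the reported accuracy, precision, sensitivity, and F-measure; they are not data from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics derived from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts (assumption, not from the paper): 35 treated, 35 untreated.
metrics = binary_metrics(tp=27, fp=1, fn=8, tn=34)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
print(percentages)
```

Under these assumed counts, sensitivity (27/35) and specificity (34/35) come out at 77.14% and 97.14%.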
