Comparison of the C4.5 and Naive Bayes Algorithms for Determining Study Duration

Siti Nur‘Aisyah

doi:10.36596/jcse.v1i2.75

Journal of Computer Science and Engineering (JCSE) ◽

10.36596/jcse.v1i2.75 ◽

2020 ◽

Vol 1 (2) ◽

pp. 116-127

Author(s):

Siti Nur‘Aisyah

Keyword(s):

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Test Results ◽

Testing Methods ◽

Study Program ◽

Engineering Study ◽

Gender Achievement ◽

F Measure

This study compares the C4.5 and Naive Bayes algorithms on graduation data of Universitas Amikom Purwokerto students from 2011 to 2013 in the undergraduate (Strata 1) Informatics Engineering study program. The attributes are NIM (student ID number), gender, the Achievement Index for semesters 1 through 6, and graduation period. Both algorithms were tested with Correlation-based Feature Selection (CFS) and evaluated with a confusion matrix. C4.5 achieved an accuracy of 72.679% with a precision of 0.742, recall of 0.936, and F-measure of 0.828, whereas Naive Bayes achieved an accuracy of 73.6074% with a precision of 0.755, recall of 0.924, and F-measure of 0.831.
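The F-measures reported in this abstract agree with the harmonic mean of the reported precision and recall. The sketch below checks that, assuming the standard F1 definition; the abstract does not say which F-measure variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); an assumed definition."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
c45 = f_measure(0.742, 0.936)
naive_bayes = f_measure(0.755, 0.924)
print(round(c45, 3), round(naive_bayes, 3))
```

Rounded to three decimals, both values match the figures quoted in the abstract.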

Comparison of Data Mining Classification Methods for Bank Telemarketing Customers

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v20i1.826 ◽

2020 ◽

Vol 20 (1) ◽

pp. 139-148

Author(s):

Pungkas Subarkah ◽

Enggar Pri Pambudi ◽

Septi Oktaviani Nur Hidayah

Keyword(s):

Data Mining ◽

Cross Validation ◽

Naive Bayes ◽

Confusion Matrix ◽

Regression Trees ◽

Classification And Regression Trees ◽

Naïve Bayes ◽

University Of California ◽

Classification And Regression ◽

F Measure

Banks hold large volumes of data, stored in databases and processed into interrelated information about their customers. A bank needs new ideas and approaches to understand the obstacles facing telemarketing customers who want to open a deposit, so that it can avoid the threat of a financial crisis. This study tests the success of bank telemarketing by classifying customer decisions with data mining. The methods used are the Classification and Regression Trees (CART) and naive Bayes algorithms, applied to a dataset taken from the University of California Irvine (UCI) Machine Learning Repository. Validation and evaluation use 10-fold cross-validation and a confusion matrix. CART achieved an accuracy of 89.51% with a precision of 87%, recall of 89%, and F-measure of 88%, while naive Bayes achieved an accuracy of 86.88% with a precision of 87%, recall of 86%, and F-measure of 87%. These results indicate that CART is better at predicting telemarketing customers' decisions on deposit offers.

Performance Test of the Naïve Bayes Algorithm for Predicting Student Study Duration

Creative Information Technology Journal ◽

10.24076/citec.2019v6i1.178 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Irkham Widhi Saputro ◽

Bety Wulan Sari

Keyword(s):

Data Mining ◽

Cross Validation ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Study Program ◽

New Students ◽

Using Data ◽

The Many ◽

Fold Cross Validation

Universitas AMIKOM Yogyakarta is a university that admits thousands of new students, especially in the Informatics study program. In 2012 there were 1009 new students, and in 2013 there were 859. Unfortunately, only about 50% of these students graduate on time. This data is used to build a classification system using data mining techniques with the Naïve Bayes method. The dataset consists of 300 records drawn from alumni of the 2012 and 2013 cohorts, 150 records from each. It contains 144 students who graduated on time and 156 who did not. Testing uses 10-Fold Cross Validation and a Confusion Matrix. The results show that the Naïve Bayes model has an average accuracy of 68%, precision of 61.3%, recall of 65.3%, and F1-score of 61%. The model's performance can be influenced by the dataset used to build it.
Keywords: data mining, classification, Naïve Bayes, graduation time
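To make the 10-fold protocol concrete, the sketch below splits 300 record indices into ten folds of 30, so each round trains on 270 records and tests on the remaining 30. It is only an illustration: the helper `k_fold_indices` is hypothetical, it uses contiguous unshuffled folds, and the abstract does not say whether the authors shuffled or stratified their data.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 300 records, as in the abstract; each record is tested exactly once.
folds = list(k_fold_indices(300, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

The reported averages would then be the mean of the per-fold metrics, which is one reason an averaged F1-score need not equal the harmonic mean of the averaged precision and recall.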

Modeling the Prediction of Vehicle Insurance Policy Continuation Status with a Majority Voting Technique Using Data Mining Classification Algorithms

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Motor vehicle insurance is a line of business that covers losses or the risk of damage arising from the many kinds of events that can befall a vehicle. Competition in the insurance business, particularly for motor vehicles, demands innovation and strategy to keep the business sustainable. One step a company can take is to predict the continuation status of vehicle insurance policies by analyzing customer profile and transaction data. Predicting policyholders' decisions is very important for the company because it can shape marketing strategies that influence customers' decisions to renew their policies. This study proposes a model for predicting the continuation status of vehicle insurance policies that applies majority voting to the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine, and Decision Tree. Testing with a confusion matrix showed a best accuracy of 93.57%, with a precision of 97.20%, a recall of 95.20%, and an F-measure of 95.30%. The best model evaluation scores were obtained with the majority voting approach, which outperformed prediction models based on a single classifier.
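Majority voting over several classifiers can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-model predictions and the labels `renew` and `lapse` are made up for the example, and the function `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by taking the most common label."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers for four policies.
naive_bayes = ["renew", "lapse", "renew", "lapse"]
svm = ["renew", "renew", "renew", "lapse"]
decision_tree = ["lapse", "lapse", "renew", "renew"]
print(majority_vote([naive_bayes, svm, decision_tree]))
```

With three voters and two classes there can be no ties. Other configurations would need an explicit tie-breaking rule.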

IMPLEMENTATION OF THE MULTINOMIAL NAIVE BAYES CLASSIFIER ALGORITHM

JURNAL TEKNIK INFORMATIKA ◽

10.15408/jti.v10i2.6822 ◽

2018 ◽

Vol 10 (2) ◽

pp. 109-118

Author(s):

Anif Hanifa Setianingrum ◽

Dea Herwinda Kalokasari ◽

Imam Marzuki Shofi

Keyword(s):

Text Mining ◽

Classification System ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Classification Code ◽

F Measure

ABSTRACT More than 80% of information is estimated to be stored as unstructured text. A text management system is therefore needed, namely text mining, which is believed to have high potential commercial value. One application of text mining is text classification. Classification is used not only for documents but also for official letters. The researchers examined the Multinomial Naive Bayes Classifier for classifying outgoing letters so that each letter's classification code can be determined automatically. The classification system is supported by a confix-stripping stemmer to find root words and TF-IDF for term weighting. Testing was measured with a confusion matrix. The results show that the Multinomial Naive Bayes Classifier in the letter classification system achieves accuracy, precision, recall, and F-measure of 89.58%, 79.17%, 78.72%, and 77.05%, respectively.
How to Cite: Setianingrum, A. H., Kalokasari, D. H., Shofi, I. M. (2017). IMPLEMENTASI ALGORITMA MULTINOMIAL NAIVE BAYES CLASSIFIER. Jurnal Teknik Informatika, 10(2), 109-118. doi: 10.15408/jti.v10i2.6822
Permalink/DOI: http://dx.doi.org/10.15408/jti.v10i2.6822
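The TF-IDF term weighting mentioned in this abstract can be illustrated with a small sketch. It assumes one common variant, relative term frequency multiplied by log(N / document frequency). The paper's exact formula is not given here, and the tokenized letters and the function `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document (assumed variant)."""
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical stemmed tokens from three letters.
letters = [
    ["letter", "invitation", "meeting"],
    ["letter", "assignment"],
    ["letter", "invitation"],
]
weights = tf_idf(letters)
print(weights[1])
```

A term that appears in every letter, such as `letter` here, gets weight zero under this variant, while a term found in only one letter gets the highest weight.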

Implementation of the Bayes Method to Determine Self-Potential and Its Effect on Student GPA

Creative Information Technology Journal ◽

10.24076/citec.2019v6i1.228 ◽

2020 ◽

Vol 6 (1) ◽

pp. 38

Author(s):

Agung Jasuma ◽

Kusrini Kusrini ◽

M Rudyanto Arief

Keyword(s):

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Study Program ◽

Grade Point ◽

Self Potential

Students' strong interest in continuing to higher education contributes to the large number of enrolled university students in Indonesia. Difficulty completing coursework is a common problem, and one cause is a mismatch between a student's self-potential (talents) and the chosen study program. This research uses FIK Universitas Amikom Yogyakarta as a case study to examine the correlation between self-potential and student Grade Point Average (GPA), to identify the most suitable self-potential for each study program, and to measure the accuracy of the naive Bayes method in classifying students' self-potential. The 50 respondents consisted of students in at least semester 4, alumni, and 1 expert. Data were collected through interviews and questionnaires, and processed with the naive Bayes classifier, a confusion matrix for testing, and Pearson product-moment correlation to check whether a correlation exists. The study found that logical, visual, and interpersonal abilities influence high student GPA, with reported significance values of logic=0.043<0.05, interpersonal=0.029<0.05, and visual=0.05<0.05. Logical ability tends to have a good impact on the GPA of students in the S1-SI, S1-IF, and S1-TK study programs, while visual ability has a good impact in the S1-TI, D3-MI, and D3-IF study programs. Naive Bayes also achieved an accuracy of 90.625% in classifying students based on their self-potential.
Keywords: bayes, GPA, students

Sentiment Analysis of Toxic Comments in an Online Game Facebook Group Using Naïve Bayes Classification

Jurnal Informatika Universitas Pamulang ◽

10.32493/informatika.v5i3.6571 ◽

2020 ◽

Vol 5 (3) ◽

pp. 356

Author(s):

Renaldy Permana Sidiq ◽

Budi Arif Dermawan ◽

Yuyun Umaidah

Keyword(s):

Social Media ◽

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Text Processing ◽

Confusion Matrix ◽

Naïve Bayes ◽

Classification Model ◽

Testing Data ◽

F Measure

Toxic comments are comments made by social media users that contain expressions of hatred, condescension, threats, or insults. Social media users are on average still teenagers whose behavior is not yet fully under control, which makes their comments a matter of concern; those comments can be studied through text processing. Sentiment analysis can help identify toxic comments by dividing them into two classes. The data comprise 1,500 comments taken from a private Arena of Valor community group on Facebook. The dataset is divided into two classes: toxic and non-toxic. This research uses Naive Bayes with TF-IDF transformation and Information Gain feature selection, with an 80:20 split between training and testing data. It compares evaluation results for Naive Bayes without transformation, with TF-IDF transformation, and with TF-IDF plus Information Gain feature selection. Based on the confusion matrix evaluation, the best classification model uses the 80:20 training-to-testing ratio with TF-IDF transformation, yielding an accuracy of 75%, precision of 63%, recall of 67%, and F-measure of 64%.
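The sketch below shows how accuracy, precision, recall, and F-measure fall out of a two-class confusion matrix. It is a hedged illustration with invented labels: the function `binary_metrics` is hypothetical, `toxic` is treated as the positive class, and the abstract does not say whether its figures are per-class or averaged across classes.

```python
def binary_metrics(y_true, y_pred, positive="toxic"):
    """Compute confusion-matrix counts and derived metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented labels for eight test comments.
y_true = ["toxic"] * 3 + ["non-toxic"] * 5
y_pred = ["toxic", "toxic", "non-toxic", "toxic"] + ["non-toxic"] * 4
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With 1,500 comments and an 80:20 split, the test set would hold 300 comments, so each count in the real confusion matrix would sum to 300.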

QUERY ANSWERING SYSTEM OF SHAHIH HADITH MUTTAFAQUN ‘ALAIH USING INDONESIAN THESAURUS BASED ON QUERY EXPANSION AND NAÏVE BAYES CLASSIFIER

MATICS ◽

10.18860/mat.v12i1.8320 ◽

2020 ◽

Vol 12 (1) ◽

pp. 10

Author(s):

Muhammad Fairuz Zumar Rounaqi

Keyword(s):

Query Expansion ◽

Naive Bayes ◽

Naïve Bayes ◽

Test Results ◽

Bayes Classifier ◽

Font Size ◽

Average Precision ◽

Average Recall ◽

Average Accuracy ◽

F Measure

Abstract: Hadith are all the words, deeds, and provisions of the Prophet Muhammad SAW, used as the second source of Islamic law after the Al-Quran. The purpose of this study is to build an Information Retrieval system, called the Query Answering System, that is expected to help users search for and find the hadith documents they need. The study implements the Naïve Bayes Classifier method combined with an Indonesian thesaurus for query expansion to find the hadith documents relevant to the input query. In tests on 50 queries, query expansion gave better results than no query expansion. For the top-1 result without query expansion, the average recall was 62%, average precision 62%, average accuracy 92.4%, and average F-measure 62%. With query expansion, the average recall was 66%, average precision 66%, average accuracy 93.2%, and average F-measure 66%. Query expansion therefore improved average recall, precision, and F-measure by 4 percentage points each, and average accuracy by 0.8 percentage points, compared with no query expansion. Index Terms: hadith, information retrieval, query expansion, naïve bayes.

Sentiment Analysis of Movie Opinion in Twitter Using Dynamic Convolutional Neural Network Algorithm

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.19237 ◽

2018 ◽

Vol 12 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Fajar Ratnawati ◽

Edi Winarko

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Test Results ◽

Network Algorithm ◽

The People ◽

Neural Network Algorithm ◽

F Measure

Movies have unique characteristics. When someone writes an opinion about a movie, they write not only about the story but also about the people involved in it. Movie opinions are commonly written on social media, primarily Twitter. Sentiment analysis is needed to determine whether an opinion about a movie tends to be positive, negative, or neutral. This study aims to classify Indonesian-language movie opinions as positive, negative, or neutral, and to measure the accuracy, precision, recall, and F-measure of the method used, the Dynamic Convolutional Neural Network. Test results on the resulting system show that the Dynamic Convolutional Neural Network algorithm is more accurate than the Naive Bayes method, with an accuracy of 80.99%, precision of 81.00%, recall of 81.00%, and F-measure of 79.00%, while Naive Bayes obtained an accuracy of 76.21%, precision of 78.00%, recall of 76.00%, and F-measure of 75.00%.

Comparison of Cart and Naive Bayesian Algorithm Performance to Diagnose Diabetes Mellitus

IJIIS: International Journal of Informatics and Information Systems ◽

10.47738/ijiis.v2i1.9 ◽

2019 ◽

Vol 2 (1) ◽

pp. 9-16

Author(s):

Irfan Santiko ◽

Pungkas Subarkah

Keyword(s):

Diabetes Mellitus ◽

Evaluation Method ◽

Naive Bayes ◽

Adult Population ◽

Confusion Matrix ◽

Naïve Bayes ◽

Accuracy Result ◽

Bayes Algorithm ◽

Cart Algorithm ◽

F Measure

Based on Indonesia's health profile in 2008, Diabetes Mellitus is the sixth-ranked cause of death for all ages in Indonesia, with a proportion of deaths of 5.7%, after stroke, TB, hypertension, injury, and perinatal conditions. According to WHO (2003), Diabetes Mellitus affected 194 million people, or 5.1 percent of the world's adult population, and was expected to rise to 333 million by 2025. In Indonesia in particular, the number of people with Diabetes Mellitus is increasing. In 2000, the number of sufferers had reached 8.4 million, and the prevalence of Diabetes Mellitus in Indonesia is estimated to reach 21.3 million people in 2030. This leads researchers and practitioners to focus on detecting and diagnosing diabetes mellitus and on preventing it, because the disease can cause complications. The research stages were problem identification, data collection, pre-processing, classification, validation and evaluation, and conclusion. The algorithms used were CART and Naïve Bayes, applied to the Pima Indians diabetes dataset taken from the UCI repository, which consists of clinical data of patients who tested positive or negative for diabetes mellitus. Validation and evaluation used 10-fold cross-validation and a confusion matrix to assess precision, recall, and F-measure. CART achieved an accuracy of 76.9337% with a precision of 0.764, recall of 0.769, and F-measure of 0.765. Naïve Bayes achieved an accuracy of 73.7569% with a precision of 0.732, recall of 0.738, and F-measure of 0.734. From these results, the authors conclude that the CART algorithm is the suggested choice for diagnosing diabetes mellitus.

An Indonesian Hoax News Detection System Using Reader Feedback and Naïve Bayes Algorithm

Cybernetics and Information Technologies ◽

10.2478/cait-2020-0006 ◽

2020 ◽

Vol 20 (1) ◽

pp. 82-94

Author(s):

Badrus Zaman ◽

Army Justitia ◽

Kretawiweka Nuraga Sani ◽

Endah Purwanti

Keyword(s):

Performance Evaluation ◽

System Performance ◽

Naive Bayes ◽

Detection System ◽

Naïve Bayes ◽

Bayes Algorithm ◽

F Measure ◽

System Performance Evaluation

Abstract: Hoax news in Indonesia spreads at an alarming rate. To reduce this, a hoax news detection system needs to be created and put into practice. Such a system may use readers' feedback and the Naïve Bayes algorithm to verify news. Over time, readers' feedback will keep growing the corpus database, which could improve system performance. The current research aims to achieve this. System performance is evaluated under two conditions: with and without sources (URL). The system detects hoax news very well under both conditions. The highest precision, recall, and F-measure values when including the URL are 0.91, 1, and 0.95, respectively. Without the URL, the highest precision, recall, and F-measure values are 0.88, 1, and 0.94, respectively.
