Supervised data mining approach for predicting student performance

Data mining approaches have been successfully implemented in higher education and have emerged as an interesting area of educational data mining research. The approach aims to identify and extract new and potentially valuable knowledge from data. A predictive model developed using a supervised data mining approach can draw conclusions about students' academic success. The ability to predict student performance can benefit innovation in modern educational systems. The main objective of this paper is to develop predictive models using classification algorithms to predict student performance at a selected university in Malaysia. The prediction model can also be used to identify the most important attributes in the data. Several predictive modelling techniques, namely K-Nearest Neighbor, Naïve Bayes, Decision Tree and Logistic Regression, were used to predict whether a student's performance is excellent or non-excellent. Based on accuracy, precision, recall and the ROC curve, the results show that Naïve Bayes outperforms the other classification algorithms. The Naïve Bayes model reveals that the most significant factors for predicting excellent students are scores of A+ and A in Multivariate Analysis; A+, A and A- in SAS Programming; and A, A- and B+ in ITS 472.
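As a rough illustration of the evaluation measures named in this abstract, the sketch below computes accuracy, precision and recall for the excellent/non-excellent task from lists of true and predicted labels. It assumes "excellent" is the positive class; the function name `binary_metrics` and the example labels are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "excellent") -> Dict[str, float]:
    """Accuracy, precision and recall for a two-class problem (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = ["excellent", "excellent", "non-excellent", "non-excellent", "excellent"]
y_pred = ["excellent", "non-excellent", "non-excellent", "excellent", "excellent"]
print(binary_metrics(y_true, y_pred))
```

Precision and recall are reported for the positive class only; reporting them for "non-excellent" as well just means calling the function again with `positive="non-excellent"`.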

Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8795.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2193-2197

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Specific Pattern ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Digital Format ◽

Tree Classifier

Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes into which data are classified are predefined using existing datasets. Computerized classification of medical records by their symptoms, with the predicted information stored in digital format, is of great importance in diagnosing various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in matching the diagnosis report and on their execution rate, i.e. how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier and K-Nearest Neighbor (KNN), with the aim of determining which algorithm is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in identifying the records of patients with particular health problems. The experimental results, presented in tables and graphs, show the performance and importance of the Naïve Bayes, C4.5 and K-Nearest Neighbor algorithms. Based on these results, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.
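C4.5 chooses the attribute to split on by gain ratio: information gain divided by split information. A minimal sketch for categorical attributes only, assuming plain Python lists as input; real C4.5 also handles continuous attributes, missing values and pruning, which are omitted here, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Toy attribute that separates the classes perfectly.
print(gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))
```

An attribute with a single value yields zero split information, so the sketch returns 0.0 for it rather than dividing by zero.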

PREDICTING ON-TIME GRADUATION RATES USING THE NAÏVE BAYES AND K-NEAREST NEIGHBOR METHODS

Jurnal Informasi dan Komputer ◽

10.35959/jik.v7i1.118 ◽

2019 ◽

Vol 7 (1) ◽

pp. 7-16

Author(s):

Sidik Rahmatullah

Keyword(s):

Data Mining ◽

Human Capital ◽

Nearest Neighbor ◽

Naive Bayes ◽

Soft Skills ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Hard Skills

Graduation is the status a student attains after completing the educational process in accordance with the graduation requirements set by the study program. As one of the direct outputs of a study program's educational process, high-quality graduates are characterized by mastery of academic competencies, including hard skills and soft skills, as stated in the quality targets and evidenced by the graduates' performance in society according to their profession and field of study. A high-quality study program has a good graduate management system, so that its graduates become human capital for the program. This study uses data mining to predict student graduation rates with two methods, Naive Bayes and K-Nearest Neighbor. The results of this study can predict whether students graduate on time or late. The experiment used graduate data from the undergraduate (S1) Information Systems program at STMIK Dian Cipta Cendikia Kotabumi, with 600 records for training and 180 records for testing. The results show that Naive Bayes achieves an accuracy of 85%, while the K-Nearest Neighbor algorithm achieves an accuracy of 68.89%.
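Naive Bayes for categorical attributes multiplies a class prior by per-attribute conditional probabilities. The sketch below is a generic version with Laplace (add-one) smoothing, working in log space; the class name `CategoricalNB` and the attribute values are hypothetical, and the abstract does not say which attributes or smoothing the author used.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


class CategoricalNB:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNB":
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v] = training rows of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_one(self, row: Sequence[Hashable]) -> Hashable:
        best, best_score = None, -math.inf
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.n)  # log prior P(c)
            for j, v in enumerate(row):
                k = len(self.values[j])  # distinct values seen for feature j
                score += math.log((self.counts[c][j][v] + 1) / (nc + k))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, X: Sequence[Sequence[Hashable]]) -> List[Hashable]:
        return [self.predict_one(row) for row in X]


X = [("high", "full"), ("high", "full"), ("low", "part"), ("low", "part")]
y = ["on_time", "on_time", "late", "late"]
print(CategoricalNB().fit(X, y).predict([("high", "full"), ("low", "part")]))
```

For scale, with 180 test records the reported accuracies correspond to about 153 (85%) and 124 (68.89%) correctly classified records.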

Optimizing Naive Bayes Using a Genetic Algorithm for Feature Selection to Predict Student Performance

Jurnal Ilmiah Teknologi Informasi Asia ◽

10.32815/jitika.v14i1.400 ◽

2020 ◽

Vol 14 (1) ◽

pp. 31

Author(s):

Suhendro Busono

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Student Performance ◽

Parent Education ◽

Naive Bayes ◽

Electronic Media ◽

Naïve Bayes ◽

Parent Support ◽

Long Time ◽

Parent Relation

In this era of globalisation, the morality of teenagers is declining. This phenomenon can be seen in the mass and electronic media, which report that negative incidents often occur in teenage communities, such as brawls, drugs, gambling, rape, disobedience to parents, and others. These negative incidents do not originate from the teenagers themselves but are triggered by bad habits. A lack of parental attention and a low-quality parent relationship can give rise to bad habits in children. Parents' education, parents' jobs and parental support for education can influence children's mindset, as can how long children study, how much spare time they have, how much time they spend with friends, and how long they access the internet. The habits described above can be measured with science and technology. Data mining, a branch of computer science, can measure how well children or adults perform based on habit-forming indicators. In previous research on student performance using the Naive Bayes method, the number of attributes was too large (33 attributes) and the accuracy was 91.15%. In this research, the attributes of that previous study are optimized using a Genetic Algorithm, which selects the relevant attributes. Selecting relevant attributes can increase accuracy: after applying the Genetic Algorithm, the accuracy is 97.21%.
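Genetic-algorithm feature selection can be sketched as evolving binary masks over the attributes (1 keeps an attribute, 0 drops it). The sketch assumes binary tournament selection, one-point crossover, bit-flip mutation and elitism; the abstract does not state the operators or parameters used. The `fitness` callback is a hypothetical placeholder: in the setting described here it would typically be the accuracy of a Naive Bayes model trained on the selected attributes.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = keep the attribute, 0 = drop it


def ga_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                         pop_size: int = 20, generations: int = 30,
                         mutation_rate: float = 0.05, seed: int = 0) -> Mask:
    """Evolve a binary attribute mask that maximises `fitness` (hypothetical helper)."""
    if n_features < 2 or pop_size < 2:
        raise ValueError("need at least 2 features and a population of at least 2")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament selection: the fitter of two random masks wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation on each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy fitness: number of bits agreeing with a hypothetical "ideal" mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]
print(ga_feature_selection(8, lambda m: sum(a == b for a, b in zip(m, target))))
```

Because of elitism the best fitness never decreases between generations. A real fitness function would usually also penalise large masks, so that fewer attributes are preferred at equal accuracy.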

Comparative Analysis of the Performance Evaluation of Classification Algorithms on Diabetic Patient Readmission

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.770 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Mochammad Yusa ◽

Ema Utami ◽

Emha T. Luthfi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Kappa Statistic ◽

Naïve Bayes ◽

Validation Dataset ◽

K Nearest Neighbor ◽

Fold Cross Validation

Abstract. Readmission is associated with quality measures for patients in hospitals. Different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age and others, make the calculation of care quality complicated. Data mining classification techniques can address this problem. In this paper, three classifiers, Decision Tree, k-Nearest Neighbor (k-NN) and Naive Bayes, are evaluated with various parameter settings using 10-fold cross-validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE) and the Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and the Kappa Statistic, whereas Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstrak (Indonesian abstract, translated). The readmission process is associated with calculating the quality of patient care in hospitals. Differences in attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age and others, make the quality calculation complicated. Data mining classification techniques can be a solution for this calculation. Classification is a data mining technique whose development has been quite significant. In this study, the Decision Tree, k-Nearest Neighbor (k-NN) and Naive Bayes classification models with various parameter settings are evaluated on Accuracy, Mean Absolute Error (MAE) and the Kappa Statistic using 10-fold cross-validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE and Kappa Statistic were obtained from the Naive Bayes model.
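Cohen's kappa, one of the measures used in this study, corrects the observed agreement p_o between predictions and true labels for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch with hypothetical readmission labels; the function name is illustrative, and returning 1.0 when p_e = 1 is a convention chosen here for the degenerate single-class case.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa between true and predicted class labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: fraction of identical labels.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0


y_true = ["readmitted", "readmitted", "no", "no"]
y_pred = ["readmitted", "no", "no", "no"]
print(cohen_kappa(y_true, y_pred))
```

Here accuracy is 0.75 but chance agreement is 0.5, so kappa drops to 0.5; this is why kappa can rank classifiers differently from accuracy on imbalanced data.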

An Ingenious Methodology for the Collation of Existing Algorithms for the Prognosis of Student Performance

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2874.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 1749-1752

Keyword(s):

Data Mining ◽

Academic Performance ◽

Random Forest ◽

Student Performance ◽

Naive Bayes ◽

Research Work ◽

Large Data ◽

Naïve Bayes ◽

Impact Factors ◽

Data Mining Technique

In this proposed research work, we use data mining, an automated procedure for discovering interesting patterns in large data sets by grouping the data and building comprehensible predictive models. Predicting a student's academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful information that could affect an organization. Nowadays, students' performance is influenced by many aspects, which may include the student's academic performance. This study evaluates numerous factors suspected to affect a student's academic performance and derives a model that classifies and forecasts students' learning outcomes. The aim of this research is to conduct a case study on the factors that influence students' academic achievement and to determine those with the greatest impact. We focus on evaluating academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms. The paper makes a comparative assessment of the Naive Bayes and Random Forest classifiers on student data to determine the best algorithm.
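The comparison by correctly and incorrectly classified instances can be sketched as a simple tally per classifier. The prediction lists below are hypothetical stand-ins for the outputs of Naive Bayes and Random Forest models; training the models themselves is not shown, and the function names are illustrative.

```python
from typing import Dict, Hashable, Sequence, Tuple


def instance_summary(y_true: Sequence[Hashable],
                     predictions: Dict[str, Sequence[Hashable]]) -> Dict[str, Tuple[int, int]]:
    """Map each classifier name to (correctly classified, incorrectly classified)."""
    summary = {}
    for name, y_pred in predictions.items():
        if len(y_pred) != len(y_true):
            raise ValueError(f"{name}: prediction count does not match label count")
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        summary[name] = (correct, len(y_true) - correct)
    return summary


def best_classifier(summary: Dict[str, Tuple[int, int]]) -> str:
    """Name of the classifier with the most correctly classified instances."""
    return max(summary, key=lambda name: summary[name][0])


y_true = ["pass", "fail", "pass", "pass"]
preds = {"NaiveBayes": ["pass", "fail", "fail", "pass"],
         "RandomForest": ["pass", "fail", "pass", "pass"]}
summary = instance_summary(y_true, preds)
print(summary, best_classifier(summary))
```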

Cervical Cancer Prediction System Using CART, Naive Bayes, and k-NN

Creative Information Technology Journal ◽

10.24076/citec.2017v4i2.100 ◽

2018 ◽

Vol 4 (2) ◽

pp. 83

Author(s):

Tutus Praningki ◽

Indra Budi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Pap Smear ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Regression Trees ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Classification And Regression

The historical medical records of cervical cancer patients available at health care institutions are not accompanied by any process that extracts knowledge or information from them. Data mining techniques have great potential for implementation in a system that can predict cervical cancer. This study focuses on a medical diagnosis dataset of patients who will undergo a Pap smear test. The algorithms used to classify cervical cancer are Classification And Regression Trees (CART), Naive Bayes and k-Nearest Neighbor (k-NN). The CART decision tree, Naive Bayes and k-NN algorithms were tested using confusion matrix formulas, with the holdout technique for splitting the dataset. The test results show that Naive Bayes has the best accuracy, 94.44%, while the accuracies of CART and k-NN are 88.89% and 85.04%, respectively. The performance obtained by each algorithm makes it feasible to use the cervical cancer prediction system to support clinical decisions for new patients.
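The holdout technique and confusion matrix mentioned above can be sketched as follows. The 30% test fraction, the seed and the labels are assumptions, since the abstract does not give the split ratio; the function names are illustrative.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def holdout_split(n: int, test_fraction: float = 0.3, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and split them into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def accuracy_from_matrix(m: List[List[int]]) -> float:
    """Diagonal (correct predictions) divided by the total count."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


train_idx, test_idx = holdout_split(10)
m = confusion_matrix(["cancer", "normal", "normal", "cancer"],
                     ["cancer", "normal", "cancer", "cancer"], ["cancer", "normal"])
print(len(train_idx), len(test_idx), m, accuracy_from_matrix(m))
```

Precision, recall and specificity for each class can be read off the same matrix, which is why a single confusion matrix is usually the starting point for holdout evaluation.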

APPLYING DATA MINING TO COVID-19 DATA USING CLASSIFICATION ALGORITHMS

Jurnal Informatika ◽

10.30873/ji.v21i1.2868 ◽

2021 ◽

Vol 21 (1) ◽

pp. 44-52

Author(s):

Rizka Dahlia ◽

Nanik Wuryani ◽

Sri Hadianti ◽

Windu Gata ◽

Arina Selawati

Keyword(s):

Data Mining ◽

South Korea ◽

Respiratory System ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

The World ◽

Bayes Algorithm

Coronavirus disease 2019, more commonly referred to as COVID-19, is caused by a virus that attacks the respiratory system. The numbers of infections and deaths caused by this virus continue to increase. As of April 21, 2020, based on data from the WHO, the total number of infected cases worldwide reached 2,397,217, with 162 deaths. In South Korea, as of March 21, 2020, the total number of infected cases was 10,683, with 237 deaths. In this study, the researchers processed data on the spread of COVID-19 in South Korea with RapidMiner using the classification algorithms Naïve Bayes, C4.5 and K-Nearest Neighbor, following the stages of selection, preprocessing, transformation, data mining, and interpretation/evaluation. The best accuracy, 80.79% with an AUC of 0.881, was achieved by the Naïve Bayes algorithm. The data distribution showed that the sex attribute was influential for the patients' isolation class, with more women experiencing isolation. Keywords: COVID-19, data mining, classification, C4.5, Naïve Bayes, K-NN
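The AUC reported alongside accuracy can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half. This pairwise sketch assumes 0/1 labels and real-valued scores, such as a Naïve Bayes posterior probability for the positive class; it is O(P·N) and meant for illustration, not for large datasets.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A positive ranked above a negative counts 1, a tie counts 0.5.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Three of the four positive/negative pairs are ordered correctly in this toy example, giving an AUC of 0.75; an AUC of 0.881 means about 88% of such pairs are ranked correctly.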

Applying the Naïve Bayes Classifier, K-Nearest Neighbor (KNN) and Decision Tree to Analyze Sentiment in Interactions Between Netizens and the Government

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v21i1.1092 ◽

2021 ◽

Vol 21 (1) ◽

pp. 139-150

Author(s):

M. Khairul Anam ◽

Bunga Nanti Pikir ◽

Muhammad Bambang Firdaus

Keyword(s):

Data Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Naïve Bayes Classifier ◽

Command Center

The Pekanbaru Government has applied technology in its system of government, but the implementation still draws complaints from the public: for example, only part of the public knows about the command center public service, and the CCTV on traffic signal devices (Alat Pemberi Isyarat Lalu Lintas, APILL) does not yet function properly. Other applications of technology by the Pekanbaru Government can be seen in its official websites. A variety of netizen comments, meanwhile, can be found on Twitter, which is a source of data expressed by the public through tweets posted to the timeline. Sentiment analysis was performed to examine netizens' opinions, or opinion tendencies, toward the Pekanbaru government, classified as positive, negative or neutral. The dataset consists of 150 tweets, which were then analyzed to turn them into information. The analysis used the data mining methods Naïve Bayes Classifier, K-Nearest Neighbor (KNN) and Decision Tree. These three approaches were used to categorize netizens' comments about the use of technology after sentiment analysis and to compare their accuracy. The accuracies obtained varied: Naïve Bayes achieved 100%, KNN 98.25% and decision tree 62.28%.
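For tweet sentiment, Naive Bayes is commonly applied in its multinomial form over word counts. A minimal sketch with add-one smoothing and lowercase whitespace tokenisation; the class name and the example tweets are hypothetical, and the paper's preprocessing (for example stemming or stop-word removal) is not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace (add-one) smoothing."""

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "MultinomialNB":
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.doc_counts}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        v = len(self.vocab)

        def log_posterior(c: str) -> float:
            score = math.log(self.doc_counts[c] / self.n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            return score

        return max(self.doc_counts, key=log_posterior)


tweets = ["great fast service", "cctv broken slow", "command center great",
          "traffic light broken", "website update today"]
labels = ["positive", "negative", "positive", "negative", "neutral"]
model = MultinomialNB().fit(tweets, labels)
print(model.predict("great service"), model.predict("broken cctv"))
```

With only 150 tweets, accuracy figures such as 100% are sensitive to how the test set is drawn, so cross-validation is usually advisable for datasets of this size.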

Educational Data Mining in Predicting Student Final Grades

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/521012021 ◽

2021 ◽

Vol 10 (1) ◽

pp. 366-371

Keyword(s):

Data Mining ◽

Feature Selection ◽

Student Performance ◽

Parameter Optimization ◽

Nearest Neighbor ◽

Educational Data Mining ◽

Real Data ◽

K Nearest Neighbor ◽

Student’S Performance ◽

Student Grades

Educational data mining is a field of science that extracts knowledge from educational data. One of its applications is predicting student performance, which helps teachers identify students who need more support. This can potentially increase learning effectiveness and raise students' overall grades. There are various algorithms and optimization solutions for predicting student performance. In this paper, we use real data from one of Indonesia's public junior high schools to compare the naive Bayes, decision tree and k-nearest neighbor algorithms, and we apply feature selection and parameter optimization to identify which combination of algorithm and optimization achieves the highest accuracy in predicting student grades (a 7-grade classification). The results show that k-NN achieves the highest accuracy, 77.36%, when both feature selection and parameter optimization are applied.
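Parameter optimization for k-NN often means choosing k on held-out data. The sketch below assumes numeric features, Euclidean distance and a separate validation set; `knn_predict`, `select_k` and the toy points are hypothetical, and the feature-selection step is not shown.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                x: Sequence[float], k: int) -> Hashable:
    """Majority label among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def select_k(train_X, train_y, val_X, val_y, candidates: List[int]) -> int:
    """Return the candidate k with the highest validation accuracy."""
    def accuracy(k: int) -> float:
        hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
        return hits / len(val_y)
    # max() keeps the first candidate among ties, so list smaller k first.
    return max(candidates, key=accuracy)


# Two clusters plus one mislabelled "noise" point near the low cluster.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.3, 0.3)]
train_y = ["low", "low", "low", "high", "high", "high", "high"]
print(select_k(train_X, train_y, [(0.2, 0.2), (5.2, 5.2)], ["low", "high"], [1, 3, 5]))
```

With k=1 the noise point decides the first validation example, so a slightly larger k wins here; this illustrates why k is worth tuning rather than fixing in advance.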

COMPARISON OF DATA MINING CLASSIFICATION ALGORITHM FOR PREDICTING THE PERFORMANCE OF HIGH SCHOOL STUDENTS

Jurnal Techno Nusa Mandiri ◽

10.33480/techno.v17i1.1226 ◽

2020 ◽

Vol 17 (1) ◽

pp. 22-30

Author(s):

Tiska Pattiasina ◽

Didi Rosiyadi

Keyword(s):

Data Mining ◽

High School Students ◽

Student Performance ◽

Extracurricular Activities ◽

Nearest Neighbor ◽

Classification Algorithm ◽

Added Value ◽

K Nearest Neighbor ◽

School Students ◽

Pocket Money

Data mining is a series of processes for extracting added value from a database in the form of previously unknown information that could not be found manually. In education, data mining can be used to obtain information about student performance. In this study, the researchers took samples from class XI (eleventh-grade) students at SMAN 3 Ambon and classified student performance based on thirteen attributes: age, sex, school organization, extracurricular activities, pocket money, duration of study at home, duration of social media use, online game duration, attendance, illness, permits, and semester 1 and semester 2 grades. The study follows the KDD (Knowledge Discovery in Databases) method and uses three classification algorithms: decision tree, Naïve Bayes and K-Nearest Neighbor. The models are then tested using k-fold cross-validation.
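k-fold cross-validation partitions the records into k folds and uses each fold once as the test set while training on the rest. A minimal, non-stratified sketch of the index generation, assuming records are addressed by position; the function name and seed are hypothetical, and a stratified variant would additionally keep class proportions equal across folds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for each of k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(23, k=5):
    print(len(train), len(test))
```

The per-fold accuracies are then averaged to give the cross-validated accuracy of each algorithm.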
