Using Educational Data Mining to Predict Students’ Academic Performance for Applying Early Interventions

10.28945/4835 ◽  
2021 ◽  
Vol 20 ◽  
pp. 121-137
Author(s):  
Sarah Alturki ◽  
Nazik Alturki ◽  
Heiner Stuckenschmidt

Aim/Purpose: One of the main objectives of higher education institutions is to provide a high-quality education to their students and reduce dropout rates. This can be achieved by predicting students’ academic achievement early using Educational Data Mining (EDM). This study aims to predict students’ final grades and identify honorary students at an early stage. Background: EDM research has emerged as an exciting research area, which can unfold valuable knowledge from educational databases for many purposes, such as identifying the dropouts and students who need special attention and discovering honorary students for allocating scholarships. Methodology: In this work, we have collected 300 undergraduate students’ records from three departments of a Computer and Information Science College at a university located in Saudi Arabia. We compared the performance of six data mining methods in predicting academic achievement. Those methods are C4.5, Simple CART, LADTree, Naïve Bayes, Bayes Net with ADTree, and Random Forest. Contribution: We tested the significance of correlation attribute predictors using four different methods. We found 9 out of 18 proposed features with a significant correlation for predicting students’ academic achievement after their 4th semester. Those features are student GPA during the first four semesters, the number of failed courses during the first four semesters, and the grades of three core courses, i.e., database fundamentals, programming language (1), and computer network fundamentals. 
Findings: The empirical results show the following: (i) the main features that can predict students’ academic achievement are the student GPA during the first four semesters, the number of failed courses during the first four semesters, and the grades of three core courses; (ii) the Naïve Bayes classifier performed better than tree-based models in predicting students’ academic achievement in general, although Random Forest outperformed Naïve Bayes in predicting honorary students; (iii) English language skills do not play an essential role in students’ success at the College of Computer and Information Sciences; and (iv) studying an orientation year does not contribute to students’ success. Recommendations for Practitioners: We recommend that instructors consider using EDM to predict students’ academic achievement and use the predictions to customize students’ learning experience based on their different needs. Recommendation for Researchers: We strongly encourage researchers to conduct more EDM studies across various universities and compare them. For example, future research could investigate the effects of offering tutoring sessions for students who fail core courses in their first semesters, examine the role of language skills in social science programs, and examine the role of the orientation year in other programs. Impact on Society: The prediction of academic performance can help both teachers and students in many ways. It enables the early discovery of honorary students, so that well-deserved opportunities such as scholarships, internships, and workshops can be offered. It can also help identify students who require special attention, so that an appropriate intervention can be made at the earliest possible stage. Moreover, instructors can be aware of each student’s capability and customize the teaching tasks based on students’ needs. Future Research: For future work, the experiment can be repeated with a larger dataset. It could also be extended with more distinctive attributes to reach more accurate results that are useful for improving students’ learning outcomes. Moreover, experiments could be done using other data mining algorithms to get a broader approach and more valuable and accurate outputs.

2020 ◽  
Vol 4 (1) ◽  
pp. 95-101 ◽  
Author(s):  
Edi Sutoyo ◽  
Ahmad Almaarif

The quality of students can be seen from their academic achievements, which are evidence of the effort they have made. Student academic achievement is evaluated at the end of each semester to determine the learning outcomes that have been achieved. A student who cannot meet the academic criteria required to continue their studies may fail to graduate on time or even drop out (DO). The high number of students who do not graduate on time or drop out of higher education institutions can be reduced by detecting at-risk students in the early stages of their education, supported by policies that guide students toward completing their education. Also, if the time students need to complete their studies can be predicted, they can be handled more effectively. Data mining is one technique that can be used for making such predictions. Therefore, in this study, the Naive Bayes Classifier (NBC) algorithm is used to predict student graduation at Telkom University. The dataset was obtained from the Information Systems Directorate (SISFO), Telkom University, and contains 4000 instances. The results show that NBC was successfully implemented to predict student graduation, with an accuracy of 73.725%, precision of 0.742, recall of 0.736, and F-measure of 0.735.
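As a reminder of how the four reported measures relate to one another, here is a minimal sketch that derives them from the confusion counts of a two-class problem. The function name and the counts are hypothetical and are not taken from the study; a tool that reports class-weighted averages may produce an F-measure that differs slightly from the harmonic mean of the averaged precision and recall.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Hypothetical counts for illustration only (not data from the study).
acc, prec, rec, f1 = binary_metrics(tp=60, fp=20, fn=15, tn=105)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F={f1:.3f}")
```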


2017 ◽  
Vol 3 (1) ◽  
Author(s):  
Mohamad Fajarianditya Nugroho ◽  
Setyoningsih Wibowo

Data mining in the field of education is known as Educational Data Mining (EDM). EDM develops methods for exploring educational data and uses those methods to better understand students. EDM can help educators analyze how students learn, detect students who need support, and predict student performance. Universities need to predict student behavior and issue early warnings in order to prevent academic failure at an early stage. Naive Bayes uses the feature selection function of Forward Selection to choose data attributes according to the characteristics of the data itself, which improves the classification accuracy of Naïve Bayes. Naive Bayes-based Forward Selection is more accurate and effective in classifying students’ graduation status, with an accuracy of 97.14%, which falls in the “excellent classification” category, and it identifies the influential attributes as employment status and semester 4 GPA.
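The greedy idea behind forward selection can be sketched as follows: start with no attributes and repeatedly add the one that most improves a score, stopping when nothing improves it. This is only an illustration, not the authors' implementation. In a real experiment the `score` function would be something like the cross-validated accuracy of a Naive Bayes classifier; here it is a hypothetical stand-in built from invented per-attribute gains, and the attribute names are illustrative.

```python
def forward_selection(features, score):
    """Greedy forward selection: add one feature at a time while the score improves."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining:
        # Score every candidate subset formed by adding one remaining feature.
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best_score:
            break  # no remaining feature gives a strict improvement
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical gains standing in for a real classifier evaluation (invented values).
TOY_GAINS = {"employment_status": 0.20, "gpa_semester_4": 0.15, "gender": 0.0, "age": -0.02}


def toy_score(subset):
    """Stand-in for cross-validated accuracy: a base rate plus invented gains."""
    return 0.5 + sum(TOY_GAINS[f] for f in subset)


selected, best = forward_selection(TOY_GAINS, toy_score)
print(selected, round(best, 2))
```

Because the stopping rule requires a strict improvement, attributes that leave the score unchanged are not added, which keeps the selected set small.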


TEM Journal ◽  
2021 ◽  
pp. 1738-1744
Author(s):  
Joseph Teguh Santoso ◽  
Ni Luh Wiwik Sri Rahayu Ginantra ◽  
Muhammad Arifin ◽  
R Riinawati ◽  
Dadang Sudrajat ◽  
...  

The purpose of this research is to choose the best method by comparing two data mining classification methods, C4.5 and Naïve Bayes, in Educational Data Mining, using student graduation data consisting of 79 records. Both methods are validated with 10-fold cross-validation, and a t-test of the difference is performed to produce a table that ranks the methods. Different results were obtained for each method. The results of the two methods depend strongly on the dataset, and the area under the curve (AUC) of the Naïve Bayes method is better than that of C4.5 on various datasets. Under 10-fold cross-validation and the t-test, the Naïve Bayes method is better than C4.5, with an average accuracy of 73.41% and an area under the curve of 0.664.
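To make the validation scheme concrete, the following minimal sketch shows how 79 records can be partitioned for 10-fold cross-validation: each record is used for testing exactly once and for training in the other nine rounds. The function name, the shuffling seed, and the round-robin assignment are assumptions for illustration; the abstract does not say how the folds were built.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(79, k=10))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: {len(train)} training records, {len(test)} test records")
```

With 79 records, nine folds hold 8 records and one holds 7, so every round trains on 71 or 72 records. Test sets this small make per-fold accuracy noisy, which is one reason to pair cross-validation with a significance test.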


2020 ◽  
Vol 7 (6) ◽  
pp. 1237
Author(s):  
Anita Desiani ◽  
Sugandi Yahdin ◽  
Desty Rodiah

Educational data mining (EDM) is a field of application that combines education and computing. One task that EDM can perform is predicting students' level of achievement. Students' cumulative grade point average (GPA) is very important because it determines the graduation rate and the quality of the educational institution. This study aims to analyze the attributes that affect students' cumulative GPA and that stem from factors external to the student. Ten attribute variables are used, namely TOEFL score, father's education, mother's education, father's occupation, mother's occupation, region of origin, place of residence during studies, and the level of academic achievement attained. The accuracy obtained is 75.18% with the C4.5 algorithm and 74.47% with Naive Bayes, which shows that the model and attributes used are suitable for predicting students' GPA level. The C4.5 algorithm is able to show which attributes directly affect students' GPA level, namely TOEFL score, study hours, father's education, father's occupation, and the student's place of residence. The C4.5 algorithm cannot estimate the probability of a class when the number of instances of that class in the data is very small. In contrast, Naive Bayes can still estimate the probability of occurrence and the accuracy of the resulting information even when the number of instances is small. In this study, very few students have a cum laude GPA, yet Naive Bayes is still able to achieve a recall of 28.6% and a precision of 40% on this class.


Educational organizations are unique and play a highly significant role in the development of any country. In educational databases, the enormous volume of data makes predicting student achievement more complicated. Educational Data Mining techniques offer a practical and efficient way to improve students' performance and success, and can deliver benefit and impact to educators and academic institutions. Student data (i.e., name, 10th %, 12th cut-off, CGPA, number of arrears, etc.) are gathered, and the datasets are then imported into Anaconda Navigator. Next, analysis and classification are performed based on the attributes of the students and the schemes. The Naïve Bayes prediction algorithm is then used to predict, from these features, whether a particular student will be placed. The input contains disparate data about each student's past and present academic records, and the Naïve Bayes algorithm is applied using Anaconda Navigator to estimate the student's achievement for placement. The proposed methodology is based on a classification approach aimed at finding an improved estimation method for predicting student placement. This project can find the association between each student's academic achievement and their placement achievement in campus selection.
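The classification step can be illustrated with a minimal categorical Naïve Bayes classifier written in plain Python. This is a sketch under stated assumptions, not the project's code: the two attributes (a CGPA band and an arrears flag), the six training rows, and the use of add-one (Laplace) smoothing are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the label with the highest posterior under the independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            log_prob += math.log(numerator / denominator)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Hypothetical training rows: (cgpa_band, arrears). Invented for illustration.
rows = [("high", "none"), ("high", "none"), ("high", "some"),
        ("low", "some"), ("low", "some"), ("low", "none")]
labels = ["placed", "placed", "placed", "not_placed", "not_placed", "not_placed"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "none")))
print(predict(model, ("low", "some")))
```

Working in log space avoids numerical underflow when many attributes are multiplied together, which matters once the attribute list grows beyond a toy example.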


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.


2020 ◽  
Vol 10 (1) ◽  
pp. 12
Author(s):  
Ekka Pujo Ariesanto Akhmad

The bank's marketing department has collected data from bank customers by marketing or promoting credit cards by telephone (telemarketing). The bank's evaluation of its credit card telemarketing has so far been neither productive nor effective. One appropriate way to evaluate the bank's credit card telemarketing reports is to use data mining techniques. The purpose of using data mining is to identify the tendencies and patterns of customers who are likely to subscribe to the credit card offered by the bank. The research method uses the Cross Industry Standard Process for Data Mining (CRISP-DM) with a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank credit card telemarketing dataset has 15 attributes, consisting of 14 regular attributes and 1 special attribute. Because the bank telemarketing dataset contains high-dimensional data, the GAFS method is applied. After applying GAFS, 7 optimal attributes are obtained, consisting of 6 regular attributes and 1 special attribute. The six regular attributes are job, balance, housing, loan, duration, and poutcome, and the special attribute is target. The results show that the NB algorithm has an accuracy of 86.71%. Combining GAFS with NB increases the accuracy to 90.27% for predicting which bank customers will take up a credit card.


2018 ◽  
Vol 12 (2) ◽  
pp. 119-126 ◽  
Author(s):  
Vikas Chaurasia ◽  
Saurabh Pal ◽  
BB Tiwari

Breast cancer is the second most common cancer in women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this paper is to report on breast cancer, taking advantage of available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three prediction models for performance comparison. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor with 97.36% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature), RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.

