Students’ Class Performance Prediction Using Machine Learning Classifiers

Educational data mining is increasingly used as an assessment tool for studying and analyzing hidden patterns in academic databases, which can be used to predict students’ academic performance. This paper applies several machine learning classification techniques to students’ academic records to predict results. For this purpose, data on MS(CS) students were collected from a public university in Pakistan, drawn from their assignment, quiz, and sessional marks. The WEKA data mining tool was used for all experiments, namely data pre-processing, classification, and visualization. To measure performance, the classifier models were trained and evaluated with 3-fold and 10-fold cross-validation. The results show that a bagging classifier combined with support vector machines outperforms the other classifiers in terms of accuracy, precision, recall, and F-measure. These outcomes indicate that our research contributes to the prediction of students’ academic performance, which can ultimately be used to assist faculty members in focusing on low-grade students and improving their academic records.
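The k-fold cross-validation mentioned in this abstract can be sketched in a few lines. This is a minimal illustration in plain Python, not the WEKA procedure the authors ran. The function names `k_fold_indices` and `cross_val_accuracy` and the `train_fn` callback are assumptions introduced here, and the folds are contiguous, so real data would normally be shuffled or stratified first.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(X, y, k, train_fn):
    """Mean accuracy over k folds.

    train_fn(X_train, y_train) is a hypothetical callback that must
    return a predict(x) function for a single sample.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = train_fn([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)
```

With k = 10, each sample is used for testing exactly once and for training nine times, which is why the averaged score is usually a steadier estimate than a single train/test split.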

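Teknik Resampling untuk Mengatasi Ketidakseimbangan Kelas pada Klasifikasi Penyakit Diabetes Menggunakan C4.5, Random Forest, dan SVM [Resampling Techniques for Handling Class Imbalance in Diabetes Classification Using C4.5, Random Forest, and SVM]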

Techno Com ◽

10.33633/tc.v20i3.4762 ◽

2021 ◽

Vol 20 (3) ◽

pp. 352-361

Author(s):

Wahyu Nugraha ◽

Raja Sabaruddin

Keyword(s):

Machine Learning ◽

Data Mining ◽

Random Forest ◽

Area Under Curve ◽

Support Vector ◽

Pima Indians ◽

R Language ◽

Level Data ◽

Vector Machines ◽

Under Sampling

The number of people with diabetes worldwide continues to rise, with 4.6 million deaths in 2011, and is projected to keep rising globally to 552 million by 2030. Diabetes may be prevented effectively by detecting it early. Data mining and machine learning continue to be developed as reliable tools for building computational models that identify diabetes at an early stage. However, a problem frequently encountered when analyzing diabetes data is class imbalance. Imbalanced classes make prediction difficult because the learning model is dominated by majority-class instances and therefore neglects the minority class. In this study we analyze the class imbalance problem and attempt to address it with a data-level approach, namely data resampling. The experiments use the R language with the ROSE library (version 0.0-4). The Pima Indians dataset was chosen because it is one of the datasets that exhibits class imbalance. The classification models use the C4.5 decision tree, Random Forest (RF), and Support Vector Machine (SVM) algorithms. In the experiments, the SVM classifier with a resampling technique that combines over- and under-sampling performed best, with an AUC (Area Under Curve) of 0.80.
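A rough idea of what combining over- and under-sampling means is sketched below. It is written in Python rather than the R/ROSE setup the authors used, and it is a simplified stand-in, not ROSE's own procedure. The function name `combine_over_under` is hypothetical, the labels are assumed to be binary, and the target of an approximately 50/50 split at the original dataset size is an assumption made for illustration.

```python
import random


def combine_over_under(rows, labels, minority, seed=0):
    """Rebalance a binary dataset to roughly 50/50 at its original size.

    The majority class is under-sampled without replacement and the
    minority class is over-sampled with replacement.
    """
    rng = random.Random(seed)
    small = [r for r, lab in zip(rows, labels) if lab == minority]
    large = [(r, lab) for r, lab in zip(rows, labels) if lab != minority]
    if not small or len(small) > len(large):
        raise ValueError("expected a non-empty minority class no larger than the rest")

    n_large = len(rows) // 2        # majority rows to keep
    n_small = len(rows) - n_large   # minority rows wanted

    kept = rng.sample(large, n_large)                   # under-sampling
    extra = rng.choices(small, k=n_small - len(small))  # over-sampling
    balanced = kept + [(r, minority) for r in small + extra]
    rng.shuffle(balanced)

    new_rows = [r for r, _ in balanced]
    new_labels = [lab for _, lab in balanced]
    return new_rows, new_labels
```

Resampling of this kind is normally applied to the training portion only, so that the evaluation data keeps the original class distribution.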

Prediction of course completion by students of a university in Brazil

Psico-USF ◽

10.1590/1413-82712018230303 ◽

2018 ◽

Vol 23 (3) ◽

pp. 425-436

Author(s):

Alessandra Turini Bolsoni-Silva ◽

Rommel Melgaço Barbosa ◽

Alessandra Salina Brandão ◽

Sonia Regina Loureiro

Keyword(s):

Data Mining ◽

Academic Performance ◽

Support Vector Machines ◽

Young People ◽

University Students ◽

Course Completion ◽

Simple Approach ◽

Support Vector ◽

Reliability Sensitivity ◽

Vector Machines

It is desirable, both for young people and for society, that university students complete their undergraduate course within the time set by the curriculum. The aim was to verify the reliability, sensitivity and specificity of a broad set of predictors of academic performance for university students who completed the undergraduate course within the time set by the curriculum, using a data mining methodology based on the Support Vector Machines algorithm. A simple approach is proposed for predicting course completion by students at a university in Brazil. The dataset comprises 170 students who finished the course and 117 who did not. With the proposed methodology, course completion could be predicted with an accuracy of 79.5% using the 19 original variables. An accuracy of 75% was obtained using only 5 variables: course, year of the course, gender, and initial and final academic performance.
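Sensitivity, specificity and accuracy all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and argument order are assumptions for illustration, and the only figures taken from the abstract are the class counts (170 finished, 117 did not). The paper's own confusion matrix is not reported here, so the usage example evaluates a trivial "everyone finishes" predictor as a reference point.

```python
def sensitivity_specificity_accuracy(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn count actual positives (e.g. students who finished),
    tn/fp count actual negatives (students who did not).
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / positives,   # true positive rate
        "specificity": tn / negatives,   # true negative rate
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Reference point: predict "finished" for all 287 students.
always_finished = sensitivity_specificity_accuracy(tp=170, fn=0, tn=0, fp=117)
print(round(always_finished["accuracy"], 3))  # 0.592
```

Such a predictor reaches about 59.2% accuracy with a specificity of zero, which is why accuracy figures are easier to interpret alongside sensitivity and specificity.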

Machine Learning in Higher Education

Handbook of Research on Emerging Trends and Applications of Machine Learning - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9643-1.ch002 ◽

2020 ◽

pp. 27-46

Author(s):

Garima Jaiswal ◽

Arun Sharma ◽

Reeti Sarup

Keyword(s):

Higher Education ◽

Machine Learning ◽

Data Mining ◽

At Risk ◽

Educational Data Mining ◽

Training Data ◽

Pedagogical Practices ◽

Dropping Out ◽

Support Vector ◽

Class Labels

Machine learning aims to give computers the ability to learn automatically from data. It can enable computers to make intelligent decisions by recognizing complex patterns in data. Through data mining, very large amounts of data can be explored and analyzed to extract useful information and find interesting patterns. Classification, a supervised learning technique, predicts class labels for test data by referring to the already labeled classes in the available training data set. In this chapter, educational data mining techniques are applied to a student dataset to analyze the many factors behind an alarmingly high number of dropouts. This work focuses on predicting students at risk of dropping out using five classification algorithms, namely K-NN, naive Bayes, decision tree, random forest, and support vector machine. This can assist in improving pedagogical practices in order to enhance the performance of students predicted to be at risk of dropping out, thus reducing dropout rates in higher education.
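Of the five algorithms listed, K-NN is the most direct illustration of predicting a label by referring to already labeled training data. The following is a minimal sketch under stated assumptions, not the chapter's implementation: the function name `knn_predict`, the Euclidean distance, and the feature values and labels in the usage example are all hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records, e.g. (attendance rate, average grade).
train_X = [(0.9, 8.0), (0.8, 7.5), (0.3, 4.0), (0.2, 3.5)]
train_y = ["retained", "retained", "at_risk", "at_risk"]
print(knn_predict(train_X, train_y, (0.25, 4.2), k=3))  # at_risk
```

Features on very different scales would normally be normalized first, since the distance is otherwise dominated by the largest-valued attribute.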

Educational data mining for predicting students’ academic performance using machine learning algorithms

Materials Today Proceedings ◽

10.1016/j.matpr.2021.05.646 ◽

2021 ◽

Author(s):

Pranav Dabhade ◽

Ravina Agarwal ◽

K.P. Alameen ◽

A.T. Fathima ◽

R. Sridharan ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Academic Performance ◽

Learning Algorithms ◽

Educational Data Mining ◽

Machine Learning Algorithms

Comparison of Some Classification Algorithms for the Analysis of Students Academic Performance in Educational Data Mining Using Orange

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1394 ◽

2021 ◽

pp. 318-324

Author(s):

Vanthana V

Keyword(s):

Data Mining ◽

Academic Performance ◽

Random Forest ◽

Educational Data Mining ◽

Evaluation Process ◽

Support Vector ◽

Classification Algorithms ◽

Academic Improvement ◽

Academic Information ◽

Tools And Techniques

In the modern education system, many higher education institutions use data mining tools and techniques to analyze the academic improvement of their students, and many such tools and techniques are available. This paper uses classification to analyze students’ academic performance. It compares five classification algorithms: Decision Tree, Naïve Bayesian, K-Nearest Neighbour, Support Vector Machine and Random Forest, applied to data collected from three colleges in Assam, India. The data consist of socio-economic, demographic and academic information on three hundred students, with twenty-four attributes. The data mining tool used was ORANGE. The internal assessment attribute in the continuous evaluation process has the greatest impact on the final semester results of the students in the dataset. The results showed that Random Forest outperforms the other classifiers in terms of accuracy.

FINDING THE BEST ALGORITHMS AND EFFECTIVE FACTORS IN CLASSIFICATION OF TURKISH SCIENCE STUDENT SUCCESS

Journal of Baltic Science Education ◽

10.33225/jbse/19.18.239 ◽

2019 ◽

Vol 18 (2) ◽

pp. 239-253 ◽

Cited By ~ 2

Author(s):

Enes Filiz ◽

Ersoy Öz

Keyword(s):

Data Mining ◽

Educational Data Mining ◽

Eighth Grade ◽

Support Vector ◽

Educational Strategies ◽

Vector Machines ◽

Eighth Grade Students ◽

Timss 2015 ◽

Mathematics And Science

Educational Data Mining (EDM) is an important tool for classifying educational data that helps researchers and education planners analyse and model available educational data for specific needs such as developing educational strategies. The Trends in International Mathematics and Science Study (TIMSS), a notable study in the educational field, was used in this research. EDM methodology was applied to the TIMSS 2015 results for eighth grade students from Turkey. The main purposes are to find the algorithms most appropriate for classifying student success, especially in science subjects, and to ascertain the factors that lead to this success. Logistic regression and support vector machines with a poly kernel were found to be the most suitable algorithms. The features obtained by feature selection methods, “Computer Tablet Shared”, “Extra Lessons Last 12 Month”, “Extra Lessons How Many Month”, “How Far in Education Do You Expect to Go”, “Home Educational Resources”, and “Student Confident in Science”, are the most influential for science success. Keywords: classification algorithms, educational data mining, eighth grade, science success, TIMSS 2015.

Human Papillomavirus Targeted Immunotherapy Outcome Prediction Using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37197 ◽

2021 ◽

Vol 9 (VII) ◽

pp. 3598-3611

Author(s):

Vidya Moni

Keyword(s):

Machine Learning ◽

Human Papillomavirus ◽

Outcome Prediction ◽

Performance Comparison ◽

Gradient Boosting ◽

Support Vector ◽

Machine Learning Classification ◽

Nearest Neighbours ◽

Vector Machines ◽

Modern Machine

Warts, caused by the Human Papillomavirus (HPV), are highly contagious and affect several million people across the globe every year, appearing as small lesions on the skin. Warts can be treated effectively with several methods, the most effective being Immunotherapy and Cryotherapy. Our research focuses on comparing the performance of modern Machine Learning classification techniques in predicting the outcome (positive or negative) of Immunotherapy treatment given to a patient, using patient data as input features to the classifiers. Precision, recall, F-measure and accuracy were used to compare the performance of the classifiers considered in this study. We considered Logistic Regression, ZeroR, AdaBoost, K-Nearest Neighbours (KNN), Support Vector Machines (SVM), Gradient Boosting, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), Decision Trees and Random Forests. The ZeroR classifier was used as a baseline to provide insight into the skewed nature of the data and so to put the performance of the other classifiers in context.
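ZeroR ignores every input feature and always predicts the most frequent class seen in training, which makes it a useful floor when classes are skewed. A minimal Python sketch follows; the function names and the label values in the usage example are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent training label; ZeroR predicts it for every input."""
    if not train_labels:
        raise ValueError("need at least one training label")
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical skewed outcome labels, for illustration only.
train = ["positive"] * 8 + ["negative"] * 2
test = ["positive", "positive", "positive", "negative"]

baseline_label = zero_r_fit(train)
print(baseline_label)                                       # positive
print(accuracy([baseline_label] * len(test), test))         # 0.75
```

A classifier that cannot beat this figure has learned nothing from the features, however high its raw accuracy looks.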

Student Intervention System using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1392.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 2061-2065

Keyword(s):

Machine Learning ◽

Student Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Student Records ◽

Teaching Methodologies ◽

Machine Learning Classification ◽

Classification Technique ◽

Learning Techniques ◽

Vector Machines

Nowadays, educational institutes are adopting technologies to improve the quality of their students, for instance through teaching methodologies. The large amount of information held by educational institutes can be used to predict students’ academic futures. The main objective of this paper is to predict student performance in the examination and whether a student will graduate. To this end, we use the F1 score as the statistical measure. The F1 score, or F-measure, assesses prediction quality by combining precision and recall into a single score. The prediction task is framed as a machine learning classification problem. The dataset used in this analysis contains 395 student records with attributes such as age, health, internet, school, father's job, and mother's job. The F1 score is calculated for each of three classification algorithms: support vector machines (SVM), Decision Tree and Naïve Bayes (NB). Based on this analysis, the support vector machine achieves a better F1 score than the other two algorithms.
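The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes it from raw counts; the function name and the counts in the usage example are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: low if either precision or recall is low.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct positives, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot obtain a high F1 by maximizing precision or recall alone.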

An Exploratory Study on the Use of Machine Learning to Predict Student Academic Performance

International Journal of Knowledge-Based Organizations ◽

10.4018/ijkbo.2018100104 ◽

2018 ◽

Vol 8 (4) ◽

pp. 67-79 ◽

Cited By ~ 1

Author(s):

Patrick Kenekayoro

Keyword(s):

Higher Education ◽

Machine Learning ◽

Academic Performance ◽

Support Vector Machines ◽

Student Performance ◽

Higher Education Institutions ◽

Classification Model ◽

Support Vector ◽

Student Academic Performance ◽

Vector Machines

Optimal student performance is integral to successful higher education institutions. The consensus is that big data analytics can be used to identify ways of achieving better student academic performance. This article used support vector machines to predict future student performance in computing and mathematics disciplines based on past scores in computing, mathematics and statistics subjects. Subjects previously passed by students were ranked with state-of-the-art feature selection techniques in an attempt to identify any connection between good performance in a particular discipline and past subject knowledge. Up to 80% classification accuracy was achieved with support vector machines, demonstrating that this method can be developed into recommender or guidance systems for students; however, the classification model would still benefit from more training examples. The results of this research re-emphasize the possibility and benefits of using machine learning techniques to improve teaching and learning in higher education institutions.

Support Vector Machines Illuminated

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch201 ◽

2011 ◽

pp. 1071-1076 ◽

Cited By ~ 1

Author(s):

David R. Musicant

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Support Vector Machines ◽

Data Storage ◽

Research Data ◽

Support Vector ◽

Machine Learning Technique ◽

Vector Machines ◽

A Current

In recent years, massive quantities of business and research data have been collected and stored, partly due to the plummeting cost of data storage. Much interest has therefore arisen in how to mine this data to provide useful information. Data mining as a discipline shares much in common with machine learning and statistics, as all of these endeavors aim to make predictions about data as well as to better understand the patterns that can be found in a particular dataset. The support vector machine (SVM) is a current machine learning technique that performs quite well in solving common data mining problems.