Applicability of Traditional Classification Techniques on Educational Data

2020 ◽  
Vol 8 (6) ◽  
pp. 1672-1677

Predicting and analyzing student performance is an essential task for higher education institutions and contributes to the overall improvement of the educational system. Traditional Data Mining (DM) techniques such as regression and classification are widely used to analyze data from educational settings. The application of DM to academic data is called Educational Data Mining (EDM). This pilot study aims to determine the applicability of five standalone classification techniques: Decision Tree, BayesNet, Nearest Neighbor, Rule-Based, and Random Forest (RF). The study uses the WEKA tool to apply these techniques to a standard dataset containing student academic information and background. The paper also applies feature selection to identify the most influential features in the dataset, which reduces the dimensionality of the dataset and improves classifier accuracy. The classifiers are compared on the basis of standard statistical measures such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Kappa. The results show that classification algorithms are applicable to student performance prediction, which can help under-achieving and struggling students to improve. The J48 decision tree algorithm gave the best results. Further, the comparative analysis indicates that individual classifiers achieve different accuracy on the same dataset because of class imbalance in a multiclass dataset.
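The comparison measures named in this abstract can be illustrated with a minimal sketch. It assumes numeric targets and predicted scores for MAE and RMSE, and plain class labels for Cohen's kappa. The function names and the toy data are hypothetical, and the sketch does not reproduce the internals of WEKA's evaluation.

```python
import math
from collections import Counter


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def cohen_kappa(actual, predicted):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # Both sequences contain a single identical class.
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 = pass, 0 = fail, with predicted pass probabilities.
targets = [1, 0, 1, 1]
scores = [0.9, 0.2, 0.6, 0.4]
true_labels = ["pass", "fail", "pass", "pass"]
predicted_labels = ["pass", "fail", "pass", "fail"]

mae = mean_absolute_error(targets, scores)
rmse = root_mean_square_error(targets, scores)
kappa = cohen_kappa(true_labels, predicted_labels)
```

Lower MAE and RMSE indicate predictions closer to the targets, while kappa approaches 1 as agreement exceeds what chance alone would produce.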

Author(s):  
Muhammad Imran ◽  
Shahzad Latif ◽  
Danish Mehmood ◽  
Muhammad Saqlain Shah

Automatic student performance prediction is a crucial task because of the large volume of data in educational databases. This task is addressed by educational data mining (EDM). EDM develops methods for exploring data derived from educational environments. These methods are used to understand students and their learning environments. Educational institutions often want to know how many students will pass or fail so that they can make the necessary arrangements. Previous studies show that many researchers focus on selecting an appropriate classification algorithm and ignore problems that arise during the data mining phases, such as high data dimensionality, class imbalance, and classification error. Such problems reduce the accuracy of the model. Several well-known classification algorithms have been applied in this domain, but this paper proposes a student performance prediction model based on a supervised learning decision tree classifier. In addition, an ensemble method is applied to improve the performance of the classifier. Ensemble methods are designed to solve classification and prediction problems. This study demonstrates the importance of data preprocessing and algorithm fine-tuning in resolving data quality issues. The experimental dataset, obtained from the UCI Machine Learning Repository, comes from the Alentejo region of Portugal. Three supervised learning algorithms (J48, NNge, and MLP) are employed for the experiments. The results show that J48 achieved the highest accuracy, 95.78%.


2020 ◽  
Vol 17 (9) ◽  
pp. 4548-4552
Author(s):  
Vikas Rattan ◽  
Ruchi Mittal ◽  
Varun Malik

The rapid growth of educational institutions has pushed them to adopt data mining techniques to uncover important and previously unknown facts in educational data and so gain a competitive edge over their counterparts. In this paper, a student performance dataset comprising 131 records is taken from the UCI repository, and the data mining tool Orange is used to compare the accuracy of four classifiers, namely random forest, k-nearest neighbor (KNN), decision tree, and naïve Bayes, in classifying student performance in graduation. The results show that the decision tree achieves the highest accuracy of the four classifiers.


Student performance in higher education has become one of the most widely studied areas. Modelling plays a pivotal role in forecasting student performance, and data mining applications are now among the most widely used techniques for this purpose. Various factors determine student performance. Eight attributes, considered the most influential in determining student performance in the Pacific, are used as input. Statistical analysis is performed to see which attribute has the greatest influence on student performance. In this research, different algorithms, each using a different classification technique, are used to build the classification model. The classification techniques used include Artificial Neural Network, Decision Tree, Decision Table, and Naïve Bayes. The WEKA Explorer application and R software are used to test the correlation between variables. The dataset used in this research is imbalanced and is later transformed into a balanced set through under-sampling. The Neural Network is one of the classification techniques that performed well on both the imbalanced and the balanced dataset. Another technique that performed well is the Decision Tree. Statistical analysis shows that internal assessment has a weak positive relationship with student performance, while demographic data does not. Further observations on the two types of dataset and the different classification techniques are reported in this research.
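The under-sampling step mentioned in this abstract could look roughly like the sketch below. The abstract does not say which variant was used, so random under-sampling to the size of the smallest class is an assumption, and `undersample`, `label_of`, and the toy records are hypothetical names introduced for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(records, label_of, seed=0):
    """Randomly drop records so every class keeps as many as the smallest class."""
    rng = random.Random(seed)  # Fixed seed keeps the sketch reproducible.
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # Avoid leaving the classes in contiguous blocks.
    return balanced


# Hypothetical imbalanced toy set: eight "pass" records, two "fail" records.
students = [("p%d" % i, "pass") for i in range(8)] + [("f0", "fail"), ("f1", "fail")]
balanced_students = undersample(students, label_of=lambda record: record[1])
class_sizes = Counter(label for _, label in balanced_students)
```

Under-sampling discards majority-class records, so it trades training data for balance; with very small minority classes that loss can be significant.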


2018 ◽  
Vol 37 (4) ◽  
pp. 1087 ◽  
Author(s):  
Y.K. Saheed ◽  
T.O. Oladele ◽  
A.O. Akanni ◽  
W.M. Ibrahim

2021 ◽  
Vol 10 (3) ◽  
pp. 121-127
Author(s):  
Bareen Haval ◽  
Karwan Jameel Abdulrahman ◽  
Araz Rajab

This article presents the results of applying educational data mining techniques to the academic performance of students. Three classification models (Decision Tree, Random Forest, and Deep Learning) were developed to analyze the datasets and predict student performance. The predictive performance of the three classifiers was calculated and compared. The students' academic history and data from the Office of the Registrar were used to train the models. The analysis evaluates student results using variables such as the student's grade. Data from 221 students with 9 different attributes were used. The results provide a better understanding of how student success is assessed and underline the importance of data mining in education. The main purpose of this study is to show that student success can be forecast with data mining techniques in order to improve academic programs. The results indicate that the Decision Tree classifier outperforms the other two classifiers, achieving an overall prediction accuracy of 97%.


Author(s):  
Maryam Zaffar ◽  
Manzoor Ahmad Hashmani ◽  
K.S. Savita ◽  
Syed Sajjad Hussain Rizvi ◽  
Mubashar Rehman

Educational Data Mining (EDM) is a very active area of Data Mining (DM) and is helpful in predicting student performance. Student performance prediction is important not only for students but also for academic organizations, which can use it to detect the causes of student success and failure. Furthermore, the features selected by student performance prediction models help in developing action plans for academic welfare. Feature selection can increase the accuracy of a prediction model. In a student performance prediction model every feature matters, because neglecting an important feature can lead to misguided academic action plans. Feature selection is therefore a very important step in developing student performance prediction models. There are different types of feature selection algorithms. In this paper, the Fast Correlation-Based Filter (FCBF) is selected as the feature selection algorithm. This paper is a step toward identifying the factors that affect students' academic performance. The performance of FCBF is evaluated on three different student datasets. FCBF is found to perform well on the student dataset with a greater number of features.
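As a rough illustration of how FCBF works, the sketch below ranks discrete features by their symmetrical uncertainty (SU) with the class and drops any feature that is at least as strongly correlated with an already kept feature as with the class. It is a simplified reading of the published algorithm, not the implementation used in the paper. The feature names, the toy data, and the `threshold` default are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Return names of relevant, non-redundant features, most relevant first."""
    relevance = {
        name: symmetrical_uncertainty(column, target)
        for name, column in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Keep a feature only if no kept feature predicts it at least as well
        # as it predicts the class.
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical toy data: 1 = pass, 0 = fail.
outcome = [1, 1, 0, 0, 1, 0, 1, 0]
candidate_features = {
    "grade": [1, 1, 0, 0, 1, 0, 1, 0],       # perfectly informative
    "grade_copy": [1, 1, 0, 0, 1, 0, 1, 0],  # redundant with "grade"
    "noise": [1, 0, 1, 0, 1, 1, 0, 0],       # independent of the outcome
}
chosen = fcbf(candidate_features, outcome)
```

In this toy case only `grade` survives: `grade_copy` is removed as redundant and `noise` fails the relevance threshold.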


2016 ◽  
Vol 7 (4) ◽  
Author(s):  
Mochammad Yusa ◽  
Ema Utami ◽  
Emha T. Luthfi

Abstract. Readmission is associated with measures of the quality of patient care in hospitals. The different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the calculation of quality of care complicated. Data mining classification techniques can address this problem. In this paper, three classifiers, i.e. Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated with various parameter settings using 10-Fold Cross Validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and Kappa Statistic, while Naive Bayes outperforms the other classifiers in terms of MAE.
Keywords: k-NN, naive bayes, diabetes, readmission
Abstract (translated from Indonesian). Readmission is associated with the calculation of the quality of patient care in hospitals. The different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the calculation of quality complicated. Data mining classification techniques can offer a solution for this calculation. Classification is one of the data mining techniques that has developed significantly. In this research, the classification algorithms Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes are evaluated with various parameter settings on Accuracy, Mean Absolute Error (MAE), and Kappa Statistic using 10-Fold Cross Validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa Statistic were obtained from the Naive Bayes model.
Keywords: k-NN, naive bayes, diabetes, readmission
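The evaluation protocol described here, k-NN scored with 10-fold cross validation, can be sketched in a few lines. This is a minimal illustration on hypothetical toy points with k=3 and Euclidean distance, both of which are assumptions; the paper reports k=100 on 49,735 records, and its tooling and distance measure are not stated in the abstract.

```python
import random
from collections import Counter


def k_fold_indices(n, folds=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into `folds` disjoint groups."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to `query`."""
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    nearest = sorted(range(len(train_x)), key=squared_distance)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, k_neighbors=3, folds=10, seed=0):
    """Hold out each fold in turn, train on the rest, and pool the hits."""
    correct = 0
    for test_indices in k_fold_indices(len(xs), folds, seed):
        held_out = set(test_indices)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k_neighbors) == ys[i]
            for i in test_indices
        )
    return correct / len(xs)


# Hypothetical toy data: two well-separated clusters of ten points each.
points = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 0.0) for i in range(10)]
labels = ["no"] * 10 + ["yes"] * 10
accuracy = cross_validated_accuracy(points, labels)
fold_groups = k_fold_indices(len(points))
```

Every record is used for testing exactly once, so the pooled accuracy is an estimate over the whole dataset rather than a single train/test split.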

