The Best Ensemble Learner of Bagged Tree Algorithm for Student Performance Prediction

Author(s):  
Afiqah Zahirah Zakaria ◽  
Ali Selamat ◽  
Hamido Fujita ◽  
Ondrej Krejcar

Student performance is an important factor for many parties, including students, parents, instructors, and administrators. Early prediction enables timely monitoring by the people responsible for developing better citizens for the nation. This paper improves the Bagged Tree algorithm to predict student performance across four classes: distinction, pass, fail, and withdrawn. Accuracy is used as the evaluation metric. The Bagged Tree with the Bag, AdaBoost, and RUSBoost learners helps predict student performance on large datasets. The RUSBoost algorithm proved well suited to imbalanced datasets: even with more than 30,000 records, it achieved an accuracy of 98.6% after feature selection and 99.1% without feature selection, outperforming the other learner types.
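The ensemble comparison the abstract describes can be sketched with scikit-learn. This is a minimal illustration on synthetic imbalanced data, not the authors' dataset or exact configuration; since scikit-learn has no built-in RUSBoost, the "RUS" step is approximated by random undersampling before AdaBoost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class imbalanced data standing in for the (distinction, pass,
# fail, withdrawn) classes; this is not the authors' dataset.
X, y = make_classification(n_samples=2000, n_classes=4, n_informative=8,
                           weights=[0.55, 0.25, 0.15, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

learners = {
    "Bag": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                             random_state=0),
    "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                                   n_estimators=50, random_state=0),
}
for name, clf in learners.items():
    clf.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, clf.predict(X_te)), 3))

# The "RUS" in RUSBoost: randomly undersample every class down to the size
# of the rarest class, then boost on the balanced sample.
rng = np.random.default_rng(0)
counts = np.bincount(y_tr)
n_min = counts.min()
keep = np.concatenate([rng.choice(np.where(y_tr == c)[0], n_min, replace=False)
                       for c in range(len(counts))])
rus_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                               n_estimators=50, random_state=0)
rus_boost.fit(X_tr[keep], y_tr[keep])
print("RUS+AdaBoost", round(accuracy_score(y_te, rus_boost.predict(X_te)), 3))
```

A faithful RUSBoost (which re-undersamples inside each boosting round) is available as `RUSBoostClassifier` in the `imbalanced-learn` package.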

Author(s):  
Maryam Zaffar ◽  
Manzoor Ahmad Hashmani ◽  
K.S. Savita ◽  
Syed Sajjad Hussain Rizvi ◽  
Mubashar Rehman

Educational Data Mining (EDM) is a very active area of Data Mining (DM), and it is helpful in predicting the performance of students. Student performance prediction is important not only for the student but also for the academic organization, which can use it to detect the causes of student success and failure. Furthermore, the features selected by student performance prediction models help in developing action plans for academic welfare. Feature selection can increase the accuracy of the prediction model. In a student performance prediction model every feature matters, as neglecting an important feature can lead to the wrong academic action plans. Feature selection is therefore a very important step in developing student performance prediction models. Among the different types of feature selection algorithms, this paper selects the Fast Correlation-Based Filter (FCBF). The paper is a step toward identifying the factors affecting students' academic performance. The performance of FCBF is evaluated on three different student datasets; it performs best on the student dataset with the greater number of features.
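The FCBF idea can be sketched compactly: rank features by symmetrical uncertainty (SU) with the class, then drop any feature that is more correlated with an already selected feature than with the class. The toy data and threshold below are illustrative, not the paper's datasets.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def symmetrical_uncertainty(a, b):
    # SU(a, b) = 2 * I(a; b) / (H(a) + H(b)); 0 = independent, 1 = identical.
    h_a, h_b = entropy(np.bincount(a)), entropy(np.bincount(b))
    return 0.0 if h_a + h_b == 0 else 2.0 * mutual_info_score(a, b) / (h_a + h_b)

def fcbf(X, y, delta=0.0):
    # Step 1: rank features whose SU with the class exceeds delta.
    su_class = [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]
    order = [j for j in np.argsort(su_class)[::-1] if su_class[j] > delta]
    # Step 2: drop any feature more correlated with an already selected
    # feature than with the class (a "redundant peer").
    selected = []
    for j in order:
        if all(symmetrical_uncertainty(X[:, j], X[:, k]) < su_class[j]
               for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 300)
X = np.column_stack([
    y,                            # perfect predictor of the class
    y ^ (rng.random(300) < 0.1),  # noisy copy, redundant with column 0
    rng.integers(0, 3, 300),      # pure noise
])
print(fcbf(X, y))  # the redundant copy and the noise column are dropped
```

Note that FCBF operates on discrete features; continuous student attributes would need discretization first.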


Author(s):  
Haixia Lu ◽  
Jinsong Yuan

Determining the factors affecting students' performance from the perspective of data mining is a widely studied topic. To find the key factors that significantly affect students' performance in complex data, this paper proposes an integrated Optimized Ensemble Feature Selection Algorithm by Density Peaks (DPEFS). The algorithm is applied to education data collected by two high schools in China, and the selected discriminative features are used to construct a student performance prediction model based on a support vector machine (SVM). The results of a 10-fold cross-validation experiment show that, compared with feature selection algorithms such as mRMR, Relief, SVM-RFE, and AVC, the SVM student performance prediction model based on the proposed feature selection algorithm has better prediction performance. In addition, factors and rules affecting student performance can be extracted from the selected discriminative features, which provides a methodological and technical reference for teachers, education management staff, and schools to predict and analyze student performance.
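The evaluation protocol the abstract describes (SVM on a selected feature subset, scored by 10-fold cross-validation) can be sketched as below. The DPEFS algorithm itself is not public here, so `SelectKBest` serves as a stand-in filter, and the wine dataset stands in for the schools' data.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)  # 13 features, 3 classes

models = {
    "all 13 features": make_pipeline(StandardScaler(), SVC()),
    # Stand-in filter; the paper's density-peaks selector would replace it.
    "top-5 subset": make_pipeline(StandardScaler(),
                                  SelectKBest(f_classif, k=5), SVC()),
}
results = {}
for name, model in models.items():
    # 10-fold CV, matching the abstract's evaluation protocol.
    results[name] = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {results[name]:.3f}")
```

Putting the selector inside the pipeline matters: it is refit on each training fold, so the cross-validated score is not contaminated by selection on the test fold.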


2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Wokili Abdullahi ◽  
Mary Ogbuka Kenneth ◽  
Morufu Olalere

Features in educational data are ambiguous, which leads to noisy features and the curse of dimensionality. These problems are addressed by feature selection. Existing feature selection models were created using single-level embedded, wrapper-based, or filter-based methods. However, single-level filter-based methods ignore feature dependencies and the interaction with the classifier. Embedded and wrapper-based methods do interact with the classifier, but they can only select an optimal subset for that particular classifier, so their selected features may be worse for other classifiers. Hence this research proposes a robust Cascade Bi-Level (CBL) feature selection technique for student performance prediction that minimizes the limitations of single-level techniques. The proposed CBL technique consists of the Relief technique at the first level and Particle Swarm Optimization (PSO) at the second level. It was evaluated on the UCI student performance dataset. On the binary classification task, the proposed technique achieved an accuracy of 94.94%, better than the 93.67% achieved by single-level PSO. These results show that CBL can effectively predict student performance.
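The two-level cascade can be sketched as follows: a Relief filter first trims the feature set, then a small binary PSO searches subsets of the survivors with a wrapper fitness. The dataset, swarm size, and PSO constants below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def relief_weights(X, y, n_iter=100, seed=0):
    # Classic binary Relief: reward features on which a sample is far from
    # its nearest miss and close to its nearest hit.
    rng = np.random.default_rng(seed)
    X = (X - X.min(0)) / (X.max(0) - X.min(0))  # scale features to [0, 1]
    w = np.zeros(X.shape[1])
    for i in rng.choice(len(X), n_iter, replace=False):
        d = np.abs(X - X[i]).sum(1)
        d[i] = np.inf                       # never pick the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w

X, y = load_breast_cancer(return_X_y=True)  # stand-in for student data
top = np.argsort(relief_weights(X, y))[-10:]   # level 1: Relief filter
Xr = X[:, top]

def fitness(mask):
    # Wrapper fitness: 3-fold CV accuracy of KNN on the masked subset.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(5), Xr[:, mask.astype(bool)],
                           y, cv=3).mean()

# Level 2: a minimal binary PSO over the 10 surviving features.
rng = np.random.default_rng(0)
pos = (rng.random((8, 10)) < 0.5).astype(float)   # 8 particles, 10 bits
vel = rng.normal(0.0, 1.0, (8, 10))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
for _ in range(5):
    g = pbest[pbest_fit.argmax()]          # current global best particle
    vel = (0.7 * vel + 1.5 * rng.random(pos.shape) * (pbest - pos)
                     + 1.5 * rng.random(pos.shape) * (g - pos))
    # Sigmoid of velocity gives the probability that each bit is set.
    pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
print("best wrapper accuracy:", round(pbest_fit.max(), 3))
```

The cascade's economy comes from the first level: PSO's expensive wrapper evaluations run over 10 candidate features rather than the full set.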


2021 ◽  
Vol 30 (1) ◽  
pp. 511-523
Author(s):  
Ephrem Admasu Yekun ◽  
Abrahaley Teklay Haile

One of the important measures of the quality of education is the performance of students in academic settings. Educational institutions now store abundant data about students, which data mining techniques can use to discover insight into how students are learning and to improve their performance ahead of time. In this paper, we developed a student performance prediction model that predicts the next-semester performance of high school students in five courses. We modeled our prediction system as a multi-label classification task and used support vector machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) as base classifiers to train our model. We further improved the prediction model using a state-of-the-art partitioning scheme to divide the label space into smaller spaces and used the Label Powerset (LP) transformation method to transform each labelset into a multi-class classification task. The proposed model achieved better performance on several evaluation metrics than other multi-label learning approaches such as binary relevance and classifier chains.
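The Label Powerset transformation mentioned above can be shown in a few lines: each distinct combination of course labels becomes one class of a single multi-class problem. The five "courses" and features below are synthetic stand-ins for the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
# Five binary labels (pass/fail per course), loosely tied to the features.
Y = (X[:, :5] + 0.3 * rng.normal(size=(400, 5)) > 0).astype(int)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# LP transform: encode each distinct labelset row as a single class id.
to_class = {ls: i for i, ls in enumerate(sorted({tuple(r) for r in map(tuple, Y_tr)}))}
z_tr = np.array([to_class[tuple(r)] for r in Y_tr])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, z_tr)

# Decode predictions back to label vectors and score exact-match accuracy.
from_class = {i: ls for ls, i in to_class.items()}
Y_pred = np.array([from_class[c] for c in clf.predict(X_te)])
subset_acc = (Y_pred == Y_te).all(1).mean()
print("exact-match accuracy:", round(subset_acc, 3))
```

LP captures label correlations that binary relevance ignores, but the number of classes grows with the number of observed labelsets; the paper's partitioning scheme splits the label space precisely to keep those powersets small.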


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 219775-219787
Author(s):  
Peichao Jiang ◽  
Xiaodong Wang

Author(s):  
Muhammad Imran ◽  
Shahzad Latif ◽  
Danish Mehmood ◽  
Muhammad Saqlain Shah

Automatic student performance prediction is a crucial task due to the large volume of data in educational databases. The task is addressed by educational data mining (EDM), which develops methods for discovering patterns in data derived from educational environments; these methods are used to understand students and their learning environments. Educational institutions often want to know how many students will pass or fail so that the necessary arrangements can be made. Previous studies show that many researchers focus on selecting an appropriate classification algorithm while ignoring problems that arise during the data mining phases, such as high dimensionality, class imbalance, and classification error, all of which reduce the accuracy of the model. Several well-known classification algorithms have been applied in this domain, but this paper proposes a student performance prediction model based on a supervised-learning decision tree classifier, with an ensemble method applied to improve the classifier's performance; ensemble methods are designed for classification and prediction problems. This study demonstrates the importance of data preprocessing and algorithm fine-tuning in resolving data quality issues. The experimental dataset, obtained from the UCI Machine Learning Repository, covers the Alentejo region of Portugal. Three supervised learning algorithms (J48, NNge, and MLP) were employed for the experiments; the results showed that J48 achieved the highest accuracy at 95.78%.
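The decision-tree-plus-ensemble pattern described above can be sketched with scikit-learn, whose entropy-criterion tree approximates the information-gain splitting of the J48/C4.5 family. The synthetic data below stands in for the UCI Portuguese student dataset, which is not fetched here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the UCI student performance data.
X, y = make_classification(n_samples=600, n_features=15, n_informative=6,
                           random_state=0)

# criterion="entropy" mirrors C4.5/J48's information-gain splitting.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
bagged = BaggingClassifier(tree, n_estimators=50, random_state=0)

results = {}
for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```

Bagging typically narrows the variance of a single deep tree, which is one way an ensemble wrapper lifts the base classifier's accuracy as the abstract reports.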

