An Interpretable Pipeline for Identifying At-Risk Students

2021 ◽  
pp. 073563312110381
Author(s):  
Bo Pei ◽  
Wanli Xing

This paper introduces a novel approach to identifying at-risk students, with a focus on output interpretability, by analyzing learning activities at a finer, weekly granularity. Specifically, the approach converts the predicted outputs from previous weeks into meaningful probabilities that inform the predictions for the current week, preserving the continuity of learning activities across weeks. To demonstrate the efficacy of the model in identifying at-risk students, we compare its weekly AUCs and averaged performance (i.e., accuracy, precision, recall, and F1-score) over each course against baseline models (Random Forest, Support Vector Machine, and Decision Tree). Furthermore, we adopt a Top-K metric to examine how many at-risk students the model can identify with high precision in each week. Finally, the model output is interpreted through a model-agnostic interpretation approach to support instructors in making informed recommendations for students' learning. The experimental results demonstrate the capability and interpretability of our model in identifying at-risk students in online learning settings. In addition, our work has significant implications for building accountable machine learning pipelines that can automatically generate individualized learning interventions while considering fairness across learning groups.
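The abstract gives no code, but the weekly baseline comparison and Top-K check it describes could look roughly like the following scikit-learn sketch; the feature matrix, labels, and the choice of K are placeholders, not the authors' data or pipeline.

```python
# Minimal sketch (not the authors' code): weekly baseline comparison and a
# Top-K precision check with scikit-learn. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))           # hypothetical weekly activity features
y = (rng.random(500) < 0.2).astype(int)  # 1 = at-risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

baselines = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

def top_k_precision(y_true, scores, k=20):
    """Fraction of the k highest-scored students who are truly at-risk."""
    top_k = np.argsort(scores)[::-1][:k]
    return y_true[top_k].mean()

for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print(name,
          "AUC=%.3f" % roc_auc_score(y_te, scores),
          "Top-20 precision=%.3f" % top_k_precision(y_te, scores))
```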

2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict student behavior so that teachers and tutors can identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results, they usually rely on attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology for classifying students using only each student's interaction counts. We evaluate this methodology on a data set from two majors hosted on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes, and AdaBoost decision trees) under different scenarios. We provide evidence that patterns in interaction counts carry useful information for classifying at-risk students. This classification allows the activities presented to at-risk students to be customized (automatically or through tutors) in an attempt to prevent dropout.
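A minimal illustration of the interaction-count idea, assuming scikit-learn and a synthetic count matrix in place of the Moodle logs; the three models mirror those named above, but the features and labels are invented.

```python
# Sketch only: classifying students from raw interaction counts, as described
# above, with scikit-learn stand-ins. The count matrix and label are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# hypothetical per-student counts: page views, forum posts, quiz attempts, logins
interaction_counts = rng.poisson(lam=5, size=(300, 4))
at_risk = (interaction_counts.sum(axis=1) < 15).astype(int)  # toy label

models = {
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(),  # boosted decision stumps by default
}
for name, model in models.items():
    acc = cross_val_score(model, interaction_counts, at_risk, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```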


Learning analytics improves higher education by using educational data to extract useful patterns and support better decisions. Identifying potentially at-risk students can help instructors and academic advisors improve student performance and the achievement of learning outcomes. The aim of this study is to predict, at an early phase, a student's failure in a particular course using standards-based grading. Several machine learning techniques were implemented to predict student failure: Support Vector Machine, Multilayer Perceptron, Naïve Bayes, and decision tree. The results for each technique show that machine learning algorithms can predict student failure accurately after the third week and before the course drop week. This study provides strong insight into student performance across courses. It also gives faculty members the ability to help at-risk students by focusing on them and providing the support necessary to improve their performance and avoid failure.
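As a rough sketch of the early-warning setup described above (not the study's actual pipeline), one could retrain a classifier on the grades accumulated up to each week and observe when accuracy becomes usable; the grade matrix and failure rule below are synthetic, and the Multilayer Perceptron stands in for the several models compared.

```python
# Hedged sketch: weekly early-warning prediction from cumulative standards-based
# grades. Data, the 55% failure rule, and hyperparameters are all hypothetical.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
weekly_grades = rng.uniform(0, 100, size=(200, 10))   # 10 weeks of graded tasks
failed = (weekly_grades.mean(axis=1) < 55).astype(int)

# How early does the failure label become predictable? Check from week 3 onward.
for week in range(3, 11):
    X_so_far = weekly_grades[:, :week]                # grades available so far
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    acc = cross_val_score(clf, X_so_far, failed, cv=5).mean()
    print(f"using weeks 1-{week}: CV accuracy = {acc:.3f}")
```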


2007 ◽  
Vol 44 (1) ◽  
pp. 13-17
Author(s):  
Gene White ◽  
Douglas Lare ◽  
Suzanne Mueller ◽  
Patricia Smeaton ◽  
Faith Waters

Author(s):  
Ashok Kumar Veerasamy ◽  
Daryl D'Souza ◽  
Rolf Lindén ◽  
Mikko-Jussi Laakso

This paper presents a Support Vector Machine predictive model to determine whether prior programming knowledge and completion of in-class and take-home formative assessment tasks are suitable predictors of examination performance. Student data for an introductory programming course from the academic years 2012-2016 was captured via the ViLLE e-learning tool for analysis. The results revealed that a predictive model built on students' prior programming knowledge and assessment scores is a good fit to the data. However, while the overall success of the model is significant, its accuracy in identifying at-risk students was only moderate, which persuaded us to include two further research questions. Our preliminary post-hoc analysis of the test results shows that, on average, students who scored less than 70% on formative assessments and had little or only basic prior programming knowledge may fail the final programming exam; applying this rule increases the prediction accuracy in identifying at-risk students from 46% to nearly 63%. Hence, these results provide immediate, actionable information for programming course instructors and students to enhance the teaching and learning process.
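Combining an SVM prediction with the "below 70% formative score plus little or basic prior knowledge" rule might be sketched as follows; the features, labels, and thresholds operate on synthetic data rather than the ViLLE dataset.

```python
# Illustrative sketch (not the study's pipeline): an SVM on prior-knowledge and
# formative-assessment features, plus the simple post-hoc rule described above.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
prior_knowledge = rng.integers(0, 3, size=400)   # 0 = none, 1 = basic, 2 = good
formative_pct = rng.uniform(20, 100, size=400)   # formative assessment score (%)
fail = ((formative_pct < 60) & (prior_knowledge < 2)).astype(int)  # toy label

X = np.column_stack([prior_knowledge, formative_pct])
X_tr, X_te, y_tr, y_te = train_test_split(X, fail, stratify=fail, random_state=0)

svm_flag = SVC().fit(X_tr, y_tr).predict(X_te)
rule_flag = ((X_te[:, 1] < 70) & (X_te[:, 0] <= 1)).astype(int)  # post-hoc rule

print("SVM recall on at-risk students:", recall_score(y_te, svm_flag))
print("rule recall on at-risk students:", recall_score(y_te, rule_flag))
```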


2021 ◽  
Vol 13 (22) ◽  
pp. 12461
Author(s):  
Chih-Chang Yu ◽  
Yufeng (Leon) Wu

While the use of deep neural networks is popular for predicting students’ learning outcomes, convolutional neural network (CNN)-based methods are used more often. Such methods require numerous features, training data, or multiple models to achieve week-by-week predictions. However, many current learning management systems (LMSs) operated by colleges cannot provide adequate information. To make the system more feasible, this article proposes a recurrent neural network (RNN)-based framework to identify at-risk students who might fail the course using only a few common learning features. RNN-based methods can be more effective than CNN-based methods in identifying at-risk students due to their ability to memorize time-series features. The data used in this study were collected from an online course that teaches artificial intelligence (AI) at a university in northern Taiwan. Common features, such as the number of logins, number of posts and number of homework assignments submitted, are considered to train the model. This study compares the prediction results of the RNN model with the following conventional machine learning models: logistic regression, support vector machines, decision trees and random forests. This work also compares the performance of the RNN model with two neural network-based models: the multi-layer perceptron (MLP) and a CNN-based model. The experimental results demonstrate that the RNN model used in this study is better than conventional machine learning models and the MLP in terms of F-score, while achieving similar performance to the CNN-based model with fewer parameters. Our study shows that the designed RNN model can identify at-risk students once one-third of the semester has passed. Some future directions are also discussed.
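A toy version of the kind of RNN described here, assuming TensorFlow/Keras, a single GRU layer, and three weekly count features (logins, posts, homework submissions); the shapes, labels, and hyperparameters are invented for illustration and are not the paper's configuration.

```python
# Rough sketch: a small GRU over weekly activity counts to flag at-risk students
# after about one-third of the term. All data and settings are synthetic.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(4)
n_students, n_weeks, n_feats = 300, 6, 3         # e.g. 6 weeks of 3 count features
X = rng.poisson(3, size=(n_students, n_weeks, n_feats)).astype("float32")
y = (X.sum(axis=(1, 2)) < 50).astype("float32")  # toy at-risk label

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_weeks, n_feats)),
    tf.keras.layers.GRU(16),                     # retains the time-series pattern
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))           # [loss, AUC] on the toy data
```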


2020 ◽  
Author(s):  
Pablo Schoeffel ◽  
Vinicius Faria Culmant Ramos ◽  
Raul Sidnei Wazlawick

Despite being a long-reported problem, the high rate of dropout and failure in computing courses persists. Although there is a strong relationship between motivation and student outcomes, few works use motivation as a factor to identify at-risk students. This work presents and evaluates a method for identifying features that allow predicting at-risk students in introductory computing courses, based on four main components: pre-university factors, initial motivation, motivation throughout the course, and professor perception. The method, named EMMECS, was applied to 245 students from different programs at four universities in southern Brazil. We carried out several prediction simulations using ten different classification algorithms and different datasets. Using support vector machine and AdaBoostM1 algorithms, we identified on average more than 80% of the students who would fail, from the first week of the study onward. The results show that the proposed method is effective compared with related work, and its advantages include independence from programmatic content, specific assessments, grades, and interaction with learning systems. Furthermore, the method allows weekly prediction, with good results from the first few weeks.
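The weekly-prediction evaluation could be mimicked along these lines, assuming scikit-learn stand-ins for the SVM and AdaBoostM1 classifiers and synthetic motivation and pre-university features; this is not the EMMECS implementation.

```python
# Sketch under stated assumptions: weekly re-training of SVM and AdaBoost-style
# classifiers, reporting recall on the "will fail" class. Features are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import recall_score

rng = np.random.default_rng(5)
n = 245
pre_university = rng.normal(size=(n, 3))       # e.g. prior study, entry score
weekly_motivation = rng.normal(size=(n, 8))    # one motivation survey per week
fail = (weekly_motivation.mean(axis=1) + pre_university[:, 0] < -0.3).astype(int)

for week in range(1, 9):
    X = np.hstack([pre_university, weekly_motivation[:, :week]])
    for name, clf in [("SVM", SVC()), ("AdaBoost", AdaBoostClassifier())]:
        pred = cross_val_predict(clf, X, fail, cv=5)
        print(f"week {week} {name}: recall(fail) = {recall_score(fail, pred):.2f}")
```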


1998 ◽  
Vol 29 (2) ◽  
pp. 109-116 ◽  
Author(s):  
Margie Gilbertson ◽  
Ronald K. Bramlett

The purpose of this study was to investigate informal phonological awareness measures as predictors of first-grade broad reading ability. Subjects were 91 former Head Start students who were administered standardized assessments of cognitive ability and receptive vocabulary, and informal phonological awareness measures during kindergarten and early first grade. Regression analyses indicated that three phonological awareness tasks, Invented Spelling, Categorization, and Blending, were the most predictive of standardized reading measures obtained at the end of first grade. Discriminant analyses indicated that these three phonological awareness tasks correctly identified at-risk students with 92% accuracy. Clinical use of a cutoff score for these measures is suggested, along with general intervention guidelines for practicing clinicians.
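The discriminant analysis on the three task scores might be sketched with scikit-learn's LinearDiscriminantAnalysis as below; the scores, the at-risk rule, and the 0.5 probability cutoff are hypothetical, not the study's data or its suggested clinical cutoff.

```python
# Not the study's analysis, just a sketch: a linear discriminant classifier on
# three phonological-awareness task scores with a probability cutoff for
# flagging at-risk readers. Scores and labels below are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(6)
# columns: Invented Spelling, Categorization, Blending (hypothetical raw scores)
tasks = rng.normal(loc=[10, 8, 6], scale=3, size=(91, 3))
at_risk = (tasks.sum(axis=1) < 20).astype(int)

lda = LinearDiscriminantAnalysis().fit(tasks, at_risk)
prob = lda.predict_proba(tasks)[:, 1]
flag = (prob >= 0.5).astype(int)        # a clinical cutoff would be tuned, not fixed
print("classification accuracy:", accuracy_score(at_risk, flag))
```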

