Using Machine Learning to Identify At-risk Students in an Introductory Programming Course at a Two-year Public College


2021 ◽  
Author(s):  
Cameron I. Cooper ◽  
Kamea J. Cooper ◽  
Cameron Collyer

Abstract Nationally, more than one-third of students enrolling in introductory computer science programming courses (CS101) do not succeed. To improve student success rates, this research team used supervised machine learning to identify students who are "at risk" of not succeeding in CS101 at a two-year public college. The resulting predictive model accurately identifies approximately 99% of at-risk students in an out-of-sample test data set. The programming instructor piloted the model's predictive factors as early-alert triggers, intervening with individualized outreach and support across three sections of CS101 in fall 2020. The pilot produced a 23% increase in student success and a 7.3 percentage point decrease in the DFW rate. More importantly, the study identified academic early-alert triggers for CS101: in particular, the first two graded programs are of paramount importance for student success in the course.
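The abstract's central finding, that scores on the first two graded programs predict course success, lends itself to a simple early-alert rule. The sketch below is illustrative only: the threshold, field names, and roster data are hypothetical, and the study's actual model is a supervised machine learning classifier rather than a fixed cutoff.

```python
# Minimal sketch of an early-alert trigger based on the study's key finding:
# performance on the first two graded programs signals risk of not succeeding.
# Threshold and roster data below are hypothetical, not taken from the paper.

def flag_at_risk(students, threshold=70.0):
    """Return names of students whose average score on the first two
    graded programs falls below the threshold, or who are missing one."""
    flagged = []
    for name, scores in students.items():
        first_two = scores[:2]
        if len(first_two) < 2 or sum(first_two) / len(first_two) < threshold:
            flagged.append(name)  # trigger individualized outreach
    return flagged

roster = {
    "alice": [95, 88, 91],   # on track
    "bob": [60, 55],         # low scores on the first two programs
    "chen": [85],            # missing a graded program
}
print(flag_at_risk(roster))  # -> ['bob', 'chen']
```

A rule like this could feed an institution's existing early-alert system, prompting the outreach the pilot describes.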




2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. A major challenge for researchers is to develop methods that predict student behavior so that teachers and tutors can identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results, they usually rely on attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology for classifying students using only interaction counts for each student. We evaluate this methodology on a data set from two majors hosted on the Moodle platform, running experiments that train and evaluate three machine learning models (support vector machines, naive Bayes, and AdaBoost decision trees) under different scenarios. We provide evidence that patterns in interaction counts carry useful information for classifying at-risk students. This classification allows the activities presented to at-risk students to be customized (automatically or through tutors) in an attempt to prevent them from dropping out.
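The appeal of the approach above is that interaction counts are available on any LMS. As a dependency-free stand-in for the SVM, naive Bayes, and AdaBoost models the paper evaluates, the sketch below uses a tiny nearest-centroid classifier over per-week interaction counts; all data and labels are invented for illustration.

```python
# Hedged sketch of classifying students from interaction counts alone.
# A nearest-centroid classifier stands in for the paper's models (SVM,
# naive Bayes, AdaBoost) so the example stays dependency-free.

def centroid(rows):
    """Element-wise mean of equal-length count vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(train):
    """train: list of (weekly_interaction_counts, label) pairs."""
    by_label = {}
    for counts, label in train:
        by_label.setdefault(label, []).append(counts)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, counts):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(counts, c))
    return min(model, key=lambda label: dist(model[label]))

train = [
    ([30, 28, 25], "ok"), ([40, 35, 33], "ok"),        # steady activity
    ([10, 4, 0], "at_risk"), ([8, 2, 1], "at_risk"),   # activity tailing off
]
model = fit(train)
print(predict(model, [5, 3, 0]))     # -> at_risk
print(predict(model, [33, 30, 29]))  # -> ok
```

Because the features are just counts, the same pipeline transfers across courses and platforms, which is exactly the portability argument the paper makes.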


2021 ◽  
Author(s):  
Sebastian Böttcher ◽  
Elisa Bruno ◽  
Nikolay V Manyakov ◽  
Nino Epitashvili ◽  
Kasper Claes ◽  
...  

BACKGROUND Video electroencephalography recordings, routinely used in epilepsy monitoring units, are the gold standard for monitoring epileptic seizures. However, monitoring is also needed in the day-to-day lives of people with epilepsy, where video electroencephalography is not feasible. Wearables could fill this gap by providing patients with an accurate log of their seizures. OBJECTIVE Although there are already systems available that provide promising results for the detection of tonic-clonic seizures (TCSs), research in this area is often limited to detection from 1 biosignal modality or only during the night when the patient is in bed. The aim of this study is to provide evidence that supervised machine learning can detect TCSs from multimodal data in a new data set during daytime and nighttime. METHODS An extensive data set of biosignals from a multimodal watch worn by people with epilepsy was recorded during their stay in the epilepsy monitoring unit at 2 European clinical sites. From a larger data set of 243 enrolled participants, those who had data recorded during TCSs were selected, amounting to 10 participants with 21 TCSs. Accelerometry and electrodermal activity recorded by the wearable device were used for analysis, and seizure manifestation was annotated in detail by clinical experts. Ten accelerometry and 3 electrodermal activity features were calculated for sliding windows of variable size across the data. A gradient tree boosting algorithm was used for seizure detection, and the optimal parameter combination was determined in a leave-one-participant-out cross-validation on a training set of 10 seizures from 8 participants. The model was then evaluated on an out-of-sample test set of 11 seizures from the remaining 2 participants. To assess specificity, we additionally analyzed data from up to 29 participants without TCSs during the model evaluation. 
RESULTS In the leave-one-participant-out cross-validation, the model optimized for sensitivity could detect all 10 seizures with a false alarm rate of 0.46 per day in 17.3 days of data. In a test set of 11 out-of-sample TCSs, amounting to 8.3 days of data, the model could detect 10 seizures and produced no false positives. Increasing the test set to include data from 28 more participants without additional TCSs resulted in a false alarm rate of 0.19 per day in 78 days of wearable data. CONCLUSIONS We show that a gradient tree boosting machine can robustly detect TCSs from multimodal wearable data in an original data set and that even with very limited training data, supervised machine learning can achieve a high sensitivity and low false-positive rate. This methodology may offer a promising way to approach wearable-based nonconvulsive seizure detection.
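The methods describe computing accelerometry and electrodermal-activity features over sliding windows before feeding them to the gradient tree boosting detector. The sketch below shows only that windowing step with two generic statistics (mean, standard deviation); the window size, step, signal values, and feature choice are illustrative, not the study's actual 13 features.

```python
# Sketch of sliding-window feature extraction over a wearable biosignal,
# as would feed a gradient tree boosting seizure detector. Window size,
# step, and the example signal are illustrative only.
import math

def sliding_features(signal, window, step):
    """Return (start_index, mean, std) for each full window of the signal."""
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window  # population variance
        feats.append((start, mean, math.sqrt(var)))
    return feats

# e.g. accelerometer magnitude samples; high in-window variance can
# indicate the vigorous movement of a convulsive seizure
accel = [1.0, 1.0, 1.0, 1.0, 3.0, 0.2, 3.5, 0.1, 1.0, 1.0]
for start, mean, std in sliding_features(accel, window=4, step=2):
    print(start, round(mean, 2), round(std, 2))
```

Each feature row would then be scored by the trained classifier, and consecutive positive windows merged into a single seizure alarm.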

