Predicting Dropout for Nontraditional Undergraduate Students: A Machine Learning Approach

Author(s):  
Huade Huo ◽  
Jiashan Cui ◽  
Sarah Hein ◽  
Zoe Padgett ◽  
Mark Ossolinski ◽  
...  

Student attrition represents one of the greatest challenges facing U.S. postsecondary institutions. Approximately 40 percent of students seeking a bachelor’s degree do not graduate within 6 years; among nontraditional students, who make up half of the undergraduate population, dropout rates are even higher. In this study, we developed a machine learning classifier using the XGBoost model and data from the National Center for Education Statistics (NCES) Beginning Postsecondary Students (BPS) Longitudinal Study: 2012/14 to predict nontraditional student dropout. In comparison with baseline models, the XGBoost model and logistic regression model with features identified by the XGBoost model displayed superior performance in predicting dropout. The predictive ability of the model and the features it identified as being most important in predicting nontraditional student dropout can inform discussion among educators seeking ways to identify and support at-risk students early in their postsecondary careers.

2017 ◽  
Vol 7 (3) ◽  
pp. 42
Author(s):  
Vikash Rowtho

Undergraduate student dropout is gradually becoming a global problem, and the 39 Small Island Developing States (SIDS) are no exception to this trend. The purpose of this research was to develop a method for early detection of students who are at risk of performing poorly in their undergraduate studies. A sample of 279 students participated in the study, which was conducted in a Mauritian private tertiary academic institution. Regression analyses identified the variables with a significant influence on academic performance. These variables were used in a linear discriminant analysis in which 74 percent of the students could be correctly classified into three categories: at-risk, pass or fail. In conclusion, this study proposes a new technique that institutions can use to determine significant predictors of academic performance and then identify at-risk students, for whom interventions can be implemented prior to exams to address the problem of dropouts.


2016 ◽  
Vol 23 (2) ◽  
pp. 124
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the activities presented to at-risk students to be customized (automatically or through tutors) in an attempt to keep students from dropping out.
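As a rough illustration of what "interaction counts" might look like as model input, the sketch below turns a log of platform events into one count vector per student. The weekly grouping, the `(student_id, week)` event shape, and the function name are assumptions made for this example; the abstract does not say how the counts were aggregated.

```python
from collections import defaultdict


def weekly_interaction_counts(events, n_weeks):
    """Return {student_id: [count for week 0, ..., count for week n_weeks - 1]}.

    `events` is an iterable of (student_id, week) pairs (a hypothetical log
    format). Events whose week falls outside 0..n_weeks-1 are ignored.
    """
    features = defaultdict(lambda: [0] * n_weeks)
    for student_id, week in events:
        if 0 <= week < n_weeks:
            features[student_id][week] += 1
    return dict(features)


# Toy log: student s1 interacts twice in week 0 and once in week 2.
events = [("s1", 0), ("s1", 0), ("s1", 2), ("s2", 1)]
features = weekly_interaction_counts(events, n_weeks=3)
print(features)
```

Vectors of this kind depend only on how often a student interacts, not on course-specific attributes, which is what makes such features portable across courses and platforms.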


2020 ◽  
Vol 10 (23) ◽  
pp. 8413
Author(s):  
Stamatis Karlos ◽  
Georgios Kostopoulos ◽  
Sotiris Kotsiantis

Multi-view learning is a machine learning approach that aims to exploit the knowledge retrieved from data represented by multiple feature subsets known as views. Co-training is considered the most representative form of multi-view learning, a very effective semi-supervised classification algorithm for building highly accurate and robust predictive models. Even though it has been implemented in various scientific fields, it has not been adequately used in educational data mining and learning analytics, since the hypothesis about the existence of two feature views cannot be easily implemented. Some notable studies have emerged recently dealing with semi-supervised classification tasks, such as student performance or student dropout prediction, while semi-supervised regression is uncharted territory. Therefore, the present study attempts to implement a semi-supervised regression algorithm for predicting the grades of undergraduate students in the final exams of a one-year online course, which exploits three independent and naturally formed feature views, since they are derived from different sources. Moreover, we examine a well-established framework for interpreting the acquired results regarding their contribution to the final outcome per student/instance. To this end, a plethora of experiments is conducted based on data offered by the Hellenic Open University and representative machine learning algorithms. The experimental results demonstrate that the early prognosis of students at risk of failure can be accurately achieved compared to supervised models, even for a small amount of initially collected data from the first two semesters. The robustness of the applied semi-supervised regression scheme along with supervised learners and the investigation of features’ reasoning could highly benefit the educational domain.


2019 ◽  
Vol 28 (6) ◽  
pp. 52-62
Author(s):  
A. F. Smyk ◽  
V. I. Prusova ◽  
L. L. Zimanov ◽  
A. A. Solntsev

The article addresses the problem of student dropout from technical universities due to academic failure. Modern technical universities face a decline in young people’s interest in the technical and natural sciences and in engineering specialties. To enroll new students in these specialties, the universities have to lower their admission standards and requirements. As a result, the students are less motivated, and the number of expelled students increases. The article presents the results of data analysis for the two leading areas of training at Moscow Automobile and Road State Technical University (MADI) – “Construction of unique buildings and structures” and “Land transport and technological means” – carried out respectively at the Road-Building faculty (DSF) and the faculty of Automobile transport (ATF). The article reports the total number of students in MADI, as well as the number of students enrolled in the specified training directions and the number of expelled students of all courses for the period from 1935 to 2018. The results achieved by undergraduate students in physics exams were compared with their Unified State Exam scores in physics. There is a clear parallel between the Unified State Exam scores, entrance testing and “residual knowledge” tests. The authors have also analyzed the results of an anonymous survey of the same groups of students, which makes it possible to understand their point of view on the problems of learning difficulties. The main reasons for student attrition are both the applicants’ low Unified State Exam scores and the insufficient actions and ineffective strategies undertaken by the university to involve undergraduate students in its educational environment.


2020 ◽  
Vol 17 (3) ◽  
pp. 365-375
Author(s):  
Vasyl Kovalishyn ◽  
Diana Hodyna ◽  
Vitaliy O. Sinenko ◽  
Volodymyr Blagodatny ◽  
Ivan Semenyuta ◽  
...  

Background: Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) bacteria. One of the main causes of mortality from TB is the problem of Mtb resistance to known drugs. Objective: The goal of this work is to identify potent small molecule anti-TB agents by machine learning, synthesis and biological evaluation. Methods: The On-line Chemical Database and Modeling Environment (OCHEM) was used to build predictive machine learning models. Seven compounds were synthesized and tested in vitro for their antitubercular activity against H37Rv and resistant Mtb strains. Results: A set of predictive models was built with OCHEM based on a set of previously synthesized isoniazid (INH) derivatives containing a thiazole core and tested against Mtb. The predictive ability of the models was tested by 5-fold cross-validation, which resulted in balanced accuracies (BA) of 61–78% for the binary classifiers. Test set validation showed that the models could be instrumental in predicting anti-TB activity with reasonable accuracy (BA = 67–79%) within the applicability domain. Seven designed compounds were synthesized and demonstrated activity against both the H37Rv and multidrug-resistant (MDR) Mtb strains resistant to rifampicin and isoniazid. According to the acute toxicity evaluation in Daphnia magna neonates, six compounds were classified as moderately toxic (LD50 in the range of 10−100 mg/L) and one as practically harmless (LD50 in the range of 100−1000 mg/L). Conclusion: The newly identified compounds may represent a starting point for further development of therapies against Mtb. The developed models are available online at OCHEM http://ochem.eu/article/111066 and can be used to virtually screen for potential compounds with anti-TB activity.
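For readers unfamiliar with the balanced accuracy (BA) metric reported above, the sketch below computes it for a binary classifier as the mean of sensitivity and specificity, which is the usual definition. The function name and the toy labels are illustrative assumptions; this is not code from OCHEM or from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of actives correctly predicted
    specificity = tn / (tn + fp)  # fraction of inactives correctly predicted
    return (sensitivity + specificity) / 2


# Toy example: 4 active and 2 inactive compounds (made-up labels).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)
print(ba)  # sensitivity 3/4, specificity 1/2 -> 0.625
```

Unlike plain accuracy, this score is not inflated when one class dominates the data set, which is presumably why it is reported for activity classifiers.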


2021 ◽  
Vol 11 (15) ◽  
pp. 6787
Author(s):  
Jože M. Rožanec ◽  
Blaž Kažič ◽  
Maja Škrjanc ◽  
Blaž Fortuna ◽  
Dunja Mladenić

Demand forecasting is a crucial component of demand management, directly impacting manufacturing companies’ planning, revenues, and actors through the supply chain. We evaluate 21 baseline, statistical, and machine learning algorithms to forecast smooth and erratic demand on a real-world use case scenario. The products’ data were obtained from a European original equipment manufacturer targeting the global automotive industry market. Our research shows that global machine learning models outperform local models. We show that forecast errors from global models can be constrained by pooling product data based on the past demand magnitude. We also propose a set of metrics and criteria for a comprehensive understanding of demand forecasting models’ performance.


2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides the best results for MAE.
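The three evaluation metrics named above have standard definitions, sketched below in plain Python. The function names and the toy AQI values are assumptions for illustration only and are not taken from the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted AQI values.
observed = [50, 100, 150]
predicted = [60, 90, 150]
print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE does, so one model can rank best on RMSE and R² while another ranks best on MAE.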


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sarv Priya ◽  
Tanya Aggarwal ◽  
Caitlin Ward ◽  
Girish Bathla ◽  
Mathews Jacob ◽  
...  

Side experiments are performed on radiomics models to improve their reproducibility. We measure the impact of myocardial masks, radiomic side experiments and the data augmentation for information transfer (DAFIT) approach on differentiating patients with and without pulmonary hypertension (PH) using cardiac MRI (CMRI) derived radiomics. Feature extraction was performed from the left ventricle (LV) and right ventricle (RV) myocardial masks using CMRI in 82 patients (42 PH and 40 controls). Various side study experiments were evaluated: original data without and with intraclass correlation (ICC) feature-filtering, and the DAFIT approach (without and with ICC feature-filtering). Multiple machine learning and feature selection strategies were evaluated. Primary analysis included all PH patients, with subgroup analysis including PH patients with preserved LVEF (≥ 50%). For both primary and subgroup analysis, the DAFIT approach without feature-filtering was the highest performer (AUC 0.957–0.958). ICC approaches showed poor performance compared to the DAFIT approach. The performance of combined LV and RV masks was superior to that of individual masks alone. There was variation in top performing models across all approaches (AUC 0.862–0.958). The DAFIT approach with features from combined LV and RV masks provides superior performance, whereas feature-filtering approaches perform poorly. Model performance varies with the combination of feature selection method and model.


2020 ◽  
Vol 48 (10) ◽  
pp. 030006052095880
Author(s):  
Jianping Wu ◽  
Sulai Liu ◽  
Xiaoming Chen ◽  
Hongfei Xu ◽  
Yaoping Tang

Objective: Colorectal cancer (CRC) is the most common cancer worldwide. Patient outcomes following recurrence of CRC are very poor. Therefore, identifying the risk of CRC recurrence at an early stage would improve patient care. Accumulating evidence shows that autophagy plays an active role in tumorigenesis, recurrence, and metastasis. Methods: We used machine learning algorithms and two regression models, univariable Cox proportional hazards and least absolute shrinkage and selection operator (LASSO), to identify 26 autophagy-related genes (ARGs) related to CRC recurrence. Results: By functional annotation, these ARGs were shown to be enriched in necroptosis and apoptosis pathways. Protein–protein interactions identified SQSTM1, CASP8, HSP80AB1, FADD, and MAPK9 as core genes in CRC autophagy. Of the 26 ARGs, BAX and PARP1 were regarded as having the most significant predictive ability for CRC recurrence, with a prediction accuracy of 71.1%. Conclusion: These results shed light on the prediction of CRC recurrence by ARGs. Stratification of patients into recurrence risk groups by testing ARGs would be a valuable tool for early detection of CRC recurrence.

