Predicting at-risk university students based on their e-book reading behaviours by using machine learning classifiers

Author(s):  
Cheng-Huan Chen ◽  
Stephen J. H. Yang ◽  
Jian-Xuan Weng ◽  
Hiroaki Ogata ◽  
Chien-Yuan Su

Providing early predictions of academic performance is necessary for identifying at-risk students and subsequently providing them with timely intervention for critical factors affecting their academic performance. Although e-book systems are often used to provide students with teaching/learning materials in university courses, research has seldom made such early predictions from students’ online reading behaviours by implementing machine learning classifiers. This study explored to what extent university students’ academic achievement can be predicted, based on their reading behaviours in an e-book supported course, using such classifiers. It further investigated which of the features extracted from the reading logs influence the predictions. The participants were 100 first-year undergraduates enrolled in a compulsory course at a university in Taiwan. The results suggest that logistic regression, support vector classification, decision trees, random forests, and neural networks achieved moderate prediction performance on the accuracy, precision, and recall metrics, while the Bayes classifier identified almost all at-risk students. Additionally, the student online reading behaviours affecting the prediction models included turning pages, going back to previous pages and jumping to other pages, adding/deleting markers, and editing/removing memos. These behaviours were significantly positively correlated with academic achievement and should be encouraged during courses supported by e-books. Implications for practice or policy: For identifying at-risk students, educators could prioritise using Gaussian naïve Bayes in an e-book supported course, as it shows almost perfect recall performance. Assessors could give priority to logistic regression and neural networks in this context because they have stable achievement prediction performance across different evaluation metrics. The prediction models are strongly affected by student online reading behaviours, in particular by locating/returning to relevant pages and modifying markers.
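
As an illustration of the comparison above, the following sketch trains the same family of scikit-learn classifiers on reading-log counts and reports cross-validated accuracy, precision, and recall. The feature matrix and at-risk labels are randomly generated placeholders, not the study's data.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

# X: one row per student, columns = counts of reading actions
# (page turns, back-jumps, marker edits, memo edits, ...); placeholders here.
rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(100, 6))   # 100 students, as in the study
y = rng.integers(0, 2, size=100)      # 1 = at-risk (derived from final grades)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svc": SVC(),
    "tree": DecisionTreeClassifier(),
    "forest": RandomForestClassifier(n_estimators=200),
    "mlp": MLPClassifier(max_iter=2000),
    "gnb": GaussianNB(),              # near-perfect recall in the abstract
}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5,
                            scoring=("accuracy", "precision", "recall"))
    print(name, {k: round(v.mean(), 2) for k, v in scores.items()
                 if k.startswith("test_")})
```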

2021 ◽  
Vol 13 (22) ◽  
pp. 12461
Author(s):  
Chih-Chang Yu ◽  
Yufeng (Leon) Wu

While deep neural networks are popular for predicting students’ learning outcomes, convolutional neural network (CNN)-based methods are used more often. Such methods require numerous features, training data, or multiple models to achieve week-by-week predictions. However, many current learning management systems (LMSs) operated by colleges cannot provide adequate information. To make the system more feasible, this article proposes a recurrent neural network (RNN)-based framework to identify at-risk students who might fail the course using only a few common learning features. RNN-based methods can be more effective than CNN-based methods in identifying at-risk students due to their ability to memorize time-series features. The data used in this study were collected from an online course that teaches artificial intelligence (AI) at a university in northern Taiwan. Common features, such as the number of logins, number of posts, and number of homework assignments submitted, are used to train the model. This study compares the prediction results of the RNN model with the following conventional machine learning models: logistic regression, support vector machines, decision trees and random forests. This work also compares the performance of the RNN model with two neural network-based models: the multi-layer perceptron (MLP) and a CNN-based model. The experimental results demonstrate that the RNN model used in this study is better than the conventional machine learning models and the MLP in terms of F-score, while achieving performance similar to the CNN-based model with fewer parameters. Our study shows that the designed RNN model can identify at-risk students once one-third of the semester has passed. Some future directions are also discussed.
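
The sketch below shows a minimal RNN of the kind described, assuming TensorFlow/Keras; the GRU layer, layer sizes, and randomly generated sequences are illustrative stand-ins, not the authors' architecture or data.

```python
import numpy as np
import tensorflow as tf

# One sequence per student: weekly counts of logins, posts, and homework
# submissions (the common LMS features named in the abstract).
n_students, n_weeks, n_feats = 500, 6, 3   # ~6 weeks = one-third of a semester
X = np.random.rand(n_students, n_weeks, n_feats).astype("float32")  # placeholder
y = np.random.randint(0, 2, size=n_students)                        # 1 = at-risk

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_weeks, n_feats)),
    tf.keras.layers.GRU(16),                        # memorizes time-series patterns
    tf.keras.layers.Dense(1, activation="sigmoid")  # P(at-risk)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```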


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower at subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
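
A hedged sketch of this approach: an out-of-bag-scored random forest ranked by feature importance, with predictor names echoing the abstract's most important predictors. The data are placeholders, not the trial's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical baseline predictors; names follow the abstract, values do not.
feature_names = ["depressive_symptoms", "treatment_credibility",
                 "working_alliance", "baseline_bdd_severity"]
rng = np.random.default_rng(1)
X = rng.normal(size=(88, len(feature_names)))   # 88 participants, as in the study
y = rng.integers(0, 2, size=88)                 # 1 = remitter at post-treatment

clf = RandomForestClassifier(n_estimators=1000, oob_score=True, random_state=1)
clf.fit(X, y)
print(f"out-of-bag accuracy: {clf.oob_score_:.2f}")
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{name}: {imp:.3f}")   # importance ranking, as reported in the paper
```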


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance, and we attempted to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop the models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracy was 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and type of stroke at the prehospital stage thus showed superior predictive ability.
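
The following sketch, assuming scikit-learn and the xgboost package, mirrors the three-algorithm comparison with a one-vs-rest AUC for LVO, the headline metric above. The prehospital feature matrix and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier   # assumes the xgboost package is installed

classes = ["LVO", "ICH", "SAH", "CI", "other"]
rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 20))                 # placeholder prehospital items
y = rng.integers(0, len(classes), size=3000)    # placeholder stroke types

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=2)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=300),
    "xgboost": XGBClassifier(n_estimators=300),
}
lvo = classes.index("LVO")
for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)
    # one-vs-rest AUC for the LVO class, as reported in the abstract
    auc = roc_auc_score((y_te == lvo).astype(int), proba[:, lvo])
    print(f"{name}: LVO AUC = {auc:.2f}")
```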


2018 ◽  
Vol 8 (1) ◽  
pp. 16 ◽  
Author(s):  
Irina Matijosaitiene ◽  
Peng Zhao ◽  
Sylvain Jaume ◽  
Joseph Gilkey Jr

Predicting the exact urban places where crime is most likely to occur is of great interest to police departments. Therefore, the goal of the research presented in this paper is to identify specific urban areas where a crime could happen in Manhattan, NY for every hour of a day. The outputs from this research are the following: (i) the land uses that generate the top three most committed crimes in Manhattan, predicted using machine learning (random forest and logistic regression); (ii) the exact hours when most assaults are committed, together with hot spots during these hours, identified by applying time series and hot spot analysis; (iii) hourly prediction models for assaults based on land use, built by deploying logistic regression. Assault, as a physical attack on someone, according to criminal law, is identified as the third most committed crime in Manhattan. Land use (residential, commercial, recreational, mixed use etc.) is assigned to every area or lot in Manhattan, determining the actual use or activities within each particular lot. While plotting assaults on the map for every hour, this investigation identified that the hot spots where assaults occur were ‘moving’ and not confined to specific lots within Manhattan. This raises a number of questions: Why are hot spots of assaults not static in an urban environment? What makes them ‘move’: is it a particular urban pattern? Is the ‘movement’ of hot spots related to human activities during the day and night? Answering these questions helps to build the initial frame for assault prediction within every hour of a day. Knowing a specific land use’s vulnerability to assault during each exact hour can assist police departments in allocating forces during those hours in risky areas. For the analysis, the study uses two datasets: a crime dataset with geographical locations of crime, date and time, and a geographic dataset of land uses with land use codes for every lot, each obtained from open databases. The study joins the two datasets based on spatial location and classifies the data into 24 classes based on the time range when the assault occurred. Machine learning methods reveal the effect of land uses on larceny, harassment and assault, the three most committed crimes in Manhattan. Finally, logistic regression provides the hourly prediction models and unveils the types of land use where assaults could occur during each hour of day and night.
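
A minimal sketch of the hourly modelling idea: one logistic regression per hour of day, predicting assault occurrence from one-hot land-use codes. The joined (lot, hour) table here is a synthetic placeholder for the spatially joined open data described above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical joined dataset: one row per (lot, hour) with the lot's land-use
# code and whether an assault occurred there in that hour.
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "hour": rng.integers(0, 24, size=5000),
    "land_use": rng.choice(["residential", "commercial",
                            "recreational", "mixed"], size=5000),
    "assault": rng.integers(0, 2, size=5000),
})
X_all = pd.get_dummies(df["land_use"])   # one-hot land-use codes

# One model per hour, mirroring the 24 time-range classes in the paper
hourly_models = {}
for hour, idx in df.groupby("hour").groups.items():
    model = LogisticRegression(max_iter=1000)
    model.fit(X_all.loc[idx], df.loc[idx, "assault"])
    hourly_models[hour] = model

# Land-use coefficients (log-odds of assault) at 22:00 under this toy model
print(dict(zip(X_all.columns, hourly_models[22].coef_[0].round(2))))
```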


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results During the study period, 82,404 burn patients with a thermal etiology were identified for the analysis. The ANN models will likely tend to overfit the data, which can be resolved by ending the model training early or adding additional regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available. Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay, as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
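
As a sketch of the planned comparison, the code below fits a random forest and an ANN to both outcomes with a 67%/33% split, assuming scikit-learn; early stopping and an L2 penalty illustrate the fixes the abstract mentions for curbing ANN overfitting. All data are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 15))          # placeholder burn-registry features
died = rng.integers(0, 2, size=2000)     # mortality outcome (classification)
los = rng.gamma(2.0, 10.0, size=2000)    # length of stay in days (regression)

# 67%/33% train/validation split, as in the study design
X_tr, X_te, d_tr, d_te, l_tr, l_te = train_test_split(
    X, died, los, test_size=0.33, random_state=4)

rf_mort = RandomForestClassifier(n_estimators=500).fit(X_tr, d_tr)
rf_los = RandomForestRegressor(n_estimators=500).fit(X_tr, l_tr)

# early_stopping and the L2 penalty (alpha) address the overfitting the
# abstract anticipates for the ANN models
ann_mort = MLPClassifier(early_stopping=True, alpha=1e-3,
                         max_iter=1000).fit(X_tr, d_tr)
ann_los = MLPRegressor(early_stopping=True, alpha=1e-3,
                       max_iter=1000).fit(X_tr, l_tr)

print("mortality accuracy:", rf_mort.score(X_te, d_te), ann_mort.score(X_te, d_te))
print("length-of-stay R^2:", rf_los.score(X_te, l_te), ann_los.score(X_te, l_te))
```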


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%; GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved by combining multiple algorithms in an ensemble model.
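
A sketch of the stacking idea, assuming scikit-learn: base learners are combined through a meta-level logistic regression (GLM). The base-learner set shown and the synthetic data are illustrative, not the study's full ten-algorithm comparison.

```python
import numpy as np
from sklearn.ensemble import (StackingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(2084, 39))     # 39 predictors, as in the study
y = rng.integers(0, 2, size=2084)   # 1 = hospitalized within 12 months

# Out-of-fold predictions from the base learners feed a meta-level GLM
stack = StackingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier()),   # best single model above
        ("rf", RandomForestClassifier(n_estimators=300)),
        ("knn", KNeighborsClassifier()),         # weakest single model above
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
auc = cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean()
print(f"ensemble AUC: {auc:.3f}")
```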


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 200
Author(s):  
Youssef Zizi ◽  
Amine Jamali-Alaoui ◽  
Badreddine El Goumi ◽  
Mohamed Oudgou ◽  
Abdeslam El Moudden

In the face of rising defaults and limited studies on the prediction of financial distress in Morocco, this article aims to determine the most relevant predictors of financial distress and to identify its optimal prediction models, in a normal Moroccan economic context, over a two-year horizon. To achieve these objectives, logistic regression and neural networks are used, based on financial ratios selected by lasso and stepwise techniques. Our empirical results highlight the significant role of two predictors, namely interest to sales and return on assets, in predicting financial distress. The results show that the logistic regression models obtained by stepwise selection outperform the other models, with an overall accuracy of 93.33% two years before financial distress and 95.00% one year prior to financial distress. The results also show that our models classify distressed SMEs better than healthy SMEs, with type I errors lower than type II errors.
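
The pipeline below sketches lasso-based ratio selection feeding a logistic regression, with type I/II error rates computed from the confusion matrix; the financial ratios and labels are synthetic, and the L1 step stands in for the paper's lasso/stepwise selection.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 12))      # placeholder financial ratios
y = rng.integers(0, 2, size=400)    # 1 = distressed SME

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=6)

# Lasso (L1) feature selection, then a plain logistic regression
model = make_pipeline(
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear")),
    LogisticRegression(max_iter=1000),
)
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"type I error (distressed classified healthy): {fn / (fn + tp):.2%}")
print(f"type II error (healthy classified distressed): {fp / (fp + tn):.2%}")
```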


2020 ◽  
Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background The purpose of this study is to identify the risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods The data of ACS patients who entered the emergency department of Fujian Provincial Hospital with chest pain from January 1, 2017 to March 31, 2020 were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and the sensitivity, specificity, and receiver operating characteristic curve were used to evaluate the performance of each model. Results A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), high-density lipoprotein (HDL) cholesterol, and calcium channel blockers were independent predictors of in-hospital mortality. The study found that the areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing the most to the prediction performance of the GBDT and random forest models. Conclusions The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
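
A sketch of the multivariate logistic regression step, assuming statsmodels; the predictor columns echo the abstract's independent risk factors, but the values are placeholders and the fitted odds ratios here are meaningless.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical subset of the reported predictors; values are synthetic.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "age": rng.normal(65, 12, size=1000),
    "nt_probnp": rng.lognormal(6, 1, size=1000),
    "ldh": rng.normal(220, 50, size=1000),
    "hdl": rng.normal(1.2, 0.3, size=1000),
})
death = rng.integers(0, 2, size=1000)   # 1 = in-hospital death

# Multivariate logistic regression: coefficients, p-values, odds ratios
fit = sm.Logit(death, sm.add_constant(df)).fit(disp=0)
print(fit.summary())
print(np.exp(fit.params))   # odds ratio per unit increase in each predictor
```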


2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting the survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and the lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62,294 patients from the United States, with 97 predictors selected on clinical/statistical grounds from more than 600, to predict survival after transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is given to the advantages and pitfalls of each method and to the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score), and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables.
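
A minimal sketch comparing a Cox model with a random survival forest on the C-index, assuming the scikit-survival package; covariates, times, and events are synthetic placeholders, and this omits the Brier-score analysis and PLANN models described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest      # assumes scikit-survival
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 10))                # placeholder donor/recipient covariates
time = rng.exponential(5.0, size=1000)         # years to graft failure or death
event = rng.integers(0, 2, size=1000).astype(bool)   # True = event observed
y = Surv.from_arrays(event=event, time=time)   # structured survival outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=8)

cox = CoxPHSurvivalAnalysis().fit(X_tr, y_tr)
rsf = RandomSurvivalForest(n_estimators=200, random_state=8).fit(X_tr, y_tr)

# .score returns Harrell's C-index, the discrimination measure used above
print(f"Cox C-index: {cox.score(X_te, y_te):.3f}")
print(f"RSF C-index: {rsf.score(X_te, y_te):.3f}")
```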


2020 ◽  
Author(s):  
Nan Liu ◽  
Marcel Lucas Chee ◽  
Zhi Xiong Koh ◽  
Su Li Leow ◽  
Andrew Fu Wah Ho ◽  
...  

Abstract Background: Chest pain is among the most common presenting complaints in the emergency department (ED). Swift and accurate risk stratification of chest pain patients in the ED may improve patient outcomes and reduce unnecessary costs. Traditional logistic regression with stepwise variable selection has been used to build risk prediction models for ED chest pain patients. In this study, we aimed to investigate whether machine learning dimensionality reduction methods can achieve superior performance to the stepwise approach in deriving risk stratification models. Methods: A retrospective analysis was conducted on the data of patients >20 years old who presented to the ED of Singapore General Hospital with chest pain between September 2010 and July 2015. Variables used included demographics, medical history, laboratory findings, heart rate variability (HRV), and HRnV parameters calculated from five- to six-minute electrocardiograms (ECGs). The primary outcome was 30-day major adverse cardiac events (MACE), which included death, acute myocardial infarction, and revascularization. Candidate variables identified using univariable analysis were then used to generate the stepwise logistic regression model and eight machine learning dimensionality reduction prediction models. A separate set of models was derived by excluding troponin. Receiver operating characteristic (ROC) and calibration analyses were used to compare model performance. Results: 795 patients were included in the analysis, of whom 247 (31%) met the primary outcome of 30-day MACE. Patients with MACE were older and more likely to be male. All eight dimensionality reduction methods marginally but non-significantly outperformed stepwise variable selection; the multidimensional scaling algorithm performed best, with an area under the curve (AUC) of 0.901. All HRnV-based models generated in this study outperformed several existing clinical scores in ROC analysis. Conclusions: HRnV-based models using stepwise logistic regression performed better than existing chest pain scores for predicting MACE, with only marginal improvements from machine learning dimensionality reduction. Moreover, the traditional stepwise approach benefits from model transparency and interpretability; in comparison, machine learning dimensionality reduction models are black boxes, making them difficult to explain in clinical practice.
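
The sketch below chains a dimensionality reduction step into a logistic regression pipeline, assuming scikit-learn. PCA stands in here because sklearn's MDS, the study's best performer, has no out-of-sample transform and so cannot be dropped into a pipeline this way; the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
X = rng.normal(size=(795, 60))      # placeholder HRV/HRnV + clinical variables
y = rng.integers(0, 2, size=795)    # 1 = 30-day MACE

# Reduce the candidate variables to a handful of components, then classify;
# this replaces stepwise variable selection with a learned projection.
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      LogisticRegression(max_iter=1000))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.3f}")
```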

