Review on Predictive Modelling Techniques for Identifying Students at Risk in University Environment

2019 ◽  
Vol 255 ◽  
pp. 03002
Author(s):  
Mat Yaacob Nik Nurul Hafzan ◽  
Deris Safaai ◽  
Mat Asiah ◽  
Mohamad Mohd Saberi ◽  
Safaai Siti Syuhaida

Predictive analytics encompasses statistical techniques, predictive modelling, machine learning, and data mining that analyse current and historical facts to make predictions about future or otherwise unknown events. Higher education institutions are under increasing pressure to respond to national and global economic, political and social changes, such as the growing need to increase the proportion of students in certain disciplines, to embed workplace graduate attributes, and to ensure that the quality of learning programs is both nationally and globally relevant. However, in higher education institutions, a significant number of students stop their studies before graduation, especially undergraduates. Problems related to students who stop out, graduate late, or do not graduate can be mitigated by applying analytics. Using analytics, administrators, instructors and students can predict what is likely to happen in the future. Administrators and instructors can decide on suitable intervention programs for at-risk students before those students decide to leave their studies. Many different machine learning techniques have been implemented for predictive modelling in the past, including decision tree, k-nearest neighbour, random forest, neural network, support vector machine, naïve Bayesian and a few others. A few attempts have been made to use Bayesian networks and dynamic Bayesian networks as modelling techniques for predicting at-risk students, but a few challenges remain to be resolved. The motivation for using a dynamic Bayesian network is that it is robust to incomplete data and provides opportunities for handling changing and dynamic environments. The trends and directions of research on predicting and identifying at-risk students are towards prediction models that alert administrators as early as possible, that handle dynamic and changing environments, and that provide real-time prediction.

Author(s):  
Garima Jaiswal ◽  
Arun Sharma ◽  
Reeti Sarup

Machine learning aims to give computers the ability to automatically learn from data. It can enable computers to make intelligent decisions by recognizing complex patterns in data. Through data mining, huge amounts of data can be explored and analyzed to extract useful information and find interesting patterns. Classification, a supervised learning technique, can be beneficial in predicting class labels for test data by referring to the already labeled classes in the available training data set. In this chapter, educational data mining techniques are applied to a student dataset to analyze the multifarious factors causing an alarmingly high number of dropouts. This work focuses on predicting students at risk of dropping out using five classification algorithms, namely, K-NN, naive Bayes, decision tree, random forest, and support vector machine. This can assist in improving pedagogical practices in order to enhance the performance of students predicted to be at risk of dropping out, thus reducing dropout rates in higher education.


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address any issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (response variable) at the time of collection. The confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better in predicting who will leave the firm than in predicting who will not leave the company.


Author(s):  
Afizan Azman ◽  
Mohd. Fikri Azli Abdullah ◽  
Sumendra Yogarayan ◽  
Siti Fatimah Abdul Razak ◽  
Hartini Azman ◽  
...  

Cognitive distraction is one of several contributory factors in road accidents. A number of cognitive distraction detection methods have been developed. One of the most popular methods is based on physiological measurement. Head orientation, gaze rotation, blinking and pupil diameter are among the popular physiological parameters that are measured for driver cognitive distraction. In this paper, lips and eyebrows are studied. These new features of human facial expression are obvious and can be easily measured when a person is cognitively distracted. There are several types of lip and eyebrow movement that can be captured to indicate cognitive distraction. Correlation and classification techniques are used in this paper for performance measurement and comparison. A real-time driving experiment was set up, and faceAPI was installed in the car to capture the driver’s facial expression. Linear regression, support vector machine (SVM), static Bayesian network (SBN) and logistic regression (LR) are used in this study. Results showed that lips and eyebrows are strongly correlated and have a significant role in improving cognitive distraction detection. A dynamic Bayesian network (DBN) with different confidence levels was also used in this study to classify whether a driver is distracted or not.


2020 ◽  
Vol 9 (2) ◽  
pp. 343 ◽  
Author(s):  
Arash Kia ◽  
Prem Timsina ◽  
Himanshu N. Joshi ◽  
Eyal Klang ◽  
Rohit R. Gupta ◽  
...  

Early detection of patients at risk for clinical deterioration is crucial for timely intervention. Traditional detection systems rely on a limited set of variables and are unable to predict the time of decline. We describe a machine learning model called MEWS++ that enables the identification of patients at risk of escalation of care or death six hours prior to the event. A retrospective single-center cohort study was conducted from July 2011 to July 2017 of adult (age > 18) inpatients excluding psychiatric, parturient, and hospice patients. Three machine learning models were trained and tested: random forest (RF), linear support vector machine, and logistic regression. We compared the models’ performance to the traditional Modified Early Warning Score (MEWS) using sensitivity, specificity, and Area Under the Curve for Receiver Operating Characteristic (AUC-ROC) and Precision-Recall curves (AUC-PR). The primary outcome was escalation of care from a floor bed to an intensive care or step-down unit, or death, within 6 h. A total of 96,645 patients with 157,984 hospital encounters and 244,343 bed movements were included. Overall rate of escalation or death was 3.4%. The RF model had the best performance with sensitivity 81.6%, specificity 75.5%, AUC-ROC of 0.85, and AUC-PR of 0.37. Compared to traditional MEWS, sensitivity increased 37%, specificity increased 11%, and AUC-ROC increased 14%. This study found that using machine learning and readily available clinical data, clinical deterioration or death can be predicted 6 h prior to the event. The model we developed can warn of patient deterioration hours before the event, thus helping make timely clinical decisions.
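For readers unfamiliar with the two headline metrics in the abstract above, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the MEWS++ study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true events that were flagged.
    specificity = TN / (TN + FP): share of non-events correctly left unflagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts (an assumption, not data from the study).
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=750, fp=250)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With a rare outcome, such as the 3.4% event rate reported above, these two figures can look strong while precision stays low, which is one reason the abstract also reports AUC-PR.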


2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns in interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.


Author(s):  
Helper Zhou ◽  
Victor Gumbo

The emergence of machine learning algorithms presents the opportunity for a variety of stakeholders to perform advanced predictive analytics and to make informed decisions. However, to date there have been few studies in developing countries that evaluate the performance of such algorithms—with the result that pertinent stakeholders lack an informed basis for selecting appropriate techniques for modelling tasks. This study aims to address this gap by evaluating the performance of three machine learning techniques: ordinary least squares (OLS), least absolute shrinkage and selection operator (LASSO), and artificial neural networks (ANNs). These techniques are evaluated in respect of their ability to perform predictive modelling of the sales performance of small, medium and micro enterprises (SMMEs) engaged in manufacturing. The evaluation finds that the ANNs algorithm’s performance is far superior to that of the other two techniques, OLS and LASSO, in predicting the SMMEs’ sales performance.


2019 ◽  
Vol 8 (2) ◽  
pp. 1211-1216

Healthcare is a major sector where there is demand for predictive analytics using machine learning. Healthcare benefits greatly when useful knowledge can be turned into timely action to manage hazardous situations in the medical sector. Chronic kidney disease is a life-threatening disease that can be prevented with timely, accurate predictions and appropriate precautionary measures. In this paper, various machine learning classifiers are applied to a medical dataset to develop a prediction model that tells whether a person's present medical condition can lead to the chronic stage of the disease in future. Higher prediction accuracy and decreased build time are obtained with a reduced feature set by applying the Best First and Greedy Stepwise algorithms combined with different classification techniques such as Naive Bayes, support vector machine (SVM), J48, Random Forest, and K Nearest Neighbor (KNN).


2021 ◽  
Author(s):  
Steven F. Lehrer ◽  
Tian Xie

There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess if the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and value from hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, although both least squares support vector regression and recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometrics approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes. This paper was accepted by J. George Shanthikumar, big data analytics.


Author(s):  
Nick Dix ◽  
Andrew Lail ◽  
Matt Birnbaum ◽  
Joseph Paris

Institutions of higher education often use the term “at-risk” to label undergraduate students who have a higher likelihood of not persisting. However, it is not clear how the use of this label impacts the perspectives of the higher education professionals who serve and support these students. Our qualitative study explores the descriptions and understandings of higher education professionals who serve and support at-risk students. We use thematic analysis (Braun & Clark, 2006) to interpret our data and develop our themes. These themes include conflicting views of the “at-risk” definition, attempts to normalize at-risk, fostering relationships, and “at-promise.”


2019 ◽  
Author(s):  
Safwan Wshah ◽  
Christian Skalka ◽  
Matthew Price

BACKGROUND A majority of adults in the United States are exposed to a potentially traumatic event but only a handful go on to develop impairing mental health conditions such as posttraumatic stress disorder (PTSD). OBJECTIVE Identifying those at elevated risk shortly after trauma exposure is a clinical challenge. The aim of this study was to develop computational methods to more effectively identify at-risk patients and, thereby, support better early interventions. METHODS We proposed machine learning (ML) induction of models to automatically predict elevated PTSD symptoms in patients 1 month after a trauma, using self-reported symptoms from data collected via smartphones. RESULTS We show that an ensemble model accurately predicts elevated PTSD symptoms, with an area under the curve (AUC) of .85, using a bag of support vector machines, naive Bayes, logistic regression, and random forest algorithms. Furthermore, we show that only 7 self-reported items (features) are needed to obtain this AUC. Most importantly, we show that accurate predictions can be made 10 to 20 days posttrauma. CONCLUSIONS These results suggest that simple smartphone-based patient surveys, coupled with automated analysis using ML-trained models, can identify those at risk for developing elevated PTSD symptoms and thus target them for early intervention.

