Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.
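
As a rough illustration of what "interaction counts" could look like as model input, the sketch below turns a raw activity log into one count vector per student. The log layout, the action names and the `interaction_counts` helper are all assumptions made for this example; the abstract does not describe the actual Moodle export or feature set.

```python
from collections import Counter

# Hypothetical activity log: (student_id, action) pairs. The action names
# are illustrative and are not taken from the paper or from Moodle.
LOG = [
    ("s1", "forum_view"), ("s1", "quiz_attempt"), ("s1", "forum_view"),
    ("s2", "forum_view"),
]


def interaction_counts(log, actions):
    """Return {student_id: [count of each action, in the order of `actions`]}."""
    per_student = {}
    for student, action in log:
        per_student.setdefault(student, Counter())[action] += 1
    # A Counter returns 0 for actions a student never performed.
    return {s: [c[a] for a in actions] for s, c in per_student.items()}


features = interaction_counts(LOG, ["forum_view", "quiz_attempt"])
```

Each resulting vector could then be passed to any classifier; which counts are informative would have to be determined on real course data.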

Early Warning System for Online STEM Learning—A Slimmer Approach Using Recurrent Neural Networks

Sustainability, doi:10.3390/su132212461, 2021, Vol 13 (22), pp. 12461

Author(s): Chih-Chang Yu, Yufeng (Leon) Wu

Keyword(s): Neural Network, Machine Learning, Neural Networks, At Risk, At Risk Students, Training Data, Support Vector, Learning Models, Conventional Machine, Machine Learning Models

While the use of deep neural networks is popular for predicting students’ learning outcomes, convolutional neural network (CNN)-based methods are used more often. Such methods require numerous features, training data, or multiple models to achieve week-by-week predictions. However, many current learning management systems (LMSs) operated by colleges cannot provide adequate information. To make the system more feasible, this article proposes a recurrent neural network (RNN)-based framework to identify at-risk students who might fail the course using only a few common learning features. RNN-based methods can be more effective than CNN-based methods in identifying at-risk students due to their ability to memorize time-series features. The data used in this study were collected from an online course that teaches artificial intelligence (AI) at a university in northern Taiwan. Common features, such as the number of logins, number of posts and number of homework assignments submitted, are considered to train the model. This study compares the prediction results of the RNN model with the following conventional machine learning models: logistic regression, support vector machines, decision trees and random forests. This work also compares the performance of the RNN model with two neural network-based models: the multi-layer perceptron (MLP) and a CNN-based model. The experimental results demonstrate that the RNN model used in this study is better than conventional machine learning models and the MLP in terms of F-score, while achieving similar performance to the CNN-based model with fewer parameters. Our study shows that the designed RNN model can identify at-risk students once one-third of the semester has passed. Some future directions are also discussed.

Daily Cryptocurrency Returns Forecasting and Trading via Machine Learning

Journal of Student Research, doi:10.47611/jsrhs.v10i4.2217, 2021, Vol 10 (4)

Author(s): Andrew Falcon, Tianshu Lyu

Keyword(s): Machine Learning, Support Vector, Learning Models, Investor Attention, Vector Machines, Price Trends, Sharpe Ratios, Returns Forecasting, Machine Learning Models, Significant Factors

We present a comparative analysis of machine learning models for the time-series forecasting of the sign of next-day cryptocurrency returns. We begin by compiling a proprietary dataset that encompasses a wide array of potential cryptocurrency valuation factors (price trends, liquidity, volatility, network, production, investor attention), subsequently identifying and evaluating the most significant factors. We apply eight machine learning models to the dataset, utilizing them as classifiers to predict the sign of next-day price returns for the three largest cryptocurrencies by market capitalization: bitcoin, ethereum, and ripple. We show that the most significant valuation factors for cryptocurrency returns are price trend variables, specifically seven- and thirty-day reversal. We conclude that support vector machines result in the most accurate classifications for all three cryptocurrencies. Additionally, we find that boosted models like AdaBoost and XGBoost have the poorest classification accuracy. Finally, we construct a probability-based trading strategy that takes either a daily long or short position on one of the three examined cryptocurrencies. The strategy yields a Sharpe ratio of 2.8 and a cumulative log return of 3.72. On average, the strategy’s log returns outperformed standalone investments in all three cryptocurrencies by a factor of 5.64, and its Sharpe ratio outperformed theirs by more than threefold.

Prediction of significant wave height; comparison between nested grid numerical model, and machine learning models of artificial neural networks, extreme learning and support vector machines

Engineering Applications of Computational Fluid Mechanics, doi:10.1080/19942060.2020.1773932, 2020, Vol 14 (1), pp. 805-817

Author(s): Shahaboddin Shamshirband, Amir Mosavi, Timon Rabczuk, Narjes Nabipour, Kwok-wing Chau

Keyword(s): Machine Learning, Neural Networks, Artificial Neural Networks, Support Vector Machines, Numerical Model, Support Vector, Learning Models, Vector Machines, Nested Grid, Machine Learning Models

A Method to Extract Feature Variables Contributed in Nonlinear Machine Learning Prediction

Methods of Information in Medicine, doi:10.1055/s-0040-1701615, 2020, Vol 59 (01), pp. 001-008

Author(s): Mayumi Suzuki, Takuma Shibahara, Yoshihiro Muragaki

Keyword(s): Machine Learning, Prediction Accuracy, Deep Neural Networks, Extraction Technique, Support Vector, Learning Models, Analysis Technique, Vector Machines, Backward Analysis, Machine Learning Models

Background: Although advances in prediction accuracy have been made with new machine learning methods, such as support vector machines and deep neural networks, these methods produce nonlinear machine learning models and thus lack the ability to explain the basis of their predictions. Improving their explanatory capabilities would increase the reliability of their predictions. Objective: Our objective was to develop a factor analysis technique that enables the presentation of the feature variables used in making predictions, even in nonlinear machine learning models. Methods: The factor analysis technique consists of two techniques: a backward analysis technique and a factor extraction technique. The factor extraction technique extracts feature variables from the posterior probability distribution of a machine learning model, which is calculated by the backward analysis technique. Results: In an evaluation using gene expression data from prostate tumor patients and healthy subjects, the prediction accuracy of a deep neural network model was approximately 5% better than that of a support vector machine model. The rate of concordance between the feature variables extracted in an earlier report using Jensen–Shannon divergence and the ones extracted in this report using backward elimination using Hilbert–Schmidt independence criteria was 40% for the top five variables, 40% for the top 10, and 49% for the top 100. Conclusion: The results showed that models can be evaluated from different viewpoints by using different factor extraction techniques. In the future, we hope to use this technique to verify the characteristics of features extracted by the factor extraction technique, and to perform clinical studies using the genes we extracted in this experiment.

Prediction of the Temperature of Liquid Aluminum and the Dissolved Hydrogen Content in Liquid Aluminum with a Machine Learning Approach

Metals, doi:10.3390/met10030330, 2020, Vol 10 (3), pp. 330, Cited By ~ 1

Author(s): Moon-Jo Kim, Jong Pil Yun, Ji-Ba-Reum Yang, Seung-Jun Choi, DongEung Kim

Keyword(s): Machine Learning, Linear Regression, Hydrogen Content, Liquid Aluminum, Support Vector, Learning Models, Data Set, Window Method, Dissolved Hydrogen, Machine Learning Models

In aluminum casting, the temperature of liquid aluminum and the dissolved hydrogen density are crucial factors to be controlled for both quality control of the molten metal and cost efficiency. However, the empirical and numerical approaches to predicting these parameters are quite complex and time consuming, and it is necessary to develop an alternative method for rapid prediction with a small number of experiments. In this study, machine learning models were developed to predict the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum. The experimental data were preprocessed with the sliding time window method before being used to construct the machine learning models. Linear regression, regression tree, Gaussian process regression (GPR), support vector machine (SVM), and ensembles of regression trees were compared to find the model with the highest performance in predicting the target properties. For the prediction of the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum, the linear regression and GPR models, respectively, were selected for their high prediction accuracy. Compared with numerical modeling, machine learning modeling performed better and was more effective at predicting the target property even with a limited data set, provided that the characteristics of the data were properly considered in data preprocessing.
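
A minimal sketch of sliding time window preprocessing is shown below, assuming a single series of readings where each window of consecutive values is paired with the next value as the regression target. The `sliding_windows` helper, the window width and the sample values are hypothetical; the abstract does not state how the windows were defined in the study.

```python
def sliding_windows(series, width):
    """Pair each run of `width` consecutive readings with the reading after it.

    Returns a list of (window, target) tuples suitable as regression samples.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]


# Illustrative readings only; these are not measurements from the paper.
readings = [700.0, 702.5, 701.0, 703.5, 705.0]
samples = sliding_windows(readings, width=2)
```

A wider window gives each sample more history but yields fewer samples, which matters when the data set is small.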

Meta-Signer: Metagenomic Signature Identifier based on Rank Aggregation of Features

doi:10.1101/2020.05.09.085993, 2020

Author(s): Derek Reiman, Ahmed A. Metwally, Jun Sun, Yang Dai

Keyword(s): Machine Learning, Optimization Procedure, Rank Aggregation, Support Vector, Learning Models, Ranking Methods, Vector Machines, User Friendly, Novel Model, Machine Learning Models

Background: The advance of metagenomic studies provides the opportunity to identify microbial taxa that are associated with human diseases. Multiple methods exist for the association analysis. However, the results could be inconsistent, presenting challenges in interpreting the host-microbiome interactions. To address this issue, we introduce Meta-Signer, a novel Metagenomic Signature Identifier tool based on rank aggregation of features identified from multiple machine learning models including Random Forest, Support Vector Machines, LASSO, Multi-Layer Perceptron Neural Networks, and our recently developed Convolutional Neural Network framework (PopPhy-CNN). Meta-Signer generates ranked taxa lists by training individual machine learning models over multiple training partitions and aggregates them into a single ranked list by an optimization procedure to represent the most informative and robust microbial features. Meta-Signer can rank taxa using two input forms of the data: the relative abundances of the original taxa and taxa from the populated taxonomic trees generated from the original taxa. The latter form allows the evaluation of the association of microbial features at different taxonomic levels to the disease, which is attributed to our novel model of PopPhy-CNN. Results: We evaluate Meta-Signer on five different human gut-microbiome datasets. We demonstrate that the features derived from Meta-Signer were more informative compared to those obtained from other available feature ranking methods. The highly ranked features are strongly supported by published literature. Conclusion: Meta-Signer is capable of deriving a robust set of microbial features at multiple taxonomic levels for the prediction of host phenotype. Meta-Signer is user-friendly and customizable, allowing users to explore their datasets quickly and efficiently.
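
To make the idea of rank aggregation concrete, the sketch below merges several ranked lists with a Borda count. This is a deliberately simple stand-in and not the optimization procedure the abstract refers to, whose details are not given here; the `borda_aggregate` helper and the taxon names are hypothetical.

```python
def borda_aggregate(rankings):
    """Merge ranked lists with a Borda count.

    Each list awards n points to its top item, n-1 to the next, and so on,
    where n is that list's length. Items are returned by descending total
    score, with ties broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-model rankings of three taxa (names are placeholders).
rankings = [
    ["taxon_A", "taxon_B", "taxon_C"],
    ["taxon_B", "taxon_A", "taxon_C"],
    ["taxon_A", "taxon_C", "taxon_B"],
]
consensus = borda_aggregate(rankings)
```

Here `taxon_A` scores 8, `taxon_B` 6 and `taxon_C` 4, so the consensus keeps that order even though the individual lists disagree.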

Predicting at-risk students at different percentages of course length for early intervention using machine learning models

IEEE Access, doi:10.1109/access.2021.3049446, 2021, pp. 1-1

Author(s): Muhammad Adnan, Asad Habib, Jawad Ashraf, Shafaq Mussadiq, Arsalan Ali Raza, ...

Keyword(s): Machine Learning, At Risk, Early Intervention, At Risk Students, Learning Models, Course Length, Machine Learning Models

A comparison of machine learning models for predicting rehospitalisation and death after a first hospitalisation with heart failure

European Heart Journal, doi:10.1093/ehjci/ehaa946.0984, 2020, Vol 41 (Supplement_2)

Author(s): Y Jones, N Hillen, J Friday, P Pellicori, S Kean, ...

Keyword(s): Machine Learning, Heart Failure, Haemoglobin Concentration, Hospital Length Of Stay, Support Vector, Health Board, Funding Source, Learning Models, Data Set, Machine Learning Models

Background: Many machine learning models exist, including Multilayer Perceptron (MLP), Random Forest algorithm (RF), Support Vector Machine (SVM), and Gradient Boosted Machine (GBM), but their value for predicting outcome in patients with heart failure has not been compared. Aim: To predict rehospitalisation (all-cause) and death (all-cause) at 1, 3 and 12 months after discharge from a first hospitalisation for heart failure using four machine learning models. Methods: The National Health Service Greater Glasgow and Clyde Health Board serves a population of ∼1.1 million. We obtained de-identified administrative data, including investigations, diagnosis and prescriptions, linked to hospital admissions and deaths for anyone with a diagnosis of vascular disease or heart failure or prescribed loop diuretics, statins or neuro-endocrine antagonists at any time between 1st January 2010 and 1st June 2018. Patients who were under 18 or had no prior hospitalisation for heart failure were excluded. Four ML algorithms using 46 variables were applied. Results: Of 360,000 people who met the above criteria between 2010 and 2018, 6,372 had a hospitalisation for heart failure prior to 1st January 2010 and 8,304 had a first hospitalisation for heart failure thereafter. Between 2010 and 2018 there were 3,086 re-hospitalisations over 24 hours and 3,706 patients died, with 5,070 patients experiencing the composite outcome. GBM and RF consistently outperformed MLP and SVM when comparing AUC, sensitivity and specificity combined, with GBM performing best in all scenarios. Since GBM and RF are both tree-based models, and with SVM and MLP regularly reporting very poor sensitivity or specificity despite a similar AUC to the others, this suggests that SVM and MLP may be suffering from overfitting and might perform better in larger data-sets. Both GBM and RF work by ordering variables, so the final model can be used to determine the most important prediction variables. Age, number of times a blood sample was taken out of hospital, length of stay, social deprivation index and haemoglobin concentration consistently ranked amongst the most important variables. Models predicted all 1-month events better than later events. Conclusions: Some, but not all, ML models applied to this data-set predicted rehospitalisation and death with great accuracy for up to 3 months after a first hospitalisation for heart failure. The models identified several important prognostic variables that are currently seldom collected in clinical research registries but perhaps should be. Funding Acknowledgement: Type of funding source: Public grant(s) – National budget only. Main funding source(s): Medical Research Council

Comparison of Machine Learning Models to Predict Risk of Falling in Osteoporosis Elderly

Foundations of Computing and Decision Sciences, doi:10.2478/fcds-2020-0005, 2020, Vol 45 (2), pp. 66-77

Author(s): German Cuaya-Simbro, Alberto-Isaac Perez-Sanpablo, Angélica Muñoz-Meléndez, Ivett Quiñones Uriostegui, Eduardo-F. Morales-Manzanares, ...

Keyword(s): Machine Learning, Fall Risk, Dynamic Bayesian Networks, Community Dwelling, Support Vector, Learning Models, Risk Of Falling, Gait Parameters, Vector Machines, Machine Learning Models

Falls are a multifactorial cause of injuries for older people. Subjects with osteoporosis are more vulnerable to falls. The focus of this study is to investigate the performance of different machine learning models built on spatiotemporal gait parameters to predict falls, particularly in subjects with osteoporosis. Spatiotemporal gait parameters and prospective registration of falls were obtained from a sample of 110 community-dwelling older women with osteoporosis (age 74.3 ± 6.3) and 143 without osteoporosis (age 68.7 ± 6.8). We built four different models, Support Vector Machines, Neuronal Networks, Decision Trees, and Dynamic Bayesian Networks (DBN), for each specific set of parameters used, and compared them considering their accuracy, precision, recall and F-score in predicting fall risk. The F-score values show that DBN-based models are more effective at predicting fall risk, and the best result is obtained with a DBN model using the mixed variable set (the experts’ variables combined with FSMC’s variables), reaching an accuracy of 80% and a recall of 73%. The results confirm the feasibility of computational methods to complement experts’ knowledge in predicting the risk of falling within a period as long as 12 months.
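
For readers unfamiliar with the four metrics named above, the sketch below computes them from binary confusion-matrix counts using the standard definitions. The `classification_metrics` helper and the example counts are assumptions for illustration and are not figures from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-score (F1) from counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Illustrative counts only: 8 fallers caught, 2 missed, 2 false alarms.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because the F-score is the harmonic mean of precision and recall, it penalizes a model that scores well on one and poorly on the other, which is why it is often preferred over accuracy when the classes are imbalanced.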

Important citations identification by exploiting generative model into discriminative model

Journal of Information Science, doi:10.1177/0165551521991034, 2021, pp. 016555152199103

Author(s): Xin An, Xin Sun, Shuo Xu, Liyuan Hao, Jinghong Li

Keyword(s): Machine Learning, Topic Model, Kernel Functions, Support Vector, Svm Classifier, Learning Models, Data Set, Discriminative Models, Influence Model, Machine Learning Models

Although the citations between scientific documents are deemed a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equal. A plethora of taxonomies and machine-learning models have been implemented to tackle the task of citation function and importance classification from a qualitative aspect. Inspired by the success of kernel functions from resulting general models in promoting the performance of the support vector machine (SVM) model, this work exploits the potential of combining generative and discriminative models for the task of citation importance classification. In more detail, generative features are generated from a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and RF (random forest), and a deep learning model, a convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set from a single discipline. It is quite possible that the patterns for important citations vary by field, which prevents machine-learning models from effectively learning the discriminative patterns from publications in multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance due to the small-scale data set. Furthermore, our CIM-based features further improve the performance for identifying important citations.
