Predicting Student Dropout in Self-Paced MOOC Course Using Random Forest Model

A significant problem in Massive Open Online Courses (MOOCs) is the high rate of student dropout. An effective dropout prediction model can identify the factors responsible and provide insight into how to initiate interventions that increase student success in a MOOC. Different features and various approaches are available for predicting student dropout in MOOCs. This paper considers data derived from a self-paced math course, College Algebra and Problem Solving, offered on the MOOC platform Open edX in partnership with Arizona State University (ASU) from 2016 to 2020. It presents a model that predicts student dropout from a MOOC given a set of features engineered from students' daily learning progress. A Random Forest model, a Machine Learning (ML) technique, is used for the prediction and is evaluated using validation metrics including accuracy, precision, recall, F1-score, Area Under the Curve (AUC), and the Receiver Operating Characteristic (ROC) curve. The model can predict the dropout or continuation of students on any given day in the course with an accuracy of 87.5%, AUC of 94.5%, precision of 88%, recall of 87.5%, and F1-score of 87.5%. The contributing features and their interactions were explained using Shapley values for the model's predictions.
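As a rough illustration of how the threshold-based metrics named in this abstract relate to one another, the following minimal sketch computes accuracy, precision, recall, and F1-score from predicted and true labels. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions made for illustration; they are not taken from the paper, and the numbers do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels: 1 = dropout, 0 = continuation (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

AUC is not covered here because it is computed from predicted scores across all thresholds rather than from hard labels.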

Vaginal Microbiome-Based Bacterial Signatures for Predicting the Severity of Cervical Intraepithelial Neoplasia

Diagnostics, DOI 10.3390/diagnostics10121013, 2020, Vol 10 (12), pp. 1013

Author(s): Yoon Hee Lee, Gi-Ung Kang, Se Young Jeon, Setu Bazie Tagele, Huy Quang Pham, ...

Keyword(s): Random Forest, Bacterial Species, Area Under The Curve, Intraepithelial Neoplasia, Vaginal Swab, Random Forest Model, rRNA Gene, Vaginal Microbiome, Dominant Type, Forest Model

Although emerging evidence has shown that the gut microbiome can serve as a source of biomarkers for predicting and detecting specific cancers or illnesses, it is not yet known whether vaginal microbiome-derived bacterial markers can be used to predict the severity of cervical intraepithelial neoplasia (CIN). In this study, we sequenced the V3 region of the 16S rRNA gene in vaginal swab samples from 66 participants (24 CIN 1−, 42 CIN 2+ patients) and investigated the taxonomic composition. The vaginal microbial diversity was not significantly different between the CIN 1− and CIN 2+ groups. However, we observed a Lactobacillus amylovorus-dominant type (16.7%), which does not belong to any conventional community state type (CST). Moreover, a minimal set of 33 bacterial species was identified that maximally differentiates CIN 2+ from CIN 1− in a random forest model (area under the curve (AUC) = 0.952). Among the 33 bacterial species, Lactobacillus iners was selected as the most impactful predictor in our model. This finding suggests that the random forest model is able to predict the severity of CIN and that the vaginal microbiome may play a role as a biomarker.

Accurate prediction of birth implementing a statistical model through the determination of steroid hormones in saliva

Scientific Reports, DOI 10.1038/s41598-021-84924-0, 2021, Vol 11 (1)

Author(s): Silvia Alonso, Sara Cáceres, Daniel Vélez, Luis Sanz, Gema Silvan, ...

Keyword(s): Machine Learning, Random Forest, Area Under The Curve, Machine Learning Algorithms, Random Forest Model, Forest Model, Spontaneous Labour, Hormonal Mechanism, First Time, Estrone Sulphate

Steroidal hormone interaction in pregnancy is crucial for adequate fetal development and preparation for childbirth and extrauterine life. Estrone sulphate, estriol, progesterone and cortisol play important roles in the initiation of the labour mechanism at the start of contractions and cervical effacement. However, their interaction remains uncertain. Although several studies regarding the hormonal mechanism of labour have been reported, the prediction of the date of birth remains a challenge. In this study, we present for the first time machine learning algorithms for predicting whether spontaneous labour will occur from week 37 onwards. Estrone sulphate, estriol, progesterone and cortisol were analysed by enzyme-immunoassay (EIA) techniques in saliva samples collected from 106 pregnant women from week 34 onwards. We compared a random forest model with a traditional logistic regression on a dataset constructed from the observed values of these measures. We observed that the results, evaluated in terms of accuracy and area under the curve (AUC), are noticeably better for the random forest model. For this reason, we consider that machine learning methods contribute in an important way to obstetric practice.

Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model

The Science of The Total Environment, DOI 10.1016/j.scitotenv.2021.147040, 2021, Vol 783, pp. 147040

Author(s): Chengcheng Jiang, Wen Fan, Ningyu Yu, Enlong Liu

Keyword(s): Random Forest, Loess Plateau, Spatial Modeling, Random Forest Model, Certainty Factor, The Loess Plateau, Forest Model, Gully Head

Clinical trial registries as Scientometric data: A novel solution for linking and deduplicating clinical trials from multiple registries

Scientometrics, DOI 10.1007/s11192-021-04111-w, 2021

Author(s): Christian Thiele, Gerrit Hirschfeld, Ruth von Brachel

Keyword(s): Clinical Trials, Random Forest, Random Forest Model, Scientometric Analysis, Data Set, The Public, Forest Model, Clinical Trial Registries, Multiple Primary, Clinical Trials Registry

Registries of clinical trials are a potential source for scientometric analysis of medical research and serve important functions for the research community and the public at large. Clinical trials that recruit patients in Germany are usually registered in the German Clinical Trials Register (DRKS) or in international registries such as ClinicalTrials.gov. Furthermore, the International Clinical Trials Registry Platform (ICTRP) aggregates trials from multiple primary registries. We queried the DRKS, ClinicalTrials.gov, and the ICTRP for trials with a recruiting location in Germany. Trials that were registered in multiple registries were linked using the primary and secondary identifiers and a Random Forest model based on various similarity metrics. We identified 35,912 trials that were conducted in Germany. The majority of the trials were registered in multiple databases. 32,106 trials were linked using primary IDs, 26 were linked using a Random Forest model, and 10,537 internal duplicates on ICTRP were identified using the Random Forest model after finding pairs with matching primary or secondary IDs. In cross-validation on a manually labelled data set, the Random Forest increased the F1-score from 96.4% to 97.1% compared to a linkage based solely on secondary IDs. 28% of all trials were registered in the German DRKS. 54% of the trials on ClinicalTrials.gov, 43% of the trials on the DRKS and 56% of the trials on the ICTRP were pre-registered. The ratio of pre-registered studies and the ratio of studies registered in the DRKS increased over time.
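The abstract does not say which similarity metrics were used, so the following is only a sketch of what pairwise similarity features for record linkage might look like before they are passed to a classifier such as a Random Forest. The record fields (`title`, `secondary_ids`), the feature names, the helper functions, and the placeholder IDs are all hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lower-cased word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def pair_features(rec_a: dict, rec_b: dict) -> dict:
    """Build illustrative similarity features for one candidate record pair."""
    title_a, title_b = rec_a["title"].lower(), rec_b["title"].lower()
    shared = set(rec_a["secondary_ids"]) & set(rec_b["secondary_ids"])
    return {
        # Character-level similarity in [0, 1] from the standard library.
        "title_ratio": SequenceMatcher(None, title_a, title_b).ratio(),
        # Word-level overlap in [0, 1].
        "title_jaccard": token_jaccard(title_a, title_b),
        # 1.0 if the two records share at least one secondary identifier.
        "shared_secondary_id": float(bool(shared)),
    }


# Hypothetical records with placeholder identifiers (not real registry data).
rec_a = {"title": "Effect of drug X on sleep", "secondary_ids": ["SEC-001"]}
rec_b = {"title": "Effect of Drug X on Sleep", "secondary_ids": ["SEC-001", "SEC-002"]}
rec_c = {"title": "Effect of drug Y on pain", "secondary_ids": []}

features_ab = pair_features(rec_a, rec_b)
features_ac = pair_features(rec_a, rec_c)
print(features_ab)
print(features_ac)
```

Feature dictionaries of this kind, one per candidate pair, could then be labelled as match or non-match and used to train a classifier.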

Discrimination of the geographic origins and varieties of wine grapes using high-throughput sequencing assisted by a random forest model

LWT, DOI 10.1016/j.lwt.2021.111333, 2021, pp. 111333

Author(s): Feifei Gao, Guihua Zeng, Bin Wang, Jing Xiao, Liang Zhang, ...

Keyword(s): Random Forest, High Throughput, High Throughput Sequencing, Random Forest Model, Wine Grapes, Forest Model, Geographic Origins

Multi-Scenario Prediction of Intra-Urban Land Use Change Using a Cellular Automata-Random Forest Model

ISPRS International Journal of Geo-Information, DOI 10.3390/ijgi10080503, 2021, Vol 10 (8), pp. 503

Author(s): Hang Liu, Riken Homma, Qiang Liu, Congying Fang

Keyword(s): Land Use, Random Forest, Cellular Automata, Land Use Change, Urban Land, Urban Land Use, Random Forest Model, Growth Trend, Related Factors, Forest Model

The simulation of future land use can provide decision support for urban planners and decision makers, which is important for sustainable urban development. Using a cellular automata-random forest model, we considered two scenarios to predict intra-urban land use changes in Kumamoto City from 2018 to 2030: an unconstrained development scenario, and a planning-constrained development scenario that considers disaster-related factors. The random forest was used to calculate the transition probabilities and the importance of driving factors, and cellular automata were used for future land use prediction. The results show that disaster-related factors greatly influence land vacancy, while urban planning factors are more important for medium high-rise residential, commercial, and public facilities. Under the unconstrained development scenario, urban land use tends towards spatially disordered growth while the total amount grows steadily, with the largest increase in low-rise residential areas. Under the planning-constrained development scenario that considers disaster-related factors, the urban land area will continue to grow, albeit slowly and with a compact growth trend. This study provides planners with information on the relevant trends in different scenarios of land use change in Kumamoto City. Furthermore, it provides a reference for Kumamoto City’s future post-disaster recovery and reconstruction planning.

Estimates of daily ground-level NO2 concentrations in China based on Random Forest model integrated K-means

Advances in Applied Energy, DOI 10.1016/j.adapen.2021.100017, 2021, pp. 100017

Author(s): Xinyu Dou, Cuijuan Liao, Hengqi Wang, Ying Huang, Ying Tu, ...

Keyword(s): Random Forest, Ground Level, Random Forest Model, Forest Model

Improving satellite-based estimation of surface ozone across China during 2008–2019 using iterative random forest model and high-resolution grid meteorological data

Sustainable Cities and Society, DOI 10.1016/j.scs.2021.102807, 2021, pp. 102807

Author(s): Gongbo Chen, Jiang Chen, Guang-hui Dong, Bo-yi Yang, Yisi Liu, ...

Keyword(s): High Resolution, Random Forest, Meteorological Data, Surface Ozone, Random Forest Model, Forest Model

Identification of candidate biomarkers of liver hydatid disease via microarray profiling, bioinformatics analysis, and machine learning

Journal of International Medical Research, DOI 10.1177/0300060521993980, 2021, Vol 49 (3), pp. 030006052199398

Author(s): Jinwu Peng, Zhili Duan, Yamin Guo, Xiaona Li, Xiaoqin Luo, ...

Keyword(s): Random Forest, Hydatid Disease, Characteristic Curve, Receiver Operator Characteristic Curve, Random Forest Model, Hepatic Hydatid Disease, Forest Model, Protein Protein Interaction, PPI Networks, Microarray Profiling

Objectives: Liver echinococcosis is a severe zoonotic disease caused by Echinococcus (tapeworm) infection, which is epidemic in the Qinghai region of China. Here, we aimed to explore biomarkers and establish a predictive model for the diagnosis of liver echinococcosis. Methods: Microarray profiling followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis was performed in liver tissue from patients with liver hydatid disease and from healthy controls from the Qinghai region of China. A protein–protein interaction (PPI) network and random forest model were established to identify potential biomarkers and predict the occurrence of liver echinococcosis, respectively. Results: Microarray profiling identified 1152 differentially expressed genes (DEGs), including 936 upregulated genes and 216 downregulated genes. Several previously unreported biological processes and signaling pathways were identified. The FCGR2B and CTLA4 proteins were identified by the PPI networks and random forest model. The random forest model based on FCGR2B and CTLA4 reliably predicted the occurrence of liver hydatid disease, with an area under the receiver operator characteristic curve of 0.921. Conclusion: Our findings give new insight into gene expression in patients with liver echinococcosis from the Qinghai region of China, improving our understanding of hepatic hydatid disease.

MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification

Mobile Networks and Applications, DOI 10.1007/s11036-020-01699-w, 2021

Author(s): Wei Xu, Vinh Truong Hoang

Keyword(s): Random Forest, Data Processing, Random Forest Model, Forest Model
