Developing a Process for the Analysis of User Journeys and the Prediction of Dropout in Digital Health Interventions: Machine Learning Approach (Preprint)

Mapping Intimacies ◽

10.2196/preprints.17738 ◽

2020 ◽

Author(s):

Vincent Bremer ◽

Philip I Chow ◽

Burkhardt Funk ◽

Frances P Thorndike ◽

Lee M Ritterband

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Behavioral Therapy ◽

Digital Health ◽

Area Under The Curve ◽

Prediction Performance ◽

Health Interventions ◽

Drop Out ◽

Support Vector ◽

Boosted Decision Trees

BACKGROUND: User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data (self-reported as well as system-generated data) produced by the path (or journey) an individual takes to navigate through a digital health intervention. OBJECTIVE: The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core. METHODS: Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout. RESULTS: Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance. CONCLUSIONS: The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
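The AUC values quoted in this abstract can be read as the probability that a randomly chosen dropout receives a higher predicted risk than a randomly chosen completer. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code; the function name, labels, and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (here: dropout), 0 otherwise.
    scores: predicted risk, higher means more likely positive.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for six users (illustration only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.6 to 0.9 range.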


Developing a Process for the Analysis of User Journeys and the Prediction of Dropout in Digital Health Interventions: Machine Learning Approach

Journal of Medical Internet Research ◽

10.2196/17738 ◽

2020 ◽

Vol 22 (10) ◽

pp. e17738

Author(s):

Vincent Bremer ◽

Philip I Chow ◽

Burkhardt Funk ◽

Frances P Thorndike ◽

Lee M Ritterband

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Behavioral Therapy ◽

Digital Health ◽

Area Under The Curve ◽

Prediction Performance ◽

Health Interventions ◽

Drop Out ◽

Support Vector ◽

Boosted Decision Trees

Background: User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data (self-reported as well as system-generated data) produced by the path (or journey) an individual takes to navigate through a digital health intervention. Objective: The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core. Methods: Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout. Results: Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance. Conclusions: The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
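To make the feature engineering step more concrete, here is a small sketch of how one handcrafted feature named in the abstract (days since the last interaction with the system) and the dropout label (seventh core not completed) might be derived from an event log. The log layout, event names, and function names are assumptions for illustration; the abstract does not describe the actual data schema.

```python
from datetime import date

# Hypothetical event log rows: (user_id, day, event, core_number or None).
EVENTS = [
    ("u1", date(2020, 1, 1), "login", None),
    ("u1", date(2020, 1, 2), "core_completed", 1),
    ("u1", date(2020, 1, 20), "login", None),
    ("u2", date(2020, 1, 1), "core_completed", 1),
    ("u2", date(2020, 1, 5), "login", None),
]


def features_at(events, user_id, cutoff):
    """Features for one user, using only events on or before the cutoff day."""
    seen = [e for e in events if e[0] == user_id and e[1] <= cutoff]
    if not seen:
        return None  # user has no recorded activity yet
    last_day = max(e[1] for e in seen)
    cores = {e[3] for e in seen if e[2] == "core_completed"}
    return {
        "days_since_last_interaction": (cutoff - last_day).days,
        "cores_completed": len(cores),
    }


def is_dropout(events, user_id, final_core=7):
    """Dropout as defined in the abstract: the final (seventh) core was never completed."""
    return not any(
        e[0] == user_id and e[2] == "core_completed" and e[3] == final_core
        for e in events
    )


u1 = features_at(EVENTS, "u1", date(2020, 1, 25))
u2 = features_at(EVENTS, "u2", date(2020, 1, 25))
print(u1, u2, is_dropout(EVENTS, "u1"))
```

Restricting the features to events before a cutoff keeps later information about the user from leaking into a prediction made at that point in the program.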


1204 Analyzing User Journey Data In Digital Health: Predicting Dropout From A Digital CBT-I Intervention

SLEEP ◽

10.1093/sleep/zsaa056.1198 ◽

2020 ◽

Vol 43 (Supplement_1) ◽

pp. A460-A460

Author(s):

V Bremer ◽

P Chow ◽

B Funk ◽

F Thorndike ◽

L Ritterband

Keyword(s):

Clinical Decision Making ◽

Behavioral Therapy ◽

Digital Health ◽

Predictive Performance ◽

Clinical Decision ◽

Prediction Performance ◽

Drop Out ◽

Machine Learning Techniques ◽

Support Vector ◽

Task Support

Introduction: Intervention dropout is an important factor in the evaluation and implementation of digital therapeutics, including those for insomnia. Large amounts of individualized data (logins, questionnaires, EMA data) in these interventions can combine to create user journeys: the data generated by the path an individual takes to navigate the digital therapeutic. User journeys can provide insight into how likely users are to drop out of an intervention on an individual level and lead to increased prediction performance. Thus, the goal of this study is to provide a step-by-step guide for the analysis of user journeys and to use this guide to predict intervention dropout, illustrated with an example from data in an RCT of a digital therapeutic for chronic insomnia, for which outcomes have previously been published. Methods: Analysis of user journeys includes data transformation, feature engineering, and statistical model analysis, using machine learning techniques. A framework is established to leverage user journeys to predict various behaviors. For this study, the framework was applied to predict dropouts of 151 participants from a fully automated web-based program (SHUTi) that delivered cognitive behavioral therapy for insomnia. For this task, support vector machines, logistic regression with regularization, and boosted decision trees were applied at different points in the 9-week intervention. These techniques were evaluated based on their predictive performance. Results: After model evaluation, a decision tree ensemble achieved AUC values ranging between 0.6 and 0.9 based on the application of machine learning techniques. Various handcrafted and theory-driven features (e.g., time to complete certain intervention steps, time to get out of bed after arising, and days since last system interaction) contributed to prediction performance. Conclusion: Results indicate that utilizing a user journey framework and analysis can predict intervention dropout. Further, handcrafted theory-driven features can increase prediction performance. This prediction of dropout could lead to enhanced clinical decision-making in digital therapeutics. Support: The original study evaluating the efficacy of this intervention has been reported elsewhere and was funded by grant R01 MH86758 from the National Institute of Mental Health.


Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽

10.3390/jcm10050992 ◽

2021 ◽

Vol 10 (5) ◽

pp. 992

Author(s):

Martina Barchitta ◽

Andrea Maugeri ◽

Giuliana Favara ◽

Paolo Marco Riela ◽

Giovanni Gallo ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Learning Algorithm ◽

Area Under The Curve ◽

Support Vector ◽

Icu Admission ◽

Risk Of Death ◽

Saps Ii ◽

Svm Algorithm

Patients in intensive care units (ICUs) are at higher risk of worsened prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm that combines the SAPS II with additional patient characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machines (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II in distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, we instead achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most heavily in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest that the present SVM model is a useful tool for the early prediction of patients at higher risk of death at ICU admission.


Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network Ann

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete, to assess the extent of the cracking, and to predict the extent of healing. In self-healing concrete research, testing the healing performance of concrete in a laboratory is costly, and a large number of samples may be needed to arrive at a reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. These algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and a random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, measured by the coefficient of determination (R²) and the root mean square error (RMSE), is evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R² = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance in the design of autogenous healing concrete can be achieved.
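For readers unfamiliar with the two regression metrics named in this abstract, the following sketch computes the coefficient of determination and the root mean square error from their standard definitions. The data values are invented for illustration and have no connection to the study's 1417 data sets.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Made-up healing-performance values and predictions (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
print(round(r_squared(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))
```

R² is unitless while RMSE is expressed in the units of the target, so the two numbers answer different questions and are usually reported together.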


Value of radiomics in differential diagnosis of chromophobe renal cell carcinoma and renal oncocytoma

Abdominal Radiology ◽

10.1007/s00261-019-02269-9 ◽

2019 ◽

Vol 45 (10) ◽

pp. 3193-3201 ◽

Cited By ~ 3

Author(s):

Yajuan Li ◽

Xialing Huang ◽

Yuwei Xia ◽

Liling Long

Keyword(s):

Machine Learning ◽

Differential Diagnosis ◽

Cell Carcinoma ◽

Area Under The Curve ◽

Image Features ◽

Renal Tumors ◽

Support Vector ◽

Svm Classifier ◽

Renal Oncocytoma ◽

Lasso Regression

Purpose: To explore the value of CT-enhanced quantitative features combined with machine learning for the differential diagnosis of chromophobe renal cell carcinoma (chRCC) and renal oncocytoma (RO). Methods: Sixty-one cases of renal tumors (chRCC = 44; RO = 17) that were pathologically confirmed at our hospital between 2008 and 2018 were retrospectively analyzed. All patients had undergone preoperative enhanced CT scans including the corticomedullary (CMP), nephrographic (NP), and excretory phases (EP) of contrast enhancement. Volumes of interest (VOIs), including lesions on the images, were manually delineated using the RadCloud platform. A LASSO regression algorithm was used to screen the image features extracted from all VOIs. Five machine learning classifiers were trained to distinguish chRCC from RO using a fivefold cross-validation strategy. Classifier performance was evaluated mainly by the area under the receiver operating characteristic (ROC) curve and by accuracy. Results: In total, 1029 features were extracted from CMP, NP, and EP. The LASSO regression algorithm was used to screen out the four, four, and six best features, respectively, and eight features were selected when CMP and NP were combined. All five classifiers had good diagnostic performance, with area under the curve (AUC) values greater than 0.850. The support vector machine (SVM) classifier performed best, with a diagnostic accuracy of 0.945 (AUC 0.964 ± 0.054; sensitivity 0.999; specificity 0.800). Conclusions: Accurate preoperative differential diagnosis of chRCC and RO can be facilitated by a combination of CT-enhanced quantitative features and machine learning.
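The fivefold cross-validation strategy mentioned in this abstract can be sketched as follows for the reported class sizes (44 chRCC, 17 RO). The abstract does not state whether the folds were stratified by class; stratification is an assumption made here because the classes are imbalanced, and the function name and seed are illustrative.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar.

    Indices of each class are shuffled and then dealt round-robin into folds.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class sizes taken from the abstract; the ordering of cases is arbitrary.
labels = ["chRCC"] * 44 + ["RO"] * 17
folds = stratified_folds(labels)

for held_out in folds:
    # Each fold serves once as the test set; the rest form the training set.
    train = [i for fold in folds if fold is not held_out for i in fold]
    print(len(train), len(held_out))
```

With only 17 RO cases, each test fold contains three or four of them, which helps explain why fold-level metrics such as the reported AUC carry a visible standard deviation.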


The Comparison and Interpretation of Machine-Learning Models in Post-Stroke Functional Outcome Prediction

Diagnostics ◽

10.3390/diagnostics11101784 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1784

Author(s):

Shih-Chieh Chang ◽

Chan-Lin Chu ◽

Chih-Kuang Chen ◽

Hsiang-Ning Chang ◽

Alice M. K. Wong ◽

...

Keyword(s):

Machine Learning ◽

Area Under The Curve ◽

Superior Performance ◽

Support Vector ◽

Balance Test ◽

Post Stroke ◽

Feature Importance ◽

Value Range ◽

Importance Analysis ◽

Partial Dependence

Prediction of post-stroke functional outcomes is crucial for allocating medical resources. In this study, a total of 577 patients were enrolled in the Post-Acute Care-Cerebrovascular Disease (PAC-CVD) program, and 77 predictors were collected at admission. The outcome was whether a patient could achieve a Barthel Index (BI) score of >60 upon discharge. Eight machine-learning (ML) methods were applied, and their results were integrated using a stacking method. The area under the curve (AUC) of the eight ML models ranged from 0.83 to 0.887, with random forest, stacking, logistic regression, and support vector machine demonstrating superior performance. The feature importance analysis indicated that the initial Berg Balance Test (BBS-I), initial BI (BI-I), and initial Concise Chinese Aphasia Test (CCAT-I) were the top three predictors of BI scores at discharge. The partial dependence plot (PDP) and individual conditional expectation (ICE) plot indicated that the predictors’ ability to predict outcomes was most pronounced within a specific value range (e.g., BBS-I < 40 and BI-I < 60). BI at discharge could be predicted from information collected at admission with the aid of various ML models, and the PDP and ICE plots indicated that the predictors were most informative within a certain value range.


MEWS++: Enhancing the Prediction of Clinical Deterioration in Admitted Patients through a Machine Learning Model

Journal of Clinical Medicine ◽

10.3390/jcm9020343 ◽

2020 ◽

Vol 9 (2) ◽

pp. 343 ◽

Cited By ~ 4

Author(s):

Arash Kia ◽

Prem Timsina ◽

Himanshu N. Joshi ◽

Eyal Klang ◽

Rohit R. Gupta ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Area Under The Curve ◽

Learning Model ◽

Clinical Deterioration ◽

Early Warning Score ◽

Support Vector ◽

Adult Age ◽

Machine Learning Model ◽

Patients At Risk

Early detection of patients at risk for clinical deterioration is crucial for timely intervention. Traditional detection systems rely on a limited set of variables and are unable to predict the time of decline. We describe a machine learning model called MEWS++ that enables the identification of patients at risk of escalation of care or death six hours prior to the event. A retrospective single-center cohort study was conducted from July 2011 to July 2017 of adult (age > 18) inpatients excluding psychiatric, parturient, and hospice patients. Three machine learning models were trained and tested: random forest (RF), linear support vector machine, and logistic regression. We compared the models’ performance to the traditional Modified Early Warning Score (MEWS) using sensitivity, specificity, and Area Under the Curve for Receiver Operating Characteristic (AUC-ROC) and Precision-Recall curves (AUC-PR). The primary outcome was escalation of care from a floor bed to an intensive care or step-down unit, or death, within 6 h. A total of 96,645 patients with 157,984 hospital encounters and 244,343 bed movements were included. Overall rate of escalation or death was 3.4%. The RF model had the best performance with sensitivity 81.6%, specificity 75.5%, AUC-ROC of 0.85, and AUC-PR of 0.37. Compared to traditional MEWS, sensitivity increased 37%, specificity increased 11%, and AUC-ROC increased 14%. This study found that using machine learning and readily available clinical data, clinical deterioration or death can be predicted 6 h prior to the event. The model we developed can warn of patient deterioration hours before the event, thus helping make timely clinical decisions.
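The gap between the reported AUC-ROC (0.85) and AUC-PR (0.37) is easier to interpret once the low event rate is taken into account. The sketch below applies Bayes' rule to the sensitivity, specificity, and 3.4% event rate quoted in the abstract to estimate the precision (positive predictive value) at that operating point. It assumes the 3.4% rate applies to the same units the model scores, which the abstract does not state; the resulting figure is a back-of-the-envelope estimate, not a result from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Values quoted in the abstract for the random forest model.
ppv = positive_predictive_value(sensitivity=0.816, specificity=0.755, prevalence=0.034)
print(round(ppv, 3))
```

Under these assumptions roughly one alert in ten would correspond to a true escalation or death, which shows why precision-recall metrics are reported alongside ROC metrics for rare outcomes.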


Computational prediction of implantation outcome after embryo transfer

Health Informatics Journal ◽

10.1177/1460458219892138 ◽

2019 ◽

Vol 26 (3) ◽

pp. 1810-1826 ◽

Cited By ~ 3

Author(s):

Behnaz Raef ◽

Masoud Maleki ◽

Reza Ferdousi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Prediction Model ◽

Embryo Transfer ◽

Area Under The Curve ◽

Computational Prediction ◽

Support Vector ◽

Human Menopausal Gonadotropin ◽

Optimum Number ◽

Learning Approaches

The aim of this study is to develop a computational prediction model for implantation outcome after an embryo transfer cycle. In this study, information on 500 patients and 1360 transferred embryos, including cleavage and blastocyst stages and fresh or frozen embryos, was collected from April 2016 to February 2018. A dataset containing 82 attributes and a target label (indicating positive and negative implantation outcomes) was constructed. Six dominant machine learning approaches were examined based on their performance in predicting embryo transfer outcomes. In addition, feature selection procedures were used to identify effective predictive factors and to determine the optimum number of features based on classifier performance. The results revealed that random forest was the best classifier (accuracy = 90.40% and area under the curve = 93.74%) with optimum features based on a 10-fold cross-validation test. According to the Support Vector Machine-Feature Selection algorithm, the ideal number of features is 78. Follicle stimulating hormone/human menopausal gonadotropin dosage for ovarian stimulation was the most important predictive factor across all examined embryo transfer features. The proposed machine learning-based prediction model could predict embryo transfer outcome and implantation of embryos with high accuracy before the start of an embryo transfer cycle.


Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.62211 ◽

2016 ◽

Vol 23 (2) ◽

pp. 124 ◽

Cited By ~ 2

Author(s):

Douglas Detoni ◽

Cristian Cechinel ◽

Ricardo Araujo Matsumura ◽

Daniela Francisco Brauner

Keyword(s):

Machine Learning ◽

At Risk ◽

At Risk Students ◽

Drop Out ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Student Dropout ◽

Vector Machines ◽

Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and Adaboost decision trees) under different scenarios. We provide evidence that patterns in interaction counts can provide useful information for classifying at-risk students. This classification allows the activities presented to at-risk students to be customized (automatically or through tutors) in an attempt to prevent student dropout.
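As an illustration of what "interaction counts" might look like as model input, the sketch below turns a raw activity log into one count per student per week. The log layout (student identifier plus course week) and the function name are assumptions; the abstract does not specify how the Moodle data were aggregated.

```python
from collections import Counter

# Hypothetical log: one (student_id, course_week) row per recorded interaction.
LOG = [
    ("s1", 1), ("s1", 1), ("s1", 2),
    ("s2", 1),
    ("s3", 2), ("s3", 3), ("s3", 3), ("s3", 3),
]


def weekly_counts(log, n_weeks):
    """Map each student to a fixed-length list of interaction counts per week."""
    counts = Counter(log)  # (student, week) -> number of interactions
    students = sorted({student for student, _ in log})
    return {
        student: [counts[(student, week)] for week in range(1, n_weeks + 1)]
        for student in students
    }


features = weekly_counts(LOG, n_weeks=3)
print(features)
```

Because the vectors contain only counts, the same representation can be built for any course or platform that logs interactions, which is the transferability argument made in the abstract.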


Diagnostic performance of machine learning applied to texture analysis-derived features for breast lesion characterisation at automated breast ultrasound: a pilot study

European Radiology Experimental ◽

10.1186/s41747-019-0121-6 ◽

2019 ◽

Vol 3 (1) ◽

Cited By ~ 2

Author(s):

Magda Marcon ◽

Alexander Ciritsis ◽

Cristina Rossi ◽

Anton S. Becker ◽

Nicole Berger ◽

...

Keyword(s):

Machine Learning ◽

Pilot Study ◽

Texture Analysis ◽

Area Under The Curve ◽

Texture Features ◽

Breast Ultrasound ◽

Support Vector ◽

Maximum Area ◽

Svm Algorithm ◽

Automated Breast Ultrasound

Background: Our aims were to determine if features derived from texture analysis (TA) can distinguish normal, benign, and malignant tissue on automated breast ultrasound (ABUS); to evaluate whether machine learning (ML) applied to TA can categorise ABUS findings; and to compare ML to the analysis of single texture features for lesion classification. Methods: This ethically approved retrospective pilot study included 54 women with benign (n = 38) and malignant (n = 32) solid breast lesions who underwent ABUS. After manual region of interest placement along the lesions’ margin as well as the surrounding fat and glandular breast tissue, 47 texture features (TFs) were calculated for each category. Statistical analysis (ANOVA) and a support vector machine (SVM) algorithm were applied to the texture features to evaluate the accuracy in distinguishing (i) lesions versus normal tissue and (ii) benign versus malignant lesions. Results: Skewness and kurtosis were the only TFs significantly different among all four categories (p < 0.000001). In subsets (i) and (ii), a maximum area under the curve of 0.86 (95% confidence interval [CI] 0.82–0.88) for energy and 0.86 (95% CI 0.82–0.89) for entropy was obtained, respectively. Using the SVM algorithm, a maximum area under the curve of 0.98 was obtained for both subsets, with a maximum accuracy of 94.4% in subset (i) and 90.7% in subset (ii). Conclusions: TA in combination with ML might represent a useful diagnostic tool in the evaluation of breast imaging findings in ABUS. Applying ML techniques to TFs might be superior to the analysis of single TFs.
