Comparison of dynamic updating strategies for clinical prediction models

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Erin M. Schnellinger ◽  
Wei Yang ◽  
Stephen E. Kimmel

Abstract Background Prediction models inform many medical decisions, but their performance often deteriorates over time. Several discrete-time update strategies have been proposed in the literature, including model recalibration and revision. However, these strategies have not been compared in the dynamic updating setting. Methods We used post-lung transplant survival data during 2010-2015 and compared the Brier Score (BS), discrimination, and calibration of the following update strategies: (1) never update, (2) update using the closed testing procedure proposed in the literature, (3) always recalibrate the intercept, (4) always recalibrate the intercept and slope, and (5) always refit/revise the model. In each case, we explored update intervals of every 1, 2, 4, and 8 quarters. We also examined how the performance of the update strategies changed as the amount of old data included in the update (i.e., sliding window length) increased. Results All methods of updating the model led to meaningful improvement in BS relative to never updating. More frequent updating yielded better BS, discrimination, and calibration, regardless of update strategy. Recalibration strategies led to more consistent improvements and less variability over time compared to the other updating strategies. Using longer sliding windows did not substantially impact the recalibration strategies, but did improve the discrimination and calibration of the closed testing procedure and model revision strategies. Conclusions Model updating leads to improved BS, with more frequent updating performing better than less frequent updating. Model recalibration strategies appeared to be the least sensitive to the update interval and sliding window length.
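Strategy (3), intercept-only recalibration, can be sketched for a logistic model: keep the linear predictor fixed and re-estimate only an intercept correction on the update window. This is a hypothetical illustration (function names and data are made up, not the authors' code):

```python
import math

def recalibrate_intercept(linear_predictors, outcomes, iters=300, lr=0.5):
    """Re-estimate only the intercept of a logistic model by gradient
    ascent on the log-likelihood, keeping the slope fixed at 1."""
    delta = 0.0  # intercept correction
    n = len(outcomes)
    for _ in range(iters):
        grad = sum(y - 1.0 / (1.0 + math.exp(-(lp + delta)))
                   for lp, y in zip(linear_predictors, outcomes)) / n
        delta += lr * grad
    return delta

# Toy update window: the model over-predicts (high linear predictors,
# but only 2 of 6 events), so the fitted correction should be negative.
lp = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0]
y = [0, 0, 1, 0, 0, 1]
delta = recalibrate_intercept(lp, y)
```

Recalibrating the slope as well (strategy 4) would add a second parameter multiplying the linear predictor; full revision (strategy 5) refits all coefficients on the window.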

Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1244
Author(s):  
Lin Hao ◽  
Juncheol Kim ◽  
Sookhee Kwon ◽  
Il Do Ha

With the development of high-throughput technologies, more and more high-dimensional or ultra-high-dimensional genomic data are being generated. Therefore, effectively analyzing such data has become a significant challenge. Machine learning (ML) algorithms have been widely applied for modeling nonlinear and complicated interactions in a variety of practical fields such as high-dimensional survival data. Recently, multilayer deep neural network (DNN) models have made remarkable achievements. Thus, a Cox-based DNN prediction survival model (DNNSurv model), which was built with Keras and TensorFlow, was developed. However, its results were only evaluated on the survival datasets with high-dimensional or large sample sizes. In this paper, we evaluated the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets and compared it with three popular ML survival prediction models (i.e., random survival forest and the Cox-based LASSO and Ridge models). For this purpose, we also present the optimal setting of several hyperparameters, including the selection of a tuning parameter. The proposed method demonstrated via data analysis that the DNNSurv model performed well overall as compared with the ML models, in terms of the three main evaluation measures (i.e., concordance index, time-dependent Brier score, and the time-dependent AUC) for survival prediction performance.
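Harrell's concordance index, the first of the three evaluation measures named above, can be sketched in pure Python. This is illustrative only; real analyses would use a library such as scikit-survival:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data: among comparable pairs,
    the fraction where the subject with the shorter observed event time
    has the higher predicted risk (ties in risk count one half)."""
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if subject i had an event before
            # subject j's observed time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# Risk perfectly anti-monotone in survival time gives C = 1.0
times = [2, 4, 6, 8]
events = [1, 1, 1, 1]
risks = [0.9, 0.7, 0.5, 0.3]
c = concordance_index(times, events, risks)
```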


2003 ◽  
Vol 42 (05) ◽  
pp. 564-571 ◽  
Author(s):  
M. Schumacher ◽  
E. Graf ◽  
T. Gerds

Summary Objectives: Generally applicable tools for the assessment of predictions for survival data are lacking. Prediction error curves based on the Brier score, which have been suggested as a sensible approach, are illustrated by means of a case study. Methods: The concept of predictions made in terms of conditional survival probabilities given the patient’s covariates is introduced. Such predictions are derived from various statistical models for survival data, including artificial neural networks. The idea of how the prediction error of a prognostic classification scheme can be followed over time is illustrated with the data of two studies on the prognosis of node-positive breast cancer patients, one of them serving as an independent test data set. Results and Conclusions: The Brier score as a function of time is shown to be a valuable tool for assessing the predictive performance of prognostic classification schemes for survival data incorporating censored observations. Comparison with the prediction based on the pooled Kaplan-Meier estimator yields a benchmark value for any classification scheme incorporating patients’ covariate measurements. The problem of an overoptimistic assessment of prediction error caused by data-driven modelling, as is done, for example, with artificial neural nets, can be circumvented by an assessment in an independent test data set.
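The time-dependent Brier score with inverse-probability-of-censoring weighting in the style of Graf et al. can be sketched as follows. The data and the censoring survival function G are illustrative; a no-censoring case is used so the weights are all 1:

```python
def brier_score_at_t(t, times, events, surv_probs, censor_surv):
    """Brier score at time t with inverse-probability-of-censoring
    weights. censor_surv(u) is the censoring survival function G(u);
    surv_probs are predicted probabilities of surviving beyond t."""
    n = len(times)
    total = 0.0
    for T, d, p in zip(times, events, surv_probs):
        if T <= t and d == 1:        # event before t: true status is 0
            total += (0.0 - p) ** 2 / censor_surv(T)
        elif T > t:                   # still at risk at t: true status is 1
            total += (1.0 - p) ** 2 / censor_surv(t)
        # subjects censored before t get weight 0
    return total / n

# No censoring: G(u) = 1 everywhere
G = lambda u: 1.0
times = [1, 3, 5, 7]
events = [1, 1, 1, 1]
surv = [0.1, 0.2, 0.9, 0.8]   # predicted P(survive beyond t = 4)
bs = brier_score_at_t(4, times, events, surv, G)
```

The pooled Kaplan-Meier benchmark mentioned above corresponds to plugging the same marginal survival estimate in for every subject's `surv_probs` entry.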


Author(s):  
Hannah L Combs ◽  
Kate A Wyman-Chick ◽  
Lauren O Erickson ◽  
Michele K York

Abstract Objective Longitudinal assessment of cognitive and emotional functioning in patients with Parkinson’s disease (PD) is helpful in tracking progression of the disease, developing treatment plans, evaluating outcomes, and educating patients and families. Determining whether change over time is meaningful in neurodegenerative conditions such as PD can be difficult, as repeat assessment of neuropsychological functioning is affected by factors other than cognitive change. Regression-based prediction formulas are one method by which clinicians and researchers can determine whether an observed change is meaningful. The purpose of the current study was to develop and validate regression-based prediction models of cognitive and emotional test scores for participants with early-stage idiopathic PD and healthy controls (HC) enrolled in the Parkinson’s Progression Markers Initiative (PPMI). Methods Participants with de novo PD and HC were identified retrospectively from the PPMI archival database. Data from baseline testing and 12-month follow-up were utilized in this study. In total, 688 participants were included in the present study (PD: n = 508; HC: n = 185). Subjects from both groups were randomly divided into development (70%) and validation (30%) subsets. Results Early-stage idiopathic PD patients and healthy controls were similar at baseline. Regression-based models were developed for all cognitive and self-report mood measures within both populations. Within the validation subset, the predicted and observed cognitive test scores did not significantly differ, except for semantic fluency. Conclusions The prediction models can serve as useful tools for researchers and clinicians to study clinically meaningful cognitive and mood change over time in PD.
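A regression-based prediction formula of the kind developed in the study can be sketched in its simplest form: regress follow-up score on baseline score in the development subset, then standardize the deviation of an observed follow-up score from its prediction. All data here are hypothetical, and the study's actual models may include additional predictors (e.g., demographics):

```python
from statistics import mean, stdev

def fit_retest_model(baseline, followup):
    """Least-squares regression of follow-up score on baseline score."""
    mx, my = mean(baseline), mean(followup)
    b = sum((x - mx) * (y - my) for x, y in zip(baseline, followup)) / \
        sum((x - mx) ** 2 for x in baseline)
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(baseline, followup)]
    see = stdev(resid)   # standard error of the estimate (approximate)
    return a, b, see

def change_z(a, b, see, baseline_score, observed_followup):
    """Standardized deviation of observed follow-up from predicted;
    |z| > 1.645 is a common 90% threshold for meaningful change."""
    predicted = a + b * baseline_score
    return (observed_followup - predicted) / see

# Hypothetical development data (scores stable over 12 months on average)
base = [50, 55, 60, 45, 52, 58, 48, 62]
follow = [51, 54, 59, 46, 53, 57, 49, 61]
a, b, see = fit_retest_model(base, follow)
z = change_z(a, b, see, 50, 40)   # a large drop below expectation
```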


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Brittany R. Lapin ◽  
Nicolas R. Thompson ◽  
Andrew Schuster ◽  
Irene L. Katzan

Abstract Objectives Research has indicated that proxies overestimate symptoms on patients’ behalves; however, it is unclear whether patients and proxies agree on meaningful change across domains over time. The objective of this study was to assess patient-proxy agreement over time, as well as agreement on identification of meaningful change, across 10 health domains in patients who underwent acute rehabilitation following stroke. Methods Stroke patients were recruited from an ambulatory clinic or inpatient rehabilitation unit and were included in the study if they were undergoing rehabilitation. At baseline, and again after 30 days, patients and their proxies completed PROMIS Global Health and eight domain-specific PROMIS short forms. Reliability of patient-proxy assessments at baseline, at follow-up, and for the change in T-score was evaluated for each domain using intra-class correlation coefficients (ICC(2,1)). Agreement on meaningful improvement or worsening, defined as a change of 5 or more T-score points, was compared using percent exact agreement. Results Forty-one patient-proxy dyads were included in the study. Proxies generally reported worse symptoms and functioning than patients at both baseline and follow-up, and reported less change than patients. ICCs for baseline and change were primarily poor to moderate (range: 0.06 (depression change) to 0.67 (physical function baseline)) and were better at follow-up (range: 0.42 (anxiety) to 0.84 (physical function)). Percent exact agreement on meaningful improvement versus no improvement ranged from 58.5–75.6%. Only a small proportion of dyads indicated meaningful worsening. Conclusions Patient-proxy agreement across 10 domains of health was better following completion of rehabilitation than at baseline or for change. Overall change was minimal, but the majority of patient-proxy dyads agreed on meaningful change. Our study provides important insight for clinicians and researchers when interpreting change scores over time for questionnaires completed by both patients and proxies.
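ICC(2,1), the reliability index used above (two-way random effects, absolute agreement, single measurement), can be computed from the standard ANOVA decomposition. A sketch with hypothetical dyad T-scores:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a list of rows (subjects), each a list of k rater scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ssr = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    sst = sum((x - grand) ** 2 for r in ratings for x in r)
    sse = sst - ssr - ssc                                # residual
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical patient vs. proxy T-scores on one domain (6 dyads)
scores = [[50, 48], [62, 60], [45, 47], [55, 52], [70, 69], [38, 41]]
icc = icc_2_1(scores)
```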


2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine; hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62,294 patients from the United States, with 97 predictors selected on clinical/statistical grounds from more than 600 available, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection, and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For the PLANNs, novel extensions to their original specification are tested. Emphasis is given to the advantages and pitfalls of each method and to the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score), and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, the PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables.
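The Integrated Brier Score referenced above averages the time-dependent Brier score over a time horizon. A minimal sketch, assuming BS(t) is already available as a callable (here a constant function, purely for illustration):

```python
def integrated_brier_score(bs, t_max, steps=100):
    """Integrated Brier Score: average of the time-dependent Brier score
    BS(t) over (0, t_max], approximated with the trapezoidal rule."""
    h = t_max / steps
    grid = [i * h for i in range(steps + 1)]
    vals = [bs(t) for t in grid]
    area = sum((vals[i] + vals[i + 1]) / 2 * h for i in range(steps))
    return area / t_max

# A constant BS(t) = 0.25 integrates back to 0.25
ibs = integrated_brier_score(lambda t: 0.25, 10.0)
```

In practice BS(t) would itself be estimated with censoring weights, as in the prediction-error-curve literature.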


2021 ◽  
Vol 13 (23) ◽  
pp. 4864
Author(s):  
Langfu Cui ◽  
Qingzhen Zhang ◽  
Liman Yang ◽  
Chenggang Bai

An inertial platform is the key component of a remote sensing system. In service, the performance of an inertial platform degrades and its accuracy declines, so the system is checked and maintained regularly. The performance change of an inertial platform can be evaluated from detection data; however, owing to limitations of the detection conditions, such data form only a small sample. In this paper, in order to predict the performance of an inertial platform, a prediction model is designed that combines a sliding window, grey theory and a neural network (SGMNN). The experimental results show that the SGMNN model performs best in predicting the inertial platform drift rate compared with other prediction models.
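The SGMNN combines a sliding window, a grey model and a neural network; only the grey GM(1,1) core, the standard small-sample forecaster in grey theory, is sketched below. This is a hypothetical pure-Python illustration, not the authors' implementation:

```python
import math

def gm11_forecast(x0, steps=1):
    """GM(1,1) grey model: fit by least squares on the whitened equation
    x0[k] = -a*z1[k] + b, then forecast `steps` values beyond the series."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]                 # accumulated series
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]    # background values
    y = x0[1:]
    m = n - 1
    s_zz = sum(z * z for z in z1)
    s_z = sum(z1)
    s_zy = sum(z * v for z, v in zip(z1, y))
    s_y = sum(y)
    det = s_zz * m - s_z * s_z
    a = (-s_zy * m + s_z * s_y) / det        # development coefficient
    b = (s_zz * s_y - s_z * s_zy) / det      # grey input
    def x1_hat(k):                            # fitted accumulated series
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x1_hat(n + i) - x1_hat(n + i - 1) for i in range(steps)]

# Toy drift-rate series growing about 20% per check; forecast the next value
series = [10.0, 12.0, 14.4, 17.28]
nxt = gm11_forecast(series, steps=1)[0]
```

In a sliding-window scheme, the model would be refit on the most recent window of detections before each forecast.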


2021 ◽  
Author(s):  
Gaurav Gulati ◽  
Riley J Brazil ◽  
Jason Nelson ◽  
David van Klaveren ◽  
Christine M. Lundquist ◽  
...  

Abstract Background Clinical prediction models (CPMs) are used to inform treatment decisions for the primary prevention of cardiovascular disease. We aimed to assess the performance of such CPMs in fully independent cohorts. Methods and Results 63 models predicting outcomes for patients at risk of cardiovascular disease from the Tufts PACE CPM Registry were selected for external validation on publicly available data from up to 4 broadly inclusive primary prevention clinical trials. For each CPM-trial pair, we assessed model discrimination, calibration, and net benefit. Results were stratified based on the relatedness of derivation and validation cohorts, and net benefit was reassessed after updating the model intercept, updating the slope, or complete re-estimation. The median c-statistic of the CPMs decreased from 0.77 (IQR 0.72-0.78) in the derivation cohorts to 0.63 (IQR 0.58-0.66) when externally validated. The validation c-statistic was higher when derivation and validation cohorts were considered related than when they were distantly related (0.67 vs 0.60, p < 0.001). The calibration slope was also higher in related cohorts than in distantly related cohorts (0.69 vs 0.58, p < 0.001). Net benefit analysis suggested a substantial likelihood of harm when models were externally applied, but this likelihood decreased after model updating. Conclusions Discrimination and calibration decrease significantly when CPMs for primary prevention of cardiovascular disease are tested in external populations, particularly when the population is only distantly related to the derivation population. Poorly calibrated predictions lead to poor decision making. Model updating can reduce the likelihood of harmful decision making and is needed to realize the full potential of risk-based decision making in new settings.
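Net benefit, used above to quantify the clinical consequences of applying a CPM at a given risk threshold, follows the decision-curve framework of Vickers and Elkin. A minimal sketch with made-up predictions:

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk meets the
    threshold: (TP - FP * threshold/(1-threshold)) / N. Values below the
    treat-all and treat-none baselines indicate potential harm."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)
    return (tp - fp * odds) / n

# A well-separated toy model at a 10% decision threshold
probs = [0.9, 0.8, 0.05, 0.02, 0.03]
outcomes = [1, 1, 0, 0, 0]
nb = net_benefit(probs, outcomes, 0.10)
```

A miscalibrated external model shifts cases across the threshold, driving `fp` up or `tp` down, which is how poorly calibrated predictions translate into harm in this framework.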


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Noah DeWitt ◽  
Mohammed Guedira ◽  
Edwin Lauer ◽  
J. Paul Murphy ◽  
David Marshall ◽  
...  

Abstract Background Genetic variation in growth over the course of the season is a major source of grain yield variation in wheat, and for this reason variants controlling heading date and plant height are among the best-characterized in wheat genetics. While the major variants for these traits have been cloned, the importance of these variants in contributing to genetic variation for plant growth over time is not fully understood. Here we develop a biparental population segregating for major variants for both plant height and flowering time to characterize the genetic architecture of the traits and identify additional novel QTL. Results We find that additive genetic variation for both traits is almost entirely associated with major and moderate-effect QTL, including four novel heading date QTL and four novel plant height QTL. FT2 and Vrn-A3 are proposed as candidate genes underlying QTL on chromosomes 3A and 7A, while Rht8 is mapped to chromosome 2D. These mapped QTL also underlie genetic variation in a longitudinal analysis of plant growth over time. The oligogenic architecture of these traits is further demonstrated by the superior trait prediction accuracy of QTL-based prediction models compared to polygenic genomic selection models. Conclusions In a population constructed from two modern wheat cultivars adapted to the southeast U.S., almost all additive genetic variation in plant growth traits is associated with known major variants or novel moderate-effect QTL. Major transgressive segregation was observed in this population despite the similar plant height and heading date characters of the parental lines. This segregation is being driven primarily by a small number of mapped QTL, instead of by many small-effect, undetected QTL. As most breeding populations in the southeast U.S. segregate for known QTL for these traits, genetic variation in plant height and heading date in these populations likely emerges from similar combinations of major and moderate-effect QTL. We can make more accurate and cost-effective prediction models by targeted genotyping of key SNPs.


2021 ◽  
Author(s):  
Sebastian Johannes Fritsch ◽  
Konstantin Sharafutdinov ◽  
Moein Einollahzadeh Samadi ◽  
Gernot Marx ◽  
Andreas Schuppert ◽  
...  

BACKGROUND During the course of the COVID-19 pandemic, a variety of machine learning models were developed to predict different aspects of the disease, such as long-term course, organ dysfunction or ICU mortality. The number of training datasets used has increased significantly over time. However, these data now come from different waves of the pandemic, which did not always involve the same therapeutic approaches and whose outcomes differed between waves. The impact of these changes on model development has not yet been studied. OBJECTIVE The aim of the investigation was to examine the predictive performance of several models trained with data from one wave when predicting the second wave's data, and the impact of pooling these datasets. Finally, a method for comparing different datasets for heterogeneity is introduced. METHODS We used two datasets from waves one and two to develop several predictive models for patient mortality. Four classification algorithms were used: logistic regression (LR), support vector machine (SVM), random forest classifier (RF) and AdaBoost classifier (ADA). We also performed a mutual prediction on the data of the wave that was not used for training. Then, we compared the performance of the models when a pooled dataset from the two waves was used. The populations from the different waves were checked for heterogeneity using a convex hull analysis. RESULTS 63 patients from wave one (03-06/2020) and 54 from wave two (08/2020-01/2021) were evaluated. For each wave separately, we found models reaching sufficient accuracies, up to 0.79 AUROC (95%-CI 0.76-0.81) for SVM on the first wave and up to 0.88 AUROC (95%-CI 0.86-0.89) for RF on the second wave. After pooling the data, the AUROC decreased markedly. In the mutual prediction, models trained on the second wave's data showed, when applied to the first wave's data, good prediction for non-survivors but insufficient classification of survivors. The opposite setup (training: first wave, test: second wave) revealed the inverse behaviour, with models correctly classifying survivors and incorrectly predicting non-survivors. The convex hull analysis for the first- and second-wave populations showed a more inhomogeneous distribution of the underlying data when compared to randomly selected sets of patients of the same size. CONCLUSIONS Our work demonstrates that a larger dataset is not a universal solution to all machine learning problems in clinical settings. Rather, it shows that inhomogeneous data used to develop models can lead to serious problems. With the convex hull analysis, we offer a solution to this problem. The outcome of such an analysis can raise concerns if pooling different datasets would create inhomogeneous patterns that prevent better predictive performance.
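The abstract does not specify the convex hull analysis in detail; as a rough 2-D illustration of the geometric primitive involved, one can compute hulls of cohort feature projections with Andrew's monotone chain and compare their areas or overlap (all names and data here are hypothetical):

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices of 2-D points in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Shoelace area of the convex hull of 2-D points."""
    h = convex_hull(points)
    return 0.5 * abs(sum(h[i][0] * h[(i + 1) % len(h)][1] -
                         h[(i + 1) % len(h)][0] * h[i][1]
                         for i in range(len(h))))

# Sanity check: the unit square (interior point ignored) has hull area 1
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
area = hull_area(square)
```

Comparing the hull of a pooled cohort against hulls of equally sized random subsets would then flag whether the waves occupy systematically different regions of feature space.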

