Low adherence to existing model reporting guidelines by commonly used clinical prediction models

Author(s):  
Jonathan Hsijing Lu ◽  
Alison Callahan ◽  
Birju Patel ◽  
Keith Morse ◽  
Dev Dash ◽  
...  

Objective: To assess whether the documentation available for commonly used machine learning models developed by an electronic health record (EHR) vendor provides information requested by model reporting guidelines. Materials and Methods: We identified items requested for reporting from model reporting guidelines published in computer science, biomedical informatics, and clinical journals, and merged similar items into representative "atoms". Four independent reviewers and one adjudicator assessed the degree to which model documentation for 12 models developed by Epic Systems reported the details requested in each atom. We present summary statistics of consensus, interrater agreement, and reporting rates of all atoms for the 12 models. Results: We identified 220 unique atoms across 15 model reporting guidelines. After examining the documentation for the 12 most commonly used Epic models, the independent reviewers had an interrater agreement of 76%. After adjudication, the model documentations' median completion rate of applicable atoms was 39% (range: 31%-47%). Most of the commonly requested atoms had reporting rates of 90% or above, including atoms concerning outcome definition, preprocessing, AUROC, internal validation and intended clinical use. For individual reporting guidelines, the median adherence rate for an entire guideline was 54% (range: 15%-71%). Atoms reported half the time or less included those relating to fairness (summary statistics and subgroup analyses, including for age, race/ethnicity, or sex), usefulness (net benefit, prediction time, warnings on out-of-scope use and when to stop use), and transparency (model coefficients). Atoms reported the least often related to missingness (missing data statistics, missingness strategy), validation (calibration plot, external validation), and monitoring (how models are updated/tuned, prediction monitoring). 
Conclusion: There are many recommendations about what should be reported about predictive models used to guide care. Existing model documentation examined in this study provides less than half of applicable atoms, and entire reporting guidelines have low adherence rates. Half or less of the reviewed documentation reported information related to usefulness, reliability, transparency and fairness of models. There is a need for better operationalization of reporting recommendations for predictive models in healthcare.
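The review workflow described above (independent reviewers rating "atoms", adjudication, then summary statistics) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the rating encoding (1 = reported, 0 = not reported, None = not applicable) is an assumption.

```python
# Sketch: pairwise percent interrater agreement and per-model completion
# rate over reporting "atoms" (encoding is assumed, not from the paper).
from itertools import combinations

def percent_agreement(ratings_by_reviewer):
    """Pairwise percent agreement across reviewers.
    ratings_by_reviewer: list of equal-length lists of 0/1 ratings."""
    pairs = agree = 0
    for a, b in combinations(ratings_by_reviewer, 2):
        for x, y in zip(a, b):
            pairs += 1
            agree += (x == y)
    return agree / pairs

def completion_rate(adjudicated):
    """Share of applicable atoms reported (None = not applicable)."""
    applicable = [r for r in adjudicated if r is not None]
    return sum(applicable) / len(applicable)

r1 = [1, 0, 1, 1]
r2 = [1, 0, 0, 1]
print(percent_agreement([r1, r2]))        # 0.75
print(completion_rate([1, 0, None, 1]))   # 2/3
```

The study's headline numbers (76% interrater agreement, 39% median completion) are summaries of exactly these two quantities taken across reviewers and models.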

2020 ◽  
Author(s):  
Ruyi Zhang ◽  
Mei Xu ◽  
Xiangxiang Liu ◽  
Miao Wang ◽  
Qiang Jia ◽  
...  

Abstract Objectives To develop a clinically predictive nomogram model that maximizes patients’ net benefit in predicting the prognosis of patients with thyroid carcinoma, based on the 8th edition of the AJCC Cancer Staging method. Methods We selected 134,962 thyroid carcinoma patients diagnosed between 2004 and 2015 from the SEER database with details of the 8th edition of the AJCC Cancer Staging Manual and randomly separated those patients into two datasets. The first dataset, the training set, was used to build the nomogram model and accounted for 80% (94,474 cases); the second dataset, the validation set, was used for external validation and accounted for 20% (40,488 cases). We then evaluated the model's clinical utility by decision curve analysis (DCA) and its accuracy by calculating the AUC and C-index as well as the calibration plot. Results Decision curve analysis showed the final prediction model could maximize patients’ net benefit. In the training and validation sets, Harrell’s concordance indexes were 0.9450 and 0.9421, respectively. Both sensitivity and specificity at the three predicted time points (12, 36, and 60 months) in the two datasets were above 0.80, except for the sensitivity at the 60-month time point in the validation set, which was 0.7662. AUCs at the three predicted time points were 0.9562, 0.9273, and 0.9009, respectively, for the training set; the corresponding values were 0.9645, 0.9329, and 0.8894 for the validation set. The calibration plot also showed that the nomogram model was well calibrated. Conclusion The final nomogram model provides both excellent accuracy and clinical utility and should be able to predict patients’ survival probability visually and accurately.
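Harrell's concordance index reported above is computed from pairwise comparisons of predicted risk and observed survival with censoring. The following is a minimal O(n²) illustrative implementation, not the authors' code (which likely used a statistical package):

```python
# Sketch: Harrell's concordance index (C-index) for a survival model.
# `risk` is the predicted risk score (higher = worse prognosis);
# `event` is 1 for an observed event, 0 for censored follow-up.
def harrells_c(time, event, risk):
    concordant = tied = comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i had the event before j's follow-up ended
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

time  = [5, 10, 12, 3]
event = [1, 0, 1, 1]
risk  = [0.9, 0.2, 0.4, 0.8]
print(harrells_c(time, event, risk))  # 0.8
```

A C-index of 0.5 is chance-level discrimination; values near 0.94, as reported here, indicate that the model ranks nearly all comparable patient pairs correctly.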


2020 ◽  
Author(s):  
Fangcan Sun ◽  
Bing Han ◽  
Fangfang Wu ◽  
Qianqian Shen ◽  
Minhong Shen ◽  
...  

Abstract Background A prediction algorithm to identify women at high risk of an emergency cesarean could help reduce morbidity and mortality associated with labor. The objective of the present study was to derive and validate a simple model to predict intrapartum cesarean delivery for low-risk nulliparous women in a Chinese population. Methods We conducted a retrospective cohort study of low-risk nulliparous women with singleton, term, cephalic pregnancies. A predictive model for cesarean delivery was derived using univariate and multivariable logistic regression with data from the First Affiliated Hospital of Soochow University. External validation of the prediction model was then performed using data from Sihong County People’s Hospital. A nomogram was established based on the development cohort to predict cesarean delivery. The ROC curve, calibration plot, and decision curve analysis were used to assess predictive performance. Results The intrapartum cesarean delivery rates in the development cohort and the external validation cohort were 8.79% (576/6,551) and 7.82% (599/7,657), respectively. Multivariable logistic regression analysis showed that maternal age, height, BMI, weight gained during pregnancy, gestational age, induction method, meconium-stained amniotic fluid, and neonatal sex were independent factors affecting the cesarean outcome. We established two prediction models, according to whether fetal sex was included. The AUC was 0.782 and 0.774, respectively. The two prediction models were well calibrated, with Hosmer-Lemeshow test P=0.263 and P=0.817, respectively. Decision curve analysis demonstrated that both models had clinical application value, providing the greatest net benefit at threshold probabilities between 4% and 60%. Internal validation using the bootstrap method demonstrated similar discriminatory ability.
We externally validated the model involving fetal sex, for which the AUC was 0.775, while the slope and intercept of the calibration plot were 0.979 and 0.004, respectively. On the external validation set, the other model had an AUC of 0.775 and a calibration slope of 1.007. An online web server based on the nomogram was constructed for convenient clinical use. Conclusions Both models established from these factors have good predictive efficiency and high accuracy, and can provide a reference for clinicians when guiding pregnant women to choose an appropriate delivery mode.
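The calibration slope and intercept reported for the external validation (slope 0.979, intercept 0.004 for the fetal-sex model) are conventionally estimated by regressing the observed binary outcome on the logit of the predicted probability; a slope near 1 and intercept near 0 indicate good calibration. A generic scikit-learn sketch, not the authors' code:

```python
# Sketch: calibration slope and intercept via logistic recalibration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope_intercept(y_true, p_pred):
    """Fit observed outcome ~ logit(predicted probability)."""
    logit = np.log(p_pred / (1 - p_pred)).reshape(-1, 1)
    model = LogisticRegression(C=1e12)  # effectively unpenalized
    model.fit(logit, y_true)
    return model.coef_[0][0], model.intercept_[0]

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 2000)
y = rng.binomial(1, p)  # outcomes drawn from the predictions themselves
slope, intercept = calibration_slope_intercept(y, p)
print(slope, intercept)  # should be near 1 and 0 for well-calibrated predictions
```

A slope below 1 would indicate predictions that are too extreme; an intercept away from 0 indicates systematic over- or under-prediction of the overall risk.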


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0244629
Author(s):  
Ali A. El-Solh ◽  
Yolanda Lawson ◽  
Michael Carter ◽  
Daniel A. El-Solh ◽  
Kari A. Mergenhagen

Objective Our objective is to compare the predictive accuracy of four recently established outcome models for patients hospitalized with coronavirus disease 2019 (COVID-19), published between January 1st and May 1st, 2020. Methods We used data obtained from the Veterans Affairs Corporate Data Warehouse (CDW) between January 1st, 2020, and May 1st, 2020 as an external validation cohort. The outcome measure was hospital mortality. Areas under the ROC curves (AUCs) were used to evaluate the discrimination of the four predictive models. The Hosmer–Lemeshow (HL) goodness-of-fit test and calibration curves assessed the applicability of the models to individual cases. Results During the study period, 1634 unique patients were identified. The mean age of the study cohort was 68.8±13.4 years. Hypertension, hyperlipidemia, and heart disease were the most common comorbidities. The crude hospital mortality was 29% (95% confidence interval [CI] 0.27–0.31). Evaluation of the predictive models showed AUCs ranging from 0.63 (95% CI 0.60–0.66) to 0.72 (95% CI 0.69–0.74), indicating poor to fair discrimination across all models. There were no significant differences among the AUC values of the four prognostic systems. All models calibrated poorly, either overestimating or underestimating hospital mortality. Conclusions All four prognostic models examined in this study carry a high risk of bias. The performance of these scores should be interpreted with caution in hospitalized patients with COVID-19.
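The Hosmer–Lemeshow goodness-of-fit test used here to assess calibration bins patients into groups of predicted risk (typically deciles) and compares observed to expected events per bin with a chi-square statistic. A generic sketch, not the authors' implementation:

```python
# Sketch: Hosmer-Lemeshow goodness-of-fit test for a risk model.
import numpy as np
from scipy import stats

def hosmer_lemeshow(y_true, p_pred, n_bins=10):
    """Chi-square over risk-sorted bins; small p-value = poor calibration."""
    order = np.argsort(p_pred)
    y, p = np.asarray(y_true)[order], np.asarray(p_pred)[order]
    chi2 = 0.0
    for grp_y, grp_p in zip(np.array_split(y, n_bins), np.array_split(p, n_bins)):
        obs, exp, n = grp_y.sum(), grp_p.sum(), len(grp_y)
        # denominator n * pbar * (1 - pbar), with pbar = exp / n
        chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n))
    p_value = stats.chi2.sf(chi2, df=n_bins - 2)
    return chi2, p_value

rng = np.random.default_rng(7)
p = rng.uniform(0.05, 0.6, 1000)
y = rng.binomial(1, p)
chi2_stat, p_value = hosmer_lemeshow(y, p)
print(chi2_stat, p_value)
```

When a model over- or underestimates mortality, observed and expected counts diverge across bins and the test rejects, which is the pattern this study reports for all four models.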


2018 ◽  
Vol 17 (8) ◽  
pp. 675-689 ◽  
Author(s):  
Satish M Mahajan ◽  
Paul Heidenreich ◽  
Bruce Abbott ◽  
Ana Newton ◽  
Deborah Ward

Aims: Readmission rates for patients with heart failure have consistently remained high over the past two decades. As more electronic data, computing power, and newer statistical techniques become available, data-driven care could be achieved by creating predictive models for adverse outcomes such as readmissions. We therefore aimed to review models for predicting risk of readmission for patients admitted for heart failure. We also aimed to analyze and, where possible, group the predictors used across the models. Methods: Major electronic databases were searched to identify studies that examined the correlation between readmission for heart failure and risk factors using multivariate models. We rigorously followed the review process using PRISMA methodology and other established criteria for quality assessment of the studies. Results: We performed a detailed review of 334 papers and found 25 multivariate predictive models built using data from either health systems or trials. The majority of models were built using multiple logistic regression, followed by Cox proportional hazards regression. Some newer studies ventured into non-parametric and machine learning methods. Overall predictive accuracy, measured by C-statistics, ranged from 0.59 to 0.84. We examined significant predictors across the studies in clinical, administrative, and psychosocial groups. Conclusions: Complex disease management and correspondingly increasing costs for heart failure are driving innovations in building risk prediction models for readmission. Large volumes of diverse electronic data and new statistical methods have improved the predictive power of the models over the past two decades. More work is needed on calibration, external validation, and deployment of such models for clinical use.


Stroke ◽  
2021 ◽  
Author(s):  
Michiel H.F. Poorthuis ◽  
Reinier A.R. Herings ◽  
Kirsten Dansey ◽  
Johanna A.A. Damen ◽  
Jacoba P. Greving ◽  
...  

Background and Purpose: The net benefit of carotid endarterectomy (CEA) is determined partly by the risk of procedural stroke or death. Current guidelines recommend CEA if 30-day risks are <6% for symptomatic stenosis and <3% for asymptomatic stenosis. We aimed to identify prediction models for procedural stroke or death after CEA and to externally validate these models in a large registry of patients from the United States. Methods: We conducted a systematic search in MEDLINE and EMBASE for prediction models of procedural outcomes after CEA. We validated these models with data from patients who underwent CEA in the American College of Surgeons National Surgical Quality Improvement Program (2011–2017). We assessed discrimination using C statistics and calibration graphically. We determined the number of patients with predicted risks that exceeded the recommended thresholds of procedural risk for performing CEA. Results: After screening 788 reports, 15 studies describing 17 prediction models were included. Nine were developed in populations including both asymptomatic and symptomatic patients, 2 in symptomatic, and 5 in asymptomatic populations. In the external validation cohort of 26 293 patients who underwent CEA, 702 (2.7%) developed a stroke or died within 30 days. C statistics varied between 0.52 and 0.64 using all patients, between 0.51 and 0.59 using symptomatic patients, and between 0.49 and 0.58 using asymptomatic patients. The Ontario Carotid Endarterectomy Registry model, which included symptomatic status, diabetes, heart failure, and contralateral occlusion as predictors, had a C statistic of 0.64 and the best concordance between predicted and observed risks. This model identified 4.5% of symptomatic and 2.1% of asymptomatic patients with procedural risks that exceeded the recommended thresholds.
Conclusions: Of the 17 externally validated prediction models, the Ontario Carotid Endarterectomy Registry risk model provided the most reliable predictions of procedural stroke or death after CEA; it can inform patients about procedural hazards and help focus CEA on patients who would benefit most from it.


BMJ Open ◽  
2017 ◽  
Vol 7 (9) ◽  
pp. e016591 ◽  
Author(s):  
Luke Eliot Hodgson ◽  
Alexander Sarnowski ◽  
Paul J Roderick ◽  
Borislav D Dimitrov ◽  
Richard M Venn ◽  
...  

Objective Critically appraise prediction models for hospital-acquired acute kidney injury (HA-AKI) in general populations. Design Systematic review. Data sources Medline, Embase and Web of Science until November 2016. Eligibility Studies describing development of a multivariable model for predicting HA-AKI in non-specialised adult hospital populations. Published guidance was followed for data extraction, reporting and appraisal. Results 14 046 references were screened. Of 53 HA-AKI prediction models, 11 met inclusion criteria (general medicine and/or surgery populations, 474 478 patient episodes) and five were externally validated. The most common predictors were age (n=9 models), diabetes (5), admission serum creatinine (SCr) (5), chronic kidney disease (CKD) (4), drugs (diuretics (4) and/or ACE inhibitors/angiotensin-receptor blockers (3)), bicarbonate and heart failure (4 models each). Heterogeneity was identified in outcome definition. Deficiencies in reporting included handling of predictors, missing data and sample size. Admission SCr was frequently taken to represent baseline renal function. Most models were considered at high risk of bias. Areas under the receiver operating characteristic curve for predicting HA-AKI ranged from 0.71 to 0.80 in derivation (reported in 8/11 studies), 0.66 to 0.80 in internal validation studies (n=7) and 0.65 to 0.71 in the five external validations. For calibration, the Hosmer-Lemeshow test or a calibration plot was provided in 4/11 derivations, 3/11 internal and 3/5 external validations. A minority of the models allow easy bedside calculation and potential electronic automation. No impact analysis studies were found. Conclusions AKI prediction models may help address shortcomings in risk assessment; however, in general hospital populations, few have external validation. The similar predictors across models reflect an elderly demographic with chronic comorbidities.
Reporting deficiencies mirror those of prediction research more broadly, with handling of SCr (baseline function and use as a predictor) a particular concern. Future research should focus on validation, exploration of electronic linkage and impact analysis. The latter could combine a prediction model with AKI alerting to address prevention and early recognition of evolving AKI.


2021 ◽  
Vol 8 ◽  
Author(s):  
Shanshan Gao ◽  
Gang Yin ◽  
Qing Xia ◽  
Guihai Wu ◽  
Jinxiu Zhu ◽  
...  

Background: The existing prediction models lack generalized applicability for chronic heart failure (CHF) readmission. We aimed to develop and validate a widely applicable nomogram for the prediction of 180-day readmission in these patients. Methods: We prospectively enrolled 2,980 consecutive patients with CHF from two hospitals. A nomogram was created to predict 180-day readmission based on the selected variables. The patients were divided into three datasets for development, internal validation, and external validation (mean age: 74.2 ± 14.1, 73.8 ± 14.2, and 71.0 ± 11.7 years, respectively; sex: 50.2, 48.8, and 55.2% male, respectively). At baseline, 102 variables were submitted to the least absolute shrinkage and selection operator (Lasso) regression algorithm for variable selection. The selected variables were processed by multivariable Cox proportional hazards regression modeling combined with univariate analysis and stepwise regression. The model was evaluated by the concordance index (C-index) and calibration plot. Finally, a nomogram was provided to visualize the results. The improvement in the regression model was assessed by the net reclassification index (NRI) (with tenfold cross-validation and 200 bootstraps). Results: Among the selected 2,980 patients, 1,696 (56.9%) were readmitted within 180 days, and 1,502 (50.4%) were men. A nomogram was established from the results of Lasso regression, univariate analysis, stepwise regression, and multivariate Cox regression, as well as variables with clinical significance. The C-index values were 0.75 [95% confidence interval (CI): 0.72–0.79], 0.75 [95% CI: 0.69–0.81], and 0.73 [95% CI: 0.64–0.83] for the development, internal validation, and external validation datasets, respectively. Calibration plots were provided for both the internal and external validation sets.
Five variables, namely history of acute heart failure, emergency department visit, age, blood urea nitrogen level, and beta blocker usage, were included in the final prediction model. When adding variables for mode of hospital discharge, alcohol intake, and left bundle branch block, the calculated NRI values demonstrated no significant improvement. Conclusions: A nomogram for the prediction of 180-day readmission of patients with CHF was developed and validated based on five variables. The proposed methodology can improve the accuracy of readmission prediction and has wide applicability in CHF.
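The Lasso step described above shrinks most of the 102 candidate coefficients to exactly zero, leaving a small set of predictors for the subsequent Cox modeling. The following sketch illustrates only that selection step with scikit-learn's `LassoCV` on synthetic data (the study combined Lasso with Cox regression; the continuous outcome and effect sizes here are assumptions for illustration):

```python
# Sketch: Lasso-based variable selection on synthetic data with 20
# candidate predictors, of which only the first 3 carry true signal.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
n, n_vars = 500, 20
X = rng.normal(size=(n, n_vars))
beta = np.zeros(n_vars)
beta[:3] = [1.5, -2.0, 1.0]              # only 3 true predictors
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)  # penalty chosen by CV
selected = np.flatnonzero(lasso.coef_ != 0)
print("selected columns:", selected)
```

Cross-validation chooses the penalty strength, so the number of surviving variables is data-driven rather than fixed in advance, which is what makes the approach attractive when screening 102 baseline variables.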


BMC Medicine ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kym I. E. Snell ◽  
John Allotey ◽  
Melanie Smuk ◽  
Richard Hooper ◽  
...  

Abstract Background Pre-eclampsia is a leading cause of maternal and perinatal mortality and morbidity. Early identification of women at risk during pregnancy is required to plan management. Although there are many published prediction models for pre-eclampsia, few have been validated in external data. Our objective was to externally validate published prediction models for pre-eclampsia using individual participant data (IPD) from UK studies, to evaluate whether any of the models can accurately predict the condition when used within the UK healthcare setting. Methods IPD from 11 UK cohort studies (217,415 pregnant women) within the International Prediction of Pregnancy Complications (IPPIC) pre-eclampsia network contributed to external validation of published prediction models, identified by systematic review. Cohorts that measured all predictor variables in at least one of the identified models and reported pre-eclampsia as an outcome were included for validation. We reported the model predictive performance as discrimination (C-statistic), calibration (calibration plots, calibration slope, calibration-in-the-large), and net benefit. Performance measures were estimated separately in each available study and then, where possible, combined across studies in a random-effects meta-analysis. Results Of 131 published models, 67 provided the full model equation and 24 could be validated in 11 UK cohorts. Most of the models showed modest discrimination with summary C-statistics between 0.6 and 0.7. The calibration of the predicted compared to observed risk was generally poor for most models with observed calibration slopes less than 1, indicating that predictions were generally too extreme, although confidence intervals were wide. There was large between-study heterogeneity in each model’s calibration-in-the-large, suggesting poor calibration of the predicted overall risk across populations. 
In a subset of models, the net benefit of using the models to inform clinical decisions appeared small and limited to probability thresholds between 5 and 7%. Conclusions The evaluated models had modest predictive performance, with key limitations such as poor calibration (likely due to overfitting in the original development datasets), substantial heterogeneity, and small net benefit across settings. The evidence to support the use of these prediction models for pre-eclampsia in clinical decision-making is limited. Any models that we could not validate should be examined in terms of their predictive performance, net benefit, and heterogeneity across multiple UK settings before consideration for use in practice. Trial registration PROSPERO ID: CRD42015029349.
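Net benefit, the decision-analytic measure used above, weighs true positives against false positives at a chosen probability threshold: NB = TP/n − (FP/n) × p_t/(1 − p_t). A minimal sketch of this standard formula (not the IPPIC analysis code):

```python
# Sketch: net benefit of a "treat if predicted risk >= threshold" policy.
import numpy as np

def net_benefit(y_true, p_pred, threshold):
    y = np.asarray(y_true)
    p = np.asarray(p_pred)
    n = len(y)
    treat = p >= threshold
    tp = np.sum(treat & (y == 1))  # correctly flagged cases
    fp = np.sum(treat & (y == 0))  # unnecessarily flagged non-cases
    return tp / n - fp / n * threshold / (1 - threshold)

y = np.array([1, 0, 1, 0, 0, 1, 0, 0])
p = np.array([0.8, 0.3, 0.6, 0.1, 0.4, 0.7, 0.2, 0.05])
print(net_benefit(y, p, 0.5))  # 0.375: all 3 cases flagged, no false positives
```

Plotting net benefit across a range of thresholds (a decision curve) and comparing it against treat-all and treat-none strategies is how a "small net benefit limited to thresholds between 5 and 7%" is established.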


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Virginie Tarazona ◽  
Samy Figueiredo ◽  
Sophie Hamada ◽  
Jonas Pochard ◽  
Ryan W. Haines ◽  
...  

Abstract Background Myoglobin and creatine kinase (CK) are both established markers of muscle injury, but their hospital admission values have never been compared for predicting post-traumatic acute kidney injury (AKI). Methods An observational registry study of consecutive trauma patients admitted to a major regional trauma centre. The primary outcome was stage 1 or greater AKI in the first 7 days after trauma. We assessed the association of hospital admission myoglobin or CK with development of AKI, both alone and when added to two existing risk prediction models for post-traumatic AKI. Results Of the 857 trauma patients (median age 36 [25–52], 96% blunt trauma, median ISS of 20 [12–47]) included, 102 (12%) developed AKI. Admission myoglobin performed better than CK in predicting AKI of any stage, with an AUC–ROC of 0.74 (95% CI 0.68–0.79) versus 0.63 (95% CI 0.57–0.69) (p < 0.001). Admission myoglobin also performed better than CK in predicting AKI stage 2 or 3 [AUC–ROC of 0.79 (95% CI 0.74–0.84) and 0.74 (95% CI 0.69–0.79), respectively (p < 0.001)], with a best cutoff value of 1217 µg/L (sensitivity 74%, specificity 77%). Admission myoglobin added predictive value to two established models of AKI prediction and showed significant ability to reclassify subjects regarding AKI status, while admission CK did not. Decision curve analysis also revealed that myoglobin added net benefit to established predictive models. Admission myoglobin was better than CK at predicting development of significant rhabdomyolysis. Conclusions Admission myoglobin better predicts the development of AKI and severe rhabdomyolysis after major trauma. Admission myoglobin should be added to established predictive models of post-traumatic AKI to identify high-risk patients early.
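Comparing two admission markers by AUC and reporting a "best cutoff" (here 1217 µg/L, sensitivity 74%, specificity 77%) typically means maximizing the Youden index (sensitivity + specificity − 1) along the ROC curve. A generic scikit-learn sketch on synthetic data (marker distributions and effect sizes are assumptions, not the study's data):

```python
# Sketch: AUC comparison of two markers and a Youden-optimal cutoff.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
aki = rng.binomial(1, 0.12, size=1000)        # ~12% event rate, as in the study
# synthetic markers: the myoglobin-like signal is assumed stronger
myo = aki * 1.5 + rng.normal(size=1000)
ck = aki * 0.7 + rng.normal(size=1000)

auc_myo = roc_auc_score(aki, myo)
auc_ck = roc_auc_score(aki, ck)

fpr, tpr, thresholds = roc_curve(aki, myo)
best_cutoff = thresholds[np.argmax(tpr - fpr)]  # maximizes Youden index
print(auc_myo, auc_ck, best_cutoff)
```

Formal comparison of two correlated AUCs on the same patients would additionally use a paired test such as DeLong's, which is presumably how the reported p < 0.001 was obtained.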

