Evaluating Modeling and Validation Strategies for Tooth Loss

2019 ◽  
Vol 98 (10) ◽  
pp. 1088-1095 ◽  
Author(s):  
J. Krois ◽  
C. Graetz ◽  
B. Holtfreter ◽  
P. Brinkmann ◽  
T. Kocher ◽  
...  

Prediction models learn patterns from available data (training) and are then validated on new data (testing). Prediction modeling is increasingly common in dental research. We aimed to evaluate how different model development and validation steps affect the predictive performance of tooth loss prediction models of patients with periodontitis. Two independent cohorts (627 patients, 11,651 teeth) were followed over a mean ± SD 18.2 ± 5.6 y (Kiel cohort) and 6.6 ± 2.9 y (Greifswald cohort). Tooth loss and 10 patient- and tooth-level predictors were recorded. The impact of different model development and validation steps was evaluated: 1) model complexity (logistic regression, recursive partitioning, random forest, extreme gradient boosting), 2) sample size (full data set or 10%, 25%, or 75% of cases dropped at random), 3) prediction periods (maximum 10, 15, or 20 y or uncensored), and 4) validation schemes (internal or external by centers/time). Tooth loss was generally a rare event (880 teeth were lost). All models showed limited sensitivity but high specificity. Patients’ age and tooth loss at baseline as well as probing pocket depths showed high variable importance. More complex models (random forest, extreme gradient boosting) had no consistent advantages over simpler ones (logistic regression, recursive partitioning). Internal validation (in sample) overestimated the predictive power (area under the curve up to 0.90), while external validation (out of sample) found lower areas under the curve (range 0.62 to 0.82). Reducing the sample size decreased the predictive power, particularly for more complex models. Censoring the prediction period had only limited impact. When the model was trained in one period and tested in another, model outcomes were similar to the base case, indicating temporal validation as a valid option. No model showed higher accuracy than the no-information rate. In conclusion, none of the developed models would be useful in a clinical setting, despite high accuracy. During modeling, rigorous development and external validation should be applied and reported accordingly.
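To make the in-sample versus out-of-sample contrast reported above concrete, the following is a minimal Python sketch using scikit-learn on a single synthetic dataset; it is not the authors' pipeline, and the held-out part merely stands in for a second center, so the gap it shows is only illustrative of validation optimism, not of the study's numbers.

```python
# Minimal sketch, assuming scikit-learn and a synthetic dataset split into a
# development part and a held-out part standing in for another center; the
# rare-event weighting and model settings are illustrative, not the study's.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=7000, n_features=10, weights=[0.92], random_state=1)
X_dev, X_ext, y_dev, y_ext = train_test_split(X, y, test_size=2000, stratify=y, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    model.fit(X_dev, y_dev)
    auc_in = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])   # in-sample, optimistic
    auc_out = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])  # out-of-sample
    print(f"{name}: in-sample AUC = {auc_in:.2f}, out-of-sample AUC = {auc_out:.2f}")
```

A true external validation, as in the study, uses an independently collected cohort from another center or period; splitting one synthetic dataset only approximates that situation.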

2021 ◽  
Vol 8 ◽  
Author(s):  
Ming-Hui Hung ◽  
Ling-Chieh Shih ◽  
Yu-Ching Wang ◽  
Hsin-Bang Leu ◽  
Po-Hsun Huang ◽  
...  

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit. Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which was split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGBoost), and artificial neural network (ANN). Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGBoost, and ANN models showed a much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively). Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.
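As a rough illustration of the split-and-evaluate scheme reported above (a 679/146/145 split followed by sensitivity, specificity, NPV and accuracy on the test set), here is a minimal scikit-learn sketch on synthetic data; the feature set, random-forest settings and split seeds are assumptions, not the study's actual variables or tuning.

```python
# Minimal sketch, assuming scikit-learn and synthetic data shaped like the first
# cohort (970 patients, 33 candidate variables, 679/146/145 split); the model
# settings are illustrative, not the study's tuned random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=970, n_features=33, n_informative=6, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=679, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=146, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, rf.predict(X_test)).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("NPV:        ", tn / (tn + fn))
print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
```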


2021 ◽  
Author(s):  
Steven J. Staffa ◽  
David Zurakowski

Summary: Clinical prediction models in anesthesia and surgery research have many clinical applications, including preoperative risk stratification with implications for clinical utility in decision-making, resource utilization, and costs. It is imperative that predictive algorithms and multivariable models are validated in a suitable and comprehensive way in order to establish the robustness of the model in terms of accuracy, predictive ability, reliability, and generalizability. The purpose of this article is to educate anesthesia researchers at an introductory level on important statistical concepts involved in the development and validation of multivariable prediction models for a binary outcome. Methods covered include assessments of discrimination and calibration through internal and external validation. An anesthesia research publication is examined to illustrate the process and presentation of multivariable prediction model development and validation for a binary outcome. Properly assessing the statistical and clinical validity of a multivariable prediction model is essential for assuring the generalizability and reproducibility of the published tool.
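The two concepts this article centers on, discrimination and calibration, can each be computed in a few lines. The sketch below shows one common approach on simulated predictions: the c-statistic (AUC) for discrimination and a logistic recalibration fit for a calibration intercept and slope. It is an illustration of the general technique, not the article's worked example.

```python
# Minimal sketch, assuming simulated predictions: discrimination via the
# c-statistic (AUC) and calibration via a logistic recalibration fit that
# estimates a calibration intercept and slope (ideal values 0 and 1).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p_val = rng.uniform(0.05, 0.95, size=500)   # model-predicted risks (placeholder)
y_val = rng.binomial(1, p_val)              # outcomes simulated from those risks

c_statistic = roc_auc_score(y_val, p_val)   # discrimination

logit_p = np.log(p_val / (1 - p_val))       # predictions on the logit scale
recal = sm.Logit(y_val, sm.add_constant(logit_p)).fit(disp=0)
calib_intercept, calib_slope = recal.params
print(f"c-statistic {c_statistic:.2f}, "
      f"calibration intercept {calib_intercept:.2f}, slope {calib_slope:.2f}")
```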


Author(s):  
Isabelle Kaiser ◽  
Annette B. Pfahlberg ◽  
Wolfgang Uter ◽  
Markus V. Heppt ◽  
Marit B. Veierød ◽  
...  

The rising incidence of cutaneous melanoma over the past few decades has prompted substantial efforts to develop risk prediction models that identify people at high risk of developing melanoma, in order to facilitate targeted screening programs. We review these models with regard to study characteristics, differences in risk factor selection and assessment, and evaluation and validation methods. Our systematic literature search revealed 40 studies comprising 46 different risk prediction models eligible for the review. Altogether, 35 different risk factors were part of the models, with nevi being the most common one (n = 35, 78%); little consistency in other risk factors was observed. Results of an internal validation were reported for less than half of the studies (n = 18, 45%), and only 6 performed external validation. In terms of model performance, 29 studies assessed the discriminative ability of their models; other performance measures, e.g., regarding calibration or clinical usefulness, were rarely reported. Due to the substantial heterogeneity in risk factor selection and assessment as well as in methodologic aspects of model development, direct comparisons between models are hardly possible. For that reason, uniform methodologic standards for the development and validation of melanoma risk prediction models, and reporting standards for the accompanying publications, are necessary and should be made obligatory.


2021 ◽  
Author(s):  
Cynthia Yang ◽  
Jan A. Kors ◽  
Solomon Ioannou ◽  
Luis H. John ◽  
Aniek F. Markus ◽  
...  

Objectives: This systematic review aims to provide further insights into the conduct and reporting of clinical prediction model development and validation over time. We focus on assessing the reporting of information necessary to enable external validation by other investigators. Materials and Methods: We searched Embase, Medline, Web of Science, the Cochrane Library and Google Scholar to identify studies that developed one or more multivariable prognostic prediction models using electronic health record (EHR) data published in the period 2009-2019. Results: We identified 422 studies that developed a total of 579 clinical prediction models using EHR data. We observed a steep increase over the years in the number of developed models. The percentage of models externally validated in the same paper remained at around 10%. Throughout 2009-2019, for both the target population and the outcome definitions, code lists were provided for less than 20% of the models. For about half of the models that were developed using regression analysis, the final model was not completely presented. Discussion: Overall, we observed limited improvement over time in the conduct and reporting of clinical prediction model development and validation. In particular, the prediction problem definition was often not clearly reported, and the final model was often not completely presented. Conclusion: Improvement in the reporting of information necessary to enable external validation by other investigators is still urgently needed to increase clinical adoption of developed models.
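One concrete way to "completely present" a final regression model, as this review calls for, is to publish the intercept and every coefficient so other investigators can recompute predicted risks on their own data. The sketch below illustrates that idea with scikit-learn on synthetic data; the variable names are hypothetical and not taken from any reviewed study.

```python
# Minimal, illustrative sketch of reporting a final logistic regression model in
# full (intercept plus every coefficient) so that other investigators can
# recompute predicted risks; the variable names below are hypothetical.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = ["age", "sex", "systolic_bp", "diabetes", "smoker"]  # hypothetical predictors
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(f"intercept: {model.intercept_[0]:+.4f}")
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name:>12}: {coef:+.4f}")
# With these numbers published, predicted risk = 1 / (1 + exp(-(intercept + sum(coef_i * x_i)))).
```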


2021 ◽  
Vol 11 (11) ◽  
pp. 1055
Author(s):  
Pei-Chen Lin ◽  
Kuo-Tai Chen ◽  
Huan-Chieh Chen ◽  
Md. Mohaimenul Islam ◽  
Ming-Chin Lin

Accurate stratification of sepsis can effectively guide the triage of patient care and shared decision making in the emergency department (ED). However, previous research on sepsis identification models focused mainly on ICU patients, and discrepancies in model performance between the development and external validation datasets are rarely evaluated. The aim of our study was to develop and externally validate a machine learning model to stratify sepsis patients in the ED. We retrospectively collected clinical data from two geographically separate institutes that provided different levels of care at different time periods. The Sepsis-3 criteria were used as the reference standard in both datasets for identifying true sepsis cases. An eXtreme Gradient Boosting (XGBoost) algorithm was developed to stratify sepsis patients, and the performance of the model was compared with that of traditional clinical sepsis tools: the quick Sequential Organ Failure Assessment (qSOFA) and the Systemic Inflammatory Response Syndrome (SIRS) criteria. There were 8296 patients (1752 (21%) being septic) in the development dataset and 1744 patients (506 (29%) being septic) in the external validation dataset. The mortality of septic patients in the development and validation datasets was 13.5% and 17%, respectively. In the internal validation, XGBoost achieved an area under the receiver operating characteristic curve (AUROC) of 0.86, exceeding SIRS (0.68) and qSOFA (0.56). The performance of XGBoost deteriorated in the external validation (the AUROC of XGBoost, SIRS and qSOFA was 0.75, 0.57 and 0.66, respectively). Heterogeneity in patient characteristics, such as sepsis prevalence, severity, age, comorbidity and infection focus, could reduce model performance. Our model showed good discriminative capability for the identification of sepsis patients and outperformed the existing sepsis identification tools. Implementation of the ML model in the ED can facilitate timely sepsis identification and treatment. However, dataset discrepancies should be carefully evaluated before implementing the ML approach in clinical practice. This finding reinforces the necessity for future studies to perform external validation to ensure the generalisability of any developed ML approach.
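A minimal sketch of the development-versus-external comparison described above, using the XGBoost scikit-learn wrapper; the prevalences, features and hyperparameters are assumptions, the "external" part is just a held-out split of one synthetic dataset, and the in-sample AUROC shown is optimistic by construction.

```python
# Minimal sketch, assuming the xgboost scikit-learn wrapper; one synthetic dataset
# is split into a development part and a held-out part standing in for the
# external-site cohort (prevalence and settings are illustrative).
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10040, n_features=20, weights=[0.78], random_state=1)
X_dev, X_ext, y_dev, y_ext = train_test_split(X, y, test_size=1744, stratify=y, random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1, eval_metric="logloss")
clf.fit(X_dev, y_dev)

print("development (in-sample) AUROC:", roc_auc_score(y_dev, clf.predict_proba(X_dev)[:, 1]))
print("held-out ('external') AUROC:  ", roc_auc_score(y_ext, clf.predict_proba(X_ext)[:, 1]))
```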


BMJ Open ◽  
2017 ◽  
Vol 7 (8) ◽  
pp. e014607 ◽  
Author(s):  
Marion Fahey ◽  
Anthony Rudd ◽  
Yannick Béjot ◽  
Charles Wolfe ◽  
Abdel Douiri

Introduction: Stroke is a leading cause of adult disability and death worldwide. The neurological impairments associated with stroke prevent patients from performing basic daily activities and have an enormous impact on families and caregivers. Practical and accurate tools to assist in predicting outcome after stroke at the patient level can provide significant aid for patient management. Furthermore, prediction models of this kind can be useful for clinical research, health economics, policymaking and clinical decision support. Methods: 2869 patients with first-ever stroke from the South London Stroke Register (SLSR) (1995–2004) will be included in the development cohort. We will use information captured after baseline to construct multilevel models and a Cox proportional hazards model to predict cognitive impairment, functional outcome and mortality up to 5 years after stroke. Repeated random subsampling validation (Monte Carlo cross-validation) will be used in model development. Data from participants recruited to the stroke register (2005–2014) will be used for temporal validation of the models. Data from participants recruited to the Dijon Stroke Register (1985–2015) will be used for external validation. Discrimination, calibration and clinical utility of the models will be presented. Ethics: Patients, or their relatives for patients who cannot consent, gave written informed consent to participate in stroke-related studies within the SLSR. The SLSR design was approved by the ethics committees of Guy’s and St Thomas’ NHS Foundation Trust, King’s College Hospital, Queen Square and Westminster Hospitals (London). The Dijon Stroke Registry was approved by the Comité National des Registres and the InVS and has the authorisation of the Commission Nationale de l’Informatique et des Libertés.
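The repeated random subsampling (Monte Carlo cross-validation) named in this protocol can be run with scikit-learn's ShuffleSplit. The sketch below is a generic illustration on synthetic data with a logistic model; it is not the registry data, and the planned multilevel and Cox models are not reproduced here.

```python
# Minimal sketch of repeated random subsampling (Monte Carlo) cross-validation
# with scikit-learn's ShuffleSplit; the data and the logistic model are generic
# placeholders, not the registry data or the planned multilevel/Cox models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

X, y = make_classification(n_samples=2869, n_features=12, random_state=0)
splitter = ShuffleSplit(n_splits=100, test_size=0.25, random_state=0)

aucs = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(f"Monte Carlo CV AUC: mean = {np.mean(aucs):.3f}, SD = {np.std(aucs):.3f}")
```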


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
A Youssef

Abstract Study question: Which models that predict pregnancy outcome in couples with unexplained RPL exist, and what is the performance of the most used model? Summary answer: We identified seven prediction models; none followed the recommended prediction model development steps. Moreover, the most used model showed poor predictive performance. What is known already: RPL remains unexplained in 50–75% of couples. For these couples, there is no effective treatment option and clinical management rests on supportive care. An essential part of supportive care consists of counselling on the prognosis of subsequent pregnancies. Multiple prediction models exist; however, their quality and validity vary. The prediction model developed by Brigham et al. is the most widely used model, but it has never been externally validated. Study design, size, duration: We performed a systematic review to identify prediction models for pregnancy outcome after unexplained RPL. In addition, we performed an external validation of the Brigham model in a retrospective cohort consisting of 668 couples with unexplained RPL that visited our RPL clinic between 2004 and 2019. Participants/materials, setting, methods: A systematic search was performed in December 2020 in Pubmed, Embase, Web of Science and the Cochrane Library to identify relevant studies. Eligible studies were selected and assessed according to the TRIPOD guidelines, covering topics on model performance and validation. The performance of the Brigham model in predicting live birth was evaluated through calibration and discrimination, in which the observed pregnancy rates were compared to the predicted pregnancy rates. Main results and the role of chance: Seven models were compared and assessed according to the TRIPOD statement. This resulted in two studies of low, three of moderate and two of above-average reporting quality. These studies did not follow the recommended steps for model development and did not calculate a sample size. Furthermore, the predictive performance of these models was neither internally nor externally validated. We performed an external validation of the Brigham model. Calibration showed overestimation by the model and too-extreme predictions, with a calibration intercept of –0.52 (95% CI –0.68 to –0.36) and a calibration slope of 0.39 (95% CI 0.07 to 0.71). The discriminative ability of the model was very low, with a concordance statistic of 0.55 (95% CI 0.50 to 0.59). Limitations, reasons for caution: None of the studies were specifically labelled as prediction models, so models may have been missed in the selection process. The external validation cohort had a retrospective design, in which only the first pregnancy after intake was registered. Follow-up time was not limited, which is important when counselling unexplained RPL couples. Wider implications of the findings: Currently, there are no suitable models that predict pregnancy outcome after RPL. We need a model that includes several variables, from both the female and the male partner, so that the prognosis can be individualized for each couple. Trial registration number: Not applicable
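For context on the reported concordance statistic and its confidence interval, the sketch below shows a simple bootstrap CI for the c-statistic; the predictions and outcomes are simulated placeholders, not the validation cohort, and the Brigham model itself is not reproduced here.

```python
# Minimal, illustrative sketch of a bootstrap confidence interval for the
# concordance (c-) statistic; predictions and outcomes are simulated placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 668
p_pred = rng.uniform(0.3, 0.9, size=n)   # predicted live-birth probabilities (placeholder)
y_obs = rng.binomial(1, p_pred)          # outcomes simulated from those predictions

boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)     # resample couples with replacement
    if len(np.unique(y_obs[idx])) < 2:   # an AUC needs both outcomes present
        continue
    boot_aucs.append(roc_auc_score(y_obs[idx], p_pred[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"c-statistic {roc_auc_score(y_obs, p_pred):.2f} (95% CI {lo:.2f} to {hi:.2f})")
```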


Atmosphere ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 341 ◽  
Author(s):  
Qingwen Jin ◽  
Xiangtao Fan ◽  
Jian Liu ◽  
Zhuxin Xue ◽  
Hongdeng Jian

Coastal cities in China are frequently hit by tropical cyclones (TCs), which result in tremendous loss of life and property. Even though the capability of numerical weather prediction models to forecast and track TCs has considerably improved in recent years, forecasting the intensity of a TC is still very difficult; thus, it is necessary to improve the accuracy of TC intensity prediction. To this end, we established a series of predictors using the Best Track TC dataset to predict the intensity of TCs in the Western North Pacific with an eXtreme Gradient Boosting (XGBoost) model. The climatology and persistence factors, environmental factors, brainstorm features, intensity categories, and TC months are considered inputs for the models, while the output is the TC intensity. The performance of the XGBoost model was tested for very strong TCs such as Hato (2017), Rammasun (2014), Mujigae (2015), and Hagupit (2014). The results obtained show that the chosen combination of inputs was optimal for predicting TC intensification at lead times of 6, 12, 18, and 24 h. Furthermore, the mean absolute error (MAE) of the XGBoost model was much smaller than the MAEs of a back propagation neural network (BPNN) used to predict TC intensity. The MAEs of the forecasts with 6, 12, 18, and 24 h lead times for the test samples used were 1.61, 2.44, 3.10, and 3.70 m/s, respectively, for the XGBoost model. The results indicate that the XGBoost model developed in this study can be used to improve TC intensity forecast accuracy and can be considered a better alternative to conventional operational forecast models for TC intensity prediction.
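As a generic illustration of the modeling step described above, the following sketch fits an XGBoost regressor and scores it with mean absolute error; the synthetic predictors merely stand in for the climatology, persistence and environmental features, and none of the settings are the authors'.

```python
# Minimal sketch, assuming the xgboost scikit-learn wrapper; the synthetic
# predictors stand in for the climatology/persistence and environmental features,
# and the intensity target and settings are illustrative, not the authors'.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 15))                                        # placeholder predictors
y = 30 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=2, size=5000)    # synthetic intensity (m/s)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
reg = XGBRegressor(n_estimators=400, max_depth=5, learning_rate=0.05).fit(X_train, y_train)
print("test MAE (m/s):", mean_absolute_error(y_test, reg.predict(X_test)))
```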


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dougho Park ◽  
Byung Hee Kim ◽  
Sang-Eok Lee ◽  
Dong Young Kim ◽  
Mansu Kim ◽  
...  

Abstract: Identifying the severity of carpal tunnel syndrome (CTS) is essential to providing appropriate therapeutic interventions. We developed and validated machine-learning (ML) models for classifying CTS severity. Here, 1037 CTS hands with 11 variables each were retrospectively analyzed. CTS was confirmed using electrodiagnosis, and its severity was classified into three grades: mild, moderate, and severe. The dataset was randomly split into a training (70%) and test (30%) set. A total of 507 mild, 276 moderate, and 254 severe CTS hands were included. Extreme gradient boosting (XGB) showed the highest external validation accuracy in the multi-class classification at 76.6% (95% confidence interval [CI] 71.2–81.5). XGB also had an optimal model training accuracy of 76.1%. Random forest (RF) and k-nearest neighbors had the second-highest external validation accuracy of 75.6% (95% CI 70.0–80.5). For the RF and XGB models, the numeric rating scale of pain was the most important variable, and body mass index was the second most important. The one-versus-rest classification yielded improved external validation accuracies for each severity grade compared with the multi-class classification (mild, 83.6%; moderate, 78.8%; severe, 90.9%). The CTS severity classification based on the ML model was validated and is readily applicable to aiding clinical evaluations.
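The multi-class versus one-versus-rest comparison reported above can be sketched as follows with XGBoost on synthetic three-class data standing in for the mild, moderate and severe grades; the variables, tuning and split are assumptions rather than the study's.

```python
# Minimal sketch, assuming xgboost and synthetic three-class data standing in for
# the mild / moderate / severe grades; variables, split and settings are
# illustrative, not the study's data or tuning.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1037, n_features=11, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Multi-class model over all three grades at once.
multi = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="mlogloss").fit(X_train, y_train)
print("multi-class accuracy:", accuracy_score(y_test, multi.predict(X_test)))

# One-versus-rest: a separate binary model per grade.
for grade, label in enumerate(["mild", "moderate", "severe"]):
    ovr = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
    ovr.fit(X_train, (y_train == grade).astype(int))
    acc = accuracy_score((y_test == grade).astype(int), ovr.predict(X_test))
    print(f"one-vs-rest accuracy ({label}): {acc:.3f}")
```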


2020 ◽  
Vol 71 (16) ◽  
pp. 2079-2088 ◽  
Author(s):  
Kun Wang ◽  
Peiyuan Zuo ◽  
Yuwei Liu ◽  
Meng Zhang ◽  
Xiaofang Zhao ◽  
...  

Abstract Background: This study aimed to develop mortality-prediction models for patients with coronavirus disease-2019 (COVID-19). Methods: The training cohort included consecutive COVID-19 patients at the First People’s Hospital of Jiangxia District in Wuhan, China, from 7 January 2020 to 11 February 2020. We selected baseline variables through the stepwise Akaike information criterion and an ensemble XGBoost (extreme gradient boosting) model to build the mortality-prediction models. We then validated these models in randomly collected COVID-19 patients at Union Hospital, Wuhan, from 1 January 2020 to 20 February 2020. Results: A total of 296 COVID-19 patients were enrolled in the training cohort; 19 died during hospitalization and 277 were discharged from the hospital. The clinical model, developed using age, history of hypertension, and coronary heart disease, showed an area under the curve (AUC) of 0.88 (95% confidence interval [CI], .80–.95); threshold, −2.6551; sensitivity, 92.31%; specificity, 77.44%; and negative predictive value (NPV), 99.34%. The laboratory model, developed using age, high-sensitivity C-reactive protein, peripheral capillary oxygen saturation, neutrophil and lymphocyte count, d-dimer, aspartate aminotransferase, and glomerular filtration rate, had significantly stronger discriminatory power than the clinical model (P = .0157), with an AUC of 0.98 (95% CI, .92–.99); threshold, −2.998; sensitivity, 100.00%; specificity, 92.82%; and NPV, 100.00%. In the subsequent validation cohort (N = 44), the AUC (95% CI) was 0.83 (.68–.93) and 0.88 (.75–.96) for the clinical model and laboratory model, respectively. Conclusions: We developed 2 predictive models for the in-hospital mortality of patients with COVID-19 in Wuhan that were validated in patients from another center.
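Backward stepwise selection by the Akaike information criterion, one of the variable-selection steps named in the Methods, can be sketched as below with statsmodels; the cohort size, candidate predictors and effect sizes are synthetic placeholders, not the Wuhan data, and the ensemble XGBoost step is not reproduced here.

```python
# Minimal, illustrative sketch of backward stepwise selection by the Akaike
# information criterion (AIC) for a logistic model with statsmodels; the cohort,
# candidate predictors and effect sizes below are synthetic placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 296
df = pd.DataFrame({
    "age": rng.normal(60, 12, n),
    "hypertension": rng.integers(0, 2, n),
    "chd": rng.integers(0, 2, n),
    "noise": rng.normal(0, 1, n),
})
# Simulate mortality driven by age and hypertension only.
logit_true = -9 + 0.12 * df["age"] + 0.8 * df["hypertension"]
y = rng.binomial(1, 1 / (1 + np.exp(-logit_true)))

selected = list(df.columns)
while len(selected) > 1:
    base_aic = sm.Logit(y, sm.add_constant(df[selected])).fit(disp=0).aic
    # AIC of each candidate model with one variable dropped.
    trials = {v: sm.Logit(y, sm.add_constant(df[[c for c in selected if c != v]])).fit(disp=0).aic
              for v in selected}
    drop_var, drop_aic = min(trials.items(), key=lambda kv: kv[1])
    if drop_aic < base_aic:          # dropping this variable lowers (improves) the AIC
        selected.remove(drop_var)
    else:
        break
print("variables retained by stepwise AIC:", selected)
```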

