Completeness of reporting of clinical prediction models developed using supervised machine learning: A systematic review

Author(s):  
Constanza L Andaur Navarro ◽  
Johanna AA Damen ◽  
Toshihiko Takada ◽  
Steven WJ Nijman ◽  
Paula Dhiman ◽  
...  

Abstract Objective: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. Our aim was to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Study design and setting: We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields (PROSPERO, CRD42019161764). We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. Results: Our search identified 24,814 articles, of which 152 were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0–46.4%) of TRIPOD items. No article fully adhered to complete reporting of the abstract, and very few completely reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), an appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and the model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of the source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). Conclusion: Similar to studies using conventional statistical techniques, the completeness of reporting is poor. Essential information for deciding whether to use the model (i.e. model specification and its performance) is rarely reported. 
However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste.
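The two summary measures used above, per-article adherence and per-item adherence with a 95% CI, are straightforward to compute. A minimal stdlib-only sketch, assuming a Wilson score interval for the per-item proportions (a common choice, which reproduces the reported 2.2 to 9.2 interval for the title item, 7 of 152 articles); the item names and toy article data are hypothetical:

```python
import math
import statistics

def article_adherence(items):
    """Percentage of applicable TRIPOD items completely reported in one article.
    `items` maps item id -> True (completely reported) / False (not)."""
    return 100.0 * sum(items.values()) / len(items)

def wilson_ci(k, n, z=1.96):
    """Wilson score confidence interval for a per-item proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Per-article summary across reviewed articles (toy data):
scores = [article_adherence(a) for a in [
    {"title": False, "abstract": False, "source_of_data": True},
    {"title": True, "abstract": False, "source_of_data": True},
]]
median_adherence = statistics.median(scores)

# Per-item summary, e.g. an item completely reported by 7 of 152 articles:
lo, hi = wilson_ci(7, 152)
```

With k = 7 and n = 152 this gives roughly 2.2% to 9.2%, matching the interval reported for the title item.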

2022 ◽  
Vol 22 (1) ◽  


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions We have developed a machine learning model with better predictive performance than traditional LR prediction models and existing risk scoring systems for postoperative ARF. This model can provide early warnings when high-risk patients are identified, enabling clinicians to take prompt measures.
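The AUC comparisons above reduce to a simple probabilistic statement: the AUC is the probability that a randomly chosen patient who developed ARF is assigned a higher risk score than one who did not. An illustrative stdlib-only sketch of that computation (not the authors' code):

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC of the ROC curve via its Mann-Whitney interpretation:
    the fraction of (positive, negative) pairs in which the positive
    case receives the higher risk score, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A model whose scores perfectly separate the two groups scores 1.0; scores carrying no information give 0.5, which is why the reported 0.82 versus 0.77 gap is meaningful.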


Author(s):  
Sooyoung Yoo ◽  
Jinwook Choi ◽  
Borim Ryu ◽  
Seok Kim

Abstract Background Unplanned hospital readmission after discharge reflects low satisfaction and reliability in care and the possibility of potential medical accidents, and is thus indicative of the quality of patient care and the appropriateness of discharge plans. Objectives The purpose of this study was to develop and validate prediction models for all-cause unplanned hospital readmissions within 30 days of discharge, based on a common data model (CDM), which can be applied to multiple institutions for efficient readmission management. Methods Retrospective patient-level prediction models were developed based on clinical data of two tertiary general university hospitals converted into the CDM developed by the Observational Medical Outcomes Partnership. Machine learning classification models based on the LASSO logistic regression model, decision tree, AdaBoost, random forest, and gradient boosting machine (GBM) were developed and tested by manipulating a set of CDM variables. An internal 10-fold cross-validation was performed on the target data of the model. To examine its transportability, the model was externally validated. Model performance was evaluated using area under the curve (AUC) values. Results Based on the time interval for outcome prediction, the prediction model targeting the variables obtained within 30 days of discharge was confirmed to be the most efficient (AUC of 82.75). The external validation showed that the model is transferable, with the combination of various clinical covariates. Above all, the prediction model based on the GBM showed the highest AUC performance of 84.14 ± 0.015 for the Seoul National University Hospital cohort, yielding 78.33 in external validation. Conclusions This study showed that readmission prediction models developed using machine learning techniques and a CDM can be a useful tool to compare two hospitals in terms of patient-data features.
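The internal 10-fold cross-validation described above partitions the target data so that every record is held out exactly once. A stdlib-only sketch of the index bookkeeping (illustrative only, not the study's implementation):

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Yield (train, test) index lists for k-fold cross-validation:
    shuffle once, split into k near-equal folds, hold each fold out in turn."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test
```

Averaging a per-fold metric such as the AUC over the k held-out folds gives the cross-validated estimate reported by studies like this one.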


BMJ Open ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. e038832
Author(s):  
Constanza L Andaur Navarro ◽  
Johanna A A G Damen ◽  
Toshihiko Takada ◽  
Steven W J Nijman ◽  
Paula Dhiman ◽  
...  

Introduction Studies addressing the development and/or validation of diagnostic and prognostic prediction models are abundant in most clinical domains. Systematic reviews have shown that the methodological and reporting quality of prediction model studies is suboptimal. Due to the increasing availability of larger, routinely collected and complex medical data, and the rising application of Artificial Intelligence (AI) or machine learning (ML) techniques, the number of prediction model studies is expected to increase even further. Prediction models developed using AI or ML techniques are often labelled as a ‘black box’ and little is known about their methodological and reporting quality. Therefore, this comprehensive systematic review aims to evaluate the reporting quality, the methodological conduct, and the risk of bias of prediction model studies that applied ML techniques for model development and/or validation. Methods and analysis A search will be performed in PubMed to identify studies developing and/or validating prediction models using any ML methodology and across all medical fields. Studies will be included if they were published between January 2018 and December 2019, predict patient-related outcomes, use any study design or data source, and are available in English. Screening of search results and data extraction from included articles will be performed by two independent reviewers. The primary outcomes of this systematic review are: (1) the adherence of ML-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), and (2) the risk of bias in such studies as assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A narrative synthesis will be conducted for all included studies. 
Findings will be stratified by study type, medical field and prevalent ML methods, and will inform necessary extensions or updates of TRIPOD and PROBAST to better address prediction model studies that used AI or ML techniques. Ethics and dissemination Ethical approval is not required for this study because only available published data will be analysed. Findings will be disseminated through peer-reviewed publications and scientific conferences. Systematic review registration PROSPERO, CRD42019161764.


2021 ◽  
Author(s):  
Jaeyoung Yang ◽  
Hong-Gook Lim ◽  
Wonhyeong Park ◽  
Dongseok Kim ◽  
Jin Sun Yoon ◽  
...  

Abstract Background Prediction of mortality in intensive care units is very important. Thus, various mortality prediction models have been developed for this purpose. However, they do not accurately reflect the changing condition of the patient in real time. The aim of this study was to develop and evaluate a machine learning model that predicts short-term mortality in the intensive care unit using four easy-to-collect vital signs. Methods Two independent retrospective observational cohorts were included in this study. The primary training cohort included the data of 1968 patients admitted to the intensive care unit at the Veterans Health Service Medical Center, Seoul, South Korea, from January 2018 to March 2019. The external validation cohort comprised the records of 409 patients admitted to the medical intensive care unit at Seoul National University Hospital, Seoul, South Korea, from January 2019 to December 2019. Datasets of four vital signs (heart rate, systolic blood pressure, diastolic blood pressure, and peripheral capillary oxygen saturation [SpO2]) measured every hour for 10 h were used for the development of the machine learning model. The performances of mortality prediction models generated using five machine learning algorithms, Random Forest (RF), XGBoost, perceptron, convolutional neural network, and Long Short-Term Memory, were calculated and compared using area under the receiver operating characteristic curve (AUROC) values and an external validation dataset. Results The machine learning model generated using the RF algorithm showed the best performance. Its AUROC was 0.922, which is much better than the 0.8408 of the Acute Physiology and Chronic Health Evaluation II. To investigate the importance of variables that influence the performance of the machine learning model, machine learning models were generated for each observation time or vital sign using the RF algorithm. 
The machine learning model developed using SpO2 showed the best performance (AUROC, 0.89). Conclusions The mortality prediction model developed in this study using data from only four types of commonly recorded vital signs is simpler than any existing mortality prediction model. This simple yet powerful new mortality prediction model could be useful for early detection of probable mortality and appropriate medical intervention, especially in rapidly deteriorating patients.
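One plausible way to feed "four vital signs measured every hour for 10 h" into tabular learners such as RF or XGBoost is to flatten each patient window into a fixed 40-value feature row. The layout below is a hypothetical sketch, not taken from the paper:

```python
VITALS = ("heart_rate", "systolic_bp", "diastolic_bp", "spo2")

def flatten_window(window):
    """Turn 10 hourly measurements of 4 vital signs into one 40-feature row,
    ordered vital-major: all 10 heart rates, then all 10 systolic BPs, etc.
    `window` is a list of 10 dicts mapping vital name -> measured value."""
    if len(window) != 10:
        raise ValueError("expected 10 hourly records")
    return [hour[v] for v in VITALS for hour in window]
```

Fixing the ordering matters: every patient must map the same vital and hour to the same column for the per-vital importance analysis described above to be meaningful.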


Circulation ◽  
2020 ◽  
Vol 141 (Suppl_1) ◽  
Author(s):  
Sridharan Raghavan ◽  
Wenhui Liu ◽  
Anna Baron ◽  
David Saxon ◽  
Meg Plomondon ◽  
...  

Accurate assessment of hypoglycemia risk is critical for treatment selection in individuals with diabetes and cardiovascular disease (CVD), patients for whom hypoglycemia is particularly harmful. We developed and validated a hypoglycemia prediction model in diabetes patients with and without CVD using data routinely available in electronic health records (EHR) and compared performance to a published prediction model. We studied 128,893 US Veterans with diabetes and angiographic assessment of CVD from 2005 to 2018. We used a random 2/3 of the sample for model development and the remaining 1/3 for validation. The primary outcome was severe hypoglycemia based on a previously validated algorithm that uses diagnosis codes and glucose measurements. We evaluated 33 potential predictors, including demographics, diabetes-related variables, comorbidities, and CVD risk factors. We sequentially used two machine learning algorithms for model development. First, we used multivariable adaptive regression splines, which can accommodate interactions and non-linearities for continuous variables, to select predictors. Second, we used adaptive elastic net, which can accommodate time-to-event outcomes, to fit a model with the selected variables. We tested model discrimination using the area under the ROC curve (AUC) and calibration by plotting predicted versus observed event rates in the independent validation cohort. The best-fitting prediction model included 18 predictors; a history of hypoglycemia was the strongest predictor (Table). In external validation, AUC was 0.729 for 2-year events, and the slope of the calibration curve was 1.05, exceeding performance of the published model in this patient population for both discrimination and calibration (Table). 
Conclusions: Applying supervised machine learning to EHR data may provide an efficient approach to tailoring prediction of preventable clinical outcomes, e.g., hypoglycemia, for high-risk patients receiving care in an integrated healthcare system.
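The calibration check described above (predicted versus observed event rates, slope near 1) can be sketched as: group the validation set into equal-sized risk groups, compare each group's mean predicted risk with its observed event rate, and fit a least-squares slope through those points. An illustrative stdlib-only version, not the authors' implementation:

```python
def calibration_slope(pred, obs, groups=10):
    """Slope of observed event rate vs. mean predicted risk across
    equal-sized risk groups; a well-calibrated model gives a slope near 1.
    `pred` holds predicted risks in [0, 1]; `obs` holds 0/1 outcomes."""
    pairs = sorted(zip(pred, obs))          # order patients by predicted risk
    size = len(pairs) // groups
    xs, ys = [], []
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size]
        xs.append(sum(p for p, _ in chunk) / len(chunk))   # mean predicted risk
        ys.append(sum(o for _, o in chunk) / len(chunk))   # observed event rate
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var
```

A slope above 1 means risk is spread too narrowly (underfitting the extremes); below 1, predictions are too extreme. The reported 1.05 is close to the ideal of 1.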


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Changhyun Choi ◽  
Jeonghwan Kim ◽  
Jongsung Kim ◽  
Donghyun Kim ◽  
Younghye Bae ◽  
...  

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as dependent variables and weather big data as explanatory variables. The models were developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. In the evaluation of prediction performance, the boosting model using meteorological data from the past 1 to 4 days achieved the highest AUC value (95.87%) and was selected as the final model. By using this model to predict the occurrence of heavy rain damage for each administrative region, damage can be greatly reduced through proactive disaster management.
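The bagging and random-forest members of the model family above share one mechanism: train each base learner on a bootstrap resample of the training data and combine the learners' class predictions by majority vote. A toy stdlib sketch of that mechanism (illustrative only; the class labels are hypothetical):

```python
import random

def bootstrap_sample(data, seed):
    """Resample the training set with replacement, same size, as bagging does;
    each base learner is then fitted on its own resample."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]

def bagged_predict(models, x):
    """Combine an ensemble's class predictions for input x by majority vote."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)
```

Boosting differs in that base learners are fitted sequentially, each reweighting the cases its predecessors got wrong, rather than independently on resamples.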


2020 ◽  
Author(s):  
Young Min Park ◽  
Byung-Joo Lee

Abstract Background: This study analyzed the prognostic significance of nodal factors, including the number of metastatic lymph nodes (LNs) and the lymph node ratio (LNR), in patients with papillary thyroid cancer (PTC), and attempted to construct a disease recurrence prediction model using machine learning techniques. Methods: We retrospectively analyzed clinico-pathologic data from 1040 patients diagnosed with papillary thyroid cancer between 2003 and 2009. Results: We analyzed clinico-pathologic factors related to recurrence through logistic regression analysis. Among the factors that we included, only sex and tumor size were significantly correlated with disease recurrence. Parameters such as age, sex, tumor size, tumor multiplicity, ETE, ENE, pT, pN, ipsilateral central LN metastasis, contralateral central LN metastasis, number of metastatic LNs, and LNR were input for construction of a machine learning prediction model. The performance of five machine learning models related to recurrence prediction was compared based on accuracy. The Decision Tree model showed the best accuracy at 95%, and the LightGBM and stacking models both showed 93% accuracy. Conclusions: We confirmed that all machine learning prediction models showed an accuracy of 90% or more for predicting disease recurrence in PTC. Large-scale multicenter clinical studies should be performed to improve the performance of our prediction models and verify their clinical effectiveness.


2021 ◽  
Vol 297 ◽  
pp. 01073
Author(s):  
Sabyasachi Pramanik ◽  
K. Martin Sagayam ◽  
Om Prakash Jena

Cancer has been described as a heterogeneous disease with several distinct subtypes that may occur simultaneously. As a result, early detection and prognosis of cancer types have become essential in cancer research, since they can help improve the clinical management of cancer patients. The significance of categorizing cancer patients into higher- or lower-risk categories has prompted numerous research groups from the bioscience and genomics fields to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of modeling the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the modeling of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we review the most recent papers that have used these approaches to predict cancer risk or patient outcomes, in order to better understand cancer.

