Protocol for a systematic review on the methodological and reporting quality of prediction model studies using machine learning techniques

Introduction: Studies addressing the development and/or validation of diagnostic and prognostic prediction models are abundant in most clinical domains. Systematic reviews have shown that the methodological and reporting quality of prediction model studies is suboptimal. Due to the increasing availability of larger, routinely collected and complex medical data, and the rising application of Artificial Intelligence (AI) or machine learning (ML) techniques, the number of prediction model studies is expected to increase even further. Prediction models developed using AI or ML techniques are often labelled as a ‘black box’ and little is known about their methodological and reporting quality. Therefore, this comprehensive systematic review aims to evaluate the reporting quality, the methodological conduct, and the risk of bias of prediction model studies that applied ML techniques for model development and/or validation. Methods and analysis: A search will be performed in PubMed to identify studies developing and/or validating prediction models using any ML methodology and across all medical fields. Studies will be included if they were published between January 2018 and December 2019, predict patient-related outcomes, use any study design or data source, and are available in English. Screening of search results and data extraction from included articles will be performed by two independent reviewers. The primary outcomes of this systematic review are: (1) the adherence of ML-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), and (2) the risk of bias in such studies as assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A narrative synthesis will be conducted for all included studies. Findings will be stratified by study type, medical field and prevalent ML methods, and will inform necessary extensions or updates of TRIPOD and PROBAST to better address prediction model studies that used AI or ML techniques. Ethics and dissemination: Ethical approval is not required for this study because only available published data will be analysed. Findings will be disseminated through peer-reviewed publications and scientific conferences. Systematic review registration: PROSPERO, CRD42019161764.

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review

Acta Orthopaedica ◽ 10.1080/17453674.2021.1910448 ◽ 2021 ◽ pp. 1-9

Author(s): Olivier Q Groot ◽ Bas J J Bindels ◽ Paul T Ogink ◽ Neal D Kapoor ◽ Peter K Twining ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Surgical Outcomes ◽ Prediction Models ◽ Reporting Quality

Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence

BMJ Open ◽ 10.1136/bmjopen-2020-048008 ◽ 2021 ◽ Vol 11 (7) ◽ pp. e048008

Author(s): Gary S Collins ◽ Paula Dhiman ◽ Constanza L Andaur Navarro ◽ Ji Ma ◽ Lotty Hooft ◽ ...

Keyword(s): Artificial Intelligence ◽ Machine Learning ◽ Prediction Model ◽ Assessment Tool ◽ Risk Of Bias ◽ Reporting Guideline ◽ Machine Learning Techniques ◽ Quality Of Reporting ◽ Model Studies ◽ Equator Network

Introduction: The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were both published to improve the reporting and critical appraisal of prediction model studies for diagnosis and prognosis. This paper describes the processes and methods that will be used to develop an extension to the TRIPOD statement (TRIPOD-artificial intelligence, AI) and the PROBAST (PROBAST-AI) tool for prediction model studies that applied machine learning techniques. Methods and analysis: TRIPOD-AI and PROBAST-AI will be developed following published guidance from the EQUATOR Network, and will comprise five stages. Stage 1 will comprise two systematic reviews (across all medical fields and specifically in oncology) to examine the quality of reporting in published machine-learning-based prediction model studies. In stage 2, we will consult a diverse group of key stakeholders using a Delphi process to identify items to be considered for inclusion in TRIPOD-AI and PROBAST-AI. Stage 3 will be virtual consensus meetings to consolidate and prioritise key items to be included in TRIPOD-AI and PROBAST-AI. Stage 4 will involve developing the TRIPOD-AI checklist and the PROBAST-AI tool, and writing the accompanying explanation and elaboration papers. In the final stage, stage 5, we will disseminate TRIPOD-AI and PROBAST-AI via journals, conferences, blogs, websites (including TRIPOD, PROBAST and EQUATOR Network) and social media. TRIPOD-AI will provide researchers working on prediction model studies based on machine learning with a reporting guideline that can help them report key details that readers need to evaluate the study quality and interpret its findings, potentially reducing research waste. We anticipate PROBAST-AI will help researchers, clinicians, systematic reviewers and policymakers critically appraise the design, conduct and analysis of machine learning based prediction model studies, with a robust standardised tool for bias evaluation. Ethics and dissemination: Ethical approval has been granted by the Central University Research Ethics Committee, University of Oxford on 10 December 2020 (R73034/RE001). Findings from this study will be disseminated through peer-reviewed publications. PROSPERO registration number: CRD42019140361 and CRD42019161764.

Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review

BMC Medical Research Methodology ◽ 10.1186/s12874-021-01469-6 ◽ 2022 ◽ Vol 22 (1)

Author(s): Constanza L. Andaur Navarro ◽ Johanna A. A. Damen ◽ Toshihiko Takada ◽ Steven W. J. Nijman ◽ Paula Dhiman ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Prediction Model ◽ Prediction Models ◽ Supervised Machine Learning ◽ Model Specification ◽ Essential Information ◽ Model Studies ◽ Completeness Of Reporting ◽ Complete Reporting

Abstract Background: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aim to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Methods: We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields. We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. Results: Our search identified 24,814 articles, of which 152 articles were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0–46.4%) of TRIPOD items. No article fully adhered to complete reporting of the abstract and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and model’s predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). Conclusion: Similar to prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information to decide to use the model (i.e. model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste. Systematic review registration: PROSPERO, CRD42019161764.
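
The per-item adherence figures in this abstract are proportions of the 152 included articles with 95% confidence intervals. The abstract does not state which interval method was used. As a minimal sketch, the Python below computes a Wilson score interval, and the count of 6 articles is an assumption inferred from the reported 3.9% of 152; under that assumption the result agrees with the reported "3.9%, 95% CI 1.8 to 8.3" for flow of participants.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: 6 of 152 articles (inferred from the reported 3.9%).
low, high = wilson_interval(6, 152)
print(f"{6 / 152:.1%} (95% CI {low:.1%} to {high:.1%})")
# prints "3.9% (95% CI 1.8% to 8.3%)"
```

The Wilson interval is a common choice for proportions near 0 or 1 because, unlike the simple normal approximation, its bounds stay within 0 and 1.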

Multivariable Prediction Models for Health Care Spending Using Machine Learning: a Protocol of a Systematic Review

10.21203/rs.3.rs-1064149/v1 ◽ 2021

Author(s): Andrew W. Huang ◽ Martin Haslberger ◽ Neto Coulibaly ◽ Omar Galárraga ◽ Arman Oganisian ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Health Care ◽ Prediction Model ◽ Prediction Models ◽ Risk Of Bias ◽ Population Characteristics ◽ Health Care Spending ◽ Successful Implementation ◽ Individual Level

Abstract Background: With rising cost pressures on health care systems, machine-learning (ML) based algorithms are increasingly used to predict health care costs. Despite their potential advantages, the successful implementation of these methods could be undermined by biases introduced in the design, conduct, or analysis of studies seeking to develop and/or validate ML models. The utility of such models may also be negatively affected by poor reporting of these studies. In this systematic review, we aim to evaluate the reporting quality, methodological characteristics, and risk of bias of ML-based prediction models for individual-level health care spending. Methods: We will systematically search PubMed and Embase to identify studies developing, updating, or validating ML-based models to predict an individual’s health care spending for any medical condition, over any time period, and in any setting. We will exclude prediction models of aggregate-level health care spending, models used to infer causality, models using radiomics or speech parameters, models of non-clinically validated predictors (e.g. genomics), and cost-effectiveness analyses without predicting individual-level health care spending. We will extract data based on the CHARMS checklist, previously published research, and relevant recommendations. We will assess the adherence of ML-based studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) and examine the inclusion of transparency and reproducibility indicators (e.g. statements on data sharing). To assess the risk of bias, we will apply the Prediction model Risk Of Bias Assessment Tool (PROBAST). Findings will be stratified by study design, ML methods used, population characteristics, and medical field. Discussion: Our systematic review will appraise the quality, reporting, and risk of bias of ML-based models for individualized health care cost prediction. This review will provide an overview of the available models and give insights into the strengths and limitations of using ML methods for the prediction of health spending. Trial registration: Not applicable.

Machine Learning-based Prediction Models for Diagnosis and Prognosis in Inflammatory Bowel Diseases: A Systematic Review

Journal of Crohn's and Colitis ◽ 10.1093/ecco-jcc/jjab155 ◽ 2021

Author(s): Nghia H Nguyen ◽ Dominic Picetti ◽ Parambir S Dulai ◽ Vipul Jairath ◽ William J Sandborn ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Risk Prediction ◽ Statistical Models ◽ Prediction Models ◽ Risk Of Bias ◽ Learning Models ◽ Bowel Diseases ◽ Inflammatory Bowel ◽ Machine Learning Models

Abstract Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods: Through a systematic review up to January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results: We included 13 studies on machine learning-based prediction models in IBD encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions: Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.

Development of Heavy Rain Damage Prediction Model Using Machine Learning Based on Big Data

Advances in Meteorology ◽ 10.1155/2018/5024930 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-11 ◽ Cited By ~ 12

Author(s): Changhyun Choi ◽ Jeonghwan Kim ◽ Jongsung Kim ◽ Donghyun Kim ◽ Younghye Bae ◽ ...

Keyword(s): Machine Learning ◽ Big Data ◽ Prediction Model ◽ Prediction Models ◽ Meteorological Data ◽ Heavy Rain ◽ Machine Learning Techniques ◽ Damage Prediction ◽ Explanatory Variables ◽ The Republic

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as the dependent variable and weather big data as explanatory variables. The models were developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. When the prediction performance of each model was evaluated, the boosting model using meteorological data from the past 1 to 4 days had the highest AUC value (95.87%) and was selected as the final model. By using the prediction model developed in this study to predict the occurrence of heavy rain damage for each administrative region, we can greatly reduce the damage through proactive disaster management.
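
The models in this abstract are compared by AUC, the probability that a randomly chosen positive case (here, a day with heavy rain damage) receives a higher predicted score than a randomly chosen negative case. The sketch below shows that pairwise definition on a small hypothetical data set; the labels and scores are invented for illustration and are not taken from the study.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = damage occurred, 0 = no damage.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
print(f"AUC = {auc(labels, scores):.4f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; library implementations typically use a rank-based computation instead.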

Machine Learning-Based Prediction Model for Papillary Thyroid Carcinoma Recurrence

10.21203/rs.3.rs-113105/v1 ◽ 2020

Author(s): Young Min Park ◽ Byung-Joo Lee

Keyword(s): Machine Learning ◽ Prediction Model ◽ Tumor Size ◽ Large Scale ◽ Prediction Models ◽ Prognostic Significance ◽ Disease Recurrence ◽ Machine Learning Techniques ◽ Papillary Thyroid ◽ Recurrence Prediction

Abstract Background: This study analyzed the prognostic significance of nodal factors, including the number of metastatic LNs and LNR, in patients with PTC, and attempted to construct a disease recurrence prediction model using machine learning techniques. Methods: We retrospectively analyzed clinico-pathologic data from 1040 patients diagnosed with papillary thyroid cancer between 2003 and 2009. Results: We analyzed clinico-pathologic factors related to recurrence through logistic regression analysis. Among the factors that we included, only sex and tumor size were significantly correlated with disease recurrence. Parameters such as age, sex, tumor size, tumor multiplicity, ETE, ENE, pT, pN, ipsilateral central LN metastasis, contralateral central LN metastasis, number of metastatic LNs, and LNR were used as inputs for construction of a machine learning prediction model. The performance of five machine learning models for recurrence prediction was compared based on accuracy. The Decision Tree model showed the best accuracy at 95%, and the LightGBM and stacking models each showed 93% accuracy. Conclusions: We confirmed that all machine learning prediction models showed an accuracy of 90% or more for predicting disease recurrence in PTC. Large-scale multicenter clinical studies should be performed to improve the performance of our prediction models and verify their clinical effectiveness.

Predicting Loan Approval of Bank Direct Marketing Data Using Ensemble Machine Learning Algorithms

International Journal of Circuits, Systems and Signal Processing ◽ 10.46300/9106.2020.14.117 ◽ 2020 ◽ Vol 14

Keyword(s): Machine Learning ◽ Prediction Model ◽ Prediction Models ◽ Machine Learning Algorithms ◽ Decision Makers ◽ Machine Learning Techniques ◽ Data Set ◽ Ensemble Machine Learning ◽ Marketing Data ◽ Loan Approval

The Bank Marketing data set at Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical choice that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, different ensemble machine learning techniques, such as Bagging and Boosting, have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, which is approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that the bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in the predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vitally needed for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system, while providing fair credit opportunities to their clients.

Prediction Models for Severe Manifestations and Mortality due to COVID-19: A Rapid Systematic Review

10.1101/2021.01.28.21250718 ◽ 2021

Author(s): Jamie L. Miller ◽ Masafumi Tada ◽ Michihiko Goto ◽ Nicholas Mohr ◽ Sangil Lee

Keyword(s): Systematic Review ◽ Prediction Model ◽ Prediction Models ◽ External Validation ◽ Clinical Signs ◽ Risk Of Bias ◽ Low Risk ◽ Prognostic Models ◽ Organ Support ◽ Public Data

Abstract Background: Throughout 2020, coronavirus disease 2019 (COVID-19) has become a threat to public health at the national and global level. There has been an immediate need for research to understand the clinical signs and symptoms of COVID-19 that can help predict deterioration including mechanical ventilation, organ support, and death. Studies thus far have addressed the epidemiology of the disease, common presentations, and susceptibility to acquisition and transmission of the virus; however, an accurate prognostic model for severe manifestations of COVID-19 is still needed because of the limited healthcare resources available. Objective: This systematic review aims to evaluate published reports of prediction models for severe illnesses caused by COVID-19. Methods: Searches were developed by the primary author and a medical librarian using an iterative process of gathering and evaluating terms. Comprehensive strategies, including both index and keyword methods, were devised for PubMed and EMBASE. The data of confirmed COVID-19 patients from randomized controlled studies, cohort studies, and case-control studies published between January 2020 and July 2020 were retrieved. Studies were independently assessed for risk of bias and applicability using the Prediction Model Risk Of Bias Assessment Tool (PROBAST). We collected study type, setting, sample size, type of validation, and outcome including intubation, ventilation, any other type of organ support, or death. The combination of the prediction model, scoring system, performance of predictive models, and geographic locations were summarized. Results: A primary review found 292 articles relevant based on title and abstract. After further review, 246 were excluded based on the defined inclusion and exclusion criteria. Forty-six articles were included in the qualitative analysis. Interobserver agreement on inclusion was 0.86 (95% confidence interval: 0.79 - 0.93). When the PROBAST tool was applied, 44 of the 46 articles were identified to have high or unclear risk of bias, or high or unclear concern for applicability. Two studies reported prediction models, the 4C Mortality Score from hospital data and QCOVID from general public data from the UK, that were rated as low risk of bias and low concern for applicability. Conclusion: Several prognostic models are reported in the literature, but many of them had concerning risks of bias and applicability. For most of the studies, caution is needed before use, as many of them will require external validation before dissemination. However, two articles were found to have low risk of bias and low concern for applicability, and their models can be useful tools.
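
The abstract reports an agreement statistic of 0.86 between reviewers on inclusion but does not name the statistic. Assuming it is Cohen's kappa, a common choice for two raters, the sketch below shows how such a value is computed from paired screening decisions. The decisions listed are hypothetical and do not reproduce the review's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportion, per category.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten abstracts.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.3f}")
```

In this toy example the reviewers agree on 9 of 10 abstracts, but kappa is lower than 0.9 because part of that agreement would be expected by chance alone.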
