Development of Prediction Models for Unplanned Hospital Readmission within 30 Days Based on Common Data Model: A Feasibility Study

Author(s):  
Sooyoung Yoo ◽  
Jinwook Choi ◽  
Borim Ryu ◽  
Seok Kim

Abstract Background Unplanned hospital readmission after discharge reflects low satisfaction and reliability in care and the possibility of potential medical accidents, and is thus indicative of the quality of patient care and the appropriateness of discharge plans. Objectives The purpose of this study was to develop and validate prediction models for all-cause unplanned hospital readmissions within 30 days of discharge, based on a common data model (CDM), which can be applied to multiple institutions for efficient readmission management. Methods Retrospective patient-level prediction models were developed based on clinical data of two tertiary general university hospitals converted into a CDM developed by the Observational Medical Outcomes Partnership. Machine learning classification models based on LASSO logistic regression, decision tree, AdaBoost, random forest, and gradient boosting machine (GBM) were developed and tested by manipulating a set of CDM variables. An internal 10-fold cross-validation was performed on the target data of the model. To examine its transportability, the model was externally validated. Verification indicators helped evaluate the model performance based on the values of area under the curve (AUC). Results Based on the time interval for outcome prediction, it was confirmed that the prediction model targeting the variables obtained within 30 days of discharge was the most efficient (AUC of 82.75). The external validation showed that the model is transferable, with the combination of various clinical covariates. Above all, the prediction model based on the GBM showed the highest AUC performance of 84.14 ± 0.015 for the Seoul National University Hospital cohort, yielding 78.33 in external validation. Conclusions This study showed that readmission prediction models developed using machine-learning techniques and a CDM can be a useful tool to compare two hospitals in terms of patient-data features.
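As a rough illustration of the model-comparison setup described in this abstract (five classifiers, internal 10-fold cross-validation, AUC as the metric), the sketch below uses scikit-learn on synthetic data in place of the CDM-derived covariates; it is not the authors' code, and the hyperparameters are arbitrary.

```python
# Minimal sketch: compare the five classifier families named above by
# 10-fold cross-validated AUC on a synthetic, imbalanced "readmission" label.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CDM-derived covariates and 30-day readmission outcome.
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.85], random_state=0)

models = {
    "LASSO LR": LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5),
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
    "Random forest": RandomForestClassifier(n_estimators=200),
    "GBM": GradientBoostingClassifier(n_estimators=200, learning_rate=0.05),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC = {aucs.mean():.3f} +/- {aucs.std():.3f}")
```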

2021 ◽  
Author(s):  
Borim Ryu ◽  
Sooyoung Yoo ◽  
Seok Kim ◽  
Jinwook Choi

Abstract Many epidemiological studies have established an association between environmental exposure and clinical outcome for hospital admissions. However, few studies have explored the impact of environmental factors, such as ambient air pollution and meteorological factors, on hospital readmissions using predictive analysis. In this study, we aimed to develop a model to predict unplanned hospital readmissions within 30 days of discharge based on the common data model considering weather and air quality factors. Moreover, we validated the proposed model externally. We developed and compared the following machine learning methods: decision tree, random forest, AdaBoost, and gradient boosting machine–based models. We performed 10-fold cross-validation for internal validation, and external validation was performed by applying the model to unseen data. The performance of the prediction model was evaluated using the area under the receiver operating characteristic curve. PM10, rainfall, and maximum temperature were the weather and air quality variables that most impacted the model. Among the four machine learning models, the AdaBoost-based model demonstrated the best performance and was the most accurate in predicting the readmission of patients with musculoskeletal diseases. External validation demonstrated that the model based on weather and air quality factors is transportable.
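A minimal sketch of the kind of covariate construction implied above: daily weather and air-quality measurements (PM10, rainfall, maximum temperature) joined to discharge records by date before fitting an AdaBoost classifier. The column names and toy values are assumptions for illustration, not the study's actual CDM schema.

```python
# Sketch: merge environmental variables onto admission records by discharge
# date, then fit an AdaBoost model on the combined clinical + weather features.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

admissions = pd.DataFrame({
    "person_id": [1, 2, 3],
    "discharge_date": pd.to_datetime(["2020-01-03", "2020-01-03", "2020-01-04"]),
    "age": [71, 58, 64],
    "readmitted_30d": [1, 0, 0],       # hypothetical outcome label
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-03", "2020-01-04"]),
    "pm10": [55.0, 41.0],              # µg/m3
    "rainfall_mm": [0.0, 3.2],
    "max_temp_c": [4.1, 6.8],
})

df = admissions.merge(weather, left_on="discharge_date", right_on="date", how="left")
X = df[["age", "pm10", "rainfall_mm", "max_temp_c"]]
y = df["readmitted_30d"]

model = AdaBoostClassifier(n_estimators=100).fit(X, y)
```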


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Borim Ryu ◽  
Sooyoung Yoo ◽  
Seok Kim ◽  
Jinwook Choi

Abstract Although several studies have attempted to develop a model for predicting 30-day re-hospitalization, few attempts have been made at sufficient verification and multi-center expansion for clinical use. In this study, we developed a model that predicts unplanned hospital readmission within 30 days of discharge; the model is based on a common data model, considers weather and air quality factors, and can be easily extended to multiple hospitals. We developed and compared four tree-based machine learning methods: decision tree, random forest, AdaBoost, and gradient boosting machine (GBM). Above all, GBM showed the highest AUC performance of 75.1 in the clinical model, while the clinical and W-score model showed the best performance of 73.9 for musculoskeletal diseases. Further, PM10, rainfall, and maximum temperature were the weather and air quality variables that most impacted the model. In addition, external validation confirmed that the model based on weather and air quality factors is transportable to other hospital systems.


2018 ◽  
Vol 25 (8) ◽  
pp. 969-975 ◽  
Author(s):  
Jenna M Reps ◽  
Martijn J Schuemie ◽  
Marc A Suchard ◽  
Patrick B Ryan ◽  
Peter R Rijnbeek

Abstract Objective To develop a conceptual prediction model framework containing standardized steps, and to describe the corresponding open-source software developed to consistently implement the framework across computational environments and observational healthcare databases to enable model sharing and reproducibility. Methods Based on existing best practices, we propose a 5-step standardized framework for: (1) transparently defining the problem; (2) selecting suitable datasets; (3) constructing variables from the observational data; (4) learning the predictive model; and (5) validating the model performance. We implemented this framework as open-source software utilizing the Observational Medical Outcomes Partnership Common Data Model to enable convenient sharing of models and reproduction of model evaluation across multiple observational datasets. The software implementation contains default covariates and classifiers, but the framework enables customization and extension. Results As a proof-of-concept, demonstrating the transparency and ease of model dissemination using the software, we developed prediction models for 21 different outcomes within a target population of people suffering from depression across 4 observational databases. All 84 models are available in an accessible online repository to be implemented by anyone with access to an observational database in the Common Data Model format. Conclusions The proof-of-concept study illustrates the framework’s ability to develop reproducible models that can be readily shared, and offers the potential to perform extensive external validation of models and improve their likelihood of clinical uptake. In future work the framework will be applied to perform an “all-by-all” prediction analysis to assess the observational data prediction domain across numerous target populations, outcomes, and time-at-risk settings.
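The software described in this abstract is OHDSI's open-source implementation built on the OMOP Common Data Model. Purely to make the five steps concrete, the Python sketch below mirrors the framework's structure on synthetic data; it is not that software, and the cohort names, parameters, and simulated covariates are assumptions.

```python
# Conceptual sketch of the 5-step framework: define problem, select data,
# construct covariates, learn a model, validate its performance.
from dataclasses import dataclass

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


@dataclass
class PredictionProblem:                 # Step 1: transparent problem definition
    target_cohort: str                   # e.g. "people diagnosed with depression"
    outcome: str                         # e.g. one of the studied outcomes
    time_at_risk_days: int


def build_dataset(problem, n=5000, n_covariates=50, seed=0):
    # Steps 2-3: select a suitable database and construct covariates.
    # Simulated here as a placeholder for extraction from a CDM database.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_covariates))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.5).astype(int)
    return X, y


def learn_and_validate(X, y):
    # Step 4: learn the predictive model on a training split.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Step 5: validate discrimination on held-out data; external validation
    # would repeat this step on a different database in the same CDM format.
    return model, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])


problem = PredictionProblem("depression cohort", "example outcome", 365)
X, y = build_dataset(problem)
model, auc = learn_and_validate(X, y)
print(f"Held-out AUC for '{problem.outcome}': {auc:.3f}")
```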


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), which was better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions We have developed a machine learning model that has better predictive performance than traditional LR prediction models as well as other existing risk scoring systems for postoperative ARF. This model can be utilized to provide early warnings when high-risk patients are found, enabling clinicians to take prompt measures.
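A hedged sketch, on synthetic stand-ins for the development and external cohorts, of the head-to-head comparison described above: XGBoost versus logistic regression scored by ROC AUC internally and externally. It is not the authors' code, and the traditional scoring systems and significance testing are omitted.

```python
# Sketch: fit XGBoost and LR on a development split, then score both on an
# internal test split and on an "external" dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-ins for the six-center development and three-center external cohorts.
X_dev, y_dev = make_classification(n_samples=1200, n_features=25, random_state=0)
X_ext, y_ext = make_classification(n_samples=400, n_features=25, random_state=1)

X_tr, X_te, y_tr, y_te = train_test_split(X_dev, y_dev, test_size=0.3, stratify=y_dev, random_state=0)

models = {
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05, eval_metric="logloss"),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    internal = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    external = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"{name}: internal AUC = {internal:.2f}, external AUC = {external:.2f}")
```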


2021 ◽  
Vol 8 ◽  
Author(s):  
Ming-Hui Hung ◽  
Ling-Chieh Shih ◽  
Yu-Ching Wang ◽  
Hsin-Bang Leu ◽  
Po-Hsun Huang ◽  
...  

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit. Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGBoost), and artificial neural network (ANN). Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGBoost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively). Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.
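A small worked example of how the operating-point metrics reported above (sensitivity, specificity, NPV, accuracy) fall out of a confusion matrix at a fixed probability threshold; the labels and predicted probabilities below are made up for illustration.

```python
# Sketch: derive sensitivity, specificity, NPV, and accuracy from a confusion
# matrix computed at a 0.5 decision threshold.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])        # hypothetical outcomes
y_prob = np.array([0.9, 0.4, 0.8, 0.6, 0.2, 0.7, 0.95, 0.1, 0.3, 0.55])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # recall for the positive (masked hypertension) class
specificity = tn / (tn + fp)
npv = tn / (tn + fn)                  # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} NPV={npv:.3f} accuracy={accuracy:.3f}")
```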


2020 ◽  
Author(s):  
Osung Kwon ◽  
Wonjun Na ◽  
Hee Jun Kang ◽  
Tae Joon Jun ◽  
Jihoon Kweon ◽  
...  

BACKGROUND Although there is growing interest in prediction models based on electronic medical records (EMR) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models that fully utilize EMR data are limited. OBJECTIVE We aimed to develop and validate machine-learning (ML) models using diverse fields of EMR data to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. METHODS EMR data comprising 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (e.g., demographics), dynamic time-series (e.g., laboratory values), and cardiac-specific data (e.g., coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. RESULTS GBM showed the best performance with an area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and those of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as well-fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. All models showed low Brier scores of 0.1. Among the data categories in the GBM model, the dynamic time-series data demonstrated a high AUROC of >0.95, contributing the most to the excellent results. CONCLUSIONS Exploiting diverse fields of the EMR dataset, the ML-based 30-day adverse cardiac event prediction models performed outstandingly, and the applied framework could be generalized for various healthcare prediction models.
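A minimal sketch, on synthetic imbalanced data, of the three headline metrics reported above (AUROC, AUPRC, and the Brier score) computed from predicted probabilities; this is illustrative only and is not the study's pipeline.

```python
# Sketch: fit a GBM on a synthetic imbalanced outcome and report AUROC,
# AUPRC (average precision), and the Brier score on a held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=40, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

p = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, p))
print("AUPRC:", average_precision_score(y_te, p))
print("Brier:", brier_score_loss(y_te, p))   # lower is better; <0.1 is in the range the abstract cites
```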


2021 ◽  
Author(s):  
Constanza L Andaur Navarro ◽  
Johanna AA Damen ◽  
Toshihiko Takada ◽  
Steven WJ Nijman ◽  
Paula Dhiman ◽  
...  

ABSTRACT Objective. While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. Our aim is to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Study design and setting: We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields (PROSPERO, CRD42019161764). We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. Results: Our search identified 24 814 articles, of which 152 articles were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4) of TRIPOD items. No articles fully adhered to complete reporting of the abstract and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). Conclusion. Similar to studies using conventional statistical techniques, the completeness of reporting is poor. Essential information to decide to use the model (i.e. model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste.
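The abstract reports item-level adherence as a proportion with a 95% confidence interval, for example 3.9% (95% CI 1.8 to 8.3), which corresponds to roughly 6 of the 152 included articles. The paper does not state its interval method in the abstract; as an illustrative check, a Wilson score interval for a binomial proportion reproduces those figures.

```python
# Sketch: reproduce a reported adherence proportion and its 95% CI with a
# Wilson score interval (illustrative; not taken from the authors' code).
from statsmodels.stats.proportion import proportion_confint

k, n = 6, 152                       # articles adhering to an item / articles reviewed
low, high = proportion_confint(k, n, alpha=0.05, method="wilson")
print(f"{k / n:.1%} (95% CI {low:.1%} to {high:.1%})")   # -> 3.9% (95% CI 1.8% to 8.3%)
```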


2020 ◽  
Author(s):  
Toru Shirakawa ◽  
Tomohiro Sonoo ◽  
Kentaro Ogura ◽  
Ryo Fujimori ◽  
Konan Hara ◽  
...  

BACKGROUND Although multiple prediction models have been developed to predict hospital admission to emergency departments (EDs) to address overcrowding and patient safety, only a few studies have examined prediction models for prehospital use. Development of institution-specific prediction models is feasible in this age of data science, provided that predictor-related information is readily collectable. OBJECTIVE We aimed to develop a hospital admission prediction model based on patient information that is commonly available during ambulance transport before hospitalization. METHODS Patients transported by ambulance to our ED from April 2018 through March 2019 were enrolled. Candidate predictors were age, sex, chief complaint, vital signs, and patient medical history, all of which were recorded by emergency medical teams during ambulance transport. Patients were divided into two cohorts for derivation (3601/5145, 70.0%) and validation (1544/5145, 30.0%). For statistical models, logistic regression, logistic lasso, random forest, and gradient boosting machine were used. Prediction models were developed in the derivation cohort. Model performance was assessed by area under the receiver operating characteristic curve (AUROC) and association measures in the validation cohort. RESULTS Of 5145 patients transported by ambulance, including deaths in the ED and hospital transfers, 2699 (52.5%) required hospital admission. Prediction performance was higher with the addition of predictive factors, attaining the best performance with an AUROC of 0.818 (95% CI 0.792-0.839) with a machine learning model and predictive factors of age, sex, chief complaint, and vital signs. Sensitivity and specificity of this model were 0.744 (95% CI 0.716-0.773) and 0.745 (95% CI 0.709-0.776), respectively. CONCLUSIONS For patients transferred to EDs, we developed a well-performing hospital admission prediction model based on routinely collected prehospital information including chief complaints.
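The abstract above reports an AUROC with a 95% confidence interval on the 30% validation cohort but does not state the interval method. The sketch below shows one common way to obtain such an interval, bootstrap resampling of the validation set, using synthetic data and a plain logistic regression as a stand-in for the study's models.

```python
# Sketch: 70/30 derivation/validation split, then a bootstrap 95% CI for the
# validation-cohort AUROC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5145, n_features=20, random_state=0)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
p_val = model.predict_proba(X_val)[:, 1]

rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_val), len(y_val))
    if len(np.unique(y_val[idx])) < 2:      # AUROC needs both classes in the resample
        continue
    boot.append(roc_auc_score(y_val[idx], p_val[idx]))

auc = roc_auc_score(y_val, p_val)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Validation AUROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```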


Mathematics ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1590
Author(s):  
Muhammad Syafrudin ◽  
Ganjar Alfian ◽  
Norma Latif Fitriyani ◽  
Muhammad Anshari ◽  
Tony Hadibarata ◽  
...  

Detecting self-care problems is one of the important and challenging issues for occupational therapists, since it requires a complex and time-consuming process. Machine learning algorithms have recently been applied to overcome this issue. In this study, we propose a self-care prediction model called GA-XGBoost, which combines genetic algorithms (GAs) with extreme gradient boosting (XGBoost) for predicting self-care problems of children with disabilities. Because the selected feature subset affects model performance, we utilize a GA to search for the optimal feature subset and thereby improve the model's performance. To validate the effectiveness of GA-XGBoost, we present six experiments: comparing GA-XGBoost with other machine learning models and previous study results, a statistical significance test, impact analysis of feature selection and comparison with other feature selection methods, and sensitivity analysis of the GA parameters. Throughout the experiments, we use accuracy, precision, recall, and F1-score to measure the performance of the prediction models. The results show that GA-XGBoost achieves better performance than the other prediction models and the previous study results. In addition, we designed and developed a web-based self-care prediction application to help therapists diagnose the self-care problems of children with disabilities, so that appropriate treatment or therapy can be provided to each child to improve their therapeutic outcome.
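A compact, illustrative sketch of the GA-XGBoost idea (not the authors' implementation): binary chromosomes encode candidate feature subsets, fitness is the cross-validated accuracy of an XGBoost classifier restricted to that subset, and the population evolves through tournament selection, single-point crossover, and bit-flip mutation. Population size, generation count, and mutation rate are arbitrary choices here.

```python
# Sketch: genetic-algorithm feature-subset selection with an XGBoost fitness function.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)
n_feat = X.shape[1]

def fitness(mask):
    # Cross-validated accuracy of XGBoost using only the features flagged in `mask`.
    if mask.sum() == 0:
        return 0.0
    clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3, scoring="accuracy").mean()

pop = rng.integers(0, 2, size=(12, n_feat))        # initial random population of feature masks
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: each parent is the fitter of two random individuals.
    parents = pop[[max(rng.choice(len(pop), 2, replace=False), key=lambda i: scores[i])
                   for _ in range(len(pop))]]
    # Single-point crossover between consecutive parent pairs.
    children = parents.copy()
    for i in range(0, len(children) - 1, 2):
        cut = rng.integers(1, n_feat)
        children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:].copy(), parents[i, cut:].copy()
    # Bit-flip mutation.
    flip = rng.random(children.shape) < 0.05
    children[flip] = 1 - children[flip]
    pop = children

best = max(pop, key=fitness)
print("Selected feature indices:", np.flatnonzero(best))
```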


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 943
Author(s):  
Joung Ouk (Ryan) Kim ◽  
Yong-Suk Jeong ◽  
Jin Ho Kim ◽  
Jong-Weon Lee ◽  
Dougho Park ◽  
...  

Background: This study proposes a cardiovascular disease (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. Methods: We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to the International Classification of Diseases system (I20–I25). In addition, 4699 random subjects without a CVD diagnosis were enrolled as a non-CVD group. Both groups were matched by age and gender. Various ML algorithms were applied to perform CVD prediction, and the performances of all the prediction models were compared. Results: The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average prediction accuracy (area under the receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all the algorithms validated in this study. Based on AUROC, the ML algorithms improved the CVD prediction performance compared with previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index. Conclusions: Our results indicate that the proposed health screening dataset-based CVD prediction model using ML algorithms is readily applicable, produces validated results, and outperforms previous CVD prediction models.
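A brief sketch of how a variable-importance ranking like the one reported above can be read from a fitted tree ensemble; the feature names are placeholders echoing the abstract, and the data are synthetic, so the resulting ordering is not the study's result.

```python
# Sketch: rank predictors by impurity-based importance from a gradient boosting model.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

features = ["cvd_history", "total_cholesterol", "ldl_cholesterol", "waist_height_ratio", "bmi"]
X, y = make_classification(n_samples=1000, n_features=len(features), random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
ranking = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)
print(ranking)
```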

