The use of machine learning prediction models in spinal surgical outcome: An overview of current development and external validation studies

2021 ◽  
pp. 100872
Author(s):  
Paul T. Ogink ◽  
Olivier Q. Groot ◽  
Bas J.J. Bindels ◽  
Daniel G. Tobert

Circulation ◽
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Matthew W Segar ◽  
Byron Jaeger ◽  
Kershaw V Patel ◽  
Vijay Nambi ◽  
Chiadi E Ndumele ◽  
...  

Introduction: Heart failure (HF) risk and the underlying biological risk factors vary by race. Machine learning (ML) may improve race-specific HF risk prediction but this has not been fully evaluated. Methods: The study included participants from 4 cohorts (ARIC, DHS, JHS, and MESA) aged > 40 years, free of baseline HF, and with adjudicated HF event follow-up. Black adults from JHS and white adults from ARIC were used to derive race-specific ML models to predict 10-year HF risk. The ML models were externally validated in subgroups of black and white adults from ARIC (excluding JHS participants) and pooled MESA/DHS cohorts and compared to prior established HF risk scores developed in ARIC and MESA. Harrell’s C-index and Greenwood-Nam-D’Agostino chi-square were used to assess discrimination and calibration, respectively. Results: In the derivation cohorts, 288 of 4141 (7.0%) black and 391 of 8242 (4.7%) white adults developed HF over 10 years. The ML models had excellent discrimination in both black and white participants (C-indices = 0.88 and 0.89). In the external validation cohorts for black participants from ARIC (excluding JHS, N = 1072) and MESA/DHS pooled cohorts (N = 2821), 131 (12.2%) and 115 (4.1%) developed HF. The ML model had adequate calibration and demonstrated superior discrimination compared to established HF risk models (Fig A). A consistent pattern was also observed in the external validation cohorts of white participants from the MESA/DHS pooled cohorts (N=3236; 100 [3.1%] HF events) (Fig A). The most important predictors of HF in both races were NP levels. Cardiac biomarkers and glycemic parameters were most important among blacks while LV hypertrophy and prevalent CVD and traditional CV risk factors were the strongest predictors among whites (Fig B). Conclusions: Race-specific and ML-based HF risk models that integrate clinical, laboratory, and biomarker data demonstrated superior performance when compared to traditional risk prediction models.
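The abstract above assesses discrimination with Harrell's C-index. As an illustrative sketch (not the study's code), the statistic can be computed directly from its pairwise definition; `times`, `events`, and `risk_scores` below are hypothetical toy inputs:

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: the fraction of comparable patient pairs in
    which the subject with the shorter survival time has the higher
    predicted risk. events: 1 = HF event observed, 0 = censored."""
    concordant, tied, comparable = 0, 0, 0
    for i, j in combinations(range(len(times)), 2):
        # order the pair so subject a has the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or events[a] == 0:
            continue  # not comparable: tied times, or earlier subject censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1
        elif risk_scores[a] == risk_scores[b]:
            tied += 1  # tied risk predictions count half
    return (concordant + 0.5 * tied) / comparable

# toy data: risk scores perfectly rank the event times, so C = 1.0
times = [2, 5, 8, 11]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.5, 0.2]
print(harrell_c_index(times, events, scores))
```

Production survival analyses would use a vetted implementation (e.g. a survival-analysis library) rather than this quadratic-time sketch.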


Endocrine ◽  
2021 ◽  
Author(s):  
Olivier Zanier ◽  
Matteo Zoli ◽  
Victor E. Staartjes ◽  
Federica Guaraldi ◽  
Sofia Asioli ◽  
...  

Abstract Purpose: Biochemical remission (BR), gross total resection (GTR), and intraoperative cerebrospinal fluid (CSF) leaks are important metrics in transsphenoidal surgery for acromegaly, and prediction of their likelihood using machine learning would be clinically advantageous. We aim to develop and externally validate clinical prediction models for outcomes after transsphenoidal surgery for acromegaly. Methods: Using data from two registries, we developed and externally validated machine learning models for GTR, BR, and CSF leaks after endoscopic transsphenoidal surgery in acromegalic patients. A registry from Bologna, Italy was used for model development; external validation was then performed using data from Zurich, Switzerland. Gender, age, prior surgery, and Hardy and Knosp classification were used as input features. Discrimination and calibration metrics were assessed. Results: The derivation cohort consisted of 307 patients (43.3% male; mean [SD] age, 47.2 [12.7] years). GTR was achieved in 226 (73.6%) and BR in 245 (79.8%) patients. In the external validation cohort of 46 patients, 31 (75.6%) achieved GTR and 31 (77.5%) achieved BR. The area under the curve (AUC) at external validation was 0.75 (95% confidence interval: 0.59–0.88) for GTR, 0.63 (0.40–0.82) for BR, and 0.77 (0.62–0.91) for intraoperative CSF leaks. While prior surgery was the most important variable for the prediction of GTR, age and Hardy grading contributed most to the predictions of BR and CSF leaks, respectively. Conclusions: Gross total resection, biochemical remission, and CSF leaks remain hard to predict, but machine learning offers potential in helping to tailor surgical therapy. We demonstrate the feasibility of developing and externally validating clinical prediction models for these outcomes after surgery for acromegaly and lay the groundwork for the development of a multicenter model with more robust generalization.
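The external-validation AUCs above are reported with 95% confidence intervals. One common way to obtain such intervals, sketched here as an illustration rather than the authors' method, is a percentile bootstrap over patients:

```python
import random

def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive case is scored above a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement,
    recompute the AUC, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample must contain both classes
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# hypothetical toy predictions (1 = outcome achieved)
labels = [0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7, 0.9]
print(bootstrap_auc_ci(labels, scores))
```

With a validation cohort of only 46 patients, intervals produced this way are expectedly wide, which is consistent with the ranges the abstract reports.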


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background: Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods: We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results: The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions: We have developed a machine learning model with better predictive performance than traditional LR prediction models as well as other existing risk scoring systems for postoperative ARF. This model can provide early warnings when high-risk patients are identified, enabling clinicians to take prompt measures.
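The abstract attaches p-values to the XGBoost-versus-LR AUC comparisons. One way such a comparison can be approximated (a paired-bootstrap sketch under stated assumptions, not the study's actual procedure, with hypothetical scores) is to resample patients and recompute both AUCs on each resample:

```python
import random

def auc(labels, scores):
    """Mann-Whitney AUC; ties between a positive and negative count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_diff_p(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap: resample patients, recompute both models' AUCs on
    the SAME resample, and approximate a two-sided p-value by centering the
    bootstrap distribution of the difference at zero."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    extreme, total = 0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with a single class
        d = auc(ys, [scores_a[i] for i in idx]) - auc(ys, [scores_b[i] for i in idx])
        if abs(d - observed) >= abs(observed):  # shifted-null approximation
            extreme += 1
        total += 1
    return observed, extreme / total

# hypothetical predictions for the same 8 patients from two models
labels = [0, 1, 0, 1, 0, 1, 0, 1]
xgb_scores = [0.2, 0.8, 0.1, 0.9, 0.3, 0.7, 0.4, 0.6]
lr_scores = [0.4, 0.6, 0.3, 0.7, 0.5, 0.5, 0.45, 0.55]
print(bootstrap_auc_diff_p(labels, xgb_scores, lr_scores))
```

Dedicated methods such as the DeLong test are the standard tools for correlated ROC curves; this sketch only illustrates the paired-resampling idea.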


2021 ◽  
Vol 8 ◽  
Author(s):  
Ming-Hui Hung ◽  
Ling-Chieh Shih ◽  
Yu-Ching Wang ◽  
Hsin-Bang Leu ◽  
Po-Hsun Huang ◽  
...  

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit. Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN). Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGboost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively). Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.
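The per-model sensitivity, specificity, NPV, and accuracy figures above all derive from a 2×2 confusion matrix at a fixed probability cutoff. A minimal illustration with toy data (not the study's cohort):

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Compute the threshold-dependent metrics reported for the
    masked-hypertension models from a 2x2 confusion matrix."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "npv": tn / (tn + fn),                # negative predictive value
        "accuracy": (tp + tn) / len(labels),
    }

# toy example: 2 true cases, 3 non-cases, one false positive at 0.5 cutoff
print(classification_metrics([1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.2, 0.1]))
```

Note how a model can reach sensitivity and NPV of 1.000 while specificity stays modest, the pattern the RF model above shows: it misses no cases but flags some non-cases.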


2020 ◽  
Author(s):  
Osung Kwon ◽  
Wonjun Na ◽  
Hee Jun Kang ◽  
Tae Joon Jun ◽  
Jihoon Kweon ◽  
...  

BACKGROUND: Although there is growing interest in prediction models based on electronic medical records (EMR) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited. OBJECTIVE: We aimed to develop and validate machine-learning (ML) models using diverse fields of the EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. METHODS: EMR data comprising 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (e.g. demographics), dynamic time-series (e.g. laboratory values), and cardiac-specific data (e.g. coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. RESULTS: GBM showed the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. The AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and those of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as well-fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated the best performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. All models again showed low Brier scores of 0.1.
Among the data categories in the GBM model, the time-series dynamic data demonstrated a high AUROC of >0.95, contributing most to the excellent results. CONCLUSIONS: Exploiting diverse fields of the EMR dataset, the ML-based 30-day adverse cardiac event prediction models performed excellently, and the applied framework could be generalized to various healthcare prediction models.
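The Brier scores cited above are simply the mean squared difference between predicted probabilities and observed binary outcomes. A one-function sketch with hypothetical toy inputs:

```python
def brier_score(labels, probs):
    """Brier score: mean squared error between the predicted probability
    and the 0/1 outcome. Lower is better; 0.0 is a perfect, confident
    forecast, and values below 0.1 (as reported above) indicate
    well-calibrated predictions for a low-event-rate outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

# toy example: three survivors predicted low risk, one death predicted high
print(brier_score([0, 0, 1, 0], [0.1, 0.2, 0.8, 0.05]))
```

Unlike AUROC, the Brier score penalizes miscalibrated probabilities, which is why the abstract reports it alongside calibration plots.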


Author(s):  
Sooyoung Yoo ◽  
Jinwook Choi ◽  
Borim Ryu ◽  
Seok Kim

Abstract Background: Unplanned hospital readmission after discharge reflects low satisfaction and reliability in care and the possibility of potential medical accidents, and is thus indicative of the quality of patient care and the appropriateness of discharge plans. Objectives: The purpose of this study was to develop and validate prediction models for all-cause unplanned hospital readmissions within 30 days of discharge, based on a common data model (CDM), which can be applied to multiple institutions for efficient readmission management. Methods: Retrospective patient-level prediction models were developed based on the clinical data of two tertiary general university hospitals, converted into the CDM developed by the Observational Medical Outcomes Partnership. Machine learning classification models based on LASSO logistic regression, decision tree, AdaBoost, random forest, and gradient boosting machine (GBM) were developed and tested by manipulating a set of CDM variables. An internal 10-fold cross-validation was performed on the target data of the model. To examine its transportability, the model was externally validated. Model performance was evaluated using the area under the curve (AUC). Results: Based on the time interval for outcome prediction, the prediction model targeting the variables obtained within 30 days of discharge was confirmed to be the most efficient (AUC of 82.75). The external validation showed that the model is transferable, with the combination of various clinical covariates. Above all, the GBM-based prediction model showed the highest AUC performance of 84.14 ± 0.015 for the Seoul National University Hospital cohort, yielding 78.33 on external validation. Conclusions: This study showed that readmission prediction models developed using machine-learning techniques and a CDM can be a useful tool to compare two hospitals in terms of patient-data features.
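The internal 10-fold cross-validation described above partitions patients into ten disjoint folds, each held out once while the model trains on the other nine. A minimal index-splitting sketch (illustrative, not the study's pipeline):

```python
import random

def k_fold_indices(n, k=10, seed=42):
    """Shuffle patient indices once, deal them into k folds, and yield
    (train, validation) index lists; each fold is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin deal into k folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, val))
    return splits

# 100 hypothetical patients, 10 folds of 10
splits = k_fold_indices(100, k=10)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging the AUC over the k held-out folds is what yields summaries like the 84.14 ± 0.015 reported above.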


2021 ◽  
Vol 11 (12) ◽  
pp. 1271
Author(s):  
Jaehyeong Cho ◽  
Jimyung Park ◽  
Eugene Jeong ◽  
Jihye Shin ◽  
Sangjeong Ahn ◽  
...  

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
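The AUPRC reported above (0.13 for the random forest) can be computed as average precision: stepping through predictions from the highest score down and averaging the precision at each newly recalled positive. A toy sketch (not the study's code); note that a no-skill model scores at the outcome prevalence, so a small absolute AUPRC can still be informative for a rare outcome like 30-day postoperative mortality:

```python
def average_precision(labels, scores):
    """AUPRC as average precision: rank cases by score (descending) and
    average the precision observed at each true-positive rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / n_pos

# toy example: one positive ranked first, one ranked third
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

This simple form assumes no tied scores; library implementations handle ties and interpolation more carefully.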


2020 ◽  
Author(s):  
Janmajay Singh ◽  
Masahiro Sato ◽  
Tomoko Ohkuma

BACKGROUND: Missing data in electronic health records are inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient’s health and advocate for their inclusion in clinical prediction models, but their effectiveness has not been comprehensively evaluated. OBJECTIVE: The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings. METHODS: A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration. RESULTS: Generally improved model performance in retrospective tasks was observed on including missingness features. The extent of improvement depended on the outcome of interest (area under the receiver operating characteristic curve [AUROC] improved by 1.2% to 7.7%) and even the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite their leading to earlier detection of disease (true positives), since including these features also led to a concomitant rise in false positive detections. CONCLUSIONS: This study comprehensively evaluated the effectiveness of missingness features in machine learning models.
A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings especially for administrative tasks like length of stay prediction where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.
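The missingness features discussed above are typically binary "was this measured?" indicators added alongside imputed values, so a model can learn from the measurement pattern itself (e.g. a lab that was never ordered). A minimal sketch with a hypothetical `lactate` feature (illustrative, not the study's pipeline):

```python
def add_missingness_features(rows, feature_names):
    """For each feature, mean-impute missing values and append a binary
    '<feature>_missing' indicator column capturing the missingness pattern."""
    means = {}
    for f in feature_names:
        observed = [r[f] for r in rows if r[f] is not None]
        means[f] = sum(observed) / len(observed) if observed else 0.0
    out = []
    for r in rows:
        new = {}
        for f in feature_names:
            new[f] = r[f] if r[f] is not None else means[f]  # mean imputation
            new[f + "_missing"] = 1 if r[f] is None else 0   # missingness flag
        out.append(new)
    return out

# toy records: the second patient never had lactate measured
rows = [{"lactate": 2.0}, {"lactate": None}, {"lactate": 4.0}]
print(add_missingness_features(rows, ["lactate"]))
```

In a prospective deployment, as the abstract cautions, such flags reflect local ordering practices and may not transport across hospitals.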


2021 ◽  
Vol 14 (12) ◽  
pp. 1941-1949
Author(s):  
Seungkwon Choi ◽  
Sungwho Park ◽  
Iksoo Byon ◽  
Hee-Young Choi ◽  
...  

AIM: To predict final visual acuity and analyze significant factors influencing open globe injury prognosis. METHODS: Prediction models were built using a supervised classification algorithm from Microsoft Azure Machine Learning Studio. The best algorithm was selected to analyze the predicted final visual acuity. We retrospectively reviewed the data of 171 patients with open globe injury who visited the Pusan National University Hospital between January 2010 and July 2020. We then applied cross-validation, the permutation feature importance method, and the synthetic minority over-sampling technique to enhance tool performance. RESULTS: The two-class boosted decision tree model showed the best predictive performance. The accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve were 0.925, 0.962, 0.833, 0.893, and 0.971, respectively. To increase the efficiency and efficacy of the prognostic tool, the top 14 features were finally selected using the permutation feature importance method: (listed in the order of importance) retinal detachment, location of laceration, initial visual acuity, iris damage, surgeon, past history, size of the scleral laceration, vitreous hemorrhage, trauma characteristics, age, corneal injury, primary diagnosis, wound location, and lid laceration. CONCLUSION: Here we devise a highly accurate model to predict the final visual acuity of patients with open globe injury. This tool is useful and easily accessible to doctors and patients, reducing the socioeconomic burden. With further multicenter verification using larger datasets and external validation, we expect this model to become useful worldwide.
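The synthetic minority over-sampling technique (SMOTE) used in the methods creates new minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. An illustrative pure-Python sketch with hypothetical 2-D feature vectors (not the Azure Machine Learning Studio implementation the study used):

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """SMOTE sketch: pick a random minority point, choose one of its k
    nearest minority neighbors, and place a synthetic point a random
    fraction of the way along the line segment between them."""
    rng = random.Random(seed)

    def dist2(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: dist2(base, p))[:k]
        nn = rng.choice(neighbors)
        u = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, nn)])
    return synthetic

# hypothetical minority-class feature vectors (e.g. rare poor-outcome cases)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0]]
print(smote(minority, n_synthetic=3))
```

Oversampling should be applied only to training folds, never before the cross-validation split, or the performance estimates become optimistically biased.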


Author(s):  
M. VALKEMA ◽  
H. LINGSMA ◽  
P. LAMBIN ◽  
J. VAN LANSCHOT

Biostatistics versus machine learning: from traditional prediction models to automated medical analysis

Machine learning is increasingly applied to medical data to develop clinical prediction models. This paper discusses the application of machine learning in comparison with traditional biostatistical methods. Biostatistics is well suited for structured datasets. The selection of variables for a biostatistical prediction model is primarily knowledge-driven. A similar approach is possible with machine learning, but in addition, machine learning allows for the analysis of unstructured datasets, e.g. those derived from medical imaging and written texts in patient records. In contrast to biostatistics, the selection of variables with machine learning is mainly data-driven. Complex machine learning models are able to detect nonlinear patterns and interactions in data; however, this requires large datasets to prevent overfitting. For both machine learning and biostatistics, external validation of a developed model in a comparable setting is required to evaluate a model’s reproducibility. Machine learning models are not easily implemented in clinical practice, since they are often regarded as black boxes (i.e. non-intuitive). For this purpose, research initiatives are ongoing within the field of explainable artificial intelligence. Finally, the application of machine learning for automated imaging analysis and the development of clinical decision support systems is discussed.

