Development and evaluation of a machine learning-based in-hospital COvid-19 Disease Outcome Predictor (CODOP): a multicontinental retrospective study

Author(s):  
Riku Klen ◽  
Disha Purohit ◽  
Ricardo Gomez-Huelgas ◽  
Jose Manuel Casas-Rojo ◽  
Juan Miguel Anton Santos ◽  
...  

Summary: Background: More contagious SARS-CoV-2 virus variants, breakthrough infections, waning immunity, and differential access to COVID-19 vaccines account for the worst yet numbers of hospitalizations and deaths during the COVID-19 pandemic, particularly in resource-limited countries. There is an urgent need for clinically valuable, generalizable, and parsimonious triage tools that assist the appropriate allocation of hospital resources during the pandemic. We aimed to develop and extensively validate a machine learning-based tool for accurately predicting the clinical outcome of hospitalized COVID-19 patients. Methods: CODOP was built using modified stable iterative variable selection and linear regression with lasso regularisation. To avoid generalization problems, CODOP was trained and tested with three time-sliced and geographically distinct cohorts encompassing 40 511 blood-based analyses of COVID-19 patients from more than 110 hospitals in Spain and the USA during 2020-21. We assessed the discriminative ability of the model using the area under the receiver operating characteristic curve (AUROC) as well as horizon and Kaplan-Meier risk stratification analyses. To account for the fluctuating pressure levels in hospitals through the pandemic, we offer two online CODOP calculators suited for undertriage or overtriage scenarios. We challenged their generalizability and clinical utility through an evaluation with datasets gathered in five hospitals from three Latin American countries. Findings: CODOP uses 12 clinical parameters commonly measured at hospital admission and associated with the pathophysiology of COVID-19. CODOP reaches high discriminative ability up to nine days before clinical resolution (AUROC: 0.90-0.96, 95% CI 0.879-0.970), is well calibrated, and enables effective dynamic risk stratification during hospitalization. The two CODOP online calculators predicted the clinical outcome of the majority of patients (73-100% sensitivity and 84-100% specificity) from the distinctive Latin American evaluation cohort. Interpretation: The high predictive performance of CODOP in geographically dispersed patient cohorts and its ease of use strongly suggest its clinical utility as a global triage tool, particularly in resource-limited countries.
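As a rough illustration of the discrimination metric reported above, the AUROC can be read as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes it by direct pair counting; the function name `auroc`, the 0/1 outcome coding, and the toy data are assumptions for illustration and are not taken from the CODOP study.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney style) AUROC.

    labels: 0/1 outcomes (1 = event; hypothetical coding)
    scores: predicted risks, higher = more likely to have the event
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Toy data (invented for illustration only)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

Pair counting is quadratic in cohort size, so it suits small examples; a rank-based formulation would be preferable for cohorts of the size described in the abstract.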

2021 ◽  
Author(s):  
Gamal Shiha ◽  
Nabiel NH Mikhail ◽  
Reham Soliman ◽  
Ayman Hassan ◽  
Mohammed Eslam

Abstract Background and aim: Many HCC risk prediction scores were developed to guide HCC risk stratification and identify CHC patients who either need intensified surveillance or may not require screening. There is a need to compare different scores and their predictive performance in clinical practice. We aim to compare the newest HCC risk scores, evaluating their discriminative ability and clinical utility in a large cohort of CHC patients. Patients and methods: The performance of the scores was evaluated in 3075 CHC patients who achieved SVR following DAAs using Log rank and Harrell’s c statistic; the scores were also tested for HCC-risk stratification and negative predictive values. Results: HCC developed in 212 patients within 5 years of follow-up. Twelve HCC risk scores were identified and displayed significant Log rank (p ≤ 0.05) except the Alonso-Lopez TE-HCC and Chun scores (p = 0.374 and p = 0.053, respectively). Analysis of the remaining ten scores revealed that ADRES, GES pre-post treatment, GES algorithm and Watanabe (post-treatment) scores, which include the dynamics of AFP, were clinically applicable and demonstrated good statistical performance: Log rank analysis (p < 0.001), Harrell’s C statistic (0.66–0.83) and high negative predictive values (94.38–97.65%). In these three scores, the 5-year cumulative IR in low-risk groups is very low (0.54–1.6), so screening could be safely avoided in these patients. Conclusion: ADRES, GES score, GES algorithm and Watanabe (post-treatment) scores seem to offer acceptable HCC-risk predictability and clinical utility in CHC patients. The dynamics of AFP as a component of these scores may explain their high performance when compared to other scores.
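For readers unfamiliar with Harrell's C statistic, the following minimal sketch shows one common way to compute it for right-censored follow-up data: among all pairs in which the patient with the shorter follow-up had an observed event, it counts how often that patient also had the higher risk score. The function name `harrell_c`, the argument layout, and the toy data are assumptions for illustration; the study's own implementation is not described in the abstract.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times:       follow-up time for each patient
    events:      1 if the event (e.g. HCC) was observed, 0 if censored
    risk_scores: higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when i's event occurred strictly before j's time
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): years of follow-up, event indicator, risk score
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 0]
toy_scores = [0.9, 0.3, 0.1, 0.2]
print(harrell_c(toy_times, toy_events, toy_scores))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the 0.66–0.83 range quoted above.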


2020 ◽  
Vol 7 ◽  
Author(s):  
Bin Zhang ◽  
Qin Liu ◽  
Xiao Zhang ◽  
Shuyi Liu ◽  
Weiqi Chen ◽  
...  

Aim: Early detection of coronavirus disease 2019 (COVID-19) patients who are likely to develop worse outcomes is of great importance, as it may help select patients at risk of rapid deterioration who require high-level monitoring and more aggressive treatment. We aimed to develop and validate a nomogram for predicting 30-day poor outcome of patients with COVID-19. Methods: The prediction model was developed in a primary cohort consisting of 233 patients with laboratory-confirmed COVID-19, and data were collected from January 3 to March 20, 2020. We identified and integrated significant prognostic factors for 30-day poor outcome to construct a nomogram. The model was subjected to internal validation and to external validation with two separate cohorts of 110 and 118 cases, respectively. The performance of the nomogram was assessed with respect to its predictive accuracy, discriminative ability, and clinical usefulness. Results: In the primary cohort, the mean age of patients was 55.4 years and 129 (55.4%) were male. Prognostic factors contained in the clinical nomogram were age, lactic dehydrogenase, aspartate aminotransferase, prothrombin time, serum creatinine, serum sodium, fasting blood glucose, and D-dimer. The model was externally validated in two cohorts achieving an AUC of 0.946 and 0.878, sensitivity of 100 and 79%, and specificity of 76.5 and 83.8%, respectively. Although adding CT score to the clinical nomogram (clinical-CT nomogram) did not yield better predictive performance, decision curve analysis showed that the clinical-CT nomogram provided better clinical utility than the clinical nomogram. Conclusions: We established and validated a nomogram that can provide an individual prediction of 30-day poor outcome for COVID-19 patients. This practical prognostic model may help clinicians in decision making and reduce mortality.
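Decision curve analysis, mentioned above, compares models by their net benefit at a chosen risk threshold: true positives are credited and false positives are penalised by the odds of the threshold. The sketch below uses the standard net-benefit formula, NB = TP/n - (FP/n) × pt/(1 - pt); the function name `net_benefit` and the toy data are assumptions for illustration and do not reproduce the nomogram study's analysis.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    labels:    0/1 outcomes (1 = poor outcome; hypothetical coding)
    risks:     predicted probabilities of a poor outcome
    threshold: risk threshold pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Toy data (invented for illustration only)
toy_labels = [1, 1, 0, 0, 0]
toy_risks = [0.9, 0.6, 0.7, 0.2, 0.1]
print(net_benefit(toy_labels, toy_risks, 0.5))
```

Evaluating this function over a grid of thresholds, for each model and for the treat-all and treat-none strategies, yields the decision curves that such analyses compare.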


2021 ◽  
Author(s):  
Jialong Xiao ◽  
Miao Mo ◽  
Zezhou Wang ◽  
Changming Zhou ◽  
Jie Shen ◽  
...  

BACKGROUND Over recent years, machine learning (ML) methods have been increasingly explored in cancer prognosis prediction because of the appearance of improved machine learning algorithms. These algorithms can use censored data for modeling, such as support vector machines (SVM) for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or ML-based prognostic prediction models have better predictive performance. OBJECTIVE This study aims to use machine learning algorithms to predict the survival of breast cancer patients and to compare their predictive performance with that of traditional Cox regression. METHODS This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008 and December 31, 2016. A total of 25267 cases with 21 features were eligible for model development, and the data set was randomly split into a train set (70%) and a test set (30%) for developing four models and predicting overall survival in breast cancer patients. The discriminative ability of models was evaluated by the concordance index (C-index) and the time-dependent area under the curve (AUC); the calibration ability of models was evaluated by the Brier score. RESULTS The RSF model showed the best discriminative performance among the four models with 3-year, 5-year and 10-year time-dependent AUC of 0.857, 0.838 and 0.781, respectively, and a C-index of 0.827 (0.809, 0.845), which significantly outperformed the Cox-EN model (0.816, p=0.007), the Cox model (0.814, p=0.003) and the SVM model (0.812, p<0.001). The four models' 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094, which meant all models had good calibration. In the context of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 important features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of breast cancer patients. CONCLUSIONS The RSF model slightly outperformed the other models on discriminative ability, revealing its great potential as an effective approach for survival analysis. CLINICALTRIAL ClinicalTrials.gov, registration number: NCT04996732.
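The Brier score used above for calibration is, in its simplest form, the mean squared difference between predicted probabilities and observed 0/1 outcomes, so lower values are better. The sketch below shows only this uncensored form at a single time horizon; time-dependent Brier scores for survival data additionally weight observations to account for censoring, which is omitted here. The function name `brier_score` and the toy data are assumptions for illustration.

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Simplified sketch: ignores censoring, so it is not the time-dependent
    survival Brier score, only the basic binary-outcome version.
    """
    if len(outcomes) != len(predicted_probs) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, predicted_probs)]
    return sum(squared_errors) / len(squared_errors)


# Toy data (invented): 1 = died before the horizon, 0 = alive at the horizon
toy_outcomes = [1, 0, 0, 1]
toy_probs = [0.9, 0.2, 0.1, 0.6]
print(brier_score(toy_outcomes, toy_probs))
```

A model that always predicts 0.5 scores 0.25 on this scale, which is a useful reference point when reading values such as the 0.027 to 0.094 range reported in the abstract.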


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sam Nguyen ◽  
Ryan Chan ◽  
Jose Cadena ◽  
Braden Soper ◽  
Paul Kiszka ◽  
...  

Abstract: The combination of machine learning (ML) and electronic health records (EHR) data may be able to improve outcomes of hospitalized COVID-19 patients through improved risk stratification and patient outcome prediction. However, in resource-constrained environments the clinical utility of such data-driven predictive tools may be limited by the cost or unavailability of certain laboratory tests. We leveraged EHR data to develop an ML-based tool for predicting adverse outcomes that optimizes clinical utility under a given cost structure. We further gained insights into the decision-making process of the ML models through an explainable AI tool. This cohort study was performed using deidentified EHR data from COVID-19 patients from ProMedica Health System in northwest Ohio and southeastern Michigan. We tested the performance of various ML approaches for predicting either increasing ventilatory support or mortality. We performed post hoc analysis to obtain optimal feature sets under various budget constraints. We demonstrate that it is possible to achieve a significant reduction in cost at the expense of a small reduction in predictive performance. For example, when predicting ventilation, it is possible to achieve a 43% reduction in cost with only a 3% reduction in performance. Similarly, when predicting mortality, it is possible to achieve a 50% reduction in cost with only a 1% reduction in performance. This study presents a quick, accurate, and cost-effective method to evaluate risk of deterioration for patients with SARS-CoV-2 infection at the time of clinical evaluation.
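One simple way to picture the search for an optimal feature set under a budget constraint is an exhaustive scan over feature subsets whose total cost fits the budget, keeping the subset with the best model score. The abstract does not say how the authors performed their post hoc analysis, so the sketch below is only an assumed brute-force baseline; the feature names, costs, and the additive scoring function `toy_evaluate` are all hypothetical.

```python
from itertools import combinations


def best_subset_under_budget(costs, evaluate, budget):
    """Return (subset, score) with the highest score among affordable subsets.

    costs:    dict mapping feature name -> cost of obtaining it
    evaluate: callable taking a tuple of feature names, returning a score
              (e.g. a cross-validated AUC; supplied by the caller)
    budget:   maximum total cost allowed
    """
    names = sorted(costs)
    best_subset, best_score = (), evaluate(())
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if sum(costs[name] for name in subset) > budget:
                continue  # subset is too expensive
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical costs and per-feature gains, invented for illustration
toy_costs = {"age": 0, "crp": 10, "d_dimer": 25, "ferritin": 20}
toy_gains = {"age": 0.10, "crp": 0.08, "d_dimer": 0.09, "ferritin": 0.03}


def toy_evaluate(subset):
    # Stand-in for model training and validation: a baseline plus additive gains
    return 0.5 + sum(toy_gains[name] for name in subset)


print(best_subset_under_budget(toy_costs, toy_evaluate, budget=30))
```

Exhaustive search grows exponentially with the number of candidate features, so it is practical only for small panels; greedy or integer-programming approaches would be natural alternatives for larger ones.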


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.


2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
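The comparison above rests on how well predicted binding affinities correlate with experimental values. The abstract does not state which correlation coefficient was used, so the sketch below assumes Pearson's r purely as an example of such a check; the function name `pearson_r` and the toy affinity values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Toy values (invented): experimental vs predicted affinity, e.g. as pKi
toy_experimental = [6.0, 7.0, 8.0, 9.0]
toy_predicted = [6.5, 6.9, 8.2, 8.8]
print(pearson_r(toy_experimental, toy_predicted))
```

Values near 1 indicate that a scoring function ranks and scales ligands much as the experiments do, while values near 0 correspond to the lack of correlation the abstract reports for the classical scoring functions.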


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract: With recent advancements in machine learning (ML), these technologies have been applied in various fields owing to their high predictive performance. We sought to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. Main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
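Several of the validation metrics listed above derive from a single confusion matrix. The sketch below computes accuracy, positive predictive value, sensitivity, specificity, and the F1 score for one outcome treated as a binary (one-vs-rest) problem, for example LVO versus everything else. The function name `binary_metrics` and the toy labels are assumptions for illustration, and the study's own multi-class evaluation may differ.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels (1 = outcome of interest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a metric is undefined
        return numerator / denominator if denominator else float("nan")

    ppv = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Toy labels (invented): 3 true cases, 5 non-cases
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(toy_true, toy_pred))
```

Because outcomes such as SAH are rare in both cohorts, accuracy alone can look favourable even when few true cases are detected, which is one reason to report sensitivity and positive predictive value alongside it.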

