Development of a Multivariable Model for COVID-19 Risk Stratification Based on Gradient Boosting Decision Trees

Author(s):  
Jahir M. Gutierrez ◽  
Maksims Volkovs ◽  
Tomi Poutanen ◽  
Tristan Watson ◽  
Laura Rosella

Abstract Importance: Stratifying the adult population of Ontario, Canada, by risk of COVID-19 complications can support rapid pandemic response, resource allocation, and decision-making. Objective: To develop and validate a multivariable model that predicts the risk of hospitalization for severe COVID-19 from routinely collected health records of the entire adult population of Ontario, Canada. Design, Setting, and Participants: This cohort study included 36,323 adult patients (age ≥ 18 years) from the province of Ontario, Canada, who tested positive for SARS-CoV-2 nucleic acid by polymerase chain reaction between February 2 and October 5, 2020, and were followed up through November 5, 2020. Patients living in long-term care facilities were excluded from the analysis. Main Outcomes and Measures: The risk of hospitalization within 30 days of COVID-19 diagnosis was estimated with Gradient Boosting Decision Trees, and the importance of risk factors was examined with Shapley values. Results: The study cohort included 36,323 patients, the majority of whom were female (18,895 [52.02%]), with a median (IQR) age of 45 (31-58) years. The cohort had a hospitalization rate of 7.11% (2,583 hospitalizations), with a median (IQR) time to hospitalization of 1 (0-5) days, and a mortality rate of 2.49% (906 deaths), with a median (IQR) time to death of 12 (6-27) days. Compared with patients who were not hospitalized, those who were hospitalized had a higher median age (64 vs 43 years, p < 0.001), were more often male (56.25% vs 47.35%, p < 0.001), and had a higher median (IQR) number of comorbidities (3 [2-6] vs 1 [0-3], p < 0.001). Patients were randomly split into development (n = 29,058, 80%) and held-out validation (n = 7,265, 20%) cohorts.
The final Gradient Boosting model was built with the XGBoost algorithm and achieved high discrimination (development cohort: mean area under the receiver operating characteristic curve across the five folds, 0.852; held-out validation cohort, 0.8475) and excellent calibration (R² = 0.998, slope = 1.01, intercept = -0.01). Patients scoring in the top 10% of the validation cohort accounted for 47.41% of actual hospitalizations, and those in the top 30% accounted for 80.56%. Patients in the held-out validation cohort (n = 7,265) with a score of at least 0.5 (n = 2,149, 29.58%) had a hospitalization rate of 20.29% (positive predictive value, 20.29%), compared with 2.2% for those with a score below 0.5 (n = 5,116, 70.42%; negative predictive value, 97.8%). Aside from age, gender, and number of comorbidities, the features that contributed most to model predictions were a history of abnormal blood levels of creatinine, neutrophils, and leukocytes; geography; and chronic kidney disease. Conclusions: A risk stratification model has been developed and validated using unique, de-identified, linked, routinely collected health administrative data available in Ontario, Canada. The final XGBoost model showed high discrimination and could potentially be used to stratify patients by risk of serious COVID-19 outcomes. The model demonstrates that routinely collected health system data can serve as a proxy for the risk of severe COVID-19 complications. In particular, past laboratory results and demographic factors provide a strong signal for identifying patients susceptible to complications. The model can support population risk stratification that informs the protection of patients most at risk for severe COVID-19 complications.
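The top-10% and top-30% figures above measure how many of all actual hospitalizations fall among the highest-scoring patients. The following minimal Python sketch shows that calculation. It assumes a list of model scores and binary outcomes (1 = hospitalized). The function name `capture_rate` and the toy data are hypothetical and do not come from the study's code.

```python
def capture_rate(scores, labels, top_fraction):
    """Fraction of all actual positives (label 1) found among the
    top_fraction of patients ranked by descending score."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("labels contain no positives")
    # Rank patients from highest to lowest predicted risk (stable sort keeps input order for ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Number of patients flagged, rounded to the nearest whole patient (at least one).
    n_flagged = max(1, round(top_fraction * len(ranked)))
    captured = sum(label for _, label in ranked[:n_flagged])
    return captured / total_positives


# Toy data (hypothetical): 10 patients, 4 of whom were hospitalized.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(capture_rate(scores, labels, 0.1))  # 0.25
print(capture_rate(scores, labels, 0.3))  # 0.5
```

Two choices in this sketch are assumptions. Tied scores keep their input order, and the flagged count is rounded to the nearest whole patient.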

CMAJ Open ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. E1223-E1231
Author(s):  
Jahir M. Gutierrez ◽  
Maksims Volkovs ◽  
Tomi Poutanen ◽  
Tristan Watson ◽  
Laura C. Rosella

2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Verena Schöning ◽  
Evangelia Liakoni ◽  
Christine Baumgartner ◽  
Aristomenis K. Exadaktylos ◽  
Wolf E. Hautz ◽  
...  

Abstract Background Clinical risk scores and machine learning models based on routine laboratory values could assist in the automated early identification of patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection who are at risk for severe clinical outcomes. Such tools can guide patient triage, inform the allocation of health care resources, and help improve clinical outcomes. Methods In- and out-patients who tested positive for SARS-CoV-2 at the Insel Hospital Group Bern, Switzerland, between February 1st and August 31st (‘first wave’, n = 198) and between September 1st and November 16th 2020 (‘second wave’, n = 459) were used as the training and prospective validation cohorts, respectively. A clinical risk stratification score and machine learning (ML) models were developed using demographic data, medical history, and laboratory values taken up to 3 days before, or 1 day after, positive testing to predict severe outcomes of hospitalization (a composite endpoint of admission to intensive care or death from any cause). Test accuracy was assessed using the area under the receiver operating characteristic curve (AUROC). Results Sex, C-reactive protein, sodium, hemoglobin, glomerular filtration rate, glucose, and leucocytes around the time of first positive testing (−3 to +1 days) were the most predictive parameters. The performance of the risk stratification score on the training data (AUROC = 0.94, positive predictive value (PPV) = 0.97, negative predictive value (NPV) = 0.80) was comparable to that in the prospective validation cohort (AUROC = 0.85, PPV = 0.91, NPV = 0.81). The most successful ML algorithm with respect to AUROC was support vector machines (median = 0.96, interquartile range = 0.85–0.99, PPV = 0.90, NPV = 0.58). Conclusion With a small set of easily obtainable parameters, both the clinical risk stratification score and the ML models were predictive of severe outcomes at our tertiary hospital center and performed well in prospective validation.
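The AUROC used above as the accuracy measure can be read as a probability. It is the chance that a randomly chosen patient with a severe outcome receives a higher score than a randomly chosen patient without one, with ties counted as one half. This is the Mann-Whitney formulation. The minimal Python sketch below assumes binary labels (1 = severe outcome). The function name and toy inputs are hypothetical and are not the authors' code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the probability that a random positive outscores a random negative,
    counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores; 1 = severe outcome).
print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in cohort size. That is acceptable for cohorts of a few hundred patients, but a rank-based formula would scale better.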


2020 ◽  
Vol 9 (8) ◽  
pp. 2495
Author(s):  
Ewelina Szczepanek-Parulska ◽  
Kosma Wolinski ◽  
Katarzyna Dobruch-Sobczak ◽  
Patrycja Antosik ◽  
Anna Ostalowska ◽  
...  

Computer-aided diagnosis (CAD) and other risk stratification systems may improve the interpretation of ultrasound images. This prospective study aimed to compare the diagnostic performance of three approaches:
- the European Thyroid Imaging Reporting and Data System (EU-TIRADS) classification applied by physicians;
- the S-Detect 2 CAD software, based on the Korean Thyroid Imaging Reporting and Data System (K-TIRADS);
- combinations of both methods (MODELs 1 to 5).

In all, 133 nodules from 88 patients referred for thyroidectomy with available histopathology or unambiguous cytology results were included. For the diagnosis of thyroid cancer, S-Detect, EU-TIRADS, and mixed MODELs 1–5 showed a sensitivity of 89.4%, 90.9%, 84.9%, 95.5%, 93.9%, 78.9% and 93.9%; a specificity of 80.6%, 61.2%, 88.1%, 53.7%, 73.1%, 89.6% and 80.6%; a positive predictive value of 81.9%, 69.8%, 87.5%, 67%, 77.5%, 88.1% and 82.7%; a negative predictive value of 88.5%, 87.2%, 85.5%, 92.3%, 92.5%, 81.1% and 93.1%; and an accuracy of 85%, 75.9%, 86.5%, 74.4%, 83.5%, 84.2%, and 87.2%, respectively. The similar MODELs 1 and 5 were superior to the other mixed models and to EU-TIRADS and S-Detect used alone (p < 0.05). S-Detect software is characterized by high sensitivity and good specificity, whereas EU-TIRADS has high sensitivity but rather low specificity. The best diagnostic performance in malignant thyroid nodule (TN) risk stratification was obtained for the model combining an S-Detect result of “possibly malignant” with either 4 or 5 points (MODEL 1) or exactly 5 points (MODEL 5) on the EU-TIRADS scale.
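All five metrics above come from one 2x2 table of true and false positives and negatives. The minimal Python sketch below shows the formulas. The abstract does not report raw counts. The counts used here (TP = 59, FP = 13, FN = 7, TN = 54, i.e. 66 malignant and 67 benign nodules) are an assumption, chosen because they reproduce the reported S-Detect percentages. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # flagged nodules that are malignant
        "npv": tn / (tn + fn),          # cleared nodules that are benign
        "accuracy": (tp + tn) / total,
    }


# Assumed counts (not reported in the abstract): 66 malignant and 67 benign nodules.
s_detect = diagnostic_metrics(tp=59, fp=13, fn=7, tn=54)
for name, value in s_detect.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, the printed values are 89.4%, 80.6%, 81.9%, 88.5%, and 85.0%. These match the S-Detect column of the abstract.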


2019 ◽  
pp. 1-10
Author(s):  
Gabriel A. Brooks ◽  
Hajime Uno ◽  
Erin J. Aiello Bowles ◽  
Alexander R. Menter ◽  
Maureen O’Keeffe-Rosetti ◽  
...  

PURPOSE Hospitalizations are a common occurrence during chemotherapy for advanced cancer. Validated risk stratification tools could facilitate proactive approaches for reducing hospitalizations by identifying at-risk patients. PATIENTS AND METHODS We assembled two retrospective cohorts of patients receiving chemotherapy for advanced nonhematologic cancer; cohorts were drawn from three integrated health plans of the Cancer Research Network. We used these cohorts to develop and validate logistic regression models estimating 30-day hospitalization risk after chemotherapy initiation. The development cohort included patients in two health plans from 2005 to 2013. The validation cohort included patients in a third health plan from 2007 to 2016. Candidate predictor variables were derived from clinical data in institutional data warehouses. Models were validated based on the C-statistic, positive predictive value, and negative predictive value. Positive predictive value and negative predictive value were calculated in reference to a prespecified risk threshold (hospitalization risk ≥ 18.0%). RESULTS There were 3,606 patients in the development cohort (median age, 63 years) and 634 evaluable patients in the validation cohort (median age, 64 years). Lung cancer was the most common diagnosis in both cohorts (26% and 31%, respectively). The selected risk stratification model included two variables: albumin and sodium. The model C-statistic in the validation cohort was 0.69 (95% CI, 0.62 to 0.75); 39% of patients were classified as high risk according to the prespecified threshold; 30-day hospitalization risk was 24.2% (95% CI, 19.9% to 32.0%) in the high-risk group and 8.7% (95% CI, 6.1% to 12.0%) in the low-risk group. CONCLUSION A model based on data elements routinely collected during cancer treatment can reliably identify patients at high risk for hospitalization after chemotherapy initiation. 
Additional research is necessary to determine whether this model can be deployed to prevent chemotherapy-related hospitalizations.
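The validation results above dichotomize predicted 30-day risk at the prespecified 18.0% threshold. The high-risk group's observed hospitalization rate is then the PPV, and the share of low-risk patients who were not hospitalized is the NPV. The albumin-and-sodium model's coefficients are not given in the abstract. This minimal Python sketch therefore takes predicted risks as input. The function name and toy data are hypothetical.

```python
def risk_group_summary(predicted_risks, outcomes, threshold=0.18):
    """Split patients at a prespecified risk threshold and report the share classified
    as high risk, the PPV (observed event rate in the high-risk group) and the NPV
    (share of low-risk patients without the event)."""
    high = [y for p, y in zip(predicted_risks, outcomes) if p >= threshold]
    low = [y for p, y in zip(predicted_risks, outcomes) if p < threshold]
    if not high or not low:
        raise ValueError("threshold leaves one risk group empty")
    return {
        "share_high_risk": len(high) / len(predicted_risks),
        "ppv": sum(high) / len(high),
        "npv": (len(low) - sum(low)) / len(low),
    }


# Hypothetical predicted 30-day risks and observed hospitalizations (1 = hospitalized).
risks = [0.30, 0.25, 0.20, 0.18, 0.10, 0.05, 0.12, 0.02]
hospitalized = [1, 0, 0, 1, 0, 0, 1, 0]
print(risk_group_summary(risks, hospitalized))
```

Following the abstract's "≥ 18.0%" wording, a patient whose predicted risk equals the threshold is counted as high risk.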


Author(s):  
Ellen Haag ◽  
Claudia Gregoriano ◽  
Alexandra Molitor ◽  
Milena Kloter ◽  
Alexander Kutz ◽  
...  

Abstract Objectives Risk stratification in patients with infection is usually based on the Sequential Organ Failure Assessment (SOFA) score. Our aim was to investigate whether the vasoactive peptide mid-regional pro-adrenomedullin (MR-proADM) improves the predictive value of the SOFA score for 30-day mortality in patients with acute infection presenting to the emergency department (ED). Methods This secondary analysis of the prospective observational TRIAGE study included 657 patients with infection. The SOFA score, MR-proADM, and traditional inflammation markers were all measured at the time of admission. Associations between admission parameters and 30-day mortality were investigated using logistic regression, discrimination analyses, the net reclassification index (NRI), and the integrated discrimination improvement (IDI). Results MR-proADM values were higher in non-survivors than in survivors (4.5 ± 3.5 nmol/L vs. 1.7 ± 1.8 nmol/L). The adjusted odds ratio was 26.6 (95% CI 3.92 to 180.61, p = 0.001) per 1 nmol/L increase in admission MR-proADM levels, and the area under the receiver operating characteristic curve (AUC) was 0.86. The SOFA score alone had an AUC of 0.81. Adding MR-proADM improved discrimination (AUC 0.87) and classification within predefined risk categories (NRI 0.075, p < 0.05). An admission MR-proADM threshold of 1.75 nmol/L provided the best prognostic accuracy for 30-day mortality, with a sensitivity of 81%, a specificity of 75%, and a negative predictive value of 98%. Conclusions MR-proADM improved mortality risk stratification beyond the SOFA score alone in patients with infection presenting to the ED and may further improve initial therapeutic site-of-care decisions. Trial registration ClinicalTrials.gov NCT01768494. Registered January 15, 2013.
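The NRI reported above rewards a new model for two kinds of reclassification. Patients who died should move into a higher risk category, and survivors should move into a lower one. The minimal Python sketch below computes the standard categorical NRI. It assumes ordered integer risk categories (0 = lowest). The categories, outcomes, and function name are hypothetical and do not reproduce the TRIAGE analysis.

```python
def categorical_nri(old_categories, new_categories, events):
    """Categorical net reclassification index.

    Categories are ordered integers (0 = lowest risk). Events should move up
    and non-events should move down when the new model is better."""
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need both events and non-events")
    up_event = down_event = up_nonevent = down_nonevent = 0
    for old, new, event in zip(old_categories, new_categories, events):
        if new > old:
            if event:
                up_event += 1
            else:
                up_nonevent += 1
        elif new < old:
            if event:
                down_event += 1
            else:
                down_nonevent += 1
    # Net proportion of events correctly moved up plus non-events correctly moved down.
    nri_events = (up_event - down_event) / n_events
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevents
    return nri_events + nri_nonevents


# Hypothetical risk categories under SOFA alone (old) and SOFA + MR-proADM (new).
old = [0, 1, 1, 2, 0, 1]
new = [1, 1, 2, 2, 0, 0]
died = [1, 1, 1, 0, 0, 0]
print(categorical_nri(old, new, died))
```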



2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Doudesis ◽  
J Yang ◽  
A Tsanas ◽  
C Stables ◽  
A Shah ◽  
...  

Abstract Introduction The myocardial-ischemic-injury-index (MI3) is a promising machine-learned algorithm that predicts the likelihood of myocardial infarction in patients with suspected acute coronary syndrome. Whether this algorithm performs well in unselected patients or predicts recurrent events is unknown. Methods In an observational analysis of a multi-centre randomised trial, we included all patients with suspected acute coronary syndrome and serial high-sensitivity cardiac troponin I measurements, excluding those with ST-segment elevation myocardial infarction. Using gradient boosting, MI3 incorporates age, sex, and two troponin measurements to compute a value (0–100) reflecting an individual's likelihood of myocardial infarction, and estimates the negative predictive value (NPV) and positive predictive value (PPV). We determined model performance for an index diagnosis of myocardial infarction, and for subsequent myocardial infarction or cardiovascular death at one year, using previously defined low- and high-probability thresholds (1.6 and 49.7, respectively). Results In total, 20,761 of 48,282 (43%) patients (64±16 years, 46% women) were eligible, of whom 3,278 (15.8%) had myocardial infarction. MI3 showed good discrimination, with an area under the receiver-operating-characteristic curve of 0.949 (95% confidence interval 0.946–0.952). It identified 12,983 (62.5%) patients as low-probability (sensitivity 99.3% [99.0–99.6%], NPV 99.8% [99.8–99.9%]) and 2,961 (14.3%) as high-probability (specificity 95.0% [94.7–95.3%], PPV 70.4% [69–71.9%]). At one year, subsequent myocardial infarction or cardiovascular death occurred more often in high-probability than in low-probability patients (17.6% [520/2,961] versus 1.5% [197/12,983], P<0.001). Conclusions In unselected consecutive patients with suspected acute coronary syndrome, the MI3 algorithm accurately estimates the likelihood of myocardial infarction and predicts the probability of subsequent adverse cardiovascular events.
Figure: Performance of MI3 at example thresholds.
Funding Acknowledgement: Type of funding source: Foundation. Main funding source(s): Medical Research Council
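The abstract applies two previously defined cut-offs (1.6 and 49.7) to the 0-100 MI3 value to sort patients into probability groups. The minimal Python sketch below shows only that triage step. It does not reproduce the MI3 gradient-boosting model and simply takes an MI3 value as input. The boundary handling (values below 1.6 are low; values at or above 49.7 are high) is an assumption, and the function name is hypothetical.

```python
LOW_PROBABILITY_THRESHOLD = 1.6    # below this: low probability (assumed strict)
HIGH_PROBABILITY_THRESHOLD = 49.7  # at or above this: high probability (assumed inclusive)


def mi3_probability_band(mi3_value):
    """Map an MI3 value (0-100) to a probability band using the two thresholds."""
    if not 0 <= mi3_value <= 100:
        raise ValueError("MI3 values range from 0 to 100")
    if mi3_value < LOW_PROBABILITY_THRESHOLD:
        return "low"
    if mi3_value >= HIGH_PROBABILITY_THRESHOLD:
        return "high"
    return "intermediate"


for value in (0.8, 12.0, 55.3):
    print(value, mi3_probability_band(value))
```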


Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2762
Author(s):  
Samantha Di Donato ◽  
Alessia Vignoli ◽  
Chiara Biagioni ◽  
Luca Malorni ◽  
Elena Mori ◽  
...  

Adjuvant treatment for patients with early-stage colorectal cancer (eCRC) is currently based on suboptimal risk stratification, especially for elderly patients. Metabolomics may improve the identification of patients with residual micrometastases after surgery. In this retrospective study, we hypothesized that metabolomic fingerprinting could improve risk stratification in patients with eCRC. We retrospectively analyzed two sets of serum samples by proton nuclear magnetic resonance spectroscopy. The first came from 94 elderly patients with eCRC, obtained after surgery (65 relapse-free and 29 relapsed after a median follow-up of 5 years). The second came from 75 elderly patients with metastatic colorectal cancer (mCRC), obtained before a new line of chemotherapy. The prognostic role of metabolomics in patients with eCRC was assessed using Kaplan–Meier curves. A PCA-CA-kNN model (principal component analysis, canonical analysis, and k-nearest neighbors) discriminated the metabolomic fingerprints of patients with relapse-free eCRC from those of patients with mCRC (70.0% accuracy using NOESY spectra). This model was then used to classify the samples of patients with relapsed eCRC, and 69% of them were predicted as metastatic. The metabolomic classification was strongly associated with prognosis (p = 0.0005, HR 3.64), independently of tumor stage. In conclusion, metabolomics could be an innovative tool to refine risk stratification in elderly patients with eCRC. Based on these results, a prospective trial aimed at improving risk stratification by metabolomic fingerprinting (LIBIMET) is ongoing.
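The prognostic comparison above relies on Kaplan–Meier curves. The estimated survival is multiplied by (1 − d/n) at each time point where d patients have the event out of n still at risk. Censored patients leave the risk set without lowering the curve. The minimal Python sketch below computes the curve for a single group of patients. The follow-up data and function name are hypothetical. The sketch does not include the log-rank test or hazard-ratio estimation that a full analysis would add.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if relapse/death observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time."""
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Patients censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up in years for one metabolomic group (1 = relapse observed).
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```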


2021 ◽  
pp. 154431672110303
Author(s):  
Sayan Sarkar ◽  
Shyam Mohan ◽  
Shakthi Parvathy

The purpose of this study was to assess the accuracy of duplex ultrasonography with color Doppler and of computed tomography (CT) angiography in detecting peripheral arterial disease (PAD), compared with the gold standard of digital subtraction angiography (DSA). This single-center, prospective, analytical study included patients with symptoms of PAD referred to the Department of Radiodiagnosis of Medical Trust Hospital (n = 53). All patients were imaged with color Doppler, CT angiography, and DSA. The peak systolic velocity (PSV) ratio was calculated by Doppler ultrasound, and the percentage stenosis of the same vascular segments was calculated with CT angiography and DSA. Statistical significance was tested with the chi-square test, and a P value <.05 was considered statistically significant. Stenosis was graded in two ways:
- PSV ratio grades: normal (<1.5), mild (1.5-2.8), moderate (2.9-4.9), and severe (≥5);
- CT angiography grades: normal (<20% stenosis), mild (20%-49% stenosis), moderate (50%-74% stenosis), severe (75%-99% stenosis), and total occlusion (100% stenosis).

Compared with DSA, both grading schemes showed high sensitivity and specificity, good positive predictive value, negative predictive value, and accuracy, and narrow confidence intervals for each range. The P value was <.001 for both color Doppler and CT angiography. CT angiography can be an effective alternative to DSA for grading stenosis if artifacts from vascular calcification can be avoided. Duplex ultrasonography can also be used to grade stenosis by combining the PSV ratio with the spectral pattern. However, it can only act as an adjunct to CT angiography because it cannot image the full length of the arterial segments in 1 frame.
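The two grading schemes described above map a measurement to a stenosis grade. The minimal Python sketch below encodes them. The reported PSV bands leave gaps (for example, between 2.8 and 2.9). This sketch assumes each band extends up to the next band's lower bound, so a ratio of 2.85 is graded mild. The function names are hypothetical.

```python
def grade_psv_ratio(psv_ratio):
    """Grade stenosis from the Doppler peak systolic velocity (PSV) ratio."""
    if psv_ratio < 0:
        raise ValueError("PSV ratio cannot be negative")
    if psv_ratio < 1.5:
        return "normal"
    if psv_ratio < 2.9:   # reported band: 1.5-2.8 (gap to 2.9 assigned here by assumption)
        return "mild"
    if psv_ratio < 5.0:   # reported band: 2.9-4.9 (gap to 5 assigned here by assumption)
        return "moderate"
    return "severe"       # reported band: >=5


def grade_ct_stenosis(percent_stenosis):
    """Grade stenosis from the percentage stenosis measured on CT angiography."""
    if not 0 <= percent_stenosis <= 100:
        raise ValueError("percent stenosis must be between 0 and 100")
    if percent_stenosis < 20:
        return "normal"
    if percent_stenosis < 50:
        return "mild"
    if percent_stenosis < 75:
        return "moderate"
    if percent_stenosis < 100:
        return "severe"
    return "total occlusion"


print(grade_psv_ratio(3.4), grade_ct_stenosis(62))  # moderate moderate
```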


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
J.M Leerink ◽  
H.J.H Van Der Pal ◽  
E.A.M Feijen ◽  
P.G Meregalli ◽  
M.S Pourier ◽  
...  

Abstract Background Childhood cancer survivors (CCS) treated with anthracyclines and/or chest-directed radiotherapy receive lifelong echocardiographic surveillance to detect cardiomyopathy early. Current risk stratification and surveillance frequency recommendations are based on anthracycline and chest-directed radiotherapy dose. We assessed the added prognostic value of an initial left ventricular ejection fraction (EF) measurement at >5 years after cancer diagnosis. Patients and methods Echocardiographic follow-up was performed in asymptomatic CCS in the Netherlands. The derivation cohort came from the Emma Children's Hospital (n=299; median time after diagnosis, 16.7 years [interquartile range (IQR) 11.8–23.15]). The validation cohort came from the Radboud University Medical Center (n=218; median time after diagnosis, 17.0 years [IQR 13.0–21.7]). CCS with cardiomyopathy at baseline were excluded (n=16). The endpoint was cardiomyopathy, defined as a clinically significant decrease in EF (EF<40%). The predictive value of the initial EF at >5 years after cancer diagnosis was analyzed with multivariable Cox regression models in the derivation cohort, and the model was validated in the validation cohort. Results The median follow-up after the initial EF was 10.9 years in the derivation cohort and 8.9 years in the validation cohort. Cardiomyopathy developed in 11/299 (3.7%) and 7/218 (3.2%), respectively. Adding the initial EF to anthracycline and chest radiotherapy dose increased the C-index from 0.75 to 0.85 in the derivation cohort and from 0.71 to 0.92 in the validation cohort (p<0.01). The model was well calibrated at 10-year predicted probabilities up to 5%. An initial EF of 40–49% was associated with a hazard ratio of 6.8 (95% CI 1.8–25) for development of cardiomyopathy during follow-up.
For those with a predicted 10-year cardiomyopathy probability <3% (76.9% of the derivation cohort and 74.3% of the validation cohort), the negative predictive value was >99% in both cohorts. Conclusion Adding the initial EF >5 years after cancer diagnosis to anthracycline and chest-directed radiotherapy dose improves 10-year cardiomyopathy prediction in CCS. Our validated prediction model identifies low-risk survivors in whom the surveillance frequency may be reduced to every 10 years.
Figure: Calibration in both cohorts.
Funding Acknowledgement: Type of funding source: Foundation. Main funding source(s): Dutch Heart Foundation
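The improvement above is reported as a rise in the C-index. For right-censored data, Harrell's C-index uses only usable pairs, meaning pairs in which the patient with the shorter follow-up actually developed cardiomyopathy. A pair is concordant when that patient also has the higher predicted risk. The minimal Python sketch below computes it. The data and function name are hypothetical. It is a simplified variant that skips pairs with tied follow-up times, and it is not the authors' Cox-model implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is usable when the patient with the shorter follow-up had the event;
    it is concordant when that patient also has the higher risk score.
    Pairs with tied follow-up times are skipped in this simplified version."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical years to cardiomyopathy or censoring (1 = cardiomyopathy) and model risk scores.
years = [2, 4, 6, 8]
cardiomyopathy = [1, 1, 0, 1]
risk = [0.9, 0.2, 0.5, 0.1]
print(harrell_c_index(years, cardiomyopathy, risk))  # 0.8
```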

