Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees

CMAJ Open ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. E1223-E1231
Author(s):  
Jahir M. Gutierrez ◽  
Maksims Volkovs ◽  
Tomi Poutanen ◽  
Tristan Watson ◽  
Laura C. Rosella
2020 ◽  
Author(s):  
Jahir M. Gutierrez ◽  
Maksims Volkovs ◽  
Tomi Poutanen ◽  
Tristan Watson ◽  
Laura Rosella

Abstract
Importance: Stratifying the adult population of Ontario, Canada, by risk of COVID-19 complications can support rapid pandemic response, resource allocation, and decision making.
Objective: To develop and validate a multivariable model that predicts the risk of hospitalization due to COVID-19 severity from routinely collected health records of the entire adult population of Ontario, Canada.
Design, Setting, and Participants: This cohort study included 36,323 adult patients (age ≥ 18 years) from the province of Ontario, Canada, who tested positive for SARS-CoV-2 nucleic acid by polymerase chain reaction between February 2 and October 5, 2020, and were followed up through November 5, 2020. Patients living in long-term care facilities were excluded from the analysis.
Main Outcomes and Measures: Risk of hospitalization within 30 days of COVID-19 diagnosis was estimated via Gradient Boosting Decision Trees, and risk factor importance was examined via Shapley values.
Results: The study cohort included 36,323 patients, the majority of whom were female (18,895 [52.02%]), with a median (IQR) age of 45 (31-58) years. The cohort had a hospitalization rate of 7.11% (2,583 hospitalizations) with a median (IQR) time to hospitalization of 1 (0-5) days, and a mortality rate of 2.49% (906 deaths) with a median (IQR) time to death of 12 (6-27) days. Compared with patients who were not hospitalized, those who were hospitalized had a higher median age (64 years vs 43 years, p-value < 0.001), were majority male (56.25% vs 47.35%, p-value < 0.001), and had a higher median (IQR) number of comorbidities (3 [2-6] vs 1 [0-3], p-value < 0.001). Patients were randomly split into development (n=29,058, 80%) and held-out validation (n=7,265, 20%) cohorts. The final Gradient Boosting model was built using the XGBoost algorithm and achieved high discrimination (development cohort: mean area under the receiver operating characteristic curve across the five folds of 0.852; held-out validation cohort: 0.8475) as well as excellent calibration (R2=0.998, slope=1.01, intercept=-0.01). The patients who scored in the top 10% of the validation cohort captured 47.41% of the actual hospitalizations, whereas those who scored in the top 30% captured 80.56%. Patients in the held-out validation cohort (n=7,265) with a score of at least 0.5 (n=2,149, 29.58%) had a 20.29% hospitalization rate (positive predictive value 20.29%) compared with a 2.2% hospitalization rate for those with a score less than 0.5 (n=5,116, 70.42%; negative predictive value 97.8%). Aside from age, gender, and number of comorbidities, the features that contributed most to model predictions were history of abnormal blood levels of creatinine, neutrophils, and leukocytes; geography; and chronic kidney disease.
Conclusions: A risk stratification model has been developed and validated using unique, de-identified, and linked routinely collected health administrative data available in Ontario, Canada. The final XGBoost model showed high discrimination, with potential utility for stratifying patients at risk of serious COVID-19 outcomes. This model demonstrates that routinely collected health system data can be successfully leveraged as a proxy for the potential risk of severe COVID-19 complications. Specifically, past laboratory results and demographic factors provide a strong signal for identifying patients who are susceptible to complications. The model can support population risk stratification that informs the protection of patients most at risk for severe COVID-19 complications.
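The capture rates and predictive values reported in this abstract are simple functions of risk scores and observed outcomes. The following is a minimal sketch in plain Python of how such figures can be computed; it is not the authors' code, and the function names (`capture_rate`, `predictive_values`) and the ten-patient toy data are hypothetical illustrations rather than study data.

```python
def capture_rate(scores, outcomes, top_fraction):
    """Share of all positive outcomes found among the top-scoring fraction."""
    total_pos = sum(outcomes)
    if total_pos == 0:
        raise ValueError("no positive outcomes to capture")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(y for _, y in ranked[:k]) / total_pos


def predictive_values(scores, outcomes, threshold):
    """Return (PPV, NPV) when patients with score >= threshold are flagged."""
    flagged = [y for s, y in zip(scores, outcomes) if s >= threshold]
    unflagged = [y for s, y in zip(scores, outcomes) if s < threshold]
    if not flagged or not unflagged:
        raise ValueError("threshold leaves one group empty")
    ppv = sum(flagged) / len(flagged)          # event rate among flagged
    npv = 1 - sum(unflagged) / len(unflagged)  # non-event rate among unflagged
    return ppv, npv


# Hypothetical toy cohort: 1 = hospitalized, 0 = not hospitalized.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

top20_capture = capture_rate(toy_scores, toy_outcomes, 0.2)
ppv, npv = predictive_values(toy_scores, toy_outcomes, 0.5)
```

In this toy cohort the top 20% of scores contain two of the four events, and the 0.5 threshold flags five patients of whom three had the event.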


2014 ◽  
Vol 26 (4) ◽  
pp. 781-817 ◽  
Author(s):  
Ching-Pei Lee ◽  
Chih-Jen Lin

Linear rankSVM is a widely used method for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. Many works have studied linear rankSVM, focusing on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
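To illustrate why the number of preference pairs drives the cost, here is a naive sketch of a linear rankSVM objective in plain Python. It assumes the standard formulation with a plain hinge loss over all same-query pairs whose labels differ; the function names and toy data are hypothetical, and this is not the efficient algorithm proposed in the letter.

```python
def preference_pairs(labels, queries):
    """All (i, j) in the same query where item i should rank above item j."""
    n = len(labels)
    # Naive enumeration: quadratic in the number of items per query.
    return [(i, j) for i in range(n) for j in range(n)
            if queries[i] == queries[j] and labels[i] > labels[j]]


def ranksvm_objective(w, X, labels, queries, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses over preference pairs."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    regularizer = 0.5 * sum(wk * wk for wk in w)
    # A pair is penalized unless the preferred item wins by a margin of 1.
    loss = sum(max(0.0, 1.0 - (scores[i] - scores[j]))
               for i, j in preference_pairs(labels, queries))
    return regularizer + C * loss


# Hypothetical toy data: three items in query 1, one item in query 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
labels = [2, 1, 3, 1]
queries = [1, 1, 1, 2]

pairs = preference_pairs(labels, queries)
objective = ranksvm_objective([1.0, 0.0], X, labels, queries, C=1.0)
```

Because the pair list grows quadratically with the number of items per query, evaluating the loss this way becomes the bottleneck on large data, which is the efficiency concern the abstract refers to.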


2021 ◽  
Vol 103 (7) ◽  
pp. 586-592
Author(s):  
Daphne I. Ling ◽  
Jacqueline M. Brady ◽  
Elizabeth Arendt ◽  
Marc Tompkins ◽  
Julie Agel ◽  
...  

2022 ◽  
pp. 251-275
Author(s):  
Edgar Cossio Franco ◽  
Jorge Alberto Delgado Cazarez ◽  
Carlos Alberto Ochoa Ortiz Zezzatti

The objective of this chapter is to implement an intelligent model, based on machine learning, for applying macro-ergonomic methods to human resources processes under the ISO 12207 standard. To achieve this objective, an algorithm is constructed in the Java language to select the best prospect for a given position. The machine learning component uses decision trees, specifically the J48 algorithm. Among the findings, the model is shown to be useful in identifying the best profiles for a given position, optimizing the time spent on the selection process and the use of human resources, and reducing work stress.


2005 ◽  
Vol 37 (3) ◽  
pp. 551-568 ◽  
Author(s):  
Elke A L M G Moons ◽  
Geert P M Wets ◽  
Marc Aerts ◽  
Theo A Arentze ◽  
Harry J P Timmermans

The aim of this paper is to gain a better understanding of the impact of simplification on a sequential model of activity-scheduling behavior which uses feature-selection methods. To that effect, the predictive performance of the Albatross model, which incorporates nine different facets of activity–travel behavior, based on the original full decision trees, is compared with the performance of the model based on trimmed decision trees. The results indicate that significantly smaller decision trees can be used for modeling the different choice facets of the sequential model system without losing much in predictive power. The performance of the models is compared at three levels: the choice-facet level, the activity-pattern level (comparing the observed and generated sequences of activities), and the trip-matrix level, comparing the correlation coefficients that determine the strength of the associations between the observed and the predicted origin–destination matrices. The results indicate that the model based on the trimmed decision trees predicts activity-diary schedules with a minimum loss of accuracy at the decision level. Moreover, the results indicate a slightly better performance at the activity-pattern and the trip-matrix level.


2018 ◽  
Vol 51 ◽  
pp. 02004 ◽  
Author(s):  
Stanislav Eroshenko ◽  
Alexandra Khalyasmaa ◽  
Denis Snegirev

The paper presents an operational model, developed by the authors, for very-short-term forecasting of solar power station (SPS) generation. The model is based on weather information and is built into an existing software product as a separate module for SPS operational forecasting. Gradient boosting on decision trees was found to be one of the best-suited mathematical methods for SPS generation operational forecasting. The paper describes the basic principles of operational forecasting based on the boosting of decision trees, as well as the main advantages and disadvantages of implementing this algorithm. Finally, the paper presents an example implementation of the algorithm, analyzed through data analysis and generation forecasting for an existing SPS.


2019 ◽  
Vol 116 (40) ◽  
pp. 19887-19893 ◽  
Author(s):  
José Marcio Luna ◽  
Efstathios D. Gennatas ◽  
Lyle H. Ungar ◽  
Eric Eaton ◽  
Eric S. Diffenderfer ◽  
...  

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
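For readers unfamiliar with one end of the spectrum described above, the following is a minimal sketch of gradient boosted stumps for squared-error regression on a single feature, in plain Python. It is not the additive tree algorithm introduced in the paper; the function names, the tuple-based model layout, and the toy data are all hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on a 1-D feature under squared error."""
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1], best[2], best[3]


def boost_stumps(x, y, n_rounds=50, learning_rate=0.5):
    """Fit an additive model: mean of y plus a shrunken sum of stumps."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, lm, rm = fit_stump(x, residuals)
        stumps.append((thr, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= thr else rm)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Evaluate the boosted-stump model at a single feature value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        lm if xi <= thr else rm for thr, lm, rm in stumps)


# Hypothetical toy data: a step function that one stump fits exactly.
step_model = boost_stumps([1, 2, 3, 4], [1, 1, 3, 3],
                          n_rounds=1, learning_rate=1.0)
```

Each stump contributes one additive term, so the final model is a sum of simple threshold functions. A single CART tree, by contrast, nests its splits and can express interactions between features.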

