Modeling Length of Stay as an Optimized Two-class Prediction Problem

2007 ◽  
Vol 46 (03) ◽  
pp. 352-359 ◽  
Author(s):  
N. Peek ◽  
F. Voorbraak ◽  
E. de Jonge ◽  
B. A. J. M. de Mol ◽  
M. Verduijn

Summary Objectives: To develop a predictive model for the outcome length of stay at the Intensive Care Unit (ICU LOS), including the choice of an optimal dichotomization threshold for this outcome. Reduction of prediction problems of this type of outcome to a two-class problem is a common strategy to identify high-risk patients. Methods: Threshold selection and model development are performed simultaneously. From the range of possible threshold values, the value is chosen for which the corresponding predictive model has maximal precision based on the data. To compare the precision of models for different dichotomizations of the outcome, the MALOR performance statistic is introduced. This statistic is insensitive to the prevalence of positive cases in a two-class prediction problem. Results: The procedure is applied to data from cardiac surgery patients to dichotomize the outcome ICU LOS. The class probability tree method is used to develop predictive models. Within our data, the best model precision is found at the threshold of seven days. Conclusions: The presented method extends existing procedures for predictive modeling with optimization of the outcome definition for predictive purposes. The method can be applied to all prediction problems where the outcome variable needs to be dichotomized, and is insensitive to changes in the prevalence of positive cases with different dichotomization thresholds.
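
The threshold search described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the MALOR statistic, so the sketch substitutes the area under the ROC curve (AUC), a different measure that also does not depend on class prevalence. The names `auc`, `select_threshold`, and `fit_and_score` are hypothetical; `fit_and_score` stands in for whatever model-building step (a class probability tree in the paper) is run for each candidate threshold. The toy data are invented for illustration, and a real analysis would score each model on held-out or cross-validated data rather than on the training data.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_threshold(
    los_days: Sequence[float],
    candidates: Iterable[float],
    fit_and_score: Callable[[List[int]], Sequence[float]],
) -> Tuple[float, float]:
    """Return the (threshold, performance) pair with the highest AUC.

    For each candidate threshold the outcome is dichotomized as
    "LOS longer than threshold", a model is built via fit_and_score,
    and its predicted scores are evaluated against the binary labels.
    """
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        labels = [1 if d > t else 0 for d in los_days]
        if sum(labels) in (0, len(labels)):
            continue  # threshold leaves only one class; nothing to predict
        performance = auc(fit_and_score(labels), labels)
        if best is None or performance > best[1]:
            best = (t, performance)
    if best is None:
        raise ValueError("no candidate threshold splits the data into two classes")
    return best


# Toy data (assumed): six patients, with a fixed score standing in for a fitted model.
los = [1, 2, 3, 4, 8, 9]
toy_scores = [0.5, 0.4, 0.3, 0.2, 0.8, 0.9]
threshold, performance = select_threshold(los, [1, 2, 3, 7], lambda labels: toy_scores)
print(threshold, performance)  # prints: 7 1.0
```

Passing the model-building step as a callable keeps the search loop independent of the modeling method, which mirrors the abstract's point that threshold selection and model development happen together.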

2021 ◽  
Vol 12 ◽  
pp. 215145932199274
Author(s):  
Sanjit R. Konda ◽  
Joseph R. Johnson ◽  
Nicket Dedhia ◽  
Erin A. Kelly ◽  
Kenneth A. Egol

Introduction: This study sought to investigate whether a validated trauma triage tool can stratify hospital quality measures and inpatient cost for middle-aged and geriatric trauma patients with isolated proximal and midshaft humerus fractures. Materials and Methods: Patients aged 55 and older who sustained a proximal or midshaft humerus fracture and required inpatient treatment were included. Patient demographic, comorbidity, and injury severity information was used to calculate each patient’s Score for Trauma Triage in the Geriatric and Middle-Aged (STTGMA). Based on scores, patients were stratified to create minimal, low, moderate, and high risk groups. Outcomes included length of stay, complications, operative management, ICU/SDU-level care, discharge disposition, unplanned readmission, and index admission costs. Results: Seventy-four patients with 74 humerus fractures met final inclusion criteria. Fifty-eight (78.4%) patients presented with proximal humerus and 16 (21.6%) with midshaft humerus fractures. Mean length of stay was 5.5 ± 3.4 days with a significant difference among risk groups (P = 0.029). Lower risk patients were more likely to undergo surgical management (P = 0.015) while higher risk patients required more ICU/SDU-level care (P < 0.001). Twenty-six (70.3%) minimal risk patients were discharged home compared to zero high risk patients (P = 0.001). Higher risk patients experienced higher total inpatient costs across operative and nonoperative treatment groups. Conclusion: The STTGMA tool is able to reliably predict hospital quality measures and cost outcomes that may allow hospitals and providers to improve value-based care and clinical decision-making for patients presenting with proximal and midshaft humerus fractures. Level of Evidence: Prognostic Level III.


2021 ◽  
Author(s):  
Fang He ◽  
John H Page ◽  
Kerry R Weinberg ◽  
Anirban Mishra

BACKGROUND The current COVID-19 pandemic is unprecedented. In resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients; however, few risk scores have been derived from a substantially large EHR dataset using simplified predictors as input. OBJECTIVE To develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the AUC (area under the receiver operating characteristic curve), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds. METHODS We conducted machine learning model development and validation via a cohort study using multi-center, patient-level, longitudinal electronic health records (EHR) from the Optum® COVID-19 database, which provides anonymized, longitudinal EHR from across the US. The models were developed based on clinical characteristics to predict 28-day in-hospital mortality, ICU admission, respiratory failure, and mechanical ventilator usage in the inpatient setting. Data from patients admitted prior to Sep 7, 2020 were randomly sampled into development, test, and validation datasets; data collected from Sep 7, 2020 through Nov 15, 2020 were reserved as a prospective validation dataset. RESULTS Of 3.7M patients in the analysis, a total of 585,867 patients were diagnosed with or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between Feb 1 and Nov 15, 2020. Among the study cohort (N=50,703), there were 6,204 deaths, 9,564 ICU admissions, 6,478 mechanically ventilated or ECMO patients, and 25,169 patients who developed ARDS or respiratory failure within 28 days of hospital admission. The algorithms demonstrated high accuracy (AUC = 0.89 (0.89 - 0.89) on the validation dataset (N=10,752)), consistent prediction through the second wave of the pandemic from September to November (AUC = 0.85 (0.85 - 0.86) on post-development validation (N=14,863)), and great clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline and at admission was included in the analysis; the end-to-end pipeline automates the feature selection and model development process, producing 10 key predictors as input, such as age, blood urea nitrogen, and oxygen saturation, which are both commonly measured and concordant with recognized risk factors for COVID-19. CONCLUSIONS The systematic approach and rigorous validations demonstrate consistent model performance, even beyond the time period of data collection, with satisfactory discriminatory power and great clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only ten clinical features as a prognostic tool to stratify COVID-19 patients into intermediate, high, and very high-risk groups. This simple predictive tool could be shared with the wider healthcare community to serve as an early warning system that alerts physicians to possible high-risk patients, or as a resource triaging tool to optimize healthcare resources. CLINICALTRIAL N/A
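
Sensitivity and specificity, two of the measures named in the Objective, depend on the risk cutoff used to flag a patient as high risk. The following Python sketch shows how they are computed at one cutoff. It is only an illustration under stated assumptions: the function name `sensitivity_specificity`, the cutoff of 0.5, and the six example patients are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    risks: Sequence[float], outcomes: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when patients with risk >= cutoff are flagged."""
    tp = fn = tn = fp = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= cutoff
        if outcome == 1:
            if flagged:
                tp += 1  # adverse outcome, correctly flagged
            else:
                fn += 1  # adverse outcome, missed
        else:
            if flagged:
                fp += 1  # no adverse outcome, flagged anyway
            else:
                tn += 1  # no adverse outcome, correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: predicted risks and observed outcomes (1 = adverse outcome).
example_risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
example_outcomes = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(example_risks, example_outcomes, cutoff=0.5)
print(round(sens, 3), round(spec, 3))  # prints: 0.667 0.667
```

Raising the cutoff generally trades sensitivity for specificity, which is why a clinically meaningful threshold has to be chosen rather than read off the AUC alone.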


PLoS ONE ◽  
2016 ◽  
Vol 11 (8) ◽  
pp. e0161493 ◽  
Author(s):  
Marie-Eva Laurencet ◽  
François Girardin ◽  
Fabio Rigamonti ◽  
Anne Bevand ◽  
Philippe Meyer ◽  
...  

2021 ◽  
Author(s):  
Syeda Nadia Firdaus

This thesis explores machine learning models based on various feature sets to solve the protein structural class prediction problem, which is a significant classification problem in bioinformatics. Knowledge of protein structural classes contributes to an understanding of protein folding patterns, and this has made structural class prediction research a major topic of interest. In this thesis, features are extracted from predicted secondary structure and hydropathy sequence using new strategies to classify proteins into one of the four major structural classes: all-α, all-β, α/β, and α+β. The prediction accuracy using these features compares favourably with some existing successful methods. We use Support Vector Machines (SVM), since this learning method has well-known efficiency in solving this classification problem. On a standard dataset (25PDB), the proposed system has an overall accuracy of 89% with as few as 22 features, whereas the previous best performing method had an accuracy of 88% using 2510 features.


2020 ◽  
Vol 7 (1) ◽  
pp. e000479
Author(s):  
Drew B Schembre ◽  
Robson E Ely ◽  
Janice M Connolly ◽  
Kunjali T Padhya ◽  
Rohit Sharda ◽  
...  

Objective: The Glasgow-Blatchford Bleeding Score (GBS) was designed to identify patients with upper gastrointestinal bleeding (UGIB) who do not require hospitalisation. It may also help stratify patients unlikely to benefit from intensive care. Design: We reviewed patients assigned a GBS in the emergency room (ER) via a semiautomated calculator. Patients with a score ≤7 (low risk) were directed to an unmonitored bed (UMB), while those with a score of ≥8 (high risk) were considered for monitored bed (MB) placement. Conformity with guidelines and subsequent transfers to MB were reviewed, along with transfusion requirement, rebleeding, length of stay, need for intervention and death. Results: Over 34 months, 1037 patients received a GBS in the ER. 745 had an UGIB. 235 (32%) of these patients had a GBS ≤7. 29 (12%) low-risk patients were admitted to MBs. Four low-risk patients admitted to UMB required transfer to MB within the first 48 hours. Low-risk patients admitted to UMBs were no more likely to die, rebleed, need transfusion or require more endoscopic, radiographic or surgical procedures than those admitted to MBs. No low-risk patient died from GIB. Patients with GBS ≥8 were more likely to rebleed, require transfusion and interventions to control bleeding but not to die. Conclusion: A semiautomated GBS calculator can be incorporated into an ER workflow. Patients with a GBS ≤7 are unlikely to need MB care for UGIB. Further studies are warranted to determine an ideal scoring system for MB admission.
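
The bed-assignment rule stated in the abstract (GBS ≤7 to an unmonitored bed, GBS ≥8 considered for a monitored bed) can be written as a small Python function. This is only a sketch of that one rule: the function name `suggest_bed` and the returned strings are hypothetical, the calculation of the GBS itself is not shown, and nothing here reflects how the study's semiautomated calculator was actually built.

```python
def suggest_bed(gbs: int) -> str:
    """Map a Glasgow-Blatchford score to the bed type described in the abstract."""
    if gbs < 0:
        raise ValueError("GBS cannot be negative")
    if gbs <= 7:
        return "unmonitored bed"  # low risk per the abstract
    return "consider monitored bed"  # high risk (score of 8 or more) per the abstract


for score in (3, 7, 8):
    print(score, suggest_bed(score))
```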


CJEM ◽  
2017 ◽  
Vol 19 (S1) ◽  
pp. S101
Author(s):  
K. Johns ◽  
S. Smith ◽  
E. Karreman ◽  
A. Kastelic

Introduction: Extended length of stay (LOS) in emergency departments (EDs) and overcrowding are problems for the Canadian healthcare system, which can lead to the creation of a healthcare access block, reduced health outcomes for acute care patients, and decreased satisfaction with the health care system. The goal of this study is to identify and assess specific factors that predict length of stay in EDs for those patients who fall in the highest LOS category. Methods: A total of 130 patient charts from EDs in Regina were reviewed. Charts included in this study were from the 90th-100th percentile of time-users, who were registered during February 2016, and were admitted to hospital from the ED. Patient demographic data and ED visit data were collected. T-tests and multiple regression analyses were conducted to identify any significant predictors of our outcome variable, LOS. Results: None of the demographic variables showed a significant relationship with LOS (age: p=.36; sex: p=.92; CTAS: p=.48), nor did most of the included ED visit data such as door to doctor time (p=.34) and time for imaging studies (X-ray: p=.56; ultrasound: p=.50; CT: p=.45). However, the time between the request for consult and the decision to admit did show a significant relationship with LOS (p<.01). Potential confounding variables analyzed were social work consult requests (p=.14), number of emergency visits on day of registration (p=.62), and hour of registration (00-12 or 12-24; p<.01). After adjustment for time of registration, using hierarchical multiple regression, time from consult request to admit decision remained a significant predictor (p<.01) of LOS. Conclusion: After adjusting for the influence of confounding factors, “consult request to admit decision” was by far the strongest predictor of LOS of all included variables in our study. The results of this study were limited to some extent by inconsistencies in the documentation of some of the analyzed metrics. Establishing standardized documentation could reduce this issue in future studies of this nature. Future areas of interest include establishing a standard reference for our variables, a further analysis into why consult requests are a major predictor, and how to alleviate this in the future.


2020 ◽  
Vol 51 (4) ◽  
pp. 648-665
Author(s):  
Min Wu ◽  
Qi Feng ◽  
Xiaohu Wen ◽  
Ravinesh C. Deo ◽  
Zhenliang Yin ◽  
...  

Abstract The study evaluates the potential utility of the random forest (RF) predictive model used to simulate daily reference evapotranspiration (ET0) at two stations located in the arid oasis area of northwestern China. To construct an accurate RF-based predictive model, ET0 is estimated by an appropriate combination of model inputs comprising maximum air temperature (Tmax), minimum air temperature (Tmin), sunshine duration (Sun), wind speed (U2), and relative humidity (Rh). The outputs of the RF models are tested against ET0 calculated using the Penman–Monteith FAO 56 (PMF-56) equation. The results showed that the RF model was a better way to predict ET0 for the arid oasis area with limited data. In addition, apart from air temperature, Rh was the most influential factor on the behavior of ET0 in the proposed arid area. Moreover, an uncertainty analysis with a Monte Carlo method was carried out to verify the reliability of the results, and it was concluded that the RF model had a lower uncertainty and can be used successfully in simulating ET0. The proposed study shows RF to be a sound modeling approach for the prediction of ET0 in arid areas where reliable weather data sets are available, but relatively limited.
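
For reference, the Penman–Monteith FAO 56 equation used as the benchmark is commonly written in the daily form below. The abstract does not reproduce it, so the notation here follows the standard FAO-56 presentation and should be read as an assumption about the exact form the authors used.

```latex
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

Here $ET_0$ is the reference evapotranspiration (mm day$^{-1}$), $R_n$ the net radiation at the crop surface and $G$ the soil heat flux density (both MJ m$^{-2}$ day$^{-1}$), $T$ the mean daily air temperature at 2 m height (°C), $u_2$ the wind speed at 2 m height (m s$^{-1}$), $e_s$ and $e_a$ the saturation and actual vapour pressure (kPa), $\Delta$ the slope of the vapour pressure curve (kPa °C$^{-1}$), and $\gamma$ the psychrometric constant (kPa °C$^{-1}$).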

