A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation (Preprint)

2020 ◽  
Author(s):  
Siavash Bolourani ◽  
Max Brenner ◽  
Ping Wang ◽  
Thomas McGinn ◽  
Jamie S Hirsch ◽  
...  

BACKGROUND Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. OBJECTIVE Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. METHODS Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared model performance among all three models as well as an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. RESULTS The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. CONCLUSIONS The XGBoost model had high predictive accuracy, outperforming other early warning scores. 
The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19.
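The abstract does not describe how the cross-hospital validation was carried out. One common reading is a leave-one-hospital-out scheme, where each hospital in turn serves as the test set and the model is trained on the rest. The following is a minimal sketch under that assumption; the function name and the hospital labels are hypothetical and not taken from the study.

```python
from collections import defaultdict


def leave_one_hospital_out(hospital_ids):
    """Yield (held_out, train_idx, test_idx), holding out one hospital per fold.

    Assumption: "cross-hospital validation" is read here as
    leave-one-hospital-out; the abstract does not confirm the exact scheme.
    """
    by_hospital = defaultdict(list)
    for idx, hospital in enumerate(hospital_ids):
        by_hospital[hospital].append(idx)
    for held_out in sorted(by_hospital):
        test_idx = by_hospital[held_out]
        # Train on every row that belongs to any other hospital.
        train_idx = sorted(
            i for h, rows in by_hospital.items() if h != held_out for i in rows
        )
        yield held_out, train_idx, test_idx


# Hypothetical hospital label for each of six admissions (illustrative only).
hospitals = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_one_hospital_out(hospitals))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by hospital rather than by patient keeps every patient from the held-out site out of training, so each fold approximates performance at a site the model has not seen.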

10.2196/24246 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e24246 ◽  
Author(s):  
Siavash Bolourani ◽  
Max Brenner ◽  
Ping Wang ◽  
Thomas McGinn ◽  
Jamie S Hirsch ◽  
...  



Author(s):  
Sasi Sekhar T. V. D. ◽  
Anjani Kumar C. ◽  
Bhavya Ch. ◽  
Sameera B. ◽  
Rama Devi Ch.

Background: Scoring systems can be used to define critically ill patients, estimate their prognosis, support clinical decision making, guide the allocation of resources, and estimate the quality of care. It remains unclear whether the additional data needed to compute ICU scores improve mortality prediction for critically ill patients compared with the simpler ED scores. Methods: We conducted a prospective observational study of 400 consecutive critically ill patients admitted to the ICU directly from the Emergency Department at Dr PSIMS and RF over a period of 2 years. Clinical and laboratory data conforming to the modified early warning score (MEWS), rapid emergency medicine score (REMS), acute physiology and chronic health evaluation (APACHE II), and simplified acute physiology score (SAPS II) were recorded for all patients. The ED scoring systems (MEWS, REMS) were compared with the ICU scoring systems (APACHE II, SAPS II). The outcome was recorded in two categories, survived and non-survived, with a primary end point of 30-day mortality. Discrimination was evaluated using receiver operating characteristic (ROC) curves. Results: The ICU scores outperformed the ED scores, with higher area under the curve values. The predicted mortality percentage of the ICU-based scoring systems was higher than that of the emergency scores (predicted mortality: SAPS II 63%, APACHE II 33.3%, MEWS 18.5%, REMS 14.8%). Conclusions: ICU scores showed more predictive accuracy than ED scores in prognosticating outcomes in critically ill patients. This difference appears to be due mainly to the greater complexity of the ICU scores.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11988
Author(s):  
Kuan-Han Wu ◽  
Fu-Jen Cheng ◽  
Hsiang-Ling Tai ◽  
Jui-Cheng Wang ◽  
Yii-Ting Huang ◽  
...  

Background A feasible and accurate risk prediction system for emergency department (ED) patients is urgently required. The Modified Early Warning Score (MEWS) is a widely used tool to predict clinical outcomes in the ED. The literature shows that machine learning (ML) has better predictive ability in specific patient populations than traditional scoring systems. By analyzing a large multicenter dataset, we aim to develop an ML model to predict in-hospital mortality of adult non-traumatic ED patients at different time stages, and to compare its performance with other ML models and MEWS. Methods A retrospective observational cohort study was conducted in five Taiwan EDs, including two tertiary medical centers and three regional hospitals. All consecutive adult (>17 years old) non-traumatic patients admitted to the ED during a 9-year period (January 1, 2008 to December 31, 2016) were included. Exclusion criteria were (1) out-of-hospital cardiac arrest, (2) discharge against medical advice and transfer to another hospital, and (3) missing collected variables. The primary outcome was in-hospital mortality, categorized into 6-, 24-, 72-, and 168-hour mortality. MEWS was calculated from systolic blood pressure, pulse rate, respiratory rate, body temperature, and level of consciousness. An ensemble supervised stacking ML model was developed and compared to sensitive and unsensitive XGBoost, Random Forest, and AdaBoost. We conducted a performance test and examined both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) as the comparative measures. Results After excluding 182,001 visits (7.46%), the study group consisted of 2,437,326 ED visits. The dataset was split into 67% training data and 33% test data for ML model development. No statistically significant difference was found in the characteristics of the two groups.
For the prediction of 6-, 24-, 72-, and 168-hour in-hospital mortality, the AUROCs of MEWS and the ML model were 0.897, 0.865, 0.841, 0.816 and 0.939, 0.928, 0.913, 0.902, respectively. The stacking ML model outperformed the other ML models as well. For the prediction of in-hospital mortality beyond 48 hours, the AUPRC of MEWS dropped below 0.1, while the AUPRC of the ML model was 0.317 at 6 hours and 0.2150 at 168 hours. For each time frame, the ML model achieved statistically significantly higher AUROC and AUPRC than MEWS (all P < 0.001). Both models showed decreasing predictive ability as time elapsed, but the gap in AUROC between the two models tended to widen gradually (P < 0.001). Three MEWS thresholds (score >3, >4, and >5) were used as baselines for comparison; the ML model consistently showed improved or equal performance in sensitivity, PPV, and NPV, but not in specificity. Conclusion Stacking ML methods predict in-hospital mortality better than MEWS in adult non-traumatic ED patients, especially delayed mortality.
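The two comparative measures named in this abstract, AUROC and AUPRC, can be illustrated with a small pure-Python sketch. It is not the authors' evaluation code: the function names and toy data are hypothetical, AUROC is computed through its pairwise-ranking interpretation, and AUPRC is approximated by average precision, with tied scores assumed absent in the latter.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumption: scores contain no ties, so ranking order is unambiguous.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each recalled positive
    return precision_sum / total_pos


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
print(round(auroc(labels, scores), 3), round(average_precision(labels, scores), 3))
```

Unlike AUROC, the AUPRC baseline depends on outcome prevalence, which is one reason AUPRC values for rare outcomes such as in-hospital mortality tend to look low even when AUROC is high.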


2012 ◽  
Vol 11 (2) ◽  
pp. 58-58
Author(s):  
Chris Roseveare ◽  

The ability to identify and discharge the low-risk patient, and to predict those cases where deterioration is likely, is already a key element of the practice of acute medicine. This is an area which has been extensively examined in the past, but two articles in this edition add an interesting dimension to the literature. The use of physiological variables to calculate risk enables fluctuations in a patient’s condition to be monitored over time, allowing appropriate escalation measures to be instituted. The National Early Warning Score has already been implemented in Wales and roll-out across England is expected imminently. Austen and colleagues have highlighted some of the advantages that a standardised system will provide in comparison to their locally-developed Early Warning Score; however, the problem of under-scoring due to incomplete or inaccurate recording remains and will continue until electronic solutions are more widespread. Scoring systems utilising laboratory data from admission are less useful for ongoing monitoring but could provide clinicians with an objective measure of risk at the time of initial assessment. As austerity measures bite, the pressure to direct our limited resources to the most appropriate cases will undoubtedly intensify, making this increasingly important. The rigorous quality control mechanisms in laboratories ensure the reliability of biochemical test results; furthermore, most hospitals have electronic systems for recording and displaying results, which limits the risk of errors from human transcription. O’Sullivan et al have utilised the extensive database from St James’ hospital in Dublin to develop a score based on a number of biochemical and haematological tests. Although this will need to be prospectively validated, retrospective analysis using a huge sample over a number of years suggests their score may be highly predictive of good and poor outcome.
This has great potential to support clinical decision making at the ‘front door’ and improve utilisation of resources. If variety is the ‘spice of life’, then Acute Medicine is certainly the ‘vindaloo’ of the modern hospital. The enormous breadth of clinical problems encountered on the AMU is apparent from the data gathered in York Hospital during the 15 months prior to April 2011. Variety is a key attraction for many junior doctors considering their career choice, at a time when many areas of hospital practice are becoming increasingly specialised. The acute medicine curriculum has ensured that trainees undertake blocks of training in respiratory medicine and cardiology, which is clearly important given that these areas reflected almost 50% of patients. However, the authors highlight that the infrequency of certain problems, such as cord compression and diabetic ketoacidosis, might also need to be addressed with training outside the AMU in neurology and endocrinology to ensure adequate exposure to these conditions. The rise in alcohol-related admissions is also highlighted in this article, and our trainee section includes a problem-based review of the management of these problems. The obesity epidemic, as well as the proliferation of weight-loss surgery and its complications, is another area which increasingly challenges our AMU resources. The article by Fiona Maggs provides some practical advice on how to address these issues. I hope you enjoy this edition, and the summer months ahead...


2021 ◽  
Vol 25 ◽  
pp. 233121652110661
Author(s):  
Elaheh Shafieibavani ◽  
Benjamin Goudey ◽  
Isabell Kiral ◽  
Peter Zhong ◽  
Antonio Jimeno-Yepes ◽  
...  

While cochlear implants have helped hundreds of thousands of individuals, it remains difficult to predict the extent to which an individual’s hearing will benefit from implantation. Several publications indicate that machine learning may improve predictive accuracy of cochlear implant outcomes compared to classical statistical methods. However, existing studies are limited in terms of model validation and in evaluating the effect of factors like sample size on predictive performance. We conduct a thorough examination of machine learning approaches to predict word recognition scores (WRS) measured approximately 12 months after implantation in adults with post-lingual hearing loss. This is the largest retrospective study of cochlear implant outcomes to date, evaluating 2,489 cochlear implant recipients from three clinics. We demonstrate that while machine learning models significantly outperform linear models in prediction of WRS, their overall accuracy remains limited (mean absolute error: 17.9-21.8). The models are robust across clinical cohorts, with predictive error increasing by at most 16% when evaluated on a clinic excluded from the training set. We show that predictive performance is unlikely to be improved by increasing sample size alone, with a doubling of sample size estimated to increase performance by only 3% on the combined dataset. Finally, we demonstrate how the current models could support clinical decision making, highlighting that subsets of individuals can be identified that have a 94% chance of improving WRS by at least 10 percentage points after implantation, which is likely to be clinically meaningful. We discuss several implications of this analysis, focusing on the need to improve and standardize data collection.


2021 ◽  
Author(s):  
Feng Xie ◽  
Marcus Eng Hock Ong ◽  
Johannes Nathaniel Min Hui Liew ◽  
Kenneth Boon Kiat Tan ◽  
Andrew Fu Wah Ho ◽  
...  

Abstract Importance Triage in the emergency department (ED) for admission and appropriate level of hospital care is a complex clinical judgment based on the tacit understanding of the patient’s likely acute course, availability of medical resources, and local practices. While a scoring tool could be valuable in triage, currently available tools have demonstrated limitations. Objective To develop a tool based on a parsimonious list of predictors available early at ED triage, to provide a simple, early, and accurate estimate of short-term mortality risk, the Score for Emergency Risk Prediction (SERP), and evaluate its predictive accuracy relative to published tools. Design, Setting, and Participants We performed a single-site, retrospective study of all emergency department (ED) patients admitted to a tertiary hospital in Singapore between January 2009 and December 2016. SERP was derived using the machine learning framework for developing predictive models, AutoScore, based on six variables easily available early in the ED care process. Using internal validation, the SERP was compared to the current triage system, Patient Acuity Category Scale (PACS), Modified Early Warning Score (MEWS), National Early Warning Score (NEWS), Cardiac Arrest Risk Triage (CART), and Charlson Comorbidity Index (CCI) in predicting both primary and secondary outcomes in the study. Main Outcomes and Measures The primary outcome of interest was 30-day mortality. Secondary outcomes included 2-day mortality, inpatient mortality, 30-day post-discharge mortality, and 1-year mortality. The SERP’s predictive power was measured using the area under the curve (AUC) in the receiver operating characteristic (ROC) analysis.
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated under the optimal threshold, defined as the point nearest to the upper-left corner of the ROC curve. Results We included 224,666 ED episodes in the model training cohort, 56,167 episodes in the validation cohort, and 42,676 episodes in the testing cohort. Of these patients, 18,797 (5.8%) died within 30 days of their ED visits. Evaluated on the testing set, SERP outperformed several benchmark scores in predicting 30-day mortality and other mortality-related outcomes. Under a cut-off score of 27, SERP achieved a sensitivity of 72.6% (95% confidence interval [CI]: 70.7-74.3%), a specificity of 77.8% (95% CI: 77.5-78.2%), a positive predictive value of 15.8% (15.4-16.2%), and a negative predictive value of 98% (97.9-98.1%). Conclusions SERP showed better prediction performance than existing triage scores while maintaining easy implementation and ease of ascertainment at the ED. It has the potential to be widely applied and validated in different circumstances and healthcare settings. Key points Question How does a tool for predicting hospital outcomes based on a machine learning-based automatic clinical score generator, AutoScore, perform in a cohort of individuals admitted to hospital from the emergency department (ED) compared to other published clinical tools? Findings The new tool, the Score for Emergency Risk Prediction (SERP), is parsimonious and point-based. SERP was more accurate in identifying patients who died during short or long-term care, compared with other point-based clinical tools. Meaning SERP, a tool based on AutoScore, is promising for triaging patients admitted from the ED according to mortality risk.
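The threshold rule described in the abstract, the point nearest to the upper-left corner of the ROC curve, can be sketched as follows. This is an illustrative reading, not the SERP or AutoScore implementation: the function names and toy data are hypothetical, and treating a score at or above the cut-off as a positive prediction is an assumption.

```python
import math


def confusion_metrics(labels, scores, cutoff):
    """Sensitivity, specificity, PPV and NPV at a given cut-off.

    Assumption: a score >= cutoff is treated as a positive prediction.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def closest_to_corner_cutoff(labels, scores):
    """Pick the cut-off whose ROC point lies nearest to (FPR=0, TPR=1)."""
    best_cutoff, best_distance = None, math.inf
    for cutoff in sorted(set(scores)):
        m = confusion_metrics(labels, scores, cutoff)
        # Distance from (1 - specificity, sensitivity) to the corner (0, 1).
        distance = math.hypot(1 - m["sensitivity"], 1 - m["specificity"])
        if distance < best_distance:
            best_cutoff, best_distance = cutoff, distance
    return best_cutoff


# Hypothetical point scores and 30-day outcomes (1 = died), illustrative only.
scores = [5, 10, 20, 22, 27, 30, 35, 40]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff = closest_to_corner_cutoff(labels, scores)
print(cutoff, confusion_metrics(labels, scores, cutoff))
```

Because PPV and NPV depend on how common the outcome is, a score with moderate sensitivity and specificity can still show a low PPV and a high NPV when the outcome is rare.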


2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a pharmaceutical early warning model to predict cardiac arrest in emergency departments via drug classification and medical expert suggestion. METHODS We propose a new early warning score model for detecting cardiac arrest via pharmaceutical classification and by using a sliding window; we apply learning-based algorithms to time-series data for a Pharmaceutical Early Warning Scoring Model (PEWSM). By treating pharmaceutical features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS Four important findings are as follows: (1) We identify the most important drug predictors: bits, and replenishers and regulators of water and electrolytes. The best AUROC of bits is 85%; that of replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify feature selection, in which accounting for drugs improves the accuracy: In Task 1, the best AUROC of vital signs is 77%, and that of all features is 86%. In Task 2, the best AUROC of all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction. (3) We use a better model: alongside traditional machine learning, this study adds a newer AI technique, the long short-term memory (LSTM) model, whose best time-series accuracy is comparable to that of the traditional random forest (RF) model; both reach an AUROC of 85%.
(4) We determine whether the event can be predicted beforehand: The best classifier is still an RF model, in which the observational starting time is 4 hours before the CPR event. Although the accuracy is impaired, the predictive accuracy still reaches 70%. Therefore, we believe that CPR events can be predicted four hours before the event. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. In a comparison with NEWS, we improve predictive accuracy via feature selection, which includes drugs as features. In addition, LSTM yields better performance with time-series data. The proposed PEWSM, which offers 4-hour predictions, is better than the National Early Warning Score (NEWS) in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.
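The sliding window mentioned in this abstract can be pictured as cutting a patient's time-ordered records into overlapping fixed-length segments, each of which becomes one model input. The sketch below assumes hourly records, a three-record window, and a step of one; none of these values, nor the function name and toy data, come from the paper.

```python
def sliding_windows(series, window, step=1):
    """Return consecutive fixed-length windows over a time-ordered sequence.

    Assumption: `window` and `step` are counted in records (for example
    hourly observations); the abstract does not state the actual sizes.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(series) - window
    return [series[start:start + window] for start in range(0, last_start + 1, step)]


# Hypothetical hourly records: (heart_rate, drug_injection_given) per hour.
records = [(88, 0), (92, 0), (97, 1), (105, 1), (118, 1)]
windows = sliding_windows(records, window=3)
for w in windows:
    print(w)
```

Each window pairs vital signs with drug-injection flags from the same hours, which is one way to present both kinds of feature to a sequence model such as an LSTM or, after flattening, to a random forest.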


2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a Drug Early Warning System Model (DEWSM) that includes drug injections and vital signs as its key features. We use it to predict cardiac arrest in emergency departments via drug classification and medical expert suggestion. METHODS We propose this new model for detecting cardiac arrest via drug classification and by using a sliding window; we apply learning-based algorithms to time-series data for a DEWSM. By treating drug features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS Four important findings are as follows: (1) We identify the most important drug predictors: bits (intravenous therapy), and replenishers and regulators of water and electrolytes (fluid and electrolyte supplement). The best AUROC of bits is 85%: medical experts suggested bits as a drug feature because it affects vital signs, and with this feature the model’s AUROC for identifying patients with CPR reaches 85%; that of replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify feature selection, in which accounting for drugs improves the accuracy: In Task 1, the best AUROC of vital signs is 77%, and that of all features is 86%. In Task 2, the best AUROC of all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction.
(3) We use a better model: alongside traditional machine learning, this study adds a newer AI technique, the long short-term memory (LSTM) model, whose best time-series accuracy is comparable to that of the traditional random forest (RF) model; both reach an AUROC of 85%. The newer technique thus already achieves accuracy comparable to that of the commonly used RF, and the LSTM model can be tuned in the future to obtain better results. (4) We determine whether the event can be predicted beforehand: The best classifier is still an RF model, in which the observational starting time is 4 hours before the CPR event. Although the accuracy is impaired, the predictive accuracy still reaches 70%. Therefore, we believe that CPR events can be predicted four hours before the event. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. The National Early Warning Score (NEWS) focuses only on the score of vital signs and does not include factors related to drug injections. In this study, the experimental results with drug injections added are better than those with vital signs alone. In a comparison with NEWS, we improve predictive accuracy via feature selection, which includes drugs as features. In addition, we use traditional machine learning methods and deep learning (with LSTM as the main method for processing time-series data) as the basis for comparison in this research. The proposed DEWSM, which offers 4-hour predictions, is better than the NEWS in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.

