Using Unsupervised Machine Learning to Identify Severity Subgroups Among COVID-19 Patients in the Emergency Department (Preprint)

2020 ◽  
Author(s):  
Julián Benito-León ◽  
Mª Dolores del Castillo ◽  
Alberto Estirado ◽  
Ritwik Ghosh ◽  
Souvik Dubey ◽  
...  

BACKGROUND Early detection and intervention are the key factors for improving outcomes in COVID-19. OBJECTIVE To detect severity subgroups among COVID-19 patients, based only on clinical data and standard laboratory tests obtained during the assessment at the emergency department. METHODS We applied unsupervised machine learning to a dataset of 853 COVID-19 patients from HM hospitals in Spain. RESULTS From a total of 850 variables, four tests, namely the serum levels of aspartate transaminase (AST), lactate dehydrogenase (LDH), and C-reactive protein (CRP), and the number of neutrophils, were enough to segregate the entire patient pool into three separate clusters. Further, the percentages of monocytes and lymphocytes and the levels of alanine transaminase (ALT) distinguished cluster 3 from the other two clusters. Cluster 1 was characterized by the highest mortality rate, the highest levels of AST, ALT, LDH, and CRP and number of neutrophils, and low percentages of monocytes and lymphocytes. Cluster 2 included patients with a moderate mortality rate and medium levels of the same laboratory determinations. Cluster 3 was characterized by the lowest mortality rate, the lowest levels of AST, ALT, LDH, and CRP and number of neutrophils, and the highest percentages of monocytes and lymphocytes. Age, sex, comorbidities, and vital signs did not allow us to separate the three clusters. An online cluster assignment tool can be found at https://g-nec.car.upm-csic.es/COVID19-severity-group-assessment/. CONCLUSIONS A few standard laboratory tests, deemed to be available in all emergency departments, have shown fair discriminative power for characterization of severity subgroups among COVID-19 patients.

2015 ◽  
Vol 120 (3) ◽  
pp. 627-635 ◽  
Author(s):  
Oliver M. Theusinger ◽  
Werner Baulig ◽  
Burkhardt Seifert ◽  
Stefan M. Müller ◽  
Sergio Mariotti ◽  
...  

Author(s):  
Vafa Bayat ◽  
Steven Phelps ◽  
Russell Ryono ◽  
Chong Lee ◽  
Hemal Parekh ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Adam Karlsson ◽  
Willem Stassen ◽  
Amy Loutfi ◽  
Ulrika Wallgren ◽  
Eric Larsson ◽  
...  

Abstract Background Sepsis is a life-threatening condition, causing almost one fifth of all deaths worldwide. The aim of the current study was to use machine learning to identify variables predictive of 7- and 30-day mortality among variables reflecting the presentation of septic patients arriving at the emergency department (ED). Methods Retrospective cross-sectional design, including all patients arriving at the ED at Södersjukhuset in Sweden during 2013 and discharged with an International Classification of Diseases (ICD)-10 code corresponding to sepsis. All predictions were made using a Balanced Random Forest Classifier and 91 variables reflecting ED presentation. An exhaustive search was used to remove unnecessary variables in the final model. A 10-fold cross validation was performed and the accuracy was described using the mean value of the following: AUC, sensitivity, specificity, PPV, NPV, positive LR and negative LR. Results The study population included 445 septic patients, randomised to a training set (n = 356, 80%) and a validation set (n = 89, 20%). The six most important variables for predicting 7-day mortality were: “fever”, “abnormal verbal response”, “low saturation”, “arrival by emergency medical services (EMS)”, “abnormal behaviour or level of consciousness” and “chills”. The model including these variables had an AUC of 0.83 (95% CI: 0.80–0.86). The final model predicting 30-day mortality used six similar variables, but included “breathing difficulties” instead of “abnormal behaviour or level of consciousness”. This model achieved an AUC of 0.80 (95% CI: 0.78–0.82). Conclusions The results suggest that six specific variables were predictive of 7- and 30-day mortality with good accuracy, which suggests that these symptoms, observations and mode of arrival may be important components to include along with vital signs in a future tool for predicting mortality among septic patients presenting to the ED. In addition, Random Forest appears to be a suitable machine learning method on which to build future studies.
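The accuracy measures listed in the Methods (sensitivity, specificity, PPV, NPV, positive LR and negative LR, each averaged over folds) can be sketched from per-fold confusion counts. This is a minimal illustration, not the study's code: the `Confusion` container, the function names and the two folds of counts are all made up for the example, and AUC is left out because it needs predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for one validation fold (hypothetical container)."""
    tp: int  # died, predicted to die
    fp: int  # survived, predicted to die
    tn: int  # survived, predicted to survive
    fn: int  # died, predicted to survive


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # Likelihood ratios are undefined when specificity is exactly 1 or 0.
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def mean_over_folds(folds) -> dict:
    """Average every metric across folds, as in k-fold cross validation."""
    per_fold = [diagnostic_metrics(c) for c in folds]
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}


# Made-up counts for two folds, purely to exercise the functions.
folds = [Confusion(tp=8, fp=10, tn=70, fn=2), Confusion(tp=6, fp=20, tn=60, fn=4)]
summary = mean_over_folds(folds)
print(summary)
```

A fold with no false positives would make the positive LR undefined, so a real pipeline would need to decide how to report that case.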


2018 ◽  
Author(s):  
Kumardeep Chaudhary ◽  
Aine Duffy ◽  
Priti Poojary ◽  
Aparna Saha ◽  
Kinsuk Chauhan ◽  
...  

Abstract Objective Acute kidney injury (AKI) is highly prevalent in critically ill patients with sepsis. Sepsis-associated AKI is a heterogeneous clinical entity, and, like many complex syndromes, is composed of distinct subtypes. We aimed to agnostically identify AKI subphenotypes using machine learning techniques and routinely collected data in electronic health records (EHRs). Design Cohort study utilizing the MIMIC-III Database. Setting ICUs from a tertiary care hospital in the U.S. Patients Patients older than 18 years with sepsis and who developed AKI within 48 hours of ICU admission. Interventions Unsupervised machine learning utilizing all available vital signs and laboratory measurements. Measurements and Main Results We identified 1,865 patients with sepsis-associated AKI. Ten vital signs and 691 unique laboratory results were identified. After data processing and feature selection, 59 features, of which 28 were measures of intra-patient variability, remained for inclusion into an unsupervised machine-learning algorithm. We utilized k-means clustering with k ranging from 2 to 10; k=2 had the highest silhouette score (0.62). Cluster 1 had 1,358 patients while Cluster 2 had 507 patients. There were no significant differences between clusters on age, race or gender. We found significant differences in comorbidities and small but significant differences in several laboratory variables (hematocrit, bicarbonate, albumin) and vital signs (systolic blood pressure and heart rate). In-hospital mortality was higher in cluster 2 patients, 25% vs. 20%, p=0.008. Features with the largest differences between clusters included variability in basophil and eosinophil counts, alanine aminotransferase levels and creatine kinase values. Conclusions Utilizing routinely collected laboratory variables and vital signs in the EHR, we were able to identify two distinct subphenotypes of sepsis-associated AKI with different outcomes. Variability in laboratory variables, as opposed to their actual value, was more important for determination of subphenotypes. Our findings show the potential utility of unsupervised machine learning to better subtype AKI.
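The abstract selects k by the highest silhouette score. As a rough sketch of what that score measures, the function below computes the mean silhouette for a given labelling using Euclidean distance. The four toy points and both labellings are invented for illustration, and the k-means step itself is not implemented here.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters contribute 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: silhouette defined as 0
        # dist(point, point) is 0, so summing over the whole cluster is safe.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two tight pairs that are far apart (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups the nearby points
bad = silhouette_score(points, [0, 1, 0, 1])   # splits the nearby points
print(good, bad)
```

To choose k in the way the abstract describes, one would run the clustering once per candidate k, score each labelling with a function like this, and keep the k with the highest score.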


2019 ◽  
Vol 10 (05) ◽  
pp. 952-963 ◽  
Author(s):  
Zfania Tom Korach ◽  
Kenrick D. Cato ◽  
Sarah A. Collins ◽  
Min Jeoung Kang ◽  
Christopher Knaplund ◽  
...  

Abstract Background In the hospital setting, it is crucial to identify patients at risk for deterioration before it fully develops, so providers can respond rapidly to reverse the deterioration. Rapid response (RR) activation criteria include a subjective component (“worried about the patient”) that is often documented in nurses' notes and is hard to capture and quantify, hindering active screening for deteriorating patients. Objectives We used unsupervised machine learning to automatically discover RR event risk/protective factors from unstructured nursing notes. Methods In this retrospective cohort study, we obtained nursing notes of hospitalized, nonintensive care unit patients, documented from 2015 through 2018 from Partners HealthCare databases. We applied topic modeling to those notes to reveal topics (clusters of associated words) documented by nurses. Two nursing experts named each topic with a representative Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT) concept. We used the concepts along with vital signs and demographics in a time-dependent covariates extended Cox model to identify risk/protective factors for RR event risk. Results From a total of 776,849 notes of 45,299 patients, we generated 95 stable topics, of which 80 were mapped to 72 distinct SNOMED CT concepts. Compared with a model containing only demographics and vital signs, the latent topics improved the model's predictive ability from a concordance index of 0.657 to 0.720. Thirty topics were found significantly associated with RR event risk at a 0.05 level, and 11 remained significant after Bonferroni correction of the significance level to 6.94E-04, including physical examination (hazard ratio [HR] = 1.07, 95% confidence interval [CI], 1.03–1.12), informing doctor (HR = 1.05, 95% CI, 1.03–1.08), and seizure precautions (HR = 1.08, 95% CI, 1.04–1.12). 
Conclusion Unsupervised machine learning methods can automatically reveal interpretable and informative signals from free-text and may support early identification of patients at risk for RR events.
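The Bonferroni-corrected significance level quoted above can be checked with a line of arithmetic. The abstract does not state the number of tests; the sketch below assumes it is 72, the number of distinct SNOMED CT concepts, because 0.05 / 72 reproduces the reported 6.94E-04. The function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: one test per distinct SNOMED CT concept (72 in the abstract).
threshold = bonferroni_threshold(0.05, 72)
print(f"{threshold:.2E}")  # prints 6.94E-04
```

Using the 80 mapped topics or the 95 stable topics as the divisor would give 6.25E-04 or 5.26E-04 instead, which is why 72 is the likely count.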


2017 ◽  
Author(s):  
Nathaniel R. Greenbaum ◽  
Yacine Jernite ◽  
Yoni Halpern ◽  
Shelley Calder ◽  
Larry A. Nathanson ◽  
...  

Abstract Objective To determine the effect of contextual autocomplete, a user interface that uses machine learning, on the efficiency and quality of documentation of presenting problems (chief complaints) in the emergency department (ED). Materials and Methods We used contextual autocomplete, a user interface that ranks concepts by their predicted probability, to help nurses enter data about a patient’s reason for visiting the ED. Predicted probabilities were calculated using a previously derived model based on triage vital signs and a brief free text note. We evaluated the percentage and quality of structured data captured using a prospective before-and-after study design. Results A total of 279,231 patient encounters were analyzed. Structured data capture improved from 26.2% to 97.2% (p<0.0001). During the post-implementation period, presenting problems were more complete (3.35 vs. 3.66; p=0.0004), as precise (3.59 vs. 3.74; p=0.1), and higher in overall quality (3.38 vs. 3.72; p=0.0002). Our system reduced the mean number of keystrokes required to document a presenting problem from 11.6 to 0.6 (p<0.0001), a 95% improvement. Discussion We have demonstrated a technique that captures structured data on nearly all patients. We estimate that our system reduces the number of man-hours required annually to type presenting problems at our institution from 92.5 hours to 4.8 hours. Conclusion Implementation of a contextual autocomplete system resulted in improved structured data capture, ontology usage compliance, and data quality.
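The core idea of ranking concepts by predicted probability can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' interface: the probabilities below are invented, the upstream model is not shown, and narrowing the list by typed prefix is an assumption about how such a widget might behave. The last lines recompute the keystroke reduction reported in the abstract.

```python
def rank_suggestions(predicted, typed="", limit=5):
    """Return up to `limit` concepts, most probable first.

    predicted: dict mapping concept name -> probability from an upstream model
    typed: characters entered so far; an empty string means nothing typed yet
    """
    prefix = typed.strip().lower()
    matches = [
        (concept, prob)
        for concept, prob in predicted.items()
        if concept.lower().startswith(prefix)
    ]
    # Sort by descending probability, breaking ties alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in matches[:limit]]


# Invented probabilities for one hypothetical triage note.
predicted = {
    "chest pain": 0.62,
    "shortness of breath": 0.21,
    "cough": 0.09,
    "abdominal pain": 0.05,
    "chills": 0.03,
}
before_typing = rank_suggestions(predicted)
after_one_key = rank_suggestions(predicted, "c")

# Keystroke reduction reported in the abstract: 11.6 -> 0.6 keystrokes.
reduction_pct = round(100 * (11.6 - 0.6) / 11.6)
print(before_typing, after_one_key, reduction_pct)
```

When the most probable concept is already shown before any key is pressed, the nurse can select it directly, which is consistent with a mean of well under one keystroke per entry.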


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
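The encoding step described above, where a compound vector is the sum of its substructure vectors, can be sketched as follows. The substructure identifiers and the three-dimensional embeddings are made up for illustration, and silently skipping substructures without an embedding is an assumption of this sketch rather than a statement about how Mol2vec handles them.

```python
def compound_vector(substructures, embeddings):
    """Sum the embedding vectors of a compound's substructures.

    substructures: identifiers found in the compound (repeats are counted)
    embeddings: dict mapping identifier -> list of floats, all the same length
    Identifiers without an embedding are skipped (an assumption of this sketch).
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for identifier in substructures:
        vector = embeddings.get(identifier)
        if vector is None:
            continue
        total = [t + v for t, v in zip(total, vector)]
    return total


# Hypothetical 3-dimensional embeddings; trained embeddings are much larger.
embeddings = {
    "sub_a": [1.0, 0.5, 0.0],
    "sub_b": [-0.5, 0.25, 2.0],
}
vector = compound_vector(["sub_a", "sub_b", "sub_a", "sub_unknown"], embeddings)
print(vector)  # [1.5, 1.25, 2.0]
```

The resulting fixed-length vector is what a downstream supervised model would take as its input features.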


2020 ◽  
Author(s):  
Jiawei Peng ◽  
Yu Xie ◽  
Deping Hu ◽  
Zhenggang Lan

The system-plus-bath model is an important tool to understand nonadiabatic dynamics for large molecular systems. The understanding of the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion based on the massive data generated from MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics simulations of excited-state energy transfer in a Frenkel-exciton model. The PCA shows that two types of bath modes, those that display strong vibronic couplings and those with frequencies close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. This conclusion is obtained purely from the PCA of the trajectory data, without substantial involvement of pre-defined physical knowledge. The results show that PCA, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
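To make the PCA step concrete, the sketch below finds the leading principal component of a small data matrix and reads off which columns carry the most weight. It is a toy illustration only: the four "snapshots" of three "modes" are invented, and power iteration on the covariance matrix stands in for the full eigendecomposition a real analysis would use.

```python
from math import sqrt


def covariance(rows):
    """Sample covariance matrix of rows (observations) by columns (variables)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov via power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector has no overlap with the data variance")
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v


# Toy snapshots: mode 0 moves a lot, mode 1 follows it weakly, mode 2 is frozen.
snapshots = [
    [2.0, 0.10, 1.0],
    [-2.0, -0.10, 1.0],
    [1.0, 0.05, 1.0],
    [-1.0, -0.05, 1.0],
]
variance, loadings = leading_component(covariance(snapshots))
print(variance, loadings)
```

The loadings show how strongly each column contributes to the dominant collective motion; here nearly all of the weight falls on the first column.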


2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a Drug Early Warning System Model (DEWSM) that uses drug injections and vital signs as its key features. We use it to predict cardiac arrest in emergency departments via drug classification and medical expert suggestions. METHODS We propose this new model for detecting cardiac arrest via drug classification and a sliding window; we apply learning-based algorithms to time-series data for a DEWSM. By treating drug features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS Four important findings are as follows: (1) We identify the most important drug predictors: bits (intravenous therapy), and replenishers and regulators of water and electrolytes (fluid and electrolyte supplement). The best AUROC for bits is 85%: with this expert-suggested drug feature, which affects vital signs, the model’s classification of patients with CPR reaches an AUROC of 85%. The best AUROC for replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify feature selection, in which accounting for drugs improves the accuracy: In Task 1, the best AUROC of vital signs is 77%, and that of all features is 86%. In Task 2, the best AUROC of all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction. (3) We use a better model: alongside traditional machine learning, this study adds a deep learning method, the long short-term memory (LSTM) model, which has the best time-series accuracy and is comparable to the traditional random forest (RF) model; both reach an AUROC of 85%. The LSTM model thus currently matches the accuracy of the commonly used RF model and can be tuned in the future to obtain better results. (4) We determine whether the event can be predicted beforehand: The best classifier is still an RF model when the observation window starts 4 hours before the CPR event. Although accuracy is reduced, predictive accuracy still reaches 70%. Therefore, we believe that CPR events can be predicted four hours before the event. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. The National Early Warning Score (NEWS) only focuses on the score of vital signs, and does not include factors related to drug injections. In this study, the experimental results obtained by adding drug injections are better than those obtained with vital signs only. In a comparison with NEWS, we improve predictive accuracy via feature selection, which includes drugs as features. In addition, we compare traditional machine learning methods with deep learning (using LSTM as the main method for processing time-series data). The proposed DEWSM, which offers 4-hour predictions, performs better than the NEWS reported in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.
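A sliding window over hourly records can be sketched as below. The abstract does not specify how the windows or labels were built, so everything here is an assumption for illustration: each record is a made-up `[vital_sign, drug_given]` pair, the window length is 3 hours, and a window is labelled positive when an event falls within the following 4 hours.

```python
def sliding_windows(series, window, horizon, event_times):
    """Turn an hourly series into (features, label) samples.

    series: list of per-hour feature lists, index = hour
    window: number of past hours flattened into one feature vector
    horizon: number of hours after the window in which an event counts
    event_times: set of hour indices at which an event occurred
    Features cover hours [end - window, end); label is 1 if an event
    occurs in hours [end, end + horizon).
    """
    samples = []
    for end in range(window, len(series) - horizon + 1):
        features = [x for hour in series[end - window:end] for x in hour]
        label = int(any(end <= t < end + horizon for t in event_times))
        samples.append((features, label))
    return samples


# Synthetic 8-hour record: [vital_sign, drug_given]; event at hour 7.
series = [[70 + hour, hour % 2] for hour in range(8)]
samples = sliding_windows(series, window=3, horizon=4, event_times={7})
for features, label in samples:
    print(features, label)
```

Each sample could then be passed to a classifier such as a random forest, or kept as a sequence rather than flattened when the model is an LSTM.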

