Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262182
Author(s):  
Maria Mahbub ◽  
Sudarshan Srinivasan ◽  
Ioana Danciu ◽  
Alina Peluso ◽  
Edmon Begoli ◽  
...  

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and for using resources efficiently. The accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short-, mid-, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC-III database, natural language processing (NLP), and machine learning (ML) models. Based on the statistical description of the patients’ length of stay, we define the short term as the 48-hour and 4-day periods, the mid term as the 7-day and 10-day periods, and the long term as the 15-day and 30-day periods after admission. We found that by using only clinical notes written within 24 hours of admission, our framework can achieve a high area under the receiver operating characteristic curve (AU-ROC) for short-, mid-, and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day mortality prediction, respectively. We also provide a comparative study of three types of NLP feature extraction techniques: frequency-based, fixed embedding-based, and dynamic embedding-based. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.
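The six prediction horizons and the 24-hour note window described in this abstract can be illustrated with a short Python sketch. This is not the authors' code: the function names, the `(chart_time, text)` note format, and the choice to count a death exactly at the cutoff as inside the window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Prediction horizons named in the abstract, in hours after admission.
HORIZONS_HOURS = {
    "48-hour": 48,
    "4-day": 4 * 24,
    "7-day": 7 * 24,
    "10-day": 10 * 24,
    "15-day": 15 * 24,
    "30-day": 30 * 24,
}


def mortality_labels(admit_time: datetime, death_time: Optional[datetime]) -> dict:
    """Return one binary label per horizon: 1 if death occurred within it."""
    labels = {}
    for name, hours in HORIZONS_HOURS.items():
        cutoff = admit_time + timedelta(hours=hours)
        # Assumption: a death exactly at the cutoff counts as inside the window.
        labels[name] = int(death_time is not None and death_time <= cutoff)
    return labels


def notes_in_first_24h(notes, admit_time: datetime) -> list:
    """Keep only the text of notes charted within 24 hours of admission.

    `notes` is a hypothetical list of (chart_time, text) tuples.
    """
    cutoff = admit_time + timedelta(hours=24)
    return [text for chart_time, text in notes if admit_time <= chart_time <= cutoff]
```

Under these assumptions, a patient who dies five days after admission is labeled negative for the 48-hour and 4-day tasks and positive for the four longer horizons, so one admission yields six training labels.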

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0254894
Author(s):  
Firdaus Aziz ◽  
Sorayya Malek ◽  
Khairul Shafiq Ibrahim ◽  
Raja Ezman Raja Shariff ◽  
Wan Azman Wan Ahmad ◽  
...  

Background Conventional risk scores for predicting short- and long-term mortality following ST-segment elevation myocardial infarction (STEMI) are often not population specific. Objective To apply machine learning to predict, and identify factors associated with, short- and long-term mortality in Asian STEMI patients, and to compare it with a conventional risk score. Methods The National Cardiovascular Disease Database for Malaysia registry, which covers a multi-ethnic, heterogeneous Asian population, was used for in-hospital (6299 patients), 30-day (3130 patients), and 1-year (2939 patients) model development. Fifty variables were considered. Mortality prediction was analysed using feature selection methods with machine learning algorithms and compared with the Thrombolysis in Myocardial Infarction (TIMI) score. Varying degrees of invasive management were selected as important variables that improved mortality prediction. Results Models using the complete and reduced variable sets produced an area under the receiver operating characteristic curve (AUC) from 0.73 to 0.90. The best machine learning models for in-hospital, 30-day, and 1-year mortality outperformed the TIMI risk score (in-hospital: AUC = 0.88, 95% CI: 0.846–0.910 vs AUC = 0.81, 95% CI: 0.772–0.845; 30-day: AUC = 0.90, 95% CI: 0.870–0.935 vs AUC = 0.80, 95% CI: 0.746–0.838; 1-year: AUC = 0.84, 95% CI: 0.798–0.872 vs AUC = 0.76, 95% CI: 0.715–0.802; p < 0.0001 for all). The TIMI score underestimates patients’ risk of mortality: 90% of non-surviving patients are classified as high risk (>50%) by the machine learning algorithm, compared with 10–30% of non-surviving patients by TIMI. Common predictors identified for short- and long-term mortality were age, heart rate, Killip class, fasting blood glucose, prior primary PCI or pharmaco-invasive therapy, and diuretics. The final algorithm was converted into an online tool with a database for continuous data archiving for algorithm validation.
Conclusions In a multi-ethnic population, patients with STEMI were better classified using the machine learning method than with TIMI scoring. Machine learning allows for the identification of distinct factors in individual Asian populations for better mortality prediction. Ongoing continuous testing and validation will allow for better risk stratification and may alter management and outcomes in the future.
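The AUC values compared in this abstract measure how often a model ranks a patient who died above a patient who survived. A minimal Python sketch of that definition follows; it uses the pairwise (Mann-Whitney) formulation, the function name `auc_roc` is hypothetical, and the data in the usage note are toy values rather than anything from the registry.

```python
def auc_roc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates four pairs, three of which are ranked correctly, giving 0.75. The quadratic loop is chosen for readability; a rank-based implementation would be preferable for large cohorts.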


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1060
Author(s):  
Yu-Hsuan Li ◽  
Wayne Huey-Herng Sheu ◽  
Wen-Chao Yeh ◽  
Yung-Chun Chang ◽  
I-Te Lee

We aimed to develop and validate a model for predicting mortality in patients with angina across the spectrum of dysglycemia. A total of 1479 patients admitted for coronary angiography due to angina were enrolled. All-cause mortality served as the primary endpoint. The models were validated with five-fold cross-validation to predict long-term mortality. The features selected by the least absolute shrinkage and selection operator (LASSO) were age, heart rate, plasma glucose levels at 30 min and 120 min during an oral glucose tolerance test (OGTT), the use of angiotensin II receptor blockers, the use of diuretics, and smoking history. The best-performing model was a random survival forest built with the selected features. It had good discriminative ability (Harrell’s C-index: 0.829) and acceptable calibration (Brier score: 0.08) for predicting long-term mortality. Among patients with obstructive coronary artery disease confirmed by angiography, our model outperformed the Global Registry of Acute Coronary Events discharge score for mortality prediction (Harrell’s C-index: 0.829 vs. 0.739, p < 0.001). In conclusion, we developed a machine learning model to predict long-term mortality among patients with angina. With the integration of OGTT, the model could help to identify a high risk of mortality across the spectrum of dysglycemia.
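Harrell's C-index, the discrimination measure reported above, extends the AUC idea to censored survival data. The Python sketch below shows one common formulation under stated assumptions: a pair is comparable only when the patient with the shorter follow-up time had an observed event, and tied risk scores count as half. The function name and argument layout are hypothetical, and this is not the authors' implementation.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier death is expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With toy inputs `times=[2, 4, 6, 8]`, `events=[1, 1, 0, 1]`, and `risks=[0.9, 0.3, 0.5, 0.1]`, five pairs are comparable and four are concordant, so the index is 0.8.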


2020 ◽  
Author(s):  
Christopher A Hane ◽  
Vijay S Nori ◽  
William H Crown ◽  
Darshak M Sanghavi ◽  
Paul Bleicher

BACKGROUND Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also assist patients with financial planning for long-term care. Clinical notes are an important, underutilized source of information in machine learning models because of the cost of collection and complexity of analysis. OBJECTIVE This study aimed to investigate the use of deidentified clinical notes from multiple hospital systems collected over 10 years to augment retrospective machine learning models of the risk of developing ADRD. METHODS We used 2 years of data to predict the future outcome of ADRD onset. Clinical notes are provided in a deidentified format with specific terms and sentiments. Terms in clinical notes are embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations that differ across hospital systems and individual clinicians. RESULTS When using clinical notes, the area under the curve (AUC) improved from 0.85 to 0.94, and positive predictive value (PPV) increased from 45.07% (25,245/56,018) to 68.32% (14,153/20,717) in the model at disease onset. Models with clinical notes improved in both AUC and PPV in years 3-6 when notes’ volume was largest; results are mixed in years 7 and 8 with the smallest cohorts. CONCLUSIONS Although clinical notes helped in the short term, the presence of ADRD symptomatic terms years earlier than onset adds evidence to other studies that clinicians undercode diagnoses of ADRD. De-identified clinical notes increase the accuracy of risk models. Clinical notes collected across multiple hospital systems via natural language processing can be merged using postprocessing techniques to aid model accuracy.
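The positive predictive values quoted in the results can be checked directly from the counts given in parentheses. The short Python sketch below does that arithmetic; the function and variable names are illustrative only.

```python
def ppv(true_positives: int, predicted_positives: int) -> float:
    """Positive predictive value: true positives over all predicted positives."""
    return true_positives / predicted_positives


# Counts as reported in the abstract for the model at disease onset.
without_notes = ppv(25_245, 56_018)
with_notes = ppv(14_153, 20_717)

print(f"PPV without clinical notes: {without_notes:.2%}")
print(f"PPV with clinical notes:    {with_notes:.2%}")
```

The counts also show that the model with notes flags fewer patients overall (20,717 versus 56,018), so the PPV gain comes with a smaller set of predicted positives.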


Online discussion forums and blogs are vibrant platforms where cancer patients express their views in the form of stories. These stories sometimes become a source of inspiration for patients who are anxiously searching for similar cases. This paper proposes a method that uses natural language processing and machine learning to analyze unstructured text collected from patients’ reviews and stories. The proposed methodology aims to identify behavior, emotions, side effects, decisions, and demographics associated with cancer patients. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which some special characters and symbols are removed, and finally tagging the text using the part-of-speech (POS) tagger from NLTK (Natural Language Toolkit). The post-processing phase trains seven machine learning classifiers (see Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) is highest for the Support Vector Machine (SVM) classifier (0.98).


2021 ◽  
Author(s):  
Yue Yu ◽  
Chi Peng ◽  
Zhiyuan Zhang ◽  
Kejia Shen ◽  
Yufeng Zhang ◽  
...  

Abstract Background A mortality prediction model for patients undergoing cardiac surgery could help clinicians with alerting, judgment, and intervention, yet few predictive tools for long-term mortality have been developed for patients after cardiac surgery. Objective We aimed to construct and validate several machine learning (ML) algorithms to predict long-term mortality and identify risk factors in unselected patients after cardiac surgery during a 4-year follow-up. Methods The Medical Information Mart for Intensive Care (MIMIC-III) database was used to perform a retrospective administrative database study. Candidate predictors consisted of demographics, comorbidities, vital signs, laboratory test results, prognostic scoring systems, and treatment information from the first day of ICU admission. Four-year mortality was the study outcome. We used the ML methods of logistic regression (LR), artificial neural network (NNET), naïve Bayes (NB), gradient boosting machine (GBM), adaptive boosting (Ada), random forest (RF), bagged trees (BT), and eXtreme Gradient Boosting (XGB). The prognostic capacity and clinical utility of these ML models were compared using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). Results Of 7,368 patients in MIMIC-III included in the final cohort, 1,337 (18.15%) died during the 4-year follow-up. Among 65 variables extracted from the database, 25 predictors were selected using recursive feature elimination (RFE) and included in the subsequent analysis. The Ada model performed best among the eight models in both discriminatory ability, with the highest AUC of 0.801, and goodness of fit (visualized by calibration curve). Moreover, the DCA showed that the net benefit of the RF, Ada, and BT models surpassed that of the other ML models for almost all threshold probability values.
Additionally, using the Ada technique, we determined that red blood cell distribution width (RDW), blood urea nitrogen (BUN), SAPS II, anion gap (AG), age, urine output, chloride, creatinine, congestive heart failure, and SOFA were the top 10 predictors in the feature importance rankings. Conclusions The Ada model performs best in predicting long-term mortality after cardiac surgery among the eight ML models. ML-based algorithms might have significant application in the development of early warning systems for patients following operations.


Vascular ◽  
2019 ◽  
Vol 27 (5) ◽  
pp. 479-486 ◽  
Author(s):  
Serkan Aslan ◽  
Ali Rıza Demir ◽  
Yusuf Demir ◽  
Ömer Taşbulak ◽  
Mehmet Altunova ◽  
...  

Objectives Platelets play an important role in the pathogenesis of atherosclerosis and the physiopathology of cardiovascular events. Plateletcrit provides complete information on total platelet mass. The relationship between plateletcrit values and long-term outcomes in patients with carotid stenosis is not known. The purpose of the present study was to evaluate the reliability of plateletcrit for predicting major adverse cardiac and cerebrovascular events (MACCE) in patients with carotid stenosis. Methods A total of 230 patients with more than 50% stenosis of the carotid artery were retrospectively included in this study. Patients were divided into two groups according to the plateletcrit threshold calculated from the receiver operating characteristic curve, and baseline parameters and clinical outcomes were compared between the groups. Univariate and multivariate analyses were used to evaluate the association between plateletcrit and MACCE. Results The cut-off value of plateletcrit for predicting MACCE was 0.233, with 56.2% sensitivity and 68.0% specificity. Plateletcrit levels were significantly higher in patients with MACCE (0.247 in the MACCE (+) group vs. 0.213 in the MACCE (–) group, p < 0.001). In the Kaplan–Meier survival analysis, the long-term mortality rate was higher in the high plateletcrit group (p = 0.006). Multivariate regression analysis showed that plateletcrit was independently associated with MACCE (OR: 2.196, CI: 1.200–4.018; p = 0.011). Conclusions Our data suggest that plateletcrit has independent predictive value for long-term mortality and MACCE and can be used as a marker to predict long-term adverse outcomes in patients with carotid stenosis.
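Sensitivity and specificity at a cut-off, as reported for the 0.233 plateletcrit threshold, follow from a simple count of correct and incorrect classifications. The Python sketch below illustrates the calculation; the function name is hypothetical, and treating values at or above the cut-off as "high" is an assumption, since the abstract does not say how ties are handled.

```python
def sensitivity_specificity(values, outcomes, cutoff):
    """Sensitivity and specificity of the rule `value >= cutoff` predicts an event.

    values   : measured marker values (for example, plateletcrit)
    outcomes : 1 if the event occurred, 0 otherwise
    """
    tp = fn = tn = fp = 0
    for value, outcome in zip(values, outcomes):
        # Assumption: values at or above the cut-off are classified as high.
        predicted_high = value >= cutoff
        if outcome == 1:
            if predicted_high:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_high:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the event")
    return tp / (tp + fn), tn / (tn + fp)
```

With toy values `[0.250, 0.240, 0.200, 0.260, 0.210, 0.220]`, outcomes `[1, 1, 1, 0, 0, 0]`, and a cut-off of 0.233, two of three events and two of three non-events are classified correctly, so both measures equal 2/3. These numbers are illustrative and unrelated to the study cohort.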


2015 ◽  
Vol 3 (Suppl 1) ◽  
pp. A11
Author(s):  
R Lohse ◽  
MB Damholt ◽  
J Wiis ◽  
A Perner ◽  
T Lange ◽  
...  

Heart ◽  
2015 ◽  
Vol 102 (3) ◽  
pp. 204-208 ◽  
Author(s):  
Joseph T Knapper ◽  
Faisal Khosa ◽  
Michael J Blaha ◽  
Taylor A Lebeis ◽  
Jenna Kay ◽  
...  
