Identify Patients with Congestive Heart Failure through Analyzing Free-Text Clinical Notes

Author(s):  
Margot Yann ◽  
Therese Stukel ◽  
Liisa Jaakkimainen ◽  
Karen Tu

Introduction: A number of challenges exist in analyzing unstructured free-text data in electronic medical records (EMRs). EMR text is difficult to represent and model because of its high dimensionality, heterogeneity, sparsity, incompleteness, random errors, and noise. Objectives and Approach: Standard natural language processing (NLP) tools make errors when applied to clinical notes because physicians use unconventional language involving polysemy, abbreviations, ambiguity, misspellings, variations, and negation. This paper presents a novel NLP framework, “Clinical Learning On Natural Expression” (CLONE), that learns automatically from free-text clinical notes in a large primary care EMR database. CLONE’s predictive clinical models use text mining and a neural network approach to extract features and identify patterns. To demonstrate effectiveness, we evaluate CLONE in a case study on identifying patients with a specific chronic condition: congestive heart failure (CHF). Results: A randomly selected sample of 7500 patients from the Electronic Medical Record Administrative data Linked Database (EMRALD) is used. In this dataset, each patient’s medical chart includes a reference standard, manually reviewed by medical practitioners. The prevalence of CHF is approximately 2%. This low prevalence leads to another challenging problem in machine learning: imbalanced datasets. After pre-processing, we build deep learning models that represent and extract important medical information from free-text patient charts to identify CHF patients. We evaluated CLONE by comparing its predicted labels with the reference standard on a holdout test dataset. Compared with a number of alternative algorithms, we improve the overall accuracy to over 90% on the test dataset. Conclusion/Implications: As the role of NLP in EMR data expands, the CLONE natural language processing framework can lead to a substantial reduction in manual processing while improving predictive accuracy.
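The imbalance problem mentioned in the abstract can be made concrete with a small calculation. The sketch below is not part of CLONE; it assumes a hypothetical baseline that labels every patient as "no CHF" and, purely for illustration, applies the 2% prevalence to the full 7500-patient sample rather than to the (unreported) holdout split. The function name `confusion_metrics` is an illustrative choice.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, positive predictive value)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never present or predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, sensitivity, ppv


# Assumption: 7500 patients at 2% prevalence gives 150 CHF cases.
n_patients = 7500
n_chf = round(n_patients * 0.02)

# Hypothetical baseline that never predicts CHF: no true or false positives.
baseline = confusion_metrics(tp=0, fp=0, fn=n_chf, tn=n_patients - n_chf)
print("accuracy=%.3f sensitivity=%.3f ppv=%.3f" % baseline)
```

Under these assumptions the trivial baseline is right for 98% of patients while finding no CHF cases at all, which is why sensitivity and positive predictive value are usually reported alongside accuracy on low-prevalence conditions.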

2013 ◽  
Vol 52 (01) ◽  
pp. 33-42 ◽  
Author(s):  
M.-H. Kuo ◽  
P. Gooch ◽  
J. St-Maurice

Summary. Objective: The objective of this study was to undertake a proof of concept that demonstrated the use of primary care data, natural language processing, and term extraction to assess emergency room use. The study extracted biopsychosocial concepts from primary care free text and related them to inappropriate emergency room use through odds ratios. Methods: De-identified free-text notes were extracted from a primary care clinic in Guelph, Ontario, and analyzed with a software toolkit that incorporated General Architecture for Text Engineering (GATE) and MetaMap components for natural language processing and term extraction. Results: Over 10 million concepts were extracted from 13,836 patient records. Codes found in at least 1% of the sample were regressed against inappropriate emergency room use. Seventy-seven codes fell within the biopsychosocial realm, were highly statistically significant (p < 0.001), and had an OR > 2.0. Thematically, these codes involved mental health and pain-related concepts. Conclusions: Analyzed thematically, mental health issues and pain are important themes; we have concluded that pain and mental health problems are primary drivers of inappropriate emergency room use. Age and sex were not significant. This proof of concept demonstrates the feasibility of combining natural language processing and primary care data to analyze a system use question. As a first study, it supports further research and could be applied to investigate other, more complex problems.
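For readers unfamiliar with the odds ratio used as the screening threshold above, the following is a minimal sketch of the unadjusted 2x2 form with a Woolf (log-based) 95% confidence interval. It does not reproduce the study's regression; the function name and the counts in the example are invented for illustration.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio with a Woolf 95% confidence interval.

    "Exposed" here means the extracted concept code is present in the record;
    "case" means the outcome of interest was observed. All cells must be > 0.
    """
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Invented counts, not taken from the study.
estimate, lower, upper = odds_ratio(40, 160, 100, 900)
print(f"OR={estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A code would pass an "OR > 2.0" screen like the one described when its estimate exceeds 2.0; whether the interval also excludes 1.0 is a separate significance question.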


2021 ◽  
Author(s):  
Sena Chae ◽  
Jiyoun Song ◽  
Marietta Ojo ◽  
Maxim Topaz

The goal of this natural language processing (NLP) study was to identify patients in home healthcare with heart failure symptoms and poor self-management (SM). The preliminary lists of symptoms and poor SM status were identified, NLP algorithms were used to refine the lists, and NLP performance was evaluated using 2.3 million home healthcare clinical notes. The overall precision to identify patients with heart failure symptoms and poor SM status was 0.86. The feasibility of methods was demonstrated to identify patients with heart failure symptoms and poor SM documented in home healthcare notes. This study facilitates utilizing key symptom information and patients’ SM status from unstructured data in electronic health records. The results of this study can be applied to better individualize symptom management to support heart failure patients’ quality-of-life.


Author(s):  
Jennifer Hornung Garvin ◽  
Youngjun Kim ◽  
Glenn Temple Gobbel ◽  
Michael E Matheny ◽  
Andrew Redd ◽  
...  

BACKGROUND We developed an accurate, stakeholder-informed, automated, natural language processing (NLP) system to measure the quality of heart failure (HF) inpatient care, and explored the potential for adoption of this system within an integrated health care system. OBJECTIVE To accurately automate a United States Department of Veterans Affairs (VA) quality measure for inpatients with HF. METHODS We automated the HF quality measure Congestive Heart Failure Inpatient Measure 19 (CHI19) that identifies whether a given patient has left ventricular ejection fraction (LVEF) <40%, and if so, whether an angiotensin-converting enzyme inhibitor or angiotensin-receptor blocker was prescribed at discharge if there were no contraindications. We used documents from 1083 unique inpatients from eight VA medical centers to develop a reference standard (RS) to train (n=314) and test (n=769) the Congestive Heart Failure Information Extraction Framework (CHIEF). We also conducted semi-structured interviews (n=15) for stakeholder feedback on implementation of the CHIEF. RESULTS The CHIEF classified each hospitalization in the test set with a sensitivity (SN) of 98.9% and positive predictive value of 98.7%, compared with an RS and SN of 98.5% for available External Peer Review Program assessments. Of the 1083 patients available for the NLP system, the CHIEF evaluated and classified 100% of cases. Stakeholders identified potential implementation facilitators and clinical uses of the CHIEF. CONCLUSIONS The CHIEF provided complete data for all patients in the cohort and could potentially improve the efficiency, timeliness, and utility of HF quality measurements.
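The decision logic of the quality measure, as worded in the abstract, can be sketched as a small rule. This is not the CHIEF system, which extracts these facts from clinical text; the sketch assumes the facts are already available as fields. The class, field names, and status labels are hypothetical, and treating a documented contraindication as an exclusion before checking the prescription is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hospitalization:
    lvef_percent: Optional[float]      # None when no LVEF value was found
    acei_or_arb_at_discharge: bool     # ACE inhibitor or ARB prescribed at discharge
    contraindication: bool             # documented reason not to prescribe


def measure_status(h: Hospitalization) -> str:
    """Classify one hospitalization against the LVEF < 40% rule (hypothetical labels)."""
    if h.lvef_percent is None:
        return "lvef_missing"
    if h.lvef_percent >= 40:
        return "not_applicable"        # measure only concerns LVEF < 40%
    if h.contraindication:
        return "excluded"              # assumption: contraindication removes the case
    return "met" if h.acei_or_arb_at_discharge else "not_met"


print(measure_status(Hospitalization(35.0, True, False)))
```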


2021 ◽  
pp. 379-393
Author(s):  
Jiaming Zeng ◽  
Imon Banerjee ◽  
A. Solomon Henry ◽  
Douglas J. Wood ◽  
Ross D. Shachter ◽  
...  

PURPOSE Knowing the treatments administered to patients with cancer is important for treatment planning and for correlating treatment patterns with outcomes in personalized medicine studies. However, existing methods to identify treatments are often lacking. We develop a natural language processing approach that uses structured electronic medical records and unstructured clinical notes to identify the initial treatment administered to patients with cancer. METHODS We used a total of 4,412 patients with 483,782 clinical notes from the Stanford Cancer Institute Research Database, covering patients with nonmetastatic prostate, oropharynx, and esophagus cancer. We trained treatment identification models for each cancer type separately and compared the performance of using only structured data, only unstructured data (bag-of-words, doc2vec, fasttext), and combinations of both (structured + bow, structured + doc2vec, structured + fasttext). We optimized the identification model among five machine learning methods (logistic regression, multilayer perceptrons, random forest, support vector machines, and stochastic gradient boosting). We used the treatment information recorded in the cancer registry as the gold standard and compared our methods with a baseline that identifies treatments from billing codes. RESULTS For prostate cancer, we achieved an f1-score of 0.99 (95% CI, 0.97 to 1.00) for radiation and 1.00 (95% CI, 0.99 to 1.00) for surgery using structured + doc2vec. For oropharynx cancer, we achieved an f1-score of 0.78 (95% CI, 0.58 to 0.93) for chemoradiation and 0.83 (95% CI, 0.69 to 0.95) for surgery using doc2vec. For esophagus cancer, we achieved an f1-score of 1.0 (95% CI, 1.0 to 1.0) for both chemoradiation and surgery using all combinations of structured and unstructured data. We found that employing the free-text clinical notes outperforms using the billing codes or only structured data for all three cancer types. 
CONCLUSION Our results show that treatment identification using free-text clinical notes greatly improves upon the performance using billing codes and simple structured data. The approach can be used for treatment cohort identification and adapted for longitudinal cancer treatment identification.
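As an illustration of the simplest unstructured representation named above, the sketch below builds bag-of-words count vectors and joins them to structured features. The abstract does not say how the two feature sets were combined, so plain concatenation is an assumption; the tokenizer, function names, example notes, and structured values are all invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplistic choice)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for note in notes for tok in tokenize(note)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(note, vocab):
    """Count how often each vocabulary token occurs in one note."""
    counts = Counter(tokenize(note))
    return [counts.get(tok, 0) for tok in vocab]  # dict keeps index order


def combine(structured, bow):
    """Assumed combination: concatenate structured features with BOW counts."""
    return list(structured) + list(bow)


# Invented example notes, not real patient text.
notes = ["Radiation therapy started", "Radical prostatectomy performed"]
vocab = build_vocabulary(notes)
first = bow_vector(notes[0], vocab)
features = combine([67.0, 1.0], first)  # hypothetical structured values
print(vocab)
print(features)
```

The resulting feature rows could then be passed to any of the classifiers the abstract lists; doc2vec and fasttext would replace `bow_vector` with learned dense embeddings.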


2020 ◽  

The utility of cardiac MRI (CMR) in patients with heart failure has been well demonstrated and continues to expand as MRI techniques evolve. Its main advantages in this patient population include: accurate and reproducible quantification of ventricular systolic function; enhanced discrimination of abnormal myocardial tissue characteristics (i.e., oedema, interstitial fibrosis, and replacement fibrosis); and assessment of valvular function/morphology, endocardium and pericardium in a single scan [1,2]. CMR is now an essential part of the diagnosis of various types of heart failure, including cardiac amyloidosis, cardiac sarcoidosis, myocarditis, arrhythmogenic right ventricular cardiomyopathy, and iron overload cardiomyopathy. CMR findings also have prognostic implications, such as in hypertrophic cardiomyopathy [1,2]. These advantages have resulted in increasing demand for, and use of, CMR in routine clinical practice. However, the synthesis of imaging findings into a final or differential diagnosis is typically written in free text, which makes it difficult for generic query algorithms to categorise cardiomyopathy types accurately. Natural language processing (NLP) is an analytical method that has been used to develop computer-based algorithms that handle and transform natural language so that the information can be used for computation [3]. It enables gathering and combining of information extracted from various online databases, and helps create solid outputs that could serve as research endpoints, including sample identification and variable collection. In the field of imaging, NLP may also have several clinical applications, such as highlighting and classifying imaging findings, and generating follow-up recommendations, imaging protocols, and survival prediction models [4].


2018 ◽  
Author(s):  
Tao Chen ◽  
Mark Dredze ◽  
Jonathan P Weiner ◽  
Leilani Hernandez ◽  
Joe Kimura ◽  
...  

BACKGROUND Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. OBJECTIVE We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. METHODS We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. RESULTS A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. CONCLUSIONS This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. 
This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.
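The gap between the macroaverage and microaverage F1 reported above follows from how each is computed. The sketch below shows both from per-construct counts of true positives, false positives, and false negatives; the construct names and counts are invented and are not the study's data.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts maps a construct name to a (tp, fp, fn) tuple."""
    # Macro: average the per-construct F1 scores, so each construct weighs equally.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent constructs dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)


# Invented counts: one frequent, well-recognized construct and one rare, hard one.
example = {"construct_a": (90, 10, 10), "construct_b": (2, 8, 12)}
macro, micro = macro_micro_f1(example)
print(f"macro F1={macro:.3f}, micro F1={micro:.3f}")
```

When rare constructs are recognized poorly, the macroaverage drops more than the microaverage, which is consistent with the abstract's note that performance varied by construct.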


2020 ◽  
Vol 6 ◽  
pp. 233372142095986
Author(s):  
Maxim Topaz ◽  
Victoria Adams ◽  
Paula Wilson ◽  
Kyungmi Woo ◽  
Miriam Ryvicker

Background: Little is known about symptom documentation related to Alzheimer’s disease and related dementias (ADRD) by home healthcare (HHC) clinicians. Objective: This study: (1) developed a natural language processing (NLP) algorithm that identifies common neuropsychiatric symptoms of ADRD in HHC free-text clinical notes; (2) described symptom clusters and hospitalization or emergency department (ED) visit rates for patients with and without these symptoms. Method: We examined a corpus of approximately 2.6 million free-text notes for 112,237 HHC episodes among 89,459 patients admitted to a non-profit HHC agency for post-acute care with any diagnosis. We used NLP software (NimbleMiner) to construct indicators of six neuropsychiatric symptoms. Structured HHC assessment data were used to identify known ADRD diagnoses and construct measures of hospitalization/ED use during HHC. Results: Neuropsychiatric symptoms were documented for 40% of episodes. Common clusters included impaired memory, anxiety and/or depressed mood. One in three episodes without an ADRD diagnosis had documented symptoms. Hospitalization/ED rates increased with one or more symptoms present. Conclusion: HHC providers should examine episodes with neuropsychiatric symptoms but no ADRD diagnoses to determine whether an ADRD diagnosis was missed or to recommend ADRD evaluation. NLP-generated symptom indicators can help to identify high-risk patients for targeted interventions.

