A Comprehensive Typing System for Information Extraction from Clinical Narratives

2019 ◽  
Author(s):  
J. Harry Caufield ◽  
Yichao Zhou ◽  
Yunsheng Bai ◽  
David A. Liem ◽  
Anders O. Garlid ◽  
...  

Abstract: We have developed ACROBAT (Annotation for Case Reports using Open Biomedical Annotation Terms), a typing system for detailed information extraction from clinical text. This resource supports detailed identification and categorization of entities, events, and relations within clinical text documents, including clinical case reports (CCRs) and the free-text components of electronic health records. Using ACROBAT and the text of 200 CCRs, we annotated a wide variety of real-world clinical disease presentations. The resulting dataset, MACCROBAT2018, is a rich collection of annotated clinical language appropriate for training biomedical natural language processing systems.

2020 ◽  
Vol 23 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Nemanja Vaci ◽  
Qiang Liu ◽  
Andrey Kormilitzin ◽  
Franco De Crescenzo ◽  
Ayse Kurtulmus ◽  
...  

Background: Utilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and, therefore, requires time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to automatically extract clinically relevant information. Objective: Our aim is to use NLP to capture real-world data on individuals with depression from the Clinical Record Interactive Search (CRIS) clinical text to foster the use of electronic healthcare data in mental health research. Methods: We used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures. Findings: Results show a high degree of accuracy in the extraction of drug-related information. In contrast, a much lower degree of accuracy is demonstrated in relation to auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably. Conclusions: This study illustrates the feasibility of using natural language processing models and proposes a research pipeline to be used for accurately extracting information from electronic health records. Clinical implications: Real-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.
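
The abstract above does not say which active learning strategy was used. As a minimal sketch of one common option, least-confidence uncertainty sampling, the following Python picks the unlabelled examples that a model is least sure about so that annotators label those first. The function name `least_confident` and the probabilities are hypothetical and serve only as an illustration.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the examples whose top class probability is lowest.

    `probabilities` is a list of per-example class probability lists,
    as a statistical model might produce for an unlabelled pool.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical model outputs for five unlabelled sentences (two classes).
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.90, 0.10],
]

# The two sentences the model is least certain about go to the annotators.
to_annotate = least_confident(pool_probs, batch_size=2)
print(to_annotate)
```

In a full loop, the newly labelled examples would be added to the training corpus and the model retrained before the next round of selection.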


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Zhongyu Anna Liu ◽  
Muhammad Mamdani ◽  
Richard Aviv ◽  
Chloe Pou-Prom ◽  
Amy Yu

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research but converting a large amount of free-text data into structured data with manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, to extract relevant stroke-related attributes from full reports of computed tomograms (CT), CT angiograms (CTA), and CT perfusion (CTP) performed at a tertiary stroke centre. Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, haemorrhage, the laterality of these lesions, and ASPECT scores, all of which were used as a reference standard. Reports were then randomly split into a training set (n= 921) and validation set (n= 399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined and created through an iterative process in the training sample and then validated in the validation set. Results: The prevalence of anterior proximal occlusion was 12.3% in the dataset (n=86 left, n=72 right, and n=4 bilateral). In the training sample, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1% and NPV 99.4%, sensitivity 95.5% and specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3% and NPV 98.5%, sensitivity 90.0% and specificity 96.0%). Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate the process of data collection for stroke research. 
We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
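
The five measures reported above (accuracy, PPV, NPV, sensitivity, specificity) all derive from one 2x2 confusion matrix. The Python sketch below shows the arithmetic. The counts are invented for illustration and are not the study's data; the `Confusion` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing tool output with a manual reference standard."""
    tp: int  # attribute present, tool says present
    fp: int  # attribute absent, tool says present
    fn: int  # attribute present, tool says absent
    tn: int  # attribute absent, tool says absent

    def metrics(self):
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
        }


# Invented counts: 1,000 reports with a 10% prevalence of the attribute.
example = Confusion(tp=90, fp=30, fn=10, tn=870)
m = example.metrics()
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With a low-prevalence attribute, PPV can sit well below sensitivity and specificity, because even a small false-positive rate among the many negative reports yields a noticeable number of false positives.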


2021 ◽  
Vol 89 (9) ◽  
pp. S155
Author(s):  
Nicolas Nunez ◽  
Joanna M. Biernacka ◽  
Manuel Gardea-Resendez ◽  
Bhavani Singh Agnikula Kshatriya ◽  
Euijung Ryu ◽  
...  

2021 ◽  
Author(s):  
Marika Cusick ◽  
Sumithra Velupillai ◽  
Johnny Downs ◽  
Thomas Campion ◽  
Rina Dutta ◽  
...  

Abstract: In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. In this study, we developed a process to share NLP approaches that were individually developed at King’s College London (KCL), UK and Weill Cornell Medicine (WCM), US, two academic medical centers based in different countries with vastly different healthcare systems. After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68, WCM approach: F1-score 0.87 vs. 0.72). Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.


2019 ◽  
Vol 28 (8) ◽  
pp. 1143-1151 ◽  
Author(s):  
Brian Hazlehurst ◽  
Carla A. Green ◽  
Nancy A. Perrin ◽  
John Brandes ◽  
David S. Carrell ◽  
...  

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e14062-e14062
Author(s):  
Meng Ma ◽  
Arielle Redfern ◽  
Xiang Zhou ◽  
Dan Li ◽  
Ying Ru ◽  
...  

Background: Real-world evidence generated from electronic health records (EHRs) is playing an increasing role in health care decisions. It has been recognized as an essential element to assess cancer outcomes in real-world settings. Automatically abstracting outcomes from notes is becoming a fundamental challenge in medical informatics. In this study, we aim to develop a system to automatically abstract outcomes (Progression, Response, Stable Disease) from notes in lung cancer. Methods: A lung cancer cohort (n = 5,003) was obtained from the Mount Sinai Data Warehouse. The progress, pathology and radiology notes of patients were used. We integrated various techniques of Natural Language Processing (NLP) and Artificial Intelligence (AI) and developed a system to automatically abstract outcomes. The corresponding images, biopsies and lines of treatment (LOTs) were abstracted as attributes of outcomes. This system includes four information models: 1. Customized NLP annotator model: preprocessor, section detector, sentence splitter, named entity recognition, relation detector; CRF and LSTM methods were applied to recognize entities and relations. 2. Clinical Outcome container model: biopsy evidence extractor, lines of treatment detector, image evidence extractor, clinical outcome event recognizer, date detector, and temporal reasoning; domain-specific rules were crafted to automatically infer outcomes. 3. Document Summarizer. 4. Longitudinal Outcome Summarizer. Results: To evaluate the outcomes abstracted, we curated a subset (n = 792) from the patient cohort for which LOTs were available. About 61% of the outcomes identified were supported by radiologic images (time window = ±14 days) or biopsy pathology results (time window = ±100 days). In 91% (720/792) of patients, Progression was abstracted within a time window of 90 days prior to first-line treatment. 
Also, 72% of the Progression events identified were accompanied by a downstream event (e.g., treatment change or death). We randomly selected 250 outcomes for manual curation, and 197 outcomes were assessed to be correct (precision = 79%). Moreover, our automated abstraction system improved human abstractor efficiency to curate outcomes, reducing curation time per patient by 90%. Conclusions: We have demonstrated the feasibility and effectiveness of NLP and AI approaches to abstract outcomes from lung cancer EHR data. It promises to automatically abstract outcomes and other clinical entities from notes across all cancers.
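
The time-window checks and the reported percentages above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' system: the function `has_supporting_evidence` and all dates are hypothetical, while the counts 197/250 and 720/792 come from the abstract.

```python
from datetime import date


def has_supporting_evidence(outcome_date, evidence_dates, window_days):
    """True if any evidence date lies within +/- window_days of the outcome."""
    return any(
        abs((evidence - outcome_date).days) <= window_days
        for evidence in evidence_dates
    )


# Hypothetical dates for one abstracted Progression event.
progression = date(2019, 3, 10)
imaging_dates = [date(2019, 2, 20), date(2019, 3, 20)]

# Radiologic images use a +/-14 day window in the abstract.
supported = has_supporting_evidence(progression, imaging_dates, window_days=14)
print(supported)

# Reported figures, recomputed from the counts given in the abstract.
precision = 197 / 250       # manually curated outcomes judged correct
patients_share = 720 / 792  # patients with Progression before first-line treatment
print(f"precision = {precision:.1%}, patients = {patients_share:.1%}")
```

The second imaging date falls 10 days after the outcome, so it counts as support under the 14-day window; the first, 18 days earlier, would not on its own.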


2019 ◽  
Vol 26 (4) ◽  
pp. 364-379 ◽  
Author(s):  
Theresa A Koleck ◽  
Caitlin Dreisbach ◽  
Philip E Bourne ◽  
Suzanne Bakken

Abstract. Objective: Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. Materials and Methods: Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. Results: Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. Discussion: NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. Conclusion: Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Martijn G. Kersloot ◽  
Florentien J. P. van Putten ◽  
Ameen Abu-Hanna ◽  
Ronald Cornet ◽  
Derk L. Arts

Abstract. Background: Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. Methods: Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies’ objectives were categorized by way of induction. These results were used to define recommendations. Results: Two thousand three hundred fifty-five unique studies were identified. Two hundred fifty-six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. 
A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. Conclusion: We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
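
The validation percentages in the conclusion can be checked against the counts in the results. The short Python sketch below assumes that the 77 studies describing development and evaluation are the "included studies" denominator. The abstract does not state this explicitly, but the reported 88% is consistent with it.

```python
# Counts taken from the abstract; the choice of denominator is an assumption.
included_studies = 77        # described development and evaluation
no_validation = 22           # no validation on unseen data
no_external_validation = 68  # no external validation
claimed_generalizable = 23
tested_externally = 5

share_no_validation = no_validation / included_studies
share_no_external = no_external_validation / included_studies
share_claims_tested = tested_externally / claimed_generalizable

print(f"no validation on unseen data: {share_no_validation:.1%}")
print(f"no external validation:       {share_no_external:.1%}")
print(f"generalizability claims tested externally: {share_claims_tested:.1%}")
```

Under that assumption, 22 of 77 is a little over one-fourth and 68 of 77 rounds to 88%, matching the wording of the conclusion.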


BMJ Open ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. e023232 ◽  
Author(s):  
Beata Fonferko-Shadrach ◽  
Arron S Lacey ◽  
Angus Roberts ◽  
Ashley Akbari ◽  
Simon Thompson ◽  
...  

Objective: Routinely collected healthcare data are a powerful research resource but often lack detailed disease-specific information that is collected in clinical free text, for example, clinic letters. We aim to use natural language processing techniques to extract detailed clinical information from epilepsy clinic letters to enrich routinely collected data. Design: We used the general architecture for text engineering (GATE) framework to build an information extraction system, ExECT (extraction of epilepsy clinical text), combining rule-based and statistical techniques. We extracted nine categories of epilepsy information in addition to clinic date and date of birth across 200 clinic letters. We compared the results of our algorithm with a manual review of the letters by an epilepsy clinician. Setting: De-identified and pseudonymised epilepsy clinic letters from a Health Board serving half a million residents in Wales, UK. Results: We identified 1925 items of information with overall precision, recall and F1 score of 91.4%, 81.4% and 86.1%, respectively. Precision and recall for epilepsy-specific categories were: epilepsy diagnosis (88.1%, 89.0%), epilepsy type (89.8%, 79.8%), focal seizures (96.2%, 69.7%), generalised seizures (88.8%, 52.3%), seizure frequency (86.3%, 53.6%), medication (96.1%, 94.0%), CT (55.6%, 58.8%), MRI (82.4%, 68.8%) and electroencephalogram (81.5%, 75.3%). Conclusions: We have built an automated clinical text extraction system that can accurately extract epilepsy information from free text in clinic letters. This can enhance routinely collected data for research in the UK. The information extracted with ExECT, such as epilepsy type, seizure frequency and neurological investigations, is often missing from routinely collected data. We propose that our algorithm can bridge this data gap, enabling further epilepsy research opportunities. 
While many of the rules in our pipeline were tailored to extract epilepsy specific information, our methods can be applied to other diseases and also can be used in clinical practice to record patient information in a structured manner.
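
The overall F1 score reported above follows from the reported precision and recall, since F1 is their harmonic mean. A minimal Python sketch of that calculation is given below. The per-category F1 values it prints are derived here for illustration and are not reported in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract.
overall = f1_score(0.914, 0.814)
print(f"overall F1 = {overall:.1%}")

# A few (precision, recall) pairs from the abstract. The F1 values
# computed from them are derived here, not taken from the source.
categories = {
    "epilepsy diagnosis": (0.881, 0.890),
    "generalised seizures": (0.888, 0.523),
    "medication": (0.961, 0.940),
}
for name, (p, r) in categories.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a category such as generalised seizures, with high precision but low recall, ends up with a much lower F1 than its precision alone would suggest.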

