Use of Natural Language Processing to Translate Clinical Information from a Database of 889,921 Chest Radiographic Reports

Radiology ◽  
2002 ◽  
Vol 224 (1) ◽  
pp. 157-163 ◽  
Author(s):  
George Hripcsak ◽  
John H. M. Austin ◽  
Philip O. Alderson ◽  
Carol Friedman
2012 ◽  
Vol 3 (1) ◽  
pp. 23 ◽  
Author(s):  
Kevin S. Hughes ◽  
Julliette M. Buckley ◽  
Suzanne B. Coopey ◽  
John Sharko ◽  
Fernanda Polubriaginof ◽  
...  

2020 ◽  
Vol 59 (S 02) ◽  
pp. e64-e78
Author(s):  
Antje Wulff ◽  
Marcel Mast ◽  
Marcus Hassler ◽  
Sara Montag ◽  
Michael Marschollek ◽  
...  

Abstract Background Merging disparate and heterogeneous datasets from clinical routine into a standardized and semantically enriched format, to enable multiple uses of the data, also means incorporating unstructured data such as medical free texts. Although the extraction of structured data from texts, known as natural language processing (NLP), has been researched extensively, at least for the English language, obtaining structured output in an arbitrary format is not enough. NLP techniques need to be combined with clinical information standards such as openEHR so that still-unstructured data can be reused and exchanged sensibly. Objectives The aim of the study is to automatically extract crucial information from medical free texts and to transform this unstructured clinical data into a standardized and structured representation by designing and implementing an exemplary pipeline for the processing of pediatric medical histories. Methods We constructed a pipeline that allows reusing medical free texts such as pediatric medical histories in a structured and standardized way by (1) selecting and modeling appropriate openEHR archetypes as standard clinical information models, (2) defining a German dictionary with crucial text markers serving as the expert knowledge base for an NLP pipeline, and (3) creating mapping rules between the NLP output and the archetypes. The approach was evaluated in a first pilot study using 50 manually annotated medical histories from the pediatric intensive care unit of the Hannover Medical School. Results We successfully reused 24 existing international archetypes to represent the most crucial elements of unstructured pediatric medical histories in a standardized form. The self-developed NLP pipeline was constructed by defining 3,055 text marker entries, 132 text events, 66 regular expressions, and a text corpus of 776 entries for automatic correction of spelling mistakes.
A total of 123 mapping rules were implemented to transform the extracted snippets into an openEHR-based representation, so that they can be stored together with other structured data in an existing openEHR-based data repository. In the first evaluation, the NLP pipeline yielded 97% precision and 94% recall. Conclusion The use of NLP and openEHR archetypes was demonstrated to be a viable approach for extracting and representing important information from pediatric medical histories in a structured and semantically enriched format. We designed a promising approach with the potential to be generalized, and implemented a prototype that is extensible and reusable for other use cases involving German medical free texts. In the long term, this will harness unstructured clinical data for further research purposes such as the design of clinical decision support systems. Together with structured data already integrated in openEHR-based representations, we aim to develop an interoperable openEHR-based application capable of automatically assessing a patient's risk status based on the medical history at the time of admission.
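The three pipeline ingredients described above (a text-marker dictionary, regular expressions, and mapping rules onto openEHR archetypes) can be sketched as follows. This is a minimal illustration, not the study's implementation: the German markers, the gestational-age pattern, and the archetype paths are invented examples.

```python
import re

# Hypothetical text-marker dictionary: German trigger word -> concept.
MARKERS = {
    "frühgeburt": "premature_birth",
    "beatmung": "mechanical_ventilation",
}

# Invented regular expression for gestational age, e.g. "32+4 SSW".
GESTATION_RE = re.compile(r"(\d{2})\+(\d)\s*ssw", re.IGNORECASE)

def extract(text):
    """Return structured findings extracted from a free-text snippet."""
    found = {}
    lowered = text.lower()
    for marker, concept in MARKERS.items():
        if marker in lowered:
            found[concept] = True
    m = GESTATION_RE.search(lowered)
    if m:
        found["gestational_age_weeks"] = int(m.group(1)) + int(m.group(2)) / 7
    return found

def to_openehr(found):
    """Mapping rules: place extracted values at (invented) archetype paths."""
    mapping = {
        "premature_birth": "openEHR-EHR-EVALUATION.problem_diagnosis.v1/data/items[at0002]",
        "gestational_age_weeks": "openEHR-EHR-OBSERVATION.gestational_age.v0/data/events/value",
    }
    return {path: found[key] for key, path in mapping.items() if key in found}

result = extract("Frühgeburt in der 32+4 SSW, initial Beatmung erforderlich")
mapped = to_openehr(result)
```

In the real pipeline, the extracted snippets would be validated against the archetype definitions before being committed to the openEHR repository.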


2016 ◽  
Vol 25 (01) ◽  
pp. 234-239 ◽  
Author(s):  
P. Zweigenbaum ◽  
A. Névéol ◽  

Summary Objective: To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP). Method: A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers. Results: The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They provide a contribution to the development of methods, resources, applications, and sometimes a combination of these aspects. Conclusions: The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.


2016 ◽  
Vol 8 (1) ◽  
Author(s):  
Dino P. Rumoro ◽  
Gillian S. Gibbs ◽  
Shital C. Shah ◽  
Marilyn M. Hallock ◽  
Gordon M. Trenholme ◽  
...  

Processing free-text clinical information in an electronic medical record may enhance surveillance systems for early identification of influenza-like illness outbreaks. However, processing clinical text using natural language processing (NLP) poses a challenge in preserving the semantics of the original information recorded. In this study, we discuss several NLP and technical issues as well as potential solutions for implementation in syndromic surveillance systems.
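One classic example of the semantic-preservation problem mentioned above is negation: "no fever" must not be counted as fever by a surveillance system. A NegEx-style sketch, with an invented trigger list and window size, looks like this:

```python
# Simplified negation detection: a concept is treated as negated if it
# appears within five tokens after a negation trigger. Trigger words and
# the window size are illustrative choices, not from the article.
NEG_TRIGGERS = {"no", "denies", "without"}

def is_negated(text, concept):
    """Return True if `concept` occurs shortly after a negation trigger."""
    tokens = text.lower().split()
    for i, tok in enumerate(tokens):
        if tok in NEG_TRIGGERS and concept in tokens[i + 1 : i + 6]:
            return True
    return False
```

A production system would also need to handle scope terminators ("but"), pseudo-negations, and hedging, which is part of why clinical NLP for surveillance is hard.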


Author(s):  
Arron S Lacey ◽  
Beata Fonferko-Shadrach ◽  
Ronan A Lyons ◽  
Mike P Kerr ◽  
David V Ford ◽  
...  

ABSTRACT Background: Free text documents in healthcare settings contain a wealth of information not captured in electronic healthcare records (EHRs). Epilepsy clinic letters are an example of an unstructured data source containing a large amount of intricate disease information. Extracting meaningful and contextually correct clinical information from free text sources, to enhance EHRs, remains a significant challenge. SCANR (Swansea University Collaborative in the Analysis of NLP Research) was set up to use natural language processing (NLP) technology to extract structured data from unstructured sources. IBM Watson Content Analytics software (ICA) uses NLP technology. It enables users to define annotations based on dictionaries and language characteristics to create parsing rules that highlight relevant items. These include clinical details such as symptoms and diagnoses, medication and test results, as well as personal identifiers. Approach: To use ICA to build a pipeline to accurately extract detailed epilepsy information from clinic letters. Methods: We used ICA to retrieve important epilepsy information from 41 pseudo-anonymized unstructured epilepsy clinic letters. The 41 letters consisted of 13 ‘new’ and 28 ‘follow-up’ letters (for 15 different patients) written by 12 different doctors in different styles. We designed dictionaries and annotators to enable ICA to extract epilepsy type (focal, generalized, or unclassified), epilepsy cause, age of onset, investigation results (EEG, CT, and MRI), medication, and clinic date. Epilepsy clinicians assessed the accuracy of the pipeline. Results: The accuracy (sensitivity, specificity) of each concept was: epilepsy diagnosis 98% (97%, 100%), focal epilepsy 100%, generalized epilepsy 98% (93%, 100%), medication 95% (93%, 100%), age of onset 100%, and clinic date 95% (95%, 100%).
Precision and recall for each concept were, respectively: 98% and 97% for epilepsy diagnosis, 100% each for focal epilepsy, 100% and 93% for generalized epilepsy, 100% each for age of onset, 100% and 93% for medication, 100% and 96% for EEG results, 100% and 83% for MRI scan results, and 100% and 95% for clinic date. Conclusions: ICA is capable of extracting detailed, structured epilepsy information from unstructured clinic letters to a high degree of accuracy. These data can be used to populate relational databases and be linked to EHRs. Researchers can build in custom rules to identify concepts of interest from letters and produce structured information. We plan to extend our work to hundreds and then thousands of clinic letters, to provide phenotypically rich epilepsy data to link with other anonymised, routinely collected data.
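The evaluation metrics reported above are all derived from a confusion matrix. A minimal sketch, with invented counts purely for illustration:

```python
def metrics(tp, fp, fn, tn):
    """Return precision, recall (sensitivity), and specificity
    from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return precision, recall, specificity

# Example counts (illustrative only, not from the study):
# 38 true positives, 0 false positives, 1 false negative, 2 true negatives.
p, r, s = metrics(38, 0, 1, 2)
```

With zero false positives, precision and specificity are both 100%, which matches the shape of the results above, where precision was consistently higher than recall.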


Author(s):  
Beata Fonferko-Shadrach ◽  
Arron Lacey ◽  
Ashley Akbari ◽  
Simon Thompson ◽  
David Ford ◽  
...  

Introduction: Electronic health records (EHR) are a powerful resource in enabling large-scale healthcare research. EHRs often lack detailed disease-specific information that is collected in free text within clinical settings. This challenge can be addressed by using Natural Language Processing (NLP) to derive and extract detailed clinical information from free text. Objectives and Approach: Using a training sample of 40 letters, we used the General Architecture for Text Engineering (GATE) framework to build custom rule sets for nine categories of epilepsy information as well as clinic date and date of birth. We used a validation set of 200 clinic letters to compare the results of our algorithm to a separate manual review by a clinician, where we evaluated a “per item” and a “per letter” approach for each category. Results: The “per item” approach identified 1,939 items of information with overall precision, recall, and F1-score of 92.7%, 77.7%, and 85.6%. Precision and recall for epilepsy-specific categories were: diagnosis (85.3%, 92.4%), type (93.7%, 83.2%), focal seizure (99.0%, 68.3%), generalised seizure (92.5%, 57.0%), seizure frequency (92.0%, 52.3%), medication (96.1%, 94.0%), CT (66.7%, 47.1%), MRI (96.6%, 51.4%), and EEG (95.8%, 40.6%). By combining all items per category, per letter, we were able to achieve higher precision, recall, and F1-scores of 94.6%, 84.2%, and 89.0% across all categories. Conclusion/Implications: Our results demonstrate that NLP techniques can be used to accurately extract rich phenotypic details from clinic letters that are often missing from routinely collected data. Capturing these new data types provides a platform for conducting novel precision neurology research, in addition to potential applicability to other disease areas.
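The "per letter" aggregation described above can be sketched as collapsing individual extracted mentions to one value set per category per letter before comparison with the clinician's review. Category names and the example data below are invented for illustration; the study's actual rule sets are built in GATE.

```python
from collections import defaultdict

def per_letter(items):
    """Collapse (letter_id, category, value) tuples to one value set
    per (letter, category), so duplicate mentions count once."""
    agg = defaultdict(set)
    for letter_id, category, value in items:
        agg[(letter_id, category)].add(value)
    return agg

# Illustrative extracted mentions from one hypothetical letter:
extracted = [
    (1, "medication", "lamotrigine"),
    (1, "medication", "lamotrigine"),   # duplicate mention collapses
    (1, "seizure_type", "focal"),
]
agg = per_letter(extracted)
```

Aggregating this way is one plausible reason the per-letter scores are higher: a category is scored correct if any of its mentions was extracted, so missed duplicates no longer count as errors.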


2021 ◽  
Vol 39 (3) ◽  
pp. 121-128
Author(s):  
Chulho Kim

Natural language processing (NLP) is a computerized approach to analyzing text that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. In the healthcare field, NLP techniques are applied in a variety of applications, ranging from evaluating the adequacy of treatment and assessing the presence of acute illness to other forms of clinical decision support. After converting text into computer-readable data through a preprocessing step, an NLP system can extract valuable information using rule-based algorithms, machine learning, and neural networks. NLP can be used to distinguish subtypes of stroke or to accurately extract critical clinical information such as stroke severity and patient prognosis. If these NLP methods are actively utilized in the future, they will make the most of electronic health records and enable optimal medical judgment.
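The preprocess-then-extract pattern described above can be sketched with a rule-based example. Extracting an NIHSS score as a proxy for stroke severity is an invented illustration, and the severity cut-offs below are common conventions, not taken from the article.

```python
import re

def preprocess(text):
    """Lowercase and collapse whitespace so extraction rules match reliably."""
    return re.sub(r"\s+", " ", text.lower()).strip()

# Invented rule: find an NIHSS score such as "NIHSS score of 12" or "NIHSS 3".
NIHSS_RE = re.compile(r"nihss\s*(?:score)?\s*(?:of|:)?\s*(\d{1,2})")

def extract_severity(text):
    """Return (score, severity band) or None if no score is documented."""
    m = NIHSS_RE.search(preprocess(text))
    if not m:
        return None
    score = int(m.group(1))
    if score < 5:
        return score, "minor"
    if score < 16:
        return score, "moderate"
    return score, "severe"

extract_severity("Admission NIHSS score of 12; improving.")  # → (12, "moderate")
```

A machine-learning or neural approach would replace the hand-written rule with a model, but the preprocessing step stays largely the same.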


Author(s):  
Jose L. Izquierdo ◽  
Julio Ancochea ◽  
Joan B. Soriano ◽  

ABSTRACT There remain many unknowns regarding the onset and clinical course of the ongoing COVID-19 pandemic. We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyse the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the SESCAM Healthcare Network (Castilla La-Mancha, Spain) for the entire population with available EHRs (1,364,924 patients) from January 1st to March 29th, 2020. We extracted clinical information on diagnosis, progression, and outcome for all COVID-19 cases, focusing on those requiring ICU admission. A total of 10,504 patients with a clinical or PCR-confirmed diagnosis of COVID-19 were identified, 52.5% male, with a mean age of 58.2±19.7 years. Upon admission, the most common symptoms were cough, fever, and dyspnoea, but each was present in less than half of the cases. Overall, 6% of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnoea was the most parsimonious predictor of ICU admission: those younger than 56 years, without tachypnoea, and with temperature <39°C (or >39°C without respiratory crackles) were free of ICU admission. Conversely, COVID-19 patients aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnoea and delayed their visit to the ER after being seen in primary care. Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnoea with/without respiratory crackles) predicts which COVID-19 patients require ICU admission.
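The parsimonious decision rule described above (age, fever, tachypnoea, respiratory crackles) can be written down directly. The thresholds follow the abstract, but this is a hedged illustration of the rule's logic, not the study's actual fitted model.

```python
def predicts_icu_admission(age, temperature_c, tachypnoea, crackles):
    """Return False for the group the abstract describes as
    'free of ICU admission'; True otherwise (flagged at risk)."""
    if age < 56 and not tachypnoea and (temperature_c < 39 or not crackles):
        return False
    return True

# Example: a 45-year-old without tachypnoea and temperature 38 °C
# falls in the low-risk group.
predicts_icu_admission(45, 38.0, tachypnoea=False, crackles=False)  # → False
```

Data-driven algorithms of this kind (e.g. decision trees) are attractive clinically precisely because the resulting rule can be stated and checked in a few lines.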

