Natural language processing pipeline for temporal information extraction and classification from free text eligibility criteria

Author(s):  
Gayathri Parthasarathy ◽  
Aspen Olmsted ◽  
Paul Anderson
Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Zhongyu Anna Liu ◽  
Muhammad Mamdani ◽  
Richard Aviv ◽  
Chloe Pou-Prom ◽  
Amy Yu

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research, but converting a large amount of free-text data into structured data with manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, in extracting relevant stroke-related attributes from full reports of computed tomography (CT), CT angiography (CTA), and CT perfusion (CTP) performed at a tertiary stroke centre.

Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP studies performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, haemorrhage, the laterality of these lesions, and ASPECT scores, all of which served as the reference standard. Reports were then randomly split into a training set (n=921) and a validation set (n=399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined, developed through an iterative process on the training sample, and then validated on the validation set.

Results: The prevalence of anterior proximal occlusion was 12.3% in the dataset (n=86 left, n=72 right, and n=4 bilateral). In the training sample, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1%, NPV 99.4%, sensitivity 95.5%, specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3%, NPV 98.5%, sensitivity 90.0%, specificity 96.0%).

Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate the process of data collection for stroke research. We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
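As an illustration of the kind of human-defined, rule-based extraction step this abstract describes, the sketch below detects mentions of anterior proximal occlusion and their laterality in a radiology report. The patterns and negation rule are hypothetical examples, not the published CHARTextract rules.

```python
import re

# Illustrative patterns only; a real pipeline iterates on these against a
# training sample, as described in the abstract.
OCCLUSION_PATTERN = re.compile(
    r"\b(left|right|bilateral)?\s*(m1|proximal mca|middle cerebral artery)\b"
    r".*?\bocclusion\b",
    re.IGNORECASE,
)
NEGATION_PATTERN = re.compile(r"\bno (evidence of )?occlusion\b", re.IGNORECASE)

def extract_occlusion(report_text):
    """Return (present, laterality) for anterior proximal occlusion."""
    for sentence in report_text.split("."):
        if NEGATION_PATTERN.search(sentence):
            continue  # skip explicitly negated sentences
        m = OCCLUSION_PATTERN.search(sentence)
        if m:
            laterality = (m.group(1) or "unspecified").lower()
            return True, laterality
    return False, None
```

In practice such rules are refined iteratively: false positives and false negatives found in the training sample drive new patterns and negation cues before the rule set is frozen for validation.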


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 159110-159119
Author(s):  
Honglei Liu ◽  
Yan Xu ◽  
Zhiqiang Zhang ◽  
Ni Wang ◽  
Yanqun Huang ◽  
...  

2021 ◽  
Author(s):  
Raquel A. J. Gritz ◽  
Rafael S. Pereira ◽  
Henrique Matheus F. da Silva ◽  
Henrique G. Zatti ◽  
Laura E. A. Viana ◽  
...  

The COVID-19 pandemic has infected more than 100 million people across the world and has been ongoing for over a year. Patient visits have produced a huge amount of textual data in the form of electronic medical records. Extracting this information may be very useful in better understanding the COVID-19 disease. However, challenges exist in interpreting medical records typed as free text, as doctors may use different terms when recording their observations. To address this, we created an ontology in Portuguese to describe the terms used in COVID-19 medical records in Brazil. In this paper, we present a brief overview of the ontology and how we are using it as the first step of a more complex NLP task.
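A typical first NLP step that an ontology enables is normalizing the varied free-text terms doctors use to a single canonical concept. The sketch below illustrates this with a toy synonym table; the Portuguese terms and concept names are illustrative, not taken from the actual ontology described above.

```python
# Hypothetical synonym-to-concept table: several surface forms map to one
# canonical ontology concept.
SYNONYMS = {
    "falta de ar": "dyspnea",   # "shortness of breath"
    "dispneia": "dyspnea",
    "febre": "fever",
    "tosse seca": "dry cough",
}

def normalize_terms(note_text):
    """Map free-text terms found in a note to canonical ontology concepts."""
    text = note_text.lower()
    return sorted({concept for term, concept in SYNONYMS.items() if term in text})
```

Once mentions are normalized this way, downstream tasks (cohort selection, symptom counting) can operate on concepts rather than raw strings.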


2020 ◽  
Author(s):  
Maciej Rybinski ◽  
Xiang Dai ◽  
Sonit Singh ◽  
Sarvnaz Karimi ◽  
Anthony Nguyen

BACKGROUND: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes.

OBJECTIVE: The aim of this study is to develop automated methods that enable access to FH data through natural language processing.

METHODS: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.

RESULTS: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%.

CONCLUSIONS: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
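The F1 scores quoted above are computed over extracted entity mentions. A minimal sketch of that kind of entity-level scoring is shown below; the exact N2C2 scoring script may differ in span-matching details, and the tuple layout here is an assumption for illustration.

```python
def entity_f1(gold, predicted):
    """Micro-averaged F1 over sets of (doc_id, start, end, label) entity
    spans, in the spirit of shared-task extraction scoring."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```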


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. In this sense, we present our work related to the application of Natural Language Processing techniques as a tool to analyze the sentiment perception of users who answered two questions from the CSQ-8 questionnaire with raw Spanish free text. Their responses relate to mindfulness, a novel technique used to control stress and anxiety caused by different factors in daily life. We proposed an online course where this method was applied in order to improve the quality of life of health care professionals during the COVID-19 pandemic. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning to classify the inputs into three categories: negative, neutral, and positive. Due to the limited amount of data available (86 records for the first question and 68 for the second), transfer learning techniques were required. The length of the text had no limit from the user's standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using a graphical text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that applying NLP techniques with transfer learning to small amounts of data can obtain sufficient accuracy in the sentiment analysis and text classification stages.
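To make the three-way classification task concrete, here is a toy lexicon-based baseline for negative/neutral/positive labels. The study itself used swivel embeddings, neural networks, and transfer learning rather than a lexicon, and the Spanish word lists below are hypothetical.

```python
# Illustrative sentiment lexicons; a real system would learn these signals
# from data rather than enumerate them by hand.
POSITIVE = {"bueno", "excelente", "útil", "gracias"}
NEGATIVE = {"malo", "aburrido", "difícil", "confuso"}

def classify_sentiment(response):
    """Three-way label from a simple lexicon vote."""
    tokens = response.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With only 86 and 68 labeled responses, a learned model would overfit quickly, which is why the study leaned on transfer learning: pretrained embeddings supply the lexical knowledge this toy baseline hard-codes.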


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.

Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.

Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.

Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.
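Naïve Bayes, one of the best-performing classifiers across the reviewed studies, is compact enough to sketch from scratch. The sketch below is a minimal multinomial Naïve Bayes with Laplace smoothing, trained on invented toy feedback comments (not data from any reviewed study).

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes for short feedback comments."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for word in doc.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, doc):
        def log_prob(label):
            total = sum(self.word_counts[label].values())
            lp = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for word in doc.lower().split():
                # Laplace (add-one) smoothing over the shared vocabulary
                lp += math.log(
                    (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                )
            return lp
        return max(self.class_counts, key=log_prob)
```

For solicited survey comments with known labels, this supervised setup applies directly; for unsolicited social-media feedback, the review notes that unsupervised approaches were preferred.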


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Javad Razjouyan ◽  
Jennifer Freytag ◽  
Edward Odom ◽  
Lilian Dindo ◽  
Aanand Naik

Abstract Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with the priorities of older adults with multiple chronic conditions. Social workers (SWs), after online training, document PPC in the patient's electronic health record (EHR). Our goal is to identify free-text notes with PPC language using a natural language processing (NLP) model and to measure PPC adoption and its effect on long-term services and support (LTSS) use. Free-text notes from the EHR produced by trained SWs passed through a hybrid NLP model that utilized rule-based and statistical machine learning. NLP accuracy was validated against chart review. Patients who received PPC were propensity matched with patients not receiving PPC (control) on age, gender, BMI, Charlson comorbidity index, facility and SW. The change in LTSS utilization over 6-month intervals was compared between groups with univariate analysis. Chart review indicated that 491 notes out of 689 had PPC language, and the NLP model achieved a precision of 0.85, a recall of 0.90, an F1 of 0.87, and an accuracy of 0.91. Within-group analysis showed that the intervention group used LTSS 1.8 times more in the 6 months after the encounter compared with the 6 months prior. Between-group analysis showed that the intervention group had significantly higher LTSS utilization (p=0.012). An automated NLP model can be used to reliably measure the adoption of PPC by SWs. PPC seems to encourage use of LTSS, which may delay time to long-term care placement.
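A hybrid model of the kind described typically layers a high-precision rule pass over a statistical fallback. The sketch below shows that control flow; the PPC phrase list, the threshold, and the pluggable `statistical_model` are hypothetical stand-ins, not the validated model from the study.

```python
# Hypothetical phrases that would trigger the rule-based layer.
PPC_PHRASES = ["patient priorities", "what matters most", "aligned care goal"]

def has_ppc_language(note, statistical_model=None, threshold=0.5):
    """Rule-based layer first; statistical classifier as fallback."""
    text = note.lower()
    if any(phrase in text for phrase in PPC_PHRASES):
        return True
    if statistical_model is not None:
        # statistical_model: callable returning a probability in [0, 1]
        return statistical_model(text) >= threshold
    return False
```

The appeal of the hybrid design is that unambiguous phrasing is caught deterministically, while the statistical layer handles paraphrases the rules miss; accuracy is then validated against chart review, as in the abstract.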


2021 ◽  
pp. 1063293X2098297
Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Mats Jirstrand ◽  
Johan Malmqvist

Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research affirms that product developers find that current IT systems lack capabilities to accurately retrieve relevant documents containing unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to effectively search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied managed to find relevant document clusters for the queries tested.
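Finding contextually related documents usually starts from vectorizing the free text, for example with TF-IDF, and comparing vectors by cosine similarity; clustering algorithms then group the resulting vectors. The sketch below shows the retrieval half with invented toy change-request texts; the paper's actual vectorization and clustering choices are not reproduced here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Plain TF-IDF: term frequency scaled by inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_related(query, docs):
    """Index of the document most similar to a free-text query."""
    vectors = tfidf_vectors(docs + [query])
    qvec, dvecs = vectors[-1], vectors[:-1]
    return max(range(len(docs)), key=lambda i: cosine(qvec, dvecs[i]))
```

The same vectors feed straight into a clustering algorithm (e.g. k-means over the TF-IDF space) to produce the labeled groups of related change requests the paper targets.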

