Natural language processing for automated annotation of medication mentions in primary care visit conversations

Craig H Ganoe; Weiyi Wu; Paul J Barr; William Haslett; Michelle D Dannenberg; Kyra L Bonasia; James C Finora; Jesse A Schoonmaker; Wambui M Onsando; James Ryan; Glyn Elwyn; Martha L Bruce; Amar K Das; Saeed Hassanpour

doi:10.1093/jamiaopen/ooab071

Natural language processing for automated annotation of medication mentions in primary care visit conversations

JAMIA Open ◽

10.1093/jamiaopen/ooab071 ◽

2021 ◽

Vol 4 (3) ◽

Author(s):

Craig H Ganoe ◽

Weiyi Wu ◽

Paul J Barr ◽

William Haslett ◽

Michelle D Dannenberg ◽

...

Keyword(s):

Primary Care ◽

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Primary Care Visit ◽

Data Set ◽

Test Set ◽

Medication Information ◽

Care Visit

Abstract Objectives The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians. Materials and Methods Eight clinicians contributed to a data set of 85 clinic visit transcripts, and 10 transcripts were randomly selected from this data set as a development set. Our approach utilizes Apache cTAKES and Unified Medical Language System controlled vocabulary to generate a list of medication candidates in the transcribed text and then applies multiple customized filters to exclude common false positives from this list while including some additional common mentions of supplements and immunizations. Results Sixty-five transcripts with 1121 medication mentions were randomly selected as an evaluation set. Our proposed method achieved an F-score of 85.0% for identifying the medication mentions in the test set, significantly outperforming existing medication information extraction systems for medical records with F-scores ranging from 42.9% to 68.9% on the same test set. Discussion Our medication information extraction approach for primary care visit conversations showed promising results, extracting about 27% more medication mentions from our evaluation set while eliminating many false positives in comparison to existing baseline systems. We made our approach publicly available on the web as open-source software. Conclusion Integration of our annotation system with clinical recording applications has the potential to improve patients’ understanding and recall of key information from their clinic visits, and, in turn, to positively impact health outcomes.
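The candidate-then-filter idea and the F-score used to evaluate it can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny `VOCAB` set stands in for cTAKES/UMLS candidate generation, and `FALSE_POSITIVES`, `EXTRA_TERMS`, `find_mentions`, and `f_score` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical stand-ins (assumptions, not from the paper):
# VOCAB mimics a controlled-vocabulary lookup, FALSE_POSITIVES mimics a
# custom filter list, EXTRA_TERMS mimics added supplement/immunization terms.
VOCAB = {"aspirin", "lisinopril", "water"}
FALSE_POSITIVES = {"water"}
EXTRA_TERMS = {"flu shot", "vitamin d"}


def find_mentions(text):
    """Return a set of (start, end, term) spans for candidate mentions."""
    lowered = text.lower()
    # Drop known false positives, then add the supplementary terms.
    terms = (VOCAB - FALSE_POSITIVES) | EXTRA_TERMS
    spans = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            spans.add((match.start(), match.end(), term))
    return spans


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of mentions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch a predicted mention counts as correct only if it matches a gold mention exactly; the study's own matching criteria may differ.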

Natural Language Processing for Automated Annotation of Medication Mentions in Primary Care Visit Conversations

10.1101/2021.03.29.21254488 ◽

2021 ◽

Author(s):

Craig H Ganoe ◽

Weiyi Wu ◽

Paul J Barr ◽

William Haslett ◽

Michelle D Dannenberg ◽

...

Keyword(s):

Primary Care ◽

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

False Positives ◽

Controlled Vocabulary ◽

Primary Care Visit ◽

Medication Information ◽

Care Visit

Abstract Objectives The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians. Materials and Methods Eight clinicians contributed to a dataset of 85 clinic visit transcripts, and ten transcripts were randomly selected from this dataset as a development set. Our approach utilizes Apache cTAKES and Unified Medical Language System (UMLS) controlled vocabulary to generate a list of medication candidates in the transcribed text, and then performs multiple customized filters to exclude common false positives from this list while including some additional common mentions of the supplements and immunizations. Results Sixty-five transcripts with 1,121 medication mentions were randomly selected as an evaluation set. Our proposed method achieved an F-score of 85.0% for identifying the medication mentions in the test set, significantly outperforming existing medication information extraction systems for medical records with F-scores ranging from 42.9% to 68.9%. Discussion Our medication information extraction approach for primary care visit conversations showed promising results, extracting about 27% more medication mentions from our evaluation set while eliminating many false positives in comparison to existing baseline systems. We made our approach publicly available on the web as an open-source software. Conclusion Integration of our annotation system with clinical recording applications has the potential to improve patients’ understanding and recall of key information from their clinic visits, and, in turn, to positively impact health outcomes.

Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis (Preprint)

10.2196/preprints.24020 ◽

2020 ◽

Author(s):

Maciej Rybinski ◽

Xiang Dai ◽

Sonit Singh ◽

Sarvnaz Karimi ◽

Anthony Nguyen

Keyword(s):

Natural Language Processing ◽

Family History ◽

Natural Language ◽

Information Extraction ◽

Learning Strategies ◽

Language Processing ◽

Entity Recognition ◽

Free Text ◽

Shared Task ◽

Data Set

BACKGROUND The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. OBJECTIVE The aim of this study is to develop automated methods that enable access to FH data through natural language processing. METHODS We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. RESULTS Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. CONCLUSIONS Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.

2. Unlocking information in electronic health records using natural language processing: a case study in medication information extraction

Text Mining of Web-based Medical Content ◽

10.1515/9781614513902.33 ◽

2014 ◽

Author(s):

Hua Xu ◽

Joshua C. Denny

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Health Records ◽

Medication Information ◽

Electronic Health

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Proceedings of the Workshop on Balto-Slavonic Natural Language Processing Information Extraction and Enabling Technologies - ACL '07

10.3115/1567545 ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Enabling Technologies ◽

Processing Information

Identifying and intercepting health misinformation on Reddit dermatology forums with artificially intelligent bots using natural language processing (Preprint)

10.2196/preprints.20975 ◽

2021 ◽

Author(s):

Monique B. Sager ◽

Aditya M. Kashyap ◽

Mila Tamminga ◽

Sadhana Ravoori ◽

Christopher Callison-Burch ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

The United States ◽

Test Accuracy ◽

Limited Data ◽

Test Environment ◽

Data Set ◽

Inappropriate Care ◽

Processing Techniques

BACKGROUND Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to inappropriate care. Initial testing has shown that artificially intelligent bots can detect misinformation on Reddit forums and may be able to produce responses to posts containing misinformation. OBJECTIVE To analyze the ability of bots to find and respond to health misinformation on Reddit’s dermatology forums in a controlled test environment. METHODS Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post pre-fabricated responses. By evaluating different model architectures across a held-out test set, we compared performances. RESULTS Our models yielded data test accuracies ranging from 95%-100%, with a BERT fine-tuned model resulting in the highest level of test accuracy. Bots were then able to post corrective pre-fabricated responses to misinformation. CONCLUSIONS Using a limited data set, bots had near-perfect ability to detect these examples of health misinformation within Reddit dermatology forums. Given that these bots can then post pre-fabricated responses, this technique may allow for interception of misinformation. Providing correct information, even instantly, however, does not mean users will be receptive or find such interventions persuasive. Further work should investigate this strategy’s effectiveness to inform future deployment of bots as a technique in combating health misinformation. CLINICALTRIAL N/A

Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study

JMIR Medical Informatics ◽

10.2196/27386 ◽

2021 ◽

Vol 9 (12) ◽

pp. e27386

Author(s):

Qingyu Chen ◽

Alex Rankine ◽

Yifan Peng ◽

Elaheh Aghaarabi ◽

Zhiyong Lu

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Mean Squared Error ◽

Pearson Correlation ◽

Data Set ◽

Squared Error ◽

Real Time Applications ◽

Effectiveness And Efficiency ◽

Pearson Correlations

Background Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 in an official test set during 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and achieved a second rank. Objective Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications. Methods We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Results Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). 
BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when they have different negation terms or word orders. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications. Conclusions Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.
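The two headline measures named in this abstract, Pearson correlation and mean squared error, can be computed as in the following minimal sketch. The function names `pearson` and `mean_squared_error` are illustrative assumptions; the sketch uses only the Python standard library and does not reproduce the study's evaluation code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of squared deviations and of cross products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def mean_squared_error(predicted, gold):
    """Average squared difference between predicted and gold scores."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty lists")
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)
```

A model can score a high Pearson correlation overall while still having a large squared error on one band of similarity scores, which is why the abstract reports both.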

Abstract 165: Automated Stroke-Related Information Extraction From Diagnostic Imaging Reports Using Natural Language Processing

Stroke ◽

10.1161/str.51.suppl_1.165 ◽

2020 ◽

Vol 51 (Suppl_1) ◽

Author(s):

Zhongyu Anna Liu ◽

Muhammad Mamdani ◽

Richard Aviv ◽

Chloe Pou-Prom ◽

Amy Yu

Keyword(s):

Natural Language Processing ◽

Diagnostic Imaging ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Ct Perfusion ◽

Training Sample ◽

Free Text ◽

Validation Set ◽

Proximal Occlusion

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research but converting a large amount of free-text data into structured data with manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, to extract relevant stroke-related attributes from full reports of computed tomograms (CT), CT angiograms (CTA), and CT perfusion (CTP) performed at a tertiary stroke centre. Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, haemorrhage, the laterality of these lesions, and ASPECT scores, all of which were used as a reference standard. Reports were then randomly split into a training set (n= 921) and validation set (n= 399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined and created through an iterative process in the training sample and then validated in the validation set. Results: The prevalence of anterior proximal occlusion was 12.3% in the dataset (n=86 left, n=72 right, and n=4 bilateral). In the training sample, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1% and NPV 99.4%, sensitivity 95.5% and specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3% and NPV 98.5%, sensitivity 90.0% and specificity 96.0%). Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate the process of data collection for stroke research. 
We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
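The accuracy measures reported above (overall accuracy, PPV, NPV, sensitivity, specificity) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; `diagnostic_metrics` is a hypothetical helper name, and the counts in the example are invented for illustration rather than taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive and one
    negative case, and at least one positive and one negative prediction).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts chosen for illustration only (not the study's data).
example = diagnostic_metrics(tp=9, fp=1, tn=8, fn=2)
```

When the attribute is rare, as with a finding present in a small share of reports, NPV tends to be high and PPV lower even for a sensitive and specific extractor, which matches the pattern in the reported figures.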

Natural Language Processing-Based Information Extraction and Abstraction for Lease Documents

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing ◽

10.4018/978-1-7998-1159-6.ch011 ◽

2020 ◽

pp. 170-187

Author(s):

Sumathi S. ◽

Rajkumar S. ◽

Indumathi S.

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Data Extraction ◽

Easy Access ◽

Property A ◽

Key Events

Lease abstraction is the process of summarizing the key data in a lease document. The lease document for a property contains key business, financial, and legal data about that property. A lease abstract report contains details concerning the property location and basic lease details, price schedules, key events, terms and conditions, parking arrangements, and landlord and tenant obligations. Abstracting a real estate contract into electronic form gives easy access to key data and replaces the tedious process of reading the whole contract every time. Natural language processing may be used for data extraction and abstraction of knowledge from lease documents.

Syntactic and semantic information extraction from NPP procedures utilizing natural language processing integrated with rules

Nuclear Engineering and Technology ◽

10.1016/j.net.2020.08.010 ◽

2020 ◽

Author(s):

Yongsun Choi ◽

Minh Duc Nguyen ◽

Thomas N. Kerr

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Semantic Information
