Natural Language Processing and Information Extraction in Biology

Abstract

Objectives: The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians. Materials and Methods: Eight clinicians contributed to a data set of 85 clinic visit transcripts, and 10 transcripts were randomly selected from this data set as a development set. Our approach uses Apache cTAKES and the Unified Medical Language System (UMLS) controlled vocabulary to generate a list of medication candidates in the transcribed text. It then applies multiple customized filters to exclude common false positives from this list, while adding some common mentions of supplements and immunizations. Results: Sixty-five transcripts with 1121 medication mentions were randomly selected as an evaluation set. Our proposed method achieved an F-score of 85.0% for identifying medication mentions in the evaluation set. This significantly outperformed existing medication information extraction systems designed for medical records, which achieved F-scores ranging from 42.9% to 68.9% on the same set. Discussion: Our medication information extraction approach for primary care visit conversations showed promising results. Compared with existing baseline systems, it extracted about 27% more medication mentions from our evaluation set while eliminating many false positives. We made our approach publicly available on the web as open-source software. Conclusion: Integration of our annotation system with clinical recording applications has the potential to improve patients' understanding and recall of key information from their clinic visits, and, in turn, to positively impact health outcomes.
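
The candidate-then-filter design described above can be sketched in a few lines. This is a minimal illustration only. A tiny hand-written lexicon stands in for Apache cTAKES and the UMLS vocabulary. The lexicon, the false-positive stop-list, the supplement and immunization additions, and the example sentence are all hypothetical. The sketch also shows how precision, recall, and the F-score (their harmonic mean) can be computed over extracted mention spans, assuming exact-span matching.

```python
import re
from typing import List, Set, Tuple

# A (start, end, term) character span for one extracted mention.
Span = Tuple[int, int, str]

# Hypothetical stand-in for cTAKES + UMLS candidate generation: a tiny lexicon.
MEDICATION_LEXICON = {"metformin", "lisinopril", "aspirin", "water", "sugar"}

# Hypothetical filter: lexicon entries that are mostly false positives in conversation.
FALSE_POSITIVES = {"water", "sugar"}

# Hypothetical additions: supplement and immunization phrases the lexicon misses.
EXTRA_TERMS = {"vitamin d", "fish oil", "flu shot"}


def find_terms(text: str, terms: Set[str]) -> List[Span]:
    """Return every whole-word, case-insensitive match of the given terms."""
    spans = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), term))
    return sorted(spans)


def extract_medications(text: str) -> List[Span]:
    # Step 1: generate candidates from the lexicon.
    candidates = find_terms(text, MEDICATION_LEXICON)
    # Step 2: drop candidates that are common false positives.
    kept = [s for s in candidates if s[2] not in FALSE_POSITIVES]
    # Step 3: add supplement and immunization mentions.
    kept += find_terms(text, EXTRA_TERMS)
    return sorted(kept)


def precision_recall_f(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Exact-span matching; the F-score is the harmonic mean of precision and recall."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f_score = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f_score
```

For example, in "I take Metformin, got a flu shot last fall, and drink water with fish oil.", the sketch keeps "metformin", adds "flu shot" and "fish oil", and filters out "water". A production pipeline would take its candidates from real cTAKES annotations rather than a hard-coded set.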

Abstract 165: Automated Stroke-Related Information Extraction From Diagnostic Imaging Reports Using Natural Language Processing

Stroke, 2020, Vol 51 (Suppl_1). DOI: 10.1161/str.51.suppl_1.165

Author(s): Zhongyu Anna Liu, Muhammad Mamdani, Richard Aviv, Chloe Pou-Prom, Amy Yu

Keyword(s): Natural Language Processing, Diagnostic Imaging, Natural Language, Information Extraction, Language Processing, CT Perfusion, Training Sample, Free Text, Validation Set, Proximal Occlusion

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research, but converting large amounts of free text into structured data through manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, in extracting relevant stroke-related attributes from full reports of computed tomography (CT), CT angiography (CTA), and CT perfusion (CTP) studies performed at a tertiary stroke centre. Methods: We manually extracted data from the full reports of 1,320 consecutive CT/CTA/CTP studies performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, and haemorrhage, as well as the laterality of these lesions and ASPECT scores. These data served as the reference standard. Reports were then randomly split into a training set (n=921) and a validation set (n=399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined, developed iteratively on the training set, and then validated on the validation set. Results: The prevalence of anterior proximal occlusion in the dataset was 12.3% (n=86 left, n=72 right, and n=4 bilateral). In the training set, CHARTextract identified this attribute with an overall accuracy of 97.3% (positive predictive value [PPV] 84.1%, negative predictive value [NPV] 99.4%, sensitivity 95.5%, specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3%, NPV 98.5%, sensitivity 90.0%, specificity 96.0%). Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate data collection for stroke research.
We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
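
The abstract describes human-defined rules and reports accuracy, PPV, NPV, sensitivity, and specificity. The sketch below is a hypothetical illustration of that general idea and is not CHARTextract's actual rules or interface. A few hand-written regular expressions flag a proximal anterior occlusion, skip negated sentences, and assign laterality. Two helpers then compute the five reported metrics from a 2×2 confusion matrix. The vessel patterns, negation cues, and toy reports are assumptions. Under this scheme the same metrics would be computed separately on the training and validation sets.

```python
import math
import re
from typing import Dict, List, Optional

# Hypothetical rule components (not CHARTextract's actual rules).
VESSEL = re.compile(r"\b(ICA|M1|MCA)\b", re.IGNORECASE)      # proximal anterior segments
OCCLUDED = re.compile(r"\bocclu(sion|ded)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)
SIDE = re.compile(r"\b(left|right)\b", re.IGNORECASE)


def classify_report(report: str) -> Optional[str]:
    """Return 'left', 'right', 'bilateral', 'unspecified', or None (no occlusion)."""
    positive = False
    sides = set()
    for sentence in re.split(r"(?<=\.)\s+", report):
        # A sentence counts only if it names a vessel, an occlusion, and no negation cue.
        if VESSEL.search(sentence) and OCCLUDED.search(sentence) and not NEGATION.search(sentence):
            positive = True
            sides.update(s.lower() for s in SIDE.findall(sentence))
    if not positive:
        return None
    if sides == {"left", "right"}:
        return "bilateral"
    return sides.pop() if sides else "unspecified"


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (zero denominator) are reported as NaN.
    return num / den if den else math.nan


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


def evaluate(predicted: List[bool], gold: List[bool]) -> Dict[str, float]:
    """Compare rule output with the manually abstracted reference standard."""
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    tn = sum(not p and not g for p, g in pairs)
    fn = sum(not p and g for p, g in pairs)
    return binary_metrics(tp, fp, tn, fn)
```

Sentence-level negation like this is deliberately crude. Real report rules typically need narrower negation scopes and many more vessel synonyms.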

Natural Language Processing-Based Information Extraction and Abstraction for Lease Documents

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing, 2020, pp. 170-187. DOI: 10.4018/978-1-7998-1159-6.ch011

Author(s): Sumathi S., Rajkumar S., Indumathi S.

Keyword(s): Natural Language Processing, Natural Language, Information Extraction, Language Processing, Data Extraction, Easy Access, Property A, Key Events

Lease abstraction is the process of extracting and summarizing key data from a lease document. A lease document for a property contains key business, financial, and legal data about that property. A lease abstract report contains details about the property location and basic lease terms, rent schedules, key events, terms and conditions, parking arrangements, and landlord and tenant obligations. Abstracting a real estate contract into electronic form provides easy access to key data and replaces the tedious process of reading the entire contract every time. Natural language processing can be used to extract data and abstract knowledge from lease documents.
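
Here is a minimal sketch of rule-based lease abstraction, assuming plain-text leases and a few hypothetical field patterns. The field names, regular expressions, and sample clause are illustrative and are not taken from the chapter. The sketch pulls a handful of key data points into a structured record, leaving a field empty (None) when no pattern matches.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for a few lease-abstract fields.
FIELD_PATTERNS = {
    "landlord": re.compile(r'\bbetween\s+(.+?)\s+\(the "Landlord"\)', re.IGNORECASE),
    "tenant": re.compile(r'\band\s+(.+?)\s+\(the "Tenant"\)', re.IGNORECASE),
    "monthly_rent": re.compile(r"monthly rent of\s+\$([\d,]+(?:\.\d{2})?)", re.IGNORECASE),
    "commencement_date": re.compile(r"commenc\w*\s+on\s+(\w+ \d{1,2}, \d{4})", re.IGNORECASE),
    "parking_spaces": re.compile(r"(\d+)\s+parking spaces?", re.IGNORECASE),
}


def abstract_lease(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when the field is absent."""
    record: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(text)
        record[field] = m.group(1).strip() if m else None
    return record


lease = ('This Lease is made between Acme Properties LLC (the "Landlord") and '
         'Jane Doe (the "Tenant"). The term shall commence on March 1, 2021. '
         'Tenant shall pay a monthly rent of $2,500.00 and is allotted 2 parking spaces.')
print(abstract_lease(lease))
```

Hand-written patterns like these break easily when wording changes between leases. That brittleness is the usual motivation for learned extractors such as the neural approaches the book covers.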

Syntactic and semantic information extraction from NPP procedures utilizing natural language processing integrated with rules

Nuclear Engineering and Technology, 2020. DOI: 10.1016/j.net.2020.08.010

Author(s): Yongsun Choi, Minh Duc Nguyen, Thomas N. Kerr

Keyword(s): Natural Language Processing, Natural Language, Information Extraction, Language Processing, Semantic Information

Improving the Efficacy of the Data Entry Process for Clinical Research With a Natural Language Processing–Driven Medical Information Extraction System: Quantitative Field Research

JMIR Medical Informatics, 2019, Vol 7 (3), pp. e13331. DOI: 10.2196/13331. Cited by ~3

Author(s): Jiang Han, Ken Chen, Lei Fang, Shaodian Zhang, Fei Wang, ...

Keyword(s): Natural Language Processing, Natural Language, Clinical Research, Information Extraction, Language Processing, Medical Information, Data Entry, Field Research, Extraction System, Information Extraction System

Building Graph for Events and Time in Natural Language Text

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2020, Vol 9 (3), pp. 581-586. DOI: 10.35940/ijitee.c8419.019320

Keyword(s): Natural Language Processing, Natural Language, Information Extraction, Language Processing, Question Answering, Relation Extraction, Event Extraction, Event Time, Time Graph, Question Answering Systems

Events and time are key concepts in natural language processing, and because many tasks are event-oriented, they have become essential in information extraction. In natural language processing and information extraction or retrieval, events and time support several applications, such as text summarization, document summarization, and question answering systems. In this paper, we present the event-time graph as a new way to represent event- and time-based information extracted from text. In this graph, nodes are events, and edges represent the temporal and co-reference relations between events. Much previous NLP research has focused on individual extraction tasks, often in a domain-specific way. In this work, we extract the relationships between events and time and represent them by constructing an event-time graph. Our system is a three-step process that performs event extraction, time extraction, and relation extraction. Each step performs at a level comparable with the state of the art. We train and evaluate our event extraction model on the MUC corpus annotated with event mentions. Next, we test the time extraction model on several news articles from a Wikipedia corpus. We then represent event-time relations by constructing event-time graphs. Finally, we evaluate the overall quality of the event graphs with evaluation metrics and summarize our observations on the entire work.
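
A small adjacency structure can represent an event-time graph in which events are nodes and temporal or co-reference relations are labeled edges. The following is a sketch under stated assumptions. The relation labels (BEFORE, AFTER, SIMULTANEOUS, COREF), the class and method names, and the example events are hypothetical. The extraction steps that would populate the graph are not shown. Each edge is stored together with its inverse so that either endpoint can be queried.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical relation labels and their inverses; the paper's label set may differ.
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE",
           "SIMULTANEOUS": "SIMULTANEOUS", "COREF": "COREF"}


@dataclass(frozen=True)
class Event:
    eid: str
    trigger: str                 # word or phrase that signals the event
    time: Optional[str] = None   # normalized time expression, if one was extracted


class EventTimeGraph:
    """Nodes are events; labeled edges hold temporal and co-reference relations."""

    def __init__(self) -> None:
        self.events: Dict[str, Event] = {}
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_event(self, event: Event) -> None:
        self.events[event.eid] = event

    def add_relation(self, src: str, label: str, dst: str) -> None:
        if label not in INVERSE:
            raise ValueError(f"unknown relation: {label}")
        if src not in self.events or dst not in self.events:
            raise KeyError("both events must be added before relating them")
        # Store the inverse edge too, so either endpoint can be queried.
        self.edges[src].add((label, dst))
        self.edges[dst].add((INVERSE[label], src))

    def _reachable(self, start: str, label: str) -> Set[str]:
        """Events reachable from start by following edges with the given label."""
        seen, stack = {start}, [start]
        while stack:
            for edge_label, other in self.edges[stack.pop()]:
                if edge_label == label and other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen - {start}

    def coreferent(self, eid: str) -> Set[str]:
        return self._reachable(eid, "COREF")

    def happened_before(self, a: str, b: str) -> bool:
        # Follows BEFORE edges transitively; SIMULTANEOUS edges are not chained here.
        return b in self._reachable(a, "BEFORE")
```

Because inverses are stored, adding "e4 AFTER e2" also makes e4 reachable from e2 via BEFORE, so a chain e1 BEFORE e2, e4 AFTER e2 lets the sketch infer that e1 happened before e4.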

ENCADEAr: Automatic Chaining of News (ENCADEAmento automático de notícias)

Oslo Studies in Language, 2015, Vol 7 (1). DOI: 10.5617/osla.1457

Author(s): Carla Abreu, Jorge Teixeira, Eugénio Oliveira

Keyword(s): Natural Language Processing, Natural Language, Information Extraction, Supervised Learning, Language Processing, Named Entity Recognition, Entity Recognition, Named Entity, Supervised Learning Algorithms, Processing Information

This work aims to define and evaluate different techniques for automatically building temporal news sequences. The proposed approach consists of three steps: (i) near-duplicate document detection; (ii) keyword extraction; (iii) news sequence creation. The approach is based on natural language processing, information extraction, named entity recognition, and supervised learning algorithms. The proposed methodology achieved a precision of 93.1% for news sequence creation.
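
The first and last steps of this pipeline can be illustrated with a minimal sketch. It uses word-shingle Jaccard similarity for near-duplicate detection and a simple keyword-overlap rule for chaining. Both techniques, the thresholds, and the toy articles are hypothetical choices for illustration. They are not necessarily the methods or the supervised models used by ENCADEAr. The keyword extraction step is replaced by a precomputed keyword set on each article.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Article:
    title: str
    published: date
    text: str
    keywords: FrozenSet[str]   # assumed output of a separate keyword-extraction step


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Overlapping k-word windows; short texts yield a single shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: Set, b: Set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(articles: List[Article], threshold: float = 0.8) -> List[Article]:
    """Keep the earliest article of each group of near-duplicates."""
    kept: List[Article] = []
    for art in sorted(articles, key=lambda a: a.published):
        sig = shingles(art.text)
        if all(jaccard(sig, shingles(other.text)) < threshold for other in kept):
            kept.append(art)
    return kept


def build_sequence(seed: Article, articles: List[Article], min_overlap: int = 2) -> List[Article]:
    """Chain later articles sharing at least min_overlap keywords with the seed, in date order."""
    related = [a for a in articles
               if a != seed and a.published >= seed.published
               and len(a.keywords & seed.keywords) >= min_overlap]
    return [seed] + sorted(related, key=lambda a: a.published)
```

In practice a learned similarity or classifier, as the supervised setting described above suggests, would replace the fixed thresholds used here.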
