Named Entity Recognition in Clinical Text Based on Capsule-LSTM for Privacy Protection

Developing a RadLex-based Named Entity Recognition Tool for Mining Textual Radiology Reports (Preprint)

10.2196/preprints.25378 ◽

2020 ◽

Author(s):

Shintaro Tsuji ◽

Andrew Wen ◽

Naoki Takahashi ◽

Hongjian Zhang ◽

Katsuhiko Ogasawara ◽

...

Keyword(s):

Named Entity Recognition ◽

Noun Phrases ◽

General Purpose ◽

Entity Recognition ◽

Free Text ◽

Clinical Text ◽

Named Entity ◽

Radiology Reports ◽

Two Measures ◽

F Measure

BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they recognize depends on dictionary lookup. In particular, recognizing compound terms is complicated because they follow a variety of patterns.
OBJECTIVE The objective of the study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports.
METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. Finally, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR).
RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but capture of Cts was still lacking. The MR showed that 71.9% of stem terms matched those of ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
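
The precision, recall, and F-measure figures in the abstract above can be related with a short sketch. It assumes the F-measure is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
baseline = f_measure(92.1, 19.6)   # cTAKES + RadLex + GPD
with_cted = f_measure(98.1, 51.0)  # pipeline combined with the CtED

print(f"baseline F-measure:  {baseline:.1f}%")
print(f"with CtED F-measure: {with_cted:.1f}%")
```

Because the inputs are already rounded, the recomputed values can differ from the reported 32.2% and 67.1% in the last digit. The calculation also shows why the baseline F-measure stays low despite high precision: the harmonic mean is pulled toward the smaller of the two values, here the recall.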

A comprehensive study of named entity recognition in Chinese clinical text

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2013-002381 ◽

2014 ◽

Vol 21 (5) ◽

pp. 808-814 ◽

Cited By ~ 63

Author(s):

J. Lei ◽

B. Tang ◽

X. Lu ◽

K. Gao ◽

M. Jiang ◽

...

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Clinical Text ◽

Named Entity ◽

Comprehensive Study

Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2015.08.009 ◽

2015 ◽

Vol 58 ◽

pp. S143-S149 ◽

Cited By ~ 23

Author(s):

Jay Urbain

Keyword(s):

Risk Factors ◽

Disease Risk ◽

Named Entity Recognition ◽

Entity Recognition ◽

Clinical Text ◽

Named Entity ◽

Semantic Models ◽

Heart Disease Risk Factors ◽

Distributional Semantic Models ◽

Heart Disease Risk

The Impact of De-identification on Downstream Named Entity Recognition in Clinical Text

10.18653/v1/2020.louhi-1.1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hanna Berg ◽

Aron Henriksson ◽

Hercules Dalianis

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Clinical Text ◽

Named Entity ◽

The Impact

A study of active learning methods for named entity recognition in clinical text

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2015.09.010 ◽

2015 ◽

Vol 58 ◽

pp. 11-18 ◽

Cited By ~ 38

Author(s):

Yukun Chen ◽

Thomas A. Lasko ◽

Qiaozhu Mei ◽

Joshua C. Denny ◽

Hua Xu

Keyword(s):

Active Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Methods ◽

Clinical Text ◽

Named Entity

A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts

Applied Sciences ◽

10.3390/app11188319 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8319

Author(s):

Priyankar Bose ◽

Sriram Srinivasan ◽

William C. Sleeman ◽

Jatinder Palta ◽

Rishabh Kapoor ◽

...

Keyword(s):

Information Extraction ◽

Named Entity Recognition ◽

Entity Recognition ◽

Future Research ◽

Clinical Text ◽

Named Entity ◽

Relationship Extraction ◽

New Information ◽

Comprehensive Survey ◽

Future Research Directions

Significant growth in Electronic Health Records (EHR) over the last decade has provided an abundance of clinical text that is mostly unstructured and untapped. This huge amount of clinical text data has motivated the development of new information extraction and text mining techniques. Named Entity Recognition (NER) and Relationship Extraction (RE) are key components of information extraction tasks in the clinical domain. In this paper, we describe the present status of clinical NER and RE techniques in detail by discussing the NLP models proposed for the two tasks, their performance, and the current challenges. Our comprehensive survey on clinical NER and RE encompasses current challenges, state-of-the-art practices, and future directions in information extraction from clinical text. This is the first attempt to discuss both of these interrelated topics together in the clinical context. We identified many research articles published based on different approaches and looked at applications of these tasks. We also discuss the evaluation metrics that are used in the literature to measure the effectiveness of these two NLP methods, as well as future research directions.

SZTE-NLP: Clinical Text Analysis with Named Entity Recognition

10.3115/v1/s14-2108 ◽

2014 ◽

Author(s):

Melinda Katona ◽

Richárd Farkas

Keyword(s):

Text Analysis ◽

Named Entity Recognition ◽

Entity Recognition ◽

Clinical Text ◽

Named Entity

Combining Contextualized Embeddings and Prior Knowledge for Clinical Named Entity Recognition: Evaluation Study

JMIR Medical Informatics ◽

10.2196/14850 ◽

2019 ◽

Vol 7 (4) ◽

pp. e14850 ◽

Cited By ~ 4

Author(s):

Min Jiang ◽

Todd Sanger ◽

Xiong Liu

Keyword(s):

Deep Learning ◽

Prior Knowledge ◽

Language Processing ◽

Named Entity Recognition ◽

Word Embedding ◽

Training Data ◽

Entity Recognition ◽

Named Entities ◽

Clinical Text ◽

Named Entity

Background Named entity recognition (NER) is a key step in clinical natural language processing (NLP). Traditionally, rule-based systems leverage prior knowledge to define rules to identify named entities. Recently, deep learning–based NER systems have become increasingly popular. Contextualized word embedding, a new type of word representation, has been proposed to dynamically capture word sense using context information and has proven successful in many deep learning–based systems in both the general domain and the medical domain. However, very few studies have investigated the effects of combining multiple contextualized embeddings and prior knowledge on the clinical NER task.
Objective This study aims to improve the performance of NER in clinical text by combining multiple contextual embeddings and prior knowledge.
Methods In this study, we investigate the effects of combining multiple contextualized word embeddings with classic word embedding in deep neural networks to predict named entities in clinical text. We also investigate whether using a semantic lexicon could further improve the performance of the clinical NER system.
Results By combining contextualized embeddings such as ELMo and Flair, our system achieves an F-1 score of 87.30% when trained on only a portion of the 2010 Informatics for Integrating Biology and the Bedside NER task dataset. After incorporating the medical lexicon into the word embedding, the F-1 score further increased to 87.44%. Another finding was that our system could still achieve an F-1 score of 85.36% when the size of the training data was reduced to 40%.
Conclusions Combined contextualized embedding could be beneficial for the clinical NER task. Moreover, the semantic lexicon could be used to further improve the performance of the clinical NER system.
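
The abstract above does not say how the embeddings and the lexicon are combined. One common way to combine several per-token representations is to concatenate them, and the sketch below illustrates that idea only. It is an assumption, not the authors' method: the function name `combine_token_features`, the toy vectors, and the binary lexicon flag are all hypothetical.

```python
from typing import List, Set

Vector = List[float]


def combine_token_features(
    tokens: List[str],
    embedding_sources: List[List[Vector]],
    lexicon: Set[str],
) -> List[Vector]:
    """Concatenate per-token vectors from several sources and append a lexicon flag.

    embedding_sources[s][t] is the vector that source s assigns to token t.
    The final component is 1.0 if the lowercased token is in the lexicon, else 0.0.
    """
    for source in embedding_sources:
        if len(source) != len(tokens):
            raise ValueError("every source must provide one vector per token")
    combined = []
    for t, token in enumerate(tokens):
        vector = [x for source in embedding_sources for x in source[t]]
        vector.append(1.0 if token.lower() in lexicon else 0.0)
        combined.append(vector)
    return combined


# Toy stand-ins for two contextualized embeddings and one classic embedding.
tokens = ["severe", "pneumonia"]
contextual_a = [[0.1, 0.2], [0.3, 0.4]]
contextual_b = [[0.5], [0.6]]
classic = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
lexicon = {"pneumonia"}  # hypothetical medical lexicon

combined = combine_token_features(
    tokens, [contextual_a, contextual_b, classic], lexicon
)
print(len(combined[0]))  # dimensionality: 2 + 1 + 3 embedding values + 1 flag
```

In a real system the combined vectors would feed a sequence tagger, and the dimensionality grows with every source added.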

Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0981-y ◽

2019 ◽

Vol 19 (S7) ◽

Cited By ~ 4

Author(s):

Rebecka Weegar ◽

Alicia Pérez ◽

Arantza Casillas ◽

Maite Oronoz

Keyword(s):

Electronic Health Records ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Methods ◽

Health Records ◽

Clinical Text ◽

Named Entity ◽

Text Corpora ◽

Electronic Health

Background Text mining and natural language processing of clinical text, such as notes from electronic health records, require specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain-specific challenges such as limited access to in-domain tools and data sets.
Methods A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, generated from both in-domain and out-of-domain text corpora, and a number of generation and combination strategies for embeddings have been evaluated in order to investigate different input representations and the influence of domain on the final results.
Results For Spanish, a micro-averaged F1-score of 75.25 was obtained, and for Swedish the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial.
Conclusions A recurrent neural network with in-domain embeddings improved medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text for both languages.
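
The results above are reported as micro-averaged F1-scores. As a minimal sketch of what that averaging means, the code below pools true positives, false positives, and false negatives over all entity types before computing F1, and contrasts the result with a macro average. The entity type names and counts are invented for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(per_type: Dict[str, Counts]) -> float:
    """Pool the counts over all entity types, then compute a single F1."""
    tp = sum(counts[0] for counts in per_type.values())
    fp = sum(counts[1] for counts in per_type.values())
    fn = sum(counts[2] for counts in per_type.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(per_type: Dict[str, Counts]) -> float:
    """Compute F1 per entity type, then take the unweighted mean."""
    scores = [f1_from_counts(*counts) for counts in per_type.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical counts: one frequent, well-recognized type and one rare, hard type.
counts = {"Disease": (80, 20, 20), "Drug": (10, 10, 30)}
print(f"micro F1: {micro_f1(counts):.3f}")
print(f"macro F1: {macro_f1(counts):.3f}")
```

Micro averaging weights every entity mention equally, so frequent entity types dominate the score; macro averaging weights every type equally.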

Evaluation of Named Entity Recognition Algorithms Using Clinical Text Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.20093 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 295

Author(s):

J. Manimaran ◽

T. Velmurugan

Keyword(s):

Medical Information ◽

Knowledge Engineering ◽

Medical Center ◽

Named Entity Recognition ◽

Entity Recognition ◽

Lookup Table ◽

Clinical Text ◽

Patient Reports ◽

Named Entity ◽

Research Areas

Named Entity Recognition (NER) is one of the most important research areas in the medical field. At present, most clinical NER research is based on two approaches: Knowledge Engineering (KE) and Machine Learning (ML). KE uses a word lookup table approach, while ML is a supervised learning approach. The aim of this work is to evaluate a recent algorithm from each of the KE and ML approaches using various clinical text databases. The NOBLE Coder and Clinical Named Entity Recognition (CliNER) algorithms are selected: NOBLE Coder follows the KE approach and CliNER the ML approach. The two algorithms are described, and their performance is compared on three openly available datasets obtained from Medical Information Mart for Intensive Care II (MIMIC II), Pittsburgh Medical Center, and the i2b2 2010 challenge. These datasets include annotated data, which are used to determine the highest sensitivity and specificity of each algorithm. Randomly distributed patient reports were taken as input data for the algorithms. When the algorithms are executed, information is extracted and classified into predefined concept types, for example medical problems, treatments, and tests. The accuracy of both algorithms is calculated using standard measures, and the two algorithms are analyzed based on the results produced. Finally, the better of the two is suggested for use with clinical data.
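
The word lookup table idea behind the KE approach can be illustrated with a greedy longest-match dictionary lookup. This is a generic sketch, not NOBLE Coder's actual algorithm; the function `lookup_entities`, the tiny lexicon, and the example sentence are hypothetical, and the concept types reuse the problem, treatment, and test categories named in the abstract.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, concept type)


def lookup_entities(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 4,
) -> List[Span]:
    """Greedy longest-match lookup of token sequences in a phrase lexicon."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first so that multiword terms
        # win over the shorter terms they contain.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in lexicon:
                match = (i, i + n, lexicon[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched phrase
        else:
            i += 1
    return spans


# Hypothetical lookup table mapping phrases to concept types.
lexicon = {
    ("chest", "pain"): "problem",
    ("pain",): "problem",
    ("chest", "x-ray"): "test",
    ("aspirin",): "treatment",
}
tokens = "Patient reports chest pain ; chest x-ray ordered and aspirin given".split()
print(lookup_entities(tokens, lexicon))
```

A lookup of this kind can only find phrases that are already in the table, which is why dictionary-based systems depend on vocabulary coverage, whereas a supervised ML tagger learns from annotated examples.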
