Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives

Mapping Intimacies ◽

10.21203/rs.2.22697/v1 ◽

2020 ◽

Author(s):

Liliya Akhtyamova ◽

Paloma Martínez ◽

Karin Verspoor ◽

John Cardiff

Keyword(s):

Deep Learning ◽

Clinical Case ◽

Named Entity Recognition ◽

Performance Measure ◽

Spanish Language ◽

Entity Recognition ◽

Superior Performance ◽

Word Embeddings ◽

Named Entity ◽

Radiology Reports

Abstract Background: In the Big Data era there is an increasing need to fully exploit and analyse the huge quantity of available health information. Natural Language Processing (NLP) technologies can help extract relevant information from unstructured data contained in Electronic Health Records (EHR), such as clinical notes, patient discharge summaries and radiology reports. The extracted information could support health-related decision-making processes. Named entity recognition (NER), which detects important concepts in texts (diseases, symptoms, drugs, etc.), is a crucial task in information extraction, especially in languages other than English. In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives. Methods: We explore the use of contextualized word embeddings to enhance named entity recognition in Spanish-language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections derived from open-access Spanish-language medical publications. Results: The NER system integrates in-domain pre-trained Flair and FastText word embeddings, byte-pair encoded embeddings, and bi-LSTM-based character word embeddings. The system yielded the best performance, with an F-score of 90.84%. Error analysis showed that the main source of errors for the best model is newly detected false positive entities, and that in half of those cases the detected entity was longer than the actual one.
Conclusions: Our study shows that a deep-learning-based system using domain-specific contextualized embeddings, stacked with complementary embeddings, outperforms a system that integrates standard, general-domain word embeddings. With this system, we achieve performance competitive with the state of the art.
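
The stacking of complementary embeddings mentioned in this abstract can be illustrated with a minimal sketch. It assumes that stacking means concatenating the per-token vectors produced by each embedding source, which is a common approach but is not spelled out in the abstract. The two embedders below (`make_lookup_embedder`, `char_count_embedder`) and the tiny vocabulary are hypothetical toy stand-ins, not the Flair, FastText, byte-pair or character embeddings used by the authors.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Toy stand-in for a pre-trained word embedding: table lookup, zeros if unknown."""
    def embed(token: str) -> Vector:
        return table.get(token.lower(), [0.0] * dim)
    return embed


def char_count_embedder(token: str) -> Vector:
    """Toy stand-in for a character-level embedding: [token length, vowel count]."""
    vowels = sum(ch in "aeiouáéíóú" for ch in token.lower())
    return [float(len(token)), float(vowels)]


def stack_embeddings(tokens: List[str], embedders: List[Embedder]) -> List[Vector]:
    """Concatenate the output of every embedder into one vector per token."""
    stacked: List[Vector] = []
    for token in tokens:
        vector: Vector = []
        for embed in embedders:
            vector.extend(embed(token))
        stacked.append(vector)
    return stacked


# Hypothetical three-dimensional word vectors, for illustration only.
word_table = {"paracetamol": [0.9, 0.1, 0.3]}
embedders = [make_lookup_embedder(word_table, dim=3), char_count_embedder]
vectors = stack_embeddings(["Paracetamol", "oral"], embedders)
```

Each token ends up with a five-dimensional vector here (three lookup dimensions plus two character features). A token missing from the lookup table still carries signal from the character-level part, which is one reason complementary embeddings are combined.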


A deep learning-based bilingual Hindi and Punjabi named entity recognition system using enhanced word embeddings

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.107601 ◽

2021 ◽

pp. 107601

Author(s):

Archana Goyal ◽

Vishal Gupta ◽

Manish Kumar

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Word Embeddings ◽

Named Entity


Deep learning with word embeddings improves biomedical named entity recognition

Bioinformatics ◽

10.1093/bioinformatics/btx228 ◽

2017 ◽

Vol 33 (14) ◽

pp. i37-i48 ◽

Cited By ~ 155

Author(s):

Maryam Habibi ◽

Leon Weber ◽

Mariana Neves ◽

David Luis Wiegandt ◽

Ulf Leser

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Word Embeddings ◽

Named Entity ◽

Biomedical Named Entity Recognition


Faculty Opinions recommendation of Deep learning with word embeddings improves biomedical named entity recognition.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.730927099.793537967 ◽

2017 ◽

Author(s):

Nigel Collier

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Word Embeddings ◽

Named Entity ◽

Biomedical Named Entity Recognition


Developing a RadLex-based Named Entity Recognition Tool for Mining Textual Radiology Reports (Preprint)

10.2196/preprints.25378 ◽

2020 ◽

Author(s):

Shintaro Tsuji ◽

Andrew Wen ◽

Naoki Takahashi ◽

Hongjian Zhang ◽

Katsuhiko Ogasawara ◽

...

Keyword(s):

Named Entity Recognition ◽

Noun Phrases ◽

General Purpose ◽

Entity Recognition ◽

Free Text ◽

Clinical Text ◽

Named Entity ◽

Radiology Reports ◽

Two Measures ◽

F Measure

BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they can recognize depends on dictionary lookup. The recognition of compound terms is especially complicated because they follow a variety of patterns. OBJECTIVE The objective of the study is to develop and evaluate an NER tool that handles compound terms using RadLex for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. Finally, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but coverage of CTs was still lacking. The MR showed that 71.9% of stem terms matched those of the ontologies, and RadLex improved the MR by about 22% relative to the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
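
To see how the reported precision and recall figures relate to the F-measure, the sketch below recomputes it from the rounded values in the abstract. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract (already rounded to one decimal).
baseline = f_measure(92.1, 19.6)   # cTAKES+RadLex+GPD
with_cted = f_measure(98.1, 51.0)  # pipeline combined with the CtED

print(f"baseline: {baseline:.2f}%  with CtED: {with_cted:.2f}%")
```

Under this assumption the recomputed values are about 32.3% and 67.1%. The second matches the reported figure, and the small gap on the first is plausibly due to rounding of the published precision and recall. The example also shows why a very high precision cannot compensate for low recall: the harmonic mean stays close to the smaller of the two values.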


Named Entity Recognition Method for Fault Knowledge based on Deep Learning

Proceedings of the 4th International Conference on Machine Learning and Soft Computing ◽

10.1145/3380688.3380690 ◽

2020 ◽

Author(s):

Zhicheng Chen ◽

Xiaobao Liu ◽

Yanchao Yin ◽

Hongbiao Lu

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Recognition Method ◽

Named Entity ◽

Knowledge Based


De-identifying Spanish medical texts - named entity recognition applied to radiology reports

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00236-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Irene Pérez-Díez ◽

Raúl Pérez-Moraga ◽

Adolfo López-Cerdán ◽

Jose-Maria Salinas-Serrano ◽

María de la Iglesia-Vayá

Keyword(s):

Electronic Health Records ◽

English Language ◽

Personal Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Medical Texts ◽

Health Records ◽

Named Entity ◽

Radiology Reports ◽

Electronic Health

Abstract Background Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Although several anonymization strategies currently exist for the English language, they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that can be transferred to other languages. Results We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. In addition, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions The proposed strategy, combining named entity recognition with randomization of entities, is suitable for Spanish radiology reports. It does not require a large training corpus, so it could easily be extended to other languages and medical texts, such as electronic health records.
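
The substitution step described in this abstract (replacing each detected entity with a new one from the same category) might look roughly like the sketch below. This is not the authors' algorithm: the function `substitute_entities`, the span format, the category labels and the surrogate lists are all hypothetical, and the entity spans are assumed to come from an upstream NER model and to be non-overlapping.

```python
import random
from typing import Dict, List, Tuple

# (start, end, category) with an exclusive end offset; assumed to be NER output.
Span = Tuple[int, int, str]


def substitute_entities(
    text: str,
    spans: List[Span],
    surrogates: Dict[str, List[str]],
    rng: random.Random,
) -> str:
    """Replace each detected span with a random surrogate of the same category."""
    pieces: List[str] = []
    cursor = 0
    for start, end, category in sorted(spans):  # process spans left to right
        pieces.append(text[cursor:start])       # keep the text between entities
        pieces.append(rng.choice(surrogates[category]))
        cursor = end
    pieces.append(text[cursor:])                # keep the trailing text
    return "".join(pieces)


def find_span(text: str, mention: str, category: str) -> Span:
    """Helper for this example only: locate a mention by exact string search."""
    start = text.index(mention)
    return (start, start + len(mention), category)


# Invented example sentence and surrogate lists, for illustration only.
text = "Paciente Juan Pérez, atendido en Valencia."
spans = [
    find_span(text, "Juan Pérez", "NAME"),
    find_span(text, "Valencia", "LOCATION"),
]
surrogates = {
    "NAME": ["María López", "Carlos Ruiz"],
    "LOCATION": ["Sevilla", "Bilbao"],
}
result = substitute_entities(text, spans, surrogates, random.Random(0))
```

Because the surrogate is drawn from the same category as the original, the output keeps the shape of a real report while the identifying values change. A production system would also need consistent replacement of repeated mentions, which this sketch does not attempt.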


Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, many online platforms now produce massive amounts of textual data. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach to effectively harnessing this unstructured textual data is to transform it into structured text. Hence, this study presents an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, the review focuses mainly on Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, and the latter deals with the problem of extracting relations between sets of entities. This study covers early approaches as well as the developments made to date using machine learning models. The survey finds that deep-learning-based hybrid and joint models currently govern the state of the art. It also observes that annotated benchmark datasets for various textual-data generators, such as Twitter and other social forums, are not available. This scarcity of datasets has resulted in relatively little progress in these domains. Additionally, the majority of state-of-the-art techniques are offline and computationally expensive. Lastly, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the processes at work in deep architectures.


Recurrent Neural Network-Based Model for Named Entity Recognition with Improved Word Embeddings

IETE Journal of Research ◽

10.1080/03772063.2021.2006805 ◽

2021 ◽

pp. 1-7

Author(s):

Archana Goyal ◽

Vishal Gupta ◽

Manish Kumar

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Named Entity Recognition ◽

Entity Recognition ◽

Word Embeddings ◽

Named Entity


An Overview of Technological Revolution in Deep Learning Architectures for Biomedical Named Entity Recognition

10.1109/asiancon51346.2021.9544823 ◽

2021 ◽

Author(s):

T. Mathu ◽

Kumudha Raimond ◽

S. Jeba Priya

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Technological Revolution ◽

Named Entity ◽

Learning Architectures ◽

Biomedical Named Entity Recognition


A multiclass classification method based on deep learning for named entity recognition in electronic medical records

2016 New York Scientific Data Summit (NYSDS) ◽

10.1109/nysds.2016.7747810 ◽

2016 ◽

Cited By ~ 20

Author(s):

Xishuang Dong ◽

Lijun Qian ◽

Yi Guan ◽

Lei Huang ◽

Qiubin Yu ◽

...

Keyword(s):

Deep Learning ◽

Electronic Medical Records ◽

Medical Records ◽

Named Entity Recognition ◽

Multiclass Classification ◽

Entity Recognition ◽

Classification Method ◽

Named Entity
