A Bootstrapping Approach With CRF and Deep Learning Models for Improving the Biomedical Named Entity Recognition in Multi-Domains

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 70308-70318 ◽  
Author(s):  
Juae Kim ◽  
Youngjoong Ko ◽  
Jungyun Seo

2021 ◽  
Vol 22 (S1) ◽  
Author(s):  
Renzo M. Rivera-Zavala ◽  
Paloma Martínez

Abstract Background The volume of biomedical literature and clinical data is growing at an exponential rate. Therefore, efficient access to data described in unstructured biomedical texts is a crucial task for the biomedical industry and research. Named Entity Recognition (NER) is the first step for information and knowledge acquisition when we deal with unstructured texts. Recent NER approaches use contextualized word representations as input for a downstream classification task. However, distributed word vectors (embeddings) are very limited in Spanish, and even more so for the biomedical domain. Methods In this work, we develop several biomedical Spanish word representations, and we introduce two Deep Learning approaches for recognizing pharmaceutical, chemical, and other biomedical entities in Spanish clinical case texts and biomedical texts: one based on a Bi-LSTM-CRF model and the other on a BERT-based architecture. Results Several Spanish biomedical embeddings, together with the two deep learning models, were evaluated on the PharmaCoNER and CORD-19 datasets. The PharmaCoNER dataset is composed of a set of Spanish clinical cases annotated with drugs, chemical compounds, and pharmacological substances; our extended Bi-LSTM-CRF model obtains an F-score of 85.24% on entity identification and classification, and the BERT model obtains an F-score of 88.80%. For the entity normalization task, the extended Bi-LSTM-CRF model achieves an F-score of 72.85% and the BERT model achieves 79.97%. The CORD-19 dataset consists of scholarly articles written in English annotated with biomedical concepts such as disorder, species, chemical or drug, gene and protein, enzyme, and anatomy. The Bi-LSTM-CRF model and the BERT model obtain an F-measure of 78.23% and 78.86%, respectively, on entity identification and classification on the CORD-19 dataset.
Conclusion These results demonstrate that deep learning models with in-domain knowledge learned from large-scale datasets substantially improve named entity recognition performance. Moreover, contextualized representations help to capture the complexity and ambiguity inherent in biomedical texts. Embeddings based on words, concepts, senses, and other units are needed for languages other than English to improve NER in those languages.
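To illustrate the CRF decoding step shared by the Bi-LSTM-CRF models described above, the following is a minimal sketch of Viterbi decoding in plain Python. The tag set, emission scores, and transition scores here are hypothetical stand-ins for what a trained Bi-LSTM encoder and CRF layer would produce, not the paper's parameters.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence and its score.

    emissions:   per-token score lists, shape [seq_len][num_tags]
    transitions: transitions[i][j] = score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Emission for tag j is constant over i, so argmax ignores it.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev]
                               + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    # Backtrack from the best final tag.
    best_tag = max(range(num_tags), key=lambda t: scores[t])
    path = [best_tag]
    for step_back in reversed(backpointers):
        best_tag = step_back[best_tag]
        path.append(best_tag)
    path.reverse()
    return path, max(scores)

# Illustrative example with tags O=0, B-DRUG=1, I-DRUG=2.
transitions = [
    [0, 0, -10],  # from O: I-DRUG cannot directly follow O
    [0, -1, 1],   # from B-DRUG
    [0, 0, 1],    # from I-DRUG
]
emissions = [
    [2, 0, 0],  # "paciente"    -> likely O
    [2, 0, 0],  # "recibió"     -> likely O
    [0, 3, 0],  # "paracetamol" -> likely B-DRUG
]
path, score = viterbi_decode(emissions, transitions)
# path == [0, 0, 1], i.e. ["O", "O", "B-DRUG"]
```

The transition matrix is what lets the CRF layer forbid invalid tag sequences (such as an I- tag without a preceding B- tag), which per-token classification alone cannot enforce.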


2017 ◽  
Vol 33 (14) ◽  
pp. i37-i48 ◽  
Author(s):  
Maryam Habibi ◽  
Leon Weber ◽  
Mariana Neves ◽  
David Luis Wiegandt ◽  
Ulf Leser

2021 ◽  
Vol 336 ◽  
pp. 06021
Author(s):  
Hongshuai Liu ◽  
Ge Jun ◽  
Yuanyuan Zheng

Most deep learning models ignore Chinese linguistic conventions and global document information when processing Chinese tasks. To solve this problem, we constructed the BERT-BiLSTM-Attention-CRF model. In the model, we embedded a BERT pre-trained language model that adopts the Whole Word Masking strategy, and added a document-level attention mechanism. Experimental results show that our method achieves good results on the MSRA corpus, with an F1 of 95.00%.
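The Whole Word Masking idea mentioned above can be sketched as follows. This is a minimal illustration using the WordPiece "##" continuation convention: if any piece of a word is selected for masking, every piece of that word is masked. The tokenizer output and mask rate are illustrative assumptions; Chinese WWM groups characters by word segmentation rather than by "##" prefixes.

```python
import random

def whole_word_mask(tokens, mask_rate=0.3, mask_token="[MASK]", seed=0):
    """Mask whole words in a WordPiece sequence.

    Pieces prefixed with '##' belong to the preceding word, so a word is
    either fully masked or left fully intact.
    """
    rng = random.Random(seed)
    # Group token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_rate:
            for i in word:
                masked[i] = mask_token
    return masked

# Illustrative tokenization (hypothetical, not from the paper's corpus).
tokens = ["trans", "##form", "##er", "models", "mask", "sub", "##words"]
masked = whole_word_mask(tokens, mask_rate=0.5, seed=1)
```

Masking at the word level forces the model to predict a whole word from its context rather than from its own remaining subword pieces, which is the motivation for the WWM pre-training strategy.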

