Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization

2021 ◽ Vol 22 (S1) ◽ Author(s): Renzo M. Rivera-Zavala, Paloma Martínez

Abstract
Background: The volume of biomedical literature and clinical data is growing at an exponential rate. Efficient access to the information described in unstructured biomedical texts is therefore a crucial task for the biomedical industry and for research. Named Entity Recognition (NER) is the first step in information and knowledge acquisition from unstructured texts. Recent NER approaches use contextualized word representations as input to a downstream classification task. However, distributed word vectors (embeddings) are scarce for Spanish, and even more so for the biomedical domain.
Methods: In this work, we develop several biomedical Spanish word representations and introduce two deep learning approaches for recognizing pharmaceutical, chemical, and other biomedical entities in Spanish clinical case texts and biomedical texts: one based on a Bi-LSTM-CRF model and the other on a BERT-based architecture.
Results: Several Spanish biomedical embeddings, together with the two deep learning models, were evaluated on the PharmaCoNER and CORD-19 datasets. The PharmaCoNER dataset is a set of Spanish clinical cases annotated with drugs, chemical compounds, and pharmacological substances; our extended Bi-LSTM-CRF model obtains an F-score of 85.24% on entity identification and classification, and the BERT model obtains an F-score of 88.80%. For the entity normalization task, the extended Bi-LSTM-CRF model achieves an F-score of 72.85% and the BERT model achieves 79.97%. The CORD-19 dataset consists of scholarly articles written in English, annotated with biomedical concepts such as disorders, species, chemicals or drugs, genes and proteins, enzymes, and anatomy. The Bi-LSTM-CRF model and the BERT model obtain F-measures of 78.23% and 78.86%, respectively, on entity identification and classification on the CORD-19 dataset.
Conclusion: These results show that deep learning models with in-domain knowledge learned from large-scale datasets substantially improve named entity recognition performance. Moreover, contextualized representations help capture the complexity and ambiguity inherent in biomedical texts. Embeddings based on words, concepts, senses, and so on, for languages other than English, are required to improve NER in those languages.
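
The Bi-LSTM-CRF pattern the abstract refers to is a standard one; the sketch below is a minimal illustration, not the authors' code, assuming PyTorch and the third-party pytorch-crf package (pip install pytorch-crf), with the embedding layer standing in for the in-domain Spanish vectors the paper develops.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # assumption: pytorch-crf is installed

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        # In practice this would be initialized from pre-trained
        # in-domain Spanish biomedical word vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(hidden_dim, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, tokens, tags=None, mask=None):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        if tags is not None:
            # Training: negative log-likelihood under the CRF.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi-decoded best tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```

The CRF layer on top of the BiLSTM scores whole tag sequences rather than individual tokens, which is what enforces valid BIO transitions in taggers of this family.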

2019 ◽ Vol 20 (S16) ◽ Author(s): Xusheng Li, Chengcheng Fu, Ran Zhong, Duo Zhong, Tingting He, ...

Abstract
Background: Microbes have been shown to play a crucial role in various ecosystems, and many human diseases have been linked to bacteria, so extracting the interactions between bacteria is essential for medical research and application. Many bacterial interactions with supporting experimental evidence have been reported in the biomedical literature; integrating this knowledge into a database or knowledge graph could accelerate the progress of biomedical research. A crucial and necessary step in interaction extraction (IE) is named entity recognition (NER). However, due to the specificity of bacterial naming conventions, bacterial named entity recognition remains challenging.
Results: In this paper, we propose a novel method for bacterial named entity recognition that integrates domain features into a deep learning framework combining a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN). Without domain features, the model achieves an F1-measure of 89.14%; after part-of-speech (POS) features and dictionary features are added, it achieves 89.7%. Hence, our model achieves advanced performance in bacterial NER when domain features are used.
Conclusions: We propose an efficient method for bacterial named entity recognition that combines domain features with deep learning models. Compared with previous methods, our model improves performance while significantly reducing complex manual extraction and feature design.
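
As a rough illustration of how such domain features can enter this kind of model (not the paper's implementation), the sketch below concatenates word embeddings, POS-tag embeddings, a dictionary-match flag, and character-CNN features before a BiLSTM encoder; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FeatureAugmentedEncoder(nn.Module):
    def __init__(self, n_words, n_pos, n_chars, num_tags):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, 100)
        self.pos_emb = nn.Embedding(n_pos, 16)   # POS-tag feature
        self.dict_emb = nn.Embedding(2, 8)       # in-dictionary flag (0/1)
        self.char_emb = nn.Embedding(n_chars, 30)
        self.char_cnn = nn.Conv1d(30, 32, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(100 + 16 + 8 + 32, 128,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(256, num_tags)

    def forward(self, words, pos, in_dict, chars):
        # chars: (batch, seq_len, max_word_len) character ids per word.
        b, s, w = chars.shape
        c = self.char_emb(chars.view(b * s, w)).transpose(1, 2)
        # Max-pool CNN outputs over character positions -> one vector per word.
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), self.pos_emb(pos),
                       self.dict_emb(in_dict), c], dim=-1)
        return self.out(self.lstm(x)[0])  # per-token tag scores
```

Concatenating feature embeddings at the input is what lets the dictionary and POS signals improve the F1-measure without any hand-crafted feature templates downstream.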


2021 ◽ Vol 336 ◽ pp. 06021 ◽ Author(s): Hongshuai Liu, Ge Jun, Yuanyuan Zheng

Nowadays, most deep learning models ignore Chinese linguistic habits and global information when processing Chinese tasks. To address this problem, we constructed a BERT-BiLSTM-Attention-CRF model. In the model, we embedded a BERT pre-trained language model that adopts the Whole Word Mask strategy, and we added a document-level attention mechanism. Experimental results show that our method achieves good results on the MSRA corpus, with an F1 of 95.00%.
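
A hedged sketch of how such a BERT-BiLSTM-Attention-CRF stack can be wired together is shown below; the checkpoint name (hfl/chinese-bert-wwm, one public whole-word-mask Chinese BERT) and all dimensions are assumptions, not details taken from the paper.

```python
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # third-party pytorch-crf package

class BertBiLSTMAttnCRF(nn.Module):
    def __init__(self, num_tags, name="hfl/chinese-bert-wwm"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)
        h = self.bert.config.hidden_size
        self.lstm = nn.LSTM(h, h // 2, batch_first=True, bidirectional=True)
        # Self-attention over the full token sequence, standing in for
        # the paper's document-level attention.
        self.attn = nn.MultiheadAttention(h, num_heads=8, batch_first=True)
        self.emit = nn.Linear(h, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x, _ = self.lstm(x)
        x, _ = self.attn(x, x, x, key_padding_mask=~attention_mask.bool())
        emissions = self.emit(x)
        mask = attention_mask.bool()
        if tags is not None:
            return -self.crf(emissions, tags, mask=mask)  # training loss
        return self.crf.decode(emissions, mask=mask)      # predicted tag paths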


2021 ◽ Vol 22 (S1) ◽ Author(s): Cong Sun, Zhihao Yang, Lei Wang, Yin Zhang, Hongfei Lin, ...

Abstract
Background: The recognition of pharmacological substances, compounds, and proteins is essential for biomedical relation extraction, knowledge graph construction, drug discovery, and medical question answering. Although considerable effort has been devoted to recognizing biomedical entities in English texts, to date only a few limited attempts have been made to recognize them in biomedical texts in other languages. PharmaCoNER is a named entity recognition challenge for recognizing pharmacological entities in Spanish texts. Because abundant resources now exist in the field of natural language processing, how to leverage these resources for the PharmaCoNER challenge is a meaningful question to study.
Methods: Inspired by the success of deep learning with language models, we compare and explore various representative BERT models to advance the PharmaCoNER task.
Results: The experimental results show that deep learning with language models can effectively improve model performance on the PharmaCoNER dataset. Our method achieves state-of-the-art performance on the PharmaCoNER dataset, with a maximum F1-score of 92.01%.
Conclusion: For BERT models on the PharmaCoNER dataset, biomedical domain knowledge has a greater impact on model performance than the native language (i.e., Spanish). The BERT models can obtain competitive performance by using WordPiece tokenization to alleviate the out-of-vocabulary limitation, and performance can be further improved by constructing a vocabulary tailored to domain knowledge. Moreover, character casing also has some impact on model performance.
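
For readers unfamiliar with this setup, the snippet below shows the generic Hugging Face recipe for loading a BERT checkpoint for token classification on PharmaCoNER-style BIO tags; the BETO checkpoint named here is one public Spanish model, not necessarily one the authors evaluated, and the label count is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumption: BETO (Spanish BERT) as the backbone; a biomedical
# checkpoint could be swapped in via the same API.
name = "dccuchile/bert-base-spanish-wwm-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

text = "Se administró paracetamol al paciente."
enc = tokenizer(text, return_tensors="pt")
logits = model(**enc).logits            # (1, seq_len, num_labels)
pred = logits.argmax(-1).squeeze(0)     # one BIO label id per WordPiece token
```

Because WordPiece splits unknown words into known subword units, even rare drug names never fall to a single out-of-vocabulary token, which is the mechanism the conclusion above credits.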


2019 ◽ Author(s): Ginger Tsueng, Max Nanis, Jennifer T. Fouquier, Michael Mayers, Benjamin M. Good, ...

Abstract: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from this large volume of literature, many researchers have turned to information extraction algorithms to harvest information from biomedical texts. Information extraction is usually accomplished through a combination of manual expert curation and computational methods, and advances in computational methods usually depend on gold standards generated by a limited number of expert curators. This process can be time-consuming and represents an area of biomedical research that is ripe for exploration with citizen science. Citizen scientists have previously been shown to be willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but it was uncertain whether the same could be said of relationship extraction, which requires training in identifying named entities as well as a deeper understanding of how different entity types can relate to one another. Here, we used the web-based application Mark2Cure (https://mark2cure.org) to demonstrate that citizen scientists can perform relationship extraction, and we confirm the importance of accurate named entity recognition for this task. We also discuss opportunities for future improvement of this system, as well as potential synergies between citizen science, manual biocuration, and natural language processing.
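
One standard way to make redundant citizen annotations usable, majority voting across contributors, is sketched below; this is a toy illustration, not Mark2Cure's actual aggregation logic, and the labels and threshold are hypothetical.

```python
from collections import Counter

def consensus(annotations, min_votes=3):
    """Aggregate relation labels from different contributors for the
    same (entity_1, entity_2) pair in one abstract; abstain unless a
    label reaches the agreement threshold."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_votes else None

print(consensus(["treats", "treats", "treats", "causes", "no_relation"]))
# -> "treats"
```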


2021 ◽ Vol 54 (1) ◽ pp. 1-39 ◽ Author(s): Zara Nasar, Syed Waqar Jaffry, Muhammad Kamran Malik

With the advent of Web 2.0, many online platforms produce massive amounts of textual data. With this ever-increasing textual data at hand, it is of immense importance to extract information nuggets from it. One approach to harnessing this unstructured text effectively is to transform it into structured text. Hence, this study presents an overview of approaches for extracting key insights from textual data in a structured way, focusing on Named Entity Recognition (NER) and Relation Extraction (RE). The former deals with identifying named entities; the latter with the problem of extracting relations between sets of entities. The study covers early approaches as well as developments made to date using machine learning models. The survey finds that deep-learning-based hybrid and joint models currently define the state of the art. It also observes that annotated benchmark datasets for textual-data sources such as Twitter and other social forums are not available; this scarcity of datasets has led to relatively slow progress in those domains. Additionally, most state-of-the-art techniques are offline and computationally expensive. Finally, with the increasing focus on deep learning frameworks, there is a need to understand and explain the processes at work inside deep architectures.
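
As a concrete example of the unstructured-to-structured transformation this survey covers, the snippet below runs an off-the-shelf NER model over raw text to produce structured (entity, type) records; spaCy and its small English model are used purely for illustration, not as a method from the survey.

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Aspirin was approved by the FDA in 1899.")

# Turn free text into structured (surface form, entity type) records.
records = [(ent.text, ent.label_) for ent in doc.ents]
print(records)  # output depends on the model, e.g. ('FDA', 'ORG'), ('1899', 'DATE')
```

A relation extraction step would then operate over pairs of these records, which is exactly the NER-then-RE pipeline ordering the survey describes.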

