Knowledge Distillation with Teacher Multi-task Model for Biomedical Named Entity Recognition

Improving Biomedical Named Entity Recognition with Label Re-correction and Knowledge Distillation

10.21203/rs.3.rs-125685/v1 ◽

2020 ◽

Author(s):

Huiwei Zhou ◽

Zhe Liu ◽

Chengkun Lang ◽

Yingyu Lin ◽

Junjie Hou

Keyword(s):

High Performance ◽

Named Entity Recognition ◽

Knowledge Bases ◽

Entity Recognition ◽

High Quality ◽

Named Entity ◽

Iterative Correction ◽

Comparison Results ◽

Knowledge Distillation ◽

Biomedical Named Entity Recognition

Abstract Background: Biomedical named entity recognition is one of the most essential tasks in biomedical information extraction. Previous studies suffer from inadequate annotated datasets, in particular the limited knowledge they contain. Methods: To remedy this issue, we propose a novel Chemical and Disease Named Entity Recognition (CDNER) framework with label re-correction and knowledge distillation strategies, which can not only create large, high-quality datasets but also yield a high-performance entity recognition model. Our framework is inspired by two points: 1) named entity recognition should be considered from the perspective of both coverage and accuracy; 2) trustworthy annotations should be obtained by iterative correction. First, for coverage, we annotate chemical and disease entities in a large unlabeled dataset with PubTator to generate a weakly labeled dataset. For accuracy, we then filter it using multiple knowledge bases to generate another dataset. Next, the two datasets are revised by a label re-correction strategy to construct two high-quality datasets, which are used to train two CDNER models, respectively. Finally, we compress the knowledge in the two models into a single model with knowledge distillation. Results: Experiments on the BioCreative V chemical-disease relation corpus show that knowledge from large datasets significantly improves CDNER performance, leading to new state-of-the-art results. Conclusions: We propose a framework with label re-correction and knowledge distillation strategies. Comparison results show that the two perspectives of knowledge captured by the two re-corrected datasets are complementary and both effective for biomedical named entity recognition.

Improving the recall of biomedical named entity recognition with label re-correction and knowledge distillation

BMC Bioinformatics ◽

10.1186/s12859-021-04200-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Huiwei Zhou ◽

Zhe Liu ◽

Chengkun Lang ◽

Yibin Xu ◽

Yingyu Lin ◽

...

Keyword(s):

Large Scale ◽

Named Entity Recognition ◽

Knowledge Bases ◽

Entity Recognition ◽

High Quality ◽

Recognition Model ◽

Named Entity ◽

Comparison Results ◽

Knowledge Distillation ◽

Biomedical Named Entity Recognition

Abstract Background Biomedical named entity recognition is one of the most essential tasks in biomedical information extraction. Previous studies suffer from inadequate annotated datasets, in particular the limited knowledge they contain. Methods To remedy this issue, we propose a novel Biomedical Named Entity Recognition (BioNER) framework with label re-correction and knowledge distillation strategies, which can not only create large, high-quality datasets but also yield a high-performance recognition model. Our framework is inspired by two points: (1) named entity recognition should be considered from the perspective of both coverage and accuracy; (2) trustworthy annotations should be obtained by iterative correction. First, for coverage, we annotate chemical and disease entities in a large-scale unlabeled dataset with PubTator to generate a weakly labeled dataset. For accuracy, we then filter it using multiple knowledge bases to generate another weakly labeled dataset. Next, the two datasets are revised by a label re-correction strategy to construct two high-quality datasets, which are used to train two recognition models, respectively. Finally, we compress the knowledge in the two models into a single recognition model with knowledge distillation. Results Experiments on the BioCreative V chemical-disease relation corpus and the NCBI Disease corpus show that knowledge from large-scale datasets significantly improves the performance of BioNER, especially its recall, leading to new state-of-the-art results. Conclusions We propose a framework with label re-correction and knowledge distillation strategies. Comparison results show that the two perspectives of knowledge captured by the two re-corrected datasets are complementary and both effective for BioNER.
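
The final step of the abstract above, compressing two trained recognition models into a single one, can be illustrated with a generic per-token distillation loss. This is only a minimal sketch, not the authors' implementation: the abstract does not say how the two teachers are combined, so averaging their temperature-softened distributions is an assumption, as are the function names, the three-tag label set, and the `temperature` and `alpha` parameters.

```python
import math

# Hypothetical tag set for illustration only; not taken from the paper.
TAGS = ["O", "B-Chemical", "B-Disease"]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one token's tag logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_logits_list, temperature):
    """Assumed combination rule: mean of the teachers' softened distributions."""
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    count = len(dists)
    return [sum(d[i] for d in dists) / count for i in range(len(dists[0]))]


def distillation_loss(student_logits, teacher_logits_list, gold_index,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target and hard-label cross-entropy for one token."""
    soft_target = average_teachers(teacher_logits_list, temperature)
    student_soft = softmax(student_logits, temperature)
    # Cross-entropy between the averaged teacher distribution and the student.
    soft_ce = -sum(t * math.log(s) for t, s in zip(soft_target, student_soft))
    # Standard cross-entropy against the gold tag at temperature 1.
    hard_ce = -math.log(softmax(student_logits)[gold_index])
    return alpha * soft_ce + (1.0 - alpha) * hard_ce


if __name__ == "__main__":
    teachers = [[0.2, 3.0, 0.1], [0.1, 5.0, 0.3]]  # two teachers, one token
    student = [0.0, 4.0, 0.0]
    gold = TAGS.index("B-Chemical")
    print(distillation_loss(student, teachers, gold))
```

A higher `temperature` flattens the teacher distributions so the student also learns from the relative scores of non-predicted tags, while `alpha` trades off imitation of the teachers against fitting the gold labels.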

Biomedical Named Entity Recognition with Tri-Training Learning

2009 2nd International Conference on Biomedical Engineering and Informatics ◽

10.1109/bmei.2009.5304799 ◽

2009 ◽

Cited By ~ 3

Author(s):

YueHong Cai ◽

XianYi Cheng

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Biomedical Named Entity Recognition

An Overview of Technological Revolution in Deep Learning Architectures for Biomedical Named Entity Recognition

10.1109/asiancon51346.2021.9544823 ◽

2021 ◽

Author(s):

T. Mathu ◽

Kumudha Raimond ◽

S. Jeba Priya

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Technological Revolution ◽

Named Entity ◽

Learning Architectures ◽

Biomedical Named Entity Recognition

CRFs based parallel biomedical named entity recognition algorithm employing MapReduce framework

Cluster Computing ◽

10.1007/s10586-015-0426-z ◽

2015 ◽

Vol 18 (2) ◽

pp. 493-505 ◽

Cited By ~ 18

Author(s):

Zhuo Tang ◽

Lingang Jiang ◽

Li Yang ◽

Kenli Li ◽

Keqin Li

Keyword(s):

Named Entity Recognition ◽

Recognition Algorithm ◽

Entity Recognition ◽

Mapreduce Framework ◽

Named Entity ◽

Biomedical Named Entity Recognition

D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information

Bioinformatics ◽

10.1093/bioinformatics/bty356 ◽

2018 ◽

Vol 34 (20) ◽

pp. 3539-3546 ◽

Cited By ~ 25

Author(s):

Thanh Hai Dang ◽

Hoang-Quynh Le ◽

Trang M Nguyen ◽

Sinh T Vu

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Linguistic Information ◽

Named Entity ◽

Biomedical Named Entity Recognition

NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding

npj Systems Biology and Applications ◽

10.1038/s41540-021-00200-x ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Kanix Wang ◽

Robert Stevens ◽

Halima Alachram ◽

Yu Li ◽

Larisa Soldatova ◽

...

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Analysis Tool ◽

Automated Extraction ◽

Named Entities ◽

Named Entity ◽

Automated Knowledge ◽

Biomedical Texts ◽

Machine Reading ◽

Biomedical Named Entity Recognition

Abstract Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

10.21203/rs.3.rs-90025/v1 ◽

2020 ◽

Author(s):

Usman Naseem ◽

Matloob Khushi ◽

Vinay Reddy ◽

Sakthivel Rajendran ◽

Imran Razzak ◽

...

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Future Research ◽

Named Entity ◽

Domain Specific ◽

Context Dependent ◽

Biomedical Named Entity Recognition

Abstract Background: In recent years, with the growing number of biomedical documents and advances in natural language processing algorithms, research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging because NER in the biomedical domain: (i) is often restricted by the limited amount of training data; (ii) involves entities that can refer to multiple types and concepts depending on context; and (iii) relies heavily on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) - bioALBERT - an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence, to better learn context-dependent representations, and incorporated parameter reduction strategies to minimise memory usage and reduce training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. Performance increased for: (i) disease type corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chem type corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein type corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species type corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), leading to state-of-the-art results. Conclusions: The performance of the proposed model on four different biomedical entity types shows that our model is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT models, which are available to the research community for use in future research.

A Federated Adversarial Learning Method for Biomedical Named Entity Recognition

10.1109/bibm52615.2021.9669728 ◽

2021 ◽

Author(s):

Hanyu Zhao ◽

Sha Yuan ◽

Niantao Xie ◽

Jiahong Leng ◽

Guoqiang Wang

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Learning Method ◽

Adversarial Learning ◽

Named Entity ◽

Biomedical Named Entity Recognition

A Literature Survey on Biomedical Named Entity Recognition

Advances in Power Systems and Energy Management - Lecture Notes in Electrical Engineering ◽

10.1007/978-981-15-7504-4_12 ◽

2021 ◽

pp. 109-119

Author(s):

Saurabh Suman ◽

Adyasha Dash ◽

Siddharth Swarup Rautaray

Keyword(s):

Named Entity Recognition ◽

Literature Survey ◽

Entity Recognition ◽

Named Entity ◽

Biomedical Named Entity Recognition
