Drug Knowledge Extraction Framework with Entity Pair Calibration for Chinese Drug Instructions

Author(s):  
Xiaoliang Zhang ◽  
Lunsheng Zhou ◽  
Feng Gao ◽  
Zhongmin Wang ◽  
Yongqing Wang ◽  
...  

Abstract Existing pharmaceutical information extraction research often focuses on standalone entity or relation identification tasks over drug instructions, and a holistic solution for drug knowledge extraction is lacking. Moreover, current methods perform poorly in extracting fine-grained interaction relations from drug instructions. To solve these problems, this paper proposes an information extraction framework for drug instructions. The framework uses deep learning models built on fine-tuned pre-trained models for entity recognition and relation extraction. In addition, it incorporates a novel entity pair calibration process to improve the performance of fine-grained relation extraction. The framework is evaluated on more than 60k Chinese drug description sentences from 4000 drug instructions. Empirical results show that the framework can successfully identify drug-related entities (F1 >= 0.95) and their relations (F1 >= 0.83) in this realistic dataset, and that entity pair calibration plays an important role (~5% F1 score improvement) in extracting fine-grained relations.

2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Dongfang Li ◽  
Ying Xiong ◽  
Baotian Hu ◽  
Buzhou Tang ◽  
Weihua Peng ◽  
...  

Abstract Background Drug repurposing aims to find new indications for approved drugs and is essential for investigating new uses of approved or investigational drugs. The active gene annotation corpus (AGAC), annotated by human experts, was developed to support knowledge discovery for drug repurposing. The AGAC track of the BioNLP Open Shared Tasks, which uses this corpus, was organized at EMNLP-BioNLP 2019; its “selective annotation” attribute makes the AGAC track more challenging than traditional sequence labeling tasks. In this work, we present our methods for trigger word detection (Task 1) and thematic role identification (Task 2) in the AGAC track. As a step forward in drug repurposing research, our work can also be applied to large-scale automatic extraction of knowledge from medical text. Methods To meet the challenges of the two tasks, we treat Task 1 as a medical named entity recognition (NER) task that targets molecular phenomena related to gene mutation, and we treat Task 2 as a relation extraction task that captures the thematic roles between entities. We exploit pre-trained biomedical language representation models (e.g., BioBERT) in the information extraction pipeline for collecting mutation-disease knowledge from PubMed. Moreover, we design the fine-tuning framework using a multi-task learning technique and extra features. We further investigate different approaches to consolidate and transfer knowledge from varying sources and illustrate the performance of our model on the AGAC corpus. Our approach is based on fine-tuned BERT, BioBERT, NCBI BERT, and ClinicalBERT with multi-task learning. Further experiments show the effectiveness of knowledge transformation and of the ensemble integration of the models for the two tasks. We conduct a performance comparison of various algorithms, and we conduct an ablation study on the development set of Task 1 to examine the effectiveness of each component of our method. Results Compared with competing methods, our model obtained the highest Precision (0.63), Recall (0.56), and F-score (0.60) in Task 1, ranking first. It outperformed the baseline method provided by the organizers by 0.10 in F-score. The model shared the same encoding layers for the named entity recognition and relation extraction parts. We also obtained the second-highest F-score (0.25) in Task 2 with a simple but effective framework. Conclusions Experimental results on the benchmark corpus of annotated genes with active mutation-centric function changes show that integrating pre-trained biomedical language representation models (i.e., BERT, NCBI BERT, ClinicalBERT, BioBERT) into an information extraction pipeline with multi-task learning can improve the ability to collect mutation-disease knowledge from PubMed.


2015 ◽  
Vol 8 (2) ◽  
pp. 1-15 ◽  
Author(s):  
Aicha Ghoulam ◽  
Fatiha Barigou ◽  
Ghalem Belalem

Information Extraction (IE) is a natural language processing (NLP) task whose aim is to analyse texts written in natural language to extract structured and useful information such as named entities and semantic relations between them. Information extraction is an important task in a diverse set of applications like bio-medical literature mining, customer care, community websites, personal information management and so on. In this paper, the authors focus only on information extraction from clinical reports. The two most fundamental tasks in information extraction are discussed; namely, named entity recognition task and relation extraction task. The authors give details about the most used rule/pattern-based and machine learning techniques for each task. They also make comparisons between these techniques and summarize the advantages and disadvantages of each one.


Author(s):  
Kecheng Zhan ◽  
Weihua Peng ◽  
Ying Xiong ◽  
Huhao Fu ◽  
Qingcai Chen ◽  
...  

BACKGROUND Family history (FH) information, including family members, their side of family, their living status, observations about them, etc., plays a significant role in disease diagnosis and treatment. Family member information extraction aims to extract FH information from semi-structured/unstructured text in electronic health records (EHRs). It is a challenging task involving named entity recognition (NER) and relation extraction (RE), where the named entities are family members, living status and observations, and the relations hold between family members and living status and between family members and observations. OBJECTIVE This study aims to explore ways to effectively extract family history information from clinical text. METHODS Inspired by dependency parsing, we design a novel graph-based schema to represent FH information and introduce deep biaffine attention to extract FH information from clinical text. In the deep biaffine attention model, we use CNN-BiLSTM (Convolutional Neural Network-Bidirectional Long Short-Term Memory network) and BERT (Bidirectional Encoder Representations from Transformers) to encode input sentences, and deploy a biaffine classifier to extract FH information. In addition, we develop a post-processing module to adjust the results. A system based on the proposed method was developed for the 2019 n2c2/OHNLP shared task track on FH information extraction, which includes two subtasks, on entity recognition and relation extraction respectively. RESULTS We conduct experiments on the corpus provided by the 2019 n2c2/OHNLP shared task track on FH information extraction. Our system achieved the highest F1-scores, 0.8823 on subtask 1 and 0.7048 on subtask 2, setting new benchmark results on the 2019 n2c2/OHNLP corpus. CONCLUSIONS This study designed a novel graph-based schema to represent FH information and applied deep biaffine attention to extract it. Experimental results show the effectiveness of deep biaffine attention for FH information extraction.
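
As a rough illustration of the biaffine scoring idea mentioned in this abstract, the sketch below scores every ordered pair of token vectors with a bilinear term plus a linear term. It is a minimal sketch of generic biaffine scoring, not the authors' system: the function names, the toy vectors, and the parameters `U`, `w`, and `b` are all hypothetical, and a real model would obtain the vectors from the CNN-BiLSTM or BERT encoder and learn the parameters during training.

```python
def biaffine_score(head, dep, U, w, b):
    """Score one (head, dependent) pair: head^T U dep + w . [head; dep] + b."""
    # Bilinear term: interaction between every head and dependent dimension.
    bilinear = sum(
        head[i] * U[i][j] * dep[j]
        for i in range(len(head))
        for j in range(len(dep))
    )
    # Linear term over the concatenation of the two vectors.
    linear = sum(wk * xk for wk, xk in zip(w, head + dep))
    return bilinear + linear + b


def score_all_pairs(vectors, U, w, b):
    """Return an n x n matrix; entry [i][j] scores token i as head of token j."""
    n = len(vectors)
    return [
        [biaffine_score(vectors[i], vectors[j], U, w, b) for j in range(n)]
        for i in range(n)
    ]


# Toy, hand-picked values (assumptions for illustration only).
U = [[0.0, 2.0], [0.0, 0.0]]
w = [1.0, 1.0, 1.0, 1.0]
b = 0.5
tokens = [[1.0, 0.0], [0.0, 1.0]]
scores = score_all_pairs(tokens, U, w, b)
```

In a graph-based schema of the kind the abstract describes, such a score matrix would typically be computed once per relation label, and an edge is predicted wherever the score for a pair is highest or exceeds a threshold.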


Author(s):  
Hao Fei ◽  
Yafeng Ren ◽  
Yue Zhang ◽  
Donghong Ji ◽  
Xiaohui Liang

Abstract Biomedical information extraction (BioIE) is an important task. The aim is to analyze biomedical texts and extract structured information such as named entities and semantic relations between them. In recent years, pre-trained language models have largely improved the performance of BioIE. However, they neglect to incorporate external structural knowledge, which can provide rich factual information to support the underlying understanding and reasoning for biomedical information extraction. In this paper, we first evaluate current extraction methods, including vanilla neural networks, general language models and pre-trained contextualized language models, on biomedical information extraction tasks, including named entity recognition, relation extraction and event extraction. We then propose to enrich a contextualized language model by integrating large-scale biomedical knowledge graphs (namely, BioKGLM). To encode knowledge effectively, we explore a three-stage training procedure and introduce different fusion strategies to facilitate knowledge injection. Experimental results on multiple tasks show that BioKGLM consistently outperforms state-of-the-art extraction models. A further analysis proves that BioKGLM can capture the underlying relations between biomedical knowledge concepts, which are crucial for BioIE.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xintong Zhao ◽  
Jane Greenberg ◽  
Vanessa Meschke ◽  
Eric Toberer ◽  
Xiaohua Hu

Purpose The output of academic literature has increased significantly due to digital technology, presenting researchers with a challenge across every discipline, including materials science, as it is impossible to manually read and extract knowledge from millions of published articles. The purpose of this study is to address this challenge by exploring knowledge extraction in materials science, as applied to digital scholarship. An overriding goal is to help inform readers about the status of knowledge extraction in materials science. Design/methodology/approach The authors conducted a two-part analysis: a comparison of knowledge extraction methods applied to materials science scholarship across a sample of 22 articles, followed by a comparison of HIVE-4-MAT, an ontology-based knowledge extraction application, and MatScholar, a named entity recognition (NER) application. This paper covers contextual background and a review of three tiers of knowledge extraction (ontology-based, NER and relation extraction), followed by the research goals and approach. Findings The results indicate three key needs for researchers to consider for advancing knowledge extraction: the need for materials science focused corpora; the need for researchers to define the scope of the research being pursued; and the need to understand the tradeoffs among different knowledge extraction methods. This paper also points to future materials science research potential with relation extraction and increased availability of ontologies. Originality/value To the best of the authors’ knowledge, there are very few studies examining knowledge extraction in materials science. This work makes an important contribution to this underexplored research area.


2020 ◽  
Vol 34 (05) ◽  
pp. 9225-9232
Author(s):  
Wenya Wang ◽  
Sinno Jialin Pan

Information extraction (IE) aims to produce structured information from an input text, e.g., Named Entity Recognition and Relation Extraction. Various attempts have been made at IE via feature engineering or deep learning. However, most of them fail to model the complex relationships inherent in the task itself, which have proven to be especially crucial. For example, the relation between two entities is highly dependent on their entity types. These dependencies can be regarded as complex constraints that can be efficiently expressed as logical rules. To combine such logic reasoning capabilities with the learning capabilities of deep neural networks, we propose to integrate logical knowledge in the form of first-order logic into a deep learning system, which can be trained jointly in an end-to-end manner. The integrated framework is able to enhance neural outputs with knowledge regularization via logic rules, and at the same time update the weights of the logic rules to comply with the characteristics of the training data. We demonstrate the effectiveness and generalization of the proposed model on multiple IE tasks.
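
The dependency between a relation and the types of its two entities can be illustrated with a small rule table. The sketch below is a deliberately simplified stand-in, not the model described in the abstract: it applies hard type constraints as a post-hoc filter on relation scores, whereas the abstract describes weighted first-order logic rules trained jointly with the network. The relation names, entity types, scores, and threshold are hypothetical.

```python
# Hypothetical rule table: relation -> (required head type, required tail type).
# Read each entry as a rule such as: Live_In(x, y) => PER(x) and LOC(y).
TYPE_RULES = {
    "Live_In": ("PER", "LOC"),
    "Work_For": ("PER", "ORG"),
    "Located_In": ("LOC", "LOC"),
}


def apply_type_rules(relation_scores, head_type, tail_type, rules=TYPE_RULES):
    """Zero out the score of any relation whose type rule the entity pair violates."""
    filtered = {}
    for relation, score in relation_scores.items():
        required = rules.get(relation)
        allowed = required is None or required == (head_type, tail_type)
        filtered[relation] = score if allowed else 0.0
    return filtered


def best_relation(relation_scores, head_type, tail_type, threshold=0.5):
    """Return the highest-scoring rule-compatible relation, or None if too weak."""
    filtered = apply_type_rules(relation_scores, head_type, tail_type)
    relation, score = max(filtered.items(), key=lambda item: item[1])
    return relation if score >= threshold else None


# Made-up classifier scores for one entity pair.
scores = {"Live_In": 0.7, "Work_For": 0.6, "Located_In": 0.1}
```

With these scores, a (PER, ORG) pair resolves to `Work_For` even though `Live_In` has the higher raw score, because the rule table rules out `Live_In` for that pair of types.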


Author(s):  
Shan Zhao ◽  
Minghao Hu ◽  
Zhiping Cai ◽  
Fang Liu

Joint extraction of entities and their relations benefits from the close interaction between named entities and their relation information. Therefore, how to effectively model such cross-modal interactions is critical for the final performance. Previous works have used simple methods such as label-feature concatenation to perform coarse-grained semantic fusion among cross-modal instances, but fail to capture fine-grained correlations over token and label spaces, resulting in insufficient interactions. In this paper, we propose a deep Cross-Modal Attention Network (CMAN) for joint entity and relation extraction. The network is carefully constructed by stacking multiple attention units in depth to fully model dense interactions over token-label spaces, in which two basic attention units are proposed to explicitly capture fine-grained correlations across different modalities (e.g., token-to-token and label-to-token). Experimental results on the CoNLL04 dataset show that our model obtains state-of-the-art results, achieving 90.62% F1 on entity recognition and 72.97% F1 on relation classification. On the ADE dataset, our model surpasses existing approaches by more than 1.9% F1 on relation classification. Extensive analyses further confirm the effectiveness of our approach.
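
To make the contrast with label-feature concatenation concrete, the sketch below shows generic scaled dot-product attention in which each token vector attends over a set of label embeddings. It is an assumption-laden illustration rather than the CMAN architecture: the function names and toy vectors are hypothetical, the direction of attention (tokens as queries, labels as keys and values) is chosen here for illustration, and the abstract does not specify the exact form of its attention units.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query mixes `values` by similarity to `keys`."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Toy vectors (assumptions): two tokens and two label embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
fused, attn = attend(tokens, labels, labels)
```

Each row of `attn` is a per-token distribution over labels, so the fused representation varies token by token instead of appending the same label feature everywhere, which is the kind of fine-grained interaction the abstract contrasts with concatenation.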

