CovidBERT: Biomedical Relation Extraction for Covid-19

Author(s):  
Shashank Hebbar ◽  
Ying Xie

Given the ongoing Covid-19 pandemic, which has had a devastating impact on society and the economy, and the explosive growth of biomedical literature, there is a growing need to find suitable medical treatments and therapeutics in a short period of time. Developing new treatments and therapeutics is an expensive and time-consuming process, so it can be practical to repurpose existing approved drugs and move them into clinical trials. Hence we propose CovidBERT, a biomedical relation extraction model based on BERT that extracts new relationships between biomedical entities, namely gene-disease and chemical-disease relationships. We use the transformer architecture to train on Covid-19-related literature and fine-tune the model on standard annotated datasets, showing improved performance over baseline models. This research uses the transformer BERT model as its foundation and extracts relations from newly published biomedical papers.
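The abstract does not give implementation details, but a common input encoding for BERT-style relation extraction wraps the two candidate entity mentions in marker tokens before classification. A minimal sketch, assuming illustrative marker names ([E1], [E2]) not taken from the paper:

```python
# Common input encoding for BERT-style relation extraction: wrap the two
# candidate entity mentions in marker tokens so a classifier can locate
# them. Marker names are illustrative, not from the paper.
def mark_entities(sentence: str, ent1: str, ent2: str) -> str:
    """Wrap the first occurrence of each entity mention with markers."""
    marked = sentence.replace(ent1, f"[E1] {ent1} [/E1]", 1)
    return marked.replace(ent2, f"[E2] {ent2} [/E2]", 1)

print(mark_entities(
    "Remdesivir was evaluated as a treatment for Covid-19.",
    "Remdesivir", "Covid-19",
))
```

The marked sentence would then be tokenized and fed to a fine-tuned BERT classifier that predicts the relation label for the pair.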

2019 ◽  
Author(s):  
Morteza Pourreza Shahri ◽  
Mandi M. Roe ◽  
Gillian Reynolds ◽  
Indika Kahanda

ABSTRACT The MEDLINE database provides an extensive source of scientific articles and heterogeneous biomedical information in the form of unstructured text. Among the most important knowledge present within articles are the relations between human proteins and their phenotypes, which can remain hidden due to the exponential growth of publications. This has presented a range of opportunities for the development of computational methods to extract these biomedical relations from articles. However, no such method currently exists for the automated extraction of relations involving human proteins and Human Phenotype Ontology (HPO) terms. In our previous work, we developed a comprehensive database composed of all co-mentions of proteins and phenotypes. In this study, we present a supervised machine learning approach called PPPred (Protein-Phenotype Predictor) for classifying the validity of a given sentence-level co-mention. Using an in-house developed gold standard dataset, we demonstrate that PPPred significantly outperforms several baseline methods. This two-step approach of co-mention extraction and classification constitutes a complete biomedical relation extraction pipeline for extracting protein-phenotype relations.
CCS CONCEPTS: • Computing methodologies → Information extraction; Supervised learning by classification; • Applied computing → Bioinformatics.
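The first step of the two-step pipeline described above, co-mention extraction, can be sketched in a few lines. This is a simplified illustration assuming plain string matching; a real pipeline would use named-entity recognition and HPO term normalization:

```python
# Minimal sketch of the co-mention step that precedes a classifier such
# as PPPred: collect sentences that mention both a protein and a
# phenotype term. Plain string matching stands in for real NER.
import re

def sentence_co_mentions(text, proteins, phenotypes):
    """Return (protein, phenotype, sentence) triples for co-mentions."""
    triples = []
    for sent in re.split(r"(?<=[.!?])\s+", text):
        for prot in proteins:
            for phen in phenotypes:
                if prot in sent and phen in sent:
                    triples.append((prot, phen, sent))
    return triples

text = ("Mutations in BRCA1 are associated with breast cancer. "
        "BRCA1 is well studied.")
print(sentence_co_mentions(text, ["BRCA1"], ["breast cancer"]))
```

Each candidate triple would then be passed to the supervised classifier, which judges whether the sentence actually asserts a protein-phenotype relation.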


2021 ◽  
Author(s):  
Qingwen Tian ◽  
Shixing Zhou ◽  
Yu Cheng ◽  
Jianxia Chen ◽  
Yi Gao ◽  
...  

A knowledge graph is a semantic network that reveals relationships between entities; its construction describes the various entities, concepts, and relationships of the real world. Because a knowledge graph can effectively reveal the relationships between different knowledge items, it has been widely utilized in intelligent education. In particular, relation extraction is a critical part of knowledge graph construction and plays a very important role in it. According to the magnitude of data labeling required, deep learning approaches to entity relation extraction can be divided into two categories: supervised and distantly supervised. Supervised approaches can extract effective entity relationships, but they rely heavily on labeled data, making them time-consuming and labor-intensive. The distant supervision approach has attracted wide attention from researchers because it can generate entity relation extraction data automatically. However, its development and application have been seriously hindered by noise, lack of information, and class imbalance in relation extraction tasks. Motivated by this analysis, this paper proposes a novel curriculum-point relation extraction model based on distant supervision. In particular, it first develops a distantly supervised relation extraction model based on a sentence-bag attention mechanism to extract relationships between curriculum points; second, it constructs a knowledge graph based on a knowledge ontology; third, it develops a Web-based curriculum semantic retrieval platform. Compared with existing advanced models, the AUC of this system is increased by 14.2%; at the same time, taking the "big data processing" course in the computer field as an example, a relation extraction result with an F1 value of 88.1% is achieved.
The experimental results show that the proposed model provides an effective solution for the development and application of knowledge graphs in the field of intelligent education.
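The sentence-bag attention mechanism mentioned above can be sketched numerically: each sentence in a bag receives a score, a softmax turns the scores into weights, and the bag representation is the weighted sum, so noisy sentences contribute little. The vectors and scores below are toy values, not the model's actual learned parameters:

```python
# Sketch of sentence-bag (selective) attention under distant supervision:
# softmax over per-sentence scores, then a weighted sum of sentence
# vectors, down-weighting noisy sentences in the bag.
import math

def bag_representation(sent_vecs, scores):
    """Combine sentence vectors into one bag vector via attention."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sent_vecs[0])
    rep = [sum(w * v[i] for w, v in zip(weights, sent_vecs))
           for i in range(dim)]
    return rep, weights
```

In a trained model, the scores come from the compatibility between each sentence encoding and a learned relation query vector; here they are supplied directly for illustration.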


10.2196/17644 ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. e17644
Author(s):  
Xiaofeng Liu ◽  
Jianye Fan ◽  
Shoubin Dong

Background Most current methods applied for intrasentence relation extraction in the biomedical literature are inadequate for document-level relation extraction, in which the relationship may cross sentence boundaries. Hence, some approaches have been proposed to extract relations by splitting document-level datasets through heuristic rules and learning methods. However, these approaches may introduce additional noise and do not really solve the problem of intersentence relation extraction. It is challenging to avoid noise and extract cross-sentence relations. Objective This study aimed to avoid the errors introduced by dividing the document-level dataset, verify that a self-attention structure can extract biomedical relations in a document with long-distance dependencies and complex semantics, and discuss the relative benefits of different entity pretreatment methods for biomedical relation extraction. Methods This paper proposes a new data preprocessing method and applies a pretrained self-attention structure to document-level biomedical relation extraction, with an entity replacement method to capture very long-distance dependencies and complex semantics. Results Compared with state-of-the-art approaches, our method greatly improved precision and increased the F1 value. Through experiments with biomedical entity pretreatments, we found that a model using an entity replacement method can improve performance. Conclusions When considering all target entity pairs as a whole in a document-level dataset, a pretrained self-attention structure is suitable for capturing very long-distance dependencies and learning the textual context and complicated semantics. A replacement method for biomedical entities is conducive to biomedical relation extraction, especially document-level relation extraction.
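The entity replacement pretreatment described above can be sketched as a preprocessing pass that substitutes placeholder tokens for the target entity mentions before the document is fed to the self-attention model. The placeholder strings here are illustrative, not necessarily those used by the authors:

```python
# Sketch of an entity replacement pretreatment: target chemical and
# disease mentions are replaced by placeholder tokens so the model
# attends to entity roles rather than surface forms. Placeholders are
# illustrative, not the authors' exact tokens.
def replace_entities(doc, chemical_mentions, disease_mentions):
    """Substitute placeholder tokens for target entity mentions."""
    for mention in chemical_mentions:
        doc = doc.replace(mention, "@CHEMICAL$")
    for mention in disease_mentions:
        doc = doc.replace(mention, "@DISEASE$")
    return doc

print(replace_entities(
    "Long-term aspirin use may worsen asthma in some patients.",
    ["aspirin"], ["asthma"],
))
```

One motivation for this design is that replacing mentions with shared placeholders lets the model generalize across entity pairs instead of memorizing particular drug or disease names.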



2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Jing Chen ◽  
Baotian Hu ◽  
Weihua Peng ◽  
Qingcai Chen ◽  
Buzhou Tang

Abstract Background In biomedical research, chemical and disease relation extraction from unstructured biomedical literature is an essential task. Effective context understanding and knowledge integration are the two main research problems in this task. Most work on relation extraction focuses on classification of entity mention pairs. Inspired by the effectiveness of machine reading comprehension (RC) for context understanding, solving biomedical relation extraction with the RC framework at both intra-sentential and inter-sentential levels is a new topic worth exploring. Beyond unstructured biomedical text, many structured knowledge bases (KBs) provide valuable guidance for biomedical relation extraction; utilizing such knowledge within the RC framework is also worth investigating. We propose a knowledge-enhanced reading comprehension (KRC) framework that leverages reading comprehension and prior knowledge for biomedical relation extraction. First, we generate questions for each relation, reformulating the relation extraction task as a question answering task. Second, within the RC framework, we integrate knowledge representations through an efficient knowledge-enhanced attention interaction mechanism to guide biomedical relation extraction. Results The proposed model was evaluated on the BioCreative V CDR dataset and the CHR dataset. Experiments show that our model achieved competitive document-level F1 scores of 71.18% and 93.3%, respectively, compared with other methods. Conclusion Result analysis reveals that open-domain reading comprehension data and knowledge representations help improve biomedical relation extraction in our proposed KRC framework. Our work can encourage more research on bridging reading comprehension and biomedical relation extraction.
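The question-generation step that reformulates relation extraction as question answering can be sketched with relation templates. The template wording below is illustrative; the paper's exact questions are not given in the abstract:

```python
# Sketch of question generation for relation-extraction-as-QA: one
# template per relation type, filled with the candidate entity pair.
# Template wording is illustrative, not the authors' exact phrasing.
TEMPLATES = {
    "chemical_induces_disease":
        "Does the chemical {chem} induce the disease {dis}?",
}

def make_question(relation, **entities):
    """Instantiate the template for a relation with the entity pair."""
    return TEMPLATES[relation].format(**entities)

print(make_question("chemical_induces_disease",
                    chem="aspirin", dis="asthma"))
```

The generated question and the source document are then passed to a reading comprehension model, whose answer determines whether the relation holds for that entity pair.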

