MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction

2022 ◽  
Vol 40 (4) ◽  
pp. 1-32
Author(s):  
Rui Li ◽  
Cheng Yang ◽  
Tingwei Li ◽  
Sen Su

Relation extraction (RE), an important information extraction task, faces the great challenge of limited annotated data. To this end, distant supervision was proposed to automatically label RE data, greatly increasing the number of annotated instances. Unfortunately, the many noisy relation annotations introduced by automatic labeling become a new obstacle. Some recent studies have shown that the teacher-student framework of knowledge distillation can alleviate the interference of noisy relation annotations via label softening. Nevertheless, we find that they still suffer from two problems: propagation of inaccurate dark knowledge and the constraint of a unified distillation temperature. In this article, we propose a simple and effective Multi-instance Dynamic Temperature Distillation (MiDTD) framework, which is model-agnostic and mainly involves two modules: multi-instance target fusion (MiTF) and dynamic temperature regulation (DTR). MiTF combines the teacher’s predictions for multiple sentences with the same entity pair to amend the inaccurate dark knowledge in each student target. DTR allocates alterable distillation temperatures to different training instances so that the softness of most student targets is regulated to a moderate range. In experiments, we construct three concrete MiDTD instantiations with BERT-, PCNN-, and BiLSTM-based RE models, and the distilled students significantly outperform their teachers and the state-of-the-art (SOTA) methods.
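
To make the two ideas concrete, the following is a minimal PyTorch sketch of bag-level fusion of teacher predictions and a per-instance distillation temperature; the function names, shapes, and temperature handling are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two MiDTD ideas: fuse teacher predictions over a
# bag of sentences sharing an entity pair, then distill with a per-instance
# temperature. All names and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F

def fused_teacher_target(teacher_logits_bag: torch.Tensor) -> torch.Tensor:
    """Average the teacher's softmax predictions over all sentences in one bag
    (same entity pair) to smooth out per-sentence errors in the soft target."""
    return F.softmax(teacher_logits_bag, dim=-1).mean(dim=0)   # (num_classes,)

def distill_loss(student_logits: torch.Tensor,
                 soft_target: torch.Tensor,
                 temperature: torch.Tensor) -> torch.Tensor:
    """KL divergence between the temperature-softened student distribution and
    the re-softened fused teacher target; the temperature varies per instance."""
    t = temperature.clamp(min=1e-3)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(torch.log(soft_target + 1e-12) / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="sum") * t * t

# toy usage: a bag of 3 sentences, 5 relation classes
teacher_logits_bag = torch.randn(3, 5)
student_logits = torch.randn(5)
target = fused_teacher_target(teacher_logits_bag)
loss = distill_loss(student_logits, target, temperature=torch.tensor(2.0))
```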

2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Nada Boudjellal ◽  
Huaping Zhang ◽  
Asif Khan ◽  
Arshad Ahmad

With the accelerating growth of big data, especially in the healthcare area, information extraction is needed now more than ever, since it can convert unstructured information into easily interpretable structured data. Relation extraction is the second of the two main tasks of information extraction. This study presents an overview of relation extraction using distant supervision, providing a generalized architecture of this task based on the state-of-the-art work that proposed this method. Besides, it surveys the methods used in the literature targeting this topic, with a description of the different knowledge bases used in the process along with the corpora, which can be helpful for beginner practitioners seeking knowledge on this subject. Moreover, the limitations of the proposed approaches and future challenges are highlighted, and possible solutions are suggested.


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learn representations of relations expressed by their textual mentions. Our assumption is that if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings which encode relational information across the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model, outperforming the state-of-the-art methods on a relation extraction task.
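
A small illustrative sketch of the analogy idea follows: two entity-pair mentions expressing the same relation should receive nearby embeddings. The encoder, the contrastive objective, and the margin are assumptions for illustration, not the authors' hierarchical siamese architecture.

```python
# Toy siamese-style setup: encode two entity-pair mentions and pull analogous
# pairs together while pushing non-analogous pairs apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    """Maps a bag-of-words vector for an entity-pair mention to an embedding."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(vocab_size, 128), nn.ReLU(),
                                  nn.Linear(128, dim))
    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

def contrastive_loss(z1, z2, analogous, margin=0.5):
    """Small distance for analogous pairs, at least `margin` otherwise."""
    d = (z1 - z2).norm(dim=-1)
    return torch.where(analogous.bool(), d ** 2,
                       F.relu(margin - d) ** 2).mean()

# toy usage: 8 mention pairs, label 1 = same relation (analogous)
enc = PairEncoder()
x1, x2 = torch.rand(8, 1000), torch.rand(8, 1000)
labels = torch.randint(0, 2, (8,))
loss = contrastive_loss(enc(x1), enc(x2), labels)
```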


Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to automatically label instances, allowing us to train a relation extractor without human annotations. However, the generated training data typically contain massive noise and may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for the distantly supervised relation extractor. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using the selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state-of-the-art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
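
For readers unfamiliar with selective attention over a bag, the following is a minimal sketch of the sentence-level mechanism in the spirit of the approach above (not the exact C2SA formulation): each sentence representation is weighted by its compatibility with a relation query vector, so noisy sentences get small weights.

```python
# Sentence-level selective attention over one bag (sentences sharing an entity pair).
import torch
import torch.nn.functional as F

def selective_attention(sent_reprs: torch.Tensor,
                        relation_query: torch.Tensor) -> torch.Tensor:
    """sent_reprs: (num_sentences, dim) representations of one bag.
    relation_query: (dim,) learned query for the candidate relation.
    Returns a single bag representation that down-weights noisy sentences."""
    scores = sent_reprs @ relation_query            # (num_sentences,)
    alpha = F.softmax(scores, dim=0)                # attention weights
    return alpha @ sent_reprs                       # (dim,)

bag = torch.randn(4, 128)        # 4 sentences mentioning the same entity pair
query = torch.randn(128)
bag_repr = selective_attention(bag, query)
```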


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Tiantian Chen ◽  
Nianbin Wang ◽  
Hongbin Wang ◽  
Haomin Zhan

Distant supervision (DS) has been widely used for relation extraction (RE) because it automatically generates large-scale labeled data. However, it suffers from a wrong-labeling problem, which degrades the performance of RE. Besides, existing methods suffer from a lack of useful semantic features for some positive training instances. To address the above problems, we propose a novel model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern-based method that uses relation trigger words as a sentence selector to filter out noisy sentences and alleviate the wrong-labeling problem. After clean instances are obtained, we propose an interaction representation that uses an entity-pair-based word-level attention mechanism to dynamically increase the weights of words related to the entity pair, which provides more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, demonstrating that our model performs significantly better than the state-of-the-art RE systems.
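
A hedged sketch of the entity-pair-conditioned word-level attention described above follows; the pooling of the two entity embeddings and all dimensions are assumptions, not the authors' exact formulation.

```python
# Word-level attention conditioned on the entity pair: words similar to the
# entity-pair representation receive larger weights in the sentence vector.
import torch
import torch.nn.functional as F

def entity_aware_word_attention(word_reprs: torch.Tensor,
                                head_repr: torch.Tensor,
                                tail_repr: torch.Tensor) -> torch.Tensor:
    """word_reprs: (seq_len, dim); head_repr, tail_repr: (dim,) entity embeddings.
    Returns a sentence vector emphasizing words related to the entity pair."""
    pair = (head_repr + tail_repr) / 2.0
    weights = F.softmax(word_reprs @ pair, dim=0)   # (seq_len,)
    return weights @ word_reprs                     # (dim,)

words = torch.randn(20, 100)                        # 20 words, 100-dim embeddings
sent_vec = entity_aware_word_attention(words, torch.randn(100), torch.randn(100))
```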


2020 ◽  
Vol 34 (05) ◽  
pp. 7407-7414
Author(s):  
Trapit Bansal ◽  
Pat Verga ◽  
Neha Choudhary ◽  
Andrew McCallum

Understanding the meaning of text often involves reasoning about entities and their relationships. This requires identifying textual mentions of entities, linking them to a canonical concept, and discerning their relationships. These tasks are nearly always viewed as separate components within a pipeline, each requiring a distinct model and training data. While relation extraction can often be trained with readily available weak or distant supervision, entity linkers typically require expensive mention-level supervision, which is not available in many domains. Instead, we propose a model which is trained to simultaneously produce entity linking and relation decisions while requiring no mention-level annotations. This approach avoids the cascading errors that arise from pipelined methods and more accurately predicts entity relationships from text. We show that our model outperforms a state-of-the-art entity-linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.
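
The contrast with a pipeline can be pictured as a single encoder feeding two jointly trained output heads; the following is a generic sketch under that assumption and is not the paper's model.

```python
# Generic joint setup: one shared text encoder, two heads trained together, so
# linking errors cannot silently propagate into relation decisions as in a pipeline.
import torch
import torch.nn as nn

class JointLinkRE(nn.Module):
    def __init__(self, vocab=5000, dim=128, num_entities=200, num_relations=20):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)          # toy shared text encoder
        self.link_head = nn.Linear(dim, num_entities)     # entity-linking decision
        self.rel_head = nn.Linear(dim, num_relations)     # relation decision

    def forward(self, token_ids):
        h = self.embed(token_ids)
        return self.link_head(h), self.rel_head(h)

model = JointLinkRE()
link_logits, rel_logits = model(torch.randint(0, 5000, (4, 25)))
```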


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Yang Xiang ◽  
Yaoyun Zhang ◽  
Xiaolong Wang ◽  
Yang Qin ◽  
Wenying Han

Distant supervision (DS) automatically annotates free text with relation mentions from existing knowledge bases (KBs), providing a way to alleviate the problem of insufficient training data for relation extraction in natural language processing (NLP). However, the heuristic annotation process does not guarantee the correctness of the generated labels, raising the research question of how to efficiently make use of the noisy training data. In this paper, we model two types of biases to reduce noise: (1) bias-dist, to model the relative distance between points (instances) and classes (relation centers); (2) bias-reward, to model the possibility of each heuristically generated label being incorrect. Based on these biases, we propose three noise-tolerant models, MIML-dist, MIML-dist-classify, and MIML-reward, building on top of a state-of-the-art distantly supervised learning algorithm. Experimental evaluations against three landmark methods on the KBP dataset validate the effectiveness of the proposed methods.
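
An illustrative sketch of the bias-dist intuition follows (an assumption about the general mechanism, not the paper's MIML implementation): instances far from their relation's center are treated as less reliable.

```python
# Weight each instance by its distance to the center of its heuristic class:
# far-from-center instances are likely mislabeled and get smaller weights.
import numpy as np

def bias_dist_weights(instances: np.ndarray, centers: np.ndarray,
                      labels: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """instances: (n, d) feature vectors; centers: (k, d) relation centers;
    labels: (n,) heuristic relation labels. Smaller distance -> larger weight."""
    d = np.linalg.norm(instances - centers[labels], axis=1)
    return np.exp(-d / tau)

X = np.random.rand(6, 10)
C = np.random.rand(3, 10)
y = np.array([0, 1, 2, 0, 1, 2])
w = bias_dist_weights(X, C, y)
```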


2020 ◽  
Vol 34 (07) ◽  
pp. 11370-11377
Author(s):  
Peng Li ◽  
Chang Shu ◽  
Yuan Xie ◽  
Yan Qu ◽  
Hui Kong

Deep network compression has achieved notable progress via knowledge distillation, where a teacher-student learning scheme is adopted with a predetermined loss. Recently, attention has shifted to employing adversarial training to minimize the discrepancy between the output distributions of the two networks. However, such methods emphasize result-oriented learning while neglecting process-oriented learning, losing the rich information contained in the whole network pipeline. In other (non-GAN-based) process-oriented methods, knowledge has usually been transferred in a redundant manner. Observing that a small network cannot perfectly mimic a large one due to the huge gap in network scale, we propose a knowledge transfer method, involving effective intermediate supervision, under the adversarial training framework to learn the student network. Different from other intermediate supervision methods, we design the knowledge representation in a compact form by introducing a task-driven attention mechanism. Meanwhile, to improve the representation capability of the attention-based method, a hierarchical structure is utilized so that powerful but highly compressed knowledge is obtained and the knowledge from the teacher network can accommodate the size of the student network. Extensive experimental results on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves highly superior performance compared with state-of-the-art methods.
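
One common way to realize compact intermediate supervision is to match spatial attention maps between teacher and student features; the sketch below shows that generic attention-transfer loss as a point of reference, not the paper's exact task-driven design.

```python
# Generic attention-transfer loss: distill where the network "looks" rather
# than the full (redundant) feature tensor.
import torch
import torch.nn.functional as F

def attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a (batch, channels, H, W) feature map into a normalized
    (batch, H*W) spatial attention map by summing squared activations."""
    a = features.pow(2).sum(dim=1).flatten(1)       # (batch, H*W)
    return F.normalize(a, dim=1)

def attention_transfer_loss(f_student: torch.Tensor,
                            f_teacher: torch.Tensor) -> torch.Tensor:
    """Match the student's attention map to the teacher's (channel counts may differ)."""
    return (attention_map(f_student) - attention_map(f_teacher)).pow(2).mean()

# toy usage: student has 64 channels, teacher 256, same spatial resolution
loss = attention_transfer_loss(torch.randn(2, 64, 8, 8), torch.randn(2, 256, 8, 8))
```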


2021 ◽  
Vol 11 (5) ◽  
pp. 2046
Author(s):  
Xiaoyan Meng ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Yi Wang ◽  
...  

Distantly supervised relation extraction (DSRE) is widely used to extract novel relational facts from plain text so as to enrich knowledge graphs. However, distant supervision inevitably suffers from the noisy labeling problem, which severely damages the performance of relation extraction. Currently, most DSRE methods focus mainly on reducing the weights of noisy sentences, ignoring bag-level noise, where all sentences in a bag are wrongly labeled. In this paper, we present a novel noise detection-based relation extraction approach (NDRE) to automatically detect noisy labels with entity information and dynamically correct them, which can alleviate both instance-level and bag-level noise. In this way, we can extend the dataset with web tables without introducing more noise. In this approach, to embed the semantics of sentences from the corpus and web tables, we first propose a powerful sentence encoder that employs an internal multi-head self-attention mechanism within a piecewise max-pooling convolutional neural network. Second, we adopt a noise detection strategy that dynamically detects and corrects the original noisy labels according to the similarity between the sentence representation and entity-aware embeddings. Then, we aggregate the information from the corpus and web tables to make the final relation prediction. Experimental results on a public benchmark dataset demonstrate that our proposed approach achieves significant improvements over the state-of-the-art baselines and can effectively reduce the noisy labeling problem.
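
The noise-correction step can be pictured as relabeling a sentence when another relation's embedding matches it much better than the current label; the sketch below illustrates that idea under assumed names and a threshold, and is not the NDRE implementation.

```python
# Dynamic label correction: compare each sentence representation with
# entity-aware relation embeddings and relabel clearly mismatched instances.
import torch
import torch.nn.functional as F

def correct_noisy_labels(sent_reprs: torch.Tensor,
                         relation_embs: torch.Tensor,
                         labels: torch.Tensor,
                         threshold: float = 0.2) -> torch.Tensor:
    """sent_reprs: (n, d); relation_embs: (k, d) entity-aware relation embeddings;
    labels: (n,) original distant-supervision labels. Returns corrected labels."""
    sims = F.cosine_similarity(sent_reprs.unsqueeze(1),
                               relation_embs.unsqueeze(0), dim=-1)   # (n, k)
    best_sim, best_rel = sims.max(dim=1)
    current_sim = sims.gather(1, labels.unsqueeze(1)).squeeze(1)
    # relabel only when another relation is clearly more similar
    return torch.where(best_sim - current_sim > threshold, best_rel, labels)

labels = correct_noisy_labels(torch.randn(5, 64), torch.randn(4, 64),
                              torch.randint(0, 4, (5,)))
```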


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Qian Yi ◽  
Guixuan Zhang ◽  
Shuwu Zhang

Distant supervision is an effective method to automatically collect large-scale datasets for relation extraction (RE). Automatically constructed datasets usually contain two types of noise: intrasentence noise and wrongly labeled sentences. To address the issues caused by these two types of noise and improve distantly supervised relation extraction, this paper proposes a novel model that consists of an entity-based gated convolution sentence encoder and a multilevel sentence selective attention (Matt) module. Specifically, we first apply an entity-based gated convolution operation to force the sentence encoder to extract entity-pair-related features and filter out useless intrasentence noise. Furthermore, the multilevel attention scheme fuses bag information to obtain a fine-grained bag-specific query vector, which can better identify valid sentences and reduce the influence of wrongly labeled sentences. Experimental results on a large-scale benchmark dataset show that our model can effectively reduce the influence of both types of noise and achieves state-of-the-art performance in relation extraction.
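
A minimal sketch of an entity-conditioned gate over convolutional features follows, illustrating the "entity-based gated convolution" idea in general terms; the layer sizes and gating form are assumptions, not the paper's architecture.

```python
# Gate convolutional features with a signal derived from the entity pair, so
# channels unrelated to the pair are suppressed before pooling.
import torch
import torch.nn as nn

class EntityGatedConv(nn.Module):
    def __init__(self, emb_dim=100, hidden=230, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)
        self.gate = nn.Linear(2 * emb_dim, hidden)   # gate driven by the entity pair

    def forward(self, word_embs, head_emb, tail_emb):
        """word_embs: (batch, seq_len, emb_dim); head/tail_emb: (batch, emb_dim)."""
        feats = self.conv(word_embs.transpose(1, 2))          # (batch, hidden, seq_len)
        g = torch.sigmoid(self.gate(torch.cat([head_emb, tail_emb], dim=-1)))
        return feats * g.unsqueeze(-1)   # suppress features unrelated to the entity pair

layer = EntityGatedConv()
out = layer(torch.randn(2, 30, 100), torch.randn(2, 100), torch.randn(2, 100))
```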


Author(s):  
Alan Ritter ◽  
Luke Zettlemoyer ◽  
Mausam ◽  
Oren Etzioni

Distant supervision algorithms learn information extraction models given only large, readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and that facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing from the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision-recall curve in the binary case.
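
As a toy illustration of the missing-data intuition (an assumption about the general idea, not the paper's latent-variable model): the penalty for extracting a fact the database does not contain can be made lower for rare entities, since such facts are more likely simply missing from the database.

```python
# Rarer entities (fewer corpus mentions) get a smaller penalty for disagreeing
# with the database, reflecting likely database incompleteness.
import math

def disagreement_penalty(entity_mention_count: int, base_penalty: float = 1.0) -> float:
    """Penalty for predicting a fact absent from the database, scaled down for
    rare entities."""
    return base_penalty * (1.0 - 1.0 / (1.0 + math.log1p(entity_mention_count)))

for count in (1, 10, 1000):
    print(count, round(disagreement_penalty(count), 3))
```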

