scholarly journals Improving Distant Supervised Relation Extraction with Noise Detection Strategy

2021 ◽  
Vol 11 (5) ◽  
pp. 2046
Author(s):  
Xiaoyan Meng ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Yi Wang ◽  
...  

Distant supervised relation extraction (DSRE) is widely used to extract novel relational facts from plain text, so as to improve the knowledge graph. However, distant supervision inevitably suffers from the noisy labeling problem that will severely damage the performance of relation extraction. Currently, most DSRE methods are mainly focused on reducing the weights of noisy sentences, ignoring the bag-level noise where all sentences in a bag are wrongly labeled. In this paper, we present a novel noise detection-based relation extraction approach (NDRE) to automatically detect noisy labels with entity information and dynamically correct them, which can alleviate both instance-level and bag-level noisy problems. By this means, we can extend the dataset from the Web tables without introducing more noise. In this approach, to embed the semantics of sentences from corpus and web tables, we firstly propose a powerful sentence coder that employs an internal multi-head self-attention mechanism between the piecewise max-pooling convolutional neural network. Second, we adopt a noise detection strategy, which is expected to dynamically detect and correct the original noisy label according to the similarity between sentence representation and entity-aware embeddings. Then, we aggregate the information from corpus and web tables to make the final relation prediction. Experimental results on a public benchmark dataset demonstrate that our proposed approach achieves significant improvements over the state-of-the-art baselines and can effectively reduce the noisy labeling problem.

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Tiantian Chen ◽  
Nianbin Wang ◽  
Hongbin Wang ◽  
Haomin Zhan

Distant supervision (DS) has been widely used for relation extraction (RE), which automatically generates large-scale labeled data. However, there is a wrong labeling problem, which affects the performance of RE. Besides, the existing method suffers from the lack of useful semantic features for some positive training instances. To address the above problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on the relation trigger words as a sentence selector to filter out noisy sentences to alleviate the wrong labeling problem. After clean instances are obtained, we propose the interaction representation using the word-level attention mechanism-based entity pairs to dynamically increase the weights of the words related to entity pairs, which can provide more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, which proves that our model performs significantly better than the state-of-the-art RE systems.


Author(s):  
Wanhua Cao ◽  
Yi Zhang ◽  
Juntao Liu ◽  
Ziyun Rao

Knowledge graph embedding improves the performance of relation extraction and knowledge reasoning by encoding entities and relationships in low-dimensional semantic space. During training, negative samples are usually constructed by replacing the head/tail entity. And the different replacing relationships lead to different accuracy of the prediction results. This paper develops a negative triplets construction framework according to the frequency of relational association entities. The proposed construction framework can fully consider the quantitative of relations and entities in the dataset to assign the proportion of relation and entity replacement and the frequency of the entities associated with each relationship to set reasonable proportions for different relations. To verify the validity of the proposed construction framework, it is integrated into the state-of-the-art knowledge graph embedding models, such as TransE, TransH, DistMult, ComplEx, and Analogy. And both the evaluation criteria of relation prediction and entity prediction are used to evaluate the performance of link prediction more comprehensively. The experimental results on two commonly used datasets, WN18 and FB15K, show that the proposed method improves entity link and triplet classification accuracy, especially the accuracy of relational link prediction.


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learn representations of relations expressed by their textual mentions. In our assumption, if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is adopted to train a hierarchical siamese network in order to learn entity-entity embeddings which encode relational information through the different linguistic paraphrasing expressing the same relation. The model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model by outperforming the state-of-the-art methods on a relation extraction task.


2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Nada Boudjellal ◽  
Huaping Zhang ◽  
Asif Khan ◽  
Arshad Ahmad

With the accelerating growth of big data, especially in the healthcare area, information extraction is more needed currently than ever, for it can convey unstructured information into an easily interpretable structured data. Relation extraction is the second of the two important tasks of relation extraction. This study presents an overview of relation extraction using distant supervision, providing a generalized architecture of this task based on the state-of-the-art work that proposed this method. Besides, it surveys the methods used in the literature targeting this topic with a description of different knowledge bases used in the process along with the corpora, which can be helpful for beginner practitioners seeking knowledge on this subject. Moreover, the limitations of the proposed approaches and future challenges were highlighted, and possible solutions were proposed.


2022 ◽  
Vol 12 (2) ◽  
pp. 715
Author(s):  
Luodi Xie ◽  
Huimin Huang ◽  
Qing Du

Knowledge graph (KG) embedding has been widely studied to obtain low-dimensional representations for entities and relations. It serves as the basis for downstream tasks, such as KG completion and relation extraction. Traditional KG embedding techniques usually represent entities/relations as vectors or tensors, mapping them in different semantic spaces and ignoring the uncertainties. The affinities between entities and relations are ambiguous when they are not embedded in the same latent spaces. In this paper, we incorporate a co-embedding model for KG embedding, which learns low-dimensional representations of both entities and relations in the same semantic space. To address the issue of neglecting uncertainty for KG components, we propose a variational auto-encoder that represents KG components as Gaussian distributions. In addition, compared with previous methods, our method has the advantages of high quality and interpretability. Our experimental results on several benchmark datasets demonstrate our model’s superiority over the state-of-the-art baselines.


2021 ◽  
Author(s):  
Qingwen Tian ◽  
Shixing Zhou ◽  
Yu Cheng ◽  
Jianxia Chen ◽  
Yi Gao ◽  
...  

Knowledge Graph is a semantic network that reveals the relationship between entities, which construction is to describe various entities, concepts and their relationships in the real world. Since knowledge graph can effectively reveal the relationship between the different knowledge items, it has been widely utilized in the intelligent education. In particular, relation extraction is the critical part of knowledge graph and plays a very important role in the construction of knowledge graph. According to the different magnitude of data labeling, entity relationship extraction tasks of deep learning can be divided into two categories: supervised and distant supervised. Supervised learning approaches can extract effective entity relationships. However, these approaches rely on labeled data heavily resulting in the time-consuming and laborconsuming. The distant supervision approach is widely concerned by researchers because it can generate the entity relation extraction automatically. However, the development and application of the distant supervised approach has been seriously hindered due to the noises, lack of information and disequilibrium in the relation extraction tasks. Inspired by the above analysis, the paper proposes a novel curriculum points relationship extraction model based on the distant supervision. In particular, firstly the research of the distant supervised relationship extraction model based on the sentence bag attention mechanism to extract the relationship of curriculum points. Secondly, the research of knowledge graph construction based on the knowledge ontology. Thirdly, the development of curriculum semantic retrieval platform based on Web. Compared with the existing advanced models, the AUC of this system is increased by 14.2%; At the same time, taking "big data processing" course in computer field as an example, the relationship extraction result with F1 value of 88.1% is realized. The experimental results show that the proposed model provides an effective solution for the development and application of knowledge graph in the field of intelligent education.


Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractor without human annotations. However, the generated training data typically contain massive noise, and may result in poor performances with the vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distant supervised relation extractor. Specifically, we employ the sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations were captured to improve the quality of attention weights. Moreover, instead of treating all entity-pairs equally, we try to pay more attention to entity-pairs with a higher quality. Similarly, we adopt the selective attention mechanism to achieve this goal. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state-of-the-art, while further ablation studies verify our intuitions and demonstrate the effectiveness of our proposed two techniques.


2018 ◽  
Author(s):  
Guanying Wang ◽  
Wen Zhang ◽  
Ruoxu Wang ◽  
Yalin Zhou ◽  
Xi Chen ◽  
...  

Author(s):  
Juan-Luis García-Mendoza ◽  
Luis Villaseñor-Pineda ◽  
Felipe Orihuela-Espina ◽  
Lázaro Bustio-Martínez

Distant Supervision is an approach that allows automatic labeling of instances. This approach has been used in Relation Extraction. Still, the main challenge of this task is handling instances with noisy labels (e.g., when two entities in a sentence are automatically labeled with an invalid relation). The approaches reported in the literature addressed this problem by employing noise-tolerant classifiers. However, if a noise reduction stage is introduced before the classification step, this increases the macro precision values. This paper proposes an Adversarial Autoencoders-based approach for obtaining a new representation that allows noise reduction in Distant Supervision. The representation obtained using Adversarial Autoencoders minimize the intra-cluster distance concerning pre-trained embeddings and classic Autoencoders. Experiments demonstrated that in the noise-reduced datasets, the macro precision values obtained over the original dataset are similar using fewer instances considering the same classifier. For example, in one of the noise-reduced datasets, the macro precision was improved approximately 2.32% using 77% of the original instances. This suggests the validity of using Adversarial Autoencoders to obtain well-suited representations for noise reduction. Also, the proposed approach maintains the macro precision values concerning the original dataset and reduces the total instances needed for classification.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Guowei Shen ◽  
Wanling Wang ◽  
Qilin Mu ◽  
Yanhong Pu ◽  
Ya Qin ◽  
...  

Industrial control systems (ICS) involve many key industries, which once attacked will cause heavy losses. However, traditional passive defense methods of cybersecurity have difficulty effectively dealing with increasingly complex threats; a knowledge graph is a new idea to analyze and process data in cybersecurity analysis. We propose a novel overall framework of data-driven industrial control network security defense, which integrated fragmented multisource threat data with an industrial network layout by a cybersecurity knowledge graph. In order to better correlate data to construct a knowledge graph, we propose a distant supervised relation extraction model ResPCNN-ATT; it is based on a deep residual convolutional neural network and attention mechanism, reduces the influence of noisy data in distant supervision, and better extracts deep semantic features in sentences by using deep residuals. We empirically demonstrate the performance of the proposed method in the field of general cybersecurity by using dataset CSER; the model proposed in this paper achieves higher accuracy than other models. And then, the dataset ICSER was used to construct a cybersecurity knowledge graph (CSKG) on the basis of analyzing specific industrial control scenarios, visualizing the knowledge graph for further security analysis to the industrial control system.


Sign in / Sign up

Export Citation Format

Share Document