Minimally Supervised Novel Relation Extraction Using a Latent Relational Mapping

Abstract:

A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g. PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database or RDF triplestore that can be more effectively used for querying and automated reasoning. A number of resources have been developed for training and evaluating automatic systems for relation extraction in different domains. However, comparative evaluation is impeded by the fact that these corpora use different markup formats and different notions of what constitutes a relation. We describe the preparation of corpora for comparative evaluation of relation extraction across domains based on the publicly available ACE 2004, ACE 2005 and BioInfer data sets. We present a common document type that uses token standoff and includes detailed linguistic markup, while maintaining all information in the original annotation. The subsequent reannotation process normalises the data sets so that they comply with a notion of relation that is intuitive, simple and informed by the semantic web. For the ACE data, we describe an automatic process that converts many relations involving nested, nominal entity mentions to relations involving non-nested, named or pronominal entity mentions. For example, the first entity is mapped from ‘one’ to ‘Amidu Berry’ in the membership relation described in ‘Amidu Berry, one half of PBS’. Moreover, we describe a comparably reannotated version of the BioInfer corpus that flattens nested relations, maps part-whole to part-part relations and maps n-ary to binary relations. Finally, we summarise experiments that compare approaches to generic relation extraction, a knowledge discovery task that uses minimally supervised techniques to achieve maximally portable extractors. These experiments illustrate the utility of the corpora.
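
The token standoff idea and the ‘one’ to ‘Amidu Berry’ remapping mentioned in the abstract can be illustrated with a minimal sketch. Everything named here (`Mention`, `Relation`, `text_of`, the `nominal_to_named` lookup) is hypothetical and is not the corpus's actual document type; the abstract does not say how the automatic process chooses the replacement mention, so the lookup is simply hard-coded for this one example.

```python
from dataclasses import dataclass

# Token standoff: the text is stored once as a token list, and every
# annotation refers to it by token offsets instead of embedding markup.
tokens = ["Amidu", "Berry", ",", "one", "half", "of", "PBS"]


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    kind: str   # e.g. "named", "nominal", "pronominal"


@dataclass(frozen=True)
class Relation:
    label: str
    arg1: Mention
    arg2: Mention


def text_of(mention: Mention) -> str:
    """Recover the surface string of a mention from its token offsets."""
    return " ".join(tokens[mention.start:mention.end])


named_person = Mention(0, 2, "named")    # 'Amidu Berry'
nominal_person = Mention(3, 4, "nominal")  # 'one'
organisation = Mention(6, 7, "named")    # 'PBS'

# Relation as originally annotated: the first argument is the nominal 'one'.
original = Relation("membership", nominal_person, organisation)

# ASSUMPTION: a hard-coded link from the nominal mention to the named
# mention it refers to. The real process for finding this link is not
# described in the abstract.
nominal_to_named = {nominal_person: named_person}

# Remap the first argument when a named counterpart is known.
remapped = Relation(
    original.label,
    nominal_to_named.get(original.arg1, original.arg1),
    original.arg2,
)

print(text_of(original.arg1), "->", text_of(remapped.arg1))
```

Because annotations only hold offsets, the original relation and the remapped one can coexist over the same token list, which is one way to keep the source annotation intact while adding a normalised layer.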

Analysis and Improvement of Minimally Supervised Machine Learning for Relation Extraction

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-12550-8_2 ◽

2010 ◽

pp. 8-23 ◽

Cited By ~ 5

Author(s):

Hans Uszkoreit ◽

Feiyu Xu ◽

Hong Li

Keyword(s):

Machine Learning ◽

Relation Extraction ◽

Supervised Machine Learning ◽

Minimally Supervised

Mitigating the Effect of Out-of-Vocabulary Entity Pairs in Matrix Factorization for KB Inference

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/573 ◽

2018 ◽

Cited By ~ 1

Author(s):

Prachi Jain ◽

Shikhar Murty ◽

Mausam ◽

Soumen Chakrabarti

Keyword(s):

Knowledge Base ◽

Hybrid Model ◽

Matrix Factorization ◽

Relation Extraction ◽

Tensor Factorization ◽

Evaluation Protocol

This paper analyzes the varied performance of Matrix Factorization (MF) on the related tasks of relation extraction and knowledge-base completion, which have been unified recently into a single framework of knowledge-base inference (KBI) [Toutanova et al., 2015]. We first propose a new evaluation protocol that makes comparisons between MF and Tensor Factorization (TF) models fair. We find that this results in a steep drop in MF performance. Our analysis attributes this to the high out-of-vocabulary (OOV) rate of entity pairs in test folds of commonly-used datasets. To alleviate this issue, we propose three extensions to MF. Our best model is a TF-augmented MF model. This hybrid model is robust and obtains strong results across various KBI datasets.
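
The out-of-vocabulary rate of entity pairs referred to above can be measured with a short sketch like the following. It assumes facts are stored as `(subject, relation, object)` triples; the function name `entity_pair_oov_rate` and the toy facts are illustrative and do not come from the paper. The sketch also rests on the assumption that an MF model learns one embedding per entity pair, so a pair never seen in training has no learned representation at test time.

```python
def entity_pair_oov_rate(train_facts, test_facts):
    """Fraction of test facts whose (subject, object) pair never occurs in training."""
    # Collect every entity pair observed in the training fold.
    seen_pairs = {(subj, obj) for subj, _, obj in train_facts}
    if not test_facts:
        return 0.0
    # Count test facts whose entity pair is absent from the training fold.
    unseen = sum(1 for subj, _, obj in test_facts if (subj, obj) not in seen_pairs)
    return unseen / len(test_facts)


# Toy data (hypothetical): one test pair appears in training, one does not.
train = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
]
test = [
    ("Paris", "located_in", "France"),  # pair seen in training
    ("Rome", "capital_of", "Italy"),    # pair unseen in training
]

rate = entity_pair_oov_rate(train, test)
print(rate)
```

Note that both entities of an unseen pair may still occur individually in training, which is why a model that embeds entities separately can score such a pair while a pair-indexed model cannot.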
