SpanBERT: Improving Pre-training by Representing and Predicting Spans

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERTlarge, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE. 1

Download Full-text

Relevance-guided Supervision for OpenQA with ColBERT

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00405 ◽

2021 ◽

Vol 9 ◽

pp. 929-944

Author(s):

Omar Khattab ◽

Christopher Potts ◽

Matei Zaharia

Keyword(s):

Question Answering ◽

State Of The Art ◽

Training Data ◽

Coarse Grained ◽

Retrieval Model ◽

Open Domain ◽

Weak Supervision ◽

Fine Grained ◽

Vector Representations ◽

Large Corpus

Abstract Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.

Download Full-text

QASC: A Dataset for Question Answering via Sentence Composition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6319 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8082-8090

Author(s):

Tushar Khot ◽

Peter Clark ◽

Michal Guerquin ◽

Peter Jansen ◽

Ashish Sabharwal

Keyword(s):

Common Sense ◽

Human Performance ◽

Question Answering ◽

State Of The Art ◽

Multiple Choice ◽

Training Data ◽

Language Models ◽

Current State ◽

New Concepts ◽

Large Corpus

Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags by 20% behind human performance.

Download Full-text

Towards relation extraction from Arabic text: a review

International Robotics & Automation Journal ◽

10.15406/iratj.2019.05.00195 ◽

2019 ◽

Vol 5 (5) ◽

pp. 212-215

Author(s):

Abeer AlArfaj

Keyword(s):

Language Processing ◽

Question Answering ◽

State Of The Art ◽

Relation Extraction ◽

Arabic Language ◽

Extraction Methods ◽

The State ◽

Semantic Relations ◽

Arabic Text ◽

Taxonomic Relation

Semantic relation extraction is an important component of ontologies that can support many applications e.g. text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and one of the main challenges in Natural Language Processing (NLP) Field. The Arabic language has complex morphological, grammatical, and semantic aspects since it is a highly inflectional and derivational language, which makes task even more challenging. In this paper, we present a review of the state of the art for relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering the taxonomic and non-taxonomic relation extraction methods. Majority of relation extraction approaches implement a combination of statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of the work on relation extraction from Arabic texts, which need further progress.

Download Full-text

Cross-Relation Cross-Bag Attention for Distantly-Supervised Relation Extraction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301419 ◽

2019 ◽

Vol 33 ◽

pp. 419-426 ◽

Cited By ~ 6

Author(s):

Yujin Yuan ◽

Liyuan Liu ◽

Siliang Tang ◽

Zhongfei Zhang ◽

Yueting Zhuang ◽

...

Keyword(s):

Selective Attention ◽

Supervised Learning ◽

State Of The Art ◽

Relation Extraction ◽

Knowledge Bases ◽

Training Data ◽

Distant Supervision ◽

Sentence Level ◽

Noise Robust

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractor without human annotations. However, the generated training data typically contain massive noise, and may result in poor performances with the vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distant supervised relation extractor. Specifically, we employ the sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations were captured to improve the quality of attention weights. Moreover, instead of treating all entity-pairs equally, we try to pay more attention to entity-pairs with a higher quality. Similarly, we adopt the selective attention mechanism to achieve this goal. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state-of-the-art, while further ablation studies verify our intuitions and demonstrate the effectiveness of our proposed two techniques.

Download Full-text

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/525 ◽

2021 ◽

Author(s):

Zilu Guo ◽

Zhongqiang Huang ◽

Kenny Q. Zhu ◽

Guandan Chen ◽

Kaibo Zhang ◽

...

Keyword(s):

Machine Translation ◽

Question Answering ◽

Domain Adaptation ◽

State Of The Art ◽

Training Data ◽

Round Trip ◽

Previous State ◽

Supervised Methods ◽

Paraphrase Generation ◽

Better Than

Paraphrase generation plays key roles in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation. It simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised methods and distantly-supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation to achieve substantial improvements.

Download Full-text

Joint Extraction of Entities and Overlapping Relations Using Position-Attentive Sequence Labeling

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016300 ◽

2019 ◽

Vol 33 ◽

pp. 6300-6308 ◽

Cited By ~ 5

Author(s):

Dai Dai ◽

Xinyan Xiao ◽

Yajuan Lyu ◽

Shan Dou ◽

Qiaoqiao She ◽

...

Keyword(s):

Long Range ◽

State Of The Art ◽

Relation Extraction ◽

Attention Mechanism ◽

Single Model ◽

Sequence Labeling ◽

Word Position ◽

Query Word ◽

Public Datasets ◽

Extraction Model

Joint entity and relation extraction is to detect entity and relation using a single model. In this paper, we present a novel unified joint extraction model which directly tags entity and relation labels according to a query word position p, i.e., detecting entity at p, and identifying entities at other positions that have relationship with the former. To this end, we first design a tagging scheme to generate n tag sequences for an n-word sentence. Then a position-attention mechanism is introduced to produce different sentence representations for every query position to model these n tag sequences. In this way, our method can simultaneously extract all entities and their type, as well as all overlapping relations. Experiment results show that our framework performances significantly better on extracting overlapping relations as well as detecting long-range relation, and thus we achieve state-of-the-art performance on two public datasets.

Download Full-text

CAWA: An Attention-Network for Credit Attribution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6367 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8472-8479

Author(s):

Saurav Manchanda ◽

George Karypis

Keyword(s):

State Of The Art ◽

Text Summarization ◽

Training Data ◽

Learning Framework ◽

Distant Supervision ◽

Sentence Level ◽

Class Labels ◽

The Individual ◽

Traditional Approaches ◽

Better Than

Credit attribution is the task of associating individual parts in a document with their most appropriate class labels. It is an important task with applications to information retrieval and text summarization. When labeled training data is available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consuming. In this paper, we present Credit Attribution With Attention (CAWA), a neural-network-based approach, that instead of using sentence-level labeled data, uses the set of class labels that are associated with an entire document as a source of distant-supervision. CAWA combines an attention mechanism with a multilabel classifier into an end-to-end learning framework to perform credit attribution. CAWA labels the individual sentences from the input document using the resultant attention-weights. CAWA improves upon the state-of-the-art credit attribution approach by not constraining a sentence to belong to just one class, but modeling each sentence as a distribution over all classes, leading to better modeling of semantically-similar classes. Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform the competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches1.

Download Full-text

Simultaneously Linking Entities and Extracting Relations from Biomedical Text without Mention-Level Supervision

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6236 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7407-7414

Author(s):

Trapit Bansal ◽

Pat Verga ◽

Neha Choudhary ◽

Andrew McCallum

Keyword(s):

State Of The Art ◽

Relation Extraction ◽

Training Data ◽

Biomedical Text ◽

Entity Linking ◽

Distant Supervision ◽

Entity Relationships ◽

And Training

Understanding the meaning of text often involves reasoning about entities and their relationships. This requires identifying textual mentions of entities, linking them to a canonical concept, and discerning their relationships. These tasks are nearly always viewed as separate components within a pipeline, each requiring a distinct model and training data. While relation extraction can often be trained with readily available weak or distant supervision, entity linkers typically require expensive mention-level supervision – which is not available in many domains. Instead, we propose a model which is trained to simultaneously produce entity linking and relation decisions while requiring no mention-level annotations. This approach avoids cascading errors that arise from pipelined methods and more accurately predicts entity relationships from text. We show that our model outperforms a state-of-the art entity linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.

Download Full-text

An Evaluation of State-of-the-Art Approaches to Relation Extraction for Usage on Domain-Specific Corpora

10.5121/csit.2021.112006 ◽

2021 ◽

Author(s):

Christoph Brandl ◽

Jens Albrecht ◽

Renato Budinich

Keyword(s):

State Of The Art ◽

Extraction Procedure ◽

Named Entity Recognition ◽

Relation Extraction ◽

Building Blocks ◽

Training Data ◽

Entity Recognition ◽

Data Set ◽

Named Entity ◽

Domain Specific

The task of relation extraction aims at classifying the semantic relations between entities in a text. When coupled with named-entity recognition these can be used as the building blocks for an information extraction procedure that results in the construction of a Knowledge Graph. While many NLP libraries support named-entity recognition, there is no off-the-shelf solution for relation extraction. In this paper, we evaluate and compare several state-of-the-art approaches on a subset of the FewRel data set as well as a manually annotated corpus. The custom corpus contains six relations from the area of market research and is available for public use. Our approach provides guidance for the selection of models and training data for relation extraction in realworld projects.

Download Full-text

Are Noisy Sentences Useless for Distant Supervised Relation Extraction?

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6407 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8799-8806

Author(s):

Yuming Shang ◽

He-Yan Huang ◽

Xian-Ling Mao ◽

Xin Sun ◽

Wei Wei

Keyword(s):

Selective Attention ◽

State Of The Art ◽

Relation Extraction ◽

The State ◽

Benchmark Dataset ◽

Experimental Results ◽

Training Data ◽

High Confidence ◽

Feature Representations ◽

Novel Method

The noisy labeling problem has been one of the major obstacles for distant supervised relation extraction. Existing approaches usually consider that the noisy sentences are useless and will harm the model's performance. Therefore, they mainly alleviate this problem by reducing the influence of noisy sentences, such as applying bag-level selective attention or removing noisy sentences from sentence-bags. However, the underlying cause of the noisy labeling problem is not the lack of useful information, but the missing relation labels. Intuitively, if we can allocate credible labels for noisy sentences, they will be transformed into useful training data and benefit the model's performance. Thus, in this paper, we propose a novel method for distant supervised relation extraction, which employs unsupervised deep clustering to generate reliable labels for noisy sentences. Specifically, our model contains three modules: a sentence encoder, a noise detector and a label generator. The sentence encoder is used to obtain feature representations. The noise detector detects noisy sentences from sentence-bags, and the label generator produces high-confidence relation labels for noisy sentences. Extensive experimental results demonstrate that our model outperforms the state-of-the-art baselines on a popular benchmark dataset, and can indeed alleviate the noisy labeling problem.

Download Full-text