FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task with wide application potential in fields such as question answering robots and human-computer interaction in mobile virtual reality systems. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. As training objectives have developed, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. Different MLMs mask token spans of different lengths. Similarly, different MRC tasks have answers of different lengths, and the answer is often a word, phrase, or sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us on how to pretrain the MLM with a suitable mask length distribution for MRC tasks. In this paper, we try to uncover how much of MLM's success in MRC tasks comes from the correlation between the masking length distribution and the answer length in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely, the short span extraction task, long span extraction task, short multiple-choice cloze task, and long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis holds: on four different MRC datasets, the model whose masking length distribution correlates with the answer length distribution outperforms the model without this correlation.
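
The idea of masking with a chosen length distribution can be illustrated with a minimal sketch. It is not the authors' pretraining code: the function name `mask_spans`, the `length_weights` mapping, the 15% masking budget, and the `[MASK]` placeholder are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token; an assumption, not taken from the abstract


def mask_spans(tokens, length_weights, mask_ratio=0.15, seed=0):
    """Mask contiguous spans whose lengths are drawn from `length_weights`.

    `length_weights` maps a span length (>= 1) to a relative sampling weight.
    It is a hypothetical stand-in for an answer-length distribution.
    """
    rng = random.Random(seed)
    lengths = list(length_weights)
    weights = [length_weights[n] for n in lengths]
    budget = max(1, round(len(tokens) * mask_ratio))  # tokens we may mask
    masked = list(tokens)
    covered = set()
    attempts = 0
    while len(covered) < budget and attempts < 100:
        attempts += 1
        span = rng.choices(lengths, weights=weights, k=1)[0]
        # Never exceed the remaining budget or the sequence length.
        span = min(span, budget - len(covered), len(tokens))
        start = rng.randrange(0, len(tokens) - span + 1)
        positions = range(start, start + span)
        if any(p in covered for p in positions):
            continue  # skip spans that overlap an already masked span
        covered.update(positions)
        for p in positions:
            masked[p] = MASK
    return masked, sorted(covered)


tokens = [f"w{i}" for i in range(20)]
short_masked, short_pos = mask_spans(tokens, {1: 0.7, 2: 0.3})
long_masked, long_pos = mask_spans(tokens, {3: 0.5, 4: 0.5})
```

Under these assumptions, a weight table skewed toward length 1 or 2 would mimic a short-answer task, while one skewed toward longer spans would mimic a long-answer task.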

Query-focused Abstractive Summarization via Question-answering Model

10.1109/ickg52313.2021.00065 ◽ 2021

Author(s): JianCheng Du ◽ Yang Gao

Keyword(s): Question Answering ◽ Abstractive Summarization

COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization

npj Digital Medicine ◽ 10.1038/s41746-021-00437-0 ◽ 2021 ◽ Vol 4 (1)

Author(s): Andre Esteva ◽ Anuprit Kale ◽ Romain Paulus ◽ Kazuma Hashimoto ◽ Wenpeng Yin ◽ ...

Keyword(s): Information Retrieval ◽ Deep Learning ◽ Question Answering ◽ Health Workers ◽ Complex Queries ◽ Global Pandemic ◽ Multi Stage ◽ Scientific Disciplines ◽ Abstractive Summarization ◽ Deep Learning Model

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage, search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question–answering module which gauges how much each document answers the query, and (2) an abstractive summarization module which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique which splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system (http://einstein.ai/covid) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
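
The two-stage retrieve-then-re-rank pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the CO-Search implementation: the bag-of-words cosine `cosine_bow` stands in for a learned encoder such as Siamese-BERT, the linear mix controlled by `alpha` and the additive re-ranking rule are assumptions (the abstract does not say how scores are combined), and `qa_score` and `summary_score` are hypothetical callables standing in for the question answering and summarization modules.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain BM25 keyword score of each (non-empty) document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_bow(a, b):
    """Bag-of-words cosine; a toy stand-in for a learned sentence encoder."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, top_k=1000, alpha=0.5):
    """Stage 1: mix keyword and 'semantic' scores, return top_k doc indices."""
    kw = bm25_scores(query, docs)
    top = max(kw) or 1.0  # avoid dividing by zero when nothing matches
    mixed = [alpha * (k / top) + (1 - alpha) * cosine_bow(query, d)
             for k, d in zip(kw, docs)]
    order = sorted(range(len(docs)), key=lambda i: mixed[i], reverse=True)
    return order[:top_k]


def rerank(query, docs, candidates, qa_score, summary_score):
    """Stage 2: reorder candidates by the sum of two module scores."""
    return sorted(
        candidates,
        key=lambda i: qa_score(query, docs[i]) + summary_score(query, docs[i]),
        reverse=True,
    )


docs = [
    "covid vaccine trial results",
    "stock market news today",
    "sars-cov-2 vaccine efficacy in adults",
]
candidates = retrieve("vaccine efficacy", docs, top_k=2)
ranked = rerank("vaccine efficacy", docs, candidates, cosine_bow, cosine_bow)
```

In this toy run, `cosine_bow` is passed for both module scores purely so the example executes; a real system would plug in model-based scorers.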

Using Question Answering Rewards to Improve Abstractive Summarization

10.18653/v1/2021.findings-emnlp.47 ◽ 2021

Author(s): Chulaka Gunasekara ◽ Guy Feigenblat ◽ Benjamin Sznajder ◽ Ranit Aharonov ◽ Sachindra Joshi

Keyword(s): Question Answering ◽ Abstractive Summarization

Neural Abstractive Summarization with Structural Attention

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/514 ◽ 2020

Author(s): Tanya Chowdhury ◽ Sachin Kumar ◽ Tanmoy Chakraborty

Keyword(s): Question Answering ◽ Popular Opinion ◽ Document Summarization ◽ Community Question Answering ◽ Proposed Model ◽ Abstractive Summarization

Attentional, RNN-based encoder-decoder architectures have obtained impressive performance on abstractive summarization of news articles. However, these methods fail to account for long term dependencies within the sentences of a document. This problem is exacerbated in multi-document summarization tasks such as summarizing the popular opinion in threads present in community question answering (CQA) websites such as Yahoo! Answers and Quora. These threads contain answers which often overlap or contradict each other. In this work, we present a hierarchical encoder based on structural attention to model such inter-sentence and inter-document dependencies. We set the popular pointer-generator architecture and some of the architectures derived from it as our baselines and show that they fail to generate good summaries in a multi-document setting. We further illustrate that our proposed model achieves significant improvement over the baseline in both single and multi-document summarization settings -- in the former setting, it beats the baseline by 1.31 and 7.8 ROUGE-1 points on CNN and CQA datasets, respectively; in the latter setting, the performance is further improved by 1.6 ROUGE-1 points on the CQA dataset.
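
For readers unfamiliar with the metric behind the reported gains, ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below is a simplified version that assumes whitespace tokenization and lowercasing and omits the stemming and other preprocessing that standard toolkits apply; the function name `rouge1` is illustrative. Reported "points" are usually these scores multiplied by 100.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, F1) of clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(1, sum(cand.values()))
    recall = overlap / max(1, sum(ref.values()))
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


p, r, f = rouge1("the cat sat", "the cat sat on the mat")
```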

Improving Factual Consistency of Abstractive Summarization via Question Answering

10.18653/v1/2021.acl-long.536 ◽ 2021

Author(s): Feng Nan ◽ Cicero Nogueira dos Santos ◽ Henghui Zhu ◽ Patrick Ng ◽ Kathleen McKeown ◽ ...

Keyword(s): Question Answering ◽ Abstractive Summarization

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/553 ◽ 2020

Author(s): Dongling Xiao ◽ Han Zhang ◽ Yukun Li ◽ Yu Sun ◽ Hao Tian ◽ ...

Keyword(s): Natural Language ◽ Question Answering ◽ Natural Language Generation ◽ Training Data ◽ Fine Tuning ◽ Training Methods ◽ Question Generation ◽ Language Generation ◽ Source Codes ◽ Abstractive Summarization

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA). The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE/ernie-gen.
