Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

2020 · Vol 34 (05) · pp. 8722-8731
Author(s): Anna Rogers, Olga Kovaleva, Matthew Downey, Anna Rumshisky

The recent explosion in question answering research has produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world-knowledge, and unanswerable questions, and to provide question-type annotation that enables diagnostics of the reasoning strategies used by a given QA system. QuAIL contains 15K multiple-choice questions for 800 texts in 4 domains. Crucially, it offers both general and text-specific questions, which are unlikely to be found in pretraining data. We show that QuAIL poses substantial challenges to current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.
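To illustrate the kind of diagnostics this annotation enables, here is a minimal Python sketch that breaks accuracy down by question type; the field names ('question_type', 'answer_idx') and the predict callback are hypothetical, not QuAIL's actual distribution format.

```python
from collections import defaultdict

def accuracy_by_question_type(examples, predict):
    """Aggregate accuracy per annotated question type.

    `examples` is assumed to be an iterable of dicts with hypothetical
    fields 'question_type' and 'answer_idx'; `predict` maps an example
    to a predicted option index.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        qtype = ex["question_type"]   # e.g. 'temporal', 'unanswerable'
        total[qtype] += 1
        if predict(ex) == ex["answer_idx"]:
            correct[qtype] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}
```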

2020 · Vol 34 (05) · pp. 7700-7707
Author(s): G P Shrivatsa Bhargav, Michael Glass, Dinesh Garg, Shirish Shevade, Saswati Dana, et al.

Research on the task of Reading Comprehension style Question Answering (RCQA) has gained momentum in recent years due to the emergence of human-annotated datasets and associated leaderboards, for example CoQA, HotpotQA, SQuAD, TriviaQA, etc. While the state of the art has advanced considerably, there is still ample opportunity to advance it further on some important variants of the RCQA task. In this paper, we propose a novel deep neural architecture, called TAP (Translucent Answer Prediction), to identify answers and evidence (in the form of supporting facts) in an RCQA task requiring multi-hop reasoning. TAP comprises two loosely coupled networks: the Local and Global Interaction eXtractor (LoGIX) and the Answer Predictor (AP). LoGIX predicts supporting facts, whereas AP consumes these predicted supporting facts to predict the answer span. The novel design of LoGIX is inspired by two key design desiderata, local context and global interaction, that we identified by analyzing examples of the multi-hop RCQA task. The loose coupling between LoGIX and AP reveals the set of sentences used by AP in predicting an answer; answer predictions of TAP can therefore be interpreted in a translucent manner. TAP offers state-of-the-art performance on the HotpotQA (Yang et al. 2018) dataset, an apt dataset for the multi-hop RCQA task, occupying Rank-1 on its leaderboard (https://hotpotqa.github.io/) at the time of submission.
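As a rough illustration of the loose coupling between the two networks, the following PyTorch sketch has a LoGIX-like module score sentences as supporting facts and an AP-like module that only ever sees the selected sentences; the real TAP architecture is substantially more elaborate.

```python
import torch
import torch.nn as nn

class LoGIXSketch(nn.Module):
    """Scores each sentence as a supporting fact (one binary logit each)."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, sent_embs):                    # (num_sents, dim)
        return self.scorer(sent_embs).squeeze(-1)    # (num_sents,)

class AnswerPredictorSketch(nn.Module):
    """Predicts start/end span logits over tokens of the selected sentences."""
    def __init__(self, dim):
        super().__init__()
        self.span = nn.Linear(dim, 2)

    def forward(self, token_embs):                   # (num_tokens, dim)
        return self.span(token_embs)                 # (num_tokens, 2)

def predict(logix, ap, sent_embs, tokens_per_sent):
    # Loose coupling: AP only sees sentences LoGIX marked as supporting
    # facts, which is what makes the answer prediction "translucent".
    fact_mask = torch.sigmoid(logix(sent_embs)) > 0.5
    selected = [t for t, keep in zip(tokens_per_sent, fact_mask) if keep]
    token_embs = torch.cat(selected) if selected else torch.cat(tokens_per_sent)
    return fact_mask, ap(token_embs)
```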


2015 · Vol 3 · pp. 449-460
Author(s): Michael Roth, Mirella Lapata

Frame-semantic representations have been useful in several applications ranging from text-to-scene generation to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task, and corresponding models are typically trained only on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account both sentence and discourse context. We introduce several new features motivated by linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state of the art in FrameNet-based semantic role labeling.
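A toy sketch of the general idea, with dict-based inputs invented for illustration: a feature map that mixes standard sentence-internal SRL features with discourse-level cues drawn from preceding sentences. The paper's actual feature set differs.

```python
def srl_features(candidate, history):
    """Sketch of a feature map mixing sentence- and discourse-level cues.

    `candidate` and `history` are plain dicts here (hypothetical keys);
    the paper's features and representations are more sophisticated.
    """
    return {
        # standard sentence-internal SRL features
        "head_word": candidate["head"],
        "syntactic_path": candidate["path_to_predicate"],
        # discourse features: has this head filled the same role earlier?
        "filler_seen_before": candidate["head"] in history["previous_fillers"],
        "sents_since_last_mention": history["last_mention_distance"],
    }
```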


2020 · Vol 34 (05) · pp. 8082-8090
Author(s): Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal

Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging, as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using commonsense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The retrieval and reasoning problems, however, remain unsolved, as this model still lags behind human performance by 20%.
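The following sketch illustrates the two-step retrieval idea under stated assumptions: `retrieve` is an assumed black-box lexical retriever (e.g., BM25), and the second query augments the question with terms from the first fact so that the second fact can introduce concepts absent from the question itself.

```python
def two_step_retrieval(question, corpus, retrieve, k=10):
    """Sketch of two-step retrieval; not the paper's exact system.

    `retrieve(query, corpus, k)` is an assumed black-box lexical
    retriever returning the top-k facts as strings.
    """
    pairs = []
    first_facts = retrieve(question, corpus, k)
    for f1 in first_facts:
        # Words in f1 but not in the question are candidate "bridge"
        # concepts that the question alone would never surface.
        novel_terms = set(f1.split()) - set(question.split())
        second_query = question + " " + " ".join(novel_terms)
        for f2 in retrieve(second_query, corpus, k):
            pairs.append((f1, f2))   # candidate composition (f1, f2)
    return pairs
```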


Author(s): Yu Wang, Hongxia Jin

In this paper, we present a multi-step coarse-to-fine question answering (MSCQA) system that can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models that target datasets containing mainly either short or long documents, our multi-step coarse-to-fine model combines the merits of multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to baselines built on state-of-the-art models.
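A minimal actor-critic sketch of the controller idea, assuming a fixed menu of two modules; this is not the paper's model, only an illustration of how a learned policy could pick coarse or fine processing per document.

```python
import torch
import torch.nn as nn

class CoarseToFineActorCritic(nn.Module):
    """Sketch of an actor-critic controller over QA modules.

    Given a document state, the actor picks which module to run next
    (hypothetically, 0: answer from a coarse sentence selection,
    1: run a fine reader on the full document); the critic estimates
    the state value used for the advantage in training.
    """
    def __init__(self, state_dim, num_actions=2):
        super().__init__()
        self.actor = nn.Linear(state_dim, num_actions)
        self.critic = nn.Linear(state_dim, 1)

    def forward(self, state):                     # (batch, state_dim)
        policy = torch.distributions.Categorical(logits=self.actor(state))
        action = policy.sample()                  # module to invoke next
        return action, policy.log_prob(action), self.critic(state).squeeze(-1)
```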


2020 · Vol 34 (05) · pp. 9065-9072
Author(s): Luu Anh Tuan, Darsh Shah, Regina Barzilay

Automatic question generation can benefit many applications ranging from dialogue systems to reading comprehension. While questions are often asked with respect to long documents, modeling such long documents poses many challenges. Many existing techniques generate questions by effectively looking at one sentence at a time, leading to questions that are easy and not reflective of the human process of question generation. Our goal is to incorporate interactions across multiple sentences to generate realistic questions for long documents. In order to link a broad document context to the target answer, we represent the relevant context via a multi-stage attention mechanism, which forms the foundation of a sequence-to-sequence model. We outperform state-of-the-art methods on question generation on three question-answering datasets: SQuAD, MS MARCO, and NewsQA.
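A toy version of the multi-stage attention idea, assuming simple dot-product attention: each stage attends over the document with the current query and folds the attended context back into the query. The paper's mechanism is more elaborate.

```python
import torch
import torch.nn.functional as F

def multi_stage_attention(doc, answer, num_stages=2):
    """Iteratively refine a context vector over a long document.

    `doc` is (num_words, dim), `answer` is (dim,). Each stage attends
    over the whole document with the current query, so later stages can
    pick up sentences linked to, but far from, the answer.
    """
    query = answer
    for _ in range(num_stages):
        weights = F.softmax(doc @ query, dim=0)   # (num_words,)
        context = weights @ doc                   # (dim,)
        query = query + context                   # mix context back in
    return query  # e.g. fed to a seq2seq decoder as its initial context
```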


2020 · Vol 54 (2) · pp. 1-23
Author(s): B. Barla Cambazoglu, Mark Sanderson, Falk Scholer, Bruce Croft

Recent years have seen an increase in the number of publicly available datasets released to foster research in question answering systems. In this work, we survey the available datasets and provide a simple, multi-faceted classification of them. We further survey the most recent evaluation results that form the current state of the art in question answering research by exploring related research challenges and associated online leaderboards. Finally, we discuss the existing online challenges and provide a wishlist of datasets whose release could benefit question answering research in the future.


2020 · pp. 103-116
Author(s): Mourad Sarrouti, Said Ouatik El Alaoui

Background and Objective: Open-domain yes/no question answering (QA) is a longstanding challenge that has been widely studied over the last decades. However, it still requires further effort in the biomedical domain. Yes/no QA aims at answering yes/no questions, which seek a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use Stanford CoreNLP to tokenize and part-of-speech tag all passages relevant to a given yes/no question. We then assign each word of the passages a sentiment score based on SentiWordNet. Finally, the decision between the answers “yes” and “no” is based on the aggregate sentiment score of the passages: “yes” for a positive final score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective than the current state-of-the-art method, significantly outperforming it by an average of 15.68% in terms of accuracy.
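The described decision procedure is simple enough to sketch end to end. The sketch below substitutes NLTK for Stanford CoreNLP (an assumption on my part) and naively takes each word's first SentiWordNet sense; the sign of the summed positive-minus-negative scores decides the answer, as in the abstract.

```python
import nltk
from nltk.corpus import sentiwordnet as swn
# requires: nltk.download(...) for 'punkt', 'averaged_perceptron_tagger',
# 'sentiwordnet', and 'wordnet'

def wordnet_pos(penn_tag):
    """Map a Penn Treebank tag to a SentiWordNet POS, if any."""
    if penn_tag.startswith("NN"): return "n"
    if penn_tag.startswith("VB"): return "v"
    if penn_tag.startswith("JJ"): return "a"
    if penn_tag.startswith("RB"): return "r"
    return None

def yes_no_answer(passages):
    """Sum per-word (positive - negative) SentiWordNet scores over all
    relevant passages; the sign of the total decides "yes" or "no"."""
    total = 0.0
    for passage in passages:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(passage)):
            pos = wordnet_pos(tag)
            if pos is None:
                continue
            synsets = list(swn.senti_synsets(word, pos))
            if synsets:
                s = synsets[0]  # crude simplification: first sense only
                total += s.pos_score() - s.neg_score()
    return "yes" if total >= 0 else "no"
```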


Author(s): Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, et al.

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It encourages the model to always predict a more acceptable answer, addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
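A toy rendering of the reattention idea, not the paper's exact formulation: current attention logits are refined with the attention memorized from the previous alignment round before normalization.

```python
import torch
import torch.nn.functional as F

def reattention(q, c, past_attn, gamma=0.5):
    """Refine current attention with temporally memorized past attention.

    q: (m, d) question embeddings, c: (n, d) context embeddings,
    past_attn: (m, n) attention from the previous alignment round.
    Injecting past_attn is meant to counter attention redundancy and
    deficiency across rounds; gamma is a hypothetical mixing weight.
    """
    logits = q @ c.t() / q.size(-1) ** 0.5   # (m, n) current scores
    refined = logits + gamma * past_attn     # fold in the past alignment
    return F.softmax(refined, dim=-1)
```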


Author(s): Sanket Shah, Anand Mishra, Naganand Yadati, Partha Pratim Talukdar

Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap in this paper, and introduce KVQA – the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KG) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KG. Further, we also provide baseline performances using state-of-the-art methods on KVQA.
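To make the multi-hop requirement concrete, here is a toy traversal over a KG stored as a Python dict; KVQA's baselines are learned models, and this merely illustrates the multi-entity, multi-relation, multi-hop lookup that the questions demand.

```python
def multi_hop(kg, start_entities, relations):
    """Toy multi-hop traversal over a KG stored as {(head, rel): {tails}}.

    E.g., answering "Who is the spouse of the person in the image?"
    after entity linking might look like:
        multi_hop(kg, {"Barack_Obama"}, ["spouse"])
    (entity and relation names here are purely illustrative).
    """
    frontier = set(start_entities)
    for rel in relations:                 # one hop per relation
        frontier = {t for h in frontier for t in kg.get((h, rel), set())}
    return frontier
```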

