Academic Reader: An Interactive Question Answering System on Academic Literatures

Author(s):  
Yining Hong ◽  
Jialu Wang ◽  
Yuting Jia ◽  
Weinan Zhang ◽  
Xinbing Wang

We present Academic Reader, a system that reads academic literature and answers relevant questions for researchers. Academic Reader leverages machine reading comprehension techniques, which have been successfully applied in many fields but not yet to academic literature reading. An interactive platform demonstrates the functions of Academic Reader: pieces of academic literature and relevant questions are input to the system, which then outputs answers. The system can also gather users’ revised answers and perform active learning to continuously improve its performance. A case study on all papers accepted at KDD 2018 demonstrates how our system facilitates massive academic literature reading.
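The abstract does not detail how revised answers feed back into training; the following is a minimal sketch of such a revise-and-collect loop, in which user corrections accumulate until a retraining pass is triggered. All names here (`AnswerRecord`, `ActiveLearningQueue`, `retrain_threshold`) are illustrative, not from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnswerRecord:
    question: str
    model_answer: str
    revised_answer: Optional[str] = None  # filled in when a user corrects the model

class ActiveLearningQueue:
    """Collects model answers and user revisions for later fine-tuning."""

    def __init__(self, retrain_threshold: int = 2):
        self.records: list = []
        self.retrain_threshold = retrain_threshold

    def log(self, question: str, model_answer: str) -> AnswerRecord:
        rec = AnswerRecord(question, model_answer)
        self.records.append(rec)
        return rec

    def revise(self, rec: AnswerRecord, corrected: str) -> bool:
        """Store a user correction; report whether enough revisions have
        accumulated to warrant a fine-tuning pass on the revised pairs."""
        rec.revised_answer = corrected
        revised = [r for r in self.records if r.revised_answer is not None]
        return len(revised) >= self.retrain_threshold

queue = ActiveLearningQueue()
r1 = queue.log("Which venue's papers are read?", "KDD 2017 papers")
r2 = queue.log("Which technique is applied?", "MRC")
first = queue.revise(r1, "KDD 2018 papers")   # one correction: not enough yet
ready = queue.revise(r2, "machine reading comprehension")
```

In a full system, the revised pairs would be used as additional supervised examples for the underlying MRC model.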

2021 ◽  
Vol 2050 (1) ◽  
pp. 012002
Author(s):  
Qian Shang ◽  
Ming Xu ◽  
Bin Qin ◽  
Pengbin Lei ◽  
Junjian Huang

Abstract The question answering (Q&A) system is important for accelerating the adoption of artificial intelligence. This paper improves a Q&A system that uses the retrieval plus machine reading comprehension (MRC) method. In the retrieval phase, we use BM25 to recall documents and split them into paragraphs, then reorder the paragraphs by their relevance to the question, reducing the number of recalled paragraphs and improving the speed of MRC. In the MRC stage, we design a multi-task MRC structure that can judge whether a paragraph contains an answer and locate the answer accurately. In addition, we modify the loss function to fit the sparse labels during training. Experiments on multiple datasets verify the effectiveness of the improved system.
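The recall-then-rerank step above can be sketched with a small self-contained BM25 scorer: paragraphs are ranked against the question, and only the top few are passed to the (much slower) MRC reader. The scoring formula is the standard Okapi BM25; the toy paragraphs and the cutoff of two are illustrative choices, not the paper's settings.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

paragraphs = [
    "bm25 ranks documents by term frequency".split(),
    "reading comprehension locates the answer span".split(),
    "cooking recipes for pasta".split(),
]
query = "answer span reading".split()

scores = bm25_scores(query, paragraphs)
ranked = sorted(range(len(paragraphs)), key=scores.__getitem__, reverse=True)
top_paragraphs = [paragraphs[i] for i in ranked[:2]]  # only these reach the MRC stage
```

Reranking at paragraph granularity keeps the reader's input short, which is where the speed gain in the abstract comes from.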


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Changchang Zeng ◽  
Shaobo Li

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. It has wide application potential in fields such as question answering robots and human-computer interaction in mobile virtual reality systems. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. With the development of training objectives, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. Different MLMs mask tokens of different lengths. Similarly, different machine reading comprehension tasks have answers of different lengths: an answer is often a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us in pretraining an MLM with a suitable mask length distribution for a given MRC task. In this paper, we try to uncover how much of MLM’s success in machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer lengths in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely, the short span extraction task, long span extraction task, short multiple-choice cloze task, and long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis is true: on four different machine reading comprehension datasets, the model whose masking length distribution correlates with the answer length distribution surpasses the model without this correlation.
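The masking variants named above differ mainly in how long the masked span is. A minimal sketch of length-controlled span masking (the function name and fixed seed are illustrative):

```python
import random

def span_mask(tokens, mask_len, num_spans=1, mask_token="[MASK]", seed=0):
    """Replace `num_spans` contiguous spans of `mask_len` tokens with [MASK].
    mask_len=1 mimics word-level masking; larger values mimic the
    phrase- and span-masking MLM variants."""
    rng = random.Random(seed)
    out = list(tokens)
    for _ in range(num_spans):
        start = rng.randrange(0, len(out) - mask_len + 1)
        out[start:start + mask_len] = [mask_token] * mask_len
    return out

sent = "pretrained models changed machine reading comprehension research".split()
word_masked = span_mask(sent, mask_len=1)  # word-level masking
span_masked = span_mask(sent, mask_len=3)  # phrase/span-level masking
```

Pretraining with a mask length distribution matched to the target task's answer lengths is exactly the knob the paper's ablations vary.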


2020 ◽  
Author(s):  
Marie-Anne Xu ◽  
Rahul Khanna

Recent progress in machine reading comprehension and question answering has allowed machines to reach and even surpass human performance. However, the majority of these questions have only one answer, and more substantial testing on questions with multiple answers, or multi-span questions, has not yet been applied. Thus, we introduce a newly compiled dataset consisting of questions with multiple answers that originate from previously existing datasets. In addition, we run BERT-based models pre-trained for question answering on our constructed dataset to evaluate their reading comprehension abilities. Among the three BERT-based models we ran, RoBERTa exhibits the highest consistent performance, regardless of size. We find that all our models perform similarly on this new multi-span dataset (21.492% F1) compared to the single-span source datasets (~33.36% F1). While the models tested on the source datasets were slightly fine-tuned, performance is similar enough to judge that task formulation does not drastically affect question-answering abilities. Our evaluations indicate that these models are indeed capable of adjusting to answer questions that require multiple answers. We hope that our findings will assist future development in question answering and improve existing question-answering products and methods.
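Scoring multi-span answers requires aggregating the usual SQuAD-style token F1 over several predicted and gold spans. One reasonable aggregation, sketched below, greedily matches each predicted span to its best remaining gold span and averages over the larger set; this matching scheme is an illustrative assumption, not necessarily the paper's exact metric.

```python
from collections import Counter

def token_f1(pred, gold):
    """SQuAD-style token-overlap F1 between two answer strings."""
    p, g = pred.lower().split(), gold.lower().split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def multi_span_f1(pred_spans, gold_spans):
    """Greedy one-to-one matching of predicted to gold spans, averaged
    over the larger of the two sets so both missed and spurious spans
    are penalized."""
    scores = []
    remaining = list(gold_spans)
    for p in pred_spans:
        if not remaining:
            scores.append(0.0)   # spurious prediction, nothing left to match
            continue
        best = max(remaining, key=lambda g: token_f1(p, g))
        scores.append(token_f1(p, best))
        remaining.remove(best)
    scores.extend(0.0 for _ in remaining)   # unmatched gold spans count as misses
    return sum(scores) / max(len(pred_spans), len(gold_spans))

score = multi_span_f1(["New York", "Los Angeles"], ["new york", "los angeles"])
```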


Author(s):  
Tianyong Hao ◽  
Feifei Xu ◽  
Jingsheng Lei ◽  
Liu Wenyin ◽  
Qing Li

A strategy of automatic answer retrieval for repeated or similar questions in user-interactive systems by employing semantic question patterns is proposed in this paper. A semantic question pattern is a generalized representation of a group of questions with both similar structure and relevant semantics. Specifically, it attaches semantic annotations (or constraints) to the variable components in the pattern, which enriches the semantic representation and greatly reduces the ambiguity of a question instance asked using such a pattern. The proposed method consists of four major steps: structure processing; similar pattern matching and filtering; automatic pattern generation; and question similarity evaluation with answer retrieval. Preliminary experiments in a real question answering system show that the method achieves a precision of more than 90%.
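The idea of a pattern whose variable components carry semantic constraints can be illustrated with a tiny matcher. The `[Name:Type]` slot syntax, the type lexicon, and the regex-based matching below are all hypothetical stand-ins for the paper's actual representation.

```python
import re

# Hypothetical slot syntax: [Name:Type] marks a variable component whose
# filler must satisfy the semantic constraint Type.
SEMANTIC_TYPES = {
    "City": {"beijing", "shanghai", "london"},
    "Person": {"alice", "bob"},
}

def compile_pattern(pattern):
    """Turn a pattern string into a regex plus the list of (name, type) slots."""
    regex, slots = "", []
    for part in re.split(r"(\[\w+:\w+\])", pattern):
        m = re.fullmatch(r"\[(\w+):(\w+)\]", part)
        if m:
            slots.append((m.group(1), m.group(2)))
            regex += r"(\w+)"
        else:
            regex += re.escape(part)
    return re.compile(regex, re.IGNORECASE), slots

def match_question(pattern, question):
    """Match a question instance against a pattern, requiring each captured
    value to satisfy its slot's semantic constraint."""
    compiled, slots = compile_pattern(pattern)
    m = compiled.fullmatch(question)
    if not m:
        return None
    bindings = {}
    for value, (name, slot_type) in zip(m.groups(), slots):
        if value.lower() not in SEMANTIC_TYPES[slot_type]:
            return None   # structure matches, but the semantics do not
        bindings[name] = value
    return bindings

ok = match_question("what is the weather in [X:City]", "what is the weather in Beijing")
bad = match_question("what is the weather in [X:City]", "what is the weather in Bob")
```

The second call fails on the semantic constraint even though the surface structure matches, which is precisely the ambiguity reduction the abstract describes.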


2020 ◽  
Vol 34 (10) ◽  
pp. 13987-13988
Author(s):  
Xuanyu Zhang ◽  
Zhichun Wang

Most models for machine reading comprehension (MRC) focus on recurrent neural networks (RNNs) and attention mechanisms, though convolutional neural networks (CNNs) are also used for time efficiency. However, little attention has been paid to combining CNNs and RNNs in MRC. For a deeper understanding, humans sometimes need local information for short phrases and sometimes need global context for long passages. In this paper, we propose a novel architecture, Rception, to capture and leverage both local deep information and global wide context. It fuses different kinds of networks and hyper-parameters horizontally rather than simply stacking them layer by layer vertically. Experiments on the Stanford Question Answering Dataset (SQuAD) show that our proposed architecture achieves good performance.
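The local-versus-global contrast can be made concrete with a deliberately simplified, dependency-free analogy: a convolution-like moving window supplies each position with local information, a sequence-wide summary supplies global context, and the two views are concatenated per position rather than stacked. This is only an analogy for the horizontal-fusion idea, not Rception's actual architecture.

```python
def local_features(seq, window=3):
    """Convolution-like moving average: each position sees only nearby values."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        neighborhood = seq[max(0, i - half): i + half + 1]
        out.append(sum(neighborhood) / len(neighborhood))
    return out

def global_feature(seq):
    """RNN-like summary of the whole sequence (here: a simple mean)."""
    return sum(seq) / len(seq)

def fuse(seq, window=3):
    """Horizontal fusion: concatenate the local and global views at each
    position, instead of feeding one network's output into the other."""
    g = global_feature(seq)
    return [(loc, g) for loc in local_features(seq, window)]

fused = fuse([1.0, 2.0, 3.0, 4.0, 5.0])
```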


2021 ◽  
Author(s):  
Samreen Ahmed ◽  
Shakeel Khoja

<p>In recent years, low-resource Machine Reading Comprehension (MRC) has made significant progress, with models achieving remarkable performance on various language datasets. However, none of these models have been customized for the Urdu language. This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu RC worksheets from Cambridge O-level books. UQuAD1.0 is a large-scale Urdu dataset intended for extractive machine reading comprehension tasks, consisting of 49k question-answer pairs in question, passage, and answer format. In UQuAD1.0, 45,000 QA pairs were generated by machine translation of the original SQuAD1.0 and approximately 4,000 pairs via crowdsourcing. In this study, we used two types of MRC models: a rule-based baseline and advanced Transformer-based models. We found that the latter outperforms the former; thus, we concentrate solely on Transformer-based architectures. Using XLM-RoBERTa and multilingual BERT, we acquire F<sub>1</sub> scores of 0.66 and 0.63, respectively.</p>
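Extractive datasets in the SQuAD family store each answer with its character offset into the passage, so machine-translated or crowdsourced pairs must be re-aligned to the (translated) context. A minimal sketch of building one such record; the helper name and the Urdu-themed toy example are illustrative, not taken from UQuAD1.0.

```python
def make_squad_entry(title, passage, question, answer):
    """Build one SQuAD-v1-style record. `answer_start` is the character
    offset of the answer inside the passage, as extractive MRC expects;
    entries whose translated answer no longer appears verbatim in the
    translated passage must be repaired or dropped."""
    start = passage.find(answer)
    if start == -1:
        raise ValueError("answer must appear verbatim in the passage")
    return {
        "title": title,
        "paragraphs": [{
            "context": passage,
            "qas": [{
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
            }],
        }],
    }

entry = make_squad_entry(
    "Lahore",
    "Lahore is the capital of Punjab.",
    "What is Lahore the capital of?",
    "Punjab",
)
```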


2020 ◽  
Vol 34 (05) ◽  
pp. 8010-8017 ◽  
Author(s):  
Di Jin ◽  
Shuyang Gao ◽  
Jiun-Yu Kao ◽  
Tagyoung Chung ◽  
Dilek Hakkani-tur

Machine Reading Comprehension (MRC) for question answering (QA), which aims to answer a question given the relevant context passages, is an important way to test the ability of intelligent systems to understand human language. Multiple-Choice QA (MCQA) is one of the most difficult tasks in MRC because, compared to the extractive counterpart where answers are usually spans of text within given passages, it often requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations. Moreover, most existing MCQA datasets are small in size, making the task even harder. We introduce MMM, a Multi-stage Multi-task learning framework for Multi-choice reading comprehension. Our method involves two sequential stages: a coarse-tuning stage using out-of-domain datasets and a multi-task learning stage using a larger in-domain dataset, which helps the model generalize better with limited data. Furthermore, we propose a novel multi-step attention network (MAN) as the top-level classifier for this task. We demonstrate that MMM significantly advances the state of the art on four representative MCQA datasets.
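The two sequential stages can be sketched as a training schedule: first a block of steps on the out-of-domain data, then a mixture over the in-domain tasks. Sampling tasks in proportion to dataset size is a common multi-task heuristic and an assumption here; the task names and step counts are illustrative.

```python
import random

def multi_task_schedule(task_sizes, steps, seed=0):
    """Sample one task per step, with probability proportional to dataset
    size (a common mixing heuristic; the paper's scheme may differ)."""
    rng = random.Random(seed)
    tasks, weights = zip(*task_sizes.items())
    return [rng.choices(tasks, weights=weights)[0] for _ in range(steps)]

def train(model_step, out_of_domain, in_domain_tasks, coarse_steps, fine_steps):
    """Stage 1: coarse-tune on out-of-domain data.
    Stage 2: multi-task fine-tune on the in-domain task mixture."""
    for _ in range(coarse_steps):
        model_step(out_of_domain)
    for task in multi_task_schedule(in_domain_tasks, fine_steps):
        model_step(task)

log = []  # stand-in for a real optimizer step: just record which data was used
train(log.append, "out_of_domain_mcqa", {"race": 80, "dream": 20},
      coarse_steps=3, fine_steps=5)
```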


2009 ◽  
Vol 15 (1) ◽  
pp. 73-95 ◽  
Author(s):  
S. QUARTERONI ◽  
S. MANANDHAR

Abstract Interactive question answering (QA), where a dialogue interface enables follow-up and clarification questions, is a recent although long-advocated field of research. We report on the design and implementation of YourQA, our open-domain, interactive QA system. YourQA relies on a Web search engine to obtain answers to both fact-based and complex questions, such as descriptions and definitions. We describe the dialogue moves and management model making YourQA interactive, and discuss the architecture, implementation and evaluation of its chat-based dialogue interface. Our Wizard-of-Oz study and final evaluation results show how the designed architecture can effectively achieve open-domain, interactive QA.
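A dialogue-move model of this kind is often implemented as a small state machine over legal move transitions. The move inventory below (greet, answer, clarify, follow_up, end) is an illustrative assumption, not YourQA's actual move set.

```python
# Legal successor moves for each dialogue state (hypothetical inventory).
MOVES = {
    "start":     {"greet"},
    "greet":     {"answer", "clarify"},
    "answer":    {"follow_up", "clarify", "end"},
    "clarify":   {"answer"},
    "follow_up": {"answer", "clarify", "end"},
}

class DialogueManager:
    """Tracks the current dialogue state and rejects illegal moves,
    e.g. a follow-up before any answer has been given."""

    def __init__(self):
        self.state = "start"
        self.history = []

    def move(self, next_move):
        if next_move not in MOVES.get(self.state, set()):
            raise ValueError(f"illegal move {next_move!r} from state {self.state!r}")
        self.history.append(next_move)
        self.state = next_move

dm = DialogueManager()
for m in ["greet", "clarify", "answer", "follow_up", "answer"]:
    dm.move(m)
```

Allowing `clarify` to interpose before `answer` is what makes the loop interactive rather than single-shot.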


