Developing MCQA Framework for Basic Science Subjects using Distributed Similarity Model and Classification Based Approaches

Author(s):  
Sandip Sarkar ◽  
Dipankar Das ◽  
Partha Pakray ◽  
David Eduardo Pinto Avendano

In this paper, we propose a novel approach to improving the performance of a multiple choice question answering (MCQA) system using distributed semantic similarity and a classification approach. We mainly focus on science-based MCQs, which are particularly difficult to handle. Our proposed method is based on the hypothesis that, in a distributional semantic model, the relatedness between a question and its correct answer will be higher than the relatedness between the question and the other options. We use the IJCNLP Shared Task 5 and SciQ datasets for our experiments. We build three models (Model 1, Model 2, and Model 3) based on the dataset formats. The basic difference between the IJCNLP Task 5 and SciQ datasets is that the SciQ dataset provides supporting text with each question, whereas the IJCNLP Task 5 dataset does not. Model 1 and Model 2 are mainly built for the IJCNLP Task 5 dataset, whereas Model 3 is mainly built for the SciQ dataset. Model 2 is designed to handle dependencies between options (e.g., "all of these", "two of them", "none of them"), whereas Model 1 is the basic MCQA model and cannot capture such dependencies. We also compare results on the SciQ dataset with supporting text (Model 3) and without it (Model 1), and we compare our system with other existing methods. Although the performance of our proposed method is not satisfactory in some cases, the system is simple and robust, which allows it to be integrated more easily into complex applications. This work investigates different techniques for choosing the correct answer to a given question in an MCQA system, and these experiments may therefore be useful for improving the performance of current science-based question answering (QA) systems. On the IJCNLP Task 5 dataset we achieve 44.5% using Model 2 and the PubMed dataset; similarly, on the SciQ dataset we achieve 82.25% using Model 3 and the PubMed dataset.
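As an illustration of the distributional-similarity hypothesis above, the sketch below scores each option by the cosine similarity between averaged word embeddings of the question and of the option and picks the highest-scoring one. The `word_vectors` lookup, `embed`, and `answer_mcq` names are illustrative placeholders rather than the paper's implementation; in the described setup the embeddings would come from a model trained on a corpus such as PubMed.

```python
import numpy as np

# Placeholder lookup: word -> vector, e.g. loaded from word2vec trained on PubMed.
word_vectors = {}

def embed(text, dim=300):
    """Average the word vectors of a text; unknown words are skipped."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def answer_mcq(question, options):
    """Return the option whose embedding is closest to the question's (Model 1 style)."""
    q = embed(question)
    scores = [cosine(q, embed(opt)) for opt in options]
    return options[int(np.argmax(scores))]
```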

Author(s):  
Cao Liu ◽  
Shizhu He ◽  
Kang Liu ◽  
Jun Zhao

Because they return responses in natural language, natural answers are favored in real-world Question Answering (QA) systems. Generative models learn to automatically generate natural answers from large-scale question-answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum learning based framework for natural answer generation (CL-NAG), which is able to take full advantage of the valuable learning data in a noisy and uneven-quality corpus. Specifically, we employ two practical measures to automatically assess the quality (complexity) of QA-pairs. Based on these measurements, CL-NAG first uses simple and low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers with richer content and more complete syntax from more complex and higher-quality QA-pairs. In this way, all valuable information in the noisy and uneven-quality corpus can be fully exploited. Experiments demonstrate that CL-NAG outperforms the state of the art, improving accuracy by 6.8% and 8.7% for simple and complex questions, respectively.
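A minimal sketch of the curriculum schedule described above, assuming a placeholder `quality_score` heuristic and a generic `train_one_epoch` training step in place of the paper's two quality measures and its answer generator: QA-pairs are ranked by score and the training set grows stage by stage from simple/low-quality to complex/high-quality pairs.

```python
def quality_score(qa_pair):
    # Placeholder heuristic: answer length stands in for the paper's quality/complexity measures.
    return len(qa_pair["answer"].split())

def curriculum_train(model, qa_pairs, train_one_epoch, n_stages=3, epochs_per_stage=1):
    """Train in stages, growing the corpus from easy/low-quality to hard/high-quality pairs."""
    ranked = sorted(qa_pairs, key=quality_score)
    for stage in range(1, n_stages + 1):
        cutoff = int(len(ranked) * stage / n_stages)  # each stage adds the next, harder slice
        subset = ranked[:cutoff]
        for _ in range(epochs_per_stage):
            train_one_epoch(model, subset)  # placeholder for one pass of generator training
    return model
```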


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Jens Nevens ◽  
Paul Van Eecke ◽  
Katrien Beuls

In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.
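The construction-grammar parsing step itself is not reproduced here; the toy sketch below only illustrates the third capability, executing a CLEVR-style functional program (the question's semantic representation) against a symbolic scene. The scene and program formats are simplified assumptions, not the system's actual representations.

```python
# Toy symbolic scene: each object is a dict of attributes.
scene = [
    {"color": "red", "shape": "cube", "size": "large"},
    {"color": "blue", "shape": "sphere", "size": "small"},
    {"color": "red", "shape": "sphere", "size": "small"},
]

def execute(program, scene):
    """Run a list of (operation, argument) steps over the scene."""
    state = list(scene)
    for op, arg in program:
        if op == "filter_color":
            state = [o for o in state if o["color"] == arg]
        elif op == "filter_shape":
            state = [o for o in state if o["shape"] == arg]
        elif op == "count":
            state = len(state)
    return state

# "How many red spheres are there?" expressed as a functional program
program = [("filter_color", "red"), ("filter_shape", "sphere"), ("count", None)]
print(execute(program, scene))  # -> 1
```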


2020 ◽  
Vol 34 (04) ◽  
pp. 5182-5190
Author(s):  
Pasquale Minervini ◽  
Matko Bošnjak ◽  
Tim Rocktäschel ◽  
Sebastian Riedel ◽  
Edward Grefenstette

Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper, we first propose Greedy NTPs (GNTPs), an extension to NTPs that addresses their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining models that are orders of magnitude more efficient. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models.
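A hedged sketch of the core efficiency idea in GNTPs: rather than attempting soft unification of a goal with every fact in the KB, only the top-k facts whose embeddings are nearest to the goal embedding are expanded during proof search. The embeddings and the helper below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def top_k_facts(goal_embedding, fact_embeddings, k=5):
    """Indices of the k facts most similar to the goal embedding (cosine similarity)."""
    goal = goal_embedding / (np.linalg.norm(goal_embedding) + 1e-9)
    facts = fact_embeddings / (np.linalg.norm(fact_embeddings, axis=1, keepdims=True) + 1e-9)
    scores = facts @ goal
    return np.argsort(-scores)[:k]

# Only these k candidates are expanded in the differentiable proof tree,
# keeping the dynamically constructed computation graph small.
```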


2018 ◽  
Vol 25 (1) ◽  
pp. 5-41
Author(s):  
PRESLAV NAKOV ◽  
LLUÍS MÀRQUEZ ◽  
ALESSANDRO MOSCHITTI ◽  
HAMDY MUBARAK

We analyze resources and models for Arabic community Question Answering (cQA). In particular, we focus on CQA-MD, our cQA corpus for Arabic in the domain of medical forums. We describe the corpus and the main challenges it poses due to its mix of informal and formal language and of different Arabic dialects, as well as due to its medical nature. We further present a shared task on cQA at SemEval, the International Workshop on Semantic Evaluation, based on this corpus. We discuss the features and the machine learning approaches used by the teams who participated in the task, with a focus on the models that exploit syntactic information using convolutional tree kernels and neural word embeddings. We further analyze and extend the outcome of the SemEval challenge by training a meta-classifier that combines the output of several systems, which allows us to compare different features and different learning algorithms in an indirect way. Finally, we analyze the most frequent errors common to all approaches, categorizing them into prototypical cases, and zooming into the way syntactic information in tree kernel approaches can help solve some of the most difficult cases. We believe that our analysis and the lessons learned from the process of corpus creation, as well as from the shared task analysis, will be helpful for future research on Arabic cQA.
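A minimal sketch of the meta-classifier idea, assuming a stacked feature representation in which each candidate answer is described by the confidence scores of the participating systems; the data and classifier choice are illustrative, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: candidate answers; columns: confidence score assigned by each participating system.
system_scores = np.array([
    [0.9, 0.7, 0.8],
    [0.2, 0.4, 0.1],
    [0.6, 0.8, 0.7],
    [0.1, 0.3, 0.2],
])
labels = np.array([1, 0, 1, 0])  # gold relevance of each candidate answer

meta = LogisticRegression().fit(system_scores, labels)
print(meta.predict_proba(system_scores)[:, 1])  # combined relevance scores
```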


2015 ◽  
Author(s):  
Yongshuai Hou ◽  
Cong Tan ◽  
Xiaolong Wang ◽  
Yaoyun Zhang ◽  
Jun Xu ◽  
...  

Author(s):  
Thanh Thi Ha ◽  
Atsuhiro Takasu ◽  
Thanh Chinh Nguyen ◽  
Kiem Hieu Nguyen ◽  
Van Nha Nguyen ◽  
...  

Answer selection is an important task in Community Question Answering (CQA). In recent years, attention-based neural networks have been extensively studied in various natural language processing problems, including question answering. This paper explores matchLSTM for answer selection in CQA. The lexical gap in CQA is more challenging because questions and answers typically contain multiple sentences, irrelevant information, and noisy expressions. In our investigation, the word-by-word attention of the original model does not work well on social question-answer pairs. We propose integrating supervised attention into matchLSTM. Specifically, we leverage lexical-semantic information from external resources to guide the learning of attention weights for question-answer pairs. The proposed model learns more meaningful attention weights, which allows it to perform better than the basic model. Our performance is among the top results on the SemEval datasets.
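A hedged sketch of the supervised-attention idea, assuming a PyTorch setting: an auxiliary loss pushes the model's word-by-word attention weights toward alignment targets derived from an external lexical-semantic resource. The matchLSTM encoder itself is omitted, and the tensor names are placeholders.

```python
import torch
import torch.nn.functional as F

def supervised_attention_loss(attn_weights, target_alignment, eps=1e-9):
    """KL divergence between predicted attention and externally derived alignment targets.

    attn_weights:     (batch, q_len, a_len), softmax-normalised over the answer dimension
    target_alignment: (batch, q_len, a_len), normalised alignment distribution from a
                      lexical-semantic resource (placeholder)
    """
    return F.kl_div((attn_weights + eps).log(), target_alignment, reduction="batchmean")

# The auxiliary term is added to the usual answer-selection loss:
# total_loss = selection_loss + lambda_attn * supervised_attention_loss(attn, target)
```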


Author(s):  
Manvi Breja

User profiling is one of the main issues faced while implementing an efficient question answering system: a user profile is built from the questions posed by the user, capturing their domain of interest. This paper presents a method for predicting the next questions related to the initial question that a user submits to a question answering search engine. A novel approach based on association rule mining is highlighted, in which information is extracted from the log of questions previously submitted to the question answering search engine, and association rule mining algorithms predict the set of questions that the user will pose to the system in the next session. Using this approach, the question answering system keeps the answers to the predicted questions in its repository, providing a speedy response to the user and thus increasing the efficiency of the system.
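A minimal sketch of the log-mining step, assuming sessions from the question log are treated as transactions and simple pairwise rules (question A implies question B) are kept when they meet support and confidence thresholds; the session format and thresholds are illustrative assumptions.

```python
from itertools import permutations
from collections import Counter

# Toy question log: each session is the set of questions a user asked.
sessions = [
    {"what is AI", "what is machine learning"},
    {"what is AI", "what is machine learning", "what is deep learning"},
    {"what is AI", "what is deep learning"},
]

def mine_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return pairwise rules (A -> B, support, confidence) that pass both thresholds."""
    n = len(sessions)
    item_counts = Counter(q for s in sessions for q in s)
    pair_counts = Counter(p for s in sessions for p in permutations(s, 2))
    rules = []
    for (a, b), cnt in pair_counts.items():
        support, confidence = cnt / n, cnt / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

# Rules like ("what is AI" -> "what is machine learning") let the system
# pre-fetch answers to the questions a user is likely to ask next.
print(mine_rules(sessions))
```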


2017 ◽  
Vol 11 (03) ◽  
pp. 345-371
Author(s):  
Avani Chandurkar ◽  
Ajay Bansal

With the inception of the World Wide Web, the amount of data present on the Internet has become tremendous, which makes navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate this wealth of information, the need for an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free-form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than the simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional Information Retrieval (IR) and Natural Language Processing (NLP) approaches. The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia, and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.
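A hedged sketch of retrieving a fact from the structured DBpedia knowledge base via SPARQL, which is the kind of lookup such a system's answer-extraction step relies on; the query and entity are illustrative, and the paper's IR/NLP pipeline for mapping a TREC question onto such a query is not reproduced here.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query the public DBpedia endpoint for the birthplace of an example entity.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?birthPlace WHERE {
        dbr:Alan_Turing dbo:birthPlace ?birthPlace .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["birthPlace"]["value"])
```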

