A review of public datasets in question answering research

2020 ◽  
Vol 54 (2) ◽  
pp. 1-23
Author(s):  
B. Barla Cambazoglu ◽  
Mark Sanderson ◽  
Falk Scholer ◽  
Bruce Croft

Recent years have seen an increase in the number of publicly available datasets that are released to foster research in question answering systems. In this work, we survey the available datasets and also provide a simple, multi-faceted classification of those datasets. We further survey the most recent evaluation results that form the current state of the art in question answering research by exploring related research challenges and associated online leaderboards. Finally, we provide a discussion around the existing online challenges and provide a wishlist of datasets whose release could benefit question answering research in the future.

2019 ◽  
Vol 5 (2) ◽  
pp. 85-94 ◽  
Author(s):  
Mohammed S. Alqahtani ◽  
Abdulsalam Al-Tamimi ◽  
Henrique Almeida ◽  
Glen Cooper ◽  
Paulo Bartolo

Orthoses (exoskeletons and fracture fixation devices) enhance users’ ability to function and improve their quality of life by supporting alignment correction, restoring mobility, providing protection, immobilisation and stabilisation. Ideally, these devices should be personalised to each patient to improve comfort and performance. Production costs have been one of the main constraints for the production of personalised orthoses. However, customisation and personalisation of orthoses are now possible through the use of additive manufacturing. This paper presents the current state of the art of additive manufacturing for the fabrication of orthoses, providing several examples, and discusses key research challenges to be addressed to further develop this field.


2015 ◽  
Vol 3 ◽  
pp. 449-460 ◽  
Author(s):  
Michael Roth ◽  
Mirella Lapata

Frame semantic representations have been useful in several applications ranging from text-to-scene generation, to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task and corresponding models are typically only trained on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features which we motivate based on linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state-of-the-art in FrameNet-based semantic role labeling.


2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Ruoxi Qin ◽  
Linyuan Wang ◽  
Wanting Yu ◽  
...  

In deep-learning image classification, adversarial examples, inputs to which small-magnitude perturbations have been intentionally added, may mislead deep neural networks (DNNs) into incorrect results, which means DNNs are vulnerable to them. Different attack and defense strategies have been proposed to better study the mechanisms of deep learning. However, existing work on these networks addresses only one aspect, either attack or defense. As a result, it is difficult for improvements in offensive and defensive performance to promote each other in the same framework. In this paper, we propose Cycle-Consistent Adversarial GAN (CycleAdvGAN) to generate adversarial examples, which can learn and approximate the distributions of the original instances and adversarial examples, in particular prompting attackers and defenders to confront each other and improve their abilities. For CycleAdvGAN, once the generators G_A and G_D are trained, G_A can generate adversarial perturbations efficiently for any instance, improving the performance of existing attack methods, and G_D can recover clean instances from adversarial examples, defending against existing attack methods. We apply CycleAdvGAN under semi-white-box and black-box settings on two public datasets, MNIST and CIFAR10. Through extensive experiments, we show that our method achieves state-of-the-art adversarial attack performance and also efficiently improves defense ability, integrating adversarial attack and defense in a single framework. In addition, it improves the attack effect when trained only on the adversarial dataset generated by any kind of adversarial attack.


2019 ◽  
Vol 2 (3) ◽  
pp. 175-186 ◽  
Author(s):  
Robin H Lemaire ◽  
Remco S Mannak ◽  
Sonia M Ospina ◽  
Martijn Groenleer

With the growing amount and increasing heterogeneity of research on purpose-oriented networks (PONs) in the public sector, it is imperative to find a way to synthesize this research. Drawing on the varied research perspectives on PONs, we advance the idea of paradigm interplay and meta-synthesis as aspirations for the field and argue this is especially key if we want the study of PONs to inform practice. However, we recognize several challenges in the current state of the PON research that prevent the field from making strides in paradigm interplay and meta-synthesis. We discuss six challenges which we consider the most critical: different labels, differences across research foci, variation in measurement, the nestedness of networks, the dynamism of networks, and variation in the network context. We suggest six good research practices that could contribute to overcoming the challenges now so as to make integration of the research field more of a possibility in the future.


2020 ◽  
Vol 34 (05) ◽  
pp. 8082-8090
Author(s):  
Tushar Khot ◽  
Peter Clark ◽  
Michal Guerquin ◽  
Peter Jansen ◽  
Ashish Sabharwal

Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags by 20% behind human performance.


Semantic Web ◽  
2021 ◽  
pp. 1-17
Author(s):  
Lucia Siciliani ◽  
Pierpaolo Basile ◽  
Pasquale Lops ◽  
Giovanni Semeraro

Question Answering (QA) over Knowledge Graphs (KG) aims to develop a system that is capable of answering users’ questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata, and so on. Question Answering systems need to translate the user’s question, written in natural language, into a query formulated in a specific data query language that is compliant with the underlying KG. This translation process is already non-trivial when trying to answer simple questions that involve a single triple pattern. It becomes even more troublesome when trying to cope with questions that require modifiers in the final query, i.e., aggregate functions, query forms, and so on. Attention to this last aspect is growing, but it has never been thoroughly addressed by the existing literature. Starting from the latest advances in this field, we aim to take a further step in this direction. This work aims to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems.


Author(s):  
Inma Mendoza García

In the context of Translation Studies, this paper presents a proposal for classifying culturally marked translation units from a functional dynamic perspective that is considered to be more useful for both translation practice and translation-related research than other taxonomies so far suggested by the majority of theorists. For this purpose, first I provide an overview of the current state of the art in research on these specific translation units with regard to their designation, concept and classification. Second, I conduct a critical analysis of the heterogeneity of designations and definitions as well as of the static taxonomies so far prevailing in scientific literature in this respect. Third, I select a designation for these sorts of units and justify the decision made. Fourth, I provide a detailed description of the concept and its nature. Finally, I design a classificatory model that is not based on a mere classification of culture-related areas and topics but takes into account all the intratextual and extratextual factors involved in the translation process. The proposal put forward is guided by two main parameters: the degree of linguistic and cultural (in)equivalence between the source system and the target system and the level of knowledge the reader is supposed to possess about the culturally marked textual units.


Author(s):  
Yu Wang ◽  
Hongxia Jin

In this paper, we present a multi-step coarse to fine question answering (MSCQA) system which can efficiently process documents with different lengths by choosing appropriate actions. The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets mainly containing either short or long documents, our multi-step coarse to fine model combines the merits of multiple system modules and can handle both short and long documents. The system hence obtains a much better accuracy and faster training speed compared to the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to the baselines using state-of-the-art models.


Author(s):  
Mohan John Blooma ◽  
Jayan Chirayath Kurian

Social Question Answering (SQA) services are emerging as a valuable information resource that is rich not only in the expertise of the user community but also in their interactions and insights. The next generation SQA services are challenged on many fronts, including but not limited to: massive, heterogeneous, and streaming collections, diverse and challenging users, and the need to be sensitive to context and ambiguity. However, scholarly inquiries have yet to dovetail into a composite research stream where techniques gleaned from various research domains could be used for harnessing the information richness in SQA services to address these challenges. This chapter first explores the SQA domain by understanding the service and its modules, and then investigating previous studies that were conducted in this domain. This chapter then compares SQA services with traditional question answering systems to identify possible research challenges. Finally, new directions in SQA are proposed.


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Wojciech Wieczorek ◽  
Olgierd Unold

The present paper is a novel contribution to the field of bioinformatics by using grammatical inference in the analysis of data. We developed an algorithm for generating star-free regular expressions which turned out to be good recommendation tools, as they are characterized by a relatively high correlation coefficient between the observed and predicted binary classifications. The experiments have been performed for three datasets of amyloidogenic hexapeptides, and our results are compared with those obtained using the graph approaches, the current state-of-the-art methods in heuristic automata induction, and the support vector machine. The results showed the superior performance of the new grammatical inference algorithm on fixed-length amyloid datasets.

