TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

2020 ◽  
Vol 34 (05) ◽  
pp. 7780-7788
Author(s):  
Siddhant Garg ◽  
Thuy Vu ◽  
Alessandro Moschitti

We propose TandA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, a well-known inference task in Question Answering. We built a large-scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%. We empirically show that TandA generates more stable and robust models, reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TandA makes the adaptation step more robust to noise, which enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TandA in an industrial setting, using domain-specific datasets subject to different types of noise.
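A minimal sketch of the two-step recipe described above (transfer on a large, general answer-selection dataset, then adapt on the target data), written against the Hugging Face transformers Trainer API. The checkpoint name, toy question–candidate pairs, and hyper-parameters are illustrative assumptions, not the authors' configuration.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class PairDataset(torch.utils.data.Dataset):
    """(question, candidate sentence, label) pairs encoded for a cross-encoder."""
    def __init__(self, pairs, tokenizer):
        questions = [q for q, _, _ in pairs]
        candidates = [s for _, s, _ in pairs]
        self.enc = tokenizer(questions, candidates, truncation=True,
                             padding="max_length", max_length=128)
        self.labels = [y for _, _, y in pairs]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

def fine_tune(model, dataset, output_dir, epochs):
    """One fine-tuning pass; run once for the transfer step, once for adaptation."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

# Toy stand-ins: the paper's transfer set is a large answer-selection corpus built
# from Natural Questions; the adapt set is the target benchmark (e.g. WikiQA).
transfer_data = PairDataset(
    [("who wrote hamlet", "Hamlet was written by William Shakespeare.", 1),
     ("who wrote hamlet", "Hamlet is set in Denmark.", 0)], tok)
target_data = PairDataset(
    [("what is the capital of france", "Paris is the capital of France.", 1),
     ("what is the capital of france", "France borders Spain.", 0)], tok)

model = fine_tune(model, transfer_data, "tanda-transfer", epochs=1)  # step 1: transfer
model = fine_tune(model, target_data, "tanda-adapt", epochs=1)       # step 2: adapt
```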

2020 ◽  
Author(s):  
Kanji Tanaka

Fine-tuning a deep convolutional neural network (DCN) as a place-class detector (PCD) is a direct way to realize domain-adaptive visual place recognition (VPR). Although effective, a PCD model requires a considerable amount of class-specific training examples and class-set maintenance in long-term, large-scale VPR scenarios. We therefore propose to employ a DCN as a landmark-class detector (LCD), which makes it possible to distinguish an exponentially large number of different places by combining multiple landmarks and, furthermore, to select stable parts of the scene (such as buildings) as landmark classes, reducing the need for class-set maintenance. However, two important questions remain: (1) How should we mine such training examples (landmark objects) when no domain-specific object detector is available? (2) How should we fine-tune the architecture and parameters of the DCN to a new domain-specific landmark set? To answer these questions, we present a self-supervised landmark mining approach for collecting pseudo-labeled landmark examples, and then consider network architecture search (NAS) on the LCD task, which has a significantly larger search space than typical NAS applications such as PCD. Extensive verification experiments demonstrate the superiority of the proposed framework over previous LCD methods with hand-crafted architectures and/or non-adaptive parameters, as well as a 90% reduction in NAS cost compared with a naive NAS implementation.
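A minimal sketch of the landmark-class-detector idea in the abstract: each place is represented by the bag of landmark classes detected in its view, so combining multiple landmarks can discriminate a very large number of places. The toy detector output, class IDs, and similarity measure are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter

def place_signature(landmark_class_ids):
    """Bag of landmark classes detected in one view of a place."""
    return Counter(landmark_class_ids)

def place_similarity(sig_a, sig_b):
    """Overlap between two landmark bags (a simple Jaccard-style score)."""
    intersection = sum((sig_a & sig_b).values())
    union = sum((sig_a | sig_b).values())
    return intersection / union if union else 0.0

# Known places, each described by the landmark classes mined for it
# (class IDs are arbitrary toy values).
database = {
    "place_A": place_signature([3, 3, 17, 42]),
    "place_B": place_signature([5, 17, 99]),
}

# Landmark classes detected in a query view; the best-matching place wins.
query = place_signature([3, 17, 42])
best_place = max(database, key=lambda p: place_similarity(database[p], query))
print(best_place)  # -> place_A
```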


AI Magazine ◽  
2010 ◽  
Vol 31 (3) ◽  
pp. 93 ◽  
Author(s):  
Stephen Soderland ◽  
Brendan Roof ◽  
Bo Qin ◽  
Shi Xu ◽  
Mausam ◽  
...  

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE, operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Due to its open-domain and open-relation nature, Open IE is purely textual and cannot relate surface forms to an ontology, even if one is known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision over 0.90 from as few as 8 training examples for an NFL-scoring domain.
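As a rough illustration of the mapping step described above, the sketch below associates Open IE relation phrases with ontology predicates using a handful of labeled examples and then maps new tuples through that association. The predicate names and example phrases are invented stand-ins for an NFL-scoring-style ontology, not the authors' system.

```python
# A few labeled examples associating Open IE relation phrases with ontology relations.
training_examples = [
    ("scored a touchdown against", "Touchdown(team, opponent)"),
    ("kicked a field goal against", "FieldGoal(team, opponent)"),
    ("defeated", "GameWinner(team, opponent)"),
]

def normalize(phrase):
    return " ".join(phrase.lower().split())

phrase_to_relation = {normalize(p): r for p, r in training_examples}

def map_tuple(arg1, relation_phrase, arg2):
    """Map a domain-independent (arg1, relation, arg2) tuple onto the ontology,
    or return None if the relation phrase is unknown."""
    relation = phrase_to_relation.get(normalize(relation_phrase))
    return (relation, arg1, arg2) if relation else None

print(map_tuple("the Patriots", "scored a touchdown against", "the Jets"))
print(map_tuple("the Patriots", "travelled to", "Denver"))  # unmapped -> None
```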


2007 ◽  
Vol 13 (4) ◽  
pp. 317-351
Author(s):  
HANS-ULRICH KRIEGER

Abstract We present a simple and intuitive, unsound, corpus-driven approximation method for turning unification-based grammars, such as HPSG, CLE, or PATR-II, into context-free grammars (CFGs). Our research is motivated by the idea that we can exploit (large-scale), hand-written unification grammars not only for the purpose of describing natural language and obtaining a syntactic structure (and perhaps a semantic form), but also to address several other very practical topics. Firstly, to speed up deep parsing by having a cheap recognition pre-filter (the approximated CFG). Secondly, to obtain an indirect stochastic parsing model for the unification grammar through a trained PCFG, obtained from the approximated CFG. This gives us an efficient disambiguation model for the unification-based grammar. Thirdly, to generate domain-specific subgrammars for application areas such as information extraction or question answering. And finally, to compile context-free language models which assist the acoustic model of a speech recognizer. The approximation method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is a corpus-driven method in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to come monotonically closer to the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
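A minimal sketch of the corpus-driven idea: read the trees produced by the unification grammar on a sample corpus and record every local tree as a context-free production, so the CFG grows (and approximates the original grammar more closely) as more parsed sentences are added. The tree encoding and symbol names are illustrative assumptions, not the implemented system.

```python
from collections import Counter

def extract_rules(tree, rules):
    """tree is (symbol, [children]); a child is either a sub-tree or a terminal string."""
    symbol, children = tree
    rhs = tuple(child if isinstance(child, str) else child[0] for child in children)
    rules[(symbol, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, rules)

# Two parse trees standing in for the output of the unification grammar on a corpus.
parsed_corpus = [
    ("S", [("NP", ["Kim"]), ("VP", [("V", ["sleeps"])])]),
    ("S", [("NP", ["Sandy"]), ("VP", [("V", ["snores"])])]),
]

rules = Counter()
for tree in parsed_corpus:
    extract_rules(tree, rules)

for (lhs, rhs), count in rules.items():
    # The counts could later be normalized to train the PCFG mentioned above.
    print(f"{lhs} -> {' '.join(rhs)}    (observed {count} times)")
```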


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Asma Ben Abacha ◽  
Dina Demner-Fushman

Abstract Background: One of the challenges in large-scale information retrieval (IR) is developing fine-grained and domain-specific methods to answer natural language questions. Despite the availability of numerous sources and datasets for answer retrieval, Question Answering (QA) remains a challenging problem due to the difficulty of the question understanding and answer extraction tasks. One of the promising tracks investigated in QA is mapping new questions to formerly answered questions that are “similar”. Results: We propose a novel QA approach based on Recognizing Question Entailment (RQE), and we describe the QA system and resources that we built and evaluated on real medical questions. First, we compare logistic regression and deep learning methods for RQE using different kinds of datasets, including textual inference, question similarity, and entailment in both the open and clinical domains. Second, we combine IR models with the best RQE method to select entailed questions and rank the retrieved answers. To study the end-to-end QA approach, we built the MedQuAD collection of 47,457 question-answer pairs from trusted medical sources, which we introduce and share in the scope of this paper. Following the evaluation process used in TREC 2017 LiveQA, we find that our approach exceeds the best results of the medical task with a 29.8% increase over the best official score. Conclusions: The evaluation results support the relevance of question entailment for QA and highlight the effectiveness of combining IR and RQE for future QA efforts. Our findings also show that relying on a restricted set of reliable answer sources can bring a substantial improvement in medical QA.
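A hedged sketch of the combination described above: retrieve candidate already-answered questions with an IR score, re-score each with a question-entailment probability, and rank the associated answers by a blend of the two. Both scoring functions below are toy placeholders (a trained RQE classifier and a real IR engine would replace them); the FAQ entries are invented.

```python
def ir_score(new_question, candidate_question):
    """Toy lexical-overlap retrieval score, standing in for a real IR engine."""
    a = set(new_question.lower().split())
    b = set(candidate_question.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def rqe_score(new_question, candidate_question):
    """Placeholder for the probability that the new question entails the
    already-answered question; a trained RQE classifier would go here."""
    return ir_score(new_question, candidate_question)

def rank_answers(new_question, answered_questions, alpha=0.5):
    """answered_questions: list of (question, answer) pairs with trusted answers."""
    scored = [(alpha * ir_score(new_question, q)
               + (1 - alpha) * rqe_score(new_question, q), a)
              for q, a in answered_questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored]

faq = [("What are the symptoms of anemia?", "Common symptoms include fatigue and pale skin."),
       ("How is anemia treated?", "Treatment depends on the underlying cause.")]
print(rank_answers("what symptoms does anemia cause", faq)[0])
```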


2021 ◽  
Vol 9 ◽  
pp. 211-225
Author(s):  
Hiroaki Hayashi ◽  
Prashant Budania ◽  
Peng Wang ◽  
Chris Ackerson ◽  
Raj Neervannan ◽  
...  

Abstract Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
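A small sketch of how section structure can serve as a proxy for aspect annotation, as described above: each section title becomes the aspect label and the section body the reference summary, paired with the article's source text. The field names, domain, and example content are illustrative; the released dataset involves additional sourcing and filtering steps.

```python
def article_to_examples(domain, title, source_text, sections):
    """sections: list of (section_title, section_text) pairs from one article."""
    return [{"domain": domain,
             "article": title,
             "aspect": section_title.lower(),     # section title acts as the aspect label
             "source": source_text,               # text the summary must be written from
             "target_summary": section_text}      # section body acts as the reference summary
            for section_title, section_text in sections]

examples = article_to_examples(
    domain="Album",
    title="Example Album",
    source_text="Running text drawn from the article's cited sources ...",
    sections=[("Reception", "Critics praised the record for its production ..."),
              ("Track listing", "The album contains ten songs, opening with ...")],
)
print(examples[0]["aspect"], "->", examples[0]["target_summary"][:40])
```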


2021 ◽  
Vol 7 (8) ◽  
pp. 123
Author(s):  
Eva Cetinic

Automatically generating accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, a lot of progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However, the task of developing image captioning models is most commonly addressed using datasets of natural images, while not many contributions have been made in the domain of artwork images. One of the main reasons for that is the lack of large-scale art datasets with adequate image-text pairs. Another reason is the fact that generating accurate descriptions of artwork images is particularly challenging because descriptions of artworks are more complex and can include multiple levels of interpretation. It is therefore also especially difficult to effectively evaluate generated captions of artwork images. The aim of this work is to address some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Due to the complex relations between image and text pairs in the domain of artwork images, the generated captions are evaluated using several quantitative and qualitative approaches. The performance is assessed using standard image captioning metrics and a recently introduced reference-free metric. The quality of the generated captions and the model’s capacity to generalize to new data are explored by applying the model to another art dataset and comparing the relation between commonly generated captions and the genre of the artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.
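For reference on the general mechanism (not the Iconclass-fine-tuned model developed in this work), the sketch below generates a caption with an off-the-shelf transformer-based vision-language checkpoint from the Hugging Face hub; the checkpoint name, decoding settings, and the local image path are assumptions for illustration only.

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# Public ViT-encoder / GPT-2-decoder captioning checkpoint, used purely to
# illustrate caption generation with a pretrained vision-language model.
CKPT = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(CKPT)
processor = ViTImageProcessor.from_pretrained(CKPT)
tokenizer = AutoTokenizer.from_pretrained(CKPT)

image = Image.open("artwork.jpg").convert("RGB")      # hypothetical local image file
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```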


Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 2040 ◽  
Author(s):  
Antoine d’Acremont ◽  
Ronan Fablet ◽  
Alexandre Baussard ◽  
Guillaume Quin

Convolutional neural networks (CNNs) have rapidly become the state-of-the-art models for image classification applications. They usually require large ground-truthed datasets for training. Here, we address object identification and recognition in the wild for infrared (IR) imaging in defense applications, where no such large-scale dataset is available. With a focus on robustness issues, especially viewpoint invariance, we introduce a compact and fully convolutional CNN architecture with global average pooling. We show that this model, trained on realistic simulation datasets, reaches state-of-the-art performance compared with other CNNs, without data augmentation or fine-tuning steps. We also demonstrate a significant improvement in robustness to viewpoint changes with respect to an operational support vector machine (SVM)-based scheme.
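A hedged PyTorch sketch of a compact, fully convolutional classifier with global average pooling, in the spirit of the architecture described above; the layer widths, input channel count, and number of target classes are illustrative choices, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CompactFCN(nn.Module):
    def __init__(self, num_classes=8, in_channels=1):  # 1 channel for IR imagery
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, num_classes, 1),          # 1x1 conv produces per-class maps
        )
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling

    def forward(self, x):
        maps = self.features(x)                     # (B, num_classes, H', W')
        return self.gap(maps).flatten(1)            # (B, num_classes) class scores

logits = CompactFCN()(torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 8])
```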


2010 ◽  
Vol 08 (01) ◽  
pp. 147-161 ◽  
Author(s):  
YUTAKA SASAKI ◽  
JOHN MCNAUGHT ◽  
SOPHIA ANANIADOU

This paper demonstrates that a large-scale lexicon tailored for the biology domain is effective in improving question analysis for genomics Question Answering (QA). We use the TREC Genomics Track data to evaluate the performance of different question analysis methods. It is hard to process textual information in biology, especially in molecular biology, due to a huge number of technical terms which rarely appear in general English documents and dictionaries. To support biological Text Mining, we have developed a domain-specific resource, the BioLexicon. Started in 2006 from scratch, this lexicon currently includes more than four million biomedical terms consisting of newly curated terms and terms collected from existing biomedical databases. While conventional genomics QA systems provide query expansion based on thesauri and dictionaries, it is not clear to what extent a biology-oriented lexical resource is effective for question pre-processing for genomics QA. Experiments on the genomics QA data set show that question analysis using the BioLexicon performs slightly better than that using n-grams and the UMLS Specialist Lexicon.
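A minimal sketch of lexicon-driven question pre-processing of the kind discussed above: recognized domain terms are expanded with their lexical variants before retrieval. The toy lexicon entries and the expansion syntax are illustrative; the real BioLexicon holds millions of curated terms with richer structure.

```python
# Toy lexicon: each entry maps a normalized term to its known lexical variants.
toy_lexicon = {
    "tp53": ["TP53", "p53", "tumor protein p53"],
    "apoptosis": ["apoptosis", "programmed cell death"],
}

def expand_question(question, lexicon):
    """Replace each recognized term with the disjunction of its variants
    (a simple form of lexicon-based query expansion)."""
    expanded = []
    for raw_token in question.split():
        token = raw_token.strip("?.,;:").lower()
        variants = lexicon.get(token)
        expanded.append("(" + " OR ".join(variants) + ")" if variants else raw_token)
    return " ".join(expanded)

print(expand_question("What role does TP53 play in apoptosis?", toy_lexicon))
# -> What role does (TP53 OR p53 OR tumor protein p53) play in (apoptosis OR programmed cell death)
```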

