A Question Type Driven Framework to Diversify Visual Question Generation

Author(s):  
Zhihao Fan ◽  
Zhongyu Wei ◽  
Piji Li ◽  
Yanyan Lan ◽  
Xuanjing Huang

Visual question generation aims at asking questions about an image automatically. Existing research works on this topic usually generate a single question for each given image without considering the issue of diversity. In this paper, we propose a question type driven framework to produce multiple questions for a given image with different focuses. In our framework, each question is constructed following the guidance of a sampled question type in a sequence-to-sequence fashion. To diversify the generated questions, a novel conditional variational auto-encoder is introduced to generate multiple questions with a specific question type. Moreover, we design a strategy to conduct the question type distribution learning for each image to select the final questions. Experimental results on three benchmark datasets show that our framework outperforms the state-of-the-art approaches in terms of both relevance and diversity.

Author(s):  
Zhiguo Wang ◽  
Wael Hamza ◽  
Radu Florian

Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or apply matching at only a single granularity (word-by-word or sentence-by-sentence). In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences P and Q, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions: P against Q and Q against P. In each matching direction, each time step of one sentence is matched against all time steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, a decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
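The matching step described above can be illustrated with a minimal sketch. It assumes that each "perspective" is a learned weight vector that rescales both hidden states before a cosine similarity is taken, and that the scores against all time steps of the other sentence are combined by an element-wise maximum. The abstract does not fix either choice, so the max-pooling strategy and the names `multi_perspective_match`, `match_sentence`, and `weights` are illustrative assumptions, and the BiLSTM encoder and aggregation layers are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_perspective_match(v1, v2, weights):
    """Return one cosine score per perspective (one row of `weights` each)."""
    scores = []
    for w in weights:
        # Each perspective rescales the dimensions of both vectors identically.
        scaled_1 = [wi * a for wi, a in zip(w, v1)]
        scaled_2 = [wi * b for wi, b in zip(w, v2)]
        scores.append(cosine(scaled_1, scaled_2))
    return scores


def match_sentence(p_states, q_states, weights):
    """Match every time step of P against all time steps of Q.

    Assumption: scores are pooled with an element-wise max per perspective.
    """
    matched = []
    for p in p_states:
        per_q = [multi_perspective_match(p, q, weights) for q in q_states]
        matched.append([max(column) for column in zip(*per_q)])
    return matched
```

Calling `match_sentence(p_states, q_states, weights)` and then `match_sentence(q_states, p_states, weights)` gives the two matching directions; in the full model the resulting per-step vectors would be fed to the aggregation layer.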


2020 ◽  
Vol 34 (05) ◽  
pp. 8464-8471
Author(s):  
Xiyao Ma ◽  
Qile Zhu ◽  
Yanlin Zhou ◽  
Xiaolin Li

Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe the key root causes are the lack of global question semantics and the poor exploitation of answer position-awareness. In this paper, we propose a neural question generation model with two general modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging an answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on the SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-39
Author(s):  
Zara Nasar ◽  
Syed Waqar Jaffry ◽  
Muhammad Kamran Malik

With the advent of Web 2.0, there exist many online platforms that result in massive textual-data production. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach towards effective harnessing of this unstructured textual data could be its transformation into structured text. Hence, this study aims to present an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, this review mainly addresses Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, and the latter deals with the problem of extracting relations between a set of entities. This study covers early approaches as well as the developments made up till now using machine learning models. Survey findings conclude that deep-learning-based hybrid and joint models currently govern the state of the art. It is also observed that annotated benchmark datasets for various textual-data generators such as Twitter and other social forums are not available. This scarcity of datasets has resulted in relatively little progress in these domains. Additionally, the majority of the state-of-the-art techniques are offline and computationally expensive. Last, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the underlying processes in deep architectures.


Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


2021 ◽  
Vol 11 (23) ◽  
pp. 11344
Author(s):  
Wei Ke ◽  
Ka-Hou Chan

Paragraph-based datasets are hard to analyze with a simple RNN, because long sequences suffer from the problem of long-term dependencies. In this work, we propose a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction. In addition, we present a type of CNN-based model as an extractor to explore and capture useful features in the hidden state, which represent the content of the entire paragraph. In particular, we introduce Chebyshev pooling at the end of the CNN-based extractor in place of maximum pooling. This can project the features into a probability distribution so as to provide an interpretable evaluation for the final analysis. Experimental results demonstrate the superiority of the proposed approach compared to the state-of-the-art models.


2023 ◽  
Vol 55 (1) ◽  
pp. 1-39
Author(s):  
Thanh Tuan Nguyen ◽  
Thanh Phuong Nguyen

Representing dynamic textures (DTs) plays an important role in many real implementations in the computer vision community. Due to the turbulent and non-directional motions of DTs along with the negative impacts of different factors (e.g., environmental changes, noise, illumination, etc.), efficiently analyzing DTs has raised considerable challenges for the state-of-the-art approaches. For 20 years, many different techniques have been introduced to handle these well-known issues and enhance performance. Those methods have made valuable contributions, but the problems have not been fully resolved, particularly that of recognizing DTs on large-scale datasets. In this article, we present a comprehensive taxonomy of DT representation in order to give a thorough overview of the existing methods along with overall evaluations of their obtained performances. Accordingly, we arrange the methods into six canonical categories. Each category is then briefly presented in terms of its principal methodology and its related variants. The effectiveness levels of the state-of-the-art methods are then investigated and thoroughly discussed with respect to quantitative and qualitative evaluations in classifying DTs on benchmark datasets. Finally, we point out several potential applications and the remaining challenges that should be addressed in further directions. In comparison with two existing shallow DT surveys (the first is out of date, having been made in 2005, while the newer one, published in 2016, is an inadequate overview), we believe that our proposed comprehensive taxonomy not only provides a better view of DT representation for the target readers but also stimulates future research activities.


2022 ◽  
Vol 22 (3) ◽  
pp. 1-21
Author(s):  
Prayag Tiwari ◽  
Amit Kumar Jaiswal ◽  
Sahil Garg ◽  
Ilsun You

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only one sentence as an input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted in either symmetrical or asymmetrical scopes. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features while updating the rest of the components. We evaluate our model on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.


Author(s):  
Chao Li ◽  
Cheng Deng ◽  
Lei Wang ◽  
De Xie ◽  
Xianglong Liu

In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn a powerful common representation, and an inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, so that they can be optimized simultaneously to learn the representation and the hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.
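The storage and query-efficiency argument for hashing can be made concrete with a small sketch of retrieval over binary codes. It is not the UCH model: it assumes, purely for illustration, that real-valued features from any modality are binarized by sign and that candidates are ranked by Hamming distance. The names `to_code`, `hamming`, and `retrieve` are hypothetical.

```python
def to_code(features):
    """Binarize real-valued features by sign and pack the bits into one int."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:top_k]
```

Each item is stored as a handful of bits rather than a float vector, and comparing two items reduces to an XOR followed by a bit count, which is where the low storage cost and fast queries come from.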


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jiaxi Ye ◽  
Ruilin Li ◽  
Bin Zhang

Directed fuzzing is a practical technique that concentrates its testing energy on progress toward the target code areas, while spending little on other, irrelevant components. It is a promising way to make better use of available resources, especially in testing large-scale programs. However, by observing the state-of-the-art directed fuzzing engine (AFLGo), we argue that there are two universal limitations: the balance problem between exploration and exploitation, and the blindness in mutation toward the target code areas. In this paper, we present a new prototype, RDFuzz, to address these two limitations. In RDFuzz, we first introduce a frequency-guided strategy in the exploration stage and improve its accuracy by adopting branch-level instead of path-level frequency. Then, we introduce an input-distance-based evaluation strategy in the exploitation stage and present an optimized mutation to distinguish and protect the distance-sensitive input content. Moreover, an intertwined testing schedule is leveraged to perform exploration and exploitation in turn. We test RDFuzz on 7 benchmarks, and the experimental results demonstrate that RDFuzz is skilled at driving the program toward the target code areas and is not easily stuck by the balance problem of exploration and exploitation.


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3818
Author(s):  
Ye Zhang ◽  
Yi Hou ◽  
Shilin Zhou ◽  
Kewei Ouyang

Recent advances in time series classification (TSC) have exploited deep neural networks (DNN) to improve performance. One promising approach encodes time series as recurrence plot (RP) images in order to leverage state-of-the-art DNNs and improve accuracy. Such an approach has been shown to achieve impressive results, raising the interest of the community. However, it remains unsolved how to handle not only the variability in the distinctive region scale and the length of sequences but also the tendency confusion problem. In this paper, we tackle the problem using Multi-scale Signed Recurrence Plots (MS-RP), an improvement of RP, and propose a novel method based on MS-RP images and Fully Convolutional Networks (FCN) for TSC. This method first introduces phase space dimension and time delay embedding of RP to produce multi-scale RP images; then, with the use of an asymmetrical structure, the constructed RP images can represent very long sequences (>700 points). Next, MS-RP images are obtained by multiplying by designed sign masks in order to remove the tendency confusion. Finally, an FCN is trained with MS-RP images to perform classification. Experimental results on 45 benchmark datasets demonstrate that our method improves the state of the art in terms of classification accuracy and visualization evaluation.
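For readers unfamiliar with recurrence plots, the following is a minimal sketch of the plain (unsigned, single-scale) construction that MS-RP builds on. It assumes the standard definition: the series is embedded in phase space with a chosen dimension and time delay, and entry (i, j) of the image is the distance between embedded points i and j, optionally thresholded to 0/1. The sign masks, multi-scale variants, and asymmetrical structure of the paper are not reproduced, and the function names and default parameters are illustrative.

```python
import math


def delay_embed(series, dim, delay):
    """Time-delay embedding: point i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dim/delay")
    return [[series[i + k * delay] for k in range(dim)] for i in range(n_points)]


def recurrence_plot(series, dim=2, delay=1, threshold=None):
    """Build a recurrence plot as a nested list.

    With threshold=None the raw pairwise distances are returned;
    otherwise entry (i, j) is 1 when the two embedded points are
    within `threshold` of each other and 0 when they are not.
    """
    points = delay_embed(series, dim, delay)
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    if threshold is None:
        return dist
    return [[1 if d <= threshold else 0 for d in row] for row in dist]
```

Varying `dim` and `delay` changes the scale at which recurrences appear, which is the knob a multi-scale scheme would turn. This plain version is symmetric, so it carries no information about whether the series is rising or falling between two points.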

