Visual Dialogue State Tracking for Question Generation

2020
Vol 34 (07)
pp. 11831-11838
Author(s):  
Wei Pang
Xiaojie Wang

GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object, known only to the oracle, in an image by asking a sequence of Yes/No questions. Asking proper questions as the dialogue progresses is vital to a successful final guess, so the progress of the dialogue should be properly represented and tracked. Previous models for question generation pay little attention to the representation and tracking of dialogue states and are therefore prone to asking low-quality questions, such as repeated questions. This paper proposes a visual dialogue state tracking (VDST) based method for question generation. A visual dialogue state is defined as a distribution over the objects in the image together with representations of those objects. Representations of objects are updated as the distribution over objects changes. An object-difference based attention is used to decode new questions. The distribution over objects is updated by comparing the question-answer pair with the objects. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms existing methods and achieves new state-of-the-art performance. Notably, our model reduces the rate of repeated questions from more than 50% to 21.9% compared with previous state-of-the-art methods.
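The state-tracking loop described above can be sketched in NumPy: a distribution over objects is re-scored by each question-answer pair, object representations are refreshed as the distribution changes, and object differences drive attention for the decoder. All names and the exact update rules below are illustrative assumptions, not the authors' code.

```python
# A minimal NumPy sketch of a VDST-style update; names and update rules are assumed.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_state(obj_reps, probs, qa_vec):
    """One dialogue round: re-score objects against the new QA pair,
    then refresh object representations with the new distribution."""
    scores = obj_reps @ qa_vec                      # compare QA encoding with each object
    probs = softmax(np.log(probs + 1e-9) + scores)  # accumulate evidence over rounds
    ctx = probs @ obj_reps                          # distribution-weighted scene summary
    obj_reps = obj_reps + probs[:, None] * ctx      # update reps as the distribution shifts
    return obj_reps, probs

def object_difference_attention(obj_reps, probs):
    """Attend to objects by how much they differ from the expected object,
    a stand-in for the paper's object-difference attention."""
    expected = probs @ obj_reps
    diff = obj_reps - expected
    weights = softmax(np.linalg.norm(diff, axis=1))
    return weights @ obj_reps                       # context vector for the question decoder

rng = np.random.default_rng(0)
objs = rng.normal(size=(5, 8))                      # 5 candidate objects, 8-d features
p = np.full(5, 0.2)                                 # uniform prior over objects
objs, p = update_state(objs, p, rng.normal(size=8))
print(p, object_difference_attention(objs, p).shape)
```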

Author(s):  
Yanghoon Kim
Hwanhee Lee
Joongbo Shin
Kyomin Jung

Neural question generation (NQG) is the task of generating a question from a given passage with deep neural networks. Previous NQG models suffer from the problem that a significant proportion of the generated questions include words from the question target, resulting in unintended questions. In this paper, we propose answer-separated seq2seq, which better utilizes the information from both the passage and the target answer. By replacing the target answer in the original passage with a special token, our model learns to identify which interrogative word should be used. We also propose a new module termed keyword-net, which helps the model better capture the key information in the target answer and generate an appropriate question. Experimental results demonstrate that our answer-separation method significantly reduces the number of improper questions that include answers. Consequently, our model significantly outperforms previous state-of-the-art NQG models.
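The answer-separation preprocessing is concrete enough to sketch: the answer span in the passage is replaced by one special token before encoding, and the extracted answer is fed to a separate module. The <ANS> token name and helper below are assumptions for illustration, not the authors' code.

```python
# A minimal sketch of answer separation; token name and helper are assumed.
def separate_answer(passage_tokens, ans_start, ans_end, mask_token="<ANS>"):
    """Replace the answer span [ans_start, ans_end) with a single mask token,
    returning the masked passage and the extracted answer tokens."""
    answer = passage_tokens[ans_start:ans_end]
    masked = passage_tokens[:ans_start] + [mask_token] + passage_tokens[ans_end:]
    return masked, answer

tokens = "the eiffel tower was completed in 1889 in paris".split()
masked, answer = separate_answer(tokens, 6, 7)
print(masked)  # [..., 'in', '<ANS>', 'in', 'paris'] -> encoder no longer sees the answer
print(answer)  # ['1889'] -> a separate module (keyword-net) attends over this
```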


Electronics
2021
Vol 10 (3)
pp. 325
Author(s):  
Zhihao Wu
Baopeng Zhang
Tianchen Zhou
Yan Li
Jianping Fan

In this paper, we develop a practical approach for the automatic detection of discrimination actions in social images. First, an image set is established in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Second, a practical approach is developed for the automatic detection and identification of discrimination actions and relationships in social images. Third, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into a single network, the Co-operative Visual Translation Embedding++ network (CVTransE++). We compared our method with numerous state-of-the-art methods, and our experimental results demonstrate that it significantly outperforms them.
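The network name suggests it builds on visual translation embedding, in which a relation is modeled as a translation between subject and object features (subject + predicate ≈ object). The snippet below is a generic TransE-style scoring sketch of that underlying idea only, not the CVTransE++ architecture; all vectors here are random stand-ins.

```python
# A generic TransE-style relation score (lower is better); a sketch of the
# translation-embedding idea, not the paper's CVTransE++ network.
import numpy as np

def transe_score(subj, pred, obj):
    """Distance between the translated subject and the object embedding."""
    return np.linalg.norm(subj + pred - obj)

rng = np.random.default_rng(1)
person, points_at, victim = (rng.normal(size=16) for _ in range(3))
print(transe_score(person, points_at, victim))
```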


2021
Vol 22 (1)
Author(s):  
Changyong Li
Yongxian Fan
Xiaodong Cai

Abstract
Background: With the development of deep learning (DL), more and more deep-learning-based methods have been proposed and have achieved state-of-the-art performance in biomedical image segmentation. However, these methods are usually complex and require powerful computing resources, which are impractical to rely on in clinical settings. It is therefore important to develop accurate DL-based biomedical image segmentation methods that run under resource-constrained computing.
Results: A lightweight multiscale network called PyConvU-Net is proposed to work with low-resource computing. In strictly controlled experiments, PyConvU-Net performs well on three biomedical image segmentation tasks while using the fewest parameters.
Conclusions: Our experimental results preliminarily demonstrate the potential of the proposed PyConvU-Net for biomedical image segmentation under resource-constrained computing.
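PyConvU-Net's name points to pyramidal convolutions inside a U-Net. Below is a minimal PyTorch sketch of one pyramidal block: parallel convolutions at several kernel sizes, with grouped convolutions keeping the parameter count low. The kernel sizes and group counts are assumptions, not the paper's configuration.

```python
# A minimal sketch of a pyramidal convolution block; sizes/groups are assumed.
import torch
import torch.nn as nn

class PyConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch = out_ch // 4
        # larger kernels use more groups, so the parameter count stays small
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch, branch, k, padding=k // 2, groups=g)
            for k, g in [(1, 1), (3, 1), (5, 4), (7, 8)]
        ])

    def forward(self, x):
        # concatenate multiscale responses along the channel dimension
        return torch.cat([conv(x) for conv in self.convs], dim=1)

block = PyConvBlock(16, 64)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```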


2019
Vol 9 (13)
pp. 2684
Author(s):  
Hongyang Li
Lizhuang Liu
Zhenqi Han
Dan Zhao

Peeling fibre is an indispensable process in the production of preserved Szechuan pickle, and its accuracy significantly influences product quality; we therefore study a contour-based method of fibre detection as the core algorithm of an automatic peeling device. The fibre contour is a non-salient contour, characterized by large intra-class differences and small inter-class differences, meaning that contour features are not discriminative. We propose dilated holistically-nested edge detection (Dilated-HED), built on the HED network and dilated convolution, to detect the fibre contour. Experimental results on our dataset show a Pixel Accuracy (PA) of 99.52% and a Mean Intersection over Union (MIoU) of 49.99%, achieving state-of-the-art performance.
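The two reported metrics are standard and straightforward to reproduce. Here is a generic NumPy sketch of Pixel Accuracy and Mean IoU for binary contour masks; this is the textbook definition, not the paper's evaluation code.

```python
# Pixel Accuracy and Mean IoU for binary masks; a generic sketch.
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return (pred == gt).mean()

def mean_iou(pred, gt, classes=(0, 1)):
    """Average intersection-over-union across classes present in the union."""
    ious = []
    for c in classes:
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:
            ious.append(inter / union)
    return np.mean(ious)

pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
print(pixel_accuracy(pred, gt), mean_iou(pred, gt))  # 0.75 0.583...
```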


2020
Vol 6 (3)
pp. 291-306
Author(s):  
Fang-Lue Zhang
Connelly Barnes
Hao-Tian Zhang
Junhong Zhao
Gabriel Salas

Abstract For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the whole time of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.
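The graph-based transition-route selection can be sketched as a shortest-path search: nodes are candidate (camera, frame) cut points, edge weights stand in for transition cost, and the cheapest path gives the cut sequence. The graph structure and cost values below are toy assumptions, not the paper's formulation.

```python
# A minimal shortest-path sketch of transition-route selection; costs are toy values.
import heapq

def best_route(edges, start, goal):
    """Dijkstra over a dict {node: [(cost, next_node), ...]}."""
    heap, seen = [(0.0, start, [start])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for c, nxt in edges.get(node, []):
            heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return float("inf"), []

# nodes are (camera, frame) cut points; weights mimic transition smoothness cost
edges = {
    ("cam0", 0): [(0.2, ("cam1", 30)), (0.9, ("cam2", 30))],
    ("cam1", 30): [(0.1, ("cam2", 60))],
    ("cam2", 30): [(0.5, ("cam2", 60))],
}
print(best_route(edges, ("cam0", 0), ("cam2", 60)))  # picks the smoother route via cam1
```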


2020
Vol 34 (05)
pp. 8496-8503
Author(s):  
Chuan Meng
Pengjie Ren
Zhumin Chen
Christof Monz
Jun Ma
...  

Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue: the generated responses are grounded in some background information. Proposed methods for BBCs are able to generate more informative responses, but they either cannot generate natural responses or have difficulty locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.
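The core idea, a decoder that can either emit a vocabulary token or copy a whole span from the background, can be sketched as a switch at each decoding step. The scores below are random stand-ins for model logits, and the switch rule is an illustrative assumption, not RefNet's trained mechanism.

```python
# A minimal sketch of a generation/reference switch; scores are stand-ins.
import numpy as np

def decode_step(gen_logits, span_scores, spans, switch_prob):
    """Choose between span reference and token generation with a switch."""
    if switch_prob > 0.5:                        # reference decoder fires
        return spans[int(np.argmax(span_scores))]  # copy a complete semantic unit
    return [int(np.argmax(gen_logits))]          # ordinary token-by-token step

rng = np.random.default_rng(2)
spans = [["machu", "picchu"], ["in", "peru"]]    # candidate background spans
out = decode_step(rng.normal(size=100), rng.normal(size=2), spans, switch_prob=0.8)
print(out)  # a whole span copied from the background in one step
```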


2020
Vol 34 (05)
pp. 9065-9072
Author(s):  
Luu Anh Tuan
Darsh Shah
Regina Barzilay

Automatic question generation can benefit many applications, ranging from dialogue systems to reading comprehension. While questions are often asked about long documents, modeling such long documents poses many challenges. Many existing techniques generate questions by effectively looking at one sentence at a time, leading to questions that are easy and not reflective of the human process of question generation. Our goal is to incorporate interactions across multiple sentences to generate realistic questions for long documents. In order to link a broad document context to the target answer, we represent the relevant context via a multi-stage attention mechanism, which forms the foundation of a sequence-to-sequence model. We outperform state-of-the-art methods on question generation on three question-answering datasets: SQuAD, MS MARCO, and NewsQA.
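One way to picture multi-stage attention over a long document: sentence vectors are first scored against the answer, then re-scored against a document summary built from the first stage. The two-stage form and shapes below are assumptions about the general idea, not the paper's exact mechanism.

```python
# A minimal NumPy sketch of two-stage attention over sentence vectors; assumed form.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_stage_context(sent_vecs, answer_vec):
    a1 = softmax(sent_vecs @ answer_vec)           # stage 1: answer-aware weights
    doc = a1 @ sent_vecs                           # weighted document summary
    a2 = softmax(sent_vecs @ (answer_vec + doc))   # stage 2: refine with the summary
    return a2 @ sent_vecs                          # context fed to the seq2seq decoder

rng = np.random.default_rng(3)
print(multi_stage_context(rng.normal(size=(20, 8)), rng.normal(size=8)).shape)  # (8,)
```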


2019
Vol 9 (16)
pp. 3389
Author(s):  
Biqing Zeng
Heng Yang
Ruyang Xu
Wu Zhou
Xuli Han

Aspect-based sentiment classification (ABSC) aims to predict the sentiment polarities of different aspects within sentences or documents. Many previous studies have addressed this problem, but they fail to notice the correlation between an aspect's sentiment polarity and its local context. In this paper, a Local Context Focus (LCF) mechanism is proposed for aspect-based sentiment classification based on Multi-head Self-Attention (MHSA). The LCF design utilizes Context-features Dynamic Mask (CDM) and Context-features Dynamic Weighted (CDW) layers to pay more attention to local context words. Moreover, a BERT-shared layer is adopted in the LCF design to capture long-term dependencies within the local context and the global context. Experiments are conducted on three common ABSC datasets: the laptop and restaurant datasets of SemEval-2014 and the ACL Twitter dataset. Experimental results demonstrate that the LCF baseline model achieves considerable performance. In addition, we conduct ablation experiments to prove the significance and effectiveness of the LCF design. In particular, by incorporating the BERT-shared layer, the LCF-BERT model refreshes state-of-the-art performance on all three benchmark datasets.
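The CDM and CDW layers lend themselves to a direct sketch: context features are masked or down-weighted according to each token's distance to the aspect term. The distance definition and threshold below follow the LCF idea, but the exact values and decay form are assumptions.

```python
# A minimal NumPy sketch of CDM and CDW; threshold and decay are assumed values.
import numpy as np

def srd(n_tokens, aspect_pos):
    """Semantic-relative distance of each token to the aspect position."""
    return np.abs(np.arange(n_tokens) - aspect_pos)

def cdm(features, aspect_pos, threshold=3):
    """Context-features Dynamic Mask: zero out far-away token features."""
    mask = (srd(len(features), aspect_pos) <= threshold).astype(float)
    return features * mask[:, None]

def cdw(features, aspect_pos, threshold=3):
    """Context-features Dynamic Weighting: decay far tokens instead of masking."""
    d = srd(len(features), aspect_pos)
    w = np.where(d <= threshold, 1.0, 1.0 - (d - threshold) / len(features))
    return features * w[:, None]

feats = np.ones((8, 4))                  # 8 tokens, 4-d features
print(cdm(feats, aspect_pos=2)[:, 0])    # tokens beyond the SRD threshold are zeroed
print(cdw(feats, aspect_pos=2)[:, 0])    # ...or smoothly down-weighted
```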


Sensors
2020
Vol 20 (11)
pp. 3084
Author(s):  
Yoon-Oh Tak
Anjin Park
Janghoon Choi
Jonghyun Eom
Hyuk-Sang Kwon
...  

Whole slide imaging (WSI) refers to the process of creating a high-resolution digital image of a whole slide. Since digital images are typically produced by stitching image sequences acquired from different fields of view, the visual quality of the images can be degraded by shading distortion, which produces black plaid patterns on the images. We present a shading correction method for brightfield WSI that is simple but robust, not only against typical image artifacts caused by specks of dust and bubbles, but also against fixed-pattern noise, i.e., spatial variations in pixel values under uniform illumination. The proposed method comprises two main steps: the first constructs candidate shading distortion models from a stack of input image sequences; the second selects the optimal model from the candidates. The proposed method was compared experimentally with two previous state-of-the-art methods, regularized energy minimization (CIDRE) and background and shading correction (BaSiC), and showed better correction scores, as no smoothing operations or constraints were imposed when estimating the shading distortion. The correction scores, averaged over 40 image collections, were as follows: proposed method, 0.39 ± 0.099; CIDRE, 0.67 ± 0.047; BaSiC, 0.55 ± 0.038. Based on these quantitative evaluations, we confirm that the proposed method corrects not only shading distortion but also fixed-pattern noise, compared with the two previous state-of-the-art methods.
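The two-step structure (build candidate shading models from the tile stack, then keep the best one) can be sketched as follows. The candidate set (per-pixel mean and median) and the uniformity score used for selection are illustrative assumptions, not the paper's models or metric.

```python
# A minimal sketch of candidate construction + model selection; candidates
# and the selection score are assumed, not the paper's method.
import numpy as np

def correct(img, flat):
    """Flat-field style correction: divide out the shading estimate."""
    return img / np.maximum(flat, 1e-6)

def select_model(stack):
    """stack: (n_tiles, H, W). Candidates: per-pixel mean and median."""
    candidates = [stack.mean(axis=0), np.median(stack, axis=0)]
    # score a candidate by how uniform the corrected tiles become
    scores = [np.std([correct(t, c).std() for t in stack]) for c in candidates]
    return candidates[int(np.argmin(scores))]

rng = np.random.default_rng(4)
shading = 0.5 + 0.5 * np.linspace(0, 1, 64)[None, :] * np.ones((64, 64))
tiles = rng.uniform(0.8, 1.2, size=(10, 64, 64)) * shading  # synthetic shaded tiles
flat = select_model(tiles)
print(correct(tiles[0], flat).std() < tiles[0].std())  # True: correction flattens tiles
```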


2014
Vol 40 (2)
pp. 269-310
Author(s):  
Yanir Seroussi
Ingrid Zukerman
Fabian Bohnert

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, because such texts tend to be short, and the number of candidate authors is often larger than in traditional settings. We address this challenge by using topic models to obtain author representations. In addition to exploring novel ways of applying two popular topic models to this task, we test our new model that projects authors and documents to two disjoint topic spaces. Utilizing our model in authorship attribution yields state-of-the-art performance on several data sets, containing either formal texts written by a few authors or informal texts generated by tens to thousands of on-line users. We also present experimental results that demonstrate the applicability of topical author representations to two other problems: inferring the sentiment polarity of texts, and predicting the ratings that users would give to items such as movies.
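The attribution step reduces to comparing a test text's representation against each author's representation. In the sketch below, plain normalized word counts stand in for topic proportions, and the distance measure and all names are illustrative assumptions rather than the paper's models.

```python
# A minimal sketch of distribution-based authorship attribution; word-count
# distributions stand in for the paper's topic-based author representations.
import numpy as np
from collections import Counter

def distribution(texts, vocab):
    """Smoothed, normalized word counts over a fixed vocabulary."""
    counts = Counter(w for t in texts for w in t.split())
    v = np.array([counts[w] for w in vocab], dtype=float) + 1.0  # add-one smoothing
    return v / v.sum()

def attribute(test_text, author_texts, vocab):
    """Attribute a text to the author with the closest distribution (L1 distance)."""
    test = distribution([test_text], vocab)
    dists = {a: np.abs(test - distribution(ts, vocab)).sum()
             for a, ts in author_texts.items()}
    return min(dists, key=dists.get)

vocab = ["the", "indeed", "lol", "great", "thus"]
authors = {"formal": ["thus the argument is indeed great"],
           "casual": ["lol the show was great"]}
print(attribute("indeed thus the proof", authors, vocab))  # formal
```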

