Modality-Balanced Models for Visual Dialogue

2020, Vol. 34 (05), pp. 8091-8098
Author(s): Hyounghun Kim, Hao Tan, Mohit Bansal

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response in the dialogue. However, through manual analysis, we find that a large number of conversational questions can be answered by looking at the image alone, without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that, for this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns from the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better on the primary normalized discounted cumulative gain (NDCG) task metric, which allows multiple correct answers. This observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and to combine their complementary abilities into a more balanced multimodal model. We present multiple methods for integrating the two models, via ensemble and via consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics) and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics.
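The combination described above can be pictured with a small sketch. Assuming both the image-only and the image-history joint model output logits over the same set of candidate answers, the following Python snippet illustrates a plain ensemble and one plausible reading of consensus dropout fusion; the function names, the per-instance dropout scheme, and the parameter values are illustrative assumptions for this listing, not the authors' released implementation.

import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over candidate answers.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble(image_only_logits, joint_logits):
    # Plain ensemble: average the two models' answer distributions.
    return (softmax(image_only_logits) + softmax(joint_logits)) / 2.0

def consensus_dropout_fusion(image_only_logits, joint_logits, rng, drop_prob=0.25):
    # Illustrative fusion: the joint model's logits are randomly dropped per
    # instance, so the fused prediction cannot lean too heavily on
    # dialogue-history cues; the image-only branch is always kept.
    keep = (rng.random(joint_logits.shape[0]) >= drop_prob).astype(float)
    fused = image_only_logits + keep[:, None] * joint_logits
    return softmax(fused)

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 100))      # 4 dialogue turns, 100 candidate answers
joint = rng.normal(size=(4, 100))
print(ensemble(img, joint).shape)                       # (4, 100)
print(consensus_dropout_fusion(img, joint, rng).shape)  # (4, 100)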

2010, Vol. 41 (3), pp. 131-136
Author(s): Catharina Casper, Klaus Rothermund, Dirk Wentura

Processes involving the automatic activation of stereotypes in different contexts were investigated using a priming paradigm with a lexical decision task. The names of social categories were combined with background pictures of specific situations to yield compound primes comprising category and context information. Significant category priming effects for stereotypic attributes (e.g., Bavarians – beer) emerged in fitting contexts (e.g., in combination with a picture of a marquee) but not in nonfitting contexts (e.g., in combination with a picture of a shop). The findings indicate that social stereotypes are organized as specific mental schemas that are triggered by a combination of category and context information.


Author(s): Veronika Lerche, Ursula Christmann, Andreas Voss

Abstract. In experiments by Gibbs, Kushner, and Mills (1991), sentences were presented as having been authored either by poets or by a computer. Gibbs et al. (1991) concluded from their results that the assumed source of a text influences processing speed, with faster processing of metaphorical sentences in the Poet condition. However, the dependent variables used (e.g., mean RTs) do not allow clear conclusions about processing speed: it is also possible that participants held prior biases before the stimuli were presented. We conducted a conceptual replication and applied the diffusion model (Ratcliff, 1978) to disentangle a possible effect on processing speed from a prior bias. Our results are in accordance with the interpretation by Gibbs et al. (1991): the context information affected processing speed, not a priori decision settings. Additionally, analyses of model fit revealed that the diffusion model provided a good account of the data from this complex verbal task.
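As a rough illustration of how the diffusion model separates these two accounts, the sketch below simulates a standard Wiener diffusion process in Python: the drift rate v stands for processing speed and the relative starting point z for an a priori response bias. The parameter values and the mapping onto experimental conditions are assumptions made for the example, not the authors' fitted estimates.

import numpy as np

def simulate_diffusion(v, z, a=1.0, s=1.0, t0=0.3, dt=0.001, n_trials=500, seed=0):
    # Wiener diffusion process with drift rate v (processing speed),
    # relative starting point z (response bias), boundary separation a,
    # diffusion noise s, and non-decision time t0.
    rng = np.random.default_rng(seed)
    rts, responses = [], []
    for _ in range(n_trials):
        x, t = z * a, 0.0
        while 0.0 < x < a:
            x += v * dt + s * np.sqrt(dt) * rng.normal()
            t += dt
        rts.append(t + t0)
        responses.append(1 if x >= a else 0)  # 1 = upper boundary (e.g., "word")
    return np.array(rts), np.array(responses)

# A higher drift rate speeds up responses at an unbiased starting point (z = 0.5),
# whereas shifting the starting point changes which response is favoured
# without changing the rate of evidence accumulation.
fast_rts, _ = simulate_diffusion(v=2.0, z=0.5)
slow_rts, _ = simulate_diffusion(v=1.0, z=0.5)
biased_rts, biased_resp = simulate_diffusion(v=1.0, z=0.7)
print(round(fast_rts.mean(), 3), round(slow_rts.mean(), 3), round(biased_resp.mean(), 3))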


Author(s): Yanlei Gu, Dailin Li, Yoshihiko Kamiya, Shunsuke Kamijo

2018, Vol. 10 (1), pp. 219-234
Author(s): John H. Hitchcock, Anthony J. Onwuegbuzie, Shannon David, Anne-Maree Ruddy, ...
