Modality-Balanced Models for Visual Dialogue

2020, Vol. 34 (05), pp. 8091-8098
Author(s): Hyounghun Kim, Hao Tan, Mohit Bansal

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response in the dialogue. However, through manual analysis, we find that a large number of conversational questions can be answered by looking at the image alone, without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that, for this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns from the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better on the primary normalized discounted cumulative gain (NDCG) task metric, which allows multiple correct answers. This observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and to combine their complementary abilities into a more balanced multimodal model. We present multiple methods for integrating the two models, via ensemble and via consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics) and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics.
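The combination described above can be pictured with a small sketch. Assuming both the image-only and the image-history joint model output logits over the same set of candidate answers, the following Python snippet illustrates a plain ensemble and one plausible reading of consensus dropout fusion; the function names, the per-instance dropout scheme, and the parameter values are illustrative assumptions for this listing, not the authors' released implementation.

import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over candidate answers.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble(image_only_logits, joint_logits):
    # Plain ensemble: average the two models' answer distributions.
    return (softmax(image_only_logits) + softmax(joint_logits)) / 2.0

def consensus_dropout_fusion(image_only_logits, joint_logits, rng, drop_prob=0.25):
    # Illustrative fusion: the joint model's logits are randomly dropped per
    # instance, so the fused prediction cannot lean too heavily on
    # dialogue-history cues; the image-only branch is always kept.
    keep = (rng.random(joint_logits.shape[0]) >= drop_prob).astype(float)
    fused = image_only_logits + keep[:, None] * joint_logits
    return softmax(fused)

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 100))      # 4 dialogue turns, 100 candidate answers
joint = rng.normal(size=(4, 100))
print(ensemble(img, joint).shape)                       # (4, 100)
print(consensus_dropout_fusion(img, joint, rng).shape)  # (4, 100)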

2010, Vol. 41 (3), pp. 131-136
Author(s): Catharina Casper, Klaus Rothermund, Dirk Wentura

Processes involving the automatic activation of stereotypes in different contexts were investigated using a priming paradigm with a lexical decision task. The names of social categories were combined with background pictures of specific situations to yield compound primes comprising category and context information. Significant category priming effects for stereotypic attributes (e.g., Bavarians – beer) emerged in fitting contexts (e.g., in combination with a picture of a marquee) but not in nonfitting contexts (e.g., in combination with a picture of a shop). The findings indicate that social stereotypes are organized as specific mental schemas that are triggered by a combination of category and context information.


Author(s): Veronika Lerche, Ursula Christmann, Andreas Voss

Abstract. In experiments by Gibbs, Kushner, and Mills (1991), sentences were presented as having been authored either by poets or by a computer. Gibbs et al. (1991) concluded from their results that the assumed source of a text influences processing speed, with faster processing of metaphorical sentences in the Poet condition. However, the dependent variables used (e.g., mean RTs) do not allow clear conclusions about processing speed: it is also possible that participants held prior biases before the stimuli were presented. We conducted a conceptual replication and applied the diffusion model (Ratcliff, 1978) to disentangle a possible effect on processing speed from a prior bias. Our results are in accordance with the interpretation by Gibbs et al. (1991): the context information affected processing speed, not a priori decision settings. Additionally, analyses of model fit revealed that the diffusion model provided a good account of the data from this complex verbal task.
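As a rough illustration of how the diffusion model separates these two accounts, the sketch below simulates a standard Wiener diffusion process in Python: the drift rate v stands for processing speed and the relative starting point z for an a priori response bias. The parameter values and the mapping onto experimental conditions are assumptions made for the example, not the authors' fitted estimates.

import numpy as np

def simulate_diffusion(v, z, a=1.0, s=1.0, t0=0.3, dt=0.001, n_trials=500, seed=0):
    # Wiener diffusion process with drift rate v (processing speed),
    # relative starting point z (response bias), boundary separation a,
    # diffusion noise s, and non-decision time t0.
    rng = np.random.default_rng(seed)
    rts, responses = [], []
    for _ in range(n_trials):
        x, t = z * a, 0.0
        while 0.0 < x < a:
            x += v * dt + s * np.sqrt(dt) * rng.normal()
            t += dt
        rts.append(t + t0)
        responses.append(1 if x >= a else 0)  # 1 = upper boundary (e.g., "word")
    return np.array(rts), np.array(responses)

# A higher drift rate speeds up responses at an unbiased starting point (z = 0.5),
# whereas shifting the starting point changes which response is favoured
# without changing the rate of evidence accumulation.
fast_rts, _ = simulate_diffusion(v=2.0, z=0.5)
slow_rts, _ = simulate_diffusion(v=1.0, z=0.5)
biased_rts, biased_resp = simulate_diffusion(v=1.0, z=0.7)
print(round(fast_rts.mean(), 3), round(slow_rts.mean(), 3), round(biased_resp.mean(), 3))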


Author(s): Yanlei Gu, Dailin Li, Yoshihiko Kamiya, Shunsuke Kamijo

2018, Vol. 10 (1), pp. 219-234
Author(s): John H. Hitchcock, Anthony J. Onwuegbuzie, Shannon David, Anne-Maree Ruddy, ...
