Depth and Video Segmentation Based Visual Attention for Embodied Question Answering

Author(s):  
Haonan Luo ◽  
Guosheng Lin ◽  
Yazhou Yao ◽  
Fayao Liu ◽  
Zichuan Liu ◽  
...  
Author(s):  
Zhou Zhao ◽  
Zhu Zhang ◽  
Shuwen Xiao ◽  
Zhou Yu ◽  
Jun Yu ◽  
...  

Open-ended long-form video question answering is a challenging problem in visual information retrieval: the system must automatically generate a natural-language answer from the referenced long-form video content according to the question. However, existing video question answering work focuses mainly on short-form videos, owing to the difficulty of modeling the semantic representation of long-form video content. In this paper, we consider the problem of long-form video question answering from the viewpoint of adaptive hierarchical reinforced encoder-decoder network learning. We propose an adaptive hierarchical encoder network that learns a joint representation of the long-form video content according to the question, using adaptive video segmentation. We then develop a reinforced decoder network to generate the natural-language answer for open-ended video question answering. We also construct a large-scale long-form video question answering dataset. Extensive experiments show the effectiveness of our method.
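The two-level idea in this abstract (adaptively segment the video, then encode segments under question guidance) can be sketched as follows. This is a minimal illustrative sketch, not the authors' model: the boundary-detection rule, the function names (`segment_video`, `hierarchical_encode`), and mean-pooling within segments are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segment_video(frame_feats, boundary_threshold=0.5):
    """Adaptive segmentation (illustrative): start a new segment wherever
    the cosine similarity between consecutive frame features drops below
    a threshold."""
    boundaries = [0]
    for t in range(1, len(frame_feats)):
        a, b = frame_feats[t - 1], frame_feats[t]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if sim < boundary_threshold:
            boundaries.append(t)
    boundaries.append(len(frame_feats))
    return [frame_feats[s:e] for s, e in zip(boundaries[:-1], boundaries[1:])]

def hierarchical_encode(frame_feats, question_feat):
    """Two-level encoding: pool frames within each segment into a segment
    vector, then combine segment vectors with question-guided attention."""
    segments = segment_video(frame_feats)
    seg_vecs = np.stack([seg.mean(axis=0) for seg in segments])  # (S, D)
    scores = seg_vecs @ question_feat                            # (S,)
    weights = softmax(scores)                                    # (S,)
    return weights @ seg_vecs                                    # (D,)
```

In the paper the segment boundaries are learned jointly with the encoder and the answer decoder is trained with reinforcement learning; the sketch above only shows the hierarchical attention structure over segments.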


Author(s):  
Jingkuan Song ◽  
Pengpeng Zeng ◽  
Lianli Gao ◽  
Heng Tao Shen

Recently, attention-based Visual Question Answering (VQA) has achieved great success by using the question to selectively attend to the visual regions related to the answer. Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight. This conflicts with the attention mechanism, because CNN features are naturally both spatial and channel-wise. Moreover, visual attention is usually computed at the pixel level, which can cause region discontinuity problems. In this paper, we propose a Cubic Visual Attention (CVA) model that applies novel channel and spatial attention to object regions to improve the VQA task. Specifically, instead of attending to pixels, we first use object proposal networks to generate a set of object candidates and extract their associated conv features. Then, we use the question to guide channel attention and spatial attention calculations over the conv-layer feature map. Finally, the attended visual features and the question are combined to infer the answer. We assess the performance of CVA on three public image QA datasets: COCO-QA, VQA, and Visual7W. Experimental results show that our method significantly outperforms the state of the art.
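The pipeline described here, per-channel attention followed by attention over object regions rather than pixels, can be sketched in a few lines. This is a hedged sketch, not the authors' implementation: the function name `cubic_attention`, the projection matrices `Wc` and `Ws`, and the tanh scoring are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cubic_attention(region_feats, question_feat, Wc, Ws):
    """region_feats: (K, C) conv features for K object proposals.
    Wc, Ws: (C, D) question-projection matrices (hypothetical parameters).
    Channel attention first reweights the C feature channels given the
    question; object-level (spatial) attention then weights the K regions."""
    # channel attention: one question-guided weight per channel
    c_scores = np.tanh(region_feats.mean(axis=0) * (Wc @ question_feat))  # (C,)
    c_weights = softmax(c_scores)                                         # (C,)
    feats = region_feats * c_weights                                      # (K, C)
    # spatial attention over the K object regions
    s_scores = feats @ (Ws @ question_feat)                               # (K,)
    s_weights = softmax(s_scores)                                         # (K,)
    return s_weights @ feats                                              # (C,)
```

Attending over object proposals instead of pixels is what avoids the region-discontinuity problem noted in the abstract: each attention weight covers a whole coherent region.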


2001 ◽  
Vol 15 (1) ◽  
pp. 22-34 ◽  
Author(s):  
D.H. de Koning ◽  
J.C. Woestenburg ◽  
M. Elton

Migraineurs with and without aura (MWAs and MWOAs) as well as controls were measured twice with an interval of 7 days. The first session of recordings and tests for migraineurs was held about 7 hours after a migraine attack. We hypothesized that electrophysiological changes in the posterior cerebral cortex related to visual spatial attention are influenced by the level of arousal in migraineurs with aura, and that this varies over the course of time. ERPs related to the active visual attention task manifested significant differences between controls and both types of migraine sufferers for the N200, suggesting a common pathophysiological mechanism for migraineurs. Furthermore, migraineurs without aura (MWOAs) showed a significant enhancement for the N200 at the second session, indicating the relevance of time of measurement within migraine studies. Finally, migraineurs with aura (MWAs) showed significantly enhanced P240 and P300 components at central and parietal cortical sites compared to MWOAs and controls, which seemed to be maintained over both sessions and could be indicative of increased noradrenergic activity in MWAs.


1997 ◽  
Vol 42 (6) ◽  
pp. 501-503
Author(s):  
Kyle R. Cave