Information Bottleneck Approach to Spatial Attention Learning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2021/108

2021

Author(s): Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, ...

Keyword(s): Spatial Attention, Visual Recognition, Predictive Accuracy, Visual Awareness, Input Image, Attention Mechanism, Natural Scenes, Fine Grained, Selective Visual Attention, Information Bottleneck

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information that reaches visual awareness when perceiving natural scenes, allowing near real-time information processing with limited computational capacity. This kind of selectivity acts as an ‘Information Bottleneck (IB)’, which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanisms of deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information passed through the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism yields attention maps that neatly highlight the regions of interest while suppressing backgrounds, and that it improves standard DNN structures on visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The experiments also verify that the attention maps help interpret the decisions made by the DNNs. Our code is available at this https URL.
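
The anchor-quantization step described above can be illustrated with a short NumPy sketch. It is not the authors' released code, and it rests on several assumptions. The anchors are fixed here, although the paper learns them. Nearest-anchor assignment is an assumed snapping rule. The names `quantize_to_anchors`, `modulate` and `max_bits_per_position` are hypothetical. The sketch also shows why quantization limits information: with K anchors, each spatial position can take at most K distinct values, so it carries at most log2(K) bits. Inside a training framework, the non-differentiable snap would normally need a gradient workaround, such as a straight-through estimator. That choice is also an assumption and is not taken from the abstract.

```python
from typing import Tuple

import numpy as np


def quantize_to_anchors(scores: np.ndarray, anchors: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Snap each continuous attention score to its nearest anchor value.

    Hypothetical helper (assumed nearest-anchor rule, anchors fixed, not learned).
    scores:  attention map of any shape (e.g. H x W), values assumed in [0, 1].
    anchors: 1-D array of K anchor values.
    Returns the quantized map and the chosen anchor index per position.
    """
    scores = np.asarray(scores, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Distance from every score to every anchor: shape scores.shape + (K,).
    dist = np.abs(scores[..., None] - anchors)
    idx = dist.argmin(axis=-1)
    return anchors[idx], idx


def modulate(features: np.ndarray, attn: np.ndarray) -> np.ndarray:
    """Weight a C x H x W feature map by an H x W attention map (broadcast over channels)."""
    return features * attn[None, :, :]


def max_bits_per_position(num_anchors: int) -> float:
    """Upper bound on information per spatial position after quantization: log2(K)."""
    return float(np.log2(num_anchors))


if __name__ == "__main__":
    attn = np.array([[0.05, 0.40], [0.62, 0.97]])
    anchors = np.array([0.0, 0.5, 1.0])
    q, idx = quantize_to_anchors(attn, anchors)
    print(q)    # quantized values: [[0.0, 0.5], [0.5, 1.0]]
    print(idx)  # anchor indices:   [[0, 1], [1, 2]]
    print(max_bits_per_position(len(anchors)))  # about 1.585 bits
    features = np.random.default_rng(0).standard_normal((4, 2, 2))
    print(modulate(features, q).shape)  # (4, 2, 2)
```

Snapping to a small anchor set also makes each position of the attention-modulated representation draw from a finite set of weights. This fits the abstract's goal of restricting what the attention map lets through.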

Attention Mechanism based Real Time Gaze Tracking in Natural Scenes with Residual Blocks

IEEE Transactions on Cognitive and Developmental Systems

10.1109/tcds.2021.3064280

2021

pp. 1-1

Author(s): Lihong Dai, Jinguo Liu, Zhaojie Ju, Yang Gao

Keyword(s): Real Time, Attention Mechanism, Natural Scenes, Gaze Tracking

Selective, Structural, Subtle: Trilinear Spatial-Awareness for Few-Shot Fine-Grained Visual Recognition

2021 IEEE International Conference on Multimedia and Expo (ICME)

10.1109/icme51207.2021.9428223

2021

Author(s): Heng Wu, Yifan Zhao, Jia Li

Keyword(s): Visual Recognition, Spatial Awareness, Fine Grained

MwoA auxiliary diagnosis via RSN-based 3D deep multiple instance learning with spatial attention mechanism

2020 11th International Conference on Awareness Science and Technology (iCAST)

10.1109/icast51195.2020.9319486

2020

Author(s): Xiang Li, Benzheng Wei, Tianyang Li, Na Zhang

Keyword(s): Spatial Attention, Multiple Instance Learning, Attention Mechanism

Lightweight pyramid network with spatial attention mechanism for accurate retinal vessel segmentation

International Journal of Computer Assisted Radiology and Surgery

10.1007/s11548-021-02344-x

2021

Vol 16 (4)

pp. 673-682

Author(s): Tengfei Tan, Zhilun Wang, Hongwei Du, Jinzhang Xu, Bensheng Qiu

Keyword(s): Spatial Attention, Retinal Vessel, Vessel Segmentation, Attention Mechanism, Retinal Vessel Segmentation

Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition

Computer Vision – ECCV 2018, Lecture Notes in Computer Science

10.1007/978-3-030-01270-0_35

2018

pp. 595-610

Cited By ~ 32

Author(s): Chaojian Yu, Xinyi Zhao, Qi Zheng, Peng Zhang, Xinge You

Keyword(s): Visual Recognition, Fine Grained

Awareness and Integration: Understanding the Challenges of Inferring Multisensory Integration Outside of Awareness

10.26686/wgtn.17136149

2021

Author(s): Daniel Jenkins

Keyword(s): Spatial Attention, Multisensory Integration, Subjective Experience, Visual Masking, Target Location, Visual Awareness, Visual Speech, Emotional Prosody, Perceptual Awareness, Spatial Cueing

Multisensory integration describes the cognitive processes by which information from various perceptual domains is combined to create coherent percepts. For consciously aware perception, multisensory integration can be inferred when information in one perceptual domain influences subjective experience in another. Yet the relationship between integration and awareness is not well understood. One current question is whether multisensory integration can occur in the absence of perceptual awareness. Because there is no subjective experience in unconscious perception, researchers have had to develop novel tasks to infer integration indirectly. For instance, Palmer and Ramsey (2012) presented auditory recordings of spoken syllables alongside videos of faces speaking either the same or different syllables, while masking the videos to prevent visual awareness. On each trial, the conjunction of matching voices and faces predicted the location of a subsequent Gabor grating (the target). Participants indicated the location/orientation of the target more accurately when it appeared in the cued location (which it did with 80% probability). The authors therefore inferred that auditory and visual speech events were integrated in the absence of visual awareness. In this thesis, I investigated whether these findings generalise to the integration of auditory and visual expressions of emotion. In Experiment 1, I presented spatially informative cues in which congruent facial and vocal emotional expressions predicted the target location, with and without visual masking. I found no evidence of spatial cueing in either awareness condition. To investigate the lack of spatial cueing, in Experiment 2 I repeated the task with aware participants only, and had half of those participants explicitly report the emotional prosody. A significant spatial-cueing effect was found only when participants reported emotional prosody, suggesting that audiovisual congruence can cue spatial attention during aware perception. It remains unclear whether audiovisual congruence can cue spatial attention without awareness, and whether such effects genuinely imply multisensory integration.

Person Reidentification Model Based on Multiattention Modules and Multiscale Residuals

Complexity

10.1155/2021/6673461

2021

Vol 2021

pp. 1-10

Author(s): Yongyi Li, Shiqi Wang, Shuang Dong, Xueling Lv, Changzhi Lv, ...

Keyword(s): Local Features, Attention Mechanism, Experimental Results, Original Network, Fine Grained, Backbone Network, Model Based, Local Branch, Feature Expression, Global And Local

At present, person reidentification (Re-ID) based on attention mechanisms has attracted considerable research interest. An attention module can improve the representation ability and reidentification accuracy of a Re-ID model to a certain extent, but its benefit depends on how well it is coupled with the original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces a combined attention fusion module and a multiscale residual fusion module into the ResNet-50 backbone network to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed. A dual ensemble attention module enhances the channel aggregation and position perception ability of the network, and fine-grained feature representations are obtained through multiproportion blocking and reorganization. In this way, both global and local features are enhanced. Experimental results on the Market-1501 and DukeMTMC-reID datasets show that the proposed model performs well on the evaluation indexes. In particular, its Rank-1 accuracy reaches 96.20% and 89.59%, respectively, which represents progress in Re-ID.
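
The Rank-1 figures above follow the cumulative matching characteristic (CMC) evaluation that is standard in person re-identification. Under this metric, a query counts as a hit at rank k if an image of the same identity appears among its k nearest gallery images. The sketch below is a simplified NumPy version of that metric, not the authors' evaluation code. The function name `rank_k_accuracy` is hypothetical. The sketch also omits part of the standard Market-1501 / DukeMTMC-reID protocol, which discards certain gallery images, such as those with the query's identity that were taken by the query's camera.

```python
import numpy as np


def rank_k_accuracy(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries with a same-identity gallery image among the k nearest.

    Hypothetical helper. dist: (num_query, num_gallery) distance matrix where
    smaller means more similar. Simplification: the same-camera / junk-image
    filtering of the standard Re-ID protocol is omitted.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Indices of the k closest gallery images for every query.
    top_k = np.argsort(dist, axis=1)[:, :k]
    # True where a retrieved gallery image shares the query's identity.
    matches = gallery_ids[top_k] == query_ids[:, None]
    # A query is a hit if any of its top-k retrievals match.
    return float(matches.any(axis=1).mean())


if __name__ == "__main__":
    dist = [[0.1, 0.9, 0.5],
            [0.8, 0.3, 0.2]]
    query_ids = [1, 2]
    gallery_ids = [1, 2, 3]
    print(rank_k_accuracy(dist, query_ids, gallery_ids, k=1))  # 0.5
    print(rank_k_accuracy(dist, query_ids, gallery_ids, k=2))  # 1.0
```

In this toy case, the second query's nearest gallery image has the wrong identity, so Rank-1 is 0.5. The correct match is second in its ranking, so Rank-2 rises to 1.0.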
