Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences

2019, Vol. 29 (8), pp. 2405-2415
Author(s): Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo
Sensors, 2021, Vol. 21 (11), pp. 3722
Author(s): Byeongkeun Kang, Yeejin Lee

Motion in videos refers to the pattern of apparent movement of objects, surfaces, and edges across image sequences caused by the relative movement between a camera and a scene. In computer vision, motion, together with scene appearance, is an essential cue for estimating a driver's visual attention allocation. However, while attention-prediction models based on scene appearance have been studied extensively, the role of motion as a crucial factor in driver attention estimation has not been thoroughly investigated in the literature. Therefore, in this work, we investigate the usefulness of motion information for estimating a driver's visual attention. To analyze its effectiveness, we develop a deep neural network framework that predicts attention locations and attention levels from optical flow maps, which represent the movement of content in videos. We validate the proposed motion-based prediction model by comparing it against current state-of-the-art models that use RGB frames. Experimental results on a real-world dataset confirm our hypothesis that motion contributes to prediction accuracy, and that motion features leave a margin for further accuracy improvement.
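The abstract describes feeding optical flow maps, rather than RGB frames, to an attention-prediction network. The following is a minimal sketch of that kind of motion input, not the authors' pipeline: the synthetic frames, Farneback parameters, and the magnitude-based "attention proxy" are illustrative assumptions; in the paper's setting a trained network would map the flow to attention locations and levels.

```python
import cv2
import numpy as np

def flow_map(prev_gray, next_gray):
    """Dense Farneback optical flow between two grayscale uint8 frames."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Two synthetic frames stand in for consecutive frames of a driving video;
# a real pipeline would decode them with cv2.VideoCapture.
h, w = 120, 160
prev_frame = np.random.randint(0, 256, (h, w), dtype=np.uint8)
next_frame = np.roll(prev_frame, shift=3, axis=1)  # simulate horizontal motion

flow = flow_map(prev_frame, next_frame)        # shape (h, w, 2): per-pixel (dx, dy)
magnitude = np.linalg.norm(flow, axis=2)       # motion strength per pixel

# Crude stand-in for an "attention level" map: normalized motion magnitude.
# A learned model taking the flow maps as input would replace this step.
attention_proxy = magnitude / (magnitude.max() + 1e-8)
print(attention_proxy.shape, float(attention_proxy.max()))
```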


Sensors, 2021, Vol. 21 (9), pp. 3099
Author(s): V. Javier Traver, Judith Zorío, Luis A. Leiva

Temporal salience considers how visual attention varies over time. Although visual salience has been widely studied from a spatial perspective, its temporal dimension has been mostly ignored, despite arguably being of utmost importance for understanding how attention evolves on dynamic content. To address this gap, we propose Glimpse, a novel measure that computes temporal salience from the observer spatio-temporal consistency of raw gaze data. The measure is conceptually simple, training-free, and provides a semantically meaningful quantification of visual attention over time. As an extension, we explore scoring algorithms that estimate temporal salience from spatial salience maps predicted by existing computational models; however, these approaches generally fall short of our proposed gaze-based measure. Glimpse could serve as the basis for downstream tasks such as video segmentation or summarization. Glimpse's software and data are publicly available.
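Glimpse scores temporal salience from the consistency of raw gaze data across observers. The sketch below is one assumed, simplified reading of that idea, not the published Glimpse formula: each frame is scored by how tightly the gaze points of several observers cluster, and both the dispersion measure and the toy gaze array are illustrative assumptions.

```python
import numpy as np

def temporal_salience(gaze, eps=1e-8):
    """gaze: array of shape (frames, observers, 2) holding (x, y) gaze points
    normalized to [0, 1]. Returns one salience score per frame."""
    # Dispersion: mean distance of each observer's gaze to the frame centroid.
    centroid = gaze.mean(axis=1, keepdims=True)               # (frames, 1, 2)
    dispersion = np.linalg.norm(gaze - centroid, axis=2).mean(axis=1)
    # Consistency across observers (and hence salience) falls as dispersion grows.
    salience = 1.0 / (dispersion + eps)
    return salience / salience.max()                          # scale to [0, 1]

# Toy data: 5 frames, 8 observers; gaze is random except on frame 2,
# where all observers look at roughly the same location.
rng = np.random.default_rng(0)
gaze = rng.random((5, 8, 2))
gaze[2] = 0.5 + 0.01 * rng.standard_normal((8, 2))
print(np.round(temporal_salience(gaze), 3))  # frame 2 gets the highest score
```

Being training-free, a consistency measure of this kind needs only recorded gaze data per frame, which is what makes the gaze-based score directly comparable against salience maps predicted by computational models.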


2014, Vol. 281, pp. 295-309
Author(s): Xiantong Zhen, Ling Shao, Xuelong Li

Author(s): Hongyang Li, Jun Chen, Ruimin Hu, Mei Yu, Huafeng Chen, ...
