Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

Keyword(s):

Human Action

Regional Correlation

Temporal Features

Adaptive Motion

Spatio Temporal

Inter Frame

Temporal Feature

Action recognition has attracted growing attention in the computer vision community. Recognizing human actions normally requires extracting both spatial and temporal features, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, an Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Temporal average pooling is applied across frames to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than those of other state-of-the-art methods.
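The abstract does not say exactly how ACTF computes its two pooled features. As a rough illustration only, the Python sketch below assumes per-frame regional features stored as nested lists (`features[t][r]` is a C-dimensional vector for region `r` in frame `t`). It takes the mean feature to be temporal average pooling of each region over all frames. It takes the inter-frame regional correlation to be the cosine similarity of the same region in consecutive frames, average-pooled over time. These definitions and all function names are hypothetical and are not the authors' implementation.

```python
import math
from typing import List

# Hypothetical layout: features[t][r] is a C-dimensional vector for region r in frame t.
Frames = List[List[List[float]]]


def temporal_mean_feature(features: Frames) -> List[List[float]]:
    """Temporal average pooling: average each region's vector over all T frames."""
    T = len(features)
    R = len(features[0])
    C = len(features[0][0])
    return [[sum(features[t][r][c] for t in range(T)) / T for c in range(C)]
            for r in range(R)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def interframe_regional_correlation(features: Frames) -> List[float]:
    """Assumed definition: for each region, the cosine similarity between
    consecutive frames, average-pooled over the T - 1 frame pairs."""
    T = len(features)
    if T < 2:
        raise ValueError("need at least two frames to correlate")
    R = len(features[0])
    return [sum(cosine(features[t][r], features[t + 1][r]) for t in range(T - 1)) / (T - 1)
            for r in range(R)]
```

On a toy input of 3 frames and 2 regions, where region 0 changes direction between the second and third frames and region 1 only changes magnitude, `interframe_regional_correlation` returns `[0.5, 1.0]`: the pooled correlation drops only for the region whose pattern changed, while the mean feature summarizes each region's average appearance.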

Spatio-temporal feature extraction and representation for RGB-D human action recognition

Pattern Recognition Letters

10.1016/j.patrec.2014.03.024

2014

Vol 50

pp. 139-148

Cited By ~ 36

Author(s):

Jiajia Luo

Wei Wang

Hairong Qi

Keyword(s):

Feature Extraction

Action Recognition

Human Action

Spatio Temporal

Temporal Feature

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing

10.1007/s11633-014-0831-4

2014

Vol 11 (5)

pp. 500-509

Cited By ~ 12

Author(s):

Xiao-Fei Ji

Qian-Qian Wu

Zhao-Jie Ju

Yang-Yang Wang

Keyword(s):

Action Recognition

Human Action

Temporal Features

Human Action Recognition Based on Spatio-temporal Features

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence

10.1007/978-3-642-11164-8_58

2009

pp. 357-362

Author(s):

Nikhil Sawant

K. K. Biswas

Keyword(s):

Action Recognition

Human Action

Temporal Features

Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features

Lecture Notes in Computer Science - Neural Information Processing. Models and Applications

10.1007/978-3-642-17534-3_48

2010

pp. 391-398

Cited By ~ 1

Author(s):

Yanli Ji

Atsushi Shimada

Rin-ichiro Taniguchi

Keyword(s):

Action Recognition

Human Action

Temporal Features

Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks

IEEE Access

10.1109/access.2018.2817253

2018

Vol 6

pp. 17913-17922

Cited By ~ 24

Author(s):

Lei Wang

Yangyang Xu

Jun Cheng

Haiying Xia

Jianqin Yin

...

Keyword(s):

Neural Networks

Action Recognition

Deep Neural Networks

Human Action

Temporal Features

An effective fusion scheme of spatio-temporal features for human action recognition in RGB-D video

2013 International Conference on Control, Automation and Information Sciences (ICCAIS)

10.1109/iccais.2013.6720562

2013

Cited By ~ 1

Author(s):

Quang D. Tran

Ngoc Q. Ly

Keyword(s):

Action Recognition

Human Action

Temporal Features

Spatio Temporal

Fusion Scheme

Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2019/136

2019

Cited By ~ 2

Author(s):

Haoze Wu

Jiawei Liu

Zheng-Jun Zha

Zhenzhong Chen

Xiaoyan Sun

Keyword(s):

Action Recognition

Computational Cost

Human Action

Temporal Information

Temporal Features

Temporal Decomposition

Spatio Temporal

Different Order

High Computational Cost

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or suffer from the high computational cost of spatio-temporal feature extraction. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both by exploiting the interaction of spatial and temporal information, and selectively emphasizes informative spatial appearance and temporal motion, while reducing structural complexity. Moreover, we design three types of MRSTs according to the order in which spatial and temporal information are enhanced, each of which contains a spatio-temporal decomposition unit, a mutually reinforced unit and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, is also proposed based on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show that MRST-Net yields the best performance compared to state-of-the-art approaches.
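The abstract gives no layer-level details of MRST, so the Python sketch below only illustrates two of the ideas it names, under explicit assumptions. First, it shows why decomposing a 3D convolution into spatial and temporal parts can reduce cost, assuming a 1×k×k spatial kernel followed by a k×1×1 temporal kernel (a (2+1)D-style split, not necessarily the decomposition MRST uses). Second, it shows a hypothetical mutual-reinforcement step in which each branch is gated by a sigmoid of the other before the two are fused by element-wise summation. All function names are illustrative only.

```python
import math
from typing import List, Optional


def full_3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in one k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def factorized_params(c_in: int, c_out: int, k: int, c_mid: Optional[int] = None) -> int:
    """Weights in an assumed factorization: a 1 x k x k spatial conv (c_in -> c_mid)
    followed by a k x 1 x 1 temporal conv (c_mid -> c_out). c_mid defaults to c_out."""
    if c_mid is None:
        c_mid = c_out
    spatial = c_in * c_mid * k * k
    temporal = c_mid * c_out * k
    return spatial + temporal


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mutually_reinforce(spatial: List[float], temporal: List[float]) -> List[float]:
    """Hypothetical cross-gating: scale each branch by a sigmoid of the other,
    then fuse the two enhanced branches by element-wise sum."""
    if len(spatial) != len(temporal):
        raise ValueError("branches must have the same length")
    s_enh = [s * sigmoid(t) for s, t in zip(spatial, temporal)]
    t_enh = [t * sigmoid(s) for s, t in zip(spatial, temporal)]
    return [a + b for a, b in zip(s_enh, t_enh)]
```

With 64 input and output channels and k = 3, `full_3d_params(64, 64, 3)` gives 110,592 weights while `factorized_params(64, 64, 3)` gives 49,152, exactly 4/9 of the full kernel. The gating step is one simple way to let each representation modulate the other; the actual MRST units may differ.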