Learning representative temporal features for action recognition

Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

CVPR 2011 ◽

10.1109/cvpr.2011.5995496 ◽

2011 ◽

Cited By ~ 473

Author(s):

Quoc V. Le ◽

Will Y. Zou ◽

Serena Y. Yeung ◽

Andrew Y. Ng

Keyword(s):

Action Recognition ◽

Subspace Analysis ◽

Temporal Features ◽

Spatio Temporal ◽

Independent Subspace Analysis

Evaluation of local spatial–temporal features for cross-view action recognition

Neurocomputing ◽

10.1016/j.neucom.2015.07.105 ◽

2016 ◽

Vol 173 ◽

pp. 110-117 ◽

Cited By ~ 11

Author(s):

Zan Gao ◽

Weizhi Nie ◽

Anan Liu ◽

Hua Zhang

Keyword(s):

Action Recognition ◽

Temporal Features

Learning Spatio-Temporal Features for Action Recognition with Modified Hidden Conditional Random Field

Computer Vision - ECCV 2014 Workshops - Lecture Notes in Computer Science ◽

10.1007/978-3-319-16178-5_55 ◽

2015 ◽

pp. 786-801

Author(s):

Wanru Xu ◽

Zhenjiang Miao ◽

Jian Zhang ◽

Yi Tian

Keyword(s):

Random Field ◽

Action Recognition ◽

Conditional Random Field ◽

Temporal Features ◽

Spatio Temporal

A Robust Approach for Action Recognition Based on Spatio-Temporal Features in RGB-D Sequences

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2016.070526 ◽

2016 ◽

Vol 7 (5) ◽

Author(s):

Ly Quoc ◽

Vo Hoai ◽

Tran Thai ◽

Pham Minh

Keyword(s):

Action Recognition ◽

Robust Approach ◽

Temporal Features ◽

Spatio Temporal

Learning spatial–temporal features via a pose-flow relational model for action recognition

AIP Advances ◽

10.1063/5.0011161 ◽

2020 ◽

Vol 10 (7) ◽

pp. 075208

Author(s):

Qianyu Wu ◽

Fangqiang Hu ◽

Aichun Zhu ◽

Zixuan Wang ◽

Yaping Bao

Keyword(s):

Action Recognition ◽

Relational Model ◽

Temporal Features

I3D-Shufflenet Based Human Action Recognition

Algorithms ◽

10.3390/a13110301 ◽

2020 ◽

Vol 13 (11) ◽

pp. 301

Author(s):

Guocheng Liu ◽

Caixia Zhang ◽

Qingyang Xu ◽

Ruoshi Cheng ◽

Yong Song ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Recognition Algorithm ◽

Convolution Kernel ◽

Histogram Of Oriented Gradients ◽

Temporal Features ◽

Convolution Kernels

Optical-flow-based human action recognition is difficult to apply because of its large computational cost. To address this, a human action recognition model, I3D-shufflenet, is proposed that combines the advantages of the I3D neural network and the lightweight model shufflenet. The 5 × 5 convolution kernel of I3D is replaced by two 3 × 3 convolution kernels, which reduces the amount of computation. A shuffle layer is adopted to exchange features. Human actions are recognized and classified with the trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel, which promotes the use of useful information. Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which significantly improves the ability to express human actions and reduces the cost of feature extraction. I3D-shufflenet is tested on the UCF101 dataset and compared with other models. The final result shows that I3D-shufflenet has higher accuracy than the original I3D, with an accuracy of 96.4%.
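The saving from swapping one 5 × 5 kernel for two 3 × 3 kernels can be checked with simple arithmetic. The sketch below is only an illustration, not the authors' code: the helper `conv_weights`, the channel count of 64, and the choice to ignore bias terms are all assumptions made for the example. The abstract states the kernel sizes in 2D form, so the default is `dims=2`; passing `dims=3` gives the analogous count for cubic 3D kernels.

```python
def conv_weights(kernel_size: int, in_channels: int, out_channels: int, dims: int = 2) -> int:
    """Count the weights (bias ignored) of a convolution with a square/cubic kernel."""
    return (kernel_size ** dims) * in_channels * out_channels


# Hypothetical channel count, chosen only for illustration.
channels = 64

# One 5 x 5 convolution versus two 3 x 3 convolutions applied in sequence.
single_5x5 = conv_weights(5, channels, channels)
double_3x3 = 2 * conv_weights(3, channels, channels)

# Fraction of weights saved by the replacement (independent of the channel count
# as long as input and output channels match).
saving = 1 - double_3x3 / single_5x5

print(single_5x5, double_3x3, round(saving, 2))
```

Two 3 × 3 convolutions applied in sequence cover the same 5 × 5 receptive field as the single larger kernel, which is why the replacement can cut weights without shrinking the region each output sees.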

Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling

Data ◽

10.3390/data5040104 ◽

2020 ◽

Vol 5 (4) ◽

pp. 104

Author(s):

Ashok Sarabu ◽

Ajit Kumar Santra

Keyword(s):

Action Recognition ◽

Data Augmentation ◽

Main Idea ◽

Human Action Recognition ◽

Human Action ◽

Great Success ◽

Temporal Modeling ◽

Convolutional Networks ◽

Temporal Features ◽

Augmentation Techniques

The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and to combine their two scores into a final score. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar types of sub-action in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced to the proposed architecture to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show that the proposed architecture yields a significant performance increase and outperforms existing methods.
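The score-combination step of a two-stream model can be sketched as a weighted average of per-class probabilities. This is a minimal illustration under stated assumptions, not the paper's method: the abstract says only that the two scores are combined, so the weighted-average rule, the function names `softmax` and `fuse_scores`, the `temporal_weight` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    The averaging rule and the default weight are assumptions for illustration.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    return [
        (1 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial, temporal)
    ]


# Made-up logits for three action classes from each stream.
spatial_logits = [2.0, 1.0, 0.1]
temporal_logits = [0.5, 2.5, 0.3]

fused = fuse_scores(spatial_logits, temporal_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted, [round(p, 3) for p in fused])
```

In this made-up example the spatial stream alone favours class 0, while the fused scores favour class 1, showing how the temporal stream can change the final decision.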

SAST: Learning Semantic Action-Aware Spatial-Temporal Features for Efficient Action Recognition

IEEE Access ◽

10.1109/access.2019.2953113 ◽

2019 ◽

Vol 7 ◽

pp. 164876-164886 ◽

Cited By ~ 1

Author(s):

Fei Wang ◽

Guorui Wang ◽

Yunwen Huang ◽

Hao Chu

Keyword(s):

Action Recognition ◽

Temporal Features

Multi-Term Attention Networks for Skeleton-Based Action Recognition

Applied Sciences ◽

10.3390/app10155326 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5326

Author(s):

Xiaolei Diao ◽

Xiaoqiang Li ◽

Chen Huang

Keyword(s):

Neural Network ◽

Time Scales ◽

Action Recognition ◽

State Of The Art ◽

Attention Networks ◽

Weighted Fusion ◽

Temporal Features ◽

Benchmark Datasets ◽

Spatio Temporal ◽

Different Time Scales

The same action can take different amounts of time in different cases. This difference affects the accuracy of action recognition to a certain extent. We propose an end-to-end deep neural network called “Multi-Term Attention Networks” (MTANs), which addresses this problem by extracting temporal features at different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In MTA-RNN, a method for fusing multi-term temporal features is proposed to extract the temporal dependence at different time scales, and the weighted fusion temporal feature is recalibrated by the attention mechanism. Ablation studies show that this network has powerful spatio-temporal dynamic modeling capabilities for actions with different time scales. We perform extensive experiments on four challenging benchmark datasets: the NTU RGB+D dataset, the UT-Kinect dataset, the Northwestern-UCLA dataset, and the UWA3DII dataset. Our method achieves better results than the state-of-the-art benchmarks, which demonstrates the effectiveness of MTANs.

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing ◽

10.1007/s11633-014-0831-4 ◽

2014 ◽

Vol 11 (5) ◽

pp. 500-509 ◽

Cited By ~ 12

Author(s):

Xiao-Fei Ji ◽

Qian-Qian Wu ◽

Zhao-Jie Ju ◽

Yang-Yang Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal
