Multi-Scale Spatial Temporal Graph Convolutional LSTM Network for Skeleton-Based Human Action Recognition

In skeleton-based human action recognition, spatial-temporal graph convolutional networks (ST-GCNs) have recently made great progress. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN). Three convolution kernels of different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer into the output once, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that it outperforms previous state-of-the-art methods.
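The multi-kernel temporal branch and the OSA + eSE aggregation described in this abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the kernel sizes (3, 5, 7), the ReLU activations, the plain sigmoid gate, the random weights, and all function names are assumptions chosen for illustration, and the parallel spatial GCL is left out.

```python
import numpy as np

def temporal_conv(x, w):
    """Convolve along the time axis with 'same' zero padding.

    x: (C_in, T, V) features for T frames and V joints.
    w: (C_out, C_in, k) kernel with odd temporal size k.
    """
    k = w.shape[2]
    pad = k // 2
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    # k shifted views of the padded input: (k, C_in, T, V)
    windows = np.stack([xp[:, i:i + t, :] for i in range(k)])
    return np.einsum("oik,kitv->otv", w, windows)

def ese_gate(x, w_fc, b_fc):
    """Channel gate: global average pool, one FC layer, sigmoid, rescale."""
    squeezed = x.mean(axis=(1, 2))                      # (C,)
    gate = 1.0 / (1.0 + np.exp(-(w_fc @ squeezed + b_fc)))
    return x * gate[:, None, None]

def multi_scale_osa_block(x, conv_weights, w_agg, w_fc, b_fc):
    """Chain temporal convs of different sizes, then aggregate once."""
    feats = [x]
    h = x
    for w in conv_weights:                              # short to long kernels
        h = np.maximum(temporal_conv(h, w), 0.0)        # ReLU (assumed)
        feats.append(h)
    cat = np.concatenate(feats, axis=0)                 # one-shot concatenation
    out = np.einsum("oc,ctv->otv", w_agg, cat)          # 1x1 conv over channels
    return ese_gate(out, w_fc, b_fc)

# Toy input: 4 channels, 10 frames, 25 joints (sizes are arbitrary).
rng = np.random.default_rng(0)
C, T, V = 4, 10, 25
x = rng.standard_normal((C, T, V))
weights = [0.1 * rng.standard_normal((C, C, k)) for k in (3, 5, 7)]
out = multi_scale_osa_block(
    x, weights,
    w_agg=0.1 * rng.standard_normal((C, 4 * C)),
    w_fc=rng.standard_normal((C, C)),
    b_fc=np.zeros(C),
)
print(out.shape)  # (4, 10, 25)
```

Each intermediate feature map is concatenated only once, at the end of the chain, which is the property that gives one-shot aggregation its name.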

Tensor analysis and multi-scale features based multi-view human action recognition

2010 2nd International Conference on Computer Engineering and Technology, 2010. DOI: 10.1109/iccet.2010.5485732. Cited By ~ 1

Author(s): Chengcheng Jia, Sujing Wang, Xiangli Xu, Chunguang Zhou, Libiao Zhang

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Tensor Analysis, Multi Scale

Multi-scale skeleton adaptive weighted GCN for skeleton-based human action recognition in IoT

Applied Soft Computing, 2021, pp. 107236. DOI: 10.1016/j.asoc.2021.107236

Author(s): Weiyao Xu, Muqing Wu, Jie Zhu, Min Zhao

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Multi Scale

An Attention Enhanced Spatial–Temporal Graph Convolutional LSTM Network for Action Recognition in Karate

Applied Sciences, 2021, Vol 11 (18), pp. 8641. DOI: 10.3390/app11188641

Author(s): Jianping Guo, Hong Liu, Xi Li, Dahong Xu, Yihan Zhang

Keyword(s): Artificial Intelligence, Action Recognition, Structural Information, Human Action Recognition, Human Action, Competitive Sports, Convolutional Networks, Convolution Model, Artificial Intelligence Technology, Temporal Graph

With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have improved athletes' competitive ability as well as the fitness of the general public. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions and tactics of competitive sports athletes. In this paper, a new graph convolution model is proposed. Delaunay's partitioning algorithm was used to construct a new spatiotemporal topology that can effectively capture the structural information and spatiotemporal features of athletes' technical actions. At the same time, an attention mechanism was integrated into the model, and different weight coefficients were assigned to the joints, which significantly improved the accuracy of technical action recognition. First, the model was compared with current state-of-the-art methods on the general datasets Kinect and NTU-RGB + D, where its performance was slightly improved. Then, the performance of our algorithm was compared with spatial temporal graph convolutional networks (ST-GCN) on the karate technical action dataset. We found that the accuracy of our algorithm was significantly improved.
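As a rough illustration of assigning different weight coefficients to joints inside a graph convolution, the following NumPy sketch scores each joint, normalizes the scores with a softmax, and rescales the joint features before neighborhood aggregation. It is a minimal sketch under stated assumptions, not the paper's model: the linear scoring vector `w_score`, the symmetric normalization, the toy five-joint chain, and all function names are hypothetical, and the Delaunay-based topology is not reproduced.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def joint_attention(x, w_score):
    """Softmax weights over joints. x: (V, C), w_score: (C,)."""
    scores = x @ w_score
    scores = scores - scores.max()          # numerical stability
    e = np.exp(scores)
    return e / e.sum()

def attention_graph_conv(x, a_hat, w, w_score):
    """Rescale joints by attention, aggregate neighbors, project, ReLU."""
    alpha = joint_attention(x, w_score)     # (V,)
    weighted = x * alpha[:, None]
    return np.maximum(a_hat @ weighted @ w, 0.0)

# Toy skeleton: 5 joints connected in a chain, 3 input and 8 output channels.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(edges, num_joints=5)
x = rng.standard_normal((5, 3))
w = rng.standard_normal((3, 8))
w_score = rng.standard_normal(3)
out = attention_graph_conv(x, a_hat, w, w_score)
print(out.shape)  # (5, 8)
```

In a trained model the scoring parameters would be learned, so joints that matter more for a given technique receive larger coefficients.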

Human Action Recognition in Videos of Realistic Scenes Based on Multi-scale CNN Feature

Advances in Multimedia Information Processing – PCM 2017 - Lecture Notes in Computer Science, 2018, pp. 316-326. DOI: 10.1007/978-3-319-77383-4_31

Author(s): Yongsheng Zhou, Nan Pu, Li Qian, Song Wu, Guoqiang Xiao

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Multi Scale, Realistic Scenes

Human action recognition based on multi-scale feature maps from depth video sequences

Multimedia Tools and Applications, 2021. DOI: 10.1007/s11042-021-11193-4

Author(s): Chang Li, Qian Huang, Xing Li, Qianhan Wu

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Video Sequences, Feature Maps, Scale Feature, Multi Scale, Depth Video

Multi-Scale Locality-Constrained Spatiotemporal Coding for Local Feature Based Human Action Recognition

The Scientific World JOURNAL, 2013, Vol 2013, pp. 1-11. DOI: 10.1155/2013/405645. Cited By ~ 3

Author(s): Bin Wang, Yu Liu, Wei Wang, Wei Xu, Maojun Zhang

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Local Features, Local Feature, Multi Scale, Spatiotemporal Feature, Group Information, Feature Based, Relationship Of

We propose a Multiscale Locality-Constrained Spatiotemporal Coding (MLSC) method to improve the traditional bag of features (BoF) algorithm, which ignores the spatiotemporal relationship of local features for human action recognition in video. To model this spatiotemporal relationship, MLSC incorporates the spatiotemporal positions of local features into the feature coding process. It projects local features into a sub space-time-volume (sub-STV) and encodes them with locality-constrained linear coding. A group of sub-STV features obtained from one video with MLSC and max-pooling is used to classify the video. In the classification stage, the Locality-Constrained Group Sparse Representation (LGSR) is adopted to exploit the intrinsic group information of these sub-STV features. The experimental results on the KTH, Weizmann, and UCF sports datasets show that our method achieves better performance than competing local spatiotemporal feature-based human action recognition methods.
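The locality-constrained linear coding and max-pooling steps mentioned in this abstract can be sketched as follows. This uses the commonly cited k-nearest-neighbor approximation of locality-constrained linear coding on plain descriptors; it is a sketch under assumptions, not the MLSC method itself. The codebook, the neighbor count `k`, the regularizer `reg`, and the function names are illustrative, and the spatiotemporal position term and the sub-STV projection are not reproduced.

```python
import numpy as np

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Encode one descriptor using its k nearest codebook entries.

    x: (D,) local descriptor; codebook: (M, D). Returns an (M,) code
    with at most k nonzero entries that sum to one.
    """
    dists = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(dists)[:k]                 # locality: nearest bases only
    z = codebook[idx] - x                       # shift bases to the descriptor
    cov = z @ z.T                               # (k, k) local covariance
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()                                # enforce the sum-to-one constraint
    code = np.zeros(codebook.shape[0])
    code[idx] = w
    return code

def max_pool_codes(codes):
    """Pool a group of codes (N, M) into one (M,) feature by max."""
    return codes.max(axis=0)

# Toy data: 32 codebook entries and 20 descriptors of dimension 8.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((32, 8))
descriptors = rng.standard_normal((20, 8))
codes = np.stack([llc_encode(d, codebook) for d in descriptors])
pooled = max_pool_codes(codes)
print(codes.shape, pooled.shape)  # (20, 32) (32,)
```

In this sketch the pooled vector plays the role of a single sub-STV feature; a video would yield a group of such vectors, one per sub-volume.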
