Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Author(s):  
Konstantinos Papadopoulos ◽  
Enjie Ghorbel ◽  
Djamila Aouada ◽  
Björn Ottersten

Author(s):  
Yinong Zhang ◽  
Shanshan Guan ◽  
Cheng Xu ◽  
Hongzhe Liu

In the era of intelligent education, human behavior recognition based on computer vision is an important branch of pattern recognition. Human behavior recognition is a basic technology in the fields of intelligent monitoring and human-computer interaction in education. The dynamic changes of the human skeleton provide important information for the recognition of educational behavior. Traditional methods usually rely on hand-crafted features or hand-designed traversal rules, resulting in limited representation capability and poor generalization. In this paper, a dynamic skeleton model with residual connections is adopted: a spatio-temporal graph convolutional network based on residual connections, which not only overcomes the limitations of previous methods but can also learn the spatio-temporal model directly from the skeleton data. On the large-scale NTU-RGB+D dataset, the network model improves both the representation of human behavior characteristics and the generalization ability, achieving better recognition results than existing models. This paper also compares behavior recognition results on subsets of different joints, and finds that the spatial-structure partitioning strategy gives better results.
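
As a rough illustration of the residual spatio-temporal graph convolution described above, the following PyTorch sketch wraps a spatial graph convolution and a temporal convolution with an identity (or 1x1 projection) shortcut. The class name, layer shapes, and kernel size are assumptions made for illustration, not the authors' exact implementation.

import torch
import torch.nn as nn


class ResidualSTGCNBlock(nn.Module):
    """Sketch of one spatio-temporal graph convolution block with a
    residual connection (hypothetical layer names and sizes)."""

    def __init__(self, in_channels, out_channels, t_kernel=9, stride=1):
        super().__init__()
        # Spatial graph convolution: per-joint feature transform before
        # propagating features along the skeleton adjacency.
        self.gcn = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution over the frame axis only.
        pad = (t_kernel - 1) // 2
        self.tcn = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels,
                      kernel_size=(t_kernel, 1), stride=(stride, 1),
                      padding=(pad, 0)),
            nn.BatchNorm2d(out_channels),
        )
        # Residual branch: identity if shapes match, otherwise a 1x1 projection.
        if in_channels == out_channels and stride == 1:
            self.residual = nn.Identity()
        else:
            self.residual = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=(stride, 1)),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, adj):
        # x: (batch, channels, frames, joints); adj: normalized (joints, joints) adjacency.
        res = self.residual(x)
        x = self.gcn(x)
        x = torch.einsum('nctv,vw->nctw', x, adj)  # propagate along skeleton edges
        x = self.tcn(x)
        return self.relu(x + res)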


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In the skeleton-based human action recognition domain, spatial-temporal graph convolutional networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not sufficient to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels with different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once into the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that our method outperforms previous state-of-the-art methods.
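
A minimal PyTorch sketch of the multi-kernel temporal idea with OSA-style aggregation and an eSE gate, as described in this abstract, might look as follows. The kernel sizes, channel counts, and exact wiring (concatenation, 1x1 fusion, single fully connected sigmoid gate) are assumptions rather than the EE-GCN reference implementation.

import torch
import torch.nn as nn


class MultiScaleTemporalOSA(nn.Module):
    """Sketch: parallel temporal convolutions with different kernel sizes,
    aggregated once (OSA-style) and re-weighted by an effective
    squeeze-excitation (eSE) gate. Sizes are illustrative assumptions."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One temporal convolution branch per kernel size (short / medium / long term).
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels,
                      kernel_size=(k, 1), padding=((k - 1) // 2, 0))
            for k in kernel_sizes
        ])
        # One-shot aggregation: concatenate every branch output, fuse with a 1x1 conv.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, kernel_size=1)
        # eSE: global average pool -> single 1x1 conv (acts as one FC layer) -> sigmoid gate.
        self.ese = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        fused = self.fuse(feats)
        return fused * self.ese(fused)  # channel-wise re-weighting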


Author(s):  
Soomin Park ◽  
Deok-Kyeong Jang ◽  
Sung-Hee Lee

This paper presents a novel deep learning-based framework for translating a motion into various styles within multiple domains. Our framework is a single set of generative adversarial networks that learns stylistic features from a collection of unpaired motion clips with style labels to support mapping between multiple style domains. We construct a spatio-temporal graph to model a motion sequence and employ a spatial-temporal graph convolutional network (ST-GCN) to extract stylistic properties along the spatial and temporal dimensions. Through spatial-temporal modeling, our framework shows improved style translation results between significantly different actions and on long motion sequences containing multiple actions. In addition, for the first time we develop a mapping network for motion stylization that maps random noise to a style code, which allows diverse stylization results to be generated without using reference motions. Through various experiments, we demonstrate the ability of our method to generate improved results in terms of visual quality, stylistic diversity, and content preservation.
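
The mapping network described here, which turns random noise into a style code for a chosen style domain, could be sketched as a small multilayer perceptron with one output head per domain. The dimensions, depth, and domain-selection scheme below are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn


class StyleMappingNetwork(nn.Module):
    """Sketch of a noise-to-style mapping network with one output head
    per style domain (hypothetical sizes)."""

    def __init__(self, noise_dim=16, style_dim=64, num_domains=4, hidden=512):
        super().__init__()
        # Shared trunk over the noise vector.
        self.shared = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        # One output head per style domain.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, style_dim) for _ in range(num_domains)]
        )

    def forward(self, z, domain_idx):
        # z: (batch, noise_dim); domain_idx: (batch,) long tensor of target domains.
        h = self.shared(z)
        styles = torch.stack([head(h) for head in self.heads], dim=1)
        return styles[torch.arange(z.size(0)), domain_idx]


# Usage sketch: sample diverse style codes without any reference motion.
# z = torch.randn(8, 16)
# style = StyleMappingNetwork()(z, torch.zeros(8, dtype=torch.long))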


2015 ◽  
Vol 10 (7) ◽  
pp. 1177-1177
Author(s):  
Andru P. Twinanda ◽  
Emre O. Alkan ◽  
Afshin Gangi ◽  
Michel de Mathelin ◽  
Nicolas Padoy
