Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition

Author(s):  
Konstantinos Papadopoulos ◽  
Enjie Ghorbel ◽  
Djamila Aouada ◽  
Björn Ottersten

Author(s):  
Yinong Zhang ◽  
Shanshan Guan ◽  
Cheng Xu ◽  
Hongzhe Liu

In the era of intelligent education, human behavior recognition based on computer vision is an important branch of pattern recognition. Human behavior recognition is a basic technology in the fields of intelligent monitoring and human-computer interaction in education. The dynamic changes of the human skeleton provide important information for the recognition of educational behavior. Traditional methods usually rely on hand-crafted features or hand-designed traversal rules, resulting in limited representation capability and poor generalization. In this paper, a dynamic skeleton model with residual connections is adopted: a spatio-temporal graph convolutional network based on residual connections, which not only overcomes the limitations of previous methods but can also learn the spatio-temporal model directly from the skeleton data. On the large-scale NTU-RGB+D dataset, the network model improves both the representation of human behavior characteristics and the generalization ability, achieving better recognition results than existing models. This paper also compares behavior recognition results on subsets of different joints, and finds that the spatial-structure partitioning strategy gives better results.
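
As a rough illustration of the residual spatio-temporal graph convolution described above, the following PyTorch sketch wraps a spatial graph convolution and a temporal convolution with an identity (or 1x1 projection) shortcut. The class name, layer shapes, and kernel size are assumptions made for illustration, not the authors' exact implementation.

import torch
import torch.nn as nn


class ResidualSTGCNBlock(nn.Module):
    """Sketch of one spatio-temporal graph convolution block with a
    residual connection (hypothetical layer names and sizes)."""

    def __init__(self, in_channels, out_channels, t_kernel=9, stride=1):
        super().__init__()
        # Spatial graph convolution: per-joint feature transform before
        # propagating features along the skeleton adjacency.
        self.gcn = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution over the frame axis only.
        pad = (t_kernel - 1) // 2
        self.tcn = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels,
                      kernel_size=(t_kernel, 1), stride=(stride, 1),
                      padding=(pad, 0)),
            nn.BatchNorm2d(out_channels),
        )
        # Residual branch: identity if shapes match, otherwise a 1x1 projection.
        if in_channels == out_channels and stride == 1:
            self.residual = nn.Identity()
        else:
            self.residual = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=(stride, 1)),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, adj):
        # x: (batch, channels, frames, joints); adj: normalized (joints, joints) adjacency.
        res = self.residual(x)
        x = self.gcn(x)
        x = torch.einsum('nctv,vw->nctw', x, adj)  # propagate along skeleton edges
        x = self.tcn(x)
        return self.relu(x + res)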


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In the skeleton-based human action recognition domain, spatial-temporal graph convolutional networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not sufficient to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels with different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once into the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that our method outperforms previous state-of-the-art methods.
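
A minimal PyTorch sketch of the multi-kernel temporal idea with OSA-style aggregation and an eSE gate, as described in this abstract, might look as follows. The kernel sizes, channel counts, and exact wiring (concatenation, 1x1 fusion, single fully connected sigmoid gate) are assumptions rather than the EE-GCN reference implementation.

import torch
import torch.nn as nn


class MultiScaleTemporalOSA(nn.Module):
    """Sketch: parallel temporal convolutions with different kernel sizes,
    aggregated once (OSA-style) and re-weighted by an effective
    squeeze-excitation (eSE) gate. Sizes are illustrative assumptions."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One temporal convolution branch per kernel size (short / medium / long term).
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels,
                      kernel_size=(k, 1), padding=((k - 1) // 2, 0))
            for k in kernel_sizes
        ])
        # One-shot aggregation: concatenate every branch output, fuse with a 1x1 conv.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, kernel_size=1)
        # eSE: global average pool -> single 1x1 conv (acts as one FC layer) -> sigmoid gate.
        self.ese = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        fused = self.fuse(feats)
        return fused * self.ese(fused)  # channel-wise re-weighting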


Author(s):  
Soomin Park ◽  
Deok-Kyeong Jang ◽  
Sung-Hee Lee

This paper presents a novel deep learning-based framework for translating a motion into various styles within multiple domains. Our framework is a single set of generative adversarial networks that learns stylistic features from a collection of unpaired motion clips with style labels to support mapping between multiple style domains. We construct a spatio-temporal graph to model a motion sequence and employ a spatial-temporal graph convolutional network (ST-GCN) to extract stylistic properties along the spatial and temporal dimensions. Through spatial-temporal modeling, our framework shows improved style translation results between significantly different actions and on long motion sequences containing multiple actions. In addition, for the first time we develop a mapping network for motion stylization that maps random noise to a style code, which allows diverse stylization results to be generated without using reference motions. Through various experiments, we demonstrate the ability of our method to generate improved results in terms of visual quality, stylistic diversity, and content preservation.
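
The mapping network described here, which turns random noise into a style code for a chosen style domain, could be sketched as a small multilayer perceptron with one output head per domain. The dimensions, depth, and domain-selection scheme below are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn


class StyleMappingNetwork(nn.Module):
    """Sketch of a noise-to-style mapping network with one output head
    per style domain (hypothetical sizes)."""

    def __init__(self, noise_dim=16, style_dim=64, num_domains=4, hidden=512):
        super().__init__()
        # Shared trunk over the noise vector.
        self.shared = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        # One output head per style domain.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, style_dim) for _ in range(num_domains)]
        )

    def forward(self, z, domain_idx):
        # z: (batch, noise_dim); domain_idx: (batch,) long tensor of target domains.
        h = self.shared(z)
        styles = torch.stack([head(h) for head in self.heads], dim=1)
        return styles[torch.arange(z.size(0)), domain_idx]


# Usage sketch: sample diverse style codes without any reference motion.
# z = torch.randn(8, 16)
# style = StyleMappingNetwork()(z, torch.zeros(8, dtype=torch.long))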


2015 ◽  
Vol 10 (7) ◽  
pp. 1177-1177
Author(s):  
Andru P. Twinanda ◽  
Emre O. Alkan ◽  
Afshin Gangi ◽  
Michel de Mathelin ◽  
Nicolas Padoy
