Skeleton-Based Action Recognition Based on Distance Vector and Multihigh View Adaptive Networks

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Min Zhang ◽  
Haijie Yang ◽  
Pengfei Li ◽  
Ming Jiang

Skeleton-based human action recognition has attracted much attention in the field of computer vision. Most previous studies are based on fixed skeleton graphs, so only the local physical dependencies among joints can be captured, and implicit joint correlations are omitted. In addition, the appearance of the same action varies greatly across views; in some views, keypoints are occluded, which causes recognition errors. In this paper, an action recognition method based on a distance vector and multihigh view adaptive network (DV-MHNet) is proposed to address this challenging task. The multihigh (MH) view adaptive networks automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame, and enhance the robustness and generalization of the model when recognizing actions at different heights. On this basis, the distance vector (DV) mechanism establishes the relative distance and relative orientation between different keypoints in the same frame and between the same keypoints in different frames, capturing the global potential relationship of each keypoint. Finally, a spatial-temporal graph convolutional network is constructed to account for both spatial and temporal information and learn the characteristics of the action. An ablation study against traditional spatial-temporal graph convolutional networks, with and without the multihigh view adaptive networks, demonstrates the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU RGB+D and PKU-MMD), and our method achieves better performance on both datasets.
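As a rough illustration (not the authors' implementation), the distance vector idea described above can be sketched in pure Python: for each frame, compute the relative distance and orientation between every pair of joints, and for consecutive frames, compute the displacement of each joint. The 2D toy skeleton below is an assumption for clarity.

```python
import math

def distance_vectors(frame):
    """Relative distance and orientation between every joint pair in one frame."""
    feats = {}
    for i, (xi, yi) in enumerate(frame):
        for j, (xj, yj) in enumerate(frame):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            feats[(i, j)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return feats

def temporal_vectors(frame_t, frame_t1):
    """Displacement (distance, orientation) of the same joint across frames."""
    out = []
    for (x0, y0), (x1, y1) in zip(frame_t, frame_t1):
        dx, dy = x1 - x0, y1 - y0
        out.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return out

frame_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # toy 3-joint skeleton
frame_b = [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
sv = distance_vectors(frame_a)
tv = temporal_vectors(frame_a, frame_b)
```

In the paper these per-pair features would then feed the spatial-temporal graph convolution; here they are just returned as plain tuples.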

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In the skeleton-based human action recognition domain, spatial-temporal graph convolution networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels with different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. Besides, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that its performance exceeds previous state-of-the-art methods.
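The multi-kernel temporal branch plus one-shot aggregation can be sketched minimally as follows. This is a pure-Python stand-in, not the EE-GCN code: averaging kernels replace learned weights, and "aggregation" is a plain concatenation of every branch's output, mirroring how OSA routes each layer's features directly to the output once.

```python
def conv1d(signal, kernel):
    """'Same'-padded 1D convolution over a per-joint time series."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + signal + [0.0] * pad
    return [sum(padded[t + k] * kernel[k] for k in range(len(kernel)))
            for t in range(len(signal))]

def multi_kernel_temporal(signal, kernel_sizes=(3, 5, 7)):
    """Extract short-to-long-term temporal cues with differently sized
    kernels, then aggregate every branch once (OSA-style concatenation)."""
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # averaging kernel as a stand-in for learned weights
        branches.append(conv1d(signal, kernel))
    # one-shot aggregation: each branch contributes directly to the output
    return [v for branch in branches for v in branch]

series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]  # toy coordinate trajectory
features = multi_kernel_temporal(series)
```

A learned eSE gate would then reweight the concatenated channels; that step is omitted here for brevity.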


2021 ◽  
Vol 11 (18) ◽  
pp. 8641
Author(s):  
Jianping Guo ◽  
Hong Liu ◽  
Xi Li ◽  
Dahong Xu ◽  
Yihan Zhang

With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have promoted the improvement of athletes’ competitive ability, as well as the fitness of the general public. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions and tactics of competitive sports athletes. In this paper, a new graph convolution model is proposed. A Delaunay partitioning algorithm is used to construct a new spatiotemporal topology, which can effectively capture the structural information and spatiotemporal features of athletes’ technical actions. At the same time, an attention mechanism is integrated into the model, assigning different weight coefficients to the joints, which significantly improves the accuracy of technical action recognition. First, a comparison against current state-of-the-art methods was undertaken on the general Kinetics and NTU RGB+D datasets, where the performance of the new model was slightly improved. Then, our algorithm was compared with spatial temporal graph convolutional networks (ST-GCN) on a karate technique action dataset, where its accuracy was significantly improved.
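The joint-weighting idea in the attention mechanism above can be illustrated with a minimal sketch (an assumption for illustration, not the paper's model): per-joint importance scores are normalized with a softmax, and each joint's feature vector is scaled by its resulting weight coefficient.

```python
import math

def joint_attention(scores):
    """Softmax over per-joint scores -> attention weight coefficients."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_features(joint_feats, scores):
    """Scale each joint's feature vector by its attention weight."""
    w = joint_attention(scores)
    return [[wi * f for f in feats] for wi, feats in zip(w, joint_feats)]

weights = joint_attention([1.0, 1.0, 1.0, 1.0])   # equal scores -> equal weights
scaled = weighted_features([[2.0], [4.0]], [0.0, 0.0])
```

In practice the scores themselves would be learned from the joint features; fixed scores are used here only to keep the sketch deterministic.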


2021 ◽  
Vol 58 (2) ◽  
pp. 0210007
Author(s):  
张文强 Zhang Wenqiang ◽  
王增强 Wang Zengqiang ◽  
张良 Zhang Liang

2020 ◽  
Vol 34 (03) ◽  
pp. 2669-2676 ◽  
Author(s):  
Wei Peng ◽  
Xiaopeng Hong ◽  
Haoyu Chen ◽  
Guoying Zhao

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted much attention. However, many existing GCNs use a pre-defined graph structure and share it through the entire network, which can lose implicit joint correlations, especially in the higher-level features. Besides, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well involved. All of this requires huge effort to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules, expecting to break the limitation on representational capacity caused by the first-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulting architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale skeleton-based action recognition datasets. The results show that our model achieves state-of-the-art results in terms of the given metrics.
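To make the "multiple-hop" point concrete, here is a minimal sketch (an illustration, not the searched architecture): on a chain-shaped skeleton graph, a first-order adjacency only connects physical neighbors, while accumulating powers of the adjacency matrix exposes the higher-order connections that a one-hop GCN misses.

```python
def matmul(a, b):
    """Plain dense matrix multiply for small adjacency matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def k_hop_adjacency(adj, k):
    """Binary reachability within at most k hops: A OR A^2 OR ... OR A^k."""
    n = len(adj)
    power = [row[:] for row in adj]
    reach = [row[:] for row in adj]
    for _ in range(k - 1):
        power = matmul(power, adj)
        reach = [[1 if (reach[i][j] or power[i][j]) else 0 for j in range(n)]
                 for i in range(n)]
    return reach

# toy chain skeleton 0-1-2-3 (e.g. shoulder-elbow-wrist-hand)
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
two_hop = k_hop_adjacency(chain, 2)
```

In a real multi-hop GCN layer each hop order would get its own (normalized) adjacency and learned weights; the binary reachability here just shows which joint pairs become connected.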


Data ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 104
Author(s):  
Ashok Sarabu ◽  
Ajit Kumar Santra

The two-stream convolutional neural network (CNN) has proven a great success for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately, and the two scores are then combined to obtain the final score. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The findings show a significant performance increase, and the proposed architecture outperforms existing methods.
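The score-combination step described above (late fusion of the two streams) can be sketched as follows. This is a generic illustration, assuming a simple weighted average of per-class softmax scores; the weighting scheme and logits are placeholders, not the paper's configuration.

```python
import math

def softmax(logits):
    """Convert raw class logits to normalized probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(spatial_logits, temporal_logits, w_spatial=0.5):
    """Late fusion: weighted average of the spatial (RGB) and temporal
    (optical-flow) stream scores; returns (predicted class, fused scores)."""
    sp = softmax(spatial_logits)
    tp = softmax(temporal_logits)
    fused = [w_spatial * a + (1.0 - w_spatial) * b for a, b in zip(sp, tp)]
    return fused.index(max(fused)), fused

# toy 3-class example: streams disagree, fusion decides
pred, fused = fuse_scores([2.0, 1.0, 0.0], [0.0, 1.0, 2.5])
```

Here the temporal stream's stronger confidence in class 2 outweighs the spatial stream's preference for class 0.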


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1589
Author(s):  
Zeyuan Hu ◽  
Eung-Joo Lee

Traditional convolutional neural networks have achieved great success in human action recognition. However, it is challenging to establish effective associations between different human bone nodes to capture detailed information. In this paper, we propose a dual attention-guided multiscale dynamic aggregate graph convolutional neural network (DAG-GCN) for skeleton-based human action recognition. Our goal is to explore the best correlations and determine high-level semantic features. First, a multiscale dynamic aggregate GCN module is used to capture important semantic information and to establish dependence relationships among different bone nodes. Second, the higher-level semantic features are further refined, and the semantic relevance is emphasized, through a dual attention guidance module. In addition, we exploit the hierarchical relationships of the joints and the spatial-temporal correlations through these two modules. Experiments with the DAG-GCN method show good performance on the NTU-60-RGB+D and NTU-120-RGB+D datasets. The accuracy is 95.76% and 90.01%, respectively, for the cross-view (X-View) and cross-subject (X-Sub) benchmarks on the NTU-60 dataset.
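The "dual attention" idea (attending over channels and over positions) can be sketched in miniature, as a rough stand-in rather than the DAG-GCN module: each channel is gated by a sigmoid of its pooled mean (squeeze-excitation style), then each position is gated by a sigmoid of its cross-channel mean. The feature map layout (channels x positions) is an assumption for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """Squeeze each channel to its mean, gate with a sigmoid, rescale."""
    return [[sigmoid(sum(ch) / len(ch)) * v for v in ch] for ch in fmap]

def spatial_attention(fmap):
    """Gate each position by the sigmoid of its cross-channel mean."""
    n_pos = len(fmap[0])
    gates = [sigmoid(sum(ch[p] for ch in fmap) / len(fmap))
             for p in range(n_pos)]
    return [[g * v for g, v in zip(gates, ch)] for ch in fmap]

def dual_attention(fmap):
    """Apply channel attention, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(fmap))

out = dual_attention([[2.0, 2.0], [1.0, 1.0]])  # 2 channels x 2 positions
```

A learned version would replace the parameter-free pooling gates with small fully connected layers; the gating structure is the same.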

