Skeleton-Based Action Recognition Based on Distance Vector and Multihigh View Adaptive Networks

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Min Zhang ◽  
Haijie Yang ◽  
Pengfei Li ◽  
Ming Jiang

Skeleton-based human action recognition has attracted much attention in the field of computer vision. Most previous studies are based on fixed skeleton graphs, so only the local physical dependencies among joints can be captured, and implicit joint correlations are omitted. In addition, the appearance of the same action varies greatly across views; in some views, keypoints are occluded, which causes recognition errors. In this paper, an action recognition method based on a distance vector and multihigh view adaptive network (DV-MHNet) is proposed to address this challenging task. The multihigh (MH) view adaptive networks automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame, and enhance the robustness and generalization of the model when recognizing actions at different heights. On this basis, the distance vector (DV) mechanism establishes the relative distance and relative orientation between different keypoints in the same frame and between the same keypoints in different frames, capturing the global potential relationship of each keypoint. Finally, a spatial-temporal graph convolutional network is constructed to account for both spatial and temporal information and learn the characteristics of the action. An ablation study against traditional spatial-temporal graph convolutional networks, with and without the multihigh view adaptive networks, demonstrates the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU RGB+D and PKU-MMD), and our method achieves better performance on both datasets.
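As a rough illustration (not the authors' implementation), the distance vector idea described above can be sketched in pure Python: for each frame, compute the relative distance and orientation between every pair of joints, and for consecutive frames, compute the displacement of each joint. The 2D toy skeleton below is an assumption for clarity.

```python
import math

def distance_vectors(frame):
    """Relative distance and orientation between every joint pair in one frame."""
    feats = {}
    for i, (xi, yi) in enumerate(frame):
        for j, (xj, yj) in enumerate(frame):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            feats[(i, j)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return feats

def temporal_vectors(frame_t, frame_t1):
    """Displacement (distance, orientation) of the same joint across frames."""
    out = []
    for (x0, y0), (x1, y1) in zip(frame_t, frame_t1):
        dx, dy = x1 - x0, y1 - y0
        out.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return out

frame_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # toy 3-joint skeleton
frame_b = [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
sv = distance_vectors(frame_a)
tv = temporal_vectors(frame_a, frame_b)
```

In the paper these per-pair features would then feed the spatial-temporal graph convolution; here they are just returned as plain tuples.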

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In the skeleton-based human action recognition domain, spatial-temporal graph convolution networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels with different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. Besides, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that its performance exceeds previous state-of-the-art methods.
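The multi-kernel temporal branch plus one-shot aggregation can be sketched minimally as follows. This is a pure-Python stand-in, not the EE-GCN code: averaging kernels replace learned weights, and "aggregation" is a plain concatenation of every branch's output, mirroring how OSA routes each layer's features directly to the output once.

```python
def conv1d(signal, kernel):
    """'Same'-padded 1D convolution over a per-joint time series."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + signal + [0.0] * pad
    return [sum(padded[t + k] * kernel[k] for k in range(len(kernel)))
            for t in range(len(signal))]

def multi_kernel_temporal(signal, kernel_sizes=(3, 5, 7)):
    """Extract short-to-long-term temporal cues with differently sized
    kernels, then aggregate every branch once (OSA-style concatenation)."""
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # averaging kernel as a stand-in for learned weights
        branches.append(conv1d(signal, kernel))
    # one-shot aggregation: each branch contributes directly to the output
    return [v for branch in branches for v in branch]

series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]  # toy coordinate trajectory
features = multi_kernel_temporal(series)
```

A learned eSE gate would then reweight the concatenated channels; that step is omitted here for brevity.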


2021 ◽  
Vol 11 (18) ◽  
pp. 8641
Author(s):  
Jianping Guo ◽  
Hong Liu ◽  
Xi Li ◽  
Dahong Xu ◽  
Yihan Zhang

With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have promoted the improvement of athletes’ competitive ability, as well as the fitness of the general public. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions and tactics of competitive sports athletes. In this paper, a new graph convolution model is proposed. A Delaunay partitioning algorithm is used to construct a new spatiotemporal topology, which can effectively capture the structural information and spatiotemporal features of athletes’ technical actions. At the same time, an attention mechanism is integrated into the model, assigning different weight coefficients to the joints, which significantly improves the accuracy of technical action recognition. First, a comparison against current state-of-the-art methods was undertaken on the general Kinetics and NTU RGB+D datasets, where the performance of the new model was slightly improved. Then, our algorithm was compared with spatial temporal graph convolutional networks (ST-GCN) on a karate technique action dataset, where its accuracy was significantly improved.
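The joint-weighting idea in the attention mechanism above can be illustrated with a minimal sketch (an assumption for illustration, not the paper's model): per-joint importance scores are normalized with a softmax, and each joint's feature vector is scaled by its resulting weight coefficient.

```python
import math

def joint_attention(scores):
    """Softmax over per-joint scores -> attention weight coefficients."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_features(joint_feats, scores):
    """Scale each joint's feature vector by its attention weight."""
    w = joint_attention(scores)
    return [[wi * f for f in feats] for wi, feats in zip(w, joint_feats)]

weights = joint_attention([1.0, 1.0, 1.0, 1.0])   # equal scores -> equal weights
scaled = weighted_features([[2.0], [4.0]], [0.0, 0.0])
```

In practice the scores themselves would be learned from the joint features; fixed scores are used here only to keep the sketch deterministic.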


2021 ◽  
Vol 58 (2) ◽  
pp. 0210007
Author(s):  
张文强 Zhang Wenqiang ◽  
王增强 Wang Zengqiang ◽  
张良 Zhang Liang

2020 ◽  
Vol 34 (03) ◽  
pp. 2669-2676 ◽  
Author(s):  
Wei Peng ◽  
Xiaopeng Hong ◽  
Haoyu Chen ◽  
Guoying Zhao

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted much attention. However, many existing GCNs use a pre-defined graph structure and share it through the entire network, which can lose implicit joint correlations, especially in the higher-level features. Besides, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well involved. All of this requires huge effort to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules, expecting to break the limitation on representational capacity caused by the first-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulting architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale skeleton-based action recognition datasets. The results show that our model achieves state-of-the-art results in terms of the given metrics.
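To make the "multiple-hop" point concrete, here is a minimal sketch (an illustration, not the searched architecture): on a chain-shaped skeleton graph, a first-order adjacency only connects physical neighbors, while accumulating powers of the adjacency matrix exposes the higher-order connections that a one-hop GCN misses.

```python
def matmul(a, b):
    """Plain dense matrix multiply for small adjacency matrices."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def k_hop_adjacency(adj, k):
    """Binary reachability within at most k hops: A OR A^2 OR ... OR A^k."""
    n = len(adj)
    power = [row[:] for row in adj]
    reach = [row[:] for row in adj]
    for _ in range(k - 1):
        power = matmul(power, adj)
        reach = [[1 if (reach[i][j] or power[i][j]) else 0 for j in range(n)]
                 for i in range(n)]
    return reach

# toy chain skeleton 0-1-2-3 (e.g. shoulder-elbow-wrist-hand)
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
two_hop = k_hop_adjacency(chain, 2)
```

In a real multi-hop GCN layer each hop order would get its own (normalized) adjacency and learned weights; the binary reachability here just shows which joint pairs become connected.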


Data ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 104
Author(s):  
Ashok Sarabu ◽  
Ajit Kumar Santra

The two-stream convolutional neural network (CNN) has proven a great success for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately, and the two scores are then combined to obtain the final score. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The findings show a significant performance increase, and the proposed architecture outperforms existing methods.
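The score-combination step described above (late fusion of the two streams) can be sketched as follows. This is a generic illustration, assuming a simple weighted average of per-class softmax scores; the weighting scheme and logits are placeholders, not the paper's configuration.

```python
import math

def softmax(logits):
    """Convert raw class logits to normalized probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(spatial_logits, temporal_logits, w_spatial=0.5):
    """Late fusion: weighted average of the spatial (RGB) and temporal
    (optical-flow) stream scores; returns (predicted class, fused scores)."""
    sp = softmax(spatial_logits)
    tp = softmax(temporal_logits)
    fused = [w_spatial * a + (1.0 - w_spatial) * b for a, b in zip(sp, tp)]
    return fused.index(max(fused)), fused

# toy 3-class example: streams disagree, fusion decides
pred, fused = fuse_scores([2.0, 1.0, 0.0], [0.0, 1.0, 2.5])
```

Here the temporal stream's stronger confidence in class 2 outweighs the spatial stream's preference for class 0.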


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1589
Author(s):  
Zeyuan Hu ◽  
Eung-Joo Lee

Traditional convolutional neural networks have achieved great success in human action recognition. However, it is challenging to establish effective associations between different human bone nodes to capture detailed information. In this paper, we propose a dual attention-guided multiscale dynamic aggregate graph convolutional neural network (DAG-GCN) for skeleton-based human action recognition. Our goal is to explore the best correlations and determine high-level semantic features. First, a multiscale dynamic aggregate GCN module is used to capture important semantic information and to establish dependence relationships among different bone nodes. Second, the higher-level semantic features are further refined, and the semantic relevance is emphasized, through a dual attention guidance module. In addition, we exploit the hierarchical relationships of the joints and the spatial-temporal correlations through these two modules. Experiments with the DAG-GCN method show good performance on the NTU-60-RGB+D and NTU-120-RGB+D datasets. The accuracy is 95.76% and 90.01%, respectively, for the cross-view (X-View) and cross-subject (X-Sub) benchmarks on the NTU-60 dataset.
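The "dual attention" idea (attending over channels and over positions) can be sketched in miniature, as a rough stand-in rather than the DAG-GCN module: each channel is gated by a sigmoid of its pooled mean (squeeze-excitation style), then each position is gated by a sigmoid of its cross-channel mean. The feature map layout (channels x positions) is an assumption for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """Squeeze each channel to its mean, gate with a sigmoid, rescale."""
    return [[sigmoid(sum(ch) / len(ch)) * v for v in ch] for ch in fmap]

def spatial_attention(fmap):
    """Gate each position by the sigmoid of its cross-channel mean."""
    n_pos = len(fmap[0])
    gates = [sigmoid(sum(ch[p] for ch in fmap) / len(fmap))
             for p in range(n_pos)]
    return [[g * v for g, v in zip(gates, ch)] for ch in fmap]

def dual_attention(fmap):
    """Apply channel attention, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(fmap))

out = dual_attention([[2.0, 2.0], [1.0, 1.0]])  # 2 channels x 2 positions
```

A learned version would replace the parameter-free pooling gates with small fully connected layers; the gating structure is the same.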

