A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted lots of attention. However, many existing GCNs provide a pre-defined graph structure and share it through the entire network, which can loss implicit joint correlations especially for the higher-level features. Besides, the mainstream spectral GCN is approximated by one-order hop such that higher-order connections are not well involved. All of these require huge efforts to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by one-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulted architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large scale skeleton-based action recognition datasets. The results show that our model gets the state-of-the-art results in term of given metrics.

Download Full-text

Human action recognition with a large-scale brain-inspired photonic computer

Nature Machine Intelligence ◽

10.1038/s42256-019-0110-8 ◽

2019 ◽

Vol 1 (11) ◽

pp. 530-537 ◽

Cited By ~ 10

Author(s):

Piotr Antonik ◽

Nicolas Marsal ◽

Daniel Brunner ◽

Damien Rontani

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Download Full-text

A Bayesian Dynamical Approach for Human Action Recognition

Sensors ◽

10.3390/s21165613 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5613

Author(s):

Amirreza Farnoosh ◽

Zhouping Wang ◽

Shaotong Zhu ◽

Sarah Ostadabbas

Keyword(s):

Action Recognition ◽

Large Scale ◽

Temporal Dynamics ◽

Human Action Recognition ◽

Human Action ◽

Superior Performance ◽

Action Classification ◽

Motion Data ◽

Highly Correlated ◽

Low Dimensional

We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with regard to a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations about motion primitives that gave rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly model temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with the state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework), and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.

Download Full-text

Enhanced Spatial and Extended Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Sensors ◽

10.3390/s20185260 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5260 ◽

Cited By ~ 1

Author(s):

Fanjia Li ◽

Juanjuan Li ◽

Aichun Zhu ◽

Yonggang Xu ◽

Hongsheng Yin ◽

...

Keyword(s):

Action Recognition ◽

Large Scale ◽

Optimal Solution ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network ◽

Spatial Graph ◽

Serial Connection ◽

In Series ◽

Temporal Graph

In the skeleton-based human action recognition domain, the spatial-temporal graph convolution networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract the temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels with different sizes are chosen to extract the discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. Besides, we propose a new connection paradigm to enhance the spatial features, which expand the serial connection to a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large scale datasets, and the experimental results show that the performance of our method exceeds previous state-of-the-art methods.

Download Full-text

Deep Learning-Based Action Recognition Using 3D Skeleton Joints Information

Inventions ◽

10.3390/inventions5030049 ◽

2020 ◽

Vol 5 (3) ◽

pp. 49

Author(s):

Nusrat Tasnim ◽

Md. Mahbubul Islam ◽

Joong-Hwan Baek

Keyword(s):

Action Recognition ◽

Large Scale ◽

Dimensional Space ◽

Human Action Recognition ◽

Human Action ◽

Human Machine Interaction ◽

Human Actions ◽

3 Dimensional ◽

3D Skeleton ◽

Color Depth

Human action recognition has turned into one of the most attractive and demanding fields of research in computer vision and pattern recognition for facilitating easy, smart, and comfortable ways of human-machine interaction. With the witnessing of massive improvements to research in recent years, several methods have been suggested for the discrimination of different types of human actions using color, depth, inertial, and skeleton information. Despite having several action identification methods using different modalities, classifying human actions using skeleton joints information in 3-dimensional space is still a challenging problem. In this paper, we conceive an efficacious method for action recognition using 3D skeleton data. First, large-scale 3D skeleton joints information was analyzed and accomplished some meaningful pre-processing. Then, a simple straight-forward deep convolutional neural network (DCNN) was designed for the classification of the desired actions in order to evaluate the effectiveness and embonpoint of the proposed system. We also conducted prior DCNN models such as ResNet18 and MobileNetV2, which outperform existing systems using human skeleton joints information.

Download Full-text

An efficient and sparse approach for large scale human action recognition in videos

Machine Vision and Applications ◽

10.1007/s00138-016-0760-z ◽

2016 ◽

Vol 27 (4) ◽

pp. 529-543 ◽

Cited By ~ 9

Author(s):

Cyrille Beaudry ◽

Renaud Péteri ◽

Laurent Mascarilla

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Download Full-text

Ordered Trajectories for Large Scale Human Action Recognition

2014 5th International Conference on Information and Communication Systems (ICICS) ◽

10.1109/iccvw.2013.61 ◽

2013 ◽

Cited By ~ 18

Author(s):

O.V. Ramana Murthy ◽

Roland Goecke

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Download Full-text

Accelerating Large-Scale Human Action Recognition with GPU-Based Spark

Lecture Notes in Computer Science - Advances in Multimedia Information Processing - PCM 2016 ◽

10.1007/978-3-319-48896-7_66 ◽

2016 ◽

pp. 670-679

Author(s):

Hanli Wang ◽

Xiaobin Zheng ◽

Bo Xiao

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Download Full-text

Adaptive RNN Tree for Large-Scale Human Action Recognition

2017 IEEE International Conference on Computer Vision (ICCV) ◽

10.1109/iccv.2017.161 ◽

2017 ◽

Cited By ~ 24

Author(s):

Wenbo Li ◽

Longyin Wen ◽

Ming-Ching Chang ◽

Ser Nam Lim ◽

Siwei Lyu

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Download Full-text

Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/109 ◽

2018 ◽

Cited By ~ 56

Author(s):

Chao Li ◽

Qiaoyong Zhong ◽

Di Xie ◽

Shiliang Pu

Keyword(s):

Action Recognition ◽

Large Scale ◽

Contextual Information ◽

Semantic Representation ◽

Feature Learning ◽

Human Action Recognition ◽

Human Action ◽

The Arts ◽

Level Information ◽

Local Aggregation

Skeleton-based human action recognition has recently drawn increasing attentions with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. Firstly point-level information of each joint is encoded independently. Then they are assembled into semantic representation in both spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. Besides, raw skeleton coordinates as well as their temporal difference are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-arts on action recognition and detection benchmarks like NTU RGB+D, SBU Kinect Interaction and PKU-MMD.

Download Full-text