A Bayesian Dynamical Approach for Human Action Recognition

We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations of the motion primitives that give rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly models temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework) and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.
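
To make the idea of a switching second-order transition concrete, the following is a minimal sketch of a scalar latent process whose autoregressive coefficients depend on a discrete regime that follows a Markov chain. It is a linear simplification: the abstract describes a deep nonlinear prior, and the function name, the coefficient layout, and the fixed starting regime are assumptions made only for illustration.

```python
import random


def simulate_switching_ar2(coeffs, transition, steps, z0=1.0, z1=1.0,
                           noise_std=0.0, seed=0):
    """Simulate z_t = a1[s_t] * z_{t-1} + a2[s_t] * z_{t-2} + noise.

    coeffs[s] is the (a1, a2) pair of regime s (hypothetical layout);
    transition[i][j] is the probability of moving from regime i to j.
    """
    rng = random.Random(seed)
    states = [0, 0]        # assumed: the first two steps start in regime 0
    values = [z0, z1]
    for _ in range(steps):
        prev = states[-1]
        # Sample the next regime from the row of the transition matrix.
        s = rng.choices(range(len(transition)), weights=transition[prev])[0]
        a1, a2 = coeffs[s]
        # Second-order dynamics: the new value depends on the two previous ones.
        z = a1 * values[-1] + a2 * values[-2] + rng.gauss(0.0, noise_std)
        states.append(s)
        values.append(z)
    return states, values
```

With a single regime, coefficients `(1.0, 1.0)`, and no noise, the recursion reduces to the Fibonacci sequence, which is a convenient sanity check that two past values are being used.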

A Low-Dimensional Radial Silhouette-Based Feature for Fast Human Action Recognition Fusing Multiple Views

International Scholarly Research Notices ◽

10.1155/2014/547069 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 8

Author(s):

Alexandros Andre Chaaraoui ◽

Francisco Flórez-Revuelta

Keyword(s):

Real Time ◽

Action Recognition ◽

Assisted Living ◽

Learning Algorithm ◽

Ambient Assisted Living ◽

Human Action Recognition ◽

Human Action ◽

Sequence Matching ◽

Low Dimensional ◽

Video Frequency

This paper presents a novel silhouette-based feature for vision-based human action recognition, which relies on the contour of the silhouette and a radial scheme. Its low dimensionality and ease of extraction make it well suited to real-time scenarios. This feature is used in a learning algorithm that, by means of model fusion of multiple camera streams, builds a bag of key poses, which serves as a dictionary of known poses and allows converting the training sequences into sequences of key poses. These are used to perform action recognition by means of a sequence matching algorithm. Experimentation on three different datasets returns high and stable recognition rates. To the best of our knowledge, this paper presents the highest results so far on the MuHAVi-MAS dataset. The method is suitable for real-time use, since it easily runs above video frequency. Therefore, the related requirements that applications such as ambient-assisted living services impose are successfully fulfilled.
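
As a rough illustration of a contour-plus-radial-scheme feature, the sketch below divides the silhouette contour into angular bins around its centroid and keeps one value per bin. The choice of summary value (mean distance to the centroid), the normalization, and the bin count are assumptions; the abstract does not specify them.

```python
import math


def radial_silhouette_feature(contour, num_bins=8):
    """Summarize a contour (list of (x, y) points) with one value per radial bin."""
    # Centroid of the contour points.
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)

    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for x, y in contour:
        # Angle of the point around the centroid, mapped to [0, 2*pi].
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        # Clamp guards against an angle that rounds up to exactly 2*pi.
        b = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
        sums[b] += math.hypot(x - cx, y - cy)
        counts[b] += 1

    # Assumed summary: mean centroid distance per bin (0.0 for empty bins).
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    # Normalize so the feature is independent of silhouette scale.
    return [m / total for m in means] if total > 0 else means
```

The resulting vector has only `num_bins` entries per frame, which is the kind of low dimensionality that keeps later key-pose matching cheap.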

Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5652 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2669-2676 ◽

Cited By ~ 11

Author(s):

Wei Peng ◽

Xiaopeng Hong ◽

Haoyu Chen ◽

Guoying Zhao

Keyword(s):

Action Recognition ◽

Large Scale ◽

Order Approximation ◽

Human Action Recognition ◽

Search Space ◽

Human Action ◽

Higher Order ◽

Dynamic Graph ◽

Convolutional Network ◽

Representational Capacity

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted a lot of attention. However, many existing GCNs provide a pre-defined graph structure and share it through the entire network, which can lose implicit joint correlations, especially for the higher-level features. Besides, the mainstream spectral GCN is approximated by a one-order hop, so higher-order connections are not well captured. Addressing these issues by hand requires huge effort to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by one-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulting architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale skeleton-based action recognition datasets. The results show that our model achieves state-of-the-art results in terms of the given metrics.
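
The limitation of a one-order hop can be seen on a tiny graph. The sketch below propagates node features through successive powers of a row-normalized adjacency matrix; it is a generic multi-hop aggregation written for illustration, not the searched architecture from the paper, and all function names are hypothetical.

```python
def row_normalize(adj):
    """Divide each row by its sum so every node averages over its neighbors."""
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def multi_hop_features(adj, features, num_hops):
    """Return [X, A X, A^2 X, ...] for hops 0..num_hops."""
    a = row_normalize(adj)
    hops = [features]
    for _ in range(num_hops):
        # Each extra multiplication reaches joints one edge further away.
        hops.append(matmul(a, hops[-1]))
    return hops
```

On a three-joint chain with a signal only at joint 0, the one-hop result leaves joint 2 at zero, while the two-hop result reaches it. That is the kind of higher-order connection a first-order approximation misses.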

Improved Collaborative Representation Classifier Based on l2-Regularized for Human Action Recognition

Journal of Electrical and Computer Engineering ◽

10.1155/2017/8191537 ◽

2017 ◽

Vol 2017 ◽

pp. 1-6

Author(s):

Shirui Huo ◽

Tianrui Hu ◽

Ce Li

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Test Sample ◽

Human Action ◽

Superior Performance ◽

Depth Image ◽

Collaborative Representation ◽

Depth Images ◽

Spatiotemporal Information ◽

Depth Motion Maps

Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC) based on l2 regularization for human action recognition is presented to maximize the likelihood that a test sample belongs to each class; theoretical investigation into ICRC then shows that it obtains a final classification by computing the likelihood for each class. Coupled with the DMMs and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, which uses a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
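
For orientation, the standard CRC baseline named above is commonly written as an l2-regularized least-squares problem with a closed-form solution, followed by a class-wise residual rule. The notation here is assumed rather than taken from the paper (X stacks the training samples of all classes, X_c and α_c are the parts belonging to class c, y is the test sample, λ is the regularization weight), and the probabilistic ICRC formulation itself is not reproduced.

```latex
\hat{\alpha}
  = \arg\min_{\alpha} \; \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y,
\qquad
\operatorname{label}(y)
  = \arg\min_{c} \; \frac{\|y - X_c \hat{\alpha}_c\|_2}{\|\hat{\alpha}_c\|_2}
```

Because the matrix (X^T X + λI)^{-1} X^T does not depend on y, it can be computed once and reused for every test sample.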

View-Invariant Deep Architecture for Human Action Recognition Using Two-Stream Motion and Shape Temporal Dynamics

IEEE Transactions on Image Processing ◽

10.1109/tip.2020.2965299 ◽

2020 ◽

Vol 29 ◽

pp. 3835-3844 ◽

Cited By ~ 2

Author(s):

Chhavi Dhiman ◽

Dinesh Kumar Vishwakarma

Keyword(s):

Action Recognition ◽

Temporal Dynamics ◽

Human Action Recognition ◽

Human Action ◽

Deep Architecture

Human action recognition with a large-scale brain-inspired photonic computer

Nature Machine Intelligence ◽

10.1038/s42256-019-0110-8 ◽

2019 ◽

Vol 1 (11) ◽

pp. 530-537 ◽

Cited By ~ 10

Author(s):

Piotr Antonik ◽

Nicolas Marsal ◽

Daniel Brunner ◽

Damien Rontani

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Feature Extraction and Representation for Distributed Multi-View Human Action Recognition

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ◽

10.1109/jetcas.2013.2256824 ◽

2013 ◽

Vol 3 (2) ◽

pp. 145-154 ◽

Cited By ~ 7

Author(s):

Jiajia Luo ◽

Wei Wang ◽

Hairong Qi

Keyword(s):

Action Recognition ◽

Approximation Error ◽

Human Action Recognition ◽

Human Action ◽

Base Station ◽

Feature Representation ◽

Superior Performance ◽

Feature Descriptor ◽

Testing Stage ◽

New Feature

Multi-view human action recognition has gained a lot of attention in recent years for its superior performance compared to single-view recognition. In this paper, we propose a new framework for the real-time realization of human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions, and computationally efficient. Taking advantage of the proposed Mltp-hist, the noninformative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos to be transmitted to the base station for classification. Due to the sparse representation of extracted features, the approximation error is reduced. Finally, at the base station, a probability model is produced to fuse the information from various views and a class label is assigned accordingly. Compared to the existing algorithms, the proposed framework has three advantages while having fewer requirements on memory and bandwidth consumption: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) positions and orientations of cameras do not need to be fixed. We further evaluate the proposed framework on the most popular multi-view action dataset, IXMAS. Experimental results indicate that our proposed framework repeatedly achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. In particular, our results are still satisfactory even when large misalignment exists between the training and testing samples.

Enhanced Spatial and Extended Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Sensors ◽

10.3390/s20185260 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5260 ◽

Cited By ~ 1

Author(s):

Fanjia Li ◽

Juanjuan Li ◽

Aichun Zhu ◽

Yonggang Xu ◽

Hongsheng Yin ◽

...

Keyword(s):

Action Recognition ◽

Large Scale ◽

Optimal Solution ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network ◽

Spatial Graph ◽

Serial Connection ◽

In Series ◽

Temporal Graph

In the skeleton-based human action recognition domain, spatial-temporal graph convolutional networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract the temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels of different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. Besides, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection to a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that the performance of our method exceeds previous state-of-the-art methods.
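
The idea of using several temporal kernel sizes side by side can be sketched on a single joint coordinate over time. The kernel sizes (3, 5, 7) and the uniform averaging weights below are placeholders chosen for illustration; the abstract states neither, and in a trained network the weights would be learned. The OSA and eSE modules are not modeled.

```python
def temporal_conv_same(signal, kernel):
    """Slide a kernel along time with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            # Positions outside the sequence contribute zero (zero padding).
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_temporal_features(signal, kernel_sizes=(3, 5, 7)):
    """Apply one averaging kernel per size and stack the results per time step."""
    branches = [temporal_conv_same(signal, [1.0 / size] * size)
                for size in kernel_sizes]
    # Transpose so each time step holds one value per temporal scale.
    return [list(step) for step in zip(*branches)]
```

Each time step ends up with one feature per kernel size, so short and long temporal contexts are available together to the next layer.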

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks

International Journal of Advanced Robotic Systems ◽

10.1177/1729881418825093 ◽

2019 ◽

Vol 16 (1) ◽

pp. 172988141882509 ◽

Cited By ~ 3

Author(s):

Hanbo Wu ◽

Xin Ma ◽

Yibin Li

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Information ◽

Superior Performance ◽

Video Sequences ◽

Depth Video ◽

Difference Images

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate the spatial and temporal information of an action simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we construct a hierarchical dynamic depth projected difference images–based action recognition framework where hierarchical dynamic depth projected difference images in three views are fed into three identical pretrained convolutional neural networks independently for fine-tuning. We design three classification schemes in the framework, and different schemes utilize different convolutional neural network layers to compare their effects on action recognition. Three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.
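
The projection step can be illustrated on toy depth frames. The sketch below maps a depth map onto front, side, and top views and takes absolute differences between two frames per view. It assumes integer depth values with 0 meaning background and uses plain frame differencing; the rank pooling and the hierarchical temporal scales described in the abstract are left out, and the function names are hypothetical.

```python
def project_three_views(depth, max_depth):
    """Project a depth map (rows x cols, ints in [0, max_depth)) onto three views."""
    rows, cols = len(depth), len(depth[0])
    front = [row[:] for row in depth]                # rows x cols, depth values
    side = [[0] * max_depth for _ in range(rows)]    # rows x depth, occupancy
    top = [[0] * cols for _ in range(max_depth)]     # depth x cols, occupancy
    for r in range(rows):
        for c in range(cols):
            d = depth[r][c]
            if d > 0:  # assumed: 0 marks background and is skipped
                side[r][d] = 1
                top[d][c] = 1
    return front, side, top


def difference_image(view_a, view_b):
    """Element-wise absolute difference of two equally sized views."""
    return [[abs(a - b) for a, b in zip(ra, rb)]
            for ra, rb in zip(view_a, view_b)]


def projected_difference_images(frame_a, frame_b, max_depth):
    """Return the [front, side, top] difference images between two frames."""
    views_a = project_three_views(frame_a, max_depth)
    views_b = project_three_views(frame_b, max_depth)
    return [difference_image(a, b) for a, b in zip(views_a, views_b)]
```

A point that moves straight down between two frames changes the front and side views but leaves the top view unchanged, which shows why combining the three views describes motion more completely than any single one.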

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

2018 ACM Multimedia Conference on Multimedia Conference - MM '18 ◽

10.1145/3240508.3240675 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yanli Ji ◽

Feixiang Xu ◽

Yang Yang ◽

Fumin Shen ◽

Heng Tao Shen ◽

...

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Deep Learning-Based Action Recognition Using 3D Skeleton Joints Information

Inventions ◽

10.3390/inventions5030049 ◽

2020 ◽

Vol 5 (3) ◽

pp. 49

Author(s):

Nusrat Tasnim ◽

Md. Mahbubul Islam ◽

Joong-Hwan Baek

Keyword(s):

Action Recognition ◽

Large Scale ◽

Dimensional Space ◽

Human Action Recognition ◽

Human Action ◽

Human Machine Interaction ◽

Human Actions ◽

3 Dimensional ◽

3D Skeleton ◽

Color Depth

Human action recognition has turned into one of the most attractive and demanding fields of research in computer vision and pattern recognition for facilitating easy, smart, and comfortable ways of human-machine interaction. With the massive improvements in research in recent years, several methods have been suggested for the discrimination of different types of human actions using color, depth, inertial, and skeleton information. Despite the existence of several action identification methods using different modalities, classifying human actions using skeleton joints information in 3-dimensional space is still a challenging problem. In this paper, we conceive an efficacious method for action recognition using 3D skeleton data. First, large-scale 3D skeleton joints information was analyzed and some meaningful pre-processing was performed. Then, a simple, straightforward deep convolutional neural network (DCNN) was designed for the classification of the desired actions in order to evaluate the effectiveness and embonpoint of the proposed system. We also evaluated prior DCNN models such as ResNet18 and MobileNetV2, which outperform existing systems using human skeleton joints information.
