A Lightweight Hierarchical Model with Frame-Level Joints Adaptive Graph Convolution for Skeleton-Based Action Recognition

In skeleton-based human action recognition methods, human behaviours can be analysed through temporal and spatial changes in the human skeleton. Skeletons are not limited by clothing changes, lighting conditions, or complex backgrounds. This recognition method is robust and has aroused great interest; however, many existing studies used deep-layer networks with large numbers of required parameters to improve the model performance and thus lost the advantage of less computation of skeleton data. It is difficult to deploy previously established models to real-life applications based on low-cost embedded devices. To obtain a model with fewer parameters and a higher accuracy, this study designed a lightweight frame-level joints adaptive graph convolutional network (FLAGCN) model to solve skeleton-based action recognition tasks. Compared with the classical 2s-AGCN model, the new model obtained a higher precision with 1/8 of the parameters and 1/9 of the floating-point operations (FLOPs). Our proposed network characterises three main improvements. First, a previous feature-fusion method replaces the multistream network and reduces the number of required parameters. Second, at the spatial level, two kinds of graph convolution methods capture different aspects of human action information. A frame-level graph convolution constructs a human topological structure for each data frame, whereas an adjacency graph convolution captures the characteristics of the adjacent joints. Third, the model proposed in this study hierarchically extracts different levels of action sequence features, making the model clear and easy to understand; further, it reduces the depth of the model and the number of parameters. A large number of experiments on the NTU RGB + D 60 and 120 data sets show that this method has the advantages of few required parameters, low computational costs, and fast speeds. It also has a simple structure and training process that make it easy to deploy in real-time recognition systems based on low-cost embedded devices.

Download Full-text

Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Applied Sciences ◽

10.3390/app11114940 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4940

Author(s):

Jinsoo Kim ◽

Jeongho Cho

Keyword(s):

Embedded System ◽

Real Time ◽

Action Recognition ◽

Processing Speed ◽

Recognition Accuracy ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Feature Maps

The field of research related to video data has difficulty in extracting not only spatial but also temporal features and human action recognition (HAR) is a representative field of research that applies convolutional neural network (CNN) to video data. The performance for action recognition has improved, but owing to the complexity of the model, some still limitations to operation in real-time persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real-time is proposed. The proposed model extracts spatial feature maps by applying CNN to the images that develop the video and uses the frame change rate of sequential images as time information. Spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have a relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. The results of evaluating action recognition accuracy and data processing speed through challenging action recognition benchmark UCF-101 showed higher action recognition accuracy than the HAR model using long short-term memory with a small amount of video frames and confirmed the real-time operational possibility through fast data processing speed. In addition, the performance of the proposed weighted mean-based HAR model was verified by testing it in Jetson NANO to confirm the possibility of using it in low-cost GPU-based embedded systems.

Download Full-text

Human Action Recognition Based on Multi-level Feature Fusion

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344943 ◽

2020 ◽

Author(s):

Xi Cai ◽

Wan Su ◽

Guang Han

Keyword(s):

Action Recognition ◽

Feature Fusion ◽

Human Action Recognition ◽

Human Action ◽

Multi Level

Download Full-text

Human Action Recognition Based On Multi-level Feature Fusion

Proceedings of the International Conference on Computer Information Systems and Industrial Applications ◽

10.2991/cisia-15.2015.96 ◽

2015 ◽

Author(s):

Y.Y Xu ◽

G.Q Xiao ◽

X.Q Tang

Keyword(s):

Action Recognition ◽

Feature Fusion ◽

Human Action Recognition ◽

Human Action ◽

Multi Level

Download Full-text

Study on Machine Learning and Deep Learning Methods for Human Action Recognition

10.20944/preprints202005.0146.v1 ◽

2020 ◽

Author(s):

Gopika Rajendran ◽

Ojus Thomas Lee ◽

Arya Gopi ◽

Jais jose ◽

Neha Gautham

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Action Recognition ◽

Human Body ◽

Human Performance ◽

Human Action Recognition ◽

Human Action ◽

Human Robot Interaction ◽

Learning Approach ◽

Action Sequence

With the evolution of computing technology in many application like human robot interaction, human computer interaction and health-care system, 3D human body models and their dynamic motions has gained popularity. Human performance accompanies human body shapes and their relative motions. Research on human activity recognition is structured around how the complex movement of a human body is identified and analyzed. Vision based action recognition from video is such kind of tasks where actions are inferred by observing the complete set of action sequence performed by human. Many techniques have been revised over the recent decades in order to develop a robust as well as effective framework for action recognition. In this survey, we summarize recent advances in human action recognition, namely the machine learning approach, deep learning approach and evaluation of these approaches.

Download Full-text

Human Action Recognition Combining Sequential Dynamic Images and Two-Stream Convolutional Network

Laser & Optoelectronics Progress ◽

10.3788/lop202158.0210007 ◽

2021 ◽

Vol 58 (2) ◽

pp. 0210007

Author(s):

张文强 Zhang Wenqiang ◽

王增强 Wang Zengqiang ◽

张良 Zhang Liang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network

Download Full-text

Using a Multilearner to Fuse Multimodal Features for Human Action Recognition

Mathematical Problems in Engineering ◽

10.1155/2020/4358728 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18

Author(s):

Chao Tang ◽

Huosheng Hu ◽

Wenjian Wang ◽

Wei Li ◽

Hua Peng ◽

...

Keyword(s):

Action Recognition ◽

Feature Fusion ◽

Human Action Recognition ◽

Human Action ◽

Human Motion ◽

Image Features ◽

Depth Image ◽

Good Ability ◽

Multimodal Features ◽

Feature Based

The representation and selection of action features directly affect the recognition effect of human action recognition methods. Single feature is often affected by human appearance, environment, camera settings, and other factors. Aiming at the problem that the existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposed a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. In this paper, three kinds of human action features with different modal information are proposed: RGB-HOG feature based on RGB image information, which has good geometric scale invariance; D-STIP feature based on depth image, which maintains the dynamic characteristics of human motion and has local invariance; and S-JRPF feature-based skeleton information, which has good ability to describe motion space structure. At the same time, multiple K-nearest neighbor classifiers with better generalization ability are used to integrate decision-making classification. The experimental results show that the algorithm achieves ideal recognition results on the public G3D and CAD60 datasets.

Download Full-text

Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5652 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2669-2676 ◽

Cited By ~ 11

Author(s):

Wei Peng ◽

Xiaopeng Hong ◽

Haoyu Chen ◽

Guoying Zhao

Keyword(s):

Action Recognition ◽

Large Scale ◽

Order Approximation ◽

Human Action Recognition ◽

Search Space ◽

Human Action ◽

Higher Order ◽

Dynamic Graph ◽

Convolutional Network ◽

Representational Capacity

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted lots of attention. However, many existing GCNs provide a pre-defined graph structure and share it through the entire network, which can loss implicit joint correlations especially for the higher-level features. Besides, the mainstream spectral GCN is approximated by one-order hop such that higher-order connections are not well involved. All of these require huge efforts to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by one-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulted architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large scale skeleton-based action recognition datasets. The results show that our model gets the state-of-the-art results in term of given metrics.

Download Full-text

Human action recognition based on multiple feature fusion

Advances in Modelling and Analysis B ◽

10.18280/ama_b.600102 ◽

2017 ◽

Vol 60 (1) ◽

pp. 25-42

Author(s):

R.J. Ma ◽

H.S. Zhang

Keyword(s):

Action Recognition ◽

Feature Fusion ◽

Human Action Recognition ◽

Human Action ◽

Multiple Feature ◽

Multiple Feature Fusion

Download Full-text

Human Actions and Hand Gesture Recognition with Deep Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2815.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1250-1253

Keyword(s):

Deep Learning ◽

Action Recognition ◽

Gesture Recognition ◽

Human Activities ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Hand Gesture Recognition ◽

Hand Gesture ◽

Characteristic Features

Over recent times, deep learning has been challenged extensively to automatically read and interpret characteristic features from large volumes of data. Human Action Recognition (HAR) has been experimented with variety of techniques like wearable devices, mobile devices etc., but they can cause unnecessary discomfort to people especially elderly and child. Since it is very vital to monitor the movements of elderly and children in unattended scenarios, thus, HAR is focused. A smart human action recognition method to automatically identify the human activities from skeletal joint motions and combines the competencies are focused. We can also intimate the near ones about the status of the people. Also, it is a low-cost method and has high accuracy. Thus, this provides a way to help the senior citizens and children from any kind of mishaps and health issues. Hand gesture recognition is also discussed along with human activities using deep learning.

Download Full-text

Deep Ensemble Learning for Human Action Recognition in Still Images

Complexity ◽

10.1155/2020/9428612 ◽

2020 ◽

Vol 2020 ◽

pp. 1-23 ◽

Cited By ~ 3

Author(s):

Xiangchun Yu ◽

Zhe Zhang ◽

Lei Wu ◽

Wei Pang ◽

Hechang Chen ◽

...

Keyword(s):

Ensemble Learning ◽

Action Recognition ◽

Model Performance ◽

Human Action Recognition ◽

Human Action ◽

The Body ◽

Model Complexity ◽

Background Information ◽

Still Images ◽

End To End

Numerous human actions such as “Phoning,” “PlayingGuitar,” and “RidingHorse” can be inferred by static cue-based approaches even if their motions in video are available considering one single still image may already sufficiently explain a particular action. In this research, we investigate human action recognition in still images and utilize deep ensemble learning to automatically decompose the body pose and perceive its background information. Firstly, we construct an end-to-end NCNN-based model by attaching the nonsequential convolutional neural network (NCNN) module to the top of the pretrained model. The nonsequential network topology of NCNN can separately learn the spatial- and channel-wise features with parallel branches, which helps improve the model performance. Subsequently, in order to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on the weight optimization (DELWO) model. It contributes to fusing the deep information derived from multiple models automatically from the data. Finally, we design the deep ensemble learning based on voting strategy (DELVS) model to pool together multiple deep models with weighted coefficients to obtain a better prediction. More importantly, the model complexity can be reduced by lessening the number of trainable parameters, thereby effectively mitigating overfitting issues of the model in small datasets to some extent. We conduct experiments in Li’s action dataset, uncropped and 1.5x cropped Willow action datasets, and the results have validated the effectiveness and robustness of our proposed models in terms of mitigating overfitting issues in small datasets. Finally, we open source our code for the model in GitHub (https://github.com/yxchspring/deep_ensemble_learning) in order to share our model with the community.

Download Full-text