Human action recognition from depth video sequences is an important research direction in computer vision. This study proposes a hierarchical multi-view classification framework for depth-video-based action recognition. Exploiting the spatial structure of 3D human actions, we project each 3D depth frame onto the three coordinate planes, converting the 3D depth image into three 2D images that are fed into three separate subnets. As the network deepens, the representations of the subnets are hierarchically fused and serve as inputs to the subsequent layers. The final representation of the depth video sequence is fed into a single-layer perceptron, and the classification result is obtained by accumulating the perceptron's outputs over time. We compare our method with existing approaches on two publicly available datasets, and further validate it on a human action database captured with our own Kinect system. Experimental results demonstrate that our model is computationally efficient and achieves performance on par with state-of-the-art methods.
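The projection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single depth frame stored as an integer array, takes the front view (xy-plane) to be the depth map itself, and marks occupied cells for the side (yz-plane) and top (xz-plane) views by binning pixels along the depth axis. The function name `project_three_views` and the `depth_bins` parameter are hypothetical.

```python
import numpy as np

def project_three_views(depth, depth_bins=256):
    """Project one depth frame (H x W integer depth values) onto
    the three orthogonal coordinate planes, yielding three 2D images."""
    H, W = depth.shape
    # Front view (xy-plane): the depth map itself.
    front = depth.astype(np.float32)
    # Side view (yz-plane) and top view (xz-plane), discretized
    # along the depth axis into `depth_bins` cells.
    side = np.zeros((H, depth_bins), dtype=np.float32)
    top = np.zeros((depth_bins, W), dtype=np.float32)
    # Nonzero pixels are assumed to belong to the human subject.
    ys, xs = np.nonzero(depth)
    zs = np.clip(depth[ys, xs], 0, depth_bins - 1)
    side[ys, zs] = 1.0  # mark occupied (y, z) cells
    top[zs, xs] = 1.0   # mark occupied (z, x) cells
    return front, side, top
```

Each of the three resulting 2D images would then be fed to its own subnet, with the subnets' representations fused hierarchically in deeper layers as the abstract describes.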