Learning Motion Features from Dynamic Images of Depth Video for Human Action Recognition

The choice of the motion features affects the result of the human action recognition method directly. Many factors often influence the single feature differently, such as appearance of human body, environment and video camera. So the accuracy of action recognition is limited. On the basis of studying the representation and recognition of human actions, and giving full consideration to the advantages and disadvantages of different features, this paper proposes a mixed feature which combines global silhouette feature and local optical flow feature. This combined representation is used for human action recognition.

Download Full-text

Human Action Recognition Using Salient Opponent-Based Motion Features

2010 Canadian Conference on Computer and Robot Vision ◽

10.1109/crv.2010.54 ◽

2010 ◽

Cited By ~ 7

Author(s):

Amir-Hossein Shabani ◽

John S. Zelek ◽

David A. Clausi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Features

Download Full-text

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks

International Journal of Advanced Robotic Systems ◽

10.1177/1729881418825093 ◽

2019 ◽

Vol 16 (1) ◽

pp. 172988141882509 ◽

Cited By ~ 3

Author(s):

Hanbo Wu ◽

Xin Ma ◽

Yibin Li

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Information ◽

Superior Performance ◽

Video Sequences ◽

Depth Video ◽

Difference Images

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images that can aggregate the action spatial and temporal information simultaneously at different temporal scales. We firstly project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with the rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of hierarchical dynamic depth projected difference images representation, we construct a hierarchical dynamic depth projected difference images–based action recognition framework where hierarchical dynamic depth projected difference images in three views are fed into three identical pretrained convolutional neural networks independently for finely retuning. We design three classification schemes in the framework and different schemes utilize different convolutional neural network layers to compare their effects on action recognition. Three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.

Download Full-text

A Flexible High-Level Fusion for an Accurate Human Action Recognition System

Journal of Circuits System and Computers ◽

10.1142/s021812662050190x ◽

2020 ◽

Vol 29 (12) ◽

pp. 2050190

Author(s):

Amel Ben Mahjoub ◽

Mohamed Atri

Keyword(s):

Action Recognition ◽

Short Term Memory ◽

Recognition Rate ◽

Human Action Recognition ◽

Local Binary Patterns ◽

Low Complexity ◽

Recognition System ◽

Human Action ◽

Collaborative Representation ◽

Depth Video

Action recognition is a very effective method of computer vision areas. In the last few years, there has been a growing interest in Deep learning networks as the Long Short–Term Memory (LSTM) architectures due to their efficiency in long-term time sequence processing. In the light of these recent events in deep neural networks, there is now considerable concern about the development of an accurate action recognition approach with low complexity. This paper aims to introduce a method for learning depth activity videos based on the LSTM and the classification fusion. The first step consists in extracting compact depth video features. We start with the calculation of Depth Motion Maps (DMM) from each sequence. Then we encode and concatenate contour and texture DMM characteristics using the histogram-of-oriented-gradient and local-binary-patterns descriptors. The second step is the depth video classification based on the naive Bayes fusion approach. Training three classifiers, which are the collaborative representation classifier, the kernel-based extreme learning machine and the LSTM, is done separately to get classification scores. Finally, we fuse the classification score outputs of all classifiers with the naive Bayesian method to get a final predicted label. Our proposed method achieves a significant improvement in the recognition rate compared to previous work that has used Kinect v2 and UTD-MHAD human action datasets.

Download Full-text

An Efficient Human Instance-Guided Framework for Video Action Recognition

Sensors ◽

10.3390/s21248309 ◽

2021 ◽

Vol 21 (24) ◽

pp. 8309

Author(s):

Inwoong Lee ◽

Doyoung Kim ◽

Dongyoon Wee ◽

Sanghoon Lee

Keyword(s):

Action Recognition ◽

Temporal Dynamics ◽

State Of The Art ◽

Confusion Matrix ◽

Human Action Recognition ◽

Human Action ◽

Stream Networks ◽

Comparable Performance ◽

Motion Features ◽

Temporal Action

In recent years, human action recognition has been studied by many computer vision researchers. Recent studies have attempted to use two-stream networks using appearance and motion features, but most of these approaches focused on clip-level video action recognition. In contrast to traditional methods which generally used entire images, we propose a new human instance-level video action recognition framework. In this framework, we represent the instance-level features using human boxes and keypoints, and our action region features are used as the inputs of the temporal action head network, which makes our framework more discriminative. We also propose novel temporal action head networks consisting of various modules, which reflect various temporal dynamics well. In the experiment, the proposed models achieve comparable performance with the state-of-the-art approaches on two challenging datasets. Furthermore, we evaluate the proposed features and networks to verify the effectiveness of them. Finally, we analyze the confusion matrix and visualize the recognized actions at human instance level when there are several people.

Download Full-text

Human action recognition based on multi-scale feature maps from depth video sequences

Multimedia Tools and Applications ◽

10.1007/s11042-021-11193-4 ◽

2021 ◽

Author(s):

Chang Li ◽

Qian Huang ◽

Xing Li ◽

Qianhan Wu

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Sequences ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

Depth Video

Download Full-text

Multi-View Hierarchical Bidirectional Recurrent Neural Network for Depth Video Sequence Based Action Recognition

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418500337 ◽

2018 ◽

Vol 32 (10) ◽

pp. 1850033 ◽

Cited By ~ 25

Author(s):

Xueping Liu ◽

Yibo Li ◽

Qingjun Wang

Keyword(s):

Action Recognition ◽

Video Sequence ◽

Single Layer ◽

Human Action Recognition ◽

Research Direction ◽

Human Action ◽

Depth Image ◽

Classification Framework ◽

Depth Video ◽

High Computational Efficiency

Human action recognition based on depth video sequence is an important research direction in the field of computer vision. The present study proposed a classification framework based on hierarchical multi-view to resolve depth video sequence-based action recognition. Herein, considering the distinguishing feature of 3D human action space, we project the 3D human action image to three coordinate planes, so that the 3D depth image is converted to three 2D images, and then feed them to three subnets, respectively. With the increase of the number of layers, the representations of subnets are hierarchically fused to be the inputs of next layers. The final representations of the depth video sequence are fed into a single layer perceptron, and the final result is decided by the time accumulated through the output of the perceptron. We compare with other methods on two publicly available datasets, and we also verify the proposed method through the human action database acquired by our Kinect system. Our experimental results demonstrate that our model has high computational efficiency and achieves the performance of state-of-the-art method.

Download Full-text

Self-organizing neural integration of pose-motion features for human action recognition

Frontiers in Neurorobotics ◽

10.3389/fnbot.2015.00003 ◽

2015 ◽

Vol 9 ◽

Cited By ~ 52

Author(s):

German I. Parisi ◽

Cornelius Weber ◽

Stefan Wermter

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Neural Integration ◽

Motion Features ◽

Self Organizing

Download Full-text

Learning Motion Features from Dynamic Images of Depth Video for Human Action Recognition

SVM-Based Human Action Recognition and Its Remarkable Motion Features Discovery Algorithm

Human Action Recognition in Videos Using Hybrid Motion Features

Mixed Features Based Improved Human Action Recognition Algorithm

Human Action Recognition Using Salient Opponent-Based Motion Features

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks

A Flexible High-Level Fusion for an Accurate Human Action Recognition System

An Efficient Human Instance-Guided Framework for Video Action Recognition

Human action recognition based on multi-scale feature maps from depth video sequences

Multi-View Hierarchical Bidirectional Recurrent Neural Network for Depth Video Sequence Based Action Recognition

Self-organizing neural integration of pose-motion features for human action recognition

Export Citation Format