A Flexible High-Level Fusion for an Accurate Human Action Recognition System

2020 ◽  
Vol 29 (12) ◽  
pp. 2050190
Author(s):  
Amel Ben Mahjoub ◽  
Mohamed Atri

Action recognition is an active and effective area of computer vision research. In the last few years, there has been growing interest in deep learning networks such as Long Short-Term Memory (LSTM) architectures, owing to their efficiency in processing long time sequences. In light of these advances in deep neural networks, there is now considerable interest in developing an accurate action recognition approach with low complexity. This paper introduces a method for learning from depth activity videos based on an LSTM and classification fusion. The first step consists of extracting compact depth video features. We start by calculating Depth Motion Maps (DMMs) from each sequence. Then we encode and concatenate contour and texture DMM characteristics using the histogram-of-oriented-gradients and local-binary-patterns descriptors. The second step is depth video classification based on a naive Bayes fusion approach. Three classifiers, namely the collaborative representation classifier, the kernel-based extreme learning machine, and the LSTM, are trained separately to obtain classification scores. Finally, we fuse the classification score outputs of all classifiers with the naive Bayesian method to obtain a final predicted label. Our proposed method achieves a significant improvement in recognition rate compared to previous work on the Kinect v2 and UTD-MHAD human action datasets.
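The DMM extraction step described above can be sketched as accumulating absolute differences between consecutive depth frames of a projection view. The following is a minimal NumPy illustration, not the authors' implementation; the function name and the toy sequence are invented for the example:

```python
import numpy as np

def depth_motion_map(frames):
    """Accumulate absolute inter-frame motion energy over a depth sequence.

    frames: (T, H, W) array of depth maps for one projection view.
    Returns an (H, W) Depth Motion Map.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))  # |frame_{t+1} - frame_t|
    return diffs.sum(axis=0)                 # accumulated motion energy

# Toy sequence: a single bright 'pixel' moving across a 4x4 depth map.
seq = np.zeros((3, 4, 4))
seq[0, 1, 1] = seq[1, 1, 2] = seq[2, 1, 3] = 10.0
dmm = depth_motion_map(seq)
```

In a pipeline like the one above, one DMM would be computed per projection view, and the HOG/LBP descriptors would then be extracted from these maps.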

Author(s):  
MARC BOSCH-JORGE ◽  
ANTONIO-JOSÉ SÁNCHEZ-SALMERÓN ◽  
CARLOS RICOLFE-VIALA

The aim of this work is to present a vision-based human action recognition system adapted to constrained embedded devices, such as smartphones. Basically, vision-based human action recognition combines feature tracking, descriptor extraction, and subsequent classification of image representations with a color-based identification tool to distinguish between multiple human subjects. Simple descriptor sets were evaluated to optimize recognition rate and performance, and two-dimensional (2D) descriptors were found to be effective. These sets, installed on the latest phones, can recognize human actions in videos in less than one second with a success rate of over 82%.


Author(s):  
Shou-Jen Lin ◽  
Mei-Hsuan Chao ◽  
Chao-Yang Lee ◽  
Chu-Sing Yang

A human action recognition system based on image depth is proposed in this paper. Depth information features are not easily disturbed by noise; due to this characteristic, the system can quickly extract foreground targets. Moreover, the target data, namely depth and two-dimensional (2D) data, are projected to three orthogonal planes. In this manner, the action featured in the depth motion along the optical axis can clearly describe the trajectory. Based on the change of motion energy and the angle variations of motion orientations, the temporal segmentation (TS) method automatically segments a complex action into several simple movements. Three-dimensional (3D) data are further applied to acquire the three-viewpoint (3V) motion history trajectory, whereby a target's motion is described through the motion history images (MHIs) from the 3Vs. The weightings corresponding to the gradients of the MHIs are included for determining the viewpoint that best describes the target's motion. In terms of feature extraction, the application of multi-resolution motion history histograms can effectively reduce the computational load and achieve a high recognition rate. Experimental results demonstrate that the proposed method can effectively solve the self-occlusion problem.
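The motion history image underlying the approach above follows the classic recursive update: pixels with current motion are set to a maximum timestamp, and older motion decays linearly. This is a generic single-view MHI sketch, not the authors' three-viewpoint implementation; the threshold and toy sequence are invented:

```python
import numpy as np

def motion_history_image(frames, tau=3, thresh=0.5):
    """Motion History Image: recent motion is bright, older motion decays.

    frames: (T, H, W) depth/grayscale sequence; returns an (H, W) MHI.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mhi = np.zeros(frames.shape[1:])
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur - prev) > thresh          # binary motion mask
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi

# Toy sequence: a 'pixel' moving left to right leaves a decaying trail.
seq = np.zeros((3, 4, 4))
seq[0, 1, 1] = seq[1, 1, 2] = seq[2, 1, 3] = 10.0
mhi = motion_history_image(seq, tau=3)
```

The gradient of such an MHI encodes motion orientation, which is what the viewpoint-weighting step in the abstract operates on.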


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper, a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach obtains an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
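A finite state machine of the kind described can be sketched as a transition table driven by per-frame posture labels derived from silhouette geometry. The states, labels, and the "sit down" action below are purely illustrative, not taken from the paper:

```python
# Transition table of a hypothetical 'sit down' recognizer. Keys are
# (current_state, observed_posture); unknown pairs leave the state unchanged.
TRANSITIONS = {
    ("idle", "standing"): "standing",
    ("standing", "bending"): "descending",
    ("descending", "seated"): "sat_down",   # accepting state
}

def run_fsm(posture_sequence, start="idle"):
    """Drive the FSM with one posture label per frame."""
    state = start
    for posture in posture_sequence:
        state = TRANSITIONS.get((state, posture), state)
    return state

final = run_fsm(["standing", "standing", "bending", "seated"])
```

Repeated or irrelevant frames simply leave the machine in its current state, which is what makes this style of recognizer robust to variable action speed.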


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3642
Author(s):  
Mohammad Farhad Bulbul ◽  
Sadiya Tabussum ◽  
Hazrat Ali ◽  
Wenli Zheng ◽  
Mi Young Lee ◽  
...  

This paper proposes an action recognition framework for depth map sequences using the 3D Space-Time Auto-Correlation of Gradients (STACOG) algorithm. First, each depth map sequence is split into two sets of sub-sequences of two different frame lengths. Second, a number of Depth Motion Map (DMM) sequences are generated from every set and fed into STACOG to find an auto-correlation feature vector. For the two distinct sets of sub-sequences, two auto-correlation feature vectors are obtained, and each is applied to an L2-regularized Collaborative Representation Classifier (L2-CRC) to compute a pair of sets of residual values. Next, the Logarithmic Opinion Pool (LOGP) rule is used to combine the two L2-CRC outcomes and to assign an action label to the depth map sequence. Finally, our proposed framework is evaluated on three benchmark datasets: the MSR-Action3D, DHA, and UTD-MHAD datasets. We compare the experimental results of our proposed framework with state-of-the-art approaches to demonstrate its effectiveness. The computational efficiency of the framework is also analyzed for all the datasets to assess its suitability for real-time operation.
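The LOGP rule mentioned above fuses classifier posteriors multiplicatively, i.e. as a weighted geometric mean that is then renormalized. A minimal sketch with uniform weights and invented toy posteriors (not the paper's residual-to-probability mapping):

```python
import numpy as np

def logp_fusion(prob_matrices, weights=None):
    """Logarithmic Opinion Pool over Q classifiers.

    prob_matrices: list of (N, C) posterior arrays, one per classifier.
    Returns a fused (N, C) posterior, renormalized per sample.
    """
    q = len(prob_matrices)
    if weights is None:
        weights = np.full(q, 1.0 / q)          # uniform opinion weights
    log_p = sum(w * np.log(np.clip(p, 1e-12, None))
                for w, p in zip(weights, prob_matrices))
    fused = np.exp(log_p)
    fused /= fused.sum(axis=1, keepdims=True)
    return fused

# Two classifiers, one sample, two classes; both lean towards class 0.
p1 = np.array([[0.7, 0.3]])
p2 = np.array([[0.6, 0.4]])
fused = logp_fusion([p1, p2])
labels = fused.argmax(axis=1)
```

Because the pool multiplies opinions, a classifier that assigns a class near-zero probability can veto it, which makes LOGP stricter than simple score averaging.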


Author(s):  
Mohammad Farhad Bulbul ◽  
Yunsheng Jiang ◽  
Jinwen Ma

The emerging cost-effective depth sensors have facilitated the action recognition task significantly. In this paper, the authors address the action recognition problem using depth video sequences combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, the Logarithmic Opinion Pool (LOGP), is used to combine the classification outcomes from multiple classifiers, each with an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves superior action recognition performance over using each feature individually.


2017 ◽  
Vol 2017 ◽  
pp. 1-6
Author(s):  
Shirui Huo ◽  
Tianrui Hu ◽  
Ce Li

Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2-regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; theoretical investigation into ICRC shows that it obtains a final classification by computing the likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
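The l2-regularized collaborative representation idea can be sketched as ridge-regression coding of the test sample over the whole training dictionary, followed by a per-class regularized residual test. This follows the standard CRC-RLS decision rule rather than the paper's specific ICRC probability formulation, and the toy data are invented:

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """l2-regularized collaborative representation classification.

    D: (d, n) dictionary, one training sample per column.
    labels: (n,) class label per column.  y: (d,) test sample.
    """
    n = D.shape[1]
    # Ridge-regression coding of y over all classes jointly.
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        r = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        # Regularized residual: small, confident coefficient blocks win.
        residuals.append(r / (np.linalg.norm(alpha[mask]) + 1e-12))
    return classes[int(np.argmin(residuals))]

# Toy dictionary: class 0 atoms near [1, 0], class 1 atoms near [0, 1].
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
pred = crc_classify(D, labels, np.array([0.95, 0.05]))
```

The single closed-form solve over the full dictionary is what makes CRC-style classifiers much cheaper than sparse-coding alternatives such as SRC.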


2015 ◽  
Vol 42 (1) ◽  
pp. 138-143
Author(s):  
ByoungChul Ko ◽  
Mincheol Hwang ◽  
Jae-Yeal Nam

2014 ◽  
Vol 644-650 ◽  
pp. 4162-4166
Author(s):  
Dan Dan Guo ◽  
Xi’an Zhu

An effective human action recognition method based on human skeletal information extracted by a Kinect depth sensor is proposed in this paper. Through a study of human skeletal structure, node data, and human actions, the 3D space coordinates of skeleton joints and the angles between nodes of action-related joints are collected as action characteristics. First, 3D information on human skeletons is acquired by the Kinect depth sensor and the cosines of the relevant joint angles are calculated. Then the human skeletal information from the time window preceding the current state is stored in real time. Finally, the relative locations of the skeleton nodes and the variation of the joint-angle cosines within a certain time are analyzed to recognize the human motion. This algorithm has higher adaptability and practicality because it does not require the complicated sample training and recognition processes of traditional methods. The results of the experiment indicate that this method achieves a high recognition rate.
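The joint-angle cosine feature described above reduces to the cosine between the two bone vectors meeting at a joint. A minimal sketch; the joint names and coordinates are illustrative, not the Kinect SDK's:

```python
import numpy as np

def joint_angle_cosine(parent, joint, child):
    """Cosine of the angle at `joint` formed by its two neighboring
    skeleton nodes (e.g. shoulder-elbow-wrist), from 3D coordinates."""
    v1 = np.asarray(parent, float) - np.asarray(joint, float)
    v2 = np.asarray(child, float) - np.asarray(joint, float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# A right angle at the elbow: upper arm along y, forearm along x.
cos_elbow = joint_angle_cosine([0, 1, 0], [0, 0, 0], [1, 0, 0])
```

Tracking how such cosines change over a short time window is what lets the method distinguish, say, an arm raise from an arm extension without any classifier training.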


2012 ◽  
Vol 22 (06) ◽  
pp. 1250028 ◽  
Author(s):  
K. SUBRAMANIAN ◽  
S. SURESH

We propose a sequential meta-cognitive learning algorithm for a Neuro-Fuzzy Inference System (McFIS) to efficiently recognize human actions from video sequences. Optical flow information between two consecutive image planes can represent actions hierarchically from the local pixel level to the global object level, and is hence used to describe human actions in the McFIS classifier. The McFIS classifier and its sequential learning algorithm are developed based on the principles of self-regulation observed in human meta-cognition. McFIS decides what to learn, when to learn, and how to learn based on the knowledge stored in the classifier and the information contained in the new training samples. The sequential learning algorithm of McFIS is controlled and monitored by the meta-cognitive components, which use class-specific, knowledge-based criteria along with self-regulatory thresholds to decide on one of the following strategies: (i) sample deletion, (ii) sample learning, and (iii) sample reserve. The performance of the proposed McFIS-based human action recognition system is evaluated using the benchmark Weizmann and KTH video sequences. The simulation results are compared with a well-known SVM classifier and with state-of-the-art action recognition results reported in the literature. The results clearly indicate that the McFIS action recognition system achieves better performance with minimal computational effort.

