A Flexible High-Level Fusion for an Accurate Human Action Recognition System

2020 ◽  
Vol 29 (12) ◽  
pp. 2050190
Author(s):  
Amel Ben Mahjoub ◽  
Mohamed Atri

Action recognition is an active and effective area of computer vision research. In the last few years, there has been growing interest in deep learning networks such as Long Short-Term Memory (LSTM) architectures, owing to their efficiency in processing long time sequences. In light of these advances in deep neural networks, there is now considerable interest in developing an accurate action recognition approach with low complexity. This paper introduces a method for learning from depth activity videos based on an LSTM and classification fusion. The first step consists of extracting compact depth video features. We start by calculating Depth Motion Maps (DMMs) from each sequence. Then we encode and concatenate contour and texture DMM characteristics using the histogram-of-oriented-gradients and local-binary-patterns descriptors. The second step is depth video classification based on a naive Bayes fusion approach. Three classifiers, namely the collaborative representation classifier, the kernel-based extreme learning machine, and the LSTM, are trained separately to obtain classification scores. Finally, we fuse the classification score outputs of all classifiers with the naive Bayesian method to obtain a final predicted label. Our proposed method achieves a significant improvement in recognition rate compared to previous work on the Kinect v2 and UTD-MHAD human action datasets.
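The DMM extraction step described above can be sketched as accumulating absolute differences between consecutive depth frames of a projection view. The following is a minimal NumPy illustration, not the authors' implementation; the function name and the toy sequence are invented for the example:

```python
import numpy as np

def depth_motion_map(frames):
    """Accumulate absolute inter-frame motion energy over a depth sequence.

    frames: (T, H, W) array of depth maps for one projection view.
    Returns an (H, W) Depth Motion Map.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))  # |frame_{t+1} - frame_t|
    return diffs.sum(axis=0)                 # accumulated motion energy

# Toy sequence: a single bright 'pixel' moving across a 4x4 depth map.
seq = np.zeros((3, 4, 4))
seq[0, 1, 1] = seq[1, 1, 2] = seq[2, 1, 3] = 10.0
dmm = depth_motion_map(seq)
```

In a pipeline like the one above, one DMM would be computed per projection view, and the HOG/LBP descriptors would then be extracted from these maps.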

Author(s):  
MARC BOSCH-JORGE ◽  
ANTONIO-JOSÉ SÁNCHEZ-SALMERÓN ◽  
CARLOS RICOLFE-VIALA

The aim of this work is to present a vision-based human action recognition system adapted to constrained embedded devices, such as smartphones. Basically, vision-based human action recognition combines feature tracking, descriptor extraction, and subsequent classification of image representations with a color-based identification tool to distinguish between multiple human subjects. Simple descriptor sets were evaluated to optimize recognition rate and performance, and two-dimensional (2D) descriptors were found to be effective. These sets, installed on the latest phones, can recognize human actions in videos in less than one second with a success rate of over 82%.


Author(s):  
Shou-Jen Lin ◽  
Mei-Hsuan Chao ◽  
Chao-Yang Lee ◽  
Chu-Sing Yang

A human action recognition system based on image depth is proposed in this paper. Depth information features are not easily disturbed by noise; due to this characteristic, the system can quickly extract foreground targets. Moreover, the target data, namely depth and two-dimensional (2D) data, are projected to three orthogonal planes. In this manner, the action featured in the depth motion along the optical axis can clearly describe the trajectory. Based on the change of motion energy and the angle variations of motion orientations, the temporal segmentation (TS) method automatically segments a complex action into several simple movements. Three-dimensional (3D) data are further applied to acquire the three-viewpoint (3V) motion history trajectory, whereby a target's motion is described through the motion history images (MHIs) from the 3Vs. The weightings corresponding to the gradients of the MHIs are included for determining the viewpoint that best describes the target's motion. In terms of feature extraction, the application of multi-resolution motion history histograms can effectively reduce the computational load and achieve a high recognition rate. Experimental results demonstrate that the proposed method can effectively solve the self-occlusion problem.
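The motion history image underlying the approach above follows the classic recursive update: pixels with current motion are set to a maximum timestamp, and older motion decays linearly. This is a generic single-view MHI sketch, not the authors' three-viewpoint implementation; the threshold and toy sequence are invented:

```python
import numpy as np

def motion_history_image(frames, tau=3, thresh=0.5):
    """Motion History Image: recent motion is bright, older motion decays.

    frames: (T, H, W) depth/grayscale sequence; returns an (H, W) MHI.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mhi = np.zeros(frames.shape[1:])
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur - prev) > thresh          # binary motion mask
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi

# Toy sequence: a 'pixel' moving left to right leaves a decaying trail.
seq = np.zeros((3, 4, 4))
seq[0, 1, 1] = seq[1, 1, 2] = seq[2, 1, 3] = 10.0
mhi = motion_history_image(seq, tau=3)
```

The gradient of such an MHI encodes motion orientation, which is what the viewpoint-weighting step in the abstract operates on.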


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper, a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach obtains an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
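A finite state machine of the kind described can be sketched as a transition table driven by per-frame posture labels derived from silhouette geometry. The states, labels, and the "sit down" action below are purely illustrative, not taken from the paper:

```python
# Transition table of a hypothetical 'sit down' recognizer. Keys are
# (current_state, observed_posture); unknown pairs leave the state unchanged.
TRANSITIONS = {
    ("idle", "standing"): "standing",
    ("standing", "bending"): "descending",
    ("descending", "seated"): "sat_down",   # accepting state
}

def run_fsm(posture_sequence, start="idle"):
    """Drive the FSM with one posture label per frame."""
    state = start
    for posture in posture_sequence:
        state = TRANSITIONS.get((state, posture), state)
    return state

final = run_fsm(["standing", "standing", "bending", "seated"])
```

Repeated or irrelevant frames simply leave the machine in its current state, which is what makes this style of recognizer robust to variable action speed.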


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3642
Author(s):  
Mohammad Farhad Bulbul ◽  
Sadiya Tabussum ◽  
Hazrat Ali ◽  
Wenli Zheng ◽  
Mi Young Lee ◽  
...  

This paper proposes an action recognition framework for depth map sequences using the 3D Space-Time Auto-Correlation of Gradients (STACOG) algorithm. First, each depth map sequence is split into two sets of sub-sequences of two different frame lengths. Second, a number of Depth Motion Map (DMM) sequences are generated from every set and fed into STACOG to find an auto-correlation feature vector. For the two distinct sets of sub-sequences, two auto-correlation feature vectors are obtained, and each is applied to an L2-regularized Collaborative Representation Classifier (L2-CRC) to compute a pair of sets of residual values. Next, the Logarithmic Opinion Pool (LOGP) rule is used to combine the two L2-CRC outcomes and to assign an action label to the depth map sequence. Finally, our proposed framework is evaluated on three benchmark datasets: the MSR-Action3D, DHA, and UTD-MHAD datasets. We compare the experimental results of our proposed framework with state-of-the-art approaches to demonstrate its effectiveness. The computational efficiency of the framework is also analyzed for all the datasets to assess its suitability for real-time operation.
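The LOGP rule mentioned above fuses classifier posteriors multiplicatively, i.e. as a weighted geometric mean that is then renormalized. A minimal sketch with uniform weights and invented toy posteriors (not the paper's residual-to-probability mapping):

```python
import numpy as np

def logp_fusion(prob_matrices, weights=None):
    """Logarithmic Opinion Pool over Q classifiers.

    prob_matrices: list of (N, C) posterior arrays, one per classifier.
    Returns a fused (N, C) posterior, renormalized per sample.
    """
    q = len(prob_matrices)
    if weights is None:
        weights = np.full(q, 1.0 / q)          # uniform opinion weights
    log_p = sum(w * np.log(np.clip(p, 1e-12, None))
                for w, p in zip(weights, prob_matrices))
    fused = np.exp(log_p)
    fused /= fused.sum(axis=1, keepdims=True)
    return fused

# Two classifiers, one sample, two classes; both lean towards class 0.
p1 = np.array([[0.7, 0.3]])
p2 = np.array([[0.6, 0.4]])
fused = logp_fusion([p1, p2])
labels = fused.argmax(axis=1)
```

Because the pool multiplies opinions, a classifier that assigns a class near-zero probability can veto it, which makes LOGP stricter than simple score averaging.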


Author(s):  
Mohammad Farhad Bulbul ◽  
Yunsheng Jiang ◽  
Jinwen Ma

The emerging cost-effective depth sensors have facilitated the action recognition task significantly. In this paper, the authors address the action recognition problem using depth video sequences combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, the Logarithmic Opinion Pool (LOGP), is used to combine the classification outcomes from multiple classifiers, each with an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves superior action recognition performance over using each feature individually.


2017 ◽  
Vol 2017 ◽  
pp. 1-6
Author(s):  
Shirui Huo ◽  
Tianrui Hu ◽  
Ce Li

Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2-regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; theoretical investigation into ICRC shows that it obtains a final classification by computing the likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
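The l2-regularized collaborative representation idea can be sketched as ridge-regression coding of the test sample over the whole training dictionary, followed by a per-class regularized residual test. This follows the standard CRC-RLS decision rule rather than the paper's specific ICRC probability formulation, and the toy data are invented:

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """l2-regularized collaborative representation classification.

    D: (d, n) dictionary, one training sample per column.
    labels: (n,) class label per column.  y: (d,) test sample.
    """
    n = D.shape[1]
    # Ridge-regression coding of y over all classes jointly.
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        r = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        # Regularized residual: small, confident coefficient blocks win.
        residuals.append(r / (np.linalg.norm(alpha[mask]) + 1e-12))
    return classes[int(np.argmin(residuals))]

# Toy dictionary: class 0 atoms near [1, 0], class 1 atoms near [0, 1].
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
pred = crc_classify(D, labels, np.array([0.95, 0.05]))
```

The single closed-form solve over the full dictionary is what makes CRC-style classifiers much cheaper than sparse-coding alternatives such as SRC.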


2015 ◽  
Vol 42 (1) ◽  
pp. 138-143
Author(s):  
ByoungChul Ko ◽  
Mincheol Hwang ◽  
Jae-Yeal Nam

2014 ◽  
Vol 644-650 ◽  
pp. 4162-4166
Author(s):  
Dan Dan Guo ◽  
Xi’an Zhu

An effective human action recognition method based on human skeletal information extracted by a Kinect depth sensor is proposed in this paper. Through a study of human skeletal structure, node data, and human actions, the 3D space coordinates of skeleton joints and the angles between nodes of action-related joints are collected as action characteristics. First, 3D information on human skeletons is acquired by the Kinect depth sensor and the cosines of the relevant joint angles are calculated. Then the human skeletal information from the time window preceding the current state is stored in real time. Finally, the relative locations of the skeleton nodes and the variation of the joint-angle cosines within a certain time are analyzed to recognize the human motion. This algorithm has higher adaptability and practicality because it does not require the complicated sample training and recognition processes of traditional methods. The results of the experiment indicate that this method achieves a high recognition rate.
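The joint-angle cosine feature described above reduces to the cosine between the two bone vectors meeting at a joint. A minimal sketch; the joint names and coordinates are illustrative, not the Kinect SDK's:

```python
import numpy as np

def joint_angle_cosine(parent, joint, child):
    """Cosine of the angle at `joint` formed by its two neighboring
    skeleton nodes (e.g. shoulder-elbow-wrist), from 3D coordinates."""
    v1 = np.asarray(parent, float) - np.asarray(joint, float)
    v2 = np.asarray(child, float) - np.asarray(joint, float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# A right angle at the elbow: upper arm along y, forearm along x.
cos_elbow = joint_angle_cosine([0, 1, 0], [0, 0, 0], [1, 0, 0])
```

Tracking how such cosines change over a short time window is what lets the method distinguish, say, an arm raise from an arm extension without any classifier training.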


2012 ◽  
Vol 22 (06) ◽  
pp. 1250028 ◽  
Author(s):  
K. SUBRAMANIAN ◽  
S. SURESH

We propose a sequential meta-cognitive learning algorithm for a Neuro-Fuzzy Inference System (McFIS) to efficiently recognize human actions from video sequences. Optical flow information between two consecutive image planes can represent actions hierarchically from the local pixel level to the global object level, and is hence used to describe human actions in the McFIS classifier. The McFIS classifier and its sequential learning algorithm are developed based on the principles of self-regulation observed in human meta-cognition. McFIS decides what to learn, when to learn, and how to learn based on the knowledge stored in the classifier and the information contained in the new training samples. The sequential learning algorithm of McFIS is controlled and monitored by the meta-cognitive components, which use class-specific, knowledge-based criteria along with self-regulatory thresholds to decide on one of the following strategies: (i) sample deletion, (ii) sample learning, and (iii) sample reserve. The performance of the proposed McFIS-based human action recognition system is evaluated using the benchmark Weizmann and KTH video sequences. The simulation results are compared with a well-known SVM classifier and with state-of-the-art action recognition results reported in the literature. The results clearly indicate that the McFIS action recognition system achieves better performance with minimal computational effort.

