Action Recognition Using Hybrid Feature Descriptor and VLAD Video Encoding

Author(s):
Dong Xing ◽
Xianzhong Wang ◽
Hongtao Lu


Sensors ◽
2019 ◽
Vol 19 (7) ◽
pp. 1599
Author(s):
Md Uddin ◽
Young-Koo Lee

Human action recognition has attracted significant attention in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information is vital to describing a video. In this research, we propose a novel approach to recognize human actions by considering both deep spatial features and handcrafted spatiotemporal features. Firstly, we extract the deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-ResNet-v2. Secondly, we introduce a novel handcrafted feature descriptor, namely Weber's law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features. It also captures shape information through a gradient operation. Furthermore, a Weber's law based threshold value and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values. In addition, a multi-resolution approach for WVLGTP based on an averaging scheme is presented. Afterward, both extracted features are concatenated and fed to a Support Vector Machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
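The core of the thresholding idea is easy to illustrate: under Weber's law, the threshold for calling a neighbor "brighter" or "darker" scales with the center intensity, which makes the ternary code robust when the center pixel is noisy. A minimal 2-D sketch follows (the paper's full WVLGTP operates on 3-D volumes and gradient magnitudes; `alpha` is a hypothetical Weber fraction, not the paper's value):

```python
import numpy as np

def weber_ternary_code(neighborhood, alpha=0.15):
    """Ternary-code a 3x3 patch around its center pixel.

    The threshold scales with the center intensity (Weber's law:
    the noticeable difference is proportional to the stimulus),
    so a noisy center does not flip codes for small fluctuations.
    `alpha` is an illustrative Weber fraction.
    """
    center = neighborhood[1, 1]
    t = alpha * max(center, 1e-6)           # Weber's-law-based threshold
    diff = np.delete(neighborhood.flatten(), 4) - center
    code = np.zeros_like(diff, dtype=int)   # 0: within +/- t of center
    code[diff > t] = 1                      # 1: noticeably brighter
    code[diff < -t] = -1                    # -1: noticeably darker
    return code

patch = np.array([[52, 60, 48],
                  [55, 50, 41],
                  [70, 49, 50]], dtype=float)
print(weber_ternary_code(patch))  # [ 0  1  0  0 -1  1  0  0]
```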


Author(s):  
Jiajia Luo ◽  
Wei Wang ◽  
Hairong Qi

Multi-view human action recognition has gained a lot of attention in recent years for its superior performance compared with single-view recognition. In this paper, we propose a new framework for the real-time realization of human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions, and computationally efficient. Taking advantage of the proposed Mltp-hist, the noninformative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos to be transmitted to the base station for classification. Due to the sparse representation of extracted features, the approximation error is reduced. Finally, at the base station, a probability model is produced to fuse the information from various views, and a class label is assigned accordingly. Compared to existing algorithms, the proposed framework has three advantages while requiring less memory and bandwidth: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) positions and orientations of cameras do not need to be fixed. We further evaluate the proposed framework on IXMAS, the most popular multi-view action dataset. Experimental results indicate that our proposed framework consistently achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. In particular, our results remain satisfactory even when large misalignment exists between the training and testing samples.
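The fusion step at the base station can be pictured with a simple per-view probability model. Below is a minimal sketch assuming each camera independently yields class posteriors that are combined with a product rule; the paper's actual probability model may differ:

```python
import numpy as np

def fuse_views(view_posteriors):
    """Fuse per-view class posteriors with a product rule.

    view_posteriors: list of 1-D arrays, one per camera, each a
    probability distribution over action classes. Summing logs
    avoids underflow; the argmax is the fused class label.
    """
    log_p = np.sum([np.log(p + 1e-12) for p in view_posteriors], axis=0)
    return int(np.argmax(log_p))

# Three cameras voting over four hypothetical action classes
views = [np.array([0.1, 0.6, 0.2, 0.1]),
         np.array([0.2, 0.5, 0.2, 0.1]),
         np.array([0.3, 0.3, 0.3, 0.1])]
print(fuse_views(views))  # -> 1
```

Because no inter-camera communication is needed, each view's posterior can be computed locally and only the compact histogram is sent to the base station, which matches the framework's bandwidth claim.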


Author(s):  
B. H. Shekar ◽  
P. Rathnakara Shetty ◽  
M. Sharmila Kumari ◽  
L. Mestetsky

Accumulating the motion information from a video sequence is one of the most challenging and significant phases in Human Action Recognition. To achieve this, several classical and compact representations with proven applicability have been proposed by the research community. In this paper, we propose a compact Depth Motion Map (DMM) based representation methodology with hasty striding, concisely accumulating the motion information. We extract Undecimated Dual Tree Complex Wavelet Transform features from the proposed DMM to form an efficient feature descriptor. We designate a Sequential Extreme Learning Machine for classifying the human action sequences on two benchmark datasets, the MSR Action 3D dataset and the DHA dataset. We empirically demonstrate the feasibility of our method under standard protocols, achieving competitive results.
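A strided depth motion map (the "hasty striding" referred to above) can be sketched in a few lines. This is a minimal illustration assuming a depth video given as a (T, H, W) array for one projection view; the stride and noise threshold here are illustrative, not the paper's settings:

```python
import numpy as np

def depth_motion_map(depth_video, stride=2, eps=5.0):
    """Accumulate thresholded inter-frame depth differences.

    depth_video: (T, H, W) array of depth frames (one projection view).
    A stride > 1 skips frames, trading temporal detail for a more
    compact accumulation; differences below `eps` are treated as noise.
    """
    frames = depth_video[::stride].astype(np.float32)
    diffs = np.abs(np.diff(frames, axis=0))  # |D_{t+1} - D_t|
    diffs[diffs < eps] = 0.0                 # suppress sensor noise
    return diffs.sum(axis=0)                 # (H, W) motion map

video = np.random.randint(0, 255, size=(40, 64, 48)).astype(np.float32)
dmm = depth_motion_map(video, stride=3)
print(dmm.shape)  # (64, 48)
```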


Author(s):  
Guang Li ◽  
Kai Liu ◽  
Chongyang Ding ◽  
Wenwen Ding ◽  
Evgeny Belyaev ◽  
...  

As a class of effective feature descriptors for action recognition, action representations based on skeleton sequences have yielded excellent recognition results. Most methods used to construct these action representations rely on the information from all the joint positions in actions. Unfortunately, some joints do not improve the accuracy of action recognition and may even cause unnecessary inter-class errors. In this study, the authors propose a new method for action recognition that first selects active joints, i.e. those closely related to the movement of the body. Further, a skeleton is characterized as the set of its active-joint positions, and this set can be mapped to a point on pre-shape space to filter out scale and translation variability. A skeleton sequence (an action) can then be regarded as points on this space. Because the timing-sequence relationship between skeletons is very valuable for action recognition, a tensor-based linear dynamical system (tLDS) is employed to model the temporal information of the action. To avoid using a finite-order sequence to estimate the infinite-order feature descriptor of a tLDS, the descriptor is mapped to a point on an infinite Grassmannian composed of the extended observability subspaces. The action is classified using sparse coding and dictionary learning (SCDL) on the infinite Grassmannian. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in recognition accuracy on four different action datasets.
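The pre-shape mapping mentioned above amounts to removing translation (centering the active joints) and scale (normalizing to unit Frobenius norm), leaving a point on a unit hypersphere. A minimal sketch, with active-joint selection assumed done upstream:

```python
import numpy as np

def to_preshape(joints):
    """Project an (N, 3) set of active-joint positions to pre-shape space.

    Centering removes translation; dividing by the Frobenius norm
    removes scale. The result lies on a unit hypersphere, so only
    the body's shape (up to rotation) remains.
    """
    centered = joints - joints.mean(axis=0, keepdims=True)
    norm = np.linalg.norm(centered)
    return centered / max(norm, 1e-12)

# Fifteen hypothetical active joints, offset and scaled arbitrarily
skeleton = np.random.randn(15, 3) * 0.3 + np.array([1.0, 0.5, 2.0])
z = to_preshape(skeleton)
print(np.allclose(z.mean(axis=0), 0), np.isclose(np.linalg.norm(z), 1.0))
# -> True True
```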


2020 ◽  
Vol 9 (1) ◽  
pp. 43-61
Author(s):  
Ushapreethi P ◽  
Lakshmi Priya G G

Purpose: To build a successful human action recognition (HAR) system for unmanned environments.

Design/methodology/approach: This paper describes the key technology of an efficient HAR system. Advancements in three key steps of the HAR system are presented to improve the accuracy of existing HAR systems: feature extraction, feature description, and action classification, which are implemented and analyzed. The use of the implemented HAR system in a self-driving car is summarized. Finally, the results of the HAR system are compared with those of other existing action recognition systems.

Findings: This paper presents the proposed modifications and improvements in the HAR system, namely a skeleton-based spatiotemporal interest point (STIP) feature, an improved discriminative sparse descriptor for the identified feature, and linear action classification.

Research limitations/implications: The experiments are carried out on captured benchmark datasets and need to be analyzed in a real-time environment.

Practical implications: The middleware support between the proposed HAR system and the self-driving car system opens several other challenging research opportunities.

Social implications: The authors' work provides a way to take a step forward in machine vision, especially in self-driving cars.

Originality/value: A method for extracting the new feature and constructing an improved discriminative sparse feature descriptor has been introduced.
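The final stage of the pipeline, linear classification over the sparse feature descriptors, can be sketched with scikit-learn. This is an illustrative sketch only: the descriptors here are synthetic placeholders, and the class count and regularization are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder for per-video sparse descriptors: 200 videos, 512-D codes
rng = np.random.default_rng(0)
X = rng.random((200, 512))
y = rng.integers(0, 6, size=200)   # six hypothetical action classes

# Linear action classification over the descriptors
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X[:150], y[:150])
print("accuracy:", clf.score(X[150:], y[150:]))
```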

