Action recognition in video using a spatial-temporal graph-based feature representation

Author(s):  
Iveel Jargalsaikhan ◽  
Suzanne Little ◽  
Remi Trichet ◽  
Noel E. O'Connor
Author(s):  
Jianhai Zhang ◽  
Zhiyong Feng ◽  
Yong Su ◽  
Meng Xing

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An individual action can be represented by empirical statistics computed over all of its pose samples. Covariance has two major problems: (1) it is prone to singularity, so actions may fail to be represented properly, and (2) it lacks global action/pose-aware information, which limits its expressive and discriminative power. In this article, we propose a novel Bayesian covariance representation that uses prior regularization to solve these problems. Specifically, the covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over the local poses of an individual action. A Global Informative Prior (GIP) with sufficient statistics is then generated over global poses to regularize the covariance. In this way, (1) singularity is greatly relieved thanks to the sufficient statistics, and (2) the global pose information in the GIP makes the Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so the discriminative characteristics of actions are represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves action recognition performance. On some databases, it outperforms state-of-the-art variants based on kernels, temporal-order structures, and saliency-weighted attention, among others.
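
A minimal sketch of the kind of prior-regularized covariance estimate described above is given below. The conjugate, inverse-Wishart-style blending, the `prior_count` weighting, and the helper names are assumptions for illustration, not the authors' exact formulation of the GIP.

```python
import numpy as np

def global_informative_prior(all_poses, strength=1.0):
    """Build a prior scatter matrix from global pose statistics (hypothetical helper)."""
    mu = all_poses.mean(axis=0)
    centered = all_poses - mu
    return strength * (centered.T @ centered) / len(all_poses)

def bayesian_covariance(action_poses, prior_scatter, prior_count=10):
    """MAP-style covariance of one action's poses, regularized by the global prior.

    Blending the prior scatter with the empirical scatter keeps the estimate
    non-singular even when the action has only a few pose samples.
    """
    n = len(action_poses)
    mu = action_poses.mean(axis=0)
    centered = action_poses - mu
    scatter = centered.T @ centered
    return (prior_count * prior_scatter + scatter) / (prior_count + n)
```

As a usage sketch, `global_informative_prior` would be fit once over the pose samples of the whole training set, and `bayesian_covariance` applied per action clip to produce the regularized descriptor.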


Author(s):  
Jiajia Luo ◽  
Wei Wang ◽  
Hairong Qi

Multi-view human action recognition has gained considerable attention in recent years for its superior performance compared to single-view recognition. In this paper, we propose a new framework for real-time human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions, and computationally efficient. Taking advantage of the proposed Mltp-hist, non-informative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos, which is transmitted to the base station for classification. Due to the sparse representation of the extracted features, the approximation error is reduced. Finally, at the base station, a probability model fuses the information from the various views and assigns a class label accordingly. Compared to existing algorithms, the proposed framework has three advantages while requiring less memory and bandwidth: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) the positions and orientations of the cameras do not need to be fixed. We further evaluate the proposed framework on the most popular multi-view action dataset, IXMAS. Experimental results indicate that our framework consistently achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. In particular, our results remain satisfactory even when large misalignment exists between the training and testing samples.
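
The sketch below illustrates the two transmitted/fused quantities the abstract describes: a sparse-coding histogram computed per camera, and a per-view probability fusion at the base station. The Mltp-hist descriptor itself is not shown, the max-pooling of codes and the product-rule fusion are stand-ins for the paper's own choices, and the dictionary is assumed to be learned offline.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def encode_histogram(descriptors, dictionary, alpha=0.5):
    """Sparse-code local descriptors against a fixed dictionary and pool the
    absolute codes into a fixed-length histogram (the per-camera representation
    sent to the base station)."""
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    codes = coder.transform(descriptors)   # shape: (n_patches, n_atoms)
    return np.abs(codes).max(axis=0)       # shape: (n_atoms,)

def fuse_views(per_view_probs):
    """Combine per-view class posteriors with a simple product rule and return
    the fused class label."""
    fused = np.prod(np.stack(per_view_probs), axis=0)
    return int(np.argmax(fused))
```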


2014 ◽  
Vol 11 (01) ◽  
pp. 1450005
Author(s):  
Yangyang Wang ◽  
Yibo Li ◽  
Xiaofei Ji

Vision-based human action recognition is currently one of the most active research topics in computer vision, and the feature representation has a crucial impact on recognition performance. Representations based on bag-of-words are popular in current research, but the spatial and temporal relationships among the features are usually discarded. To address this issue, a novel feature representation based on normalized interest points, called the super-interest point, is proposed and used to recognize human actions. The novelty of the proposed feature is that, by introducing normalized point clustering, the spatial-temporal correlation between the interest points and the human body can be added to the representation directly, without considering the scale and location variance of the points. The novelty concerns three tasks. First, to handle the diversity of human location and scale, interest points are normalized based on the normalization of the human region. Second, to capture the spatial-temporal correlation among the interest points, normalized points with similar spatial and temporal distances are grouped into a super-interest point using a three-dimensional clustering algorithm. Finally, a new feature representation is obtained by describing the appearance characteristics of the super-interest points and the location relationships among them. The proposed representation establishes the relationship between local features and the human figure. Experiments on the Weizmann, KTH, and UCF Sports datasets demonstrate that the proposed feature is effective for human action recognition.
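
A minimal sketch of the normalization and 3-D grouping steps is given below. The bounding-box normalization, the use of DBSCAN as the clustering algorithm, and the temporal scaling factor are assumptions standing in for the paper's own choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def normalize_points(points, bbox_per_frame):
    """Map each (x, y, t) interest point into the unit square of its frame's
    person bounding box, removing scale and location variance.
    Assumes t indexes bbox_per_frame as (x0, y0, width, height)."""
    out = []
    for x, y, t in points:
        x0, y0, w, h = bbox_per_frame[int(t)]
        out.append(((x - x0) / w, (y - y0) / h, t))
    return np.asarray(out)

def super_interest_points(norm_points, eps=0.15, min_pts=3, t_scale=0.1):
    """Group normalized points that are close in space and time into
    super-interest points via 3-D density clustering."""
    scaled = norm_points * np.array([1.0, 1.0, t_scale])
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(scaled)
    return [norm_points[labels == k] for k in set(labels) if k != -1]
```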


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In the skeleton-based human action recognition domain, spatial-temporal graph convolutional networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN) in this paper. Three convolution kernels of different sizes are chosen to extract discriminative temporal features over shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once into the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that its performance exceeds previous state-of-the-art methods.
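
The module below sketches the multi-kernel temporal branch with OSA-style concatenation and an eSE channel gate, as described in the abstract. The kernel sizes (3, 5, 7), the 1x1 fusion convolution, and the layer layout are assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiKernelTemporalOSA(nn.Module):
    """Temporal convolutions with several kernel sizes, aggregated once (OSA),
    followed by an effective squeeze-excitation (eSE) channel gate.

    Input/output shape: (N, C, T, V) -- batch, channels, frames, joints.
    """
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        )
        concat = channels * len(kernel_sizes)
        self.aggregate = nn.Conv2d(concat, channels, kernel_size=1)  # OSA: concatenate once, then fuse
        self.ese = nn.Sequential(                                    # eSE: pooled 1x1 conv + sigmoid gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        out = self.aggregate(feats)
        return out * self.ese(out)
```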


2021 ◽  
pp. 1-13
Author(s):  
Cong Pei ◽  
Feng Jiang ◽  
Mao Li

With the advent of cost-efficient depth cameras, many effective feature descriptors have been proposed for action recognition from depth sequences. However, most of them are based on a single feature and are thus unable to capture action information comprehensively; for example, some descriptors can represent the area where motion occurs but cannot describe the order in which the action is performed. In this paper, a new feature representation scheme that combines different feature descriptors is proposed to capture various aspects of action cues simultaneously. First, a depth sequence is divided into a series of sub-sequences using a motion-energy-based spatial-temporal pyramid. For each sub-sequence, on the one hand, depth motion map (DMM)-based completed local binary pattern (CLBP) descriptors are calculated through a patch-based strategy; on the other hand, the sub-sequence is partitioned into spatial grids and polynormal descriptors are obtained for each grid sequence. The sparse representation vectors of the DMM-based CLBP and the polynormals are then calculated separately. After pooling, the final representation vector of the sample is generated as the input to the classifier. Finally, two different fusion strategies are applied. Extensive experiments on two benchmark datasets show that the proposed method performs better than each single-feature-based recognition method.
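
The abstract names two fusion strategies without specifying their form. The sketch below assumes the two most common interpretations, feature-level concatenation and weighted score averaging, as hypothetical stand-ins.

```python
import numpy as np

def feature_level_fusion(clbp_vec, polynormal_vec):
    """Concatenate the pooled sparse codes of the two descriptors into one
    vector before classification."""
    return np.concatenate([clbp_vec, polynormal_vec])

def decision_level_fusion(scores_clbp, scores_polynormal, w=0.5):
    """Blend per-class scores from two single-descriptor classifiers and pick
    the winning class."""
    fused = w * scores_clbp + (1.0 - w) * scores_polynormal
    return int(np.argmax(fused))
```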


2021 ◽  
Vol 11 (18) ◽  
pp. 8641
Author(s):  
Jianping Guo ◽  
Hong Liu ◽  
Xi Li ◽  
Dahong Xu ◽  
Yihan Zhang

With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have promoted improvements in athletes' competitive ability as well as general fitness. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions and tactics of competitive athletes. In this paper, a new graph convolution model is proposed. Delaunay's partitioning algorithm is used to construct a new spatiotemporal topology that effectively captures the structural information and spatiotemporal features of athletes' technical actions. At the same time, an attention mechanism is integrated into the model and different weight coefficients are assigned to the joints, which significantly improves the accuracy of technical action recognition. First, a comparison with current state-of-the-art methods was undertaken on the general Kinect and NTU-RGB+D datasets, where the performance of the new model was slightly improved. Then, the performance of our algorithm was compared with spatial temporal graph convolutional networks (ST-GCN) on a karate technique action dataset, where the accuracy of our algorithm was significantly improved.
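
A minimal sketch of a Delaunay-based joint topology with per-joint attention weighting is shown below. Building the triangulation from 2-D joint positions per frame and scaling adjacency columns by given attention weights are assumptions about how the abstract's ideas could be realized, not the authors' exact construction.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(joints_2d):
    """Build a symmetric joint adjacency matrix (with self-loops) from a
    Delaunay triangulation of the 2-D joint positions in one frame.
    Requires at least three non-collinear joints."""
    n = len(joints_2d)
    adj = np.eye(n)
    tri = Delaunay(joints_2d)
    for simplex in tri.simplices:      # each simplex is a triangle of joint indices
        for i in simplex:
            for j in simplex:
                adj[i, j] = 1.0
    return adj

def attention_weighted_adjacency(adj, joint_attention):
    """Scale the columns of the adjacency (edges into each joint) by a
    per-joint attention weight."""
    return adj * joint_attention[None, :]
```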


Author(s):  
Kumbala Reddy ◽  
Gullipalli Naidu ◽  
Bulusu Vardhan ◽  
...  

2021 ◽  
Author(s):  
Tailin Chen ◽  
Desen Zhou ◽  
Jian Wang ◽  
Shidong Wang ◽  
Yu Guan ◽  
...  
