Efficient Sparse Representation based Action Recognition in video

Human Action Recognition (HAR) is an interesting and useful topic for many real-life applications, such as surveillance-based security systems, computer vision and robotics. The accuracy of a HAR system is determined by the selected features, the feature representation method and the classification algorithm. In this work, a new feature called Skeletonized STIP (Spatio-Temporal Interest Points) is identified and used. Skeletonization is performed on the foreground frames of the action video, and the new feature is generated as the STIP values of the resulting skeleton frame sequence. This feature set is then used to construct the initial dictionary for sparse coding. Because the data for action recognition is huge, the feature set is represented using sparse representation. The sparse representation is refined with max pooling, and action recognition is performed with an SVM classifier. The proposed approach outperforms existing approaches on the benchmark datasets.
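The coding and pooling stages described above can be illustrated with a small sketch. It assumes plain matching pursuit as the sparse coder (the abstract does not name a solver), builds the initial dictionary by normalizing training descriptors, and max-pools the absolute coefficients of all descriptors in a video into one fixed-length vector. Skeletonization, STIP extraction and the SVM are not shown; all helper names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def matching_pursuit(x: Sequence[float], dictionary: Sequence[Vector], n_nonzero: int) -> Vector:
    """Greedy sparse code of x over unit-norm atoms (plain matching pursuit)."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in dictionary]
        k = max(range(len(scores)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # residual is already (numerically) explained
        code[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, dictionary[k])]
    return code


def max_pool(codes: Sequence[Vector]) -> Vector:
    """Video-level vector: per-atom maximum of absolute coefficients."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


# Hypothetical toy data: 3-D vectors standing in for skeletonized-STIP descriptors.
training_descriptors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
dictionary = [normalize(d) for d in training_descriptors]  # initial dictionary
video = [[0.5, 0.0, 0.0], [0.0, -3.0, 0.0]]                # descriptors of one video
codes = [matching_pursuit(d, dictionary, n_nonzero=1) for d in video]
pooled = max_pool(codes)  # fixed-length input for the classifier
```

The pooled vector would then be passed to an SVM (for example, scikit-learn's `SVC`). Max pooling keeps the strongest response per atom, so videos with different numbers of descriptors map to vectors of the same length.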

Feature Extraction and Representation for Distributed Multi-View Human Action Recognition

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol 3 (2), pp. 145-154, 2013. DOI: 10.1109/jetcas.2013.2256824. Cited by ~7

Author(s): Jiajia Luo, Wei Wang, Hairong Qi

Keyword(s): Action Recognition, Approximation Error, Human Action Recognition, Human Action, Base Station, Feature Representation, Superior Performance, Feature Descriptor, Testing Stage, New Feature

Multi-view human action recognition has gained a lot of attention in recent years because it performs better than single-view recognition. In this paper, we propose a new framework for real-time human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination changes, robust in homogeneous regions and computationally efficient. Using Mltp-hist, the noninformative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate a histogram representation of each local video, which is transmitted to the base station for classification. Because the extracted features are represented sparsely, the approximation error is reduced. Finally, at the base station, a probability model fuses the information from the various views and a class label is assigned accordingly. Compared to existing algorithms, the proposed framework has three advantages while requiring less memory and bandwidth: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) the positions and orientations of the cameras do not need to be fixed. We evaluate the proposed framework on IXMAS, the most popular multi-view action dataset. Experimental results indicate that the framework consistently achieves state-of-the-art results for different numbers of views. In addition, our approach is tolerant to different combinations of views and benefits from introducing more views at the testing stage. In particular, the results remain satisfactory even when large misalignment exists between the training and testing samples.
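The abstract does not specify the base-station probability model. As an illustrative assumption only, the sketch below fuses per-view class posteriors with the product rule (conditionally independent views and a uniform class prior), computed in log space. It accepts any number of views, in line with the stated goal of handling arbitrary view combinations; the function names and class labels are hypothetical.

```python
import math
from typing import Dict, List


def fuse_views(view_posteriors: List[Dict[str, float]], eps: float = 1e-12) -> Dict[str, float]:
    """Combine per-view class posteriors with the product rule.

    Assumes conditionally independent views and a uniform class prior, so
    P(class | all views) is proportional to the product of P(class | view).
    """
    classes = list(view_posteriors[0].keys())
    # Sum log-probabilities instead of multiplying, to avoid underflow with many views.
    log_scores = {c: sum(math.log(max(p[c], eps)) for p in view_posteriors) for c in classes}
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


def predict(view_posteriors: List[Dict[str, float]]) -> str:
    """Assign the class label with the highest fused probability."""
    fused = fuse_views(view_posteriors)
    return max(fused, key=fused.get)


# Hypothetical posteriors reported by three cameras for one action instance.
views = [
    {"walk": 0.6, "wave": 0.4},
    {"walk": 0.3, "wave": 0.7},
    {"walk": 0.8, "wave": 0.2},
]
fused = fuse_views(views)
label = predict(views)
```

Here the products are 0.144 for "walk" and 0.056 for "wave", so "walk" wins even though one camera favours "wave". Dropping or adding a view only changes the list passed in, which is why this style of fusion does not require fixed camera positions.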

Human Action Recognition Based on Normalized Interest Points and Super-Interest Points

International Journal of Humanoid Robotics, Vol 11 (01), pp. 1450005, 2014. DOI: 10.1142/s0219843614500054

Author(s): Yangyang Wang, Yibo Li, Xiaofei Ji

Keyword(s): Action Recognition, Clustering Algorithm, Three Dimensional, Temporal Correlation, Human Action Recognition, Human Action, Feature Representation, Interest Point, Interest Points, Active Research

Vision-based human action recognition is currently one of the most active research topics in computer vision. Feature representation has a direct and crucial impact on recognition performance. Bag-of-words feature representations are popular in current research, but they usually discard the spatial and temporal relationships among the features. To address this issue, a novel feature representation based on normalized interest points, called the super-interest point, is proposed and used to recognize human actions. Its novelty is that, by clustering normalized points, the spatial-temporal correlation between the interest points and the human body can be added directly to the representation without having to account for the scale and location variance of the points. The approach involves three tasks. First, to handle variation in human location and scale, interest points are normalized with respect to the normalized human region. Second, to capture the spatial-temporal correlation among the interest points, normalized points that are close in space and time are grouped into super-interest points using a three-dimensional clustering algorithm. Finally, a new feature representation is obtained by describing the appearance of the super-interest points and the location relationships among them. The resulting representation links local features to the human figure. Experiments on the Weizmann, KTH, and UCF Sports datasets demonstrate that the proposed feature is effective for human action recognition.
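The normalization and clustering steps can be sketched as follows. This is a simplified, hypothetical version: a single fixed bounding box stands in for the normalized human region, time is scaled by the clip length, and k-means with given initial centers stands in for the unspecified three-dimensional clustering algorithm. Each resulting cluster plays the role of one super-interest point; the appearance and location descriptors are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def normalize_points(points: Sequence[Point], box: Tuple[float, float, float, float],
                     n_frames: int) -> List[Point]:
    """Express (x, y, t) relative to the human region and the clip length.

    box = (x_min, y_min, width, height) of the person (assumed fixed here);
    x and y become fractions of the box, t becomes a fraction of the clip.
    """
    x0, y0, w, h = box
    return [((x - x0) / w, (y - y0) / h, t / n_frames) for x, y, t in points]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance in normalized (x, y, t) space."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[List[Point]]:
    """Group each point with its nearest center."""
    groups: List[List[Point]] = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        groups[k].append(p)
    return groups


def super_interest_points(points: Sequence[Point], init_centers: Sequence[Point],
                          iters: int = 10):
    """3-D k-means; each final cluster stands in for one super-interest point."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        groups = assign(points, centers)
        # Move each center to the mean of its group; keep empty clusters in place.
        centers = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, assign(points, centers)


# Hypothetical interest points (pixel x, pixel y, frame index) from a 10-frame clip.
raw = [(105, 60, 1), (110, 70, 2), (140, 130, 8), (145, 140, 9)]
pts = normalize_points(raw, box=(100, 50, 50, 100), n_frames=10)
centers, groups = super_interest_points(pts, init_centers=[pts[0], pts[-1]])
```

Because coordinates are expressed relative to the body and the clip, the same action performed by a larger or shifted person yields similar normalized points, which is the property the abstract relies on before clustering.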

Shift Invariant Dictionary Learning for Human Action Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (2), pp. 2107-2111, 2019. DOI: 10.35940/ijitee.b7005.129219

Keyword(s): Sparse Representation, Action Recognition, Dictionary Learning, Human Action Recognition, Human Action, Signal Denoising, Learning Approaches, Training Time, Initial Stage, Shift Invariance

Sparse representation is an emerging research topic. Methods that represent large volumes of dense data as sparse data are needed in many fields, such as classification, compression and signal denoising. Sparse representation is built on dictionary learning. In most dictionary learning approaches, the dictionary is learned from the input training signals, which is time-consuming. To address this issue, a shift-invariant dictionary is used for action recognition in this work. In a Shift-Invariant Dictionary (SID), the dictionary is constructed at the initial stage by exploiting the shift invariance of the initial atoms. The advantage of the proposed SID-based action recognition method is that it requires minimal training time while achieving the highest accuracy.
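One common way to build a shift-invariant dictionary without any training is to include every circular shift of each initial atom. The abstract does not say which shifts the authors use, so the sketch below is an assumption: it normalizes each initial atom and appends all of its circular shifts, so the dictionary is fixed before any training signal is seen. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def circular_shift(atom: Sequence[float], s: int) -> List[float]:
    """Shift an atom s positions to the right, wrapping around the end."""
    n = len(atom)
    return [atom[(i - s) % n] for i in range(n)]


def shift_invariant_dictionary(initial_atoms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Dictionary made of every circular shift of each unit-normalized initial atom."""
    dictionary: List[List[float]] = []
    for atom in initial_atoms:
        norm = math.sqrt(sum(a * a for a in atom))
        if norm == 0:
            raise ValueError("initial atoms must be non-zero")
        unit = [a / norm for a in atom]
        for s in range(len(unit)):
            dictionary.append(circular_shift(unit, s))
    return dictionary


# One hypothetical length-4 initial atom yields 4 shifted, unit-norm atoms.
D = shift_invariant_dictionary([[3.0, 4.0, 0.0, 0.0]])
```

Because no atoms are fitted to training data, the only construction cost is generating the shifts. An initial atom that is itself periodic would produce duplicate atoms, which an implementation might remove.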

CGA: a new feature selection model for visual human action recognition

Neural Computing and Applications, 2020. DOI: 10.1007/s00521-020-05297-5. Cited by ~2

Author(s): Ritam Guha, Ali Hussain Khan, Pawan Kumar Singh, Ram Sarkar, Debotosh Bhattacharjee

Keyword(s): Feature Selection, Action Recognition, Human Action Recognition, Human Action, Selection Model, New Feature

Human Action Recognition Algorithm Based on DBPSO-SVM Classifier

2019 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), 2019. DOI: 10.1109/icspcc46631.2019.8960768

Author(s): Yunkun Ning, Sheng Zhang, Weimin Xiong, Guanglin Li, Guoru Zhao

Keyword(s): Action Recognition, Human Action Recognition, Human Action, Recognition Algorithm, SVM Classifier

Human Action Recognition Using Improved Salient Dense Trajectories

Computational Intelligence and Neuroscience, Vol 2016, pp. 1-11, 2016. DOI: 10.1155/2016/6750459. Cited by ~3

Author(s): Qingwu Li, Haisu Cheng, Yan Zhou, Guanying Huo

Keyword(s): Action Recognition, State Of The Art, Human Action Recognition, Human Action, Interest Points, Dense Trajectories, Dense Trajectory, Sparse Coefficient, Active Research, Motion Saliency

Human action recognition in videos is an active research topic in computer vision. Dense trajectory (DT) features have been shown to represent videos efficiently in state-of-the-art approaches. In this paper, we present a more effective video representation based on improved salient dense trajectories. First, we detect the motion-salient region and extract dense trajectories by tracking interest points at each spatial scale separately, and then refine the dense trajectories by analyzing the motion saliency. Next, we compute several descriptors (trajectory displacement, HOG, HOF, and MBH) in the spatiotemporal volume aligned with the trajectories. Finally, to represent the videos better, we optimize the bag-of-words framework according to the motion-salient intensity distribution and the idea of sparse coefficient reconstruction. Our architecture is trained and evaluated on four standard video action datasets (KTH, UCF Sports, HMDB51, and UCF50), and the experimental results show that our approach performs competitively with state-of-the-art methods.
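Among the listed descriptors, trajectory displacement is the simplest to state. The sketch below uses the formulation commonly paired with dense trajectories, in which the displacement vectors between consecutive tracked points are concatenated and divided by the sum of their magnitudes. The paper may differ in details, saliency-based refinement is not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def trajectory_displacement(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Concatenated displacement vectors, normalized by their total magnitude.

    points are the (x, y) positions of one tracked point in consecutive frames.
    """
    if len(points) < 2:
        raise ValueError("a trajectory needs at least two points")
    deltas = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in deltas)
    if total == 0:
        # A static point has no shape; return an all-zero descriptor.
        return [0.0] * (2 * len(deltas))
    return [c / total for d in deltas for c in d]


# Hypothetical 4-frame trajectory: move, pause, move again.
desc = trajectory_displacement([(0, 0), (3, 4), (3, 4), (6, 8)])
```

Normalizing by the total magnitude makes the descriptor describe the shape of the motion rather than its speed, so the same gesture performed faster or slower gives similar vectors.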