Human Action Recognition Using Improved Salient Dense Trajectories

2016, Vol. 2016, pp. 1-11
Author(s): Qingwu Li, Haisu Cheng, Yan Zhou, Guanying Huo

Human action recognition in videos is a topic of active research in computer vision. Dense trajectory (DT) features have been shown to be efficient for representing videos in state-of-the-art approaches. In this paper, we present a more effective approach to video representation using improved salient dense trajectories: first, we detect the motion-salient region and extract dense trajectories by tracking interest points in each spatial scale separately, and then refine the trajectories via analysis of the motion saliency. Next, we compute several descriptors (i.e., trajectory displacement, HOG, HOF, and MBH) in the spatiotemporal volume aligned with each trajectory. Finally, to represent the videos better, we optimize the bag-of-words framework according to the motion-salient intensity distribution and the idea of sparse coefficient reconstruction. Our architecture is trained and evaluated on four standard action datasets (KTH, UCF Sports, HMDB51, and UCF50), and the experimental results show that our approach performs competitively compared with state-of-the-art results.
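The final bag-of-words step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the codebook, descriptors, and the optional per-trajectory saliency weights are hypothetical inputs, and the paper's sparse-coefficient reconstruction is omitted; only hard assignment with optional saliency weighting is shown.

```python
def bow_histogram(descriptors, codebook, weights=None):
    """Quantize local descriptors against a codebook and build a bag-of-words
    histogram, optionally weighting each descriptor (e.g., by motion saliency)."""
    hist = [0.0] * len(codebook)
    if weights is None:
        weights = [1.0] * len(descriptors)
    for desc, w in zip(descriptors, weights):
        # hard-assign to the nearest codeword by squared Euclidean distance
        best = min(range(len(codebook)),
                   key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))
        hist[best] += w
    total = sum(hist)
    # L1-normalize so videos of different lengths are comparable
    return [h / total for h in hist] if total else hist

codebook = [[0.0, 0.0], [1.0, 1.0]]
descs = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]
print(bow_histogram(descs, codebook))  # two of the three descriptors fall in word 1
```

In practice the codebook would be learned by k-means over descriptors sampled from training videos, and each descriptor type (HOG, HOF, MBH) would get its own codebook.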

2014, Vol. 11 (01), pp. 1450005
Author(s): Yangyang Wang, Yibo Li, Xiaofei Ji

Vision-based human action recognition is currently one of the most active research topics in computer vision. The feature representation has a direct and crucial impact on recognition performance. Bag-of-words representations are popular in current research, but they usually discard the spatial and temporal relationships among features. To solve this issue, a novel feature representation based on normalized interest points, called the super-interest point, is proposed and used to recognize human actions. The novelty of the proposed feature is that the spatiotemporal correlation between the interest points and the human body can be added to the representation directly, without regard to the scale and location variance of the points, by introducing normalized-point clustering. The novelty concerns three tasks. First, to handle the diversity of human location and scale, interest points are normalized based on the normalization of the human region. Second, to capture the spatiotemporal correlation among the interest points, normalized points with similar spatial and temporal distances are grouped into a super-interest point using a three-dimensional clustering algorithm. Finally, a new feature representation is obtained by describing the appearance of the super-interest points and the location relationships among them. The proposed representation thus establishes the relationship between local features and the human figure. Experiments on the Weizmann, KTH, and UCF Sports datasets demonstrate that the proposed feature is effective for human action recognition.
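The first two tasks above can be sketched as follows. This is a simplified illustration under stated assumptions: the bounding box, points, and thresholds are hypothetical, and a greedy single-pass grouping stands in for whatever three-dimensional clustering algorithm the paper actually uses.

```python
def normalize_points(points, bbox):
    """Map (x, y, t) interest points into the unit square of the person
    bounding box (x0, y0, w, h), removing scale and location variance."""
    x0, y0, w, h = bbox
    return [((x - x0) / w, (y - y0) / h, t) for x, y, t in points]

def cluster_points(points, radius=0.2, t_radius=2):
    """Greedy 3-D (x, y, t) clustering: each group of nearby normalized
    points becomes one 'super-interest point' (here, its centroid)."""
    clusters = []
    for x, y, t in points:
        for c in clusters:
            cx, cy, ct = (sum(v) / len(c) for v in zip(*c))
            # join the first cluster whose centroid is close in space and time
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 and abs(t - ct) <= t_radius:
                c.append((x, y, t))
                break
        else:
            clusters.append([(x, y, t)])
    return [tuple(sum(v) / len(c) for v in zip(*c)) for c in clusters]

pts = normalize_points([(10, 20, 0), (12, 21, 1), (80, 80, 0)], (0, 0, 100, 100))
print(cluster_points(pts))  # two super-interest points: one per body region
```

Each resulting centroid would then be described by the appearance of its member points plus its location relative to the other super-interest points.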


Sensors, 2019, Vol. 19 (7), pp. 1599
Author(s): Md Uddin, Young-Koo Lee

Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to address this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach that recognizes human actions by considering both deep spatial features and handcrafted spatiotemporal features. First, we extract deep spatial features using a state-of-the-art deep convolutional network, Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, the Weber's-law-based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features; it also captures shape information via a gradient operation. Furthermore, a Weber's-law-based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center-pixel values. A multi-resolution variant of WVLGTP based on an averaging scheme is also presented. Both sets of extracted features are then concatenated and fed to a support vector machine for classification. Finally, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
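The core idea of a Weber's-law threshold in a ternary pattern can be sketched in 2-D. This is a simplified stand-in, not WVLGTP itself (which operates on gradient volumes): the function name, the 3x3 patch, and the choice of threshold `t = alpha * center` (tolerance proportional to intensity, per Weber's law) are assumptions for illustration.

```python
def weber_ternary_pattern(patch, alpha=0.1):
    """Local ternary pattern on a 3x3 patch with a Weber's-law threshold:
    the tolerance grows with the center intensity (t = alpha * center),
    which makes the code more robust to a noisy center pixel."""
    c = patch[1][1]
    t = alpha * c
    # 8 neighbors in clockwise order starting at the top-left
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    codes = [1 if n >= c + t else (-1 if n <= c - t else 0) for n in neighbors]
    # split the ternary code into upper/lower binary patterns, as is common for LTP
    upper = sum((1 if v == 1 else 0) << i for i, v in enumerate(codes))
    lower = sum((1 if v == -1 else 0) << i for i, v in enumerate(codes))
    return upper, lower

flat = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
print(weber_ternary_pattern(flat))  # (0, 0): small fluctuations fall in the dead zone
```

Histograms of such codes over a volume, pooled across the multi-resolution averaging scheme, would form the handcrafted feature that is concatenated with the deep spatial features.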


2019, Vol. 9 (10), pp. 2126
Author(s): Suge Dong, Daidi Hu, Ruijun Li, Mingtao Ge

To address the high trajectory redundancy and susceptibility to background interference of traditional dense-trajectory behavior recognition methods, a human action recognition method based on foreground trajectories and motion difference descriptors is proposed. First, the motion magnitude of each frame is estimated by optical flow, and the foreground region is determined from the motion magnitude of each pixel; trajectories are extracted only from behavior-related foreground regions. Second, to better describe the relative temporal information between different actions, a motion difference descriptor is introduced for the foreground trajectories: a direction histogram of the motion difference is constructed by calculating the direction of the motion difference per unit time at each trajectory point. Finally, a Fisher vector (FV) is used to encode the histogram features into video-level action features, and a support vector machine (SVM) classifies the action category. Experimental results show that this method better extracts action-related trajectories and improves recognition accuracy by 7% compared with the traditional dense trajectory method.
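The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the mean-magnitude threshold ratio is a hypothetical stand-in for the paper's foreground estimation, and the motion difference is taken as the difference of successive unit-time displacements along a track.

```python
import math

def foreground_mask(flow_mag, ratio=0.5):
    """Mark pixels whose optical-flow magnitude exceeds a fraction of the
    frame's mean motion magnitude; trajectories would only be sampled there."""
    flat = [m for row in flow_mag for m in row]
    thresh = ratio * (sum(flat) / len(flat))
    return [[m > thresh for m in row] for row in flow_mag]

def motion_difference_histogram(track, bins=8):
    """Orientation histogram of motion differences along one trajectory:
    the change between successive per-frame displacements, binned by angle."""
    disps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    hist = [0] * bins
    for (dx0, dy0), (dx1, dy1) in zip(disps, disps[1:]):
        ddx, ddy = dx1 - dx0, dy1 - dy0
        if ddx == 0 and ddy == 0:
            continue  # constant motion contributes no difference direction
        ang = math.atan2(ddy, ddx) % (2 * math.pi)
        hist[int(ang / (2 * math.pi) * bins) % bins] += 1
    return hist

print(foreground_mask([[0, 0], [10, 10]]))  # only the moving bottom row survives
```

The per-trajectory histograms would then be aggregated across the video with Fisher vector encoding before SVM classification.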


Author(s): C. Indhumathi, V. Murugan, G. Muthulakshmii

Nowadays, action recognition has gained increasing attention from the computer vision community. To recognize human actions, spatial and temporal features are normally extracted, and two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, an Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Inter-frame temporal average pooling is used to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than other state-of-the-art methods.
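The pooling step can be sketched on flat per-frame feature vectors. This is a simplified illustration, not the ACTF module: the mean feature is a plain temporal average, and cosine similarity between consecutive frames stands in (as an assumption) for the inter-frame regional correlation feature.

```python
def temporal_average_pool(frames):
    """Average per-frame feature vectors over time (temporal average
    pooling) to obtain a clip-level mean feature."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def frame_correlation(f0, f1):
    """Cosine similarity between consecutive frame features, a simple
    proxy for an inter-frame correlation feature."""
    dot = sum(a * b for a, b in zip(f0, f1))
    n0 = sum(a * a for a in f0) ** 0.5
    n1 = sum(b * b for b in f1) ** 0.5
    return dot / (n0 * n1) if n0 and n1 else 0.0

clip = [[1.0, 2.0], [3.0, 4.0]]
print(temporal_average_pool(clip))       # element-wise mean over frames
print(frame_correlation(*clip))          # similarity of the two frames
```

In a two-stream network these pooled features would be computed per spatial region on convolutional feature maps rather than on whole-frame vectors.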

