An Efficient Human Instance-Guided Framework for Video Action Recognition

In recent years, human action recognition has been studied by many computer vision researchers. Recent studies have attempted to use two-stream networks using appearance and motion features, but most of these approaches focused on clip-level video action recognition. In contrast to traditional methods which generally used entire images, we propose a new human instance-level video action recognition framework. In this framework, we represent the instance-level features using human boxes and keypoints, and our action region features are used as the inputs of the temporal action head network, which makes our framework more discriminative. We also propose novel temporal action head networks consisting of various modules, which reflect various temporal dynamics well. In the experiment, the proposed models achieve comparable performance with the state-of-the-art approaches on two challenging datasets. Furthermore, we evaluate the proposed features and networks to verify the effectiveness of them. Finally, we analyze the confusion matrix and visualize the recognized actions at human instance level when there are several people.

Download Full-text

SVM-Based Human Action Recognition and Its Remarkable Motion Features Discovery Algorithm

Springer Tracts in Advanced Robotics - Experimental Robotics IX ◽

10.1007/11552246_2 ◽

2006 ◽

pp. 15-25 ◽

Cited By ~ 2

Author(s):

Taketoshi Mori ◽

Masamichi Shimosaka ◽

Tomomasa Sato

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Features

Download Full-text

Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition

Sensors ◽

10.3390/s19071599 ◽

2019 ◽

Vol 19 (7) ◽

pp. 1599 ◽

Cited By ~ 6

Author(s):

Md Uddin ◽

Young-Koo Lee

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Support Vector ◽

Feature Descriptor ◽

Weber’S Law ◽

Weber's Law ◽

Spatiotemporal Features ◽

Spatial Features

Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem, however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating the spatial-temporal information plays a vital role to describe a video. In this research, we propose a novel approach to recognize human actions by considering both deep spatial features and handcrafted spatiotemporal features. Firstly, we extract the deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-Resnet-v2. Secondly, we introduce a novel handcrafted feature descriptor, namely Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features. It also considers the shape information by using gradient operation. Furthermore, Weber’s law based threshold value and the ternary pattern based on an adaptive local threshold is presented to effectively handle the noisy center pixel value. Besides, a multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both these extracted features are concatenated and feed to the Support Vector Machine to perform the classification. Lastly, the extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.

Download Full-text

Human Action Recognition in Videos Using Hybrid Motion Features

Lecture Notes in Computer Science - Advances in Multimedia Modeling ◽

10.1007/978-3-642-11301-7_42 ◽

2010 ◽

pp. 411-421 ◽

Cited By ~ 4

Author(s):

Si Liu ◽

Jing Liu ◽

Tianzhu Zhang ◽

Hanqing Lu

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Features ◽

Hybrid Motion

Download Full-text

View-Invariant Deep Architecture for Human Action Recognition Using Two-Stream Motion and Shape Temporal Dynamics

IEEE Transactions on Image Processing ◽

10.1109/tip.2020.2965299 ◽

2020 ◽

Vol 29 ◽

pp. 3835-3844 ◽

Cited By ~ 2

Author(s):

Chhavi Dhiman ◽

Dinesh Kumar Vishwakarma

Keyword(s):

Action Recognition ◽

Temporal Dynamics ◽

Human Action Recognition ◽

Human Action ◽

Deep Architecture

Download Full-text

Mixed Features Based Improved Human Action Recognition Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.2731 ◽

2014 ◽

Vol 989-994 ◽

pp. 2731-2734

Author(s):

Hai Long Jia ◽

Kun Cao

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Video Camera ◽

Human Action ◽

Recognition Algorithm ◽

Advantages And Disadvantages ◽

Single Feature ◽

Mixed Features ◽

Motion Features ◽

Full Consideration

The choice of the motion features affects the result of the human action recognition method directly. Many factors often influence the single feature differently, such as appearance of human body, environment and video camera. So the accuracy of action recognition is limited. On the basis of studying the representation and recognition of human actions, and giving full consideration to the advantages and disadvantages of different features, this paper proposes a mixed feature which combines global silhouette feature and local optical flow feature. This combined representation is used for human action recognition.

Download Full-text

Human Action Recognition Using Salient Opponent-Based Motion Features

2010 Canadian Conference on Computer and Robot Vision ◽

10.1109/crv.2010.54 ◽

2010 ◽

Cited By ~ 7

Author(s):

Amir-Hossein Shabani ◽

John S. Zelek ◽

David A. Clausi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Features

Download Full-text

Human Action Recognition Using Improved Salient Dense Trajectories

Computational Intelligence and Neuroscience ◽

10.1155/2016/6750459 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 3

Author(s):

Qingwu Li ◽

Haisu Cheng ◽

Yan Zhou ◽

Guanying Huo

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Interest Points ◽

Dense Trajectories ◽

Dense Trajectory ◽

Sparse Coefficient ◽

Active Research ◽

Motion Saliency

Human action recognition in videos is a topic of active research in computer vision. Dense trajectory (DT) features were shown to be efficient for representing videos in state-of-the-art approaches. In this paper, we present a more effective approach of video representation using improved salient dense trajectories: first, detecting the motion salient region and extracting the dense trajectories by tracking interest points in each spatial scale separately and then refining the dense trajectories via the analysis of the motion saliency. Then, we compute several descriptors (i.e., trajectory displacement, HOG, HOF, and MBH) in the spatiotemporal volume aligned with the trajectories. Finally, in order to represent the videos better, we optimize the framework of bag-of-words according to the motion salient intensity distribution and the idea of sparse coefficient reconstruction. Our architecture is trained and evaluated on the four standard video actions datasets of KTH, UCF sports, HMDB51, and UCF50, and the experimental results show that our approach performs competitively comparing with the state-of-the-art results.

Download Full-text

A Bayesian Dynamical Approach for Human Action Recognition

Sensors ◽

10.3390/s21165613 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5613

Author(s):

Amirreza Farnoosh ◽

Zhouping Wang ◽

Shaotong Zhu ◽

Sarah Ostadabbas

Keyword(s):

Action Recognition ◽

Large Scale ◽

Temporal Dynamics ◽

Human Action Recognition ◽

Human Action ◽

Superior Performance ◽

Action Classification ◽

Motion Data ◽

Highly Correlated ◽

Low Dimensional

We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with regard to a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations about motion primitives that gave rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly model temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with the state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework), and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.

Download Full-text

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

International Journal of Image and Graphics ◽

10.1142/s0219467822500516 ◽

2021 ◽

Author(s):

C. Indhumathi ◽

V. Murugan ◽

G. Muthulakshmii

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Regional Correlation ◽

Temporal Features ◽

Adaptive Motion ◽

Spatio Temporal ◽

Inter Frame ◽

Temporal Feature

Nowadays, action recognition has gained more attention from the computer vision community. Normally for recognizing human actions, spatial and temporal features are extracted. Two-stream convolutional neural network is used commonly for human action recognition in videos. In this paper, Adaptive motion Attentive Correlated Temporal Feature (ACTF) is used for temporal feature extractor. The temporal average pooling in inter-frame is used for extracting the inter-frame regional correlation feature and mean feature. This proposed method has better accuracy of 96.9% for UCF101 and 74.6% for HMDB51 datasets, respectively, which are higher than the other state-of-the-art methods.

Download Full-text

A Review on Computer Vision-Based Methods for Human Action Recognition

Journal of Imaging ◽

10.3390/jimaging6060046 ◽

2020 ◽

Vol 6 (6) ◽

pp. 46

Author(s):

Mahmoud Al-Faris ◽

John Chiverton ◽

David Ndzi ◽

Ahmed Isam Ahmed

Keyword(s):

Health Care ◽

Computer Vision ◽

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Generative Models ◽

Future Research ◽

Learning Technology ◽

Recognition Systems

Human action recognition targets recognising different actions from a sequence of observations and different environmental conditions. A wide different applications is applicable to vision based action recognition research. This can include video surveillance, tracking, health care, and human–computer interaction. However, accurate and effective vision based recognition systems continue to be a big challenging area of research in the field of computer vision. This review introduces the most recent human action recognition systems and provides the advances of state-of-the-art methods. To this end, the direction of this research is sorted out from hand-crafted representation based methods including holistic and local representation methods with various sources of data, to a deep learning technology including discriminative and generative models and multi-modality based methods. Next, the most common datasets of human action recognition are presented. This review introduces several analyses, comparisons and recommendations that help to find out the direction of future research.

Download Full-text