Pose primitive based human action recognition in videos or still images

Numerous human actions such as “Phoning,” “PlayingGuitar,” and “RidingHorse” can be inferred by static cue-based approaches even if their motions in video are available considering one single still image may already sufficiently explain a particular action. In this research, we investigate human action recognition in still images and utilize deep ensemble learning to automatically decompose the body pose and perceive its background information. Firstly, we construct an end-to-end NCNN-based model by attaching the nonsequential convolutional neural network (NCNN) module to the top of the pretrained model. The nonsequential network topology of NCNN can separately learn the spatial- and channel-wise features with parallel branches, which helps improve the model performance. Subsequently, in order to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on the weight optimization (DELWO) model. It contributes to fusing the deep information derived from multiple models automatically from the data. Finally, we design the deep ensemble learning based on voting strategy (DELVS) model to pool together multiple deep models with weighted coefficients to obtain a better prediction. More importantly, the model complexity can be reduced by lessening the number of trainable parameters, thereby effectively mitigating overfitting issues of the model in small datasets to some extent. We conduct experiments in Li’s action dataset, uncropped and 1.5x cropped Willow action datasets, and the results have validated the effectiveness and robustness of our proposed models in terms of mitigating overfitting issues in small datasets. Finally, we open source our code for the model in GitHub (https://github.com/yxchspring/deep_ensemble_learning) in order to share our model with the community.

Download Full-text

Towards optimal VLAD for human action recognition from still images

Image and Vision Computing ◽

10.1016/j.imavis.2016.03.002 ◽

2016 ◽

Vol 55 ◽

pp. 53-63 ◽

Cited By ~ 2

Author(s):

Lei Zhang ◽

Changxi Li ◽

Peipei Peng ◽

Xuezhi Xiang ◽

Jingkuan Song

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Still Images

Download Full-text

Transfer learning with fine tuning for human action recognition from still images

Multimedia Tools and Applications ◽

10.1007/s11042-021-10753-y ◽

2021 ◽

Author(s):

Saikat Chakraborty ◽

Riktim Mondal ◽

Pawan Kumar Singh ◽

Ram Sarkar ◽

Debotosh Bhattacharjee

Keyword(s):

Transfer Learning ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Fine Tuning ◽

Still Images

Download Full-text

Human action recognition in still images using action poselets and a two-layer classification model

Journal of Visual Languages & Computing ◽

10.1016/j.jvlc.2015.01.003 ◽

2015 ◽

Vol 28 ◽

pp. 163-175 ◽

Cited By ~ 5

Author(s):

ByoungChul Ko ◽

JuneHyeok Hong ◽

Jae-Yeal Nam

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Classification Model ◽

Still Images

Download Full-text

Detecting Human Actions by 3D Deformable Parts Models

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.981.331 ◽

2014 ◽

Vol 981 ◽

pp. 331-334

Author(s):

Ming Yang ◽

Yong Yang

Keyword(s):

Object Detection ◽

Action Recognition ◽

High Performance ◽

Human Action Recognition ◽

Human Action ◽

Human Actions ◽

Still Images ◽

Histograms Of Oriented Gradients ◽

Deformable Part Models ◽

Unified Method

In this paper, we introduce the high performance Deformable part models from object detection into human action recognition and localization and propose a unified method to detect action in video sequences. The Deformable part models have attracted intensive attention in the field of object detection. We generalize the approach from 2D still images to 3D spatiotemporal volumes. The human actions are described by 3D histograms of oriented gradients based features. Different poses are presented by mixture of models on different resolutions. The model autonomously selects the most discriminative 3D parts and learns their anchor positions related to the root. Empirical results on several video datasets prove the efficacy of our proposed method on both action recognition and localization.

Download Full-text

Towards optimal vlad for human action recognition from still images

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7471995 ◽

2016 ◽

Author(s):

Lei Zhang ◽

Xiantong Zhen ◽

Jiqing Han

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Still Images

Download Full-text

Human Action Recognition in Still Images

10.33915/etd.670 ◽

2013 ◽

Author(s):

Biyun Lai

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Still Images

Download Full-text

Human action recognition using simple geometric features and a finite state machine

Image Processing & Communications ◽

10.2478/v10248-012-0079-y ◽

2013 ◽

Vol 18 (2-3) ◽

pp. 49-60 ◽

Cited By ~ 2

Author(s):

Damian Dudzńiski ◽

Tomasz Kryjak ◽

Zbigniew Mikrut

Keyword(s):

Action Recognition ◽

Finite State Machine ◽

Recognition Rate ◽

Human Action Recognition ◽

Human Action ◽

Video Stream ◽

State Machine ◽

Recognition Algorithm ◽

Finite State ◽

Correct Recognition Rate

Abstract In this paper a human action recognition algorithm, which uses background generation with shadow elimination, silhouette description based on simple geometrical features and a finite state machine for recognizing particular actions is described. The performed tests indicate that this approach obtains a 81 % correct recognition rate allowing real-time image processing of a 360 X 288 video stream.

Download Full-text