Two-stream spatiotemporal feature fusion for human action recognition

Author(s):  
Amany Abdelbaky ◽  
Saleh Aly
2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Huosheng Hu ◽  
Wenjian Wang ◽  
Wei Li ◽  
Hua Peng ◽  
...  

The representation and selection of action features directly affect the recognition performance of human action recognition methods. A single feature is often affected by human appearance, environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contributions of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features, each drawing on a different modality, are proposed: the RGB-HOG feature based on RGB image information, which has good geometric scale invariance; the D-STIP feature based on depth images, which preserves the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature based on skeleton information, which describes the spatial structure of motion well. In addition, multiple K-nearest-neighbor classifiers, which generalize well, are combined for decision-level classification. The experimental results show that the algorithm achieves strong recognition results on the public G3D and CAD60 datasets.
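The decision-level fusion described above can be sketched as one K-nearest-neighbor classifier per modality (RGB-HOG, D-STIP, S-JRPF) whose labels are then combined. The abstract does not specify the combination rule, so a simple majority vote is assumed here; the function names are illustrative, not from the paper.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    # Plain Euclidean K-nearest-neighbour vote for one query vector.
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

def fuse_decisions(per_modality_preds):
    # Majority vote across the per-modality K-NN decisions
    # (assumed rule; the paper only says the classifiers are integrated).
    vals, counts = np.unique(np.asarray(per_modality_preds), return_counts=True)
    return vals[np.argmax(counts)]
```

In use, `knn_predict` would be called once per feature modality with that modality's training set, and the three resulting labels passed to `fuse_decisions`.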


2019 ◽  
Vol 5 (10) ◽  
pp. 82 ◽  
Author(s):  
Mahmoud Al-Faris ◽  
John Chiverton ◽  
Yanyan Yang ◽  
David Ndzi

Human action recognition (HAR) is an important yet challenging task. This paper presents a novel method in which fuzzy weight functions are used in the computation of depth motion maps (DMMs), together with motion information over multiple temporal lengths. These features are referred to as fuzzy weighted multi-resolution DMMs (FWMDMMs). This formulation allows various aspects of individual actions to be emphasized and helps characterise the importance of the temporal dimension, which is needed to overcome, e.g., variations in the time over which a single type of action might be performed. A deep convolutional neural network (CNN) motion model is created and trained to extract discriminative and compact features. Transfer learning is also used to extract spatial information from RGB and depth data using the AlexNet network. Different late fusion techniques are then investigated to fuse the deep motion model with the spatial network, yielding a spatio-temporal HAR model. The developed approach is capable of recognising both human action and human–object interaction. Three public domain datasets are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.
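A depth motion map is conventionally an accumulation of absolute inter-frame depth differences; the fuzzy weighting described above scales each difference by a temporal membership function before accumulation. The sketch below assumes a Gaussian-shaped membership over normalised temporal position as one illustrative choice of fuzzy weight; the exact functions used in the paper are not given in the abstract.

```python
import numpy as np

def fuzzy_weighted_dmm(depth_frames, center=0.5, width=0.3):
    # depth_frames: (T, H, W) stack of depth maps.
    # A DMM accumulates absolute inter-frame differences; here each
    # difference is scaled by a Gaussian-shaped fuzzy membership over
    # its normalised temporal position (an assumed weighting scheme).
    T = depth_frames.shape[0]
    t = np.linspace(0.0, 1.0, T - 1)                  # position of each diff
    w = np.exp(-((t - center) ** 2) / (2 * width ** 2))
    diffs = np.abs(np.diff(depth_frames, axis=0))     # (T-1, H, W)
    return np.tensordot(w, diffs, axes=1)             # (H, W) motion map
```

Varying `center` and `width` emphasizes different temporal portions of an action, and computing maps at several clip lengths gives the multi-resolution aspect.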


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 43243-43255 ◽  
Author(s):  
Jiahui Yu ◽  
Hongwei Gao ◽  
Wei Yang ◽  
Yueqiu Jiang ◽  
Weihong Chin ◽  
...  

Author(s):  
Yaqing Hou ◽  
Hua Yu ◽  
Dongsheng Zhou ◽  
Pengfei Wang ◽  
Hongwei Ge ◽  
...  

Abstract: In the study of human action recognition, two-stream networks have made excellent progress recently. However, challenges remain in distinguishing similar human actions in videos. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. To elaborate, taking two-stream networks as the essential backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. Then, feature fusion is performed between the local and global features to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn pixel-wise correspondences. After that, the three outputs are brought together to generate global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted with traditional hand-engineered IDT algorithms, classical machine learning methods (i.e., SVM), and state-of-the-art deep learning methods (i.e., spatio-temporal multiplier networks). According to the results, the approach obtains the best performance among the compared works, with accuracies of 95.3% and 72.9% on UCF101 and HMDB51, respectively. The experimental results thus demonstrate the superiority of the proposed architecture for human action recognition.
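Compact bilinear pooling, used above to fuse the two streams, approximates the (very large) outer product of two feature vectors in a low-dimensional space. The standard construction is the Tensor Sketch: each vector is count-sketched with random hash indices and signs, and the two sketches are circularly convolved via the FFT. The sketch below shows that construction in NumPy; the projection dimension and RNG handling are illustrative.

```python
import numpy as np

def count_sketch(x, h, s, d):
    # Scatter x into d bins at hash positions h with random signs s.
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

def compact_bilinear(x1, x2, d, rng):
    # Tensor Sketch approximation of the flattened outer product x1 x2^T:
    # circular convolution of the two count sketches, done in Fourier space.
    h1 = rng.integers(0, d, x1.size)
    s1 = rng.choice([-1.0, 1.0], x1.size)
    h2 = rng.integers(0, d, x2.size)
    s2 = rng.choice([-1.0, 1.0], x2.size)
    y1 = count_sketch(x1, h1, s1, d)
    y2 = count_sketch(x2, h2, s2, d)
    return np.real(np.fft.ifft(np.fft.fft(y1) * np.fft.fft(y2)))
```

In a real network the hash indices and signs are sampled once and fixed, so the pooling is a deterministic, differentiable layer; here they are drawn inside the function only to keep the sketch self-contained.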

