Research on Human Action Recognition in Dance Video Images

2021 ◽  
Vol 1852 (2) ◽  
pp. 022062
Author(s):  
Hua He
Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3680 ◽  
Author(s):  
Haoran Wei ◽  
Roozbeh Jafari ◽  
Nasser Kehtarnavaz

This paper presents the simultaneous use of video images and inertial signals, captured at the same time by a video camera and a wearable inertial sensor, within a fusion framework in order to achieve human action recognition that is more robust than when each sensing modality is used individually. The data captured by these sensors are turned into 3D video images and 2D inertial images, which are then fed into a 3D convolutional neural network and a 2D convolutional neural network, respectively, for action recognition. Two types of fusion are considered: decision-level fusion and feature-level fusion. Experiments are conducted on the publicly available UTD-MHAD dataset, in which simultaneous video images and inertial signals are captured for a total of 27 actions. The results indicate that both the decision-level and feature-level fusion approaches achieve higher recognition accuracies than either sensing modality used individually. The highest accuracy, 95.6%, is obtained with the decision-level fusion approach.
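The decision-level fusion described above amounts to combining the per-class scores produced by the two modality networks and taking the best fused class. A minimal sketch of that idea, assuming softmax score vectors from the two networks (the function name and weighting scheme are illustrative, not the paper's exact formulation):

```python
import numpy as np

def decision_level_fusion(video_scores, inertial_scores, w_video=0.5):
    """Fuse per-class softmax scores from two modality networks.

    video_scores, inertial_scores: arrays of shape (num_classes,).
    Returns the index of the predicted action class.
    """
    fused = w_video * video_scores + (1.0 - w_video) * inertial_scores
    return int(np.argmax(fused))

# Toy example with 3 action classes
video = np.array([0.2, 0.7, 0.1])     # scores from the 3D CNN on video images
inertial = np.array([0.1, 0.3, 0.6])  # scores from the 2D CNN on inertial images
print(decision_level_fusion(video, inertial))  # → 1
```

Feature-level fusion would instead concatenate the intermediate feature vectors of the two networks before a shared classifier; the equal weighting here is only a default.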


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yun Liu ◽  
Ruidi Ma ◽  
Hui Li ◽  
Chuanxu Wang ◽  
Ye Tao

Action recognition is an important research direction in computer vision. Its performance on video images is easily affected by factors such as background and lighting, whereas depth video images can better reduce such interference and improve recognition accuracy. This paper therefore makes full use of video and depth skeleton data and proposes a two-stream network for RGB-D action recognition (SV-GCN), an architecture whose two streams operate on two different kinds of data. The skeleton stream (S-Stream) is a proposed Nonlocal-STGCN, which adds non-local operations to capture dependency relationships between a wider range of joints and thus provides richer skeleton-point features to the model. The video stream (V-Stream) is a proposed Dilated-SlowFastNet, which replaces the traditional random sampling layer with dilated convolutional layers to make better use of the depth features. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms ST-GCN and SlowFastNet in both the CS and CV settings.
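The non-local operation added to the skeleton stream computes pairwise dependencies between all joints, so each joint's feature is updated using information from every other joint rather than only graph neighbors. A simplified embedded-Gaussian-style sketch (function names, the identity embeddings, and the residual form are illustrative assumptions, not the paper's exact block):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_block(joints):
    """Simplified non-local operation over skeleton joints.

    joints: array of shape (num_joints, feat_dim).
    Each output feature is a dependency-weighted sum over all joints,
    added back to the input as a residual connection.
    """
    affinity = softmax(joints @ joints.T)  # (J, J) pairwise dependencies
    return joints + affinity @ joints      # residual aggregation

x = np.random.rand(25, 8)  # 25 NTU-RGB+D joints, 8-dim toy features
y = nonlocal_block(x)
print(y.shape)  # (25, 8)
```

In the full network, learned 1×1-convolution embeddings would replace the raw dot products, and the block would run over spatiotemporal joint features rather than a single frame.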


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
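A finite state machine recognizes an action by stepping through states as per-frame silhouette descriptions arrive, accepting when the terminal state is reached. A minimal sketch of the mechanism, with entirely hypothetical state labels and transitions (the paper's actual states are derived from its geometrical silhouette features):

```python
# Hypothetical FSM for recognizing a "sit down" action from
# per-frame silhouette labels produced by an upstream classifier.
TRANSITIONS = {
    ("standing", "bending"): "bending",
    ("bending", "sitting"): "sitting",  # accepting state
}

def recognize(frame_labels, start="standing", accept="sitting"):
    """Run the FSM over a sequence of per-frame labels."""
    state = start
    for label in frame_labels:
        # Stay in the current state when no transition matches.
        state = TRANSITIONS.get((state, label), state)
    return state == accept

print(recognize(["standing", "bending", "sitting"]))  # → True
print(recognize(["standing", "standing"]))            # → False
```

Because each frame triggers at most one table lookup, this kind of recognizer is cheap enough for the real-time processing rates the abstract reports.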


2018 ◽  
Vol 6 (10) ◽  
pp. 323-328
Author(s):  
K. Kiruba ◽  
D. Shiloah Elizabeth ◽  
C. Sunil Retmin Raj

ROBOT ◽  
2012 ◽  
Vol 34 (6) ◽  
pp. 745 ◽  
Author(s):  
Bin WANG ◽  
Yuanyuan WANG ◽  
Wenhua XIAO ◽  
Wei WANG ◽  
Maojun ZHANG

2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, some limitations to real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that make up the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and fed into multilayer perceptrons, which have relatively lower complexity than other HAR models; the method is therefore well suited to a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than a HAR model using long short-term memory with a small number of video frames, and the fast data processing speed confirmed the possibility of real-time operation. In addition, the performance of the proposed weighted-mean-based HAR model was verified by testing it on a Jetson Nano to confirm its suitability for low-cost GPU-based embedded systems.
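The core idea above is to collapse per-frame CNN features into a single spatiotemporal vector by weighting each frame with how much the image changed. A minimal sketch under stated assumptions (grayscale frames, mean absolute pixel difference as the change rate, and a padded weight for the first frame; all function names are hypothetical):

```python
import numpy as np

def frame_change_rate(frames):
    """Mean absolute pixel difference between consecutive frames."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    return diffs.mean(axis=(1, 2))  # shape (T-1,)

def spatiotemporal_feature(feature_maps, frames):
    """Weighted-average per-frame CNN features by frame change rate.

    feature_maps: (T, D) spatial features from the CNN.
    frames: (T, H, W) grayscale frames of the same clip.
    Returns a (D,) spatiotemporal feature vector for the MLP classifier.
    """
    w = frame_change_rate(frames)
    w = np.concatenate([[w.mean()], w])        # assign the first frame a weight
    w = w / (w.sum() + 1e-8)                   # normalize to a weighted mean
    return (w[:, None] * feature_maps).sum(axis=0)

frames = np.random.randint(0, 255, (16, 32, 32))  # 16-frame toy clip
feats = np.random.rand(16, 128)                   # per-frame CNN features
print(spatiotemporal_feature(feats, frames).shape)  # (128,)
```

Because the temporal step is just a weighted mean rather than a recurrent or 3D-convolutional layer, the per-clip cost stays low, which is what makes the single-stream model plausible on an embedded board such as a Jetson Nano.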

