ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Hochul Hwang ◽  
Cheongjae Jang ◽  
Geonwoo Park ◽  
Junghyun Cho ◽  
Ig-Jae Kim
2021 ◽  
Vol 11 (24) ◽  
pp. 11938
Author(s):  
Denis Zherdev ◽  
Larisa Zherdeva ◽  
Sergey Agapov ◽  
Anton Sapozhnikov ◽  
Artem Nikonorov ◽  
...  

Estimating human poses and behaviour for different activities in virtual and augmented reality (VR/AR) could have numerous beneficial applications. Human fall monitoring is especially important for elderly people and for non-typical activities in VR/AR applications. Many approaches improve the fidelity of fall monitoring systems through novel sensors and deep learning architectures; however, detailed and diverse datasets for training deep learning fall detectors on monocular images are still lacking. Synthetic data generation based on digital human simulation was implemented and examined using the Unreal Engine. The proposed modular pipeline provides automatic “playback” of various scenarios for digital human behaviour simulation, and this paper demonstrates the resulting synthetic data of digital human interaction with 3D environments. We used the generated synthetic data to train Mask R-CNN-based segmentation of the falling person’s interaction area. It is shown that, by training the model with simulation data, it is possible to recognize a falling person with an accuracy of 97.6% and to classify the type of interaction impact. The proposed approach also covers a variety of scenarios that can benefit the deep learning training stage in other human action estimation tasks in a VR/AR environment.
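Below is a minimal sketch of the kind of training step the abstract describes: fine-tuning a Mask R-CNN to segment a falling person and the interaction area. It uses torchvision's stock Mask R-CNN rather than the authors' pipeline, and the three-class label layout is an assumption for illustration.

```python
# Minimal sketch: fine-tuning Mask R-CNN on synthetic fall-detection frames.
# The class layout and training loop are illustrative, not the paper's code.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 3  # background, falling person, interaction area (assumed labels)

def build_model(num_classes: int = NUM_CLASSES):
    # Start from COCO-pretrained weights and replace the box and mask heads.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model

def train_one_epoch(model, loader, optimizer, device):
    model.train()
    for images, targets in loader:  # targets hold boxes, labels, masks
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(images, targets)  # dict of classification/box/mask losses
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```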


Author(s):  
Gül Varol ◽  
Ivan Laptev ◽  
Cordelia Schmid ◽  
Andrew Zisserman

Abstract Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored. Our goal in this work is to answer the question of whether synthetic humans can improve the performance of human action recognition, with a particular focus on generalization to unseen viewpoints. We make use of recent advances in monocular 3D human body reconstruction from real action sequences to automatically render synthetic training videos for the action labels. We make the following contributions: (1) we investigate the extent of variations and augmentations that are beneficial to improving performance at new viewpoints. We consider changes in body shape and clothing for individuals, as well as more action-relevant augmentations such as non-uniform frame sampling and interpolating between the motion of individuals performing the same action; (2) we introduce a new data generation methodology, SURREACT, that allows training of spatio-temporal CNNs for action classification; (3) we substantially improve the state-of-the-art action recognition performance on the NTU RGB+D and UESTC standard human action multi-view benchmarks; and finally, (4) we extend the augmentation approach to in-the-wild videos from a subset of the Kinetics dataset to investigate the case when only one-shot training data is available, and demonstrate improvements in this case as well.
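The two action-relevant augmentations named in contribution (1), non-uniform frame sampling and interpolating between two performances of the same action, can be sketched roughly as below. The (T, J, 3) joint-array representation is an assumption for illustration; this is not the SURREACT code.

```python
# Sketch of two action-relevant augmentations: non-uniform frame sampling and
# blending two same-action sequences. Pose sequences are assumed (T, J, 3).
import numpy as np

def nonuniform_sample(seq: np.ndarray, out_len: int, rng=np.random) -> np.ndarray:
    """Pick out_len frames at sorted random positions instead of a fixed stride."""
    idx = np.sort(rng.choice(len(seq), size=out_len, replace=len(seq) < out_len))
    return seq[idx]

def interpolate_motions(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Blend two same-action sequences after resampling to a common length."""
    t = min(len(a), len(b))
    a_r = a[np.linspace(0, len(a) - 1, t).astype(int)]
    b_r = b[np.linspace(0, len(b) - 1, t).astype(int)]
    return (1.0 - alpha) * a_r + alpha * b_r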


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract This paper describes a human action recognition algorithm that uses background generation with shadow elimination, silhouette description based on simple geometric features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
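A toy version of the described structure, geometric silhouette features driving a finite state machine, might look as follows. The states, thresholds, and the single fall-like action are illustrative assumptions, not the paper's parameters.

```python
# Toy finite state machine in the spirit of the described algorithm: simple
# geometric silhouette features drive state transitions. States and thresholds
# are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SilhouetteFeatures:
    aspect_ratio: float   # bounding-box height / width
    centroid_dy: float    # vertical centroid velocity (pixels/frame, +down)

class FallFSM:
    def __init__(self):
        self.state = "standing"

    def step(self, f: SilhouetteFeatures) -> str:
        if self.state == "standing" and f.centroid_dy > 5.0:
            self.state = "descending"
        elif self.state == "descending":
            # A wide, low silhouette that has stopped moving is read as a fall.
            if f.aspect_ratio < 0.8 and abs(f.centroid_dy) < 1.0:
                self.state = "fallen"
            elif f.centroid_dy < -2.0:
                self.state = "standing"   # person recovered
        return self.state
```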


2018 ◽  
Vol 6 (10) ◽  
pp. 323-328
Author(s):  
K. Kiruba ◽  
D. Shiloah Elizabeth ◽  
C. Sunil Retmin Raj

ROBOT ◽  
2012 ◽  
Vol 34 (6) ◽  
pp. 745 ◽  
Author(s):  
Bin WANG ◽  
Yuanyuan WANG ◽  
Wenhua XIAO ◽  
Wei WANG ◽  
Maojun ZHANG

2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data must extract not only spatial but also temporal features, and human action recognition (HAR) is a representative task that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, limitations on real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as time information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. Evaluating action recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than a HAR model using long short-term memory with a small number of video frames, and confirmed real-time operability through fast data processing. In addition, the performance of the proposed weighted-mean-based HAR model was verified by testing it on a Jetson Nano to confirm its usability in low-cost GPU-based embedded systems.
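The weighted-mean idea in the abstract, averaging per-frame spatial features with weights derived from the inter-frame change rate, can be sketched as below. Tensor shapes and the change measure are assumptions; this is not the authors' implementation.

```python
# Sketch of the weighted-mean idea: per-frame CNN features are averaged with
# weights from the inter-frame change rate, yielding one spatiotemporal vector
# for an MLP classifier. Shapes and the change measure are assumptions.
import torch

def frame_change_weights(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W). Weight each frame by mean absolute difference
    from its predecessor, normalized to sum to 1."""
    diffs = (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))
    w = torch.cat([diffs[:1], diffs])        # reuse first diff for frame 0
    return w / w.sum()

def spatiotemporal_feature(feature_maps: torch.Tensor, w: torch.Tensor):
    """feature_maps: (T, D) pooled spatial features; collapse time using the
    change-rate weights to get a single (D,) spatiotemporal vector."""
    return (w.unsqueeze(1) * feature_maps).sum(dim=0)
```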

