Human Action Recognition Based on a Spatio-Temporal Video Autoencoder

Author(s): Anderson Carlos Sousa e Santos, Helio Pedrini

Due to rapid advances in the development of surveillance cameras with high sampling rates, low cost, small size and high resolution, video-based action recognition systems have become more commonly used in various computer vision applications. Such systems can support human operators in detecting events of interest in video sequences, improving recognition results and reducing failure cases. In this work, we propose and evaluate a method to learn two-dimensional (2D) representations from video sequences based on an autoencoder framework. Spatial and temporal information is explored through a multi-stream convolutional neural network in the context of human action recognition. Experimental results on the challenging UCF-101 and HMDB-51 datasets demonstrate that our representation is capable of achieving competitive accuracy rates when compared to other approaches available in the literature.
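As a rough illustration of the core idea, the sketch below compresses a short clip into a single 2D map with a 3D-convolutional encoder and reconstructs the clip with a mirrored decoder; the bottleneck image could then feed a standard 2D CNN stream. The layer sizes, the 16-frame clip length, and the specific use of 3D convolutions are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class VideoToImageAutoencoder(nn.Module):
    """Sketch: compress a clip of T frames into one 2D map (the learned
    representation) and reconstruct the clip from it. Illustrative only;
    the paper's exact architecture may differ."""

    def __init__(self, in_channels=3, clip_len=16):
        super().__init__()
        # Encoder: 3D convolutions progressively collapse the temporal axis.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, stride=(2, 1, 1), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 1, 1), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 1, kernel_size=(clip_len // 4, 3, 3),
                      padding=(0, 1, 1)),                 # -> (B, 1, 1, H, W)
        )
        # Decoder: mirror the encoder to reconstruct the input clip.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(1, 64, kernel_size=(clip_len // 4, 3, 3),
                               padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, kernel_size=3, stride=(2, 1, 1),
                               padding=1, output_padding=(1, 0, 0)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, in_channels, kernel_size=3, stride=(2, 1, 1),
                               padding=1, output_padding=(1, 0, 0)),
        )

    def forward(self, clip):                              # clip: (B, C, T, H, W)
        code = self.encoder(clip)                         # (B, 1, 1, H, W)
        image = code.squeeze(2)                           # 2D representation
        recon = self.decoder(code)                        # (B, C, T, H, W)
        return image, recon

# Training on reconstruction loss yields the 2D representation as a byproduct.
model = VideoToImageAutoencoder()
clip = torch.randn(2, 3, 16, 112, 112)
image, recon = model(clip)
loss = nn.functional.mse_loss(recon, clip)
```

The attraction of such a bottleneck is that a whole clip is summarized as an image, so any off-the-shelf 2D CNN can consume it as one stream alongside, say, RGB and optical-flow streams.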

2021, Vol. 11(11), pp. 4940
Author(s): Jinsoo Kim, Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, limitations to real-time operation still persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change rate, transformed into spatiotemporal features, and input into a multilayer perceptron, which has relatively lower complexity than other HAR models; thus, the method is well suited to a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging action recognition benchmark UCF-101 showed higher accuracy than an HAR model using long short-term memory when only a small number of video frames is available, and the fast processing speed confirmed the possibility of real-time operation. In addition, the performance of the proposed weighted-mean-based HAR model was verified on an NVIDIA Jetson Nano, confirming its suitability for low-cost GPU-based embedded systems.
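A minimal sketch of this pipeline is given below, assuming a MobileNetV2 backbone and taking the mean absolute difference between consecutive frames as the frame change rate; the backbone choice, the MLP sizes, and the names `WeightedMeanHAR` and `frame_change_weights` are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class WeightedMeanHAR(nn.Module):
    """Single-stream sketch: per-frame CNN features are averaged over time
    with weights given by the frame change rate, then classified by an MLP."""

    def __init__(self, num_classes=101):
        super().__init__()
        # Lightweight backbone; requires torchvision >= 0.13 for `weights=`.
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features              # (B*T, 1280, h, w)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(1280, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    @staticmethod
    def frame_change_weights(clip):                    # clip: (B, T, C, H, W)
        # Mean absolute difference between consecutive frames as the
        # "change rate" (an assumption; the paper may define it differently).
        diff = (clip[:, 1:] - clip[:, :-1]).abs().mean(dim=(2, 3, 4))
        change = torch.cat([diff[:, :1], diff], dim=1)  # pad the first frame
        return change / change.sum(dim=1, keepdim=True).clamp_min(1e-8)

    def forward(self, clip):                           # clip: (B, T, C, H, W)
        b, t, c, h, w = clip.shape
        w_t = self.frame_change_weights(clip)          # (B, T)
        feats = self.pool(self.features(clip.view(b * t, c, h, w)))
        feats = feats.view(b, t, -1)                   # (B, T, 1280)
        fused = (w_t.unsqueeze(-1) * feats).sum(dim=1) # weighted mean over time
        return self.mlp(fused)

logits = WeightedMeanHAR()(torch.randn(2, 8, 3, 224, 224))
```

The design keeps the temporal modelling nearly free: a single weighted average replaces a recurrent or 3D-convolutional module, which is what makes real-time operation on a small GPU plausible.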


2020, Vol. 79(17-18), pp. 12349-12371
Author(s): Qingshan She, Gaoyuan Mu, Haitao Gan, Yingle Fan

2020, Vol. 10(12), pp. 4412
Author(s): Ammar Mohsin Butt, Muhammad Haroon Yousaf, Fiza Murtaza, Saima Nazir, Serestina Viriri, ...

Human action recognition has gathered significant attention in recent years due to its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding the features extracted from CNNs (convolutional neural networks). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which intends to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, providing a compact representation along with high-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves 72.6% and 96.2% accuracy on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.
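The sketch below illustrates the two main ingredients in simplified form: a codebook built by agglomerative clustering of CNN descriptors, and a plain VLAD encoding with residual aggregation. The paper's discriminative global/class-specific codebook combination, the R-VLAD variant, and the fusion with locality-based coding are not reproduced; `build_codebook` and `vlad_encode` are hypothetical helper names.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def build_codebook(descriptors, k=64):
    """Cluster CNN descriptors into k visual words with agglomerative
    clustering; each codeword is its cluster's mean. A simplified stand-in
    for the paper's codebook construction."""
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(descriptors)
    return np.stack([descriptors[labels == i].mean(axis=0) for i in range(k)])

def vlad_encode(descriptors, codebook):
    """Plain VLAD: sum of residuals to the nearest codeword, followed by
    power- and L2-normalization."""
    # Hard-assign each descriptor to its nearest codeword.
    dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    assign = dists.argmin(axis=1)
    k, d = codebook.shape
    enc = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            enc[i] = (members - codebook[i]).sum(axis=0)  # residual aggregation
    enc = enc.ravel()
    enc = np.sign(enc) * np.sqrt(np.abs(enc))             # power normalization
    return enc / (np.linalg.norm(enc) + 1e-12)            # L2 normalization

# Example: a pool of 500 synthetic 128-D descriptors; one video's 40 frames
# are then encoded against the learned codebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
codebook = build_codebook(X, k=16)
video_vector = vlad_encode(X[:40], codebook)
```

Aggregating residuals rather than raw counts is what gives VLAD-style encodings their higher-order statistics at a fixed k*d output size, which is the compactness the abstract refers to.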

