scholarly journals Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data

2014 ◽  
Author(s):  
Karla Brkić ◽  
Srđan Rašić ◽  
Axel Pinz ◽  
Siniša Šegvić ◽  
Zoran Kalafatić
2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

The field of research related to video data has difficulty in extracting not only spatial but also temporal features and human action recognition (HAR) is a representative field of research that applies convolutional neural network (CNN) to video data. The performance for action recognition has improved, but owing to the complexity of the model, some still limitations to operation in real-time persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real-time is proposed. The proposed model extracts spatial feature maps by applying CNN to the images that develop the video and uses the frame change rate of sequential images as time information. Spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have a relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. The results of evaluating action recognition accuracy and data processing speed through challenging action recognition benchmark UCF-101 showed higher action recognition accuracy than the HAR model using long short-term memory with a small amount of video frames and confirmed the real-time operational possibility through fast data processing speed. In addition, the performance of the proposed weighted mean-based HAR model was verified by testing it in Jetson NANO to confirm the possibility of using it in low-cost GPU-based embedded systems.


2020 ◽  
Vol 79 (17-18) ◽  
pp. 12349-12371
Author(s):  
Qingshan She ◽  
Gaoyuan Mu ◽  
Haitao Gan ◽  
Yingle Fan

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Shaoping Zhu ◽  
Limin Xia

A novel method based on hybrid feature is proposed for human action recognition in video image sequences, which includes two stages of feature extraction and action recognition. Firstly, we use adaptive background subtraction algorithm to extract global silhouette feature and optical flow model to extract local optical flow feature. Then we combine global silhouette feature vector and local optical flow feature vector to form a hybrid feature vector. Secondly, in order to improve the recognition accuracy, we use an optimized Multiple Instance Learning algorithm to recognize human actions, in which an Iterative Querying Heuristic (IQH) optimization algorithm is used to train the Multiple Instance Learning model. We demonstrate that our hybrid feature-based action representation can effectively classify novel actions on two different data sets. Experiments show that our results are comparable to, and significantly better than, the results of two state-of-the-art approaches on these data sets, which meets the requirements of stable, reliable, high precision, and anti-interference ability and so forth.


2020 ◽  
Vol 10 (12) ◽  
pp. 4412
Author(s):  
Ammar Mohsin Butt ◽  
Muhammad Haroon Yousaf ◽  
Fiza Murtaza ◽  
Saima Nazir ◽  
Serestina Viriri ◽  
...  

Human action recognition has gathered significant attention in recent years due to its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding the features extracted from CNNs (convolutional neural networks). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which intends to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector. It provides a compact representation along with high order statistics. We evaluated our work on two publicly available standard benchmark datasets HMDB-51 and UCF-101. The proposed method achieves 72.6% and 96.2% on HMDB51 and UCF101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.


Sign in / Sign up

Export Citation Format

Share Document