Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data

Research on video data faces the difficulty of extracting temporal as well as spatial features, and human action recognition (HAR) is a representative area in which convolutional neural networks (CNNs) are applied to video data. Action recognition performance has improved, but owing to model complexity, some limitations to real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that make up the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have relatively lower complexity than other HAR models; thus, the method has high utility in a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging action recognition benchmark UCF-101 showed higher accuracy than an HAR model using long short-term memory with a small number of video frames, and the fast data processing speed confirmed that real-time operation is possible. In addition, the proposed weighted mean-based HAR model was tested on a Jetson Nano to confirm that it can be used in low-cost GPU-based embedded systems.
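The weighted-averaging step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not define the frame change rate, so the sketch assumes it is the mean absolute pixel difference between consecutive grayscale frames, assumes the first frame reuses the first difference, and assumes the weights are normalized to sum to 1. The function names `frame_change_weights` and `weighted_mean_features` are hypothetical, and the per-frame features stand in for CNN feature maps flattened to vectors.

```python
import numpy as np

def frame_change_weights(frames):
    """Return one weight per frame, proportional to how much it changed.

    frames: array of shape (T, H, W), grayscale.
    Assumption: change = mean absolute difference to the previous frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Mean absolute difference between consecutive frames, shape (T-1,).
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    # The first frame has no predecessor; reuse the first difference (assumption).
    change = np.concatenate([diffs[:1], diffs])
    total = change.sum()
    if total == 0:
        # Static clip (or a single frame): fall back to uniform weights.
        return np.full(len(frames), 1.0 / len(frames))
    return change / total

def weighted_mean_features(features, weights):
    """Collapse per-frame features (T, D) into one vector (D,) by weighted mean."""
    features = np.asarray(features, dtype=np.float64)
    return np.tensordot(weights, features, axes=(0, 0))
```

In a pipeline of the kind the abstract describes, the resulting vector would then be passed to a multilayer perceptron for classification; that step is omitted here.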

Human action recognition based on two-view optical flow in the transformed domain

2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2014.6908537 ◽

2014 ◽

Author(s):

Mohamed A. Abdelwahab ◽

Moataz M. Abdelwahab

Keyword(s):

Optical Flow ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action

Human action recognition based on spatio-temporal three-dimensional scattering transform descriptor and an improved VLAD feature encoding algorithm

Neurocomputing ◽

10.1016/j.neucom.2018.05.121 ◽

2019 ◽

Vol 348 ◽

pp. 145-157 ◽

Cited By ~ 1

Author(s):

Bo Lin ◽

Bin Fang ◽

Weibin Yang ◽

Jiye Qian

Keyword(s):

Action Recognition ◽

Three Dimensional ◽

Human Action Recognition ◽

Human Action ◽

Scattering Transform ◽

Feature Encoding ◽

Spatio Temporal

VIEW-ROBUST HUMAN ACTION RECOGNITION BASED ON SPATIO-TEMPORAL SELF SIMILARITIES

JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES ◽

10.26782/jmcms.2020.01.00010 ◽

2020 ◽

Vol 15 (1) ◽

Author(s):

K. Pradeep Reddy

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Spatio Temporal

Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos

MultiMedia Modeling - Lecture Notes in Computer Science ◽

10.1007/978-3-319-51811-4_30 ◽

2016 ◽

pp. 365-378 ◽

Cited By ~ 13

Author(s):

Ionut C. Duta ◽

Bogdan Ionescu ◽

Kiyoharu Aizawa ◽

Nicu Sebe

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Spatio Temporal

Human action recognition in video data using invariant characteristic vectors

2012 19th IEEE International Conference on Image Processing ◽

10.1109/icip.2012.6467127 ◽

2012 ◽

Cited By ~ 1

Author(s):

Nazim Ashraf ◽

Hassan Foroosh

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Invariant Characteristic

Spatio-temporal SRU with global context-aware attention for 3D human action recognition

Multimedia Tools and Applications ◽

10.1007/s11042-019-08587-w ◽

2020 ◽

Vol 79 (17-18) ◽

pp. 12349-12371

Author(s):

Qingshan She ◽

Gaoyuan Mu ◽

Haitao Gan ◽

Yingle Fan

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Context Aware ◽

Global Context ◽

Spatio Temporal

Human Action Recognition Based on Fusion Features Extraction of Adaptive Background Subtraction and Optical Flow Model

Mathematical Problems in Engineering ◽

10.1155/2015/387464 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 5

Author(s):

Shaoping Zhu ◽

Limin Xia

Keyword(s):

Optical Flow ◽

Action Recognition ◽

Background Subtraction ◽

Flow Model ◽

Feature Vector ◽

Human Action Recognition ◽

Human Action ◽

Multiple Instance Learning ◽

Data Sets ◽

Flow Feature

A novel method based on a hybrid feature is proposed for human action recognition in video image sequences. It comprises two stages: feature extraction and action recognition. In the first stage, we use an adaptive background subtraction algorithm to extract a global silhouette feature and an optical flow model to extract a local optical flow feature, and then combine the global silhouette feature vector and the local optical flow feature vector to form a hybrid feature vector. In the second stage, in order to improve recognition accuracy, we use an optimized Multiple Instance Learning algorithm to recognize human actions, in which an Iterative Querying Heuristic (IQH) optimization algorithm is used to train the Multiple Instance Learning model. We demonstrate that our hybrid feature-based action representation can effectively classify novel actions on two different data sets. Experiments show that our results are comparable to, or significantly better than, those of two state-of-the-art approaches on these data sets, meeting the requirements of stability, reliability, high precision, and resistance to interference.
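The feature-extraction stage in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the adaptive background model is assumed to be a simple running average with a fixed threshold, the parameters `alpha` and `threshold` are arbitrary, the function names are hypothetical, and the optical flow feature is assumed to be computed elsewhere and passed in as a vector.

```python
import numpy as np

def running_average_silhouette(frames, alpha=0.05, threshold=25.0):
    """Foreground masks from a running-average background model (assumed variant).

    frames: array of shape (T, H, W), grayscale.
    Returns a float array of shape (T-1, H, W) with 1.0 marking foreground.
    """
    frames = np.asarray(frames, dtype=np.float64)
    background = frames[0].copy()  # the first frame seeds the background
    masks = []
    for frame in frames[1:]:
        mask = np.abs(frame - background) > threshold
        masks.append(mask)
        # Adapt the background only where the pixel was classified as background.
        blended = (1.0 - alpha) * background + alpha * frame
        background = np.where(mask, background, blended)
    return np.array(masks, dtype=np.float64)

def hybrid_feature(silhouette_vec, flow_vec):
    """Concatenate a global silhouette feature and a local flow feature."""
    return np.concatenate([np.ravel(silhouette_vec), np.ravel(flow_vec)])
```

The concatenated vector would then be handed to the classifier; the Multiple Instance Learning stage and the IQH optimization are not covered by this sketch.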

Agglomerative Clustering and Residual-VLAD Encoding for Human Action Recognition

Applied Sciences ◽

10.3390/app10124412 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4412

Author(s):

Ammar Mohsin Butt ◽

Muhammad Haroon Yousaf ◽

Fiza Murtaza ◽

Saima Nazir ◽

Serestina Viriri ◽

...

Keyword(s):

Action Recognition ◽

Feature Vector ◽

Human Action Recognition ◽

Human Action ◽

Compact Representation ◽

Agglomerative Clustering ◽

Residual Vector ◽

Benchmark Datasets ◽

Codebook Generation ◽

Spatio Temporal

Human action recognition has gathered significant attention in recent years due to its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding the features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which is intended to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector. It provides a compact representation along with high-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves 72.6% and 96.2% on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.
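For readers unfamiliar with the encoding family this abstract builds on, the following is a minimal sketch of standard VLAD encoding in NumPy. It is plain VLAD, not the R-VLAD variant or the hybrid scheme the authors propose, whose details are not given in the abstract. The codebook is assumed to be supplied already (for example from a clustering step), the final L2 normalization is one common choice among several, and the function name `vlad_encode` is hypothetical.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Standard VLAD: sum residuals to the nearest codeword, then L2-normalize.

    descriptors: array of shape (N, D), local features from one video.
    codebook:    array of shape (K, D), assumed to be given.
    Returns a vector of length K * D.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    # Squared Euclidean distance from every descriptor to every codeword, (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    vlad = np.zeros_like(codebook)
    for k in range(len(codebook)):
        members = descriptors[nearest == k]
        if len(members):
            # Accumulate residuals (descriptor minus its codeword).
            vlad[k] = (members - codebook[k]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The output length is fixed at K * D regardless of how many descriptors a video yields, which is what makes such encodings a compact video-level representation.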
