Local Feature Extraction from RGB and Depth Videos for Human Action Recognition

2018 · Vol 8 (3) · pp. 274-279
Author(s): Rawya Al-Akam, Dietrich Paulus

The present situation poses many challenges for security and surveillance applications of human action recognition (HAR). HAR spans many fields, and many techniques provide modern, technical action implementations. We have studied multiple parameters and techniques used in HAR and compiled the outcomes and drawbacks of each technique reported in different studies. This paper surveys the complete process of human activity recognition and reviews different Motion History Imaging (MHI) methods, along with model-based, multi-view, and multiple-feature-extraction-based recognition methods.
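As a minimal illustration of the Motion History Imaging idea surveyed above (a generic sketch, not any one surveyed method): an MHI keeps, per pixel, a decaying timestamp of recent motion, so brighter pixels mark more recent movement.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """One update step of a Motion History Image (MHI).

    Pixels that move in the current frame are stamped with the full
    duration tau; all other pixels decay by one, floored at zero."""
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

# Toy usage: a 4x4 frame in which a single pixel just moved.
mhi = np.zeros((4, 4), dtype=np.int32)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = True
mhi = update_mhi(mhi, mask)                           # moving pixel -> tau
mhi = update_mhi(mhi, np.zeros((4, 4), dtype=bool))   # one decay step
```

Applied over a frame sequence, the resulting image summarizes where and how recently motion occurred, which is what MHI-based recognizers feed to a classifier.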


Sensors · 2021 · Vol 21 (5) · pp. 1656
Author(s): Min Dong, Zhenglin Fang, Yongfa Li, Sheng Bi, Jiangcheng Chen

At present, in the field of video-based human action recognition, deep neural networks are mainly divided into two branches: the 2D convolutional neural network (CNN) and the 3D CNN. However, the temporal and spatial feature extraction processes of a 2D CNN are independent of each other, so it is easy to ignore their internal connection, which hurts recognition performance. Although a 3D CNN can extract the temporal and spatial features of a video sequence at the same time, the parameters of the 3D model increase exponentially, making the model difficult to train and transfer. To solve this problem, this article combines a 3D CNN with a residual structure and an attention mechanism to improve the existing 3D CNN model, and we propose two types of human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the parameters and strengthens the extraction of temporal features. Second, we explore the application of the attention mechanism in human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human action. Finally, in order to make full use of the residual structure and the attention mechanism, the Attention Residual 3D Network (AR3D) is proposed, and its two fusion strategies and corresponding model structures (AR3D_V1, AR3D_V2) are introduced in detail. Experiments show that the fused structures achieve different degrees of performance improvement compared to a single structure.
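The abstract does not give the AR3D internals, but the general pattern it names, a 3D residual unit gated by spatio-temporal attention, can be sketched as follows. Everything here is an assumption for illustration: `conv3d` stands in for a learned 3D convolution, and the attention gate is a weight-free squeeze-and-sigmoid, whereas a trained model would use learned layers.

```python
import numpy as np

def attention_residual_block(x, conv3d):
    """Hypothetical sketch of a residual 3D unit with a spatio-temporal
    attention gate (illustrative only, not the authors' AR3D layers).

    x: feature map of shape (C, T, H, W)."""
    f = conv3d(x)
    # Squeeze: global average over the temporal and spatial dimensions.
    desc = f.mean(axis=(1, 2, 3))                 # (C,)
    # Excite: sigmoid gate in (0, 1) re-weights each channel.
    gate = 1.0 / (1.0 + np.exp(-desc))            # (C,)
    # Gated features plus the identity shortcut (the residual connection).
    return x + f * gate[:, None, None, None]

# Toy usage with an identity "convolution" stand-in.
x = np.random.default_rng(0).standard_normal((8, 4, 16, 16))
y = attention_residual_block(x, lambda t: t)
```

The residual shortcut keeps gradients flowing through deep 3D stacks, while the gate suppresses channels whose global response is weak, which matches the abstract's stated motivations.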


2021 · Vol 2021 · pp. 1-10
Author(s): Haiyun Wang, Shujun Hu

With the rapid development of computer vision technology, human action recognition has come to occupy an important position in the field. A basic human action recognition system consists of three parts: moving-target detection, feature extraction, and action recognition. In order to understand the action signs of gymnastics, this article uses network communication and contour feature extraction to extract different morphological features during gymnastics. A finite-difference algorithm on edge curvature is then used to classify different gymnastic actions, and Gaussian background modeling is analyzed and discussed. An improved mixture-of-Gaussians modeling method is proposed that adaptively selects the number of Gaussian distributions. The research results show that, compared with traditional contour extraction, the gymnastic motion features extracted through network communication and body-contour features are clearer, with an improvement rate of more than 30%. Moreover, the proposed method removes noise during image extraction, works well, and leaves the athlete's action marks very clear, achieving the research goal.
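The finite-difference edge-curvature step can be illustrated generically (the paper's exact algorithm is not given in the abstract, so this is only a sketch): given sampled contour coordinates, curvature follows from first and second finite differences via kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2). A circle of radius r is a handy sanity check, since its curvature is the constant 1/r.

```python
import numpy as np

def contour_curvature(x, y):
    """Estimate curvature along a sampled contour with finite differences.

    x, y: 1-D arrays of contour coordinates, ordered along the curve."""
    # First and second derivatives w.r.t. the point index (central
    # differences in the interior, one-sided at the two endpoints).
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Sanity check: a circle of radius 2 should give curvature ~ 0.5.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
kappa = contour_curvature(2.0 * np.cos(t), 2.0 * np.sin(t))
```

Per-point curvature profiles like this one are a common shape descriptor for distinguishing body contours of different poses.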


2013 · Vol 2013 · pp. 1-11
Author(s): Bin Wang, Yu Liu, Wei Wang, Wei Xu, Maojun Zhang

We propose a Multiscale Locality-Constrained Spatiotemporal Coding (MLSC) method to improve the traditional bag-of-features (BoF) algorithm, which ignores the spatiotemporal relationships of local features in video-based human action recognition. To model these relationships, MLSC incorporates the spatiotemporal position of each local feature into the coding process. It projects local features into a sub space-time volume (sub-STV) and encodes them with locality-constrained linear coding. The group of sub-STV features obtained from one video with MLSC and max-pooling is used to classify that video. In the classification stage, Locality-Constrained Group Sparse Representation (LGSR) is adopted to exploit the intrinsic group information of these sub-STV features. Experimental results on the KTH, Weizmann, and UCF Sports datasets show that our method outperforms competing local spatiotemporal-feature-based human action recognition methods.
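The locality-constrained linear coding step used inside MLSC can be sketched in its common approximate form (a generic sketch of LLC-style coding, not the authors' exact pipeline; the codebook, `k`, and `reg` values here are illustrative): each descriptor is reconstructed from only its k nearest codewords under a sum-to-one constraint, yielding a sparse, locality-aware code.

```python
import numpy as np

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Approximate locality-constrained linear coding of one descriptor.

    x: (d,) local descriptor; codebook: (m, d) learned codewords.
    Returns an (m,) code that is nonzero only on the k nearest bases."""
    # Locality constraint: keep only the k nearest codewords.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    # Solve the shift-invariant least-squares system on those bases.
    z = codebook[idx] - x                  # bases shifted to the descriptor
    C = z @ z.T                            # local covariance
    C += reg * np.trace(C) * np.eye(k)     # regularize for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                           # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w
    return code

# Toy usage: encode one 8-D descriptor against a random 16-word codebook.
rng = np.random.default_rng(1)
c = llc_encode(rng.standard_normal(8), rng.standard_normal((16, 8)), k=5)
```

Max-pooling such codes over all descriptors falling in one sub-STV then gives the per-volume feature that the classification stage consumes.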

