RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

2021, Vol 2021, pp. 1-10
Author(s): Yun Liu, Ruidi Ma, Hui Li, Chuanxu Wang, Ye Tao

Action recognition is an important research direction in computer vision. Recognition from RGB video is easily affected by factors such as background and lighting, while depth video can better suppress such interference and improve recognition accuracy. This paper therefore makes full use of video and depth skeleton data and proposes an RGB-D action recognition two-stream network (SV-GCN), a two-stream architecture that operates on two different data modalities. The skeleton stream (S-Stream) is a Nonlocal-STGCN, which adds nonlocal blocks to capture dependency relationships between a wider range of joints and thus provides richer skeleton-point features to the model. The video stream (V-Stream) is a Dilated-SlowFastNet, which replaces the traditional random sampling layer with dilated convolutional layers to make better use of the depth features. Finally, the information from the two streams is fused to perform action recognition. Experimental results on the NTU-RGB+D dataset show that the proposed method significantly improves recognition accuracy and outperforms ST-GCN and SlowFastNet on both the CS and CV benchmarks.
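The final fusion step described above combines the two streams' predictions. A minimal late score-fusion sketch in NumPy (the weights, class count, and logits here are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_two_streams(skeleton_logits, video_logits, w_s=0.5, w_v=0.5):
    """Late fusion: weighted sum of per-stream class probabilities."""
    p_s = softmax(skeleton_logits)
    p_v = softmax(video_logits)
    return w_s * p_s + w_v * p_v

# Toy 4-class example: the streams disagree, fusion arbitrates.
s = np.array([2.0, 0.1, 0.1, 0.1])   # skeleton stream favours class 0
v = np.array([0.1, 0.1, 0.1, 1.0])   # video stream favours class 3
fused = fuse_two_streams(s, v)
pred = int(np.argmax(fused))
```

Because the skeleton stream is more confident here, the fused probabilities still pick class 0; equal stream weights are a common default, but they would normally be tuned on validation data.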

2021, Vol 11 (11), pp. 4940
Author(s): Jinsoo Kim, Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Although action recognition performance has improved, the complexity of such models still limits real-time operation. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change rate, transformed into spatiotemporal features, and fed into a multilayer perceptron, which has relatively lower complexity than other HAR models; the method is therefore well suited to a single embedded system connected to CCTV. Evaluation of recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an LSTM-based HAR model when using a small number of video frames, and the fast processing speed confirmed the possibility of real-time operation. In addition, the proposed weighted-mean-based HAR model was verified on a Jetson Nano to confirm its suitability for low-cost GPU-based embedded systems.
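The core idea of weighting per-frame spatial features by the frame change rate can be sketched as follows; this is an assumed reading of the abstract (mean absolute inter-frame difference as the change rate), not the authors' exact formulation:

```python
import numpy as np

def frame_change_weights(frames):
    """Per-frame change rate: mean absolute difference to the previous
    frame, normalized to sum to 1. The first frame, which has no
    predecessor, reuses the second frame's rate."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    diffs = np.concatenate([[diffs[0]], diffs])
    return diffs / diffs.sum()

def weighted_spatiotemporal_feature(feature_maps, frames):
    """Collapse per-frame CNN features into one spatiotemporal vector
    by weighting each frame with its change rate."""
    w = frame_change_weights(frames)
    return np.tensordot(w, feature_maps, axes=1)

rng = np.random.default_rng(0)
frames = rng.random((8, 16, 16))   # 8 grayscale frames (toy data)
feats = rng.random((8, 32))        # one 32-d CNN feature per frame
fused = weighted_spatiotemporal_feature(feats, frames)
```

The resulting single vector would then be the input to the multilayer perceptron, which keeps the temporal modelling far cheaper than a recurrent layer.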


2013, Vol 631-632, pp. 1303-1308
Author(s): He Jin Yuan

A novel human action recognition algorithm based on key postures is proposed in this paper. In the method, the mesh features of each image in a human action sequence are first calculated; the key postures are then generated from the mesh features through k-medoids clustering; and each motion sequence is represented as a vector of key postures, where each component is the number of occurrences of the corresponding posture in the action. For recognition, the observed action is first converted into a key-posture vector; the correlation coefficients with the training samples are then calculated, and the action that best matches the observed sequence is chosen as the final category. Experiments on the Weizmann dataset demonstrate that the method is effective for human action recognition, with an average recognition accuracy exceeding 90%.
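The occurrence-count representation and correlation matching can be sketched in a few lines; the clustering itself is omitted and frames are assumed to be already labelled with their nearest key posture (an assumption for illustration):

```python
import numpy as np

def posture_histogram(frame_labels, n_postures):
    """Represent an action sequence as occurrence counts of each key posture."""
    return np.bincount(frame_labels, minlength=n_postures).astype(float)

def recognize(observed_labels, train_histograms, n_postures):
    """Pick the training action whose posture histogram correlates best
    with the observed sequence's histogram."""
    h = posture_histogram(observed_labels, n_postures)
    scores = [np.corrcoef(h, t)[0, 1] for t in train_histograms]
    return int(np.argmax(scores))

# Toy example: 3 key postures, two training actions.
train = [posture_histogram(np.array([0, 0, 1]), 3),   # action 0: mostly posture 0
         posture_histogram(np.array([2, 2, 1]), 3)]   # action 1: mostly posture 2
obs = np.array([0, 0, 0, 1])                          # resembles action 0
best = recognize(obs, train, 3)
```

Correlation (rather than raw Euclidean distance) makes the match insensitive to sequence length, since only the relative posture proportions matter.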


2021, Vol 2021, pp. 1-6
Author(s): Qiulin Wang, Baole Tao, Fulei Han, Wenting Wei

The extraction and recognition of human actions has always been a research hotspot in the field of state recognition, with broad application prospects in many fields. In sports, it can reduce the occurrence of accidental injuries and improve the training level of basketball players, so extracting effective features from players' dynamic body movements is of great significance. To improve the fairness of basketball games, accurately recognize athletes' movements, and at the same time raise athletes' skill level and standardize their movements during training, this article uses deep learning to extract and recognize the movements of basketball players. The paper implements a human action recognition algorithm based on deep learning, which automatically extracts image features through convolution kernels and thus greatly improves efficiency compared with traditional hand-crafted feature extraction. The method uses the deep convolutional neural network VGG model on the TensorFlow platform to extract and recognize human actions. On the MATLAB platform, the KTH and Weizmann datasets are preprocessed to obtain the input image sets; the preprocessed datasets are then used to train the model, and the optimal network model and corresponding results are obtained by testing on the two datasets. Finally, the two datasets are analyzed in detail, the specific cause of each action confusion is given, and the recognition accuracy and average recognition accuracy of each action category are calculated. The experimental results show that the deep-learning-based human action recognition algorithm achieves a high recognition accuracy.
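The claim that convolution kernels extract features automatically can be made concrete with a hand-rolled 2D correlation; the edge-detecting kernel below is a standard Sobel filter chosen for illustration, not a kernel from the paper's trained VGG model:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image
    and take the weighted sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds strongly at this image's boundary.
img = np.zeros((5, 6))
img[:, 3:] = 1.0                       # left half dark, right half bright
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)
response = conv2d(img, sobel_x)
```

In a trained CNN, kernels like this are not hand-designed but learned from data, which is exactly what replaces the traditional manual feature-engineering step.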


Author(s): Xueping Liu, Yibo Li, Qingjun Wang

Human action recognition based on depth video sequences is an important research direction in computer vision. The present study proposes a classification framework based on hierarchical multi-view learning for depth-video-based action recognition. Considering the distinguishing features of 3D human action space, we project the 3D human action image onto the three coordinate planes, converting the 3D depth image into three 2D images that are fed to three subnets, respectively. As the number of layers increases, the representations of the subnets are hierarchically fused to form the inputs of the next layers. The final representation of the depth video sequence is fed into a single-layer perceptron, and the final result is decided by accumulating the perceptron's output over time. We compare against other methods on two publicly available datasets and also verify the proposed method on a human action database acquired with our own Kinect system. The experimental results demonstrate that our model has high computational efficiency and matches the performance of state-of-the-art methods.
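The three-plane projection step can be sketched on a binary occupancy grid; using a max-projection along each axis is one common reading of this operation, assumed here for illustration:

```python
import numpy as np

def three_view_projections(voxels):
    """Project a binary 3D occupancy grid (x, y, z) onto the three
    coordinate planes, giving front, side, and top 2D views."""
    front = voxels.max(axis=2)   # project along z -> (x, y) plane
    side = voxels.max(axis=0)    # project along x -> (y, z) plane
    top = voxels.max(axis=1)     # project along y -> (x, z) plane
    return front, side, top

vox = np.zeros((4, 5, 6))
vox[1, 2, 3] = 1.0               # a single occupied cell
front, side, top = three_view_projections(vox)
```

Each 2D view then feeds one of the three subnets, so the framework works with ordinary 2D convolutions instead of costlier 3D ones.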


Animals, 2021, Vol 11 (2), pp. 485
Author(s): Liqi Feng, Yaqin Zhao, Yichao Sun, Wenxuan Zhao, Jiaxi Tang

Behavior analysis of wild felines is significant for protecting grassland ecosystems. Compared with human action recognition, fewer researchers have focused on feline behavior analysis. This paper proposes a novel two-stream architecture that incorporates spatial and temporal networks for wild feline action recognition. The spatial stream outlines the object region extracted by a Mask region-based convolutional neural network (R-CNN) and builds a Tiny Visual Geometry Group (VGG) network for static action recognition; compared with VGG16, the Tiny VGG network reduces the number of network parameters and avoids overfitting. The temporal stream presents a novel skeleton-based action recognition model based on the fluctuation amplitude of the knee joints' bending angle over a video clip. Owing to these temporal features, the model can effectively distinguish between different upright actions, such as standing, ambling, and galloping, particularly when the felines are occluded by objects such as plants or fallen trees. The experimental results showed that the proposed two-stream network model can effectively outline wild feline targets in captured images and significantly improves the performance of wild feline action recognition thanks to its combined spatial and temporal features.
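The knee-bending-angle feature can be sketched from three skeleton joints; the joint names and the max-minus-min fluctuation measure are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def joint_angle(hip, knee, ankle):
    """Bending angle at the knee (degrees): the angle between the
    knee->hip and knee->ankle vectors."""
    v1 = np.asarray(hip, float) - np.asarray(knee, float)
    v2 = np.asarray(ankle, float) - np.asarray(knee, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def angle_fluctuation(angles):
    """Fluctuation amplitude of the angle over a clip: max minus min."""
    return float(np.max(angles) - np.min(angles))

# Straight leg: hip, knee, ankle collinear -> 180 degrees.
straight = joint_angle([0, 2], [0, 1], [0, 0])
# Right-angle bend at the knee -> 90 degrees.
bent = joint_angle([0, 2], [0, 1], [1, 1])
```

A standing animal produces a small fluctuation amplitude, while ambling or galloping produces a large one, which is why the feature separates these upright actions even under partial occlusion.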


2020, Vol 8 (6), pp. 1556-1566

Human action recognition is a key research direction and a trending topic in fields such as machine learning and computer vision. The main objective of this research is to recognize human actions in images or video. However, existing approaches have limitations such as low recognition accuracy and poor robustness. Hence, this paper develops a novel and robust human action recognition framework with a new feature extraction technique based on the Gabor transform and the dual-tree complex wavelet transform. These two feature extraction techniques help extract highly discriminative features by which the actions present in an image or video are correctly recognized. The framework then employs the support vector machine algorithm as a classifier. Simulation experiments are conducted on two standard datasets, KTH and Weizmann, and the results reveal that the proposed framework achieves better performance than state-of-the-art recognition methods.
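A Gabor filter bank is the first of the two feature extractors named above. A minimal sketch of the real part of a Gabor kernel and a tiny multi-orientation descriptor follows; the kernel size, sigma, wavelength, and the mean-absolute-response pooling are illustrative choices, not the paper's parameters:

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0):
    """Real part of a Gabor filter: a Gaussian envelope times an
    oriented cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)

def filter_response(image, kern):
    """Mean absolute valid-mode correlation response of one filter."""
    kh, kw = kern.shape
    rows = image.shape[0] - kh + 1
    cols = image.shape[1] - kw + 1
    acc = 0.0
    for i in range(rows):
        for j in range(cols):
            acc += abs(np.sum(image[i:i + kh, j:j + kw] * kern))
    return acc / (rows * cols)

def gabor_features(image, n_orient=4, size=7):
    """Tiny texture descriptor: mean response at several orientations."""
    return np.array([filter_response(image, gabor_kernel(size, np.pi * k / n_orient))
                     for k in range(n_orient)])

rng = np.random.default_rng(1)
img = rng.random((16, 16))
feat = gabor_features(img)
```

In the full framework, such per-orientation responses (together with dual-tree complex wavelet coefficients) would form the feature vector handed to the SVM classifier.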


2021, Vol 2021, pp. 1-10
Author(s): Xunyun Chang, Liangqing Peng

The purpose is to study the interactive teaching mode of human action recognition technology in music and dance teaching under computer vision. A human action detection and recognition system based on a three-dimensional (3D) convolutional neural network (CNN) is established. A dual-channel human action recognition model is then proposed on the basis of the CNN, and a visual attention mechanism using an interframe differential channel is introduced into the model. Through experiments on the Kungliga Tekniska Högskolan (KTH) dataset, the system's performance on human dance image recognition is verified. The results show that, after the frame-difference channel is added, the dual-channel 3D CNN achieves high accuracy within the first few rounds of training, reduces error quickly, and begins to converge quickly; its recognition accuracy on the KTH dataset is 96.6%, higher than that of other methods; and with a 3 × 3 × 3 basic convolution kernel, the classification network reaches its best performance 0.0091 seconds earlier in computation. Thus, the dual-channel 3D CNN recognition system provides good human action recognition accuracy for the dance interactive teaching mode in music teaching.
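The interframe differential channel mentioned above can be computed very simply; taking the absolute difference of consecutive frames is an assumed reading of the mechanism, shown here as a minimal sketch:

```python
import numpy as np

def frame_difference_channel(clip):
    """Interframe differential channel: absolute difference between
    consecutive frames, highlighting moving regions for the second
    channel of the dual-channel network."""
    return np.abs(np.diff(clip, axis=0))

clip = np.zeros((4, 8, 8))
clip[1, 2:4, 2:4] = 1.0      # a small patch appears in frame 1, then vanishes
diff = frame_difference_channel(clip)
```

Static background cancels to zero in this channel, so the network's attention is steered toward the dancer's motion, which is consistent with the reported faster early-training convergence.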

