Human Action Recognition Accuracy from Depth Video Sequences Using HOG and PHOG Shape Features

2018 ◽  
Vol 6 (5) ◽  
Author(s):  
Mohammad Farhad Bulbul
2019 ◽  
Vol 16 (1) ◽  
pp. 172988141882509 ◽  
Author(s):  
Hanbo Wu ◽  
Xin Ma ◽  
Yibin Li

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem, and most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate the spatial and temporal information of actions simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the representation, we construct an action recognition framework in which the hierarchical dynamic depth projected difference images of the three views are fed independently into three identical pretrained convolutional neural networks for fine-tuning. We design three classification schemes in the framework, each utilizing different convolutional neural network layers, to compare their effects on action recognition; in each scheme the three views are combined to describe the actions more comprehensively. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method performs well and provides discriminative spatial–temporal information for human action recognition in depth videos.
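As an illustration of the projection and pooling steps, here is a minimal NumPy sketch. It assumes quantized integer depth maps and substitutes the closed-form approximate rank pooling of dynamic images (Bilen et al.) for the paper's exact rank-pooling formulation; the occupancy-style side and top projections and the two-segments-per-level windowing are likewise illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def orthogonal_projections(depth, max_depth=256):
    """Project one quantized depth frame (H x W integer depth bins) onto the
    front (x-y), side (y-z) and top (z-x) Cartesian planes. The side and top
    views are crude occupancy maps standing in for the paper's projections."""
    h, w = depth.shape
    front = depth.astype(np.float32)
    side = np.zeros((h, max_depth), np.float32)
    top = np.zeros((max_depth, w), np.float32)
    ys, xs = np.nonzero(depth)
    zs = np.clip(depth[ys, xs], 0, max_depth - 1)
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return front, side, top

def rank_pool(frames):
    """Approximate rank pooling: closed-form time weights instead of solving
    the full RankSVM; the weighted sum of frames is the dynamic image."""
    frames = np.asarray(frames, np.float32)
    T = len(frames)
    H = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    alphas = np.array([2 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
                       for t in range(1, T + 1)], np.float32)
    return np.tensordot(alphas, frames, axes=1)

def hierarchical_dynamic_images(frames, levels=3):
    """Pool over nested temporal windows: level l splits the sequence into
    2**l equal segments, capturing dynamics at multiple temporal scales.
    Assumes the sequence has at least 2**(levels - 1) frames."""
    images = []
    for level in range(levels):
        for segment in np.array_split(np.asarray(frames, np.float32), 2 ** level):
            images.append(rank_pool(segment))
    return images
```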


Author(s):  
Antonio Tejero-de-Pablos ◽  
Yuta Nakashima ◽  
Naokazu Yokoya ◽  
Francisco-Javier Díaz-Pernas ◽  
Mario Martínez-Zarzuela

2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data must extract temporal features as well as spatial ones, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, limitations to real-time operation still persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and fed into a multilayer perceptron, which has relatively lower complexity than other HAR models; the method is therefore well suited to a single embedded system connected to CCTV. Evaluations of recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an LSTM-based HAR model when using a small number of video frames, and the fast processing speed confirmed the feasibility of real-time operation. In addition, the performance of the proposed weighted-mean-based HAR model was verified on a Jetson Nano, confirming its suitability for low-cost GPU-based embedded systems.
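A minimal NumPy sketch of the weighted-averaging idea follows. The change-rate formula (mean absolute pixel difference to the previous frame, normalized to sum to one) is an assumption for illustration; the paper's exact weighting may differ.

```python
import numpy as np

def frame_change_weights(frames):
    """Per-frame weights from the frame change rate: mean absolute pixel
    difference to the previous frame, normalized to sum to 1. The first
    frame, having no predecessor, gets weight 0 (hypothetical choice)."""
    frames = np.asarray(frames, np.float32)
    change = np.abs(frames[1:] - frames[:-1]).mean(axis=tuple(range(1, frames.ndim)))
    weights = np.concatenate(([0.0], change))
    total = weights.sum()
    return weights / total if total > 0 else np.full(len(frames), 1.0 / len(frames))

def spatiotemporal_descriptor(feature_maps, weights):
    """Collapse T spatial CNN feature maps (T x H x W x C) into a single
    descriptor by frame-change-weighted averaging, then flatten it as the
    input vector for a low-complexity multilayer perceptron."""
    fused = np.tensordot(weights, np.asarray(feature_maps, np.float32), axes=1)
    return fused.reshape(-1)
```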


2013 ◽  
Vol 631-632 ◽  
pp. 1303-1308
Author(s):  
He Jin Yuan

A novel human action recognition algorithm based on key postures is proposed in this paper. In the method, the mesh features of each image in a human action sequence are first calculated; the key postures are then generated from the mesh features through the k-medoids clustering algorithm, and each motion sequence is represented as a vector over the key postures, where each component is the number of occurrences of the corresponding posture in the action. For recognition, the observed action is first converted into a key posture vector; then its correlation coefficients with the training samples are calculated, and the action that best matches the observed sequence is chosen as the final category. Experiments on the Weizmann dataset demonstrate that the method is effective for human action recognition, with an average recognition accuracy exceeding 90%.
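The pipeline can be sketched as follows: cluster per-frame mesh features with k-medoids, count nearest-key-posture occurrences, and match by correlation coefficient. The mesh features are assumed to be precomputed vectors, and the k-medoids loop is a plain small-scale implementation rather than the paper's.

```python
import numpy as np

def k_medoids(X, k, iters=20, seed=0):
    """Plain alternating k-medoids on Euclidean distances (small-scale
    sketch; a library implementation would also do)."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                new_medoids[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return X[medoids]

def posture_histogram(mesh_features, key_postures):
    """Represent an action as occurrence counts of its nearest key postures."""
    d = np.linalg.norm(mesh_features[:, None] - key_postures[None, :], axis=-1)
    return np.bincount(d.argmin(axis=1), minlength=len(key_postures)).astype(np.float32)

def classify(query, train_histograms, train_labels):
    """Choose the training action whose posture histogram correlates best."""
    corr = [np.corrcoef(query, h)[0, 1] for h in train_histograms]
    return train_labels[int(np.argmax(corr))]
```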


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Qiulin Wang ◽  
Baole Tao ◽  
Fulei Han ◽  
Wenting Wei

The extraction and recognition of human actions has long been a research hotspot in the field of state recognition, with a wide range of application prospects. In sports, it can reduce the occurrence of accidental injuries and improve the training level of basketball players, so extracting effective features from their dynamic body movements is of great significance. To improve the fairness of the basketball game, accurately recognize athletes' movements, and help regulate and improve those movements during training, this article applies deep learning to extract and recognize the movements of basketball players. The proposed algorithm automatically extracts image features through convolution kernels, which greatly improves efficiency compared with traditional hand-crafted feature extraction. It uses the deep convolutional neural network VGG model on the TensorFlow platform to extract and recognize human actions. On the MATLAB platform, the KTH and Weizmann datasets are preprocessed to obtain the input image sets; the preprocessed data are then used to train the model, and the optimal network is obtained by testing on the two datasets. Finally, the two datasets are analyzed in detail, the specific cause of each action confusion is identified, and the recognition accuracy and average recognition accuracy of each action category are calculated. The experimental results show that the deep-learning-based human action recognition algorithm achieves a high recognition accuracy.
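A minimal TensorFlow/Keras sketch of this kind of VGG-based setup is shown below; the frozen backbone, head layers, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf

NUM_ACTIONS = 6  # e.g. the six KTH action classes

# VGG16 backbone pretrained on ImageNet, frozen, with a small trainable head.
backbone = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_ACTIONS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_frames, train_labels, epochs=10, validation_split=0.1)
```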


2020 ◽  
Vol 29 (12) ◽  
pp. 2050190
Author(s):  
Amel Ben Mahjoub ◽  
Mohamed Atri

Action recognition is a very active area of computer vision. In the last few years there has been growing interest in deep learning networks such as Long Short-Term Memory (LSTM) architectures, owing to their efficiency at processing long time sequences. In light of these developments, there is now considerable interest in developing an accurate action recognition approach with low complexity. This paper introduces a method for learning from depth activity videos based on LSTM and classification fusion. The first step consists of extracting compact depth video features: we start by calculating Depth Motion Maps (DMMs) from each sequence, then encode and concatenate the contour and texture DMM characteristics using histogram-of-oriented-gradients and local-binary-patterns descriptors. The second step is depth video classification based on naive Bayes fusion: three classifiers, a collaborative representation classifier, a kernel-based extreme learning machine, and an LSTM, are trained separately to obtain classification scores, and the score outputs of all classifiers are fused with the naive Bayes method to produce the final predicted label. The proposed method achieves a significant improvement in recognition rate over previous work on the Kinect v2 and UTD-MHAD human action datasets.
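The feature-extraction and fusion steps might be sketched as follows, using scikit-image's HOG and LBP implementations; the DMM threshold, the descriptor parameters, and the product-rule form of the naive Bayes fusion are assumptions for illustration rather than the paper's exact settings.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def depth_motion_map(frames, eps=2.0):
    """DMM: accumulate absolute differences between consecutive (projected)
    depth frames, keeping only changes above a small threshold eps."""
    frames = np.asarray(frames, np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    return np.where(diffs > eps, diffs, 0.0).sum(axis=0)

def dmm_descriptor(dmm):
    """Concatenate contour (HOG) and texture (uniform LBP histogram) features."""
    contour = hog(dmm, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2))
    lbp = local_binary_pattern(dmm, P=8, R=1, method="uniform")
    texture, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([contour, texture])

def naive_bayes_fusion(score_vectors):
    """Fuse per-classifier class-probability vectors by the naive Bayes
    product rule (summed in log space); returns the predicted class index."""
    log_posterior = sum(np.log(np.clip(s, 1e-9, 1.0)) for s in score_vectors)
    return int(np.argmax(log_posterior))
```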


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1993
Author(s):  
Malik Ali Gul ◽  
Muhammad Haroon Yousaf ◽  
Shah Nawaz ◽  
Zaka Ur Rehman ◽  
HyungWon Kim

Human action recognition has emerged as a challenging research domain for video understanding and analysis, and extensive research has been conducted to improve recognition performance. Human activity recognition has various real-time applications, such as patient monitoring, in which patients are monitored among a group of people and identified by their abnormal activities. Our goal is multi-class abnormal action detection, for individuals as well as groups, from video sequences, differentiating multiple abnormal human actions. In this paper, the You Only Look Once (YOLO) network is utilized as the backbone CNN model. To train the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patient's positions, and retrained the backbone CNN model with 23,040 labeled images of patient actions for 32 epochs. The proposed model assigns each frame a confidence score and an action label, and labels the video sequence with its most recurrent action label. The present study shows an abnormal action recognition accuracy of 96.8%, and the proposed approach differentiates abnormal actions with an improved F1-score of 89.2%, which is higher than state-of-the-art techniques. The results indicate that the proposed framework can be beneficial to hospitals and elder-care homes for patient monitoring.
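One plausible reading of the frame-level aggregation is sketched below: pick the most recurrent per-frame label, breaking ties by mean confidence. The tie-breaking rule is an assumption, as the abstract does not spell it out.

```python
from collections import Counter

def video_action_label(frame_detections):
    """Aggregate per-frame (label, confidence) pairs from the detector into a
    single video-level action by taking the most recurrent label; ties are
    broken by mean confidence (hypothetical rule)."""
    counts = Counter(label for label, _ in frame_detections)
    top_count = max(counts.values())
    tied = [label for label, c in counts.items() if c == top_count]

    def mean_confidence(label):
        scores = [conf for lbl, conf in frame_detections if lbl == label]
        return sum(scores) / len(scores)

    return max(tied, key=mean_confidence)
```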

