Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review

Micromachines ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 72
Author(s):  
Dengshan Li ◽  
Rujing Wang ◽  
Peng Chen ◽  
Chengjun Xie ◽  
Qiong Zhou ◽  
...  

Video object and human action detection are applied in many fields, such as video surveillance and face recognition. Video object detection comprises object classification and object localization within the frame. Human action recognition is the detection of human actions. Video detection is usually more challenging than image detection, since video frames are often blurrier than still images, and video brings additional difficulties such as defocus, motion blur, and partial occlusion. Nowadays, video detection technology can achieve real-time detection, or highly accurate detection on blurry video frames. In this paper, various video object and human action detection approaches are reviewed and discussed, many of which have achieved state-of-the-art results. We mainly review and discuss classic video detection methods based on supervised learning. In addition, the frequently used video object detection and human action recognition datasets are reviewed. Finally, a summary of video detection is presented: video object and human action detection methods can be classified into frame-by-frame (frame-based) detection, key-frame-extraction detection, and temporal-information detection; the main methods for utilizing the temporal information of adjacent video frames are optical flow, Long Short-Term Memory, and convolution across adjacent frames.
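The key-frame-extraction category mentioned above can be illustrated with a minimal sketch: frames whose pixel-wise difference from the last kept frame exceeds a threshold are treated as key frames. The frame data and threshold below are illustrative, not from the paper.

```python
# Minimal sketch of extracting-key-frame detection. Frames are flat
# lists of pixel intensities; the threshold is an arbitrary choice.

def frame_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def extract_key_frames(frames, threshold=10.0):
    """Keep the first frame plus every frame that differs enough
    from the most recently kept key frame."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if frame_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys

# Three near-identical frames followed by a sudden scene change:
video = [[0] * 16, [1] * 16, [2] * 16, [200] * 16]
print(extract_key_frames(video))  # [0, 3]
```

Only frames 0 and 3 survive, so a downstream detector runs on two frames instead of four.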

Author(s):  
Dianting Liu ◽  
Yilin Yan ◽  
Mei-Ling Shyu ◽  
Guiru Zhao ◽  
Min Chen

Understanding the semantic meaning of human actions captured in unconstrained environments has broad applications in fields ranging from patient monitoring and human-computer interaction to surveillance systems. However, while great progress has been achieved on automatic human action detection and recognition in videos captured in controlled/constrained environments, most existing approaches perform unsatisfactorily on videos with uncontrolled/unconstrained conditions (e.g., significant camera motion, background clutter, scaling, and lighting conditions). To address this issue, the authors propose a robust human action detection and recognition framework that works effectively on videos taken in controlled or uncontrolled environments. Specifically, the authors integrate the optical flow field and the Harris3D corner detector to generate a new spatial-temporal information representation for each video sequence, from which a general Gaussian mixture model (GMM) is learned. All the mean vectors of the Gaussian components in the generated GMM are concatenated to create the GMM supervector for video action recognition. They build a boosting classifier based on a set of sparse representation classifiers and Hamming distance classifiers to improve the accuracy of action recognition. The experimental results on two widely used public data sets, KTH and UCF YouTube Action, show that the proposed framework outperforms other state-of-the-art approaches on both action detection and recognition.
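The supervector construction described above reduces to a simple operation: concatenating the mean vectors of all Gaussian components into one fixed-length descriptor. A minimal sketch, with illustrative component means rather than values learned from any video:

```python
# Minimal sketch of building a GMM supervector: the K component mean
# vectors (each of dimension D) are concatenated into one K*D vector.
# The means below are placeholders, not fitted values.

def gmm_supervector(component_means):
    """Concatenate Gaussian component means into a single supervector."""
    supervector = []
    for mean in component_means:
        supervector.extend(mean)
    return supervector

# Three 2-D components -> a 6-D supervector:
means = [[0.1, 0.2], [1.5, 1.7], [3.0, 2.9]]
print(gmm_supervector(means))  # [0.1, 0.2, 1.5, 1.7, 3.0, 2.9]
```

Because every video yields the same-length supervector regardless of its duration, the result can be fed directly to fixed-input classifiers such as the boosting classifier the authors describe.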


Author(s):  
Prof. Rajeshwari. J. Kodulkar

Abstract: In deep neural networks, human action detection is one of the most demanding and complex tasks. Human gesture recognition is a form of human action recognition. A gesture is defined as a series of bodily motions that communicate a message. Gestures are a more natural and preferable way for humans to engage with computers, thereby bridging the gap between humans and robots. Human action recognition also offers a communication platform for deaf and mute users. In this work, we propose a hand gesture identification system that recognizes hand movements, extracts hand characteristics such as peak and angle calculations, and then converts gesture images into text. Index Terms: Human action recognition, Deaf and dumb, CNN.
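The angle calculation mentioned as a hand characteristic can be sketched as the angle formed at a joint by two detected peak points. The coordinates and the joint/peak naming below are illustrative assumptions, not the paper's actual feature pipeline:

```python
import math

# Minimal sketch of an angle feature: the angle at `joint` between the
# rays pointing toward two fingertip peaks p1 and p2.

def angle_at(joint, p1, p2):
    """Angle in degrees at `joint` between rays joint->p1 and joint->p2."""
    a1 = math.atan2(p1[1] - joint[1], p1[0] - joint[0])
    a2 = math.atan2(p2[1] - joint[1], p2[0] - joint[0])
    deg = abs(math.degrees(a1 - a2))
    return min(deg, 360.0 - deg)  # take the smaller of the two arcs

# A right angle formed at the wrist by two peaks:
print(angle_at((0, 0), (1, 0), (0, 1)))  # 90.0
```

A vector of such angles (one per adjacent pair of peaks) gives a compact, scale-insensitive description of a hand pose.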


2021 ◽  
Author(s):  
Shibin Xuan ◽  
Kuan Wang ◽  
Lixia Liu ◽  
Chang Liu ◽  
Jiaxiang Li

Skeleton-based human action recognition has been a research hotspot in recent years, but most of the research focuses on spatio-temporal feature extraction with convolutional neural networks. To improve the recognition accuracy of these models, this paper proposes three strategies: using an algebraic method to reduce redundant video frames, adding auxiliary edges to the joint adjacency graph to improve the skeleton graph structure, and adding virtual classes to disperse misclassification errors. Experimental results on the NTU-RGB-D60, NTU-RGB-D120 and Kinetics Skeleton 400 databases show that the proposed strategies effectively improve the accuracy of the original algorithm.
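The auxiliary-edge strategy can be sketched as augmenting the skeleton's adjacency matrix with non-physical connections, e.g. between the two hands, so that graph convolutions can relate distant joints directly. The 5-joint skeleton and the chosen extra edge below are illustrative, not the paper's graph:

```python
# Minimal sketch of a skeleton adjacency graph with auxiliary edges.

def build_adjacency(num_joints, bones, auxiliary_edges=()):
    """Symmetric adjacency matrix from physical bones plus
    auxiliary (non-physical) joint pairs."""
    adj = [[0] * num_joints for _ in range(num_joints)]
    for i, j in list(bones) + list(auxiliary_edges):
        adj[i][j] = adj[j][i] = 1
    return adj

# Toy skeleton: 0=head, 1=torso, 2=left hand, 3=right hand, 4=hip
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
adj = build_adjacency(5, bones, auxiliary_edges=[(2, 3)])
print(adj[2][3])  # 1: hands connected by an auxiliary edge
```

Without the auxiliary edge, information between the hands would have to pass through the torso node over two convolution hops.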


2014 ◽  
Vol 981 ◽  
pp. 331-334
Author(s):  
Ming Yang ◽  
Yong Yang

In this paper, we introduce high-performance deformable part models from object detection into human action recognition and localization, and propose a unified method to detect actions in video sequences. Deformable part models have attracted intensive attention in the field of object detection. We generalize the approach from 2D still images to 3D spatiotemporal volumes. Human actions are described by features based on 3D histograms of oriented gradients. Different poses are represented by mixtures of models at different resolutions. The model autonomously selects the most discriminative 3D parts and learns their anchor positions relative to the root. Empirical results on several video datasets demonstrate the efficacy of the proposed method on both action recognition and localization.
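The 2D building block that the paper extends to 3D spatiotemporal volumes is the oriented-gradient histogram: gradient magnitudes accumulated into orientation bins. A minimal sketch with illustrative gradients and bin count (the 3D version adds a temporal gradient component):

```python
import math

# Minimal sketch of an oriented-gradient histogram over [0, 2*pi).

def orientation_histogram(gradients, num_bins=8):
    """Accumulate gradient magnitudes into equally spaced orientation bins."""
    hist = [0.0] * num_bins
    for gx, gy in gradients:
        mag = math.hypot(gx, gy)
        ang = math.atan2(gy, gx) % (2 * math.pi)
        hist[int(ang / (2 * math.pi) * num_bins) % num_bins] += mag
    return hist

# Two horizontal gradients and one vertical gradient, four bins:
grads = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(orientation_histogram(grads, num_bins=4))  # [2.0, 1.0, 0.0, 0.0]
```

Computing one such histogram per cell of a spatiotemporal grid and concatenating them yields the 3D HOG-style descriptor the abstract refers to.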


Human activities can be classified into human actions, interactions, object-human interactions and group actions. Recognizing the actions in an input video is very useful in computer vision technology. This work develops a model that can detect and recognize such actions. HAR applications include surveillance systems, healthcare systems, the military, patient monitoring systems (PMS), etc., which involve interactions between electronic devices, such as human-computer interfaces, and persons. First, videos containing actions or interactions performed by humans were collected. Each input video was converted into a number of frames, which then underwent a preprocessing stage using a median filter. The median filter identifies noise in a frame and replaces it with the median of the neighboring pixels. Desired features were then extracted from the frames, and the action performed by the person in the video is recognized using these extracted features. Three spatial temporal interest point (STIP) techniques, Harris STIP, Gabor STIP and HOG STIP, were used for feature extraction from the video frames. An SVM algorithm is applied to classify the extracted features, and the action recognition is based on the colored label assigned by the classifier. System performance is measured by calculating the classifier's accuracy, sensitivity and specificity. Accuracy represents the classifier's reliability, while specificity and sensitivity represent how exactly the classifier assigns features to their correct categories and how well it rejects features that do not belong to a particular category.
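The median-filter preprocessing step above can be sketched directly: each pixel is replaced by the median of its 3×3 neighbourhood (with edges clamped). The tiny frame below, with a single salt-noise pixel, is illustrative:

```python
import statistics

# Minimal sketch of 3x3 median filtering on a 2D frame of intensities.

def median_filter(frame):
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbour coordinates at the frame border.
            neigh = [frame[max(0, min(h - 1, y + dy))][max(0, min(w - 1, x + dx))]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = statistics.median(neigh)
    return out

noisy = [[10, 10, 10],
         [10, 255, 10],   # isolated noise pixel
         [10, 10, 10]]
print(median_filter(noisy)[1][1])  # 10: the noise pixel is replaced
```

Unlike a mean filter, the median discards the outlier entirely rather than smearing it into the neighbourhood, which is why it suits salt-and-pepper noise.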


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1005 ◽  
Author(s):  
Hong-Bo Zhang ◽  
Yi-Xiang Zhang ◽  
Bineng Zhong ◽  
Qing Lei ◽  
Lijie Yang ◽  
...  

Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action recognition methods using depth data, 3D-skeleton data, still image data, spatiotemporal interest point-based methods, and human walking motion recognition. However, there has been no systematic survey of human action recognition. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods. Finally, we present several analysis recommendations for researchers. This survey paper provides an essential reference for those interested in further research on human action recognition.


Author(s):  
Somaya Maadeed ◽  
Noor Almaadeed ◽  
Omar Elharrouss

Face recognition and video summarization represent challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports applications. For long videos, it is difficult to search within a video for a specific action and/or person. Usually, human action recognition approaches presented in the literature deal with videos that contain only a single person whose action is to be recognized. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple action detection extracts the silhouettes of human bodies and then generates a specific sequence for each of them using a motion detection and tracking method. Each of the extracted sequences is then divided into shots that represent homogeneous actions in the sequence, using the similarity between each pair of frames. Using the histogram of oriented gradients (HOG) of the temporal difference map (TDMap) of the frames of each shot, the action is recognized by comparing the generated HOG with the HOGs produced in the training phase, which represent many actions learned from a set of training videos.
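The temporal difference map can be sketched as the accumulation of absolute differences between consecutive frames of a shot, producing a single map whose bright regions trace the motion. The frames below are illustrative flat lists of pixels, and this is an assumed formulation of the TDMap rather than the paper's exact definition:

```python
# Minimal sketch of a temporal difference map (TDMap): per-pixel sum of
# absolute differences over consecutive frame pairs in one shot.

def tdmap(frames):
    """Accumulate |frame[t+1] - frame[t]| per pixel across the shot."""
    acc = [0] * len(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        for i, (p, c) in enumerate(zip(prev, cur)):
            acc[i] += abs(c - p)
    return acc

shot = [[0, 0, 0, 0],
        [0, 5, 0, 0],   # motion at pixel 1
        [0, 5, 3, 0]]   # motion at pixel 2
print(tdmap(shot))  # [0, 5, 3, 0]
```

The HOG descriptor is then computed on this single map instead of on every frame, which compresses a whole shot into one fixed-size feature.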


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper, a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach obtains an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
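The finite-state-machine idea can be sketched as per-frame silhouette labels driving state transitions, with an accepting state yielding a recognized action. The states, labels, and the "sit down" action below are illustrative assumptions, not the paper's actual machine:

```python
# Minimal sketch of FSM-based action recognition over per-frame
# silhouette labels. Unknown (state, label) pairs leave the state unchanged.

TRANSITIONS = {
    ("standing", "bent"): "bending",
    ("bending", "low"): "sitting",       # accepting state
    ("bending", "upright"): "standing",  # action aborted
}

def recognize(frame_labels, start="standing", accept="sitting"):
    state = start
    for label in frame_labels:
        state = TRANSITIONS.get((state, label), state)
        if state == accept:
            return "sit down"
    return None

print(recognize(["upright", "bent", "bent", "low"]))  # sit down
print(recognize(["upright", "bent", "upright"]))      # None
```

Because each frame costs only a dictionary lookup, this stage adds essentially nothing to the per-frame budget, consistent with the real-time claim above.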


2018 ◽  
Vol 6 (10) ◽  
pp. 323-328
Author(s):  
K. Kiruba ◽  
D. Shiloah Elizabeth ◽  
C Sunil Retmin Raj
