Action Recognition by Joint Spatial-Temporal Motion Feature

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Weihua Zhang ◽  
Yi Zhang ◽  
Chaobang Gao ◽  
Jiliu Zhou

This paper introduces a method for human action recognition based on optical flow motion feature extraction. Automatic spatial and temporal alignments are combined to enforce temporal consistency within each action by an enhanced dynamic time warping (DTW) algorithm. In addition, a fast method based on a coarse-to-fine DTW constraint is introduced to improve computational performance without reducing accuracy. The main contributions of this study are (1) a joint spatial-temporal multiresolution optical flow computation method that encodes more informative motion information than recently proposed methods, (2) an enhanced DTW method that improves the temporal consistency of motion in action recognition, and (3) a coarse-to-fine DTW constraint on motion feature pyramids that speeds up recognition. Using this method, high recognition accuracy is achieved on different action databases such as the Weizmann and KTH databases.
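
The paper itself gives no pseudocode; as a rough illustration of the banded-DTW idea, the sketch below compares two sequences of per-frame motion descriptors under a Sakoe-Chiba-style band, a simplification standing in for the paper's coarse-to-fine constraint. All names and the Euclidean frame distance are assumptions, not taken from the paper.

```python
import numpy as np

def dtw_banded(x, y, band=8):
    """DTW distance between two descriptor sequences x: (n, d), y: (m, d).

    `band` limits how far the warping path may stray from the diagonal
    (a Sakoe-Chiba-style constraint, standing in for the paper's
    coarse-to-fine DTW constraint).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        c = int(round(i * m / n))  # band centre, scaled to y's length
        for j in range(max(1, c - band), min(m, c + band) + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy usage: two "motion energy" sequences of different lengths.
a = np.sin(np.linspace(0, 3, 40))[:, None]
b = np.sin(np.linspace(0, 3, 55))[:, None]
print(dtw_banded(a, b))
```

Restricting the search to a band around the diagonal is what buys the speedup: the cost matrix shrinks from O(nm) cells to O(n·band) without changing the result when the true alignment stays near the diagonal.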

Drones ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 87
Author(s):  
Ketan Kotecha ◽  
Deepak Garg ◽  
Balmukund Mishra ◽  
Pratik Narang ◽  
Vipul Kumar Mishra

Visual data collected from drones has opened a new direction for surveillance applications and has recently attracted considerable attention among computer vision researchers. Given the availability and increasing use of drones in both the public and private sectors, they are a critical emerging technology for solving surveillance problems in remote areas. One of the fundamental challenges in recognizing human actions in crowd-monitoring videos is the precise modeling of an individual's motion features. Most state-of-the-art methods rely heavily on optical flow for motion modeling and representation, and motion modeling through optical flow is a time-consuming process. This article addresses this issue and provides a novel architecture that eliminates the dependency on optical flow. The proposed architecture uses two sub-modules, FMFM (faster motion feature modeling) and AAR (accurate action recognition), to accurately classify aerial surveillance actions. Another critical issue in aerial surveillance is the scarcity of datasets. Of the few datasets proposed recently, most contain multiple humans performing different actions in the same scene, such as a crowd-monitoring video, and hence are not directly suitable for training action recognition models. Given this, we propose a novel dataset captured from a top-view aerial perspective with good variety in terms of actors, time of day, and environment. The proposed architecture can be applied in different terrains, as it removes the background before applying the action recognition model. It is validated through experiments at varying levels of investigation and achieves a remarkable 0.90 validation accuracy in aerial action recognition.
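
The abstract does not specify how FMFM avoids optical flow; one cheap, flow-free stand-in is stacked frame differencing, sketched below under that assumption. The function and parameter names are illustrative, not the paper's.

```python
import cv2
import numpy as np

def motion_stack(frames, size=(112, 112)):
    """Cheap motion representation without optical flow: stacked
    absolute differences of consecutive grayscale frames (a stand-in
    for the paper's FMFM module, whose exact design is not given).

    frames: list of BGR images; returns a (T-1, H, W) float tensor
    that a downstream classifier (the AAR role) could consume.
    """
    gray = [cv2.cvtColor(cv2.resize(f, size), cv2.COLOR_BGR2GRAY)
            for f in frames]
    diffs = [cv2.absdiff(gray[i + 1], gray[i]) for i in range(len(gray) - 1)]
    return np.stack(diffs, axis=0).astype(np.float32) / 255.0
```

Frame differencing costs one subtraction per pixel, versus the iterative per-pixel optimization of dense optical flow, which is the kind of trade-off the abstract's speed argument rests on.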


2013 ◽  
Vol 373-375 ◽  
pp. 1188-1191
Author(s):  
Ju Zhong ◽  
Hua Wen Liu ◽  
Chun Li Lin

The extraction methods of both the shape feature based on Fourier descriptors and the motion feature in the time domain were introduced. These features were fused to obtain a hybrid feature with higher discriminative ability. This combined representation was used for human action recognition. The experimental results show that the proposed hybrid feature achieves effective recognition performance on the Weizmann action database.
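
As a concrete illustration of the shape half of this hybrid, the sketch below computes classical Fourier descriptors from a closed silhouette contour (e.g., one returned by OpenCV's findContours). The normalization choices are the standard ones and are assumed here rather than taken from the paper.

```python
import numpy as np

def fourier_descriptors(contour, k=16):
    """Translation/scale/rotation-invariant shape descriptor from a
    closed contour given as an (n, 2) array of (x, y) boundary points."""
    z = contour[:, 0] + 1j * contour[:, 1]  # boundary as a complex signal
    F = np.fft.fft(z)
    F[0] = 0.0                   # drop the DC term -> translation invariance
    mag = np.abs(F)              # magnitudes discard phase -> rotation and
                                 # start-point invariance
    mag /= mag[1]                # normalise by |F[1]| -> scale invariance
    return mag[1:k + 1]          # low-order coefficients capture coarse shape
```

Keeping only the first k coefficients acts as a low-pass filter on the boundary, which is what gives Fourier descriptors their robustness to small contour noise.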


2010 ◽  
Vol 22 (3) ◽  
pp. 413-426 ◽  
Author(s):  
Andrea Serino ◽  
Laura De Filippo ◽  
Chiara Casavecchia ◽  
Michela Coccia ◽  
Maggie Shiffrar ◽  
...  

Several studies have shown that the motor system is involved in action perception, suggesting that action concepts are represented through sensory–motor processes. Such conclusions imply that motor system impairments should diminish action perception. To test this hypothesis, a group of 10 brain-damaged patients with hemiplegia (specifically, a lesion to the motor system that affected the contralesional arm) viewed point-light displays of arm gestures and attempted to name each gesture. To create the dynamic stimuli, patients individually performed simple gestures with their unaffected arm while being videotaped. The videotapes were converted into point-light animations. Each action was presented as it had been performed, that is, as having been produced by the observer's unaffected arm, and in its mirror-reversed orientation, that is, as having been produced by the observer's hemiplegic arm. Action recognition accuracy by patients with hemiplegia was compared with that by 8 brain-damaged patients without any motor deficit and by 10 healthy controls. Overall, performance was better in control observers than in patients. Most importantly, performance by hemiplegic patients, but not by nonhemiplegic patients and controls, varied systematically as a function of the observed limb. Action recognition was best when hemiplegic patients viewed actions that appeared to have been performed by their unaffected arm. Action recognition performance dropped significantly when hemiplegic patients viewed actions that appeared to have been produced with their hemiplegic arm or the corresponding arm of another person. The results of a control study involving the recognition of point-light-defined animals in motion indicate that a generic deficit to visual and cognitive functions cannot account for this laterality-specific deficit in action recognition. Taken together, these results suggest that motor cortex impairment decreases visual sensitivity to human action. Specifically, when a cortical lesion renders an observer incapable of performing an observed action, action perception is compromised, possibly by a failure to map the observed action onto the observer's contralesional hemisoma.


Author(s):  
Mohammad Farhad Bulbul ◽  
Yunsheng Jiang ◽  
Jinwen Ma

Emerging cost-effective depth sensors have facilitated the action recognition task significantly. In this paper, the authors address the action recognition problem using depth video sequences, combining three discriminative features. More specifically, the authors generate three Depth Motion Maps (DMMs) over the entire video sequence corresponding to the front, side, and top projection views. Contourlet-based Histogram of Oriented Gradients (CT-HOG), Local Binary Patterns (LBP), and Edge Oriented Histograms (EOH) are then computed from the DMMs. To merge these features, the authors consider decision-level fusion, where a soft decision-fusion rule, the Logarithmic Opinion Pool (LOGP), combines the classification outcomes from multiple classifiers, each trained on an individual set of features. Experimental results on two datasets reveal that the fusion scheme achieves superior action recognition performance compared with using each feature individually.
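
A minimal sketch of two of these ingredients, assuming depth frames are available as arrays: a front-view Depth Motion Map accumulated from frame differences, and LOGP fusion as a weighted geometric mean of per-classifier posteriors. The uniform weighting is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def depth_motion_map(depth_frames):
    """DMM for one projection view: accumulated absolute differences of
    consecutive projected depth frames. The front view is shown; side
    and top DMMs come from projecting depth onto the other two axes.

    depth_frames: (T, H, W) array of depth images.
    """
    d = np.asarray(depth_frames, dtype=np.float32)
    return np.abs(np.diff(d, axis=0)).sum(axis=0)

def logp_fusion(posteriors, weights=None):
    """Logarithmic Opinion Pool: fuse per-classifier class posteriors
    (a list of (C,) probability vectors) by a weighted geometric mean,
    i.e. a weighted sum in log space, then renormalise."""
    P = np.stack(posteriors)                                   # (K, C)
    w = np.ones(len(P)) / len(P) if weights is None else np.asarray(weights)
    logp = (w[:, None] * np.log(P + 1e-12)).sum(axis=0)
    fused = np.exp(logp)
    return fused / fused.sum()
```

Because LOGP multiplies posteriors rather than averaging them, a class must be supported by all classifiers to score highly, which tends to sharpen the fused decision relative to a linear opinion pool.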


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Shaoping Zhu ◽  
Limin Xia

A novel method based on a hybrid feature is proposed for human action recognition in video image sequences, comprising two stages: feature extraction and action recognition. First, an adaptive background subtraction algorithm extracts a global silhouette feature and an optical flow model extracts a local optical flow feature. The global silhouette feature vector and the local optical flow feature vector are then combined into a hybrid feature vector. Second, to improve recognition accuracy, an optimized Multiple Instance Learning algorithm is used to recognize human actions, in which an Iterative Querying Heuristic (IQH) optimization algorithm trains the Multiple Instance Learning model. We demonstrate that our hybrid-feature-based action representation can effectively classify novel actions on two different data sets. Experiments show that our results are comparable to, and in some cases significantly better than, those of two state-of-the-art approaches on these data sets, meeting requirements for stability, reliability, high precision, and robustness to interference.
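
A rough sketch of the feature-extraction stage using standard OpenCV building blocks (MOG2 background subtraction and Farnebäck optical flow). The specific descriptors, a coarse silhouette grid plus a flow-orientation histogram, are assumptions for illustration rather than the paper's exact design.

```python
import cv2
import numpy as np

bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def hybrid_feature(prev_gray, gray, frame):
    """Concatenate a global silhouette feature with a local optical
    flow feature for one frame (illustrative descriptor choices).

    prev_gray, gray: consecutive grayscale frames; frame: current BGR frame.
    """
    mask = bg.apply(frame)                              # silhouette mask
    sil = cv2.resize(mask, (16, 16)).flatten() / 255.0  # coarse global grid
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
    return np.concatenate([sil, hist / (hist.sum() + 1e-6)])
```

The silhouette grid captures the global body configuration while the magnitude-weighted orientation histogram captures local motion, which is exactly the complementarity the hybrid feature is meant to exploit.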


2020 ◽  
Vol 34 (07) ◽  
pp. 12886-12893
Author(s):  
Xiao-Yu Zhang ◽  
Haichao Shi ◽  
Changsheng Li ◽  
Peng Li

Weakly supervised action recognition and localization for untrimmed videos is a challenging problem with extensive applications. The overwhelming irrelevant background content in untrimmed videos severely hampers effective identification of the actions of interest. In this paper, we propose a novel multi-instance multi-label modeling network based on spatio-temporal pre-trimming to recognize actions and locate the corresponding frames in untrimmed videos. Motivated by the fact that the person is the key factor in a human action, we spatially and temporally segment each untrimmed video into person-centric clips using pose estimation and tracking techniques. Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem. The network is optimized iteratively with selective coarse-to-fine pre-trimming based on instance-label activation. After convergence, temporal localization is further achieved with a local-global temporal class activation map. Extensive experiments are conducted on two benchmark datasets, i.e., THUMOS14 and ActivityNet1.3, and the experimental results clearly corroborate the efficacy of our method compared with state-of-the-art approaches.
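
The core multi-instance multi-label idea can be sketched in a few lines: per-clip class scores are pooled into a video-level prediction and trained against video-level labels only. The max-pooling and binary cross-entropy below are common choices assumed for illustration; the paper's network adds person-centric pre-trimming and iterative refinement on top of this formulation.

```python
import numpy as np

def video_level_loss(instance_logits, video_labels):
    """Multi-instance multi-label aggregation.

    instance_logits: (N, C) raw scores for N person-centric clips and
    C action classes; video_labels: (C,) binary video-level labels.
    A class is treated as present if any clip exhibits it (max pooling),
    and the pooled prediction is scored with binary cross-entropy.
    """
    probs = 1.0 / (1.0 + np.exp(-instance_logits))  # per-clip sigmoid
    video_probs = probs.max(axis=0)                 # pool instances per class
    y = np.asarray(video_labels, dtype=np.float32)
    eps = 1e-7
    return -(y * np.log(video_probs + eps)
             + (1 - y) * np.log(1 - video_probs + eps)).mean()
```

Because gradients flow only through the maximally activated clips, training implicitly selects which instances explain each video-level label, which is what later enables frame-level localization from video-level supervision.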


2014 ◽  
Vol 989-994 ◽  
pp. 2731-2734
Author(s):  
Hai Long Jia ◽  
Kun Cao

The choice of motion features directly affects the result of a human action recognition method. A single feature is often influenced differently by many factors, such as the appearance of the human body, the environment, and the video camera, so the accuracy of action recognition is limited. Based on a study of the representation and recognition of human actions, and giving full consideration to the advantages and disadvantages of different features, this paper proposes a mixed feature that combines a global silhouette feature and a local optical flow feature. This combined representation is used for human action recognition.
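
As a sketch of what such a mixed feature might look like, the snippet below concatenates a global shape cue (Hu moments of the binary silhouette) with a local motion cue (a flow-orientation histogram restricted to the silhouette region). Both descriptor choices are assumptions for illustration, not the paper's specification.

```python
import cv2
import numpy as np

def mixed_feature(silhouette, flow):
    """Mix a global shape cue with a local motion cue.

    silhouette: (H, W) binary mask; flow: (H, W, 2) dense optical flow
    for the same frame.
    """
    hu = cv2.HuMoments(cv2.moments(silhouette)).flatten()
    hu = np.sign(hu) * np.log10(np.abs(hu) + 1e-30)  # compress dynamic range
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    m = silhouette > 0                               # only motion on the body
    hist, _ = np.histogram(ang[m], bins=8, range=(0, 2 * np.pi),
                           weights=mag[m])
    return np.concatenate([hu, hist / (hist.sum() + 1e-6)])
```

Masking the flow histogram with the silhouette is one simple way to reduce the camera and background influence the abstract identifies as the weakness of any single feature.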

