The Complex Action Recognition via the Correlated Topic Model

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Hong-bin Tu ◽  
Li-min Xia ◽  
Zheng-wu Wang

Human complex action recognition is an important research area within action recognition. Among the various obstacles to human complex action recognition, one of the most challenging is self-occlusion, where one body part occludes another. This paper presents a new method for human complex action recognition based on optical flow and the correlated topic model (CTM). First, a Markov random field is used to represent the occlusion relationships between human body parts in terms of an occlusion state variable. Second, structure from motion (SFM) is used to reconstruct the missing data in point trajectories. Key frames are then extracted using motion features from the optical flow, and width-to-height ratios are extracted from the human silhouette. Finally, the CTM is used to classify actions. Experiments were performed on the KTH, Weizmann, and UIUC action datasets to evaluate the proposed method. The results showed that the proposed method was more effective than the compared methods.
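The key-frame step above ranks frames by the motion energy of the optical flow. A minimal numpy sketch of that idea, using frame differencing as a crude stand-in for the flow magnitude (the function names and the toy clip are hypothetical, not from the paper):

```python
import numpy as np

def motion_energy(frames):
    """Per-frame motion energy: mean absolute intensity change between
    consecutive frames (a simple stand-in for the mean optical-flow
    magnitude used in the paper)."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))    # (T-1, H, W) frame changes
    energy = diffs.mean(axis=(1, 2))           # one value per transition
    return np.concatenate([[0.0], energy])     # pad so len == T

def select_key_frames(frames, k=3):
    """Return indices of the k frames with the highest motion energy."""
    energy = motion_energy(frames)
    return sorted(np.argsort(energy)[-k:].tolist())

# Toy clip: 6 blank frames with a bright block moving through frames 2-4.
clip = np.zeros((6, 8, 8))
for t, x in zip((2, 3, 4), (1, 3, 5)):
    clip[t, 2:5, x:x + 2] = 1.0
print(select_key_frames(clip, k=3))    # the high-motion frames
```

In a real pipeline the energy would come from a dense optical-flow field rather than raw frame differences, but the selection logic is the same.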

Drones ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 87
Author(s):  
Ketan Kotecha ◽  
Deepak Garg ◽  
Balmukund Mishra ◽  
Pratik Narang ◽  
Vipual Kumar Mishra

Visual data collected from drones has opened a new direction for surveillance applications and has recently attracted considerable attention among computer vision researchers. Given the availability and increasing use of drones in both the public and private sectors, they are a critical future technology for solving multiple surveillance problems in remote areas. One of the fundamental challenges in recognizing human actions in crowd-monitoring videos is the precise modeling of an individual's motion features. Most state-of-the-art methods rely heavily on optical flow for motion modeling and representation, yet motion modeling through optical flow is a time-consuming process. This article addresses this issue and provides a novel architecture that eliminates the dependency on optical flow. The proposed architecture uses two sub-modules, FMFM (faster motion feature modeling) and AAR (accurate action recognition), to accurately classify aerial surveillance actions. Another critical issue in aerial surveillance is the scarcity of datasets. Of the few datasets proposed recently, most contain multiple humans performing different actions in the same scene, as in crowd-monitoring video, and are hence not suitable for directly training action recognition models. Given this, we have proposed a novel dataset captured from top-view aerial surveillance with good variety in actors, time of day, and environment. The proposed architecture can be applied in different terrains because it removes the background before applying the action recognition model. The architecture is validated through experiments at varying levels of investigation and achieves a remarkable validation accuracy of 0.90 in aerial action recognition.
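The background-removal step mentioned above can be sketched with a simple temporal-median model: pixels close to the per-pixel median over time are treated as static background and zeroed. This is an illustrative stand-in, not the paper's actual module; the function name and threshold are assumptions:

```python
import numpy as np

def remove_background(frames, thresh=0.2):
    """Median temporal background subtraction: estimate the static scene
    as the per-pixel median over time, then keep only pixels that differ
    from it by more than `thresh`."""
    frames = np.asarray(frames, dtype=float)
    background = np.median(frames, axis=0)       # static-scene estimate
    mask = np.abs(frames - background) > thresh  # foreground pixels
    return frames * mask

# A static floor (value 0.5) with an actor (value 1.0) passing through.
video = np.full((5, 4, 4), 0.5)
for t in range(5):
    video[t, 1:3, t % 4] = 1.0
foreground = remove_background(video)            # only the actor survives
```

For aerial footage with a moving camera, the background model would first require frame registration; the thresholding step itself is unchanged.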


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Weihua Zhang ◽  
Yi Zhang ◽  
Chaobang Gao ◽  
Jiliu Zhou

This paper introduces a method for human action recognition based on optical flow motion feature extraction. Automatic spatial and temporal alignments are combined to enforce temporal consistency for each action via an enhanced dynamic time warping (DTW) algorithm. In addition, a fast method based on a coarse-to-fine DTW constraint is introduced to improve computational performance without reducing accuracy. The main contributions of this study are (1) a joint spatial-temporal multiresolution optical flow computation method that encodes more informative motion information than recently proposed methods, (2) an enhanced DTW method that improves the temporal consistency of motion in action recognition, and (3) a coarse-to-fine DTW constraint on motion feature pyramids that speeds up recognition. Using this method, high recognition accuracy is achieved on action databases such as the Weizmann and KTH databases.
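The coarse-to-fine constraint above works by restricting the DTW search to a band around a cheaply computed coarse alignment. A minimal sketch of banded DTW on 1-D feature sequences (a Sakoe-Chiba-style window standing in for the paper's pyramid-derived constraint; names are illustrative):

```python
import numpy as np

def dtw_banded(a, b, band=None):
    """DTW distance between two feature sequences. `band` restricts the
    warping path to |i - j| <= band, the kind of window a coarse
    alignment pass can supply to speed up the fine pass."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = (1, m) if band is None else (max(1, i - band), min(m, i + band))
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, slightly stretched
print(dtw_banded(x, y))              # -> 0.0: warping absorbs the stretch
print(dtw_banded(x, y, band=2))      # -> 0.0: band wide enough, same result
```

The band cuts the inner loop from O(nm) cells to O(n · band); accuracy is preserved whenever the true path stays inside the window, which is exactly what the coarse pass is used to guarantee.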


Author(s):  
Chunyan Xu ◽  
Rong Liu ◽  
Tong Zhang ◽  
Zhen Cui ◽  
Jian Yang ◽  
...  

In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the graph convolution learning process on both the video and skeleton modalities. To effectively represent the skeletal graph of discrete joints, we create a structured graph convolution module specifically designed to encode partitioned body parts along with their dynamic interactions in the spatio-temporal sequence. In more detail, we build a set of structured intra-part graphs, each of which represents a distinctive body part (e.g., left arm, right leg, head). The inter-part graph is then constructed to model the dynamic interactions across different body parts; here each node corresponds to an intra-part graph built above, while an edge between two nodes expresses the internal relationships of human movement. We implement graph convolution learning on both intra- and inter-part graphs in order to capture the inherent characteristics and dynamic interactions, respectively, of human action. After integrating the intra- and inter-levels of spatial context/coordinate cues, a convolution filtering process is conducted on time slices to capture the temporal dynamics of human motion. Finally, we fuse the two streams of graph convolution responses in order to predict the category of human action in an end-to-end fashion. Comprehensive experiments on five single/multi-modal benchmark datasets (including NTU RGB+D 60, NTU RGB+D 120, MSR-Daily 3D, N-UCLA, and HDM05) demonstrate that the proposed DS-SGCN framework achieves encouraging performance on the skeleton-based action recognition task.
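The core operation behind such skeleton models is graph convolution over joint features. A generic single-layer sketch in numpy (symmetric adjacency normalization with self-loops, as in standard GCNs; the toy arm graph and identity weights are illustrative assumptions, not the DS-SGCN architecture itself):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step: normalize the adjacency (with
    self-loops) symmetrically, propagate joint features, apply ReLU.
    X: (J, F) joint features, A: (J, J) adjacency, W: (F, F') weights."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)  # ReLU activation

# Toy intra-part graph: 3 joints of an arm chained shoulder-elbow-wrist.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[0.0, 1.0],   # (x, y) coordinates as input features
              [1.0, 1.0],
              [2.0, 0.5]])
W = np.eye(2)               # identity weights, for illustration only
H = gcn_layer(X, A, W)      # each joint now mixes its neighbors' features
```

Stacking such layers on intra-part graphs, then on the inter-part graph whose nodes summarize whole parts, gives the two-level spatial modeling described above; temporal convolution over frames completes the picture.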


Author(s):  
Yinzhong Qian ◽  
Wenbin Chen ◽  
I-fan Shen

This paper addresses the problem of action recognition from body pose. Detecting body pose in static images is highly challenging because of pose variability. Our method is based on action-specific hierarchical poselets. We use hierarchical body parts, each of which is represented by a set of poselets that capture the pose variability of that part. The pose signature of a body part is a vector of the detection responses of all poselets for that part. To suppress detection errors and ambiguity, we use a part-based model (PBM) as detection context. We propose a constrained optimization algorithm for detecting all poselets of each part in the context of the PBM, which recovers neglected pose cues through global optimization. We use a PBM with a hierarchical part structure, in which the granularity of body parts decreases steadily from the whole body down to limb parts. From this structure we obtain models of different depths to study the saliency of different body parts in action recognition. The pose signature of an action image is composed of the pose signatures of all body parts in the PBM, which provides rich discriminative information for our task. We evaluate our algorithm on two datasets. Compared with counterpart methods, the pose signature yields a clear performance improvement on the static image dataset. When the model trained on the static image dataset is used to label detected actors in the video dataset, the pose signature achieves state-of-the-art performance.
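The pose-signature construction above amounts to concatenating per-part poselet responses into one vector, which can then feed any classifier. A hedged sketch in numpy (the part names, response values, action templates, and cosine-similarity classifier are all hypothetical stand-ins for the paper's trained setup):

```python
import numpy as np

def pose_signature(part_responses):
    """Pose signature: concatenation of poselet detection responses
    over all body parts in the (hypothetical) part hierarchy."""
    return np.concatenate([np.asarray(r, dtype=float) for r in part_responses])

def classify(signature, action_templates):
    """Nearest-template action label by cosine similarity -- an
    illustrative classifier, not the paper's learned model."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max(action_templates, key=lambda a: cos(signature, action_templates[a]))

# Hypothetical responses: 2 poselets for the torso, 3 for the left arm.
sig = pose_signature([[0.9, 0.1], [0.2, 0.7, 0.05]])
templates = {"wave": np.array([0.8, 0.2, 0.1, 0.8, 0.1]),
             "run":  np.array([0.1, 0.9, 0.7, 0.1, 0.6])}
print(classify(sig, templates))   # picks the closest action template
```

Restricting the concatenation to parts at a given depth of the hierarchy is how the saliency study described above compares models of different depths.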


Author(s):  
Carol Priestley

This chapter discusses body part nouns, a part of language that is central to human life, and the polysemy that arises in connection with them. Examples from everyday speech and narrative in various contexts are examined in a Papuan language called Koromu and semantic characteristics of body part nouns in other studies are also considered. Semantic templates are developed for nouns that represent highly visible body parts: for example, wapi ‘hands/arms’, ehi ‘feet/legs’, and their related parts. Culture-specific explications are expressed in a natural metalanguage that can be translated into Koromu to avoid the cultural bias inherent in using other languages and to reveal both distinctive semantic components and similarities to cross-linguistic examples.


Author(s):  
Laura Mora ◽  
Anna Sedda ◽  
Teresa Esteban ◽  
Gianna Cocchini

The representation of the metrics of the hands is distorted but malleable, as shown by expert dexterity (magicians) and long-term tool use (baseball players). However, it remains unclear whether this modulation leads to a stable representation of the hand that is adopted in every circumstance, or whether the modulation is closely linked to the spatial context where the expertise occurs. To this end, a group of 10 experienced Sign Language (SL) interpreters was recruited to study the selective influence of expertise and spatial localisation on the metric representation of the hands. Experiment 1 explored differences in hand-size representation between the SL interpreters and 10 age-matched controls in near-reaching (Condition 1) and far-reaching space (Condition 2), using the localisation task. SL interpreters presented reduced hand size in the near-reaching condition, with characteristic underestimation of finger lengths and reduced overestimation of hand and wrist widths in comparison with controls. This difference was lost in far-reaching space, confirming that the effect of expertise on hand representations is closely linked to the spatial context where an action is performed. As SL interpreters are also experts in the use of their face for communication, the effects of expertise on the metrics of the face were also studied (Experiment 2). SL interpreters were more accurate than controls, with an overall reduction of width overestimation. Overall, expertise modifies the representation of relevant body parts in a specific and context-dependent manner. Hence, different representations of the same body part can coexist simultaneously.


2010 ◽  
Vol 16 (4) ◽  
pp. 112-121 ◽  
Author(s):  
Brennen W. Mills ◽  
Owen B. J. Carter ◽  
Robert J. Donovan

The objective of this case study was to experimentally manipulate the impact on arousal and recall of two characteristics frequently occurring in gruesome depictions of body parts in smoking cessation advertisements: the presence or absence of an external physical insult to the body part depicted; whether or not the image contains a clear figure/ground demarcation. Three hundred participants (46% male, 54% female; mean age 27.3 years, SD = 11.4) participated in a two-stage online study wherein they viewed and responded to a series of gruesome 4-s video images. Seventy-two video clips were created to provide a sample of images across the two conditions: physical insult versus no insult and clear figure/ground demarcation versus merged or no clear figure/ground demarcation. In stage one, participants viewed a randomly ordered series of 36 video clips and rated how “confronting” they considered each to be. Seven days later (stage two), to test recall of each video image, participants viewed all 72 clips and were asked to identify those they had seen previously. Images containing a physical insult were consistently rated more confronting and were remembered more accurately than images with no physical insult. Images with a clear figure/ground demarcation were rated as no more confronting but were consistently recalled with greater accuracy than those with unclear figure/ground demarcation. Makers of gruesome health warning television advertisements should incorporate some form of physical insult and use a clear figure/ground demarcation to maximize image recall and subsequent potential advertising effectiveness.


Author(s):  
André Souza Brito ◽  
Marcelo Bernardes Vieira ◽  
Saulo Moraes Villela ◽  
Hemerson Tacon ◽  
Hugo Lima Chaves ◽  
...  
