Action Recognition by Weakly-Supervised Discriminative Region Localization

Author(s):  
Hakan Boyraz ◽  
Syed Zain Masood ◽  
Baoyuan Liu ◽  
Marshall Tappen

Author(s):  
Junnan Li ◽  
Jianquan Liu ◽  
Yongkang Wang ◽  
Shoji Nishimura ◽  
Mohan S. Kankanhalli

2020 ◽  
Vol 34 (07) ◽  
pp. 12886-12893
Author(s):  
Xiao-Yu Zhang ◽  
Haichao Shi ◽  
Changsheng Li ◽  
Peng Li

Weakly supervised action recognition and localization for untrimmed videos is a challenging problem with extensive applications. The overwhelming irrelevant background content in untrimmed videos severely hampers effective identification of the actions of interest. In this paper, we propose a novel multi-instance multi-label modeling network based on spatio-temporal pre-trimming to recognize actions and locate corresponding frames in untrimmed videos. Motivated by the fact that the person is the key factor in a human action, we spatially and temporally segment each untrimmed video into person-centric clips using pose estimation and tracking techniques. Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem. The network is optimized iteratively with selective coarse-to-fine pre-trimming based on instance-label activation. After convergence, temporal localization is further achieved with a local-global temporal class activation map. Extensive experiments are conducted on two benchmark datasets, i.e., THUMOS14 and ActivityNet1.3, and the experimental results clearly corroborate the efficacy of our method compared with state-of-the-art approaches.
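The bag-of-instances formulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each person-centric clip is already encoded as a feature vector, uses a plain linear classifier, and uses max pooling over instances to obtain video-level class activations; all names (`miml_video_scores`, the feature dimensions) are hypothetical.

```python
import numpy as np

def miml_video_scores(instance_feats, weights, bias):
    """Hypothetical multi-instance multi-label scoring sketch:
    each person-centric clip (instance) gets per-class logits from
    a linear classifier, then max pooling over the bag of instances
    yields video-level activations, so only video-level labels are
    needed for supervision."""
    inst_logits = instance_feats @ weights + bias     # (n_instances, n_classes)
    video_logits = inst_logits.max(axis=0)            # bag-level max pooling
    video_probs = 1.0 / (1.0 + np.exp(-video_logits))  # sigmoid for multi-label
    return video_probs, inst_logits

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))   # 8 person-centric clips, 16-dim features
W = rng.standard_normal((16, 5))       # 5 hypothetical action classes
b = np.zeros(5)
probs, inst = miml_video_scores(feats, W, b)
```

The per-instance logits `inst` are what a coarse-to-fine pre-trimming step could threshold to discard low-activation instances between training iterations.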


Author(s):  
Xiao-Yu Zhang ◽  
Haichao Shi ◽  
Changsheng Li ◽  
Kai Zheng ◽  
Xiaobin Zhu ◽  
...  

Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed into short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming. In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames and recognize actions in untrimmed videos. Our proposed framework consists of two major components. First, for action frame localization, we take advantage of the self-attention mechanism to weight each frame, such that the influence of background frames can be effectively eliminated. Second, considering that publicly available trimmed videos contain useful information to leverage, we present an additional module to transfer knowledge from trimmed videos to improve classification performance on untrimmed ones. Extensive experiments are conducted on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.
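The self-attention frame weighting mentioned in this abstract can be sketched as follows. This is a simplified illustration under assumptions of our own (a single learned scoring vector and softmax normalization over frames), not the paper's actual architecture; the names `attention_pooling` and `w_att` are hypothetical.

```python
import numpy as np

def attention_pooling(frame_feats, w_att):
    """Hypothetical self-attention weighting sketch: a learned
    scoring vector gives one scalar per frame, a softmax normalizes
    the scores into attention weights, and the video representation
    is the attention-weighted sum of frame features, so low-weight
    background frames contribute little."""
    scores = frame_feats @ w_att             # (n_frames,) one score per frame
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    video_feat = weights @ frame_feats       # attention-weighted aggregation
    return video_feat, weights

rng = np.random.default_rng(1)
feats = rng.standard_normal((30, 16))  # 30 frames, 16-dim features each
w = rng.standard_normal(16)
video_feat, att = attention_pooling(feats, w)
```

The per-frame weights `att` double as a localization signal: frames whose weight stays near zero behave like background and can be excluded when locating action frames.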


Author(s):  
Xiao-Yu Zhang ◽  
Changsheng Li ◽  
Haichao Shi ◽  
Xiaobin Zhu ◽  
Peng Li ◽  
...  

Author(s):  
Yi Liu ◽  
Lei Qin ◽  
Zhongwei Cheng ◽  
Yanhao Zhang ◽  
Weigang Zhang ◽  
...  

2021 ◽  
pp. 108068
Author(s):  
Jonghyun Kim ◽  
Gen Li ◽  
Inyong Yun ◽  
Cheolkon Jung ◽  
Joongkyu Kim
