Global relative position space based pooling for fine-grained vehicle recognition

2019, Vol 367, pp. 287-298
Author(s): Ye Xiang, Ying Fu, Hua Huang
Author(s): Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai

Action localization in untrimmed videos is an important topic in video understanding. However, existing action localization methods are restricted to a predefined set of actions and cannot localize unseen activities. We therefore consider a new task, Image-Based Activity Localization, which localizes unseen activities in videos via image queries. This task faces three inherent challenges: (1) eliminating the influence of semantically inessential content in image queries; (2) handling the fuzzy localization caused by inaccurate image queries; and (3) determining the precise boundaries of target segments. To address them, we propose a novel self-attention interaction localizer that retrieves unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to perform multi-step fusion and reasoning over image and video contents. Next, we adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset, ActivityIBAL, by reorganizing the ActivityNet dataset. Extensive experiments demonstrate the effectiveness of our method.
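The abstract does not specify how the relative position encoding enters the region self-attention. As a rough illustration only, the sketch below assumes one common choice: a learnable scalar bias, indexed by the (row, column) offset between two grid regions, is added to the scaled dot-product attention logit. The function `region_self_attention`, the projection matrices `wq`/`wk`/`wv`, and the `rel_bias` table are hypothetical names for this sketch, not the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def region_self_attention(regions, grid_w, wq, wk, wv, rel_bias):
    """Single-head self-attention over image regions laid out on a grid.

    ASSUMPTION: relative position is encoded as a scalar bias per (dy, dx)
    offset added to the attention logit; the paper's exact form is not given.

    regions:  list of N feature vectors (dimension d), row-major over a grid
              with grid_w columns.
    wq/wk/wv: d x d projection matrices (hypothetical parameters).
    rel_bias: dict mapping a relative offset (dy, dx) to a scalar bias;
              missing offsets contribute 0.
    Returns a list of N attended feature vectors.
    """
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    q = [matvec(wq, r) for r in regions]
    k = [matvec(wk, r) for r in regions]
    v = [matvec(wv, r) for r in regions]
    pos = [divmod(i, grid_w) for i in range(len(regions))]  # (row, col) per region

    out = []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            dy = pos[j][0] - pos[i][0]
            dx = pos[j][1] - pos[i][1]
            content = scale * sum(a * b for a, b in zip(qi, kj))  # content term
            logits.append(content + rel_bias.get((dy, dx), 0.0))  # + position term
        weights = softmax(logits)
        # Attended feature: attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


if __name__ == "__main__":
    random.seed(0)
    d, grid_h, grid_w = 4, 2, 3
    regions = [[random.gauss(0, 1) for _ in range(d)] for _ in range(grid_h * grid_w)]
    eye = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    # One scalar per possible offset; random here, learned in a real model.
    rel_bias = {(dy, dx): random.gauss(0, 0.1)
                for dy in range(-(grid_h - 1), grid_h)
                for dx in range(-(grid_w - 1), grid_w)}
    attended = region_self_attention(regions, grid_w, eye, eye, eye, rel_bias)
    print(len(attended), len(attended[0]))  # 6 4
```

Keying the bias on relative offsets rather than absolute positions lets attention favor spatially nearby (or distant) regions independently of their content and of where they sit in the image; a full model would presumably learn these biases jointly with multi-head projections.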


Author(s): Jingjing Zhang, Jingsheng Lei, Shengying Yang, Xinqi Yang

IEEE Access, 2020, Vol 8, pp. 171912-171923
Author(s): Qianqiu Chen, Wei Liu, Xiaoxia Yu
