Learning Local Descriptors with Multi-Level Feature Aggregation and Spatial Context Pyramid

2021
Author(s): Pengpeng Liang, Haoxuanye Ji, Erkang Cheng, Yumei Chai, Liming Wang, ...
2021, Vol 423, pp. 46-56
Author(s): Fushun Zhu, Hua Yan, Xinyue Chen, Tong Li, Zhengyu Zhang

Author(s): Yang Li, Kan Li, Xinxin Wang

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits the hierarchical features of CNNs. We build multi-level video representations by applying our proposed aggregation module at different convolutional layers, and we train the model with deep supervision, which improves both performance and efficiency. To capture temporal structure while preserving more detail about actions, the aggregation module is trainable: it models the temporal evolution of each spatial location and projects the resulting features into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. Integrating this aggregation module into the deeply-supervised CNN provides a promising solution for recognizing actions in videos. We conduct experiments on two action recognition datasets, HMDB51 and UCF101. Results show that our model outperforms state-of-the-art methods.
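
The VLAD-style aggregation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' module: the function name `soft_vlad`, the softmax sharpness `alpha`, and the fixed (non-learned) centers are assumptions made for illustration. In a trainable variant, the centers and assignment weights would presumably be learned parameters.

```python
import numpy as np

def soft_vlad(features, centers, alpha=10.0):
    """Aggregate N local descriptors into one VLAD-style vector.

    features: (N, D) array of local descriptors (hypothetical input).
    centers:  (K, D) array of cluster centers (fixed here, an assumption).
    Returns a (K * D,) L2-normalized vector.
    """
    # Residual of every descriptor to every center: (N, K, D)
    residuals = features[:, None, :] - centers[None, :, :]
    # Squared distances: (N, K)
    d2 = (residuals ** 2).sum(axis=-1)
    # Soft assignment: softmax over centers, closer centers get more weight
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Sum assignment-weighted residuals over descriptors: (K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)
    # Intra-normalize each center's residual sum, then L2-normalize globally
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(20, 8))   # 20 descriptors of dimension 8
    cents = rng.normal(size=(4, 8))    # 4 centers
    print(soft_vlad(feats, cents).shape)
```

The soft assignment keeps the operation differentiable with respect to the centers, which is what makes a learned version of this aggregation possible.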


2021
Author(s): Na Li, Kuangang Fan, Ouyang Qinghua, Yahui Liu
