MFA: Multi-level Feature Aggregation for Video Recognition

Author(s):  
Na Li ◽  
Kuangang Fan ◽  
Ouyang Qinghua ◽  
Yahui Liu
2021 ◽  
Author(s):  
Pengpeng Liang ◽  
Haoxuanye Ji ◽  
Erkang Cheng ◽  
Yumei Chai ◽  
Liming Wang ◽  
...  

2021 ◽  
Vol 423 ◽  
pp. 46-56
Author(s):  
Fushun Zhu ◽  
Hua Yan ◽  
Xinyue Chen ◽  
Tong Li ◽  
Zhengyu Zhang

Author(s):  
Yang Li ◽  
Kan Li ◽  
Xinxin Wang

In this paper, we propose a deeply supervised CNN model for action recognition that fully exploits the powerful hierarchical features of CNNs. We build multi-level video representations by applying our proposed aggregation module at different convolutional layers, and we train the model with deep supervision, which improves both performance and efficiency. To capture temporal structure while preserving more detail about actions, the aggregation module is trainable: it models the temporal evolution of each spatial location and projects it into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. Combining deep supervision with this aggregation module provides a promising solution for recognizing actions in videos. We conduct experiments on two action recognition datasets, HMDB51 and UCF101. Results show that our model outperforms state-of-the-art methods.
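As a rough illustration of the VLAD step mentioned in this abstract, the following sketch aggregates a set of local descriptors into one fixed-length vector. It is not the authors' module: the soft-assignment weighting, the fixed (non-learned) cluster centers, the `alpha` sharpness parameter, and all function names are assumptions made for the example. In a trainable variant, the centers and assignment weights would be learned, and the descriptors would be the per-location features described above.

```python
import math

def soft_assign(x, centers, alpha=1.0):
    """Softmax over negative squared distances from x to each center (assumed weighting)."""
    logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]

def vlad(descriptors, centers, alpha=1.0):
    """Aggregate D-dim descriptors into a K*D vector of weighted residuals."""
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        weights = soft_assign(x, centers, alpha)
        for j, (w, c) in enumerate(zip(weights, centers)):
            for t in range(d):
                # accumulate the weighted residual between descriptor and center
                residuals[j][t] += w * (x[t] - c[t])
    # intra-normalize each cluster's residual, flatten, then normalize globally
    flat = [a for row in residuals for a in l2_normalize(row)]
    return l2_normalize(flat)

# Hypothetical toy input: three 2-D descriptors, two centers.
v = vlad([[0.0, 1.0], [2.0, 0.0], [1.0, 3.0]], [[0.0, 0.0], [2.0, 2.0]])
```

The output length is always K times D regardless of how many descriptors are passed in, which is what makes this kind of aggregation usable as a fixed-size video representation.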


2021 ◽  
Vol 13 (4) ◽  
pp. 731 ◽  
Author(s):  
Bingyu Chen ◽  
Min Xia ◽  
Junqing Huang

Detailed information regarding land utilization/cover is a valuable resource in various fields. In recent years, remote sensing images, especially aerial images, have become higher in resolution and now span larger ranges of time and space. Because objects in the same category may exhibit different spectra, relying on spectral features alone is often insufficient to segment the target objects accurately. In convolutional neural networks, down-sampling operations are usually used to extract abstract semantic features, which leads to loss of detail and fuzzy edges. To solve these problems, the paper proposes a Multi-level Feature Aggregation Network (MFANet), which improves on two aspects: deep feature extraction and up-sampling feature fusion. First, the proposed Channel Feature Compression module extracts the deep features and filters redundant channel information from the backbone to optimize the learned context. Second, the proposed Multi-level Feature Aggregation Upsample module applies, in a nested fashion, the idea that high-level features provide guidance for low-level features, which is important for accurate localization when restoring high-resolution remote sensing images. Finally, the proposed Channel Ladder Refinement module refines the restored high-resolution feature maps. Experimental results show that the proposed method achieves state-of-the-art performance, with 86.45% mean IoU on the LandCover dataset.
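For readers unfamiliar with the mean IoU metric reported in this abstract, the sketch below shows one common way to compute it from flattened per-pixel class labels. The function name and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions for illustration; the paper's exact evaluation protocol is not given here.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection over union for two equal-length label lists."""
    ious = []
    for c in range(num_classes):
        # pixels labeled c in both prediction and ground truth
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # pixels labeled c in either prediction or ground truth
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical toy labels: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```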

