Learning Sufficient Scene Representation for Unsupervised Cross-modal Retrieval
2021
Author(s): Jieting Luo, Yan Wo, Bicheng Wu, Guoqiang Han

Author(s): Marcin Kwietniewski, Stephanie Wilson, Anna Topol, Sunbir Gill, Jarek Gryz, ...
Proceedings, 2018, Vol 2 (18), pp. 1193

Author(s): Roi Santos, Xose Pardo, Xose Fdez-Vidal

The increasing use of autonomous UAVs inside buildings and around human-made structures demands new, accurate, and comprehensive representations of their operating environments. Most 3D scene abstraction methods rely on invariant feature point matching; however, the resulting sparse 3D point clouds often fail to represent the structure of the environment concisely. Likewise, line clouds built from short, redundant segments with inaccurate directions limit the understanding of scenes such as environments with poor texture or with repetitive texture patterns. The presented approach builds observation and representation models from straight line segments, which resemble the boundaries of urban indoor and outdoor environments. The goal of this work is a complete line-matching method that complements state-of-the-art approaches to 3D scene representation in poorly textured environments for future autonomous UAVs.
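The abstract describes the method only at a high level and gives no code. As a minimal sketch of the line-based idea, the following Python snippet extracts straight segments with OpenCV (Canny edges plus probabilistic Hough, a stand-in for whatever detector the authors actually use) and matches them across two views by direction and midpoint proximity. The detector choice, the thresholds, and the extract_segments/match_segments helpers are illustrative assumptions, not the paper's pipeline.

# Hedged sketch of line-segment extraction and naive cross-view matching,
# in the spirit of the line-based representation described above. This is
# NOT the authors' method; the detector (Canny + probabilistic Hough) and
# the greedy matching heuristic are illustrative assumptions.
import cv2
import numpy as np

def extract_segments(image_path, min_length=40):
    """Detect straight line segments as (x1, y1, x2, y2) rows."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=60, minLineLength=min_length,
                            maxLineGap=5)
    return np.empty((0, 4)) if lines is None else lines.reshape(-1, 4)

def match_segments(segs_a, segs_b, max_angle=np.deg2rad(5), max_dist=30.0):
    """Greedily match segments by direction similarity and midpoint
    proximity. A real system would add epipolar/appearance constraints."""
    def angle(s):
        # Undirected orientation in [0, pi).
        return np.arctan2(s[3] - s[1], s[2] - s[0]) % np.pi
    def midpoint(s):
        return np.array([(s[0] + s[2]) / 2.0, (s[1] + s[3]) / 2.0])
    matches = []
    for i, sa in enumerate(segs_a):
        best, best_d = None, max_dist
        for j, sb in enumerate(segs_b):
            d_ang = abs(angle(sa) - angle(sb))
            d_ang = min(d_ang, np.pi - d_ang)  # orientation wraps at pi
            if d_ang > max_angle:
                continue
            d = np.linalg.norm(midpoint(sa) - midpoint(sb))
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
    return matches

In a full system, segments matched across calibrated views would then be triangulated into 3D line segments and merged into the scene-level line cloud the abstract describes.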


Author(s): Jun Zhu, Tianfu Wu, Song-Chun Zhu, Xiaokang Yang, Wenjun Zhang

2021, Vol 13 (3), pp. 433
Author(s): Junge Shen, Tong Zhang, Yichen Wang, Ruxin Wang, Qi Wang, ...

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification challenging. Performance depends heavily on the capacity of the scene representation and on the discriminability of the classifier. Although multiple models offer better properties than a single model in these respects, the strategy for fusing them is the key to maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve scene classification performance. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy fuses their features in a fine-grained, multi-scale manner, enhancing the resulting scene representation. Moreover, to address the similar appearance of different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets: the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
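The abstract names the loss only by its effect (small intra-class diversity, large inter-class distances) and gives no formula. The sketch below shows one common way to realize that behavior in PyTorch: cross-entropy plus a center-loss-style term with a margin between class centers. The IntraInterLoss class and its margin/lam hyperparameters are assumptions for illustration, not the paper's definition.

# Hedged sketch of a loss that penalizes intra-class spread and rewards
# inter-class separation, as the abstract describes. The paper's exact
# formulation is not given here; this center-loss-style variant with a
# margin between learnable class centers is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraInterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, margin=1.0, lam=0.1):
        super().__init__()
        # One learnable center per class in feature space.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin  # desired minimum distance between centers
        self.lam = lam        # weight of the intra/inter term

    def forward(self, features, logits, labels):
        ce = F.cross_entropy(logits, labels)
        # Intra-class: pull each feature toward its class center.
        intra = (features - self.centers[labels]).pow(2).sum(dim=1).mean()
        # Inter-class: push distinct centers at least `margin` apart.
        dists = torch.cdist(self.centers, self.centers)  # (C, C)
        off_diag = ~torch.eye(len(self.centers), dtype=torch.bool,
                              device=dists.device)
        inter = F.relu(self.margin - dists[off_diag]).pow(2).mean()
        return ce + self.lam * (intra + inter)

Here, features would be the fused embedding produced by the grouping-attention-fusion of the two CNN branches, and logits the classifier output on top of it.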


2010, Vol 9 (8), pp. 586-586
Author(s): A. Oliva, T. Konkle, T. F. Brady, G. A. Alvarez
