Multiscale Convolutional Descriptor Aggregation for Visual Place Recognition

2020 ◽  
Vol 2020 (10) ◽  
pp. 313-1-313-7
Author(s):  
Raffaele Imbriaco ◽  
Egor Bondarev ◽  
Peter H.N. de With

Visual place recognition using query and database images from different sources remains a challenging task in computer vision. Our method exploits global descriptors for efficient image matching and local descriptors for geometric verification. We present a novel, multi-scale aggregation method for local convolutional descriptors, using memory vector construction for efficient aggregation. The method makes it possible to find a preliminary set of candidate image matches and to remove visually similar but erroneous candidates. We deploy the multi-scale aggregation for visual place recognition on three large-scale datasets. We obtain a Recall@10 above 94% on the Pittsburgh dataset, outperforming other popular convolutional descriptors used in image retrieval and place recognition. Additionally, we compare these descriptors on a more challenging dataset containing query and database images obtained from different sources, achieving over 77% Recall@10.
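
The abstract leaves the aggregation details open; as a rough sketch, memory vectors are often built by summing a set of L2-normalized local descriptors, so that a single dot product scores a whole descriptor group at once. The Python below illustrates one plausible sum-style, multi-scale construction; the function names and the sum aggregation are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def memory_vector(descriptors):
    """Sum-style memory vector: pool a set of L2-normalized local
    descriptors (an (n, d) array) into one unit-norm vector."""
    m = descriptors.sum(axis=0)
    return m / (np.linalg.norm(m) + 1e-12)

def multiscale_aggregate(feature_maps):
    """Aggregate convolutional feature maps extracted at several scales.
    `feature_maps` is a list of (h, w, d) arrays, one per scale; each map
    is flattened into local descriptors, normalized, pooled into a
    per-scale memory vector, and the per-scale vectors are stacked."""
    vectors = []
    for fmap in feature_maps:
        local = fmap.reshape(-1, fmap.shape[-1])  # (h*w, d) local descriptors
        local = local / (np.linalg.norm(local, axis=1, keepdims=True) + 1e-12)
        vectors.append(memory_vector(local))
    g = np.concatenate(vectors)
    return g / np.linalg.norm(g)
```

Ranking database images by the dot product of such aggregated vectors yields the preliminary candidate set; local descriptors are then used only on that short list for geometric verification.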

2021 ◽  
Vol 11 (19) ◽  
pp. 8976
Author(s):  
Junghyun Oh ◽  
Gyuho Eoh

As mobile robots perform long-term operations in large-scale environments, coping with perceptual changes has recently become an important issue. This paper introduces a stochastic variational inference and learning architecture that can extract condition-invariant features for visual place recognition in a changing environment. Under the assumption that the latent representation of a variational autoencoder can be divided into condition-invariant and condition-sensitive features, a new structure of the variational autoencoder is proposed and a variational lower bound is derived to train the model. After training, condition-invariant features are extracted from test images to calculate the similarity matrix, and places can be recognized even under severe environmental changes. Experiments were conducted to verify the proposed method, and the results showed that our assumption was reasonable and effective for recognizing places in changing environments.
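
A minimal PyTorch sketch of the stated idea, assuming the latent code is simply partitioned into an invariant block and a sensitive block; the architecture, dimensions, and names below are placeholders rather than the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitLatentVAE(nn.Module):
    """VAE whose latent code is split into a condition-invariant part
    (first `inv_dim` dimensions) and a condition-sensitive remainder.
    Layer sizes here are placeholders, not the paper's architecture."""
    def __init__(self, in_dim=4096, inv_dim=64, sens_dim=64):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu = nn.Linear(256, inv_dim + sens_dim)
        self.logvar = nn.Linear(256, inv_dim + sens_dim)
        self.dec = nn.Linear(inv_dim + sens_dim, in_dim)
        self.inv_dim = inv_dim

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

    def invariant(self, x):
        # Condition-invariant code: the first inv_dim dims of the mean.
        return self.mu(F.relu(self.enc(x)))[:, :self.inv_dim]

def similarity_matrix(model, queries, refs):
    """Cosine similarities between condition-invariant codes; entry (i, j)
    scores how likely query i and reference j show the same place."""
    with torch.no_grad():
        zq = F.normalize(model.invariant(queries), dim=1)
        zr = F.normalize(model.invariant(refs), dim=1)
    return zq @ zr.t()
```

Training would add the reconstruction and KL terms of the variational lower bound; only the invariant block is used at test time.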


2021 ◽  
pp. 1039-1049
Author(s):  
Chen Fan ◽  
Adam Jacobson ◽  
Zetao Chen ◽  
Xiaofeng He ◽  
Lilian Zhang ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 310
Author(s):  
Liang Chen ◽  
Sheng Jin ◽  
Zhoujun Xia

The application of deep learning is blooming in the field of visual place recognition, which plays a critical role in visual Simultaneous Localization and Mapping (vSLAM) applications. The use of convolutional neural networks (CNNs) achieves better performance than handcrafted feature descriptors. However, visual place recognition is still a challenging task due to two major problems, i.e., perceptual aliasing and perceptual variability. Therefore, designing a customized distance learning method that expresses the intrinsic distance constraints of large-scale vSLAM scenarios is of great importance. Traditional deep distance learning methods usually use the triplet loss, which requires the mining of anchor images. This may, however, result in tedious, inefficient training and anomalous distance relationships. In this paper, a novel deep distance learning framework for visual place recognition is proposed. Through in-depth analysis of the multiple constraints on the distance relationships in the visual place recognition problem, a multi-constraint loss function is proposed to optimize the distance constraint relationships in Euclidean space. The new framework can support any kind of CNN, such as AlexNet, VGGNet, and other user-defined networks, to extract more distinguishing features. We have compared the results with the traditional deep distance learning method, and the results show that the proposed method can improve the performance by 19–28%. Additionally, compared to some contemporary visual place recognition techniques, the proposed method improves performance on average by 40%/36% and 27%/24% on VGGNet/AlexNet using the New College and TUM datasets, respectively. This verifies that the method is capable of handling appearance changes in complex environments.
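
For contrast, here is the standard triplet loss alongside a hedged sketch of what a multi-constraint loss over an (anchor, positive, negatives) tuple could look like; the second function illustrates the idea of constraining several pairwise distances at once and is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Conventional triplet loss; depends on mining good anchor triplets."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

def multi_constraint_loss(anchor, positive, negatives, margin=0.5):
    """Illustrative multi-constraint variant: the anchor-positive distance
    must undercut both the anchor-negative and the positive-negative
    distance for every negative, so all pairwise relations in the tuple
    are constrained at once instead of a single triplet."""
    d_ap = F.pairwise_distance(anchor, positive)
    loss = anchor.new_zeros(())
    for neg in negatives:  # list of (batch, dim) embedding tensors
        d_an = F.pairwise_distance(anchor, neg)
        d_pn = F.pairwise_distance(positive, neg)
        loss = loss + F.relu(d_ap - d_an + margin).mean()
        loss = loss + F.relu(d_ap - d_pn + margin).mean()
    return loss / (2 * len(negatives))
```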


2021 ◽  
Vol 18 (3) ◽  
pp. 2274-2287
Author(s):  
Ren Chaofeng ◽  
Xiaodong Zhi ◽  
Yuchi Pu ◽  
Fuqiang Zhang ◽  
...  

2018 ◽  
Vol 94 (3-4) ◽  
pp. 777-792
Author(s):  
Zhe Xin ◽  
Xiaoguang Cui ◽  
Jixiang Zhang ◽  
Yiping Yang ◽  
Yanqing Wang

Author(s):  
Kai Liu ◽  
Hua Wang ◽  
Fei Han ◽  
Hao Zhang

Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant variations in environment appearance. In this paper, we propose a novel method to learn a location representation that integrates the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations caused by long-term visual changes, we formulate our objective using non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes a ratio of ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark datasets, CMU-VL and Nordland. Experimental results validate the effectiveness of our new method in long-term visual place recognition applications.
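
The ℓ2,1-norm at the heart of the objective is easy to make concrete. The sketch below defines that norm and an assumed scatter-ratio objective of the kind the abstract describes; the specific matrices and their roles are hypothetical stand-ins, not the authors' exact formulation.

```python
import numpy as np

def l21_norm(M):
    """ℓ2,1-norm: sum of the (non-squared) ℓ2-norms of the rows of M.
    Not squaring the row norms limits the influence of outlier samples,
    which is the source of the robustness mentioned in the abstract."""
    return np.linalg.norm(M, axis=1).sum()

def ratio_objective(W, X_within, X_between):
    """Illustrative ratio objective: within-place scatter divided by
    between-place scatter, both measured with the ℓ2,1-norm, minimized
    over a column-orthogonal projection W (W.T @ W = I). The matrices
    here are assumptions, not the paper's exact construction."""
    return l21_norm(X_within @ W) / l21_norm(X_between @ W)
```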


2012 ◽  
Vol 182-183 ◽  
pp. 1868-1872
Author(s):  
Jing Hou ◽  
Jin Xiang Pian ◽  
Ying Zhang ◽  
Ming Yue Wang

A new approach is presented to match two images in the presence of large scale changes. The novelty of our algorithm is a hierarchical matching strategy for global region features and local descriptors, which combines the descriptive power of global features with the discriminative power of local descriptors. To predict the likely location and scale of an object, global features extracted from segmentation regions are used in the first stage for efficient region matching. This initial matching can be ambiguous due to the instability and unreliability of global region features; therefore, in the later stage, local descriptors are matched within each region pair to discard false positives, and the final matches are filtered by RANSAC. Experiments show the effectiveness and superiority of the proposed method compared to other approaches.
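
The second stage is standard enough to sketch with OpenCV: given one candidate region pair from the global matching stage, match SIFT descriptors inside the regions and keep only RANSAC-consistent correspondences. The helper below is an assumed illustration of that verification step, not the authors' code.

```python
import cv2
import numpy as np

def verify_region_pair(img1, img2, box1, box2, ratio=0.75):
    """Second-stage check for one candidate region pair: match SIFT
    descriptors inside the two regions, then count RANSAC inliers.
    `box1`/`box2` are (x, y, w, h) boxes produced by region matching."""
    sift = cv2.SIFT_create()
    x, y, w, h = box1
    kp1, des1 = sift.detectAndCompute(img1[y:y+h, x:x+w], None)
    x, y, w, h = box2
    kp2, des2 = sift.detectAndCompute(img2[y:y+h, x:x+w], None)
    if des1 is None or des2 is None:
        return 0
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    # Lowe's ratio test to discard ambiguous matches.
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < 4:
        return 0  # too few correspondences to fit a homography
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())  # surviving inlier count
```

A region pair with zero (or very few) inliers is discarded as a false positive; the inlier correspondences of the surviving pairs form the final matches.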

