scholarly journals Benchmarking Image Retrieval for Visual Localization

Author(s):  
Noe Pion ◽  
Martin Humenberger ◽  
Gabriela Csurka ◽  
Yohann Cabon ◽  
Torsten Sattler
2021 ◽  
Vol 7 (2) ◽  
pp. 20
Author(s):  
Carlos Lassance ◽  
Yasir Latif ◽  
Ravi Garg ◽  
Vincent Gripon ◽  
Ian Reid

Vision-based localization is the problem of inferring the pose of the camera given a single image. One commonly used approach relies on image retrieval where the query input is compared against a database of localized support examples and its pose is inferred with the help of the retrieved items. This assumes that images taken from the same places consist of the same landmarks and thus would have similar feature representations. These representations can learn to be robust to different variations in capture conditions like time of the day or weather. In this work, we introduce a framework which aims at enhancing the performance of such retrieval-based localization methods. It consists in taking into account additional information available, such as GPS coordinates or temporal proximity in the acquisition of the images. More precisely, our method consists in constructing a graph based on this additional information that is later used to improve reliability of the retrieval process by filtering the feature representations of support and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large scale datasets, as well as the mean average precision in classical image retrieval scenarios.


Sensors ◽  
2018 ◽  
Vol 18 (8) ◽  
pp. 2692 ◽  
Author(s):  
Yujin Chen ◽  
Ruizhi Chen ◽  
Mengyun Liu ◽  
Aoran Xiao ◽  
Dewen Wu ◽  
...  

Indoor localization is one of the fundamentals of location-based services (LBS) such as seamless indoor and outdoor navigation, location-based precision marketing, spatial cognition of robotics, etc. Visual features take up a dominant part of the information that helps human and robotics understand the environment, and many visual localization systems have been proposed. However, the problem of indoor visual localization has not been well settled due to the tough trade-off of accuracy and cost. To better address this problem, a localization method based on image retrieval is proposed in this paper, which mainly consists of two parts. The first one is CNN-based image retrieval phase, CNN features extracted by pre-trained deep convolutional neural networks (DCNNs) from images are utilized to compare the similarity, and the output of this part are the matched images of the target image. The second one is pose estimation phase that computes accurate localization result. Owing to the robust CNN feature extractor, our scheme is applicable to complex indoor environments and easily transplanted to outdoor environments. The pose estimation scheme was inspired by monocular visual odometer, therefore, only RGB images and poses of reference images are needed for accurate image geo-localization. Furthermore, our method attempts to use lightweight datum to present the scene. To evaluate the performance, experiments are conducted, and the result demonstrates that our scheme can efficiently result in high location accuracy as well as orientation estimation. Currently the positioning accuracy and usability enhanced compared with similar solutions. Furthermore, our idea has a good application foreground, because the algorithms of data acquisition and pose estimation are compatible with the current state of data expansion.


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 249 ◽  
Author(s):  
Song Xu ◽  
Wusheng Chou ◽  
Hongyi Dong

This paper proposes a novel multi-sensor-based indoor global localization system integrating visual localization aided by CNN-based image retrieval with a probabilistic localization approach. The global localization system consists of three parts: coarse place recognition, fine localization and re-localization from kidnapping. Coarse place recognition exploits a monocular camera to realize the initial localization based on image retrieval, in which off-the-shelf features extracted from a pre-trained Convolutional Neural Network (CNN) are adopted to determine the candidate locations of the robot. In the fine localization, a laser range finder is equipped to estimate the accurate pose of a mobile robot by means of an adaptive Monte Carlo localization, in which the candidate locations obtained by image retrieval are considered as seeds for initial random sampling. Additionally, to address the problem of robot kidnapping, we present a closed-loop localization mechanism to monitor the state of the robot in real time and make adaptive adjustments when the robot is kidnapped. The closed-loop mechanism effectively exploits the correlation of image sequences to realize the re-localization based on Long-Short Term Memory (LSTM) network. Extensive experiments were conducted and the results indicate that the proposed method not only exhibits great improvement on accuracy and speed, but also can recover from localization failures compared to two conventional localization methods.


2020 ◽  
Vol 10 (19) ◽  
pp. 6829
Author(s):  
Song Xu ◽  
Huaidong Zhou ◽  
Wusheng Chou

Conventional approaches to global localization and navigation mainly rely on metric maps to provide precise geometric coordinates, which may cause the problem of large-scale structural ambiguity and lack semantic information of the environment. This paper presents a scalable vision-based topological mapping and navigation method for a mobile robot to work robustly and flexibly in large-scale environment. In the vision-based topological navigation, an image-based Monte Carlo localization method is presented to realize global topological localization based on image retrieval, in which fine-tuned local region features from an object detection convolutional neural network (CNN) are adopted to perform image matching. The combination of image retrieval and Monte Carlo provide the robot with the ability to effectively avoid perceptual aliasing. Additionally, we propose an effective visual localization method, simultaneously employing the global and local CNN features of images to construct discriminative representation for environment, which makes the navigation system more robust to the interference of occlusion, translation, and illumination. Extensive experimental results demonstrate that ERF-IMCS exhibits great performance in the robustness and efficiency of navigation.


Author(s):  
M. S. Mueller ◽  
T. Sattler ◽  
M. Pollefeys ◽  
B. Jutzi

<p><strong>Abstract.</strong> The performance of machine learning and deep learning algorithms for image analysis depends significantly on the quantity and quality of the training data. The generation of annotated training data is often costly, time-consuming and laborious. Data augmentation is a powerful option to overcome these drawbacks. Therefore, we augment training data by rendering images with arbitrary poses from 3D models to increase the quantity of training images. These training images usually show artifacts and are of limited use for advanced image analysis. Therefore, we propose to use image-to-image translation to transform images from a <i>rendered</i> domain to a <i>captured</i> domain. We show that translated images in the <i>captured</i> domain are of higher quality than the rendered images. Moreover, we demonstrate that image-to-image translation based on rendered 3D models enhances the performance of common computer vision tasks, namely feature matching, image retrieval and visual localization. The experimental results clearly show the enhancement on translated images over rendered images for all investigated tasks. In addition to this, we present the advantages utilizing translated images over exclusively captured images for visual localization.</p>


Sign in / Sign up

Export Citation Format

Share Document