Hierarchical Visual Place Recognition Based on Semantic-Aggregation

A Holistic Visual Place Recognition Approach Using Lightweight CNNs for Significant ViewPoint and Appearance Changes

IEEE Transactions on Robotics ◽

10.1109/tro.2019.2956352 ◽

2020 ◽

Vol 36 (2) ◽

pp. 561-569 ◽

Cited By ~ 5

Author(s):

Ahmad Khaliq ◽

Shoaib Ehsan ◽

Zetao Chen ◽

Michael Milford ◽

Klaus McDonald-Maier

Keyword(s):

Place Recognition ◽

Visual Place Recognition ◽

Appearance Changes

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

Sensors ◽

10.3390/s20154177 ◽

2020 ◽

Vol 20 (15) ◽

pp. 4177 ◽

Cited By ~ 1

Author(s):

Yicheng Fang ◽

Kailun Yang ◽

Ruiqi Cheng ◽

Lei Sun ◽

Kaiwei Wang

Keyword(s):

Field Of View ◽

Place Recognition ◽

Precise Localization ◽

High Recall ◽

Pedestrian Navigation ◽

Navigation Assistance ◽

Mobile Navigation ◽

Visual Place Recognition ◽

Coarse To Fine ◽

Instance Retrieval

Visual Place Recognition (VPR) addresses visual instance retrieval across discrepant scenes and provides precise localization. During a traverse, the captured images (query images) are matched to previously recorded positions in the database images, allowing vehicles or pedestrian navigation devices to recognize their surroundings. Unfortunately, diverse appearance variations pose major challenges for VPR, such as illumination changes, viewpoint changes, seasonal changes, and opposing traverse directions (forward and backward). In addition, the majority of current VPR algorithms are designed for forward-facing images, which provide only a narrow Field of View (FoV) and are strongly affected by viewpoint changes. In this paper, we propose a panoramic localizer based on coarse-to-fine descriptors, which leverages panoramas for omnidirectional perception and an FoV of up to 360°. In the coarse stage, we adopt NetVLAD descriptors for panorama-to-panorama matching, because of their robustness in distinguishing different appearances; in the fine stage, we use GeoDesc keypoint descriptors, because of their capacity to capture detailed information. Together they form powerful coarse-to-fine descriptors. A comprehensive set of experiments is conducted on several datasets, including both public benchmarks and our real-world campus scenes. Our system is shown to achieve high recall and strong generalization across various appearances. The proposed panoramic localizer can be integrated into mobile navigation devices and is suitable for a variety of localization scenarios.
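
The coarse-to-fine idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that global descriptors (standing in for NetVLAD) and local keypoint descriptors (standing in for GeoDesc) have already been extracted as NumPy arrays, and the function names, the cosine-similarity shortlist and the mutual-nearest-neighbour match count are all illustrative choices rather than details taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def coarse_candidates(query_global, db_globals, k):
    """Coarse stage: indices of the k database images whose global
    descriptors are most similar (cosine) to the query descriptor."""
    sims = l2_normalize(db_globals) @ l2_normalize(query_global)
    return np.argsort(-sims)[:k]

def mutual_nn_matches(desc_a, desc_b):
    """Fine stage score: number of mutual nearest-neighbour matches
    between two sets of local descriptors (one descriptor per row)."""
    sim = l2_normalize(desc_a) @ l2_normalize(desc_b).T
    nn_ab = sim.argmax(axis=1)  # best match in b for each row of a
    nn_ba = sim.argmax(axis=0)  # best match in a for each row of b
    return int(np.sum(nn_ba[nn_ab] == np.arange(len(desc_a))))

def coarse_to_fine(query_global, query_locals, db_globals, db_locals, k=5):
    """Shortlist k candidates with global descriptors, then re-rank the
    shortlist by local-descriptor match count and return the best index."""
    candidates = coarse_candidates(query_global, db_globals, k)
    scores = [mutual_nn_matches(query_locals, db_locals[i]) for i in candidates]
    return int(candidates[int(np.argmax(scores))])
```

Here `db_globals` is assumed to be an array of shape (number of images, descriptor length) and `db_locals` a list with one (keypoints, descriptor length) array per image. The shortlist size `k` trades speed against the chance that the correct place is dropped before the fine stage sees it.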

CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning

10.36227/techrxiv.12063582 ◽

2020 ◽

Author(s):

Marvin Chancán

Keyword(s):

Real World ◽

Autonomous Driving ◽

Visual Navigation ◽

Place Recognition ◽

Visual Appearance ◽

Learning Techniques ◽

Benchmark Datasets ◽

Image Representations ◽

Visual Place Recognition ◽

Self Motion

Visual navigation tasks in real-world environments often require both self-motion and place recognition feedback. While deep reinforcement learning has shown success in solving these perception and decision-making problems in an end-to-end manner, these algorithms require large amounts of experience to learn navigation policies from high-dimensional data, which is generally impractical for real robots due to sample complexity. In this paper, we address these problems with two main contributions. First, we leverage place recognition and deep learning techniques, combined with goal destination feedback, to generate compact, bimodal image representations that can then be used to effectively learn control policies from a small amount of experience. Second, we present an interactive framework, CityLearn, that enables, for the first time, the training and deployment of navigation algorithms across city-sized, realistic environments with extreme visual appearance changes. CityLearn features more than 10 benchmark datasets, often used in visual place recognition and autonomous driving research, including over 100 recorded traversals across 60 cities around the world. We evaluate our approach on two CityLearn environments, training our navigation policy on a single traversal. Results show that our method can be more than two orders of magnitude faster than when using raw images, and can also generalize across extreme visual changes, including day-to-night and summer-to-winter transitions.

Sequential Dual Attention: Coarse-to-Fine-Grained Hierarchical Generation for Image Captioning

Symmetry ◽

10.3390/sym10110626 ◽

2018 ◽

Vol 10 (11) ◽

pp. 626 ◽

Cited By ~ 1

Author(s):

Zhibin Guan ◽

Kang Liu ◽

Yan Ma ◽

Xu Qian ◽

Tongkai Ji

Keyword(s):

Artificial Intelligence ◽

Visual Information ◽

Coarse Grained ◽

Image Captioning ◽

Fine Grained ◽

The Core ◽

Benchmark Datasets ◽

Coarse To Fine ◽

Image Caption Generation ◽

Image Caption

Image caption generation is a fundamental task that bridges an image and its textual description, and it is drawing increasing interest in artificial intelligence. Images and textual sentences are viewed as two different carriers of information that are symmetric and unified in the content of the same visual scene. Existing image captioning methods rarely generate the final description in a coarse-grained to fine-grained way, which is how humans understand their surroundings, and the generated sentence sometimes describes only coarse-grained image content. Therefore, we propose a coarse-to-fine-grained hierarchical generation method for image captioning, named SDA-CFGHG, to address these two problems. The core of SDA-CFGHG is a sequential dual attention mechanism that fuses visual information at different granularities in sequence. The advantage of SDA-CFGHG is that it performs image captioning in a coarse-to-fine-grained way, and the generated sentence can capture details of the raw image to some degree. Moreover, we validate the impressive performance of our method on benchmark datasets (MS COCO, Flickr) with several popular evaluation metrics (CIDEr, SPICE, METEOR, ROUGE-L, and BLEU).