Large Scale Category-Structured Image Retrieval for Object Identification Through Supervised Learning of CNN and SURF-Based Matching

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and use the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.

Download Full-text

Improved Visual Localization via Graph Filtering

Journal of Imaging ◽

10.3390/jimaging7020020 ◽

2021 ◽

Vol 7 (2) ◽

pp. 20

Author(s):

Carlos Lassance ◽

Yasir Latif ◽

Ravi Garg ◽

Vincent Gripon ◽

Ian Reid

Keyword(s):

Image Retrieval ◽

Large Scale ◽

Visual Localization ◽

Localization Accuracy ◽

Additional Information ◽

Feature Representations ◽

The Mean ◽

Classical Image ◽

Vision Based Localization ◽

Improve Reliability

Vision-based localization is the problem of inferring the pose of the camera given a single image. One commonly used approach relies on image retrieval where the query input is compared against a database of localized support examples and its pose is inferred with the help of the retrieved items. This assumes that images taken from the same places consist of the same landmarks and thus would have similar feature representations. These representations can learn to be robust to different variations in capture conditions like time of the day or weather. In this work, we introduce a framework which aims at enhancing the performance of such retrieval-based localization methods. It consists in taking into account additional information available, such as GPS coordinates or temporal proximity in the acquisition of the images. More precisely, our method consists in constructing a graph based on this additional information that is later used to improve reliability of the retrieval process by filtering the feature representations of support and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large scale datasets, as well as the mean average precision in classical image retrieval scenarios.

Download Full-text