Self-Supervised Learning and Network Architecture Search for Domain-Adaptive Landmark Detector
Fine-tuning a deep convolutional neural network (DCN) as a place-class detector (PCD) is a direct way to realize domain-adaptive visual place recognition (VPR). Although effective, a PCD model requires a considerable amount of class-specific training examples as well as class-set maintenance in long-term, large-scale VPR scenarios. We therefore propose to employ a DCN as a landmark-class detector (LCD), which makes it possible to distinguish an exponentially large number of different places by combining multiple landmarks and, furthermore, to select stable parts of the scene (such as buildings) as landmark classes, reducing the need for class-set maintenance. However, two important questions remain. 1) How should we mine such training examples (landmark objects) when no domain-specific object detector is available? 2) How should we fine-tune the architecture and parameters of the DCN to a new domain-specific landmark set? To answer these questions, we present a self-supervised landmark mining approach for collecting pseudo-labeled landmark examples, and we then consider network architecture search (NAS) on the LCD task, which has a significantly larger search space than typical NAS applications such as PCD. Extensive verification experiments demonstrate the superiority of the proposed framework over previous LCD methods with hand-crafted architectures and/or non-adaptive parameters, as well as a 90% reduction in NAS cost compared with a naive NAS implementation.
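
To illustrate why combining multiple landmarks yields a combinatorially large place vocabulary from a fixed landmark-class set, the following minimal Python sketch (hypothetical, not the paper's implementation; all function names are ours) represents a place by the set of landmark class IDs the LCD detects in it and counts how many such signatures are possible.

```python
from math import comb

# Illustrative assumption: with N landmark classes and k landmarks visible per
# scene, an order-free landmark signature can take C(N, k) distinct values, so
# a moderate class set can discriminate a very large number of places.

def num_distinguishable_places(num_landmark_classes: int, landmarks_per_place: int) -> int:
    """Count the distinct unordered landmark combinations a place can exhibit."""
    return comb(num_landmark_classes, landmarks_per_place)

def place_signature(detected_landmark_ids):
    """Describe a place by the set of landmark class IDs detected in it."""
    return frozenset(detected_landmark_ids)

if __name__ == "__main__":
    # 100 landmark classes, 5 visible landmarks per scene -> ~75 million signatures
    print(num_distinguishable_places(100, 5))
    # Two observations match if their landmark signatures agree, regardless of order
    print(place_signature([3, 17, 42]) == place_signature([42, 3, 17]))
```

In practice the LCD's per-landmark confidences would be aggregated rather than hard set membership, but the counting argument above is what motivates trading a per-place classifier for a per-landmark one.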