Large-scale image-to-video face retrieval with convolutional neural network features
Convolutional neural network (CNN) features are becoming the norm in instance retrieval. This work investigates whether an off-the-shelf object detection network, such as Faster R-CNN, can serve as a feature extractor for an image-to-video face retrieval pipeline in place of hand-crafted features. We use the object proposals learned by a Region Proposal Network (RPN), together with their associated CNN representations, for the filtering and re-ranking steps. We also study the relevance of features from a fine-tuned network, explore combining these CNN features with face detection, Fisher vectors, and bag of visual words, and test the impact of different similarity metrics. The results obtained are very promising.
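
To make the two-stage structure concrete, the following is a minimal sketch of a filtering step followed by a re-ranking step. It is an illustration under stated assumptions, not the pipeline evaluated in this work: the function names (`filter_stage`, `rerank_stage`), the use of one global descriptor per frame for filtering, the max-over-regions score for re-ranking, and the choice of cosine similarity are all hypothetical, and the descriptors are assumed to have been extracted beforehand (for example from a detection network).

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def filter_stage(query_vec, frame_vecs, top_k):
    """Filtering (assumed form): rank frames by cosine similarity between
    the query descriptor and one global descriptor per frame, and keep
    the indices of the top_k frames."""
    sims = l2_normalize(frame_vecs) @ l2_normalize(query_vec)
    order = np.argsort(-sims, kind="stable")
    return order[:top_k]


def rerank_stage(query_vec, region_feats, candidates):
    """Re-ranking (assumed form): score each candidate frame by its
    best-matching region descriptor, then sort candidates by that score.

    region_feats maps a frame index to an array of shape
    (num_regions, dim) holding that frame's region descriptors."""
    q = l2_normalize(query_vec)
    scores = [float(np.max(l2_normalize(region_feats[i]) @ q))
              for i in candidates]
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [int(candidates[j]) for j in order]


# Toy data: 2-D descriptors for three frames (illustrative values only).
query = np.array([1.0, 0.0])
frames = np.array([[1.0, 0.1], [0.0, 1.0], [1.0, 1.0]])
regions = {
    0: np.array([[0.0, 1.0], [1.0, 1.0]]),
    1: np.array([[0.0, 1.0]]),
    2: np.array([[1.0, 0.0], [0.0, 1.0]]),
}

shortlist = filter_stage(query, frames, top_k=2)
ranking = rerank_stage(query, regions, shortlist)
```

In this toy example the global descriptors shortlist frames 0 and 2, and the region-level scores then move frame 2 ahead of frame 0 because one of its regions matches the query exactly. Swapping the dot product for another measure (for instance a Euclidean distance) is one way to compare similarity metrics within the same structure.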