Evaluation of Near-Duplicate Image Retrieval Algorithms for the Identification of Celebrities in Web Images
Near-duplicate image retrieval is a classical research problem in computer vision, for which a large number of diverse approaches have been proposed. Recent studies have shown that it can serve as an intermediate step in search-based celebrity identification, given the huge volume of celebrity images on the web that are user-tagged or surrounded by descriptive text. However, the effectiveness of existing near-duplicate image retrieval methods for this task remains unclear. To address this issue, this paper presents a comprehensive and structured study of existing near-duplicate image retrieval methods. Four representative methods, namely hash signature, mean SSIM, BoVW with SIFT features, and ARG, are experimentally evaluated on a self-constructed dataset containing 24,762 images of 15 top-searched celebrities, collected using 6 news search engines and the Google image search engine. The experimental results reveal that, compared with methods based on global features, those based on local features are usually more appropriate for celebrity identification in web images, as they handle partial-duplicate and scene-similar images better. In particular, BoVW with SIFT features is recommended, as it provides the best trade-off between online speed and retrieval accuracy.
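As an illustration of the global-feature family named above, the following is a minimal sketch of one possible hash signature: an average hash compared by Hamming distance. The abstract does not specify which hash the paper evaluates, so the choice of average hashing, the function names average_hash and hamming, the 8x8 grid size, and the nested-list grayscale image representation are all assumptions made for this example.

```python
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel intensities


def average_hash(img: Gray, size: int = 8) -> int:
    """Illustrative average hash (an assumed variant, not the paper's).

    The image is reduced to a size x size grid by block averaging, and
    each grid cell contributes one bit: 1 if it is brighter than the
    mean of all cells, 0 otherwise.
    """
    h, w = len(img), len(img[0])
    cells = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # keep every block non-empty
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash signatures."""
    return bin(a ^ b).count("1")
```

Two images would then be treated as near-duplicate candidates when the Hamming distance between their signatures falls below a chosen threshold (a tunable assumption). Because the signature summarizes the whole image, a uniform brightness shift leaves it unchanged, whereas a crop or a change of background can alter many bits, which is consistent with the difficulty that global-feature methods have with partial-duplicate and scene-similar images.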