Pairwise Ranking Distillation for Deep Face Recognition
This work addresses knowledge distillation for the deep face recognition task. Knowledge distillation is known to be an effective model compression technique that transfers knowledge from a high-capacity teacher to a lightweight student. What counts as knowledge, and how it is distilled, can be defined in different ways depending on the problem to which the technique is applied. Because face recognition is a typical metric learning task, we propose to perform knowledge distillation at the score level. Specifically, for any pair of matching scores computed by the teacher, our method forces the student to preserve the same order for the corresponding matching scores. We evaluate the proposed pairwise ranking distillation (PWR) approach on several face recognition benchmarks in both face verification and face identification scenarios. Experimental results show that PWR not only improves over the baseline method by a large margin but also outperforms other score-level distillation approaches.
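To make the score-level idea concrete, the following is a minimal sketch of a pairwise ranking penalty, not the exact loss used in this work. It assumes a hinge-style penalty with a hypothetical `margin` parameter: for every pair of matching scores that the teacher orders as i above j, the student is penalized when its corresponding scores do not keep that order. The function name `pairwise_ranking_loss` and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def pairwise_ranking_loss(teacher_scores, student_scores, margin=0.0):
    """Hinge-style pairwise ranking penalty (illustrative assumption).

    teacher_scores, student_scores: 1-D sequences of matching scores
    computed by the teacher and the student for the same face pairs.
    """
    t = np.asarray(teacher_scores, dtype=float)
    s = np.asarray(student_scores, dtype=float)
    if t.ndim != 1 or t.shape != s.shape:
        raise ValueError("scores must be 1-D arrays of equal length")

    # diff[i, j] = score_i - score_j for every ordered pair (i, j).
    t_diff = t[:, None] - t[None, :]
    s_diff = s[:, None] - s[None, :]

    # Keep only the pairs that the teacher ranks strictly as i above j.
    mask = t_diff > 0
    if not mask.any():
        return 0.0

    # The student is penalized when s_i - s_j falls below the margin.
    violations = np.maximum(0.0, margin - s_diff[mask])
    return float(violations.mean())


if __name__ == "__main__":
    teacher = [0.9, 0.5, 0.1]
    print(pairwise_ranking_loss(teacher, [0.8, 0.6, 0.2]))  # same order
    print(pairwise_ranking_loss(teacher, [0.2, 0.6, 0.8]))  # reversed order
```

With `margin=0.0` the penalty vanishes whenever the student reproduces the teacher's ordering, regardless of the absolute score values, which is the property that distinguishes ranking-based distillation from directly regressing the teacher's scores. In training, a differentiable version of this penalty would be written in a deep learning framework and combined with the student's usual recognition loss.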