BACKGROUND
Otitis media (OM) is a common ear disease that can cause hearing loss and can even be life-threatening. However, because of poor classification performance, insufficient data, and high computational costs, accurate diagnosis of OM remains difficult.
OBJECTIVE
An optimized multi-teacher knowledge distillation method is proposed to enable early diagnosis of otitis media from insufficient data at a lower computational cost.
METHODS
Building on ensemble learning and the conventional knowledge distillation method, an optimized multi-teacher knowledge distillation method is proposed. The framework consists of a teacher network and a student network. The teacher network learns from the raw data and exports prior knowledge, and the student network performs the diagnosis task. The teacher network is composed of three components, VGG, ResNet, and Inception, each of which can be regarded as a teacher that learns knowledge. The student network consists of three identical lightweight convolutional neural networks (CNNs), each of which can be viewed as a student that obtains knowledge from the teachers and executes the diagnosis task. First, the three teachers learn from the raw data separately to obtain prior knowledge. Then, each student is trained on the knowledge learned by one teacher; this knowledge transfer process compresses the teacher network and reduces the computational costs. Next, to improve the diagnosis accuracy, the predictions of the three well-trained students are fused by two contrasting methods: voting-based knowledge fusion and average-based knowledge fusion. Finally, the well-trained model is obtained and can be used for the diagnosis task. The validity of the proposed method is verified on a tympanic membrane data set.
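The distillation and fusion steps described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes standard temperature-scaled soft-target distillation (Hinton-style) for the teacher-to-student transfer, and the hyperparameters `T` and `alpha` as well as all function names are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; a higher T produces softer teacher targets."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Common KD objective (an assumption here, not the paper's exact loss):
    alpha * T^2 * KL(teacher_soft || student_soft)
    + (1 - alpha) * cross-entropy with the hard labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)

def vote_fusion(student_probs):
    """Voting-based fusion: majority vote over each student's argmax class.
    student_probs has shape (n_students, n_samples, n_classes)."""
    votes = np.argmax(student_probs, axis=-1)          # (n_students, n_samples)
    n_classes = student_probs.shape[-1]
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes
    )                                                  # (n_classes, n_samples)
    return np.argmax(counts, axis=0)

def average_fusion(student_probs):
    """Average-based fusion: mean of the students' probability vectors,
    then argmax over classes."""
    return np.argmax(student_probs.mean(axis=0), axis=-1)
```

For example, with three students and two classes, a sample predicted class 0 by two students and class 1 by one is assigned class 0 under voting, while averaging weighs each student's full probability vector before the final argmax.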
RESULTS
The well-trained model achieves good performance in the early diagnosis of OM at a lower computational cost. The training accuracy of the average-based model reaches 99.02% and the testing accuracy reaches 97.38%, exceeding that of any single teacher. Compared with using the teacher network directly for the diagnosis task, the training time of the proposed well-trained model is reduced by 64.37%, which greatly shortens the computation time. The three deep, large teachers are compressed into one lightweight well-trained model, which greatly reduces the computational costs.
CONCLUSIONS
The optimized multi-teacher knowledge distillation method is suitable for the early diagnosis of OM with insufficient data. In addition, the method achieves model compression and reduces the computational costs.