Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition

Author(s): Jianrong Wang, Ziyue Tang, Xuewei Li, Mei Yu, Qiang Fang, ...
2020, Vol 34 (04), pp. 6917-6924
Author(s): Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, ...

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite these encouraging results, the performance of lip reading remains inferior to that of its counterpart, speech recognition, because the ambiguous nature of lip actuations makes it challenging to extract discriminative features from lip-movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that features extracted from speech recognizers may provide complementary and discriminative clues that are hard to obtain from the subtle movements of the lips, and thereby facilitate the training of lip readers. Specifically, this is achieved by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we use an effective alignment scheme to handle the inconsistent lengths of the audio and video sequences, as well as a filtering strategy to refine the speech recognizer's predictions. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
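As a rough illustration of the cross-modal distillation described above, the sketch below assumes pre-extracted teacher (audio) frame features and student (video) frame features, resamples the teacher sequence to the video length as a simple stand-in for the paper's alignment scheme, and adds a feature-level distillation term to the recognition loss. All function names and the interpolation-based alignment are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def align_time(teacher_feats, target_len):
    """Resample teacher features (B, T_audio, D) to the student's length T_video
    by linear interpolation along the time axis (a simplified stand-in for the
    paper's alignment scheme)."""
    x = teacher_feats.transpose(1, 2)                     # interpolate expects (B, D, T)
    x = F.interpolate(x, size=target_len, mode="linear", align_corners=False)
    return x.transpose(1, 2)

def libs_style_loss(student_feats, teacher_feats, logits, targets, alpha=0.5):
    """Combine the lip reader's recognition loss with a feature-level
    distillation loss against the time-aligned speech-recognizer features."""
    aligned = align_time(teacher_feats, student_feats.size(1)).detach()
    feat_loss = F.mse_loss(student_feats, aligned)              # cross-modal distillation term
    task_loss = F.cross_entropy(logits.transpose(1, 2), targets)  # (B, C, T) vs (B, T) character targets
    return task_loss + alpha * feat_loss
```

In this sketch the teacher features are detached so gradients only update the lip reader; the weight alpha is an assumed hyperparameter balancing the two terms.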


2020
Author(s): Hongwei Liu, Shuai Luo, Shuaibing Guo

BACKGROUND: Otitis media (OM) is a common ear disease that can induce hearing loss and can even be life-threatening. However, poor classification performance, insufficient data, and high computational costs prevent OM from being diagnosed accurately.

OBJECTIVE: An optimized multi-teacher knowledge distillation method is proposed to enable early diagnosis of otitis media with insufficient data at a lower computational cost.

METHODS: Building on ensemble learning and conventional knowledge distillation, an optimized multi-teacher knowledge distillation method is proposed. The framework consists of a teacher network and a student network. The teacher network learns from raw data and exports prior knowledge, and the student network performs the diagnosis task. The teacher network is composed of three components, VGG, ResNet, and Inception, each of which can be regarded as a teacher that learns knowledge. The student network consists of three identical lightweight CNNs (convolutional neural networks), each of which can be viewed as a student that obtains knowledge from the teachers and executes the diagnosis task. First, the three teachers learn from the raw data separately to obtain prior knowledge. Then, each student is trained on the knowledge learned by its corresponding teacher; this knowledge transfer compresses the teacher network and reduces computational costs. Next, to improve diagnosis accuracy, the predictions of the three well-trained students are fused using two contrasted methods: voting-based knowledge fusion and average-based knowledge fusion. Finally, the resulting model can be used for the diagnosis task. The validity of the proposed method is verified on a tympanic membrane data set.

RESULTS: The trained model performs well in the early diagnosis of OM at a lower computational cost. The training diagnosis accuracy of the average-based model reaches 99.02%, and the testing diagnosis accuracy reaches 97.38%, exceeding that of any single teacher. Compared with using the teacher network directly for the diagnosis task, the training time of the proposed model is reduced by 64.37%, greatly shortening the calculation time. Three deep, large teachers are compressed into one lightweight model, which greatly reduces computational costs.

CONCLUSIONS: The optimized multi-teacher knowledge distillation method is suitable for the early diagnosis of OM with insufficient data. In addition, the method achieves model compression and reduces computational costs.
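The method pairs each lightweight student CNN with one teacher (VGG, ResNet, or Inception) and then fuses the three students' predictions. Below is a minimal, hypothetical sketch of the two building blocks involved: a conventional temperature-based distillation loss for the teacher-to-student transfer, and the two fusion strategies (voting-based and average-based). Function names, the temperature, and the mixing weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Conventional knowledge distillation loss: soften the teacher's logits
    with temperature T and mix the KL term with the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def average_fusion(student_probs):
    """Average-based knowledge fusion: average the students' softmax outputs
    and take the arg-max class per sample."""
    return torch.stack(student_probs, dim=0).mean(dim=0).argmax(dim=1)

def voting_fusion(student_probs):
    """Voting-based knowledge fusion: each student votes with its own arg-max
    prediction, and the most frequent class index wins."""
    votes = torch.stack([p.argmax(dim=1) for p in student_probs], dim=0)  # (num_students, B)
    return votes.mode(dim=0).values
```

In this sketch, `kd_loss` would be applied separately to each teacher-student pair during training, and either fusion function would combine the three students' class probabilities at inference time.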


2021, pp. 644-655
Author(s): Jiulin Lang, Chenwei Tang, Yi Gao, Jiancheng Lv

Author(s): Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura
