Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation

Author(s):  
Wei-Ning Hsu ◽  
Yu Zhang ◽  
James Glass

2020 ◽  
Author(s):  
Han Zhu ◽  
Jiangjiang Zhao ◽  
Yuling Ren ◽  
Li Wang ◽  
Pengyuan Zhang


Author(s):  
Yuki Takashima ◽  
Ryoichi Takashima ◽  
Ryota Tsunoda ◽  
Ryo Aihara ◽  
Tetsuya Takiguchi ◽  
...  

Abstract: We present an unsupervised domain adaptation (UDA) method for a lip-reading model, an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contains unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method, in which the intermediate-layer output of an audio-based speech recognition model serves as the teacher for the unlabeled adaptation data. Because the audio signal carries more information for recognizing speech than lip images do, the knowledge of the audio-based model can act as a powerful teacher when the unlabeled adaptation data consists of audio-visual parallel data. In addition, because the proposed intermediate-layer-based KD expresses the teacher as a sub-class (sub-word)-level representation, the method allows data from unknown classes to be used for adaptation. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but can also exploit unknown-class adaptation data.
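As a rough illustration of the intermediate-layer KD idea described in the abstract, the following PyTorch sketch adapts a visual (lip-reading) student by matching its intermediate features to those of a frozen audio teacher on unlabeled audio-visual parallel data. The module names, feature dimensions, and the choice of an MSE feature-matching loss are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of cross-modal intermediate-layer knowledge distillation (KD)
# for unsupervised adaptation of a lip-reading student with a frozen audio teacher.
# All names, sizes, and the MSE loss choice are assumptions for illustration.
import torch
import torch.nn as nn

class AudioTeacher(nn.Module):
    """Pretrained audio ASR model; only its intermediate-layer output is needed."""
    def __init__(self, in_dim=80, hid_dim=256):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.mid = nn.Linear(hid_dim, hid_dim)   # intermediate layer used as teacher

    def intermediate(self, audio_feats):
        return self.mid(self.front(audio_feats))  # (batch, time, hid_dim)

class LipStudent(nn.Module):
    """Image-based (lip-reading) model whose intermediate layer is adapted."""
    def __init__(self, in_dim=512, hid_dim=256):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.mid = nn.Linear(hid_dim, hid_dim)

    def intermediate(self, lip_feats):
        return self.mid(self.front(lip_feats))    # (batch, time, hid_dim)

teacher = AudioTeacher().eval()        # frozen teacher: provides target representations
student = LipStudent()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
kd_loss = nn.MSELoss()                 # feature-matching loss (an assumption)

def adaptation_step(audio_feats, lip_feats):
    """One update on unlabeled audio-visual parallel adaptation data.
    No transcriptions are required, so unknown-class (OOV) utterances can be used."""
    with torch.no_grad():
        target = teacher.intermediate(audio_feats)
    pred = student.intermediate(lip_feats)
    loss = kd_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with dummy parallel data: 4 utterances, 50 frames each.
audio = torch.randn(4, 50, 80)    # e.g. log-mel features
video = torch.randn(4, 50, 512)   # e.g. lip-region embeddings
print(adaptation_step(audio, video))
```

Because the distillation target is a sub-word-level representation rather than a class posterior, the loss above remains well defined even for adaptation utterances whose words lie outside the student's output vocabulary, which is the property the abstract highlights.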





2017 ◽  
Vol 257 ◽  
pp. 79-87 ◽  
Author(s):  
Sining Sun ◽  
Binbin Zhang ◽  
Lei Xie ◽  
Yanning Zhang

