A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition

Author(s):  
Li Chai ◽  
Jun Du ◽  
Chin-Hui Lee
Author(s):  
Hyeongju Kim ◽  
Hyeonseung Lee ◽  
Woo Hyun Kang ◽  
Hyung Yong Kim ◽  
Nam Soo Kim

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective alone is sub-optimal and insufficient for fully training the front-end, which leaves room for improvement. In this paper, we propose a novel approach that incorporates flow-based density estimation to train a robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms conventional techniques in which the front-end is trained only with the ASR objective.


2012 ◽  
Vol 5 (4) ◽  
pp. 426-441 ◽  
Author(s):  
Joyner Cadore ◽  
Francisco J. Valverde-Albacete ◽  
Ascensión Gallardo-Antolín ◽  
Carmen Peláez-Moreno
