Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Author(s):  
Dong Yu ◽  
Morten Kolbaek ◽  
Zheng-Hua Tan ◽  
Jesper Jensen

Author(s):  
Mandar Gogate ◽  
Ahsan Adeel ◽  
Ricard Marxer ◽  
Jon Barker ◽  
Amir Hussain

Author(s):  
Jing Shi ◽  
Jiaming Xu ◽  
Guangcan Liu ◽  
Bo Xu

Recent deep learning methods have made significant progress in multi-talker mixed speech separation. However, most existing models separate all the speech channels rather than selectively attending to the target one. As a result, those frameworks may fail to offer a satisfactory solution in complex auditory scenes, where the number of input sounds is usually uncertain and may even change dynamically. In this paper, we present a novel neural-network-based structure motivated by the top-down attention humans exhibit when facing a complicated acoustic scene. Unlike previous works, our method constructs an inference-attention structure that predicts the candidates of interest and extracts the speech channel of each one. Our approach removes the requirement that the number of channels be given in advance and avoids the high computational complexity of the label permutation problem. We evaluated our model on the WSJ0 mixed-speech tasks. In all experiments, our model is highly competitive, matching and even outperforming the baselines.
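
The label permutation problem mentioned above arises because, during training, there is no fixed assignment between estimated and reference speaker channels. As a rough, hypothetical illustration (not the code of either paper), the sketch below shows a permutation invariant training (PIT) style loss in the spirit of the titled work: it scores every assignment of estimates to references and keeps the cheapest one, so the cost grows factorially with the number of speakers, which is the overhead the abstract refers to. The function name pit_mse_loss and the array shapes are assumptions made for this example.

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, references):
    """Permutation invariant training (PIT) loss sketch.

    Computes the MSE for every assignment of estimated channels to
    reference channels and keeps the best one. estimates and references
    are arrays of shape (num_speakers, num_frames, num_bins). The
    exhaustive search costs num_speakers! evaluations.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(num_speakers)):
        # Mean squared error under this particular speaker assignment.
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy usage: two estimated channels matched against two references.
rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 100, 129))
ests = refs[::-1] + 0.01 * rng.standard_normal((2, 100, 129))  # swapped order
loss, perm = pit_mse_loss(ests, refs)
print(perm)  # (1, 0): PIT recovers the correct assignment despite the swap
```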

