A Convolutional Network with Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement

2021, pp. 1-1
Author(s): Xiang Xiaoxiao, Zhang Xiaojuan, Chen Haozhe
Author(s): Yucheng Zhao, Chong Luo, Zheng-Jun Zha, Wenjun Zeng

In this paper, we introduce the Transformer to time-domain methods for single-channel speech separation. The Transformer has the potential to boost speech separation performance because of its strong sequence-modeling capability. However, its computational complexity, which grows quadratically with the sequence length, has made it largely inapplicable to speech applications. To tackle this issue, we propose a novel variation of the Transformer, named the multi-scale group Transformer (MSGT). The key ideas are group self-attention, which significantly reduces the complexity, and multi-scale fusion, which retains the Transformer's ability to capture long-term dependencies. We implement two versions of MSGT with different complexities and apply them to a well-known time-domain speech separation method called Conv-TasNet. By simply replacing the original temporal convolutional network (TCN) with MSGT, our approach, called MSGT-TasNet, achieves a large gain over Conv-TasNet on both the WSJ0-2mix and WHAM! benchmarks. Without bells and whistles, the performance of MSGT-TasNet is already on par with state-of-the-art (SOTA) methods.
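The two key ideas in the abstract lend themselves to a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation: the class names GroupSelfAttention and MultiScaleGroupBlock, the group sizes (50, 100, 200), the four attention heads, and the concatenate-and-project fusion are all illustrative assumptions. It shows how restricting self-attention to fixed-size groups makes the cost grow linearly with sequence length, and how branches run at several group sizes can be fused to recover longer-range context.

# A minimal sketch, not the authors' implementation: group self-attention restricts
# attention to fixed-size chunks (cost grows linearly with sequence length instead
# of quadratically), and multi-scale fusion merges branches run at several group
# sizes. Group sizes, head count, and the concat-and-project fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupSelfAttention(nn.Module):
    """Self-attention applied independently within non-overlapping groups."""

    def __init__(self, dim: int, group_size: int, num_heads: int = 4):
        super().__init__()
        self.group_size = group_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        b, t, d = x.shape
        g = self.group_size
        pad = (g - t % g) % g                                # pad so the length divides evenly
        x_pad = F.pad(x, (0, 0, 0, pad))
        chunks = x_pad.reshape(b * (t + pad) // g, g, d)     # fold each group into the batch
        attended, _ = self.attn(chunks, chunks, chunks)      # O(g^2) cost per group
        out = attended.reshape(b, t + pad, d)[:, :t]         # unfold and drop the padding
        return self.norm(x + out)                            # residual connection


class MultiScaleGroupBlock(nn.Module):
    """Run group self-attention at several group sizes and fuse the results."""

    def __init__(self, dim: int, group_sizes=(50, 100, 200)):
        super().__init__()
        self.branches = nn.ModuleList(
            [GroupSelfAttention(dim, g) for g in group_sizes]
        )
        self.fuse = nn.Linear(dim * len(group_sizes), dim)   # concat-and-project fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [branch(x) for branch in self.branches]       # one branch per scale
        return self.fuse(torch.cat(outs, dim=-1))


if __name__ == "__main__":
    frames = torch.randn(2, 1000, 128)        # (batch, encoded time frames, feature dim)
    block = MultiScaleGroupBlock(dim=128)
    print(block(frames).shape)                # torch.Size([2, 1000, 128])

A stack of blocks like this could stand in for the TCN separator inside Conv-TasNet, which is the replacement the abstract describes; the exact depth, group sizes, and fusion rule would need to follow the paper itself.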


2021
Author(s): Yanmin Zhu, Xiang Zheng, Xinrong Wu, Wanning Liu, Lei Pi, ...
