An Unsupervised Two-Talker Speech Separation System Based on CASA

Author(s):  
Hongyan Li ◽  
Yue Wang ◽  
Rongrong Zhao ◽  
Xueying Zhang

Building on the theory of monaural blind speech separation based on computational auditory scene analysis (CASA), this paper proposes a two-talker speech separation system that combines CASA with speaker recognition to separate target speech from interfering speech. First, a tandem algorithm organizes the voiced speech; then, based on clustering of gammatone frequency cepstral coefficients (GFCCs), an objective function is established to recognize the speaker, and the best grouping is found through exhaustive search or beam search, so that the voiced speech is organized sequentially. Second, unvoiced segments are generated by estimating onsets/offsets, and the unvoiced–voiced (U–V) and unvoiced–unvoiced (U–U) segments are separated respectively: the U–V segments are handled via the binary mask of the separated voiced speech, while the U–U segments are divided evenly. At this point all unvoiced segments have been separated. Simulation and performance evaluation verify the feasibility and effectiveness of the proposed algorithm.

2010 ◽  
pp. 61-79 ◽  
Author(s):  
Tariqullah Jan ◽  
Wenwu Wang

The cocktail party problem is a classical scientific problem that has been studied for decades. Humans show remarkable skill in segregating target speech from the complex auditory mixture of a cocktail party environment; computational modeling of this mechanism, however, is extremely challenging. This chapter presents an overview of several recent techniques for the source separation issues associated with the problem, including independent component analysis / blind source separation, computational auditory scene analysis, model-based approaches, non-negative matrix factorization, and sparse coding. A multistage approach to source separation is included as an example, application areas of cocktail party processing are explored, and potential future research directions are discussed.
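Of the techniques surveyed, non-negative matrix factorization admits a particularly compact sketch: a magnitude spectrogram V is approximated as W @ H, where the columns of W act as spectral bases and the rows of H as their time activations. The multiplicative updates below (Lee–Seung, Euclidean cost) and the toy 4-bin "spectrogram" are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Multiplicative-update NMF (Euclidean cost): find W, H >= 0 with V ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update spectral bases
    return W, H

# Toy "spectrogram": two spectral patterns, each active in different frames.
basis = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])    # 4 freq bins
activations = np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])  # 4 frames
V = basis @ activations

W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))
```

For separation, each source's spectrogram is reconstructed from its own subset of basis/activation pairs; the reconstruction error here shrinks toward zero because the toy matrix is exactly rank 2.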


2014 ◽  
Vol 614 ◽  
pp. 363-366
Author(s):  
Yi Jiang ◽  
Yuan Yuan Zu ◽  
Ying Ze Wang

A K-means-based unsupervised approach to close-talk speech enhancement is proposed in this paper. Within the framework of computational auditory scene analysis (CASA), the dual-microphone energy difference (DMED) is used as the cue to classify time-frequency (T-F) units into noise-dominant and target-speech-dominant units. A ratio mask then separates the target speech from the noise. Experimental results show that the proposed algorithm performs more robustly than the Wiener filtering algorithm.
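The clustering step can be sketched as a one-dimensional K-means over DMED values: close-talk target units show a large mic-1/mic-2 energy ratio, while noise units sit near 0 dB. Everything below is assumed for illustration — the DMED values are synthetic draws rather than energies computed from real dual-microphone T-F units, and the 12 dB target offset is made up.

```python
import numpy as np

def kmeans_1d(x, k=2, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a 1-D feature (here: the DMED cue, in dB)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Synthetic DMED values: 100 target-dominant units (~12 dB, close-talk)
# and 100 noise-dominant units (~0 dB).
rng = np.random.default_rng(1)
dmed = np.concatenate([rng.normal(12.0, 1.0, 100),
                       rng.normal(0.0, 1.0, 100)])

labels, centers = kmeans_1d(dmed)
target_cluster = int(np.argmax(centers))   # higher-DMED cluster = target
is_target = labels == target_cluster
print(is_target[:100].mean(), (~is_target[100:]).mean())
```

In the full system, the resulting unit labels would drive the ratio mask that resynthesizes the target speech; here the two well-separated synthetic clusters are recovered almost perfectly.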

