An Unsupervised Two-Talker Speech Separation System Based on CASA

Author(s):  
Hongyan Li ◽  
Yue Wang ◽  
Rongrong Zhao ◽  
Xueying Zhang

Building on the theory of monaural blind speech separation based on computational auditory scene analysis (CASA), this paper proposes a two-talker speech separation system that combines CASA with speaker recognition to separate target speech from interfering speech. First, a tandem algorithm organizes the voiced speech; then, based on clustering of gammatone frequency cepstral coefficients (GFCCs), an objective function is established to recognize the speaker, and the best grouping is found through exhaustive search or beam search, so that the voiced speech is organized sequentially. Second, unvoiced segments are generated by estimating onsets/offsets, and the unvoiced–voiced (U–V) and unvoiced–unvoiced (U–U) segments are separated respectively: the U–V segments are handled via the binary mask of the separated voiced speech, while the U–U segments are divided evenly. At this point all unvoiced segments have been separated. Simulation and performance evaluation verify the feasibility and effectiveness of the proposed algorithm.

2010 ◽  
pp. 61-79 ◽  
Author(s):  
Tariqullah Jan ◽  
Wenwu Wang

The cocktail party problem is a classical scientific problem that has been studied for decades. Humans show remarkable skill in segregating target speech from the complex auditory mixture of a cocktail party environment; computational modeling of this mechanism, however, is extremely challenging. This chapter presents an overview of several recent techniques for the source separation issues associated with the problem, including independent component analysis / blind source separation, computational auditory scene analysis, model-based approaches, non-negative matrix factorization, and sparse coding. A multistage approach to source separation is included as an example, application areas of cocktail party processing are explored, and potential future research directions are discussed.
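Of the techniques surveyed, non-negative matrix factorization admits a particularly compact sketch: a magnitude spectrogram V is approximated as W @ H, where the columns of W act as spectral bases and the rows of H as their time activations. The multiplicative updates below (Lee–Seung, Euclidean cost) and the toy 4-bin "spectrogram" are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Multiplicative-update NMF (Euclidean cost): find W, H >= 0 with V ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update spectral bases
    return W, H

# Toy "spectrogram": two spectral patterns, each active in different frames.
basis = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])    # 4 freq bins
activations = np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])  # 4 frames
V = basis @ activations

W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))
```

For separation, each source's spectrogram is reconstructed from its own subset of basis/activation pairs; the reconstruction error here shrinks toward zero because the toy matrix is exactly rank 2.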


2014 ◽  
Vol 614 ◽  
pp. 363-366
Author(s):  
Yi Jiang ◽  
Yuan Yuan Zu ◽  
Ying Ze Wang

A K-means-based unsupervised approach to close-talk speech enhancement is proposed in this paper. Within the framework of computational auditory scene analysis (CASA), the dual-microphone energy difference (DMED) is used as the cue to classify time-frequency (T-F) units into noise-dominant and target-speech-dominant units. A ratio mask then separates the target speech from the noise. Experimental results show that the proposed algorithm performs more robustly than the Wiener filtering algorithm.
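The clustering step can be sketched as a one-dimensional K-means over DMED values: close-talk target units show a large mic-1/mic-2 energy ratio, while noise units sit near 0 dB. Everything below is assumed for illustration — the DMED values are synthetic draws rather than energies computed from real dual-microphone T-F units, and the 12 dB target offset is made up.

```python
import numpy as np

def kmeans_1d(x, k=2, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a 1-D feature (here: the DMED cue, in dB)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Synthetic DMED values: 100 target-dominant units (~12 dB, close-talk)
# and 100 noise-dominant units (~0 dB).
rng = np.random.default_rng(1)
dmed = np.concatenate([rng.normal(12.0, 1.0, 100),
                       rng.normal(0.0, 1.0, 100)])

labels, centers = kmeans_1d(dmed)
target_cluster = int(np.argmax(centers))   # higher-DMED cluster = target
is_target = labels == target_cluster
print(is_target[:100].mean(), (~is_target[100:]).mean())
```

In the full system, the resulting unit labels would drive the ratio mask that resynthesizes the target speech; here the two well-separated synthetic clusters are recovered almost perfectly.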

