Auditory scene analysis based on time-frequency integration of shared FM and AM (II): Optimum time-domain integration and stream sound reconstruction

2002 ◽  
Vol 33 (10) ◽  
pp. 83-94 ◽  
Author(s):  
Mototsugu Abe ◽  
Shigeru Ando
2014 ◽  
Vol 614 ◽  
pp. 363-366
Author(s):  
Yi Jiang ◽  
Yuan Yuan Zu ◽  
Ying Ze Wang

A K-means based unsupervised approach to close-talk speech enhancement is proposed in this paper. Within the framework of computational auditory scene analysis (CASA), the dual-microphone energy difference (DMED) is used as the cue to classify time-frequency (T-F) units into noise-dominated units and target-speech-dominated units. A ratio mask is then used to separate the target speech from the noise. Experimental results show that the proposed algorithm performs more robustly than the Wiener filtering algorithm.
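The abstract does not say how the ratio mask is estimated. As a minimal sketch of the general idea only, the following assumes that per-unit target and noise energy estimates are already available (the function names and the flat-list layout of T-F units are hypothetical) and uses the common energy-ratio form of a soft mask.

```python
def ratio_mask(target_energy, noise_energy, eps=1e-12):
    """Soft gain in [0, 1] per T-F unit: target energy over total energy.

    Assumption: the two energy estimates are given; the paper's own
    estimator is not described in the abstract.
    """
    return [s / (s + n + eps) for s, n in zip(target_energy, noise_energy)]


def apply_mask(mixture_magnitude, mask):
    """Scale each mixture T-F magnitude by its mask gain."""
    return [m * g for m, g in zip(mixture_magnitude, mask)]


# Hypothetical per-unit energies for three T-F units.
mask = ratio_mask([3.0, 0.0, 1.0], [1.0, 2.0, 1.0])
enhanced = apply_mask([2.0, 2.0, 2.0], mask)
```

Unlike a binary mask, a ratio mask attenuates noise-dominated units gradually instead of zeroing them.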


Author(s):  
NEIL McLACHLAN ◽  
DINESH KANT KUMAR ◽  
JOHN BECKER

Computational auditory scene analysis (CASA) has been attracting growing interest since the publication of Bregman's text on human auditory scene analysis, and is expected to find many applications in data retrieval, autonomous robots, security and environmental analysis. This paper reports on the use of Fourier transforms and wavelet transforms to produce spectral data of sounds from different sources for classification by neural networks. It was found that the multiresolution time-frequency analysis provided by wavelet transforms dramatically improved classification accuracy when the statistical descriptors used captured both band-limited spectral energy and temporal energy fluctuation.
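The kind of descriptor mentioned above can be illustrated with a short sketch. The abstract names neither the wavelet nor the exact statistics, so the Haar wavelet, the number of levels, and the two statistics below (mean band energy and its standard deviation over time) are assumptions chosen for brevity, not the authors' method.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def band_descriptors(signal, levels=3):
    """Per detail band: (mean energy, std of energy across time).

    The mean stands in for band-limited spectral energy and the std
    for temporal energy fluctuation (both are assumed descriptors).
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        energy = [d * d for d in detail]
        mean = sum(energy) / len(energy)
        var = sum((e - mean) ** 2 for e in energy) / len(energy)
        features.append((mean, math.sqrt(var)))
    return features


# A rapidly alternating signal puts its energy in the finest band.
features = band_descriptors([1.0, -1.0] * 4, levels=3)
```

The resulting feature tuples could then be flattened into an input vector for a classifier.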


2012 ◽  
Vol 229-231 ◽  
pp. 1738-1741 ◽  
Author(s):  
Hong Zhou ◽  
Yi Jiang ◽  
Ming Jiang ◽  
Qiang Chen

Within the framework of computational auditory scene analysis (CASA), a speech separation algorithm based on energy difference for close-talk systems is proposed. The two microphones receive the mixture of close target speech and far noise at the same time. The inter-microphone intensity differences (IMID) between the two microphones are calculated in time-frequency (T-F) units and used as cues to generate binary masks with a two-class K-means clustering method. Experiments indicate that the algorithm can separate the target speech from the mixture and performs well in high-noise environments.
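The masking step described above can be sketched as follows. This is a minimal illustration, assuming per-unit energies from the two microphones are already available as flat lists and that the cue is a dB energy ratio; the function names and initialization are hypothetical, and the paper's actual features and clustering setup may differ.

```python
import math


def imid_db(near_energy, far_energy, eps=1e-12):
    """Per-unit intensity difference (dB) between the two microphones."""
    return [10.0 * math.log10((n + eps) / (f + eps))
            for n, f in zip(near_energy, far_energy)]


def two_class_kmeans(values, iters=50):
    """1-D two-class K-means; returns (low centroid, high centroid).

    In one dimension, nearest-centroid assignment is a threshold at
    the midpoint between the two centroids.
    """
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def binary_mask(near_energy, far_energy):
    """1 for units clustered as target-dominated, 0 for noise-dominated."""
    cues = imid_db(near_energy, far_energy)
    lo, hi = two_class_kmeans(cues)
    threshold = (lo + hi) / 2.0
    return [1 if c > threshold else 0 for c in cues]


# Hypothetical energies: units 0, 1 and 4 are much louder at the near mic.
mask = binary_mask([1.0, 1.0, 0.01, 0.02, 1.2],
                   [0.1, 0.12, 0.01, 0.02, 0.1])
```

Close speech produces a large level difference between the microphones while far noise reaches both at similar levels, which is why a two-cluster split of the cue can serve as a mask.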


2014 ◽  
Vol 78 (3) ◽  
pp. 361-378 ◽  
Author(s):  
Mona Isabel Spielmann ◽  
Erich Schröger ◽  
Sonja A. Kotz ◽  
Alexandra Bendixen
