Variance based time-frequency mask estimation for unsupervised speech enhancement

In computational auditory scene analysis, accurate estimation of the binary mask or ratio mask plays a key role in noise masking. An inaccurate estimate often introduces artifacts and temporal discontinuities in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. In reconstructing the Wiener filter, we use the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for a convex optimization of the speech power. To improve estimation accuracy, the estimated ratio mask is further modified based on its adjacent time–frequency units and then smoothed by interpolating with the estimated binary masks. Objective tests, including signal-to-noise ratio improvement, spectral distortion, and intelligibility, together with a subjective listening test, show that the proposed method outperforms the reference methods.
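The mask pipeline described in this abstract can be illustrated with a minimal sketch. It assumes the speech and noise power in each Gammatone channel and frame are already available; the convex optimization that the authors use to estimate speech power is not reproduced here. The function names, the 3x3 neighborhood average, the 0.5 threshold, and the interpolation weight `alpha` are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: rows are Gammatone channels, columns are time frames.
# All names and parameter values below are hypothetical.

def wiener_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Wiener-type gain S / (S + N) for every time-frequency unit."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]

def binary_mask(ratio_mask, threshold=0.5):
    """1.0 where the ratio mask exceeds the threshold, else 0.0.
    A threshold of 0.5 corresponds to speech power exceeding noise power."""
    return [[1.0 if m > threshold else 0.0 for m in row] for row in ratio_mask]

def smooth_with_neighbors(mask):
    """Average each unit with its adjacent units (3x3 window, clipped at edges)."""
    n_ch, n_fr = len(mask), len(mask[0])
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            vals = [
                mask[i][j]
                for i in range(max(0, c - 1), min(n_ch, c + 2))
                for j in range(max(0, t - 1), min(n_fr, t + 2))
            ]
            out[c][t] = sum(vals) / len(vals)
    return out

def interpolate_masks(ratio_mask, bin_mask, alpha=0.8):
    """Linear interpolation between the ratio mask and the binary mask."""
    return [
        [alpha * r + (1.0 - alpha) * b for r, b in zip(r_row, b_row)]
        for r_row, b_row in zip(ratio_mask, bin_mask)
    ]

# Toy input: 2 channels x 2 frames of assumed power estimates.
speech = [[9.0, 1.0], [1.0, 1.0]]
noise = [[1.0, 9.0], [1.0, 3.0]]

ratio = wiener_ratio_mask(speech, noise)
binary = binary_mask(ratio)
smoothed = smooth_with_neighbors(ratio)
final = interpolate_masks(ratio, binary)
```

The final mask would then scale each time-frequency unit of the noisy signal before resynthesis. How the authors actually modify the mask from adjacent units may differ from the plain average used here.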

Fusion-Net: Time-Frequency Information Fusion Y-Network for Speech Enhancement

10.21437/interspeech.2021-1184 ◽ 2021

Author(s): Santhan Kumar Reddy Nareddula ◽ Subrahmanyam Gorthi ◽ Rama Krishna Sai S. Gorthi

Keyword(s): Speech Enhancement ◽ Information Fusion ◽ Time Frequency ◽ Frequency Information

Speech Enhancement Using Neuro-Fuzzy Classifier

Advances in Data Mining and Database Management - Handbook of Research on Automated Feature Engineering and Advanced Applications in Data Science ◽ 10.4018/978-1-7998-6659-6.ch009 ◽ 2021 ◽ pp. 164-181

Author(s): Judith Justin ◽ Vanithamani R.

Keyword(s): Feature Extraction ◽ Speech Enhancement ◽ The Other ◽ Objective Measures ◽ Noise Levels ◽ Fuzzy Classifier ◽ Noisy Speech ◽ Enhancement Technique ◽ Time Frequency ◽ Neuro Fuzzy

In this chapter, a speech enhancement technique is implemented using a neuro-fuzzy classifier. Noisy speech sentences from the NOIZEUS and AURORA databases are used for the study. Feature extraction is implemented through modifications in amplitude magnitude spectrograms. A four-class neuro-fuzzy classifier splits the time-frequency units of the noisy speech samples into a noise-only part, a signal-only part, a more-noise, less-signal part, and a more-signal, less-noise part. Appropriate weights are applied in the enhancement phase. The enhanced speech sentences are evaluated using objective measures. The performance of the Neuro-Fuzzy 4 (NF4) classifier is analyzed and compared with that of other conventional techniques for various noises at different noise levels. The objective measures obtained are better than those of the other techniques, and it is inferred that NF4 outperforms them in speech enhancement.
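The enhancement phase described above, where each time-frequency unit is weighted according to its class, might look like the following sketch. The class names and the numeric weights are hypothetical; the chapter abstract does not state the values used, and the neuro-fuzzy classifier itself is not reproduced (the labels are assumed to be given).

```python
# Sketch only: hypothetical per-class gains for the four classes.
CLASS_WEIGHTS = {
    "noise_only": 0.0,    # suppress units classified as pure noise
    "more_noise": 0.3,    # attenuate units dominated by noise
    "more_signal": 0.7,   # mildly attenuate units dominated by speech
    "signal_only": 1.0,   # keep units classified as pure speech
}

def apply_class_weights(magnitudes, labels, weights=CLASS_WEIGHTS):
    """Scale each time-frequency magnitude by the weight of its class label."""
    return [
        [mag * weights[lab] for mag, lab in zip(m_row, l_row)]
        for m_row, l_row in zip(magnitudes, labels)
    ]

# Toy input: 2 x 2 grid of magnitudes with assumed classifier output.
magnitudes = [[2.0, 4.0], [1.0, 10.0]]
labels = [
    ["noise_only", "signal_only"],
    ["more_noise", "more_signal"],
]
enhanced = apply_class_weights(magnitudes, labels)
```

A graded set of weights like this sits between a binary mask (two classes) and a continuous ratio mask, which is one plausible reading of why four classes are used.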
