Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants

The cochlea plays a key role in converting acoustic vibration into the neural stimulation from which the brain perceives sound. A cochlear implant (CI) is an auditory prosthesis that replaces damaged cochlear hair cells to achieve this acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be guaranteed by CIs. Although CI recipients with state-of-the-art commercial CI devices achieve good speech perception in quiet backgrounds, they usually suffer from poor speech perception in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most important technologies for CIs. In this study, we introduce recent progress in deep learning (DL)-based, mostly neural network (NN)-based, SE front ends for CIs, and discuss how the hearing properties of CI recipients could be utilized to optimize DL-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to residual noise than to SE-induced speech distortion, which has been common knowledge in CI research. Furthermore, speech reception threshold (SRT) tests in noise demonstrate that the intelligibility of the denoised speech can be significantly improved when the NN is trained with a loss function biased toward noise suppression, compared with one that pays equal attention to noise residue and speech distortion.
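The trade-off described in this abstract can be illustrated with a weighted two-term loss. The following is a minimal sketch, assuming mask-based enhancement on magnitude spectra where the clean speech and the noise are available separately during training. The function name `tradeoff_loss`, the weight `alpha`, and the mean-squared-error form of both terms are assumptions made for illustration; the abstract does not specify the loss actually used in the paper.

```python
import numpy as np

def tradeoff_loss(mask, speech_mag, noise_mag, alpha=0.5):
    """Weighted sum of speech distortion and noise residue (illustrative only).

    mask       : estimated gain per time-frequency bin, same shape as the inputs
    speech_mag : clean-speech magnitude spectrogram
    noise_mag  : noise magnitude spectrogram
    alpha      : hypothetical weight in [0, 1]; smaller values penalize
                 residual noise more heavily than speech distortion
    """
    # Speech distortion: how much the mask alters the clean speech component.
    speech_distortion = np.mean((mask * speech_mag - speech_mag) ** 2)
    # Noise residue: how much noise energy survives the mask.
    noise_residue = np.mean((mask * noise_mag) ** 2)
    return alpha * speech_distortion + (1.0 - alpha) * noise_residue
```

With `alpha = 0.5` both terms receive equal attention. Choosing `alpha` below 0.5 weights the noise residue more heavily, which corresponds to the noise-suppression-biased setting that the abstract reports as more intelligible for CI recipients.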

Noise Reduction in Car Speech

Acta Polytechnica ◽

10.14311/1111 ◽

2009 ◽

Vol 49 (2) ◽

Author(s):

V. Bolom

Keyword(s):

Noise Reduction ◽

Speech Enhancement ◽

Mixed Model ◽

Noise Suppression ◽

Speech Signals ◽

Noisy Environment ◽

Free Communication ◽

Speech Distortion ◽

Criteria For Evaluation

This paper presents the properties of selected multichannel algorithms for speech enhancement in a noisy environment. These methods are suitable for hands-free communication in a car cabin. Criteria for evaluating these systems are also presented. The criteria consider both the level of noise suppression and the level of speech distortion. The performance of the multichannel algorithms is investigated for a mixed model of speech signals and car noise and for real signals recorded in a car.

Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00204-9 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Yuxuan Ke ◽

Andong Li ◽

Chengshi Zheng ◽

Renhua Peng ◽

Xiaodong Li

Keyword(s):

Deep Learning ◽

Speech Enhancement ◽

Noise Suppression ◽

Signal To Noise Ratio ◽

Low Complexity ◽

Speech Quality ◽

Artificial Noise ◽

Noise Power ◽

Noise Masking ◽

Residual Noise

Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask, or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, the paper proposes three strategies to estimate the noise PSD frame by frame, after which the residual noise can be removed effectively by applying a gain function based on the decision-directed approach. The objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, the AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
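The postfiltering step mentioned in this abstract, a gain function based on the decision-directed approach, can be sketched as follows. This is a minimal illustration of the classical decision-directed a priori SNR estimate combined with a Wiener gain, assuming a frame-by-frame noise PSD estimate is already available. The paper's three noise PSD estimation strategies are not reproduced here, and the function name, the smoothing factor `a`, the gain floor, and the choice of a Wiener gain are assumptions made for illustration.

```python
import numpy as np

def decision_directed_wiener(noisy_psd, noise_psd, a=0.98, gain_floor=0.1):
    """Per-bin postfilter gains via decision-directed a priori SNR estimation.

    noisy_psd, noise_psd : arrays of shape (frames, bins); noise_psd is
                           assumed to come from some external estimator
    a                    : smoothing factor (hypothetical default)
    gain_floor           : lower bound on the gain (hypothetical default)
    """
    eps = 1e-12
    n_frames, n_bins = noisy_psd.shape
    gains = np.empty((n_frames, n_bins))
    prev_clean_psd = np.zeros(n_bins)  # enhanced-speech PSD of the previous frame
    for t in range(n_frames):
        lam = np.maximum(noise_psd[t], eps)
        gamma = noisy_psd[t] / lam  # a posteriori SNR
        # Decision-directed estimate: blend the previous frame's enhanced
        # speech with the current instantaneous SNR estimate.
        xi = a * prev_clean_psd / lam + (1.0 - a) * np.maximum(gamma - 1.0, 0.0)
        g = np.maximum(xi / (1.0 + xi), gain_floor)  # Wiener gain with floor
        gains[t] = g
        prev_clean_psd = (g ** 2) * noisy_psd[t]
    return gains
```

The returned gains would be multiplied with the magnitude spectrum of the network output, frame by frame. The gain floor limits how far any bin can be attenuated, which is a common way to avoid introducing new artifacts while suppressing residual noise.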

Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones

IEEE Transactions on Emerging Topics in Computational Intelligence ◽

10.1109/tetci.2020.3014934 ◽

2020 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Lin Wang ◽

Andrea Cavallaro

Keyword(s):

Deep Learning ◽

Speech Enhancement ◽

Time Frequency ◽

Frequency Processing

The a priori SDR Estimation Techniques with Reduced Speech Distortion for Acoustic Echo and Noise Suppression

IEICE Transactions on Communications ◽

10.1587/transcom.e92.b.3022 ◽

2009 ◽

Vol E92-B (10) ◽

pp. 3022-3033 ◽

Cited By ~ 3

Author(s):

Rattapol THOONSAENGNGAM ◽

Nisachon TANGSANGIUMVISAI

Keyword(s):

Noise Suppression ◽

A Priori ◽

Estimation Techniques ◽

Speech Distortion ◽

Acoustic Echo

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

10.21437/interspeech.2018-1020 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shuai Nie ◽

Shan Liang ◽

Bin Liu ◽

Yaping Zhang ◽

Wenju Liu ◽

...

Keyword(s):

Signal Processing ◽

Deep Learning ◽

Speech Enhancement ◽

Learning Approach

Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2021.3082318 ◽

2021 ◽

pp. 1-1

Author(s):

Ke Tan ◽

Xueliang Zhang ◽

Deliang Wang

Keyword(s):

Deep Learning ◽

Real Time ◽

Mobile Phones ◽

Speech Enhancement

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414569 ◽

2021 ◽

Author(s):

Shengkui Zhao ◽

Trung Hieu Nguyen ◽

Bin Ma

Keyword(s):

Speech Enhancement ◽

Time Frequency

Deep Learning Representation from Electroencephalography of Early-Stage Creutzfeldt-Jakob Disease and Features for Differentiation from Rapidly Progressive Dementia

International Journal of Neural Systems ◽

10.1142/s0129065716500398 ◽

2016 ◽

Vol 27 (02) ◽

pp. 1650039 ◽

Cited By ~ 57

Author(s):

Francesco Carlo Morabito ◽

Maurizio Campolo ◽

Nadia Mammone ◽

Mario Versaci ◽

Silvana Franceschetti ◽

...

Keyword(s):

Deep Learning ◽

Supervised Learning ◽

Early Stage ◽

Processing System ◽

Fine Tuning ◽

Permutation Entropy ◽

Support Vector ◽

Progressive Dementia ◽

Time Frequency ◽

Jakob Disease

A novel technique of quantitative EEG for differentiating patients with early-stage Creutzfeldt–Jakob disease (CJD) from other forms of rapidly progressive dementia (RPD) is proposed. The discrimination is based on the extraction of suitable features from the time-frequency representation of the EEG signals through continuous wavelet transform (CWT). An average measure of complexity of the EEG signal obtained by permutation entropy (PE) is also included. The dimensionality of the feature space is reduced through a multilayer processing system based on the recently emerged deep learning (DL) concept. The DL processor includes a stacked auto-encoder, trained by unsupervised learning techniques, and a classifier whose parameters are determined in a supervised way by associating the known category labels to the reduced vector of high-level features generated by the previous processing blocks. The supervised learning step is carried out by using either support vector machines (SVM) or multilayer neural networks (MLP-NN). A subset of EEG from patients suffering from Alzheimer’s Disease (AD) and healthy controls (HC) is considered for differentiating CJD patients. When fine-tuning the parameters of the global processing system by a supervised learning procedure, the proposed system is able to achieve an average accuracy of 89%, an average sensitivity of 92%, and an average specificity of 89% in differentiating CJD from RPD. Similar results are obtained for CJD versus AD and CJD versus HC.
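The permutation entropy (PE) measure mentioned in this abstract can be sketched as below. This is a minimal illustration of the standard ordinal-pattern definition of PE for a one-dimensional signal; the function name and the `order`, `delay`, and `normalize` parameters are assumptions made for illustration, and the abstract does not state the settings used in the paper.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D signal (illustrative sketch).

    order     : length of each ordinal pattern (hypothetical default)
    delay     : spacing between samples within a pattern (hypothetical default)
    normalize : divide by log2(order!) so the result lies in [0, 1]
    """
    x = np.asarray(signal, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for the given order and delay")
    # Count how often each ordinal (rank-order) pattern occurs.
    patterns = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay], kind="stable"))
        for i in range(n_windows)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_windows
    entropy = -np.sum(probs * np.log2(probs))
    if normalize:
        entropy /= math.log2(math.factorial(order))
    return float(entropy)
```

A strictly increasing signal produces a single ordinal pattern and therefore a PE of zero, while white noise produces a normalized PE close to one. Averaging the value over EEG channels or epochs would give one complexity feature per recording, in the spirit of the average complexity measure the abstract describes.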
