Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users

2019 ◽

Vol 13 (3) ◽

pp. 866

Author(s):

Siriporn Dachasilaruk ◽

Niphat Jantharamin ◽

Apichai Rungruang

Keyword(s):

Cochlear Implant ◽

Speech Enhancement ◽

Speech Intelligibility ◽

English Language ◽

Single Channel ◽

Spectral Subtraction ◽

Monosyllabic Words ◽

Listening Environments ◽

Babble Noise ◽

Vocoded Speech

Cochlear implant (CI) listeners encounter difficulties in communicating with other persons in noisy listening environments. However, most CI research has been carried out using the English language. In this study, single-channel speech enhancement (SE) strategies as a pre-processing approach for the CI system were investigated in terms of Thai speech intelligibility improvement. Two SE algorithms, namely multi-band spectral subtraction (MBSS) and Weiner filter (WF) algorithms, were evaluated. Speech signals consisting of monosyllabic and bisyllabic Thai words were degraded by speech-shaped noise and babble noise at SNR levels of 0, 5, and 10 dB. Then the noisy words were enhanced using SE algorithms. The enhanced words were fed into the CI system to synthesize vocoded speech. The vocoded speech was presented to twenty normal-hearing listeners. The results indicated that speech intelligibility was marginally improved by the MBSS algorithm and significantly improved by the WF algorithm in some conditions. The enhanced bisyllabic words showed a noticeably higher intelligibility improvement than the enhanced monosyllabic words in all conditions, particularly in speech-shaped noise. Such outcomes may be beneficial to Thai-speaking CI listeners.

Download Full-text

Single channel speech enhancement based on harmonic estimation combined with statistical based method to improve speech intelligibility for cochlear implant recipients

The Journal of the Acoustical Society of America ◽

10.1121/1.4989114 ◽

2017 ◽

Vol 141 (5) ◽

pp. 3985-3986

Author(s):

Dongmei Wang ◽

John H. L. Hansen

Keyword(s):

Cochlear Implant ◽

Speech Enhancement ◽

Speech Intelligibility ◽

Single Channel

Download Full-text

Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients

10.21437/interspeech.2017-78 ◽

2017 ◽

Cited By ~ 4

Author(s):

Dongmei Wang ◽

John H.L. Hansen

Keyword(s):

Cochlear Implant ◽

Speech Enhancement ◽

Speech Intelligibility

Download Full-text

Speech enhancement based on neural networks applied to cochlear implant coding strategies

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472933 ◽

2016 ◽

Cited By ~ 9

Author(s):

Federico Bolner ◽

Tobias Goehring ◽

Jessica Monaghan ◽

Bas van Dijk ◽

Jan Wouters ◽

...

Keyword(s):

Neural Networks ◽

Cochlear Implant ◽

Speech Enhancement ◽

Coding Strategies

Download Full-text

Learning time-frequency mask for noisy speech enhancement using gaussian-bernoulli pre-trained deep neural networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201014 ◽

2021 ◽

Vol 40 (1) ◽

pp. 849-864

Author(s):

Nasir Saleem ◽

Muhammad Irfan Khattak ◽

Mu’ath Al-Hasan ◽

Atif Jan

Keyword(s):

Neural Networks ◽

Speech Enhancement ◽

Speech Intelligibility ◽

Deep Neural Networks ◽

Training Data ◽

Learning Approaches ◽

Performance Gain ◽

Noisy Speech ◽

Time Frequency ◽

Training Scheme

Speech enhancement is a very important problem in various speech processing applications. Recently, supervised speech enhancement using deep learning approaches to estimate a time-frequency mask have proved remarkable performance gain. In this paper, we have proposed time-frequency masking-based supervised speech enhancement method for improving intelligibility and quality of the noisy speech. We believe that a large performance gain can be achieved if deep neural networks (DNNs) are layer-wise pre-trained by stacking Gaussian-Bernoulli Restricted Boltzmann Machine (GB-RBM). The proposed DNN is called as Gaussian-Bernoulli Deep Belief Network (GB-DBN) and are optimized by minimizing errors between the estimated and pre-defined masks. Non-linear Mel-Scale weighted mean square error (LMW-MSE) loss function is used as training criterion. We have examined the performance of the proposed pre-training scheme using different DNNs which are established on three time-frequency masks comprised of the ideal amplitude mask (IAM), ideal ratio mask (IRM), and phase sensitive mask (PSM). The results in different noisy conditions demonstrated that when DNNs are pre-trained by the proposed scheme provided a persistent performance gain in terms of the perceived speech intelligibility and quality. Also, the proposed pre-training scheme is effective and robust in noisy training data.

Download Full-text

Reading and speech intelligibility of a child with auditory impairment and cochlear implant.

Psychology & Neuroscience ◽

10.1037/pne0000139 ◽

2018 ◽

Vol 11 (3) ◽

pp. 306-316 ◽

Cited By ~ 2

Author(s):

Fernando Del Mando Lucchesi ◽

Ana Claudia Moreira Almeida-Verdu ◽

Deisy das Graças de Souza

Keyword(s):

Cochlear Implant ◽

Speech Intelligibility ◽

Auditory Impairment

Download Full-text

Improved Environment-Aware–Based Noise Reduction System for Cochlear Implant Users Based on Knowledge Transfer Approach (Preprint)

10.2196/preprints.25460 ◽

2020 ◽

Author(s):

Lieber Po-Hung Li ◽

Ji-Yan Han ◽

Wei-Zhong Zheng ◽

Ren-Jie Huang ◽

Ying-Hui Lai

Keyword(s):

Knowledge Transfer ◽

Cochlear Implant ◽

Noise Reduction ◽

Speech Intelligibility ◽

Middle Layer ◽

Successful Implementation ◽

Perceptual Evaluation ◽

Reduction System ◽

Listening Tests ◽

Implementation Costs

BACKGROUND The cochlear implant technology is a well-known approach to help deaf patients hear speech again. It can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been proven that deep learning–based noise reduction (NR), such as noise classification and deep denoising autoencoder (NC+DDAE), can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms. OBJECTIVE Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to (1) propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T, (2) examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests, and (3) investigate which layer substitution of the knowledge transfer technology in the NC+DDAE_T noise reduction system provides the best outcome. METHODS The knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores, as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled ten cochlear implant users for listening tests to evaluate the benefits of the newly developed NC+DDAE_T. RESULTS The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain regarding STOI and PESQ scores. Therefore, the parameters of layer three in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. Both objective and listening test results showed that the proposed NC+DDAE_T noise reduction system achieved similar performances compared with the previous NC+DDAE in several noisy test conditions. However, the proposed NC+DDAE_T only needs a quarter of the number of parameters compared to the NC+DDAE. CONCLUSIONS This study demonstrated that knowledge transfer technology can help to reduce the number of parameters in an NC+DDAE while keeping similar performance rates. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users.

Download Full-text

Logatome Discrimination in Cochlear Implant Users: Subjective Tests Compared to the Mismatch Negativity

The Scientific World JOURNAL ◽

10.1100/tsw.2010.28 ◽

2010 ◽

Vol 10 ◽

pp. 329-339 ◽

Cited By ~ 7

Author(s):

Torsten Rahne ◽

Michael Ziese ◽

Dorothea Rostalski ◽

Roland Mühler

Keyword(s):

Speech Perception ◽

Cochlear Implant ◽

Mismatch Negativity ◽

Speech Intelligibility ◽

Event Related Potentials ◽

Normal Hearing ◽

Discrimination Test ◽

Speech Database ◽

Automated Speech Recognition ◽

Related Potentials

This paper describes a logatome discrimination test for the assessment of speech perception in cochlear implant users (CI users), based on a multilingual speech database, the Oldenburg Logatome Corpus, which was originally recorded for the comparison of human and automated speech recognition. The logatome discrimination task is based on the presentation of 100 logatome pairs (i.e., nonsense syllables) with balanced representations of alternating “vowel-replacement” and “consonant-replacement” paradigms in order to assess phoneme confusions. Thirteen adult normal hearing listeners and eight adult CI users, including both good and poor performers, were included in the study and completed the test after their speech intelligibility abilities were evaluated with an established sentence test in noise. Furthermore, the discrimination abilities were measured electrophysiologically by recording the mismatch negativity (MMN) as a component of auditory event-related potentials. The results show a clear MMN response only for normal hearing listeners and CI users with good performance, correlating with their logatome discrimination abilities. Higher discrimination scores for vowel-replacement paradigms than for the consonant-replacement paradigms were found. We conclude that the logatome discrimination test is well suited to monitor the speech perception skills of CI users. Due to the large number of available spoken logatome items, the Oldenburg Logatome Corpus appears to provide a useful and powerful basis for further development of speech perception tests for CI users.

Download Full-text

Perceptual weighting deep neural networks for single-channel speech enhancement

2016 12th World Congress on Intelligent Control and Automation (WCICA) ◽

10.1109/wcica.2016.7578300 ◽

2016 ◽

Cited By ~ 2

Author(s):

Wei Han ◽

Xiongwei Zhang ◽

Gang Min ◽

Xingyu Zhou ◽

Wei Zhang

Keyword(s):

Neural Networks ◽

Speech Enhancement ◽

Deep Neural Networks ◽

Single Channel ◽

Perceptual Weighting

Download Full-text

Prediction of NMF-based Wiener Filter for Speech Enhancement Using Deep Neural Networks

2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) ◽

10.1109/icspcc50002.2020.9259477 ◽

2020 ◽

Author(s):

Zhigang Bai ◽

Changchun Bao ◽

Zihao Cui

Keyword(s):

Neural Networks ◽

Speech Enhancement ◽

Deep Neural Networks ◽

Wiener Filter

Download Full-text

Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users

Speech intelligibility enhancement for Thai-speaking cochlear implant listeners

Single channel speech enhancement based on harmonic estimation combined with statistical based method to improve speech intelligibility for cochlear implant recipients

Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients

Speech enhancement based on neural networks applied to cochlear implant coding strategies

Learning time-frequency mask for noisy speech enhancement using gaussian-bernoulli pre-trained deep neural networks

Reading and speech intelligibility of a child with auditory impairment and cochlear implant.

Improved Environment-Aware–Based Noise Reduction System for Cochlear Implant Users Based on Knowledge Transfer Approach (Preprint)

Logatome Discrimination in Cochlear Implant Users: Subjective Tests Compared to the Mismatch Negativity

Perceptual weighting deep neural networks for single-channel speech enhancement

Prediction of NMF-based Wiener Filter for Speech Enhancement Using Deep Neural Networks

Export Citation Format