An End-to-End Deep Learning Sound Coding Strategy for Cochlear Implants

2021 ◽  
Author(s):  
Tom Gajecki ◽  
Waldo Nogueira

Cochlear implant (CI) users struggle to understand speech in noisy conditions. In this work, we propose an end-to-end sound coding and denoising strategy that estimates the electrodograms directly from the raw audio captured by the microphone. We compared this approach to a classic Wiener filter and to TasNet to assess its potential benefits in the context of electric hearing. The performance of the network is assessed by means of noise reduction performance (signal-to-noise-ratio improvement) and objective speech intelligibility measures. Furthermore, speech intelligibility was measured in 5 CI users to assess the potential benefits of each of the investigated algorithms. Results suggest that speech performance in the tested group was comparable between our method and the front-end speech enhancement algorithm.
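For readers who want to reproduce the noise-reduction metric mentioned above, here is a minimal sketch of a signal-to-noise-ratio improvement (SNRi) computation; the time-domain SNR definition and the function names are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def snr_db(reference, estimate):
    """Time-domain SNR of `estimate` against the clean `reference`, in dB."""
    noise = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

def snr_improvement(clean, noisy, enhanced):
    """SNR improvement: output SNR of the enhanced signal minus input SNR of the noisy mixture."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```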

2019 ◽  
Vol 8 (3) ◽  
pp. 3509-3516

The primary aim of this paper is to examine the application of the binary mask to improve intelligibility in highly unfavorable conditions, where hearing-impaired and normal-hearing listeners find it difficult to understand what is being said. Most existing noise reduction algorithms are known to improve speech quality, but they hardly improve speech intelligibility. The approach proposed by Gibak Kim and Philipos C. Loizou uses the Wiener gain function to improve speech intelligibility. In this paper, we propose to apply the same approach in the magnitude spectrum using the parametric Wiener filter in order to study its effect on overall speech intelligibility. Subjective and objective tests were conducted to evaluate the performance of the enhanced speech for various types of noise. The results clearly indicate an improvement in average segmental signal-to-noise ratio for speech corrupted at -5 dB, 0 dB, 5 dB, and 10 dB SNR with random noise, babble noise, car noise, and helicopter noise. This technique can be used in real-time applications such as mobile phones, hearing aids, and speech-activated machines.
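As a hedged illustration of the approach described in this abstract, the sketch below applies a parametric Wiener gain in the STFT magnitude domain; the gain form G = (SNR / (alpha + SNR))**beta and the crude noise-PSD-based SNR estimate are illustrative choices, not the exact formulation used in the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def parametric_wiener(noisy, noise_psd, fs=16000, nperseg=512, alpha=1.0, beta=1.0):
    """Apply a parametric Wiener gain G = (SNR / (alpha + SNR))**beta per STFT bin.
    `noise_psd` is a per-bin noise power estimate (length nperseg // 2 + 1)."""
    _, _, X = stft(noisy, fs=fs, nperseg=nperseg)
    # Crude a priori SNR estimate from the a posteriori SNR (floored to stay positive).
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd[:, None] + 1e-12) - 1.0, 1e-3)
    gain = (snr / (alpha + snr)) ** beta
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced
```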


2013 ◽  
Vol 385-386 ◽  
pp. 1381-1384
Author(s):  
Yi Jiang ◽  
Hong Zhou ◽  
Yuan Yuan Zu ◽  
Xiao Chen

Energy-based speech segregation performs well in dual-microphone speech signal processing. Applying a binary mask to an auditory mixture has been shown to yield substantial improvements in signal-to-noise ratio (SNR) and intelligibility. To evaluate the performance of a binary-mask-based dual-microphone speech enhancement algorithm, various spatial noise sources and reverberation test conditions are used. Two reference dual-microphone systems, based on energy difference and on machine learning, are used for comparison. Results in terms of SNR and speech intelligibility show that the proposed algorithm achieves more robust performance than the two reference systems.
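A minimal sketch of an energy-difference binary mask for a two-microphone mixture is given below, assuming the target dominates the front channel; the threshold and STFT settings are illustrative and do not reproduce the evaluated algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def energy_difference_mask(front, rear, fs=16000, nperseg=512, threshold_db=0.0):
    """Binary mask from the per-unit energy difference between two microphones.
    Time-frequency units where the front (target-facing) channel dominates are kept."""
    _, _, F = stft(front, fs=fs, nperseg=nperseg)
    _, _, R = stft(rear, fs=fs, nperseg=nperseg)
    level_diff_db = 10.0 * np.log10((np.abs(F) ** 2 + 1e-12) / (np.abs(R) ** 2 + 1e-12))
    mask = (level_diff_db > threshold_db).astype(float)
    _, segregated = istft(mask * F, fs=fs, nperseg=nperseg)
    return segregated
```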


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244433
Author(s):  
Eugen Kludt ◽  
Waldo Nogueira ◽  
Thomas Lenarz ◽  
Andreas Buechner

Auditory masking occurs when one sound is perceptually altered by the presence of another sound. Auditory masking in the frequency domain is known as simultaneous masking and in the time domain as temporal masking or non-simultaneous masking. This work presents a sound coding strategy that incorporates a temporal masking model to select the most relevant channels for stimulation in a cochlear implant (CI). A previous version of the strategy, termed psychoacoustic advanced combination encoder (PACE), used only a simultaneous masking model for the same purpose; for this reason, the new strategy has been termed temporal-PACE (TPACE). We hypothesized that a sound coding strategy that focuses on stimulating the auditory nerve with pulses that are as masked as possible can improve speech intelligibility for CI users. The temporal masking model used within TPACE attenuates the simultaneous masking thresholds estimated by PACE over time. The attenuation is designed to fall exponentially with a strength determined by a single parameter, the temporal masking half-life T½. This parameter gives the time interval at which the simultaneous masking threshold is halved. The study group consisted of 24 postlingually deaf subjects with a minimum of six months of experience after CI activation. A crossover design was used to compare four variants of the new temporal masking strategy TPACE (T½ ranging between 0.4 and 1.1 ms) with the clinical MP3000 strategy, a commercial implementation of the PACE strategy, in two prospective, within-subject, repeated-measures experiments. The outcome measure was speech intelligibility in noise at 15 to 5 dB SNR. In two consecutive experiments, TPACE with a T½ of 0.5 ms obtained a speech performance increase of 11% and 10% with respect to the MP3000 (T½ = 0 ms), respectively. The improved speech test scores correlated with the clinical performance of the subjects: CI users with above-average outcomes in their routine speech tests showed a higher benefit with TPACE. It seems that the consideration of short-acting temporal masking can improve speech intelligibility in CI users. The half-life with the highest average speech perception benefit (0.5 ms) corresponds to time scales that are typical for neuronal refractory behavior.
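The attenuation rule described above amounts to scaling the simultaneous masking threshold by 2^(-t/T½) after a time t; a small hedged sketch of such an update is shown below (the exact per-channel update used in TPACE is not reproduced here).

```python
def temporal_masking_threshold(simultaneous_threshold, elapsed_ms, t_half_ms=0.5):
    """Attenuate a simultaneous masking threshold exponentially over time:
    the threshold is halved every `t_half_ms` milliseconds after the masker."""
    return simultaneous_threshold * 0.5 ** (elapsed_ms / t_half_ms)
```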


2020 ◽  
Vol 8 (5) ◽  
pp. 5123-5131

Most existing noise reduction algorithms used in hearing aid applications apply a gain function in order to reduce noise interference. In the present paper, we study the effect of the two types of speech distortion introduced by the gain functions. If these distortions are properly controlled, large gains in intelligibility can be obtained. Sentences were corrupted by various kinds of noise, i.e., babble noise, car noise, helicopter noise, and random noise, and processed through a noise-reduction algorithm. Subjective tests were conducted with normal-hearing listeners by presenting the processed speech with controlled distortions. The method proposed by Kim et al. uses the Wiener filter; in this paper, we use the parametric Wiener filter. The experimental results clearly indicate an improvement in intelligibility at -5 dB, 0 dB, +5 dB, and +10 dB input signal-to-noise ratio (SNR) in terms of the short-time objective intelligibility (STOI) and segmental signal-to-noise ratio (SSNR) objective measures.
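A hedged sketch of the segmental SNR (SSNR) objective measure referred to above; the frame length, hop, and per-frame clamping range are conventional choices rather than the exact settings of the study.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, hop=128, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR in dB, with each frame clamped to a conventional range."""
    snrs = []
    for start in range(0, len(clean) - frame_len, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        err = np.sum((c - e) ** 2) + 1e-12
        snr = 10.0 * np.log10(np.sum(c ** 2) / err + 1e-12)
        snrs.append(np.clip(snr, floor_db, ceil_db))
    return float(np.mean(snrs))
```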


2015 ◽  
Vol 719-720 ◽  
pp. 767-772
Author(s):  
Wei Jun Cheng

In this paper, we present the end-to-end performance of a dual-hop amplify-and-forward variable-gain relaying system over Mixture Gamma fading channels. Novel closed-form expressions for the probability density function and the moment-generating function of the end-to-end signal-to-noise ratio (SNR) are derived. Moreover, the average symbol error rate, the average SNR, and the average capacity are obtained based on these new expressions. These expressions are simpler and more accurate than previous ones obtained using the generalized-K (KG) distribution. Finally, numerical and simulation results are presented to verify the accuracy of the analytical results.
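As an informal companion to the closed-form analysis, the sketch below estimates the average end-to-end SNR of a dual-hop variable-gain AF link by Monte Carlo simulation, using the standard relation γ_eq = γ1·γ2 / (γ1 + γ2 + 1); the per-hop SNRs are drawn from a single Gamma component (e.g., Nakagami-m fading) rather than a full Mixture Gamma model, and all parameter values are illustrative.

```python
import numpy as np

def avg_end_to_end_snr(m1=2.0, m2=2.0, mean_snr1=10.0, mean_snr2=10.0, n=1_000_000, seed=0):
    """Monte Carlo estimate of E[gamma_eq] for dual-hop variable-gain AF relaying,
    with gamma_eq = gamma1 * gamma2 / (gamma1 + gamma2 + 1)."""
    rng = np.random.default_rng(seed)
    g1 = rng.gamma(shape=m1, scale=mean_snr1 / m1, size=n)  # hop-1 instantaneous SNR
    g2 = rng.gamma(shape=m2, scale=mean_snr2 / m2, size=n)  # hop-2 instantaneous SNR
    g_eq = g1 * g2 / (g1 + g2 + 1.0)
    return float(np.mean(g_eq))
```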


2020 ◽  
Vol 24 ◽  
pp. 233121652097034
Author(s):  
Florian Langner ◽  
Andreas Büchner ◽  
Waldo Nogueira

Cochlear implant (CI) sound processing typically uses a front-end automatic gain control (AGC), reducing the acoustic dynamic range (DR) to control the output level and protect the signal processing against large amplitude changes. It can also introduce distortions into the signal and does not allow a direct mapping between acoustic input and electric output. For speech in noise, a reduction in DR can result in lower speech intelligibility due to compressed modulations of speech. This study proposes to implement a CI signal processing scheme consisting of a full acoustic DR with adaptive properties to improve the signal-to-noise ratio and overall speech intelligibility. Measurements based on the Short-Time Objective Intelligibility measure and an electrodogram analysis, as well as behavioral tests in up to 10 CI users, were used to compare performance with a single-channel, dual-loop, front-end AGC and with an adaptive back-end multiband dynamic compensation system (Voice Guard [VG]). Speech intelligibility in quiet and at a +10 dB signal-to-noise ratio was assessed with the Hochmair–Schulz–Moser sentence test. A logatome discrimination task with different consonants was performed in quiet. Speech intelligibility was significantly higher in quiet for VG than for AGC, but intelligibility was similar in noise. Participants obtained significantly better scores with VG than AGC in the logatome discrimination task. The objective measurements predicted significantly better performance estimates for VG. Overall, a dynamic compensation system can outperform a single-stage compression (AGC + linear compression) for speech perception in quiet.
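For orientation only, the following is a minimal single-channel AGC sketch with attack/release envelope tracking and limiting above a target level; it is not the dual-loop front-end AGC or the Voice Guard system evaluated in the study.

```python
import numpy as np

def simple_agc(x, fs=16000, target_dbfs=-20.0, attack_ms=5.0, release_ms=75.0):
    """Minimal single-channel AGC: track the signal envelope with attack/release
    smoothing and attenuate whenever the envelope exceeds the target level.
    `x` is expected to be a float array with samples roughly in [-1, 1]."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    target = 10.0 ** (target_dbfs / 20.0)
    env = 1e-6
    out = np.empty_like(x)
    for i, s in enumerate(x):
        mag = abs(s)
        a = a_att if mag > env else a_rel       # fast attack, slow release
        env = a * env + (1.0 - a) * mag
        out[i] = s * min(1.0, target / (env + 1e-9))
    return out
```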


Author(s):  
Feng Bao ◽  
Waleed H. Abdulla

In computational auditory scene analysis, the accurate estimation of the binary mask or ratio mask plays a key role in noise masking. An inaccurate estimation often leads to artifacts and temporal discontinuity in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. In the reconstruction of the Wiener filter, we utilize the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for the convex optimization of speech power. To improve the estimation accuracy, the estimated ratio mask is further modified based on its adjacent time–frequency units and then smoothed by interpolating with the estimated binary masks. Objective tests, including signal-to-noise ratio improvement, spectral distortion, and intelligibility, and a subjective listening test demonstrate the superiority of the proposed method compared with the reference methods.
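A hedged sketch of the Wiener-style ratio mask RM = Ps / (Ps + Pn) per time-frequency unit, with simple temporal smoothing, is shown below; the gammatone-domain power estimates are assumed inputs, and the convex-optimization and binary-mask interpolation steps of the paper are not reproduced.

```python
import numpy as np

def wiener_ratio_mask(speech_power, noise_power, smooth=0.6):
    """Wiener-style ratio mask per time-frequency unit, RM = Ps / (Ps + Pn),
    lightly smoothed along time to reduce temporal discontinuity.
    Inputs are (channels, frames) arrays of gammatone-domain power estimates."""
    rm = speech_power / (speech_power + noise_power + 1e-12)
    for t in range(1, rm.shape[1]):
        rm[:, t] = smooth * rm[:, t - 1] + (1.0 - smooth) * rm[:, t]
    return rm
```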


2019 ◽  
Vol 62 (5) ◽  
pp. 1517-1531 ◽  
Author(s):  
Sungmin Lee ◽  
Lisa Lucks Mendel ◽  
Gavin M. Bidelman

Purpose Although the speech intelligibility index (SII) has been widely applied in the field of audiology and other related areas, application of this metric to cochlear implants (CIs) has yet to be investigated. In this study, SIIs for CI users were calculated to investigate whether the SII could be an effective tool for predicting speech perception performance in a population with CI. Method Fifteen pre- and postlingually deafened adults with CI participated. Speech recognition scores were measured using the AzBio sentence lists. CI users also completed questionnaires and performed psychoacoustic (spectral and temporal resolution) and cognitive function (digit span) tests. Obtained SIIs were compared with predicted SIIs using a transfer function curve. Correlation and regression analyses were conducted on perceptual and demographic predictor variables to investigate the association between these factors and speech perception performance. Result Because of the considerably poor hearing and large individual variability in performance, the SII did not predict speech performance for this CI group using the traditional calculation. However, new SII models were developed incorporating predictive factors, which improved the accuracy of SII predictions in listeners with CI. Conclusion Conventional SII models are not appropriate for predicting speech perception scores for CI users. Demographic variables (aided audibility and duration of deafness) and perceptual–cognitive skills (gap detection and auditory digit span outcomes) are needed to improve the use of the SII for listeners with CI. Future studies are needed to improve our CI-corrected SII model by considering additional predictive factors. Supplemental Material https://doi.org/10.23641/asha.8057003
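As a reminder of the core form of the metric, the SII is essentially a band-importance-weighted sum of per-band audibility; the sketch below shows only this weighted sum and omits the full ANSI S3.5 procedure (and the CI-specific corrections proposed in the study).

```python
import numpy as np

def speech_intelligibility_index(band_importance, band_audibility):
    """Band-importance-weighted sum of per-band audibility values in [0, 1].
    The full ANSI S3.5 procedure (masking spread, level distortion, etc.) is omitted."""
    importance = np.asarray(band_importance, dtype=float)
    audibility = np.clip(np.asarray(band_audibility, dtype=float), 0.0, 1.0)
    return float(np.sum(importance * audibility) / np.sum(importance))
```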


Signals ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 138-156
Author(s):  
Raghad Yaseen Lazim ◽  
Zhu Yun ◽  
Xiaojun Wu

In hearing aid devices, speech enhancement techniques are a critical component to enable users with hearing loss to attain improved speech quality under noisy conditions. Recently, the deep denoising autoencoder (DDAE) was adopted successfully for recovering the desired speech from noisy observations. However, a single DDAE cannot extract contextual information sufficiently due to the poor generalization in an unknown signal-to-noise ratio (SNR), the local minima, and the fact that the enhanced output shows some residual noise and some level of discontinuity. In this paper, we propose a hybrid approach for hearing aid applications based on two stages: (1) the Wiener filter, which attenuates the noise component and generates a clean speech signal; (2) a composite of three DDAEs with different window lengths, each of which is specialized for a specific enhancement task. Two typical high-frequency hearing loss audiograms were used to test the performance of the approach: Audiogram 1 = (0, 0, 0, 60, 80, 90) and Audiogram 2 = (0, 15, 30, 60, 80, 85). The hearing-aid speech perception index, the hearing-aid speech quality index, and the perceptual evaluation of speech quality were used to evaluate the performance. The experimental results show that the proposed method achieved significantly better results compared with the Wiener filter or a single deep denoising autoencoder alone.
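The two-stage structure described above can be sketched as a Wiener pre-filter followed by a denoising autoencoder over log-magnitude frames; in the illustration below, the single hidden layer and the placeholder weight matrices W_enc and W_dec stand in for the trained composite DDAEs and are assumptions, not the paper's architecture.

```python
import numpy as np
from scipy.signal import stft, istft

def hybrid_enhance(noisy, noise_psd, W_enc, W_dec, fs=16000, nperseg=512):
    """Two-stage sketch: (1) Wiener gain in the STFT domain, then (2) a one-hidden-layer
    denoising autoencoder applied to log-magnitude frames. W_enc and W_dec are placeholder
    matrices standing in for trained weights (shapes: (bins, hidden) and (hidden, bins))."""
    _, _, X = stft(noisy, fs=fs, nperseg=nperseg)
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd[:, None] + 1e-12) - 1.0, 1e-3)
    X = X * snr / (1.0 + snr)                      # stage 1: Wiener filtering
    logmag = np.log(np.abs(X).T + 1e-6)            # (frames, bins)
    hidden = np.maximum(logmag @ W_enc, 0.0)       # stage 2: DDAE forward pass (ReLU encoder)
    logmag_hat = hidden @ W_dec                    # linear decoder
    X_hat = np.exp(logmag_hat.T) * np.exp(1j * np.angle(X))
    _, enhanced = istft(X_hat, fs=fs, nperseg=nperseg)
    return enhanced
```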


2020 ◽  
Vol 24 ◽  
pp. 233121652097563
Author(s):  
Christopher F. Hauth ◽  
Simon C. Berning ◽  
Birger Kollmeier ◽  
Thomas Brand

The equalization cancellation (EC) model is often used to predict the binaural masking level difference. Previously, its application to speech in noise has required separate knowledge of the speech and noise signals in order to maximize the signal-to-noise ratio (SNR). Here, a novel, blind equalization cancellation model is introduced that operates directly on the mixed signals. This approach does not require any assumptions about particular sound source directions. It uses different strategies for positive and negative SNRs, with the switching between the two steered by a blind decision stage utilizing modulation cues. The output of the model is a single-channel signal with enhanced SNR, which we analyzed using the speech intelligibility index to compare speech intelligibility predictions. In a first experiment, the model was tested on experimental data obtained in a scenario with spatially separated target and masker signals. Predicted speech recognition thresholds were in good agreement with measured speech recognition thresholds, with a root mean square error of less than 1 dB. A second experiment investigated signals at positive SNRs, which was achieved using time-compressed and low-pass filtered speech. The results demonstrated that binaural unmasking of speech occurs at positive SNRs and that the modulation-based switching strategy can predict the experimental results.
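A very reduced sketch of the equalization-cancellation idea is given below: the right-ear signal is delayed and scaled, subtracted from the left-ear signal, and the candidate output with the most strongly modulated envelope is kept as a crude stand-in for the model's blind, modulation-based decision stage; the delays, gains, and selection criterion are illustrative assumptions.

```python
import numpy as np

def equalization_cancellation(left, right, fs=16000, max_delay_ms=0.7):
    """Minimal E-C sketch: search over interaural delays and gains, cancel the right-ear
    signal from the left, and keep the output whose envelope is most strongly modulated."""
    best, best_score = left - right, -np.inf
    max_lag = int(fs * max_delay_ms / 1000.0)
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(right, lag)                      # equalization: interaural delay
        for gain in np.linspace(0.5, 2.0, 7):              # equalization: interaural gain
            out = left - gain * shifted                    # cancellation
            env = np.abs(out)
            score = np.std(env) / (np.mean(env) + 1e-12)   # envelope modulation depth
            if score > best_score:
                best, best_score = out, score
    return best
```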

