Speech recognition with a hearing-aid processing scheme combining beamforming with mask-informed speech enhancement

2022 ◽  
Vol 26 ◽  
pp. 233121652110686
Author(s):  
Tim Green ◽  
Gaston Hilkhuysen ◽  
Mark Huckvale ◽  
Stuart Rosen ◽  
Mike Brookes ◽  
...  

A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.
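As a rough illustration of what "mask-informed" enhancement means, the sketch below computes a ratio mask per time-frequency bin and applies it to mixture magnitudes. It is not the study's implementation: the abstract describes a deep neural network that estimates which regions are dominated by target speech, whereas this example assumes the speech and noise powers are known (an oracle mask). The function names and the gain floor are hypothetical.

```python
def ideal_ratio_mask(speech_power, noise_power):
    """Oracle ratio mask in [0, 1]: fraction of each bin's power due to target speech.

    Both inputs are 2-D lists indexed as [frame][frequency_bin].
    """
    return [
        [s / (s + n) if (s + n) > 0 else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mixture_magnitude, mask, floor=0.1):
    """Scale each bin by its mask value, never attenuating below the gain floor."""
    return [
        [m * max(g, floor) for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(mixture_magnitude, mask)
    ]


# One frame, two bins: the first is speech-dominated, the second noise-dominated.
mask = ideal_ratio_mask([[9.0, 1.0]], [[1.0, 9.0]])
enhanced = apply_mask([[2.0, 2.0]], mask, floor=0.2)
```

In a real system the mask would be estimated from the beamformer output, so mask errors can remove speech as well as noise, which is one possible reason such processing may fail to help intelligibility.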

2018 ◽  
Vol 143 (3) ◽  
pp. 1751-1751 ◽  
Author(s):  
Frederic Apoux ◽  
Brittney Carter ◽  
Karl P. Velik ◽  
Eric Healy

2013 ◽  
Vol 24 (10) ◽  
pp. 980-991 ◽  
Author(s):  
Kristi Oeding ◽  
Michael Valente

Background: In the past, bilateral contralateral routing of signals (BICROS) amplification incorporated omnidirectional microphones on the transmitter and receiver sides and some models utilized noise reduction (NR) on the receiver side. Little research has examined the performance of BICROS amplification in background noise. However, previous studies examining contralateral routing of signals (CROS) amplification have reported that the presence of background noise on the transmitter side negatively affected speech recognition. Recently, NR was introduced as a feature on the receiver and transmitter sides of BICROS amplification, which has the potential to decrease the impact of noise on the wanted speech signal by decreasing unwanted noise directed to the transmitter side. Purpose: The primary goal of this study was to examine differences in the reception threshold for sentences (RTS in dB) using the Hearing in Noise Test (HINT) in a diffuse listening environment between unaided and three aided BICROS conditions (no NR, mild NR, and maximum NR) in the Tandem 16 BICROS. A secondary goal was to examine real-world subjective impressions of the Tandem 16 BICROS compared to unaided. Research Design: A randomized block repeated measures single blind design was used to assess differences between no NR, mild NR, and maximum NR listening conditions. Study Sample: Twenty-one adult participants with asymmetric sensorineural hearing loss (ASNHL) and experience with BICROS amplification were recruited from Washington University in St. Louis School of Medicine. Data Collection and Analysis: Participants were fit with the National Acoustic Laboratories’ Nonlinear version 1 prescriptive target (NAL-NL1) with the Tandem 16 BICROS at the initial visit and then verified using real-ear insertion gain (REIG) measures. Participants acclimatized to the Tandem 16 BICROS for 4 wk before returning for final testing. 
Participants were tested utilizing HINT sentences examining differences in RTS between unaided and three aided listening conditions. Subjective benefit was determined via the Abbreviated Profile of Hearing Aid Benefit (APHAB) questionnaire between the Tandem 16 BICROS and unaided. A repeated measures analysis of variance (ANOVA) was utilized to analyze the results of the HINT and APHAB. Results: Results revealed no significant differences in the RTS between unaided, no NR, mild NR, and maximum NR. Subjective impressions using the APHAB revealed statistically and clinically significant benefit with the Tandem 16 BICROS compared to unaided for the Ease of Communication (EC), Background Noise (BN), and Reverberation (RV) subscales. Conclusions: The RTS was not significantly different between unaided, no NR, mild NR, and maximum NR. None of the three aided listening conditions were significantly different from unaided performance as has been reported for previous studies examining CROS hearing aids. Further, based on comments from participants and previous research studies with conventional hearing aids, manufacturers of BICROS amplification should consider incorporating directional microphones and independent volume controls on the receiver and transmitter sides to potentially provide further improvement in signal-to-noise ratio (SNR) for patients with ASNHL.


2006 ◽  
Vol 120 (5) ◽  
pp. 3157-3157 ◽  
Author(s):  
Junfeng Li ◽  
Shuichi Sakamoto ◽  
Yo‐iti Suzuki ◽  
Satoshi Hongo

2020 ◽  
Vol 10 (7) ◽  
pp. 2218
Author(s):  
Tao Zhang ◽  
Yanzhang Geng ◽  
Jianhong Sun ◽  
Chen Jiao ◽  
Biyun Ding

This paper presents a unified speech enhancement system that removes both background noise and interfering speech in severe noise environments by jointly utilizing a parabolic reflector model and a neural beamformer. First, the amplification property of the paraboloid is discussed, which significantly improves the Signal-to-Noise Ratio (SNR) of a desired signal. An appropriate paraboloid channel is then analyzed and designed through the boundary element method. In addition, a time-frequency masking approach and a mask-based beamforming approach are discussed and incorporated in the enhancement system. It is worth noting that the signals provided by the paraboloid and the beamformer are exactly complementary. Finally, these signals are employed in a learning-based fusion framework to further improve the system performance in low-SNR environments. Experiments demonstrate that our system is effective and robust in five different noisy conditions (speech interfered with factory, pink, destroyer engine, Volvo, and babble noise), as well as at different noise levels. Compared with the original noisy speech, the average improvements in objective metrics are about ΔSTOI = 0.28, ΔPESQ = 1.31, and ΔfwSegSNR = 11.9.


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a technique widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When a speech spectrogram is treated as an image, BSF is capable not only of sharpening details and removing unwanted textures or background noise, but also of preserving edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. In addition, to reduce computing costs, a fast and accurate BF is adopted that reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods using various types of noise at different signal-to-noise ratios, in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
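To make the edge-preserving behaviour of bilateral filtering concrete, here is a minimal brute-force sketch applied to a spectrogram stored as a 2-D list. It is not the fast O(1)-per-bin variant the abstract mentions, and the function name, window radius, and sigma values are assumptions chosen for illustration.

```python
import math


def bilateral_filter(spec, radius=1, sigma_space=1.0, sigma_range=1.0):
    """Brute-force bilateral filter over a 2-D list indexed as [frame][bin].

    Each output value is a weighted mean of its neighbours. A neighbour's weight
    falls off with its distance in the time-frequency plane (sigma_space) and
    with how much its value differs from the centre value (sigma_range).
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            num = 0.0
            den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        diff = spec[k][l] - spec[i][j]
                        w = math.exp(-(di * di + dj * dj) / (2 * sigma_space ** 2))
                        w *= math.exp(-(diff * diff) / (2 * sigma_range ** 2))
                        num += w * spec[k][l]
                        den += w
            # den >= 1 because the centre bin always contributes weight 1
            out[i][j] = num / den
    return out


# A sharp step between bins 1 and 2 survives filtering almost unchanged.
smoothed = bilateral_filter([[0.0, 0.0, 10.0, 10.0]])
```

Because the range weight is close to zero across the step, the two sides are not averaged together, which is the property that makes the filter attractive for spectrogram onsets and harmonics.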


2020 ◽  
Vol 14 (5) ◽  
pp. 951-960
Author(s):  
Zhuoyi Sun ◽  
Yingdan Li ◽  
Hanjun Jiang ◽  
Fei Chen ◽  
Xiang Xie ◽  
...  

2018 ◽  
Vol 8 (9) ◽  
pp. 1436 ◽  
Author(s):  
Yuexian Zou ◽  
Zhaoyi Liu ◽  
Christian Ritz

Enhancing speech captured by distant microphones is a challenging task. In this study, we investigate the multichannel signal properties of the single acoustic vector sensor (AVS) to obtain the inter-sensor data ratio (ISDR) model in the time-frequency (TF) domain. Then, the monotone functions describing the relationship between the ISDRs and the direction of arrival (DOA) of the target speaker are derived. For the target speech enhancement (SE) task, the DOA of the target speaker is given, and the ISDRs are calculated. Hence, the TF components dominated by the target speech are extracted with high probability using the established monotone functions, and then, a nonlinear soft mask of the target speech is generated. As a result, a masking-based speech enhancement method is developed, which is termed the AVS-SMASK method. Extensive experiments with simulated data and recorded data have been carried out to validate the effectiveness of our proposed AVS-SMASK method in terms of suppressing spatial speech interferences and reducing the adverse impact of the additive background noise while maintaining less speech distortion. Moreover, our AVS-SMASK method is computationally inexpensive, and the AVS is of a small physical size. These merits are favorable to many applications, such as robot auditory systems.


2020 ◽  
Vol 16 (2) ◽  
pp. 140-146
Author(s):  
Gwang Min Kim ◽  
Jae Hee Lee

Purpose: Although difficulty understanding speech in noise is often the primary complaint of hearing-impaired (HI) listeners, speech-in-noise intelligibility testing is not part of the standard audiologic test battery. This study investigated whether speech audiometry in quiet accurately reflects the sentence-in-noise intelligibility of HI listeners. Methods: Sixty-two HI listeners participated. All the HI listeners had symmetrical high-frequency hearing loss and wore hearing aids bilaterally. Twenty-five normal-hearing (NH) listeners also participated as a control group. The unaided word and sentence recognition scores (WRS and SRS) were obtained in quiet at the individually determined most comfortable loudness level. With bilateral hearing aids, the aided WRS and SRS were evaluated at a normal conversational level. The software-based Korean Matrix sentence-in-noise test was administered at a fixed noise level (65 dB SPL) while the sentence level was adjusted adaptively based on the listener’s response. The signal-to-noise ratio (SNR) required to achieve 50% intelligibility (the speech recognition threshold, SRT) was obtained. Results: On average, the aided SRT of HI listeners was 0.1 dB SNR, and the mean SRT of NH adults was -8.91 dB SNR. The Matrix sentence-in-noise intelligibility was not sufficiently explained by the unaided WRS or unaided SRS. Conclusion: A traditional measure of unaided speech-in-quiet recognition cannot accurately predict aided speech-in-noise intelligibility. Clinically, a software-based sentence-in-noise intelligibility test is recommended to directly confirm the actual benefits of hearing aids in noisy situations.
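The adaptive tracking of a 50% threshold described in the Methods can be sketched with a generic one-up/one-down staircase. This is an assumption-laden simplification: the Matrix test uses its own word-scored adaptation rule, which the abstract does not specify, and the function name, step size, trial count, and simulated listener below are all hypothetical.

```python
def staircase_srt(respond, start_snr=0.0, step=2.0, trials=20, average_last=10):
    """Estimate the SNR giving about 50% correct with a one-up/one-down staircase.

    `respond(snr)` returns True when the listener repeats the sentence correctly.
    The SNR is lowered after a correct response and raised after an error, so
    the track settles around the 50% point. The estimate is the mean SNR over
    the last `average_last` presented trials.
    """
    snr = start_snr
    track = []
    for _ in range(trials):
        track.append(snr)
        snr += -step if respond(snr) else step
    tail = track[-average_last:]
    return sum(tail) / len(tail)


# Idealised listener who is correct whenever the SNR is at or above -5 dB.
estimate = staircase_srt(lambda snr: snr >= -5.0)
```

With this deterministic listener the track alternates between -4 and -6 dB once it has converged, so the averaged estimate lands on -5 dB SNR. Real listeners respond probabilistically, and the estimate then carries measurement noise.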


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5751
Author(s):  
Seon Man Kim

This paper proposes a novel technique to improve a spectral statistical filter for speech enhancement, to be applied in wearable hearing devices such as hearing aids. The proposed method is implemented using a 32-channel uniform polyphase discrete Fourier transform filter bank, for which the overall algorithm processing delay is 8 ms, in accordance with hearing device requirements. The proposed speech enhancement technique, which exploits the concepts of both non-negative sparse coding (NNSC) and spectral statistical filtering, provides an online unified framework to overcome the problem of residual noise in spectral statistical filters under noisy environments. First, the spectral gain attenuator of the statistical Wiener filter is obtained using the a priori signal-to-noise ratio (SNR) estimated through a decision-directed approach. Next, the spectrum estimated using the Wiener spectral gain attenuator is decomposed by applying the NNSC technique to the target speech and residual noise components. These components are used to develop an NNSC-based Wiener spectral gain attenuator to achieve enhanced speech. The performance of the proposed NNSC–Wiener filter was evaluated using perceptual evaluation of speech quality scores under various noise conditions with SNRs ranging from -5 to 20 dB. The results indicated that the proposed NNSC–Wiener filter can outperform the conventional Wiener filter and NNSC-based speech enhancement methods at all SNRs.
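The first step above, a Wiener gain driven by a decision-directed a priori SNR estimate, follows a widely used textbook recursion that can be sketched for a single frequency channel. The sketch covers only that conventional stage, not the NNSC decomposition, and it assumes the noise power is known and constant. The function name, smoothing factor, and SNR floor are illustrative choices rather than values from the paper.

```python
def wiener_gains_decision_directed(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame Wiener gains for one frequency channel.

    noisy_power: list of noisy-signal powers |Y|^2, one per frame.
    noise_power: assumed-known noise power for this channel.
    The a priori SNR mixes the previous frame's enhanced power (weight alpha)
    with the current frame's a posteriori SNR minus one (weight 1 - alpha).
    """
    gains = []
    prev_clean_power = 0.0
    for y2 in noisy_power:
        gamma = y2 / noise_power  # a posteriori SNR
        xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)      # floor on the a priori SNR
        gain = xi / (1.0 + xi)    # Wiener gain
        gains.append(gain)
        prev_clean_power = gain * gain * y2
    return gains


noise_only = wiener_gains_decision_directed([1.0] * 20, noise_power=1.0)
speech_like = wiener_gains_decision_directed([100.0] * 20, noise_power=1.0)
```

For the noise-only input the gain stays at the level set by the SNR floor, while for the high-SNR input it rises towards 1 within a few frames. The floor is one reason some residual noise survives a conventional Wiener filter.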


2020 ◽  
Vol 63 (11) ◽  
pp. 3855-3864
Author(s):  
Wanting Huang ◽  
Lena L. N. Wong ◽  
Fei Chen ◽  
Haihong Liu ◽  
Wei Liang

Purpose: Fundamental frequency (F0) is the primary acoustic cue for lexical tone perception in tonal languages but is processed in a limited way in cochlear implant (CI) systems. The aim of this study was to evaluate the importance of F0 contours in sentence recognition in Mandarin-speaking children with CIs and to determine whether it is similar to or different from that in age-matched normal-hearing (NH) peers. Method: Age-appropriate sentences, with F0 contours manipulated to be either natural or flattened, were randomly presented to preschool children with CIs and their age-matched peers with NH under three test conditions: in quiet, in white noise, and with competing sentences at 0 dB signal-to-noise ratio. Results: The neutralization of F0 contours resulted in a significant reduction in sentence recognition. While this was seen only in noise conditions among NH children, it was observed throughout all test conditions among children with CIs. Moreover, the F0 contour-induced accuracy reduction ratios (i.e., the reduction in sentence recognition resulting from the neutralization of F0 contours compared to the normal F0 condition) were significantly greater in children with CIs than in NH children in all test conditions. Conclusions: F0 contours play a major role in sentence recognition in both quiet and noise among pediatric implantees, and the contribution of the F0 contour is even more salient than that in age-matched NH children. These results also suggest that there may be differences between children with CIs and NH children in how F0 contours are processed.
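One plausible reading of the "accuracy reduction ratio" is the drop in score relative to the natural-F0 score. The abstract does not give the exact formula, so the definition below is an assumption, and the scores in the example are made-up numbers rather than data from the study.

```python
def f0_reduction_ratio(natural_score, flattened_score):
    """Relative drop in sentence recognition when F0 contours are flattened.

    Assumed definition: (natural - flattened) / natural, with scores in
    percent correct. Raises ValueError if the natural score is not positive.
    """
    if natural_score <= 0:
        raise ValueError("natural_score must be positive")
    return (natural_score - flattened_score) / natural_score


# Hypothetical scores: 80% correct with natural F0, 60% with flattened F0.
ratio = f0_reduction_ratio(80.0, 60.0)
```

Expressing the loss as a ratio rather than a raw difference allows groups with different baseline scores to be compared on a common scale.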

