Intelligibility Assessment of Ideal Binary-Masked Noisy Speech with Acceptance of Room Acoustic

2015 ◽  
Vol 65 (6) ◽  
pp. 325-332
Author(s):  
Sedlak Vladimír ◽  
Durackova Daniela ◽  
Zalusky Roman ◽  
Kovacik Tomas

Abstract This paper evaluates the intelligibility of ideal binary-masked noisy speech across different signal-to-noise ratios (SNRs), mask errors, masker types, source-receiver distances, reverberation times, and local criteria for forming the binary mask. The ideal binary mask (IBM) is computed from time-frequency decompositions of the target and masker signals by thresholding the local SNR within each time-frequency unit. The intelligibility of the separated signal is measured with several objective measures computed in the frequency and perceptual domains. The study replicates and extends previously reported findings, and its main contribution is to show the impact of room acoustics on the intelligibility performance of the IBM technique.
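
The mask construction described in this abstract can be sketched as follows: each time-frequency unit is kept when its local SNR exceeds a local criterion. This is a minimal illustration, not the authors' implementation. The function name `ideal_binary_mask`, the parameter `lc_db`, and the toy energy values are assumptions, and the time-frequency decomposition is assumed to have been computed beforehand.

```python
import math

def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR (dB) exceeds the local criterion.

    target_energy, masker_energy: 2D lists [frequency][time] of non-negative
    energies from any time-frequency decomposition (assumed precomputed).
    lc_db: local criterion in dB (hypothetical parameter name).
    """
    mask = []
    for t_row, m_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(t_row, m_row):
            if m == 0.0:
                # No masker energy: keep the unit only if target energy is present.
                row.append(1 if t > 0.0 else 0)
            elif t == 0.0:
                # No target energy: the unit is masker-dominated.
                row.append(0)
            else:
                local_snr_db = 10.0 * math.log10(t / m)
                row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy 2x2 energies (illustrative values only).
target = [[4.0, 0.5], [1.0, 9.0]]
masker = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(target, masker, lc_db=0.0)
```

Lowering `lc_db` retains more units, which is one way the local criterion studied in the abstract changes the resulting mask.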

2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

This paper proposes a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement. The method preprocesses the spectrogram of the noisy speech with bilateral filtering (BF), a technique widely used in image and visual processing, and can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments. When the speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted textures or background noise but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. To reduce computing costs, a fast and accurate BF is adopted, which reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods on various noise types at different signal-to-noise ratios, using objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
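
To illustrate the edge-preserving behavior attributed to bilateral filtering, here is a minimal brute-force sketch applied to a small 2D array standing in for a spectrogram. It is not the fast O(1) variant the abstract refers to, and the function name `bilateral_filter` and the parameters `radius`, `sigma_s`, and `sigma_r` are assumptions chosen for illustration.

```python
import math

def bilateral_filter(spec, radius=1, sigma_s=1.0, sigma_r=1.0):
    """Brute-force bilateral filter over a 2D list (e.g. a log-magnitude spectrogram)."""
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            num = 0.0
            den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols:
                        # Spatial weight: closeness in the time-frequency plane.
                        w_s = math.exp(-(di * di + dj * dj) / (2.0 * sigma_s ** 2))
                        # Range weight: similarity of the spectrogram values.
                        diff = spec[a][b] - spec[i][j]
                        w_r = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                        num += w_s * w_r * spec[a][b]
                        den += w_s * w_r
            # den >= 1 because the center bin always contributes weight 1.
            out[i][j] = num / den
    return out

# A sharp step between two regions (illustrative values only).
step = [[0.0, 0.0, 100.0, 100.0] for _ in range(3)]
filtered = bilateral_filter(step)
```

Because the range weight is close to zero across the step, bins on either side are averaged only with similar neighbors, so the edge is preserved while small fluctuations within a region are smoothed.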


Author(s):  
Kwun-Lon Ting ◽  
Yufeng Long

Abstract By applying Taguchi’s concept to mechanism synthesis, this paper presents the theory and technique for identifying a robust mechanism design, that is, the design least sensitive to tolerances, and for determining the tolerance specification that gives the best performance and manufacturability. The method is demonstrated in finite and infinitesimal position synthesis. The sensitivity Jacobian is first introduced to relate the performance tolerances to the dimensional tolerances. The Rayleigh quotient of the sensitivity Jacobian, which is equivalent to Taguchi’s signal-to-noise ratio, is then used to define the performance quality, and a sensitivity index is introduced to measure the sensitivity of the performance quality to the dimensional tolerances for the whole system. The ideal tolerance specification is obtained in closed form. It shows how the tolerance specification affects the performance quality and that the performance quality can be significantly improved by tightening a key tolerance while loosening the others. The theory is general, and the technique is readily adaptable to almost any form and type of mechanical system, including multiple-loop linkages, mechanical assemblies, and even structures.
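
The abstract does not give the exact form of the quotient, so the following sketch uses the generic definition only: for a Jacobian J and a vector of dimensional deviations dx, the Rayleigh quotient of JᵀJ is |J dx|² / |dx|². The function name `rayleigh_quotient` and the example matrix are assumptions for illustration and are not taken from the paper.

```python
def rayleigh_quotient(jacobian, dx):
    """Return |J dx|^2 / |dx|^2, the Rayleigh quotient of J^T J evaluated at dx."""
    j_dx = [sum(a * b for a, b in zip(row, dx)) for row in jacobian]
    num = sum(v * v for v in j_dx)
    den = sum(v * v for v in dx)
    if den == 0.0:
        raise ValueError("dx must be non-zero")
    return num / den

# Hypothetical 2x2 sensitivity Jacobian: output deviation per unit dimensional deviation.
J = [[2.0, 0.0], [0.0, 0.5]]
q_first = rayleigh_quotient(J, [1.0, 0.0])   # deviation in the first dimension only
q_second = rayleigh_quotient(J, [0.0, 1.0])  # deviation in the second dimension only
```

In this toy case the first dimension amplifies deviations far more than the second, which mirrors the abstract's point that tightening a key tolerance while loosening others can improve performance quality.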


2011 ◽  
Vol 243-249 ◽  
pp. 5085-5088
Author(s):  
Lin Feng Wang ◽  
Hong Mei Tang ◽  
Hong Kai Chen

Shed-tunnels are a common prevention measure along highways. Using wavelet theory, we denoised the rockfall impact signals recorded when rocks struck an ordinary shed-tunnel and an energy dissipation shed-tunnel, and evaluated the denoising effect by the signal-to-noise ratio. The results indicated that the denoising effect was very good. Autocorrelation analysis and time-frequency analysis of the rockfall impact signals then showed that the impact signals of the ordinary shed-tunnel had no obvious dominant frequency and contained many frequency components, whereas those of the energy dissipation shed-tunnel had an obvious dominant frequency and therefore a relatively fixed cycle and frequency. The rockfall impact frequency obtained from the time-frequency analysis can provide a basis for designing the natural frequency of energy dissipation shed-tunnels.
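
The abstract does not state which wavelet, threshold rule, or SNR definition was used. As a hedged sketch of the general idea, the following applies a one-level Haar transform with soft thresholding and scores the result with SNR in dB against a known clean reference. The function names `haar_denoise` and `snr_db`, the threshold value, and the toy signals are all assumptions for illustration.

```python
import math

def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the detail coefficients, invert."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Soft thresholding shrinks small (assumed noise-dominated) details to zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, shrunk):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def snr_db(reference, estimate):
    """SNR in dB of an estimate relative to a clean reference signal."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)

# Toy piecewise-constant signal with small perturbations (illustrative values only).
clean = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
noisy = [1.1, 0.9, 1.05, 0.95, -0.9, -1.1, -1.05, -0.95]
denoised = haar_denoise(noisy, threshold=0.2)
```

Comparing `snr_db(clean, noisy)` with `snr_db(clean, denoised)` gives the kind of before-and-after figure the abstract uses to judge the denoising effect. For measured impact signals no clean reference exists, so a practical evaluation would need a different noise estimate.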


Author(s):  
Judith Justin ◽  
Vanithamani R.

This chapter implements a speech enhancement technique using a neuro-fuzzy classifier. Noisy speech sentences from the NOIZEUS and AURORA databases are used in the study. Feature extraction is implemented through modifications in amplitude magnitude spectrograms. A four-class neuro-fuzzy classifier assigns the time-frequency units of the noisy speech to one of four classes: noise only, signal only, more noise than signal, and more signal than noise. Appropriate weights are applied in the enhancement phase, and the enhanced speech is evaluated using objective measures. The performance of the Neuro-Fuzzy 4 (NF 4) classifier is analyzed and compared with that of conventional techniques for various noise types at different noise levels. The objective measures obtained are better than those of the other techniques, from which it is inferred that NF 4 outperforms them in speech enhancement.


2019 ◽  
Vol 23 ◽  
pp. 233121651985459 ◽  
Author(s):  
Jan Rennies ◽  
Virginia Best ◽  
Elin Roverud ◽  
Gerald Kidd

Speech perception in complex sound fields can greatly benefit from different unmasking cues to segregate the target from interfering voices. This study investigated the role of three unmasking cues (spatial separation, gender differences, and masker time reversal) on speech intelligibility and perceived listening effort in normal-hearing listeners. Speech intelligibility and categorically scaled listening effort were measured for a female target talker masked by two competing talkers with no unmasking cues or one to three unmasking cues. In addition to natural stimuli, all measurements were also conducted with glimpsed speech—which was created by removing the time–frequency tiles of the speech mixture in which the maskers dominated the mixture—to estimate the relative amounts of informational and energetic masking as well as the effort associated with source segregation. The results showed that all unmasking cues as well as glimpsing improved intelligibility and reduced listening effort and that providing more than one cue was beneficial in overcoming informational masking. The reduction in listening effort due to glimpsing corresponded to increases in signal-to-noise ratio of 8 to 18 dB, indicating that a significant amount of listening effort was devoted to segregating the target from the maskers. Furthermore, the benefit in listening effort for all unmasking cues extended well into the range of positive signal-to-noise ratios at which speech intelligibility was at ceiling, suggesting that listening effort is a useful tool for evaluating speech-on-speech masking conditions at typical conversational levels.


2018 ◽  
Vol 143 (3) ◽  
pp. 1751-1751 ◽  
Author(s):  
Frederic Apoux ◽  
Brittney Carter ◽  
Karl P. Velik ◽  
Eric Healy
