Babble and Random-Noise Masking of Speech in High and Low Context Cue Conditions

1988 ◽  
Vol 31 (1) ◽  
pp. 108-114 ◽  
Author(s):  
H. Donell Lewis ◽  
Vernon A. Benignus ◽  
Keith E. Muller ◽  
Carolin M. Malott ◽  
Curtis N. Barton

"Perceptual" masking of speech by multitalker speech (babble) has been widely reported but poorly quantified. Furthermore, the validity of the construct of perceptual masking is questionable. This report describes an experiment using a newly standardized test of speech perception in noise (SPIN) with both babble and spectrally matched random-noise maskers. Classical psychophysieal ogive curves were used to model speech recognition as a function of signal-to-noise ratio (S/N). The two maskers yielded speech recognition functions of the same steepness but different locations on the S/N axis. The high-context items of SPIN yielded speech recognition curves with steeper slope and different locations on the S/N axis than the low-context items. These data are used to argue that perceptual masking was not documented (under certain assumptions) and that the superior masking of babble may be explained in purely acoustical terms. Speculations are offered about the possible acoustical differences that could be responsible for the differences in masking effect.

2017 ◽  
Vol 56 (8) ◽  
pp. 568-579 ◽  
Author(s):  
Christi W. Miller ◽  
Ruth A. Bentler ◽  
Yu-Hsiang Wu ◽  
James Lewis ◽  
Kelly Tremblay

Author(s):  
Yones Lotfi ◽  
Jamileh Chupani ◽  
Mohanna Javanbakht ◽  
Enayatollah Bakhshi

Background and Aim: In most everyday settings, speech is heard in the presence of competing sounds, and speech perception in noise is affected by various factors, including cognitive factors. In this regard, bilingualism is a phenomenon that changes cognitive and behavioral processes as well as the nervous system. This study aimed to evaluate speech perception in noise and compare differences in Kurd-Persian bilinguals versus Persian monolinguals. Methods: This descriptive-analytic study was performed on 92 students with normal hearing, 46 of whom were Kurd-Persian bilinguals with a mean (SD) age of 22.73 (1.92) years, and the other 46 Persian monolinguals with a mean (SD) age of 22.71 (2.28) years. They were examined with the consonant-vowel in noise (CV in noise) test and the quick speech in noise (Q-SIN) test. The obtained data were analyzed with SPSS 21. Results: The comparison showed differences between bilingual and monolingual subjects on both tests. In both groups, reducing the signal-to-noise ratio led to lower scores, but the decrease on the CV in noise test was smaller in bilinguals than in monolinguals (p < 0.001), whereas on the Q-SIN test the drop in bilinguals' scores was larger than in monolinguals (p = 0.002). Conclusion: Kurd-Persian bilinguals performed better on the CV in noise test but worse on the Q-SIN test than Persian monolinguals.


2012 ◽  
Vol 23 (08) ◽  
pp. 590-605 ◽  
Author(s):  
Richard H. Wilson ◽  
Rachel McArdle ◽  
Kelly L. Watts ◽  
Sherri L. Smith

Background: The Revised Speech Perception in Noise Test (R-SPIN; Bilger, 1984b) is composed of 200 target words distributed as the last words in 200 low-predictability (LP) and 200 high-predictability (HP) sentences. Four list pairs, each consisting of two 50-sentence lists, were constructed with the target word in a LP and HP sentence. Traditionally the R-SPIN is presented at a signal-to-noise ratio (SNR, S/N) of 8 dB with the listener task to repeat the last word in the sentence. Purpose: The purpose was to determine the practicality of altering the R-SPIN format from a single SNR paradigm into a multiple SNR paradigm from which the 50% points for the HP and LP sentences can be calculated. Research Design: Three repeated measures experiments were conducted. Study Sample: Forty listeners with normal hearing and 184 older listeners with pure-tone hearing loss participated in the sequence of experiments. Data Collection and Analysis: The R-SPIN sentences were edited digitally (1) to maintain the temporal relation between the sentences and babble, (2) to establish the SNRs, and (3) to mix the speech and noise signals to obtain SNRs between –1 and 23 dB. All materials were recorded on CD and were presented through an earphone with the responses recorded and analyzed at the token level. For reference purposes the Words-in-Noise Test (WIN) was included in the first experiment. Results: In Experiment 1, recognition performances by listeners with normal hearing were better than performances by listeners with hearing loss. For both groups, performances on the HP materials were better than performances on the LP materials. Performances on the LP materials and on the WIN were similar. Performances at 8 dB S/N were the same with the traditional fixed level presentation and the descending presentation level paradigms. 
The results from Experiment 2 demonstrated that the four list pairs of R-SPIN materials produced good first approximation psychometric functions over the –4 to 23 dB S/N range, but there were irregularities. The data from Experiment 2 were used in Experiment 3 to guide the selection of the words to be used at the various SNRs that would provide homogeneous performances at each SNR and would produce systematic psychometric functions. In Experiment 3, the 50% points were in good agreement for the LP and HP conditions within both groups of listeners. The psychometric functions for List Pairs 1 and 2, 3 and 4, and 5 and 6 had similar characteristics and maintained reasonable separations between the HP and LP functions, whereas the HP and LP functions for List Pair 7 and 8 bisected one another at the lower SNRs. Conclusions: This study indicates that the R-SPIN can be configured into a multiple SNR paradigm. A more in-depth study with the R-SPIN materials is needed to develop lists that are systematic and reasonably equivalent for use on listeners with hearing loss. The approach should be based on the psychometric characteristics of the 200 HP and 200 LP sentences with the current R-SPIN lists discarded. Of importance is maintaining the synchrony between the sentences and their accompanying babble.
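
To illustrate how a 50% point can be read off multiple-SNR data like those described in this abstract, here is a minimal sketch that assumes simple linear interpolation between the two SNRs that bracket 50% correct. The function name and the example scores are hypothetical; the study's own calculation method is not stated in this abstract.

```python
def fifty_percent_point(snrs_db, proportion_correct):
    """Return the SNR (dB) at which performance crosses 50% correct.

    Uses linear interpolation between adjacent SNRs (an assumption for this
    sketch). Returns None if the data never bracket 0.5.
    """
    pairs = sorted(zip(snrs_db, proportion_correct))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            return s0 + (0.5 - p0) * (s1 - s0) / (p1 - p0)
    return None

# Made-up scores for illustration only, not data from the study.
example_snrs = [-1, 2, 5, 8, 11]
example_scores = [0.10, 0.30, 0.60, 0.85, 0.95]
print(fifty_percent_point(example_snrs, example_scores))
```

A fitted psychometric function would normally be more robust than interpolation when performance is irregular across SNRs.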


2008 ◽  
Vol 18 (1) ◽  
pp. 19-24
Author(s):  
Erin C. Schafer

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research in the area of FM systems and cochlear implants. Discussion topics will include selecting the optimal type of FM receiver, benefits of binaural FM-system input, importance of DAI receiver-gain settings, and effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through the use of three types of FM receivers: mounted speakers, desktop speakers, or direct-audio input (DAI). This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.


2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the input-layer activation function from the traditional sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to make more effective use of the data, an improved Denoise Recurrent Neural Network (DRNN) noise-reduction model with high calculation speed and good convergence is constructed to address the low speaker recognition rate in noisy environments. Using this model, random semantic speech signals from the speech library, with a sampling rate of 16 kHz and a duration of 5 seconds, are studied. The experimental signal-to-noise ratios are −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively eliminate noise from the feature parameters and improve the speech recognition rate. The improvement in speaker recognition rate is more pronounced when the signal-to-noise ratio is low. Furthermore, when the signal-to-noise ratio is 0 dB, the speaker recognition rate is increased by 40%, which can be an 85% improvement compared with the traditional speech model. On the other hand, the recognition rate gradually increases as the signal-to-noise ratio increases. When the signal-to-noise ratio is 15 dB, the speaker recognition rate is 93%.
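
The signal-to-noise conditions listed in this abstract can be illustrated with a minimal sketch of mixing a speech signal with noise at a target SNR. This assumes the usual power-ratio definition of SNR in dB; the function names and the toy sample values are hypothetical, and the abstract does not say how the authors prepared their noisy material.

```python
import math

def mean_power(x):
    """Average power (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db,
    then add it to the speech. Returns the mixture and the noise gain."""
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain

# Toy signals for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

for snr_db in (-10, -5, 0, 5, 10, 15, 20, 25):
    _, gain = mix_at_snr(speech, noise, snr_db)
    print(f"{snr_db:+3d} dB -> noise gain {gain:.4f}")
```

Lower target SNRs yield a larger noise gain, which matches the abstract's observation that recognition is hardest at the low-SNR end.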


Author(s):  
Alka Gautam ◽  
Hoon-Jae Lee ◽  
Wan-Young Chung

In this study, a new algorithm, Asynchronous Averaging and Filtering (AAF), is proposed for ECG signal de-noising. R-peaks are detected with another proposed algorithm, the Minimum Slot and Maximum Point (MSMP) selecting method. The AAF algorithm reduces random noise (the major component of EMG noise) in the ECG signal and provides comparatively good results for baseline wander noise cancellation. The signal-to-noise ratio (SNR) improves in the filtered ECG signal, while the signal shape remains undistorted. The authors conclude that R-peak detection with the MSMP method gives results comparable to those of existing algorithms such as the Pan-Tompkins algorithm. The AAF algorithm is advantageous over adaptive algorithms such as the Wiener and LMS algorithms. The overall performance of the proposed algorithms is comparatively good.
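
The SNR improvement claimed in this abstract can be illustrated with a minimal sketch. It does not implement AAF, whose details are not given here; it swaps in a plain moving-average filter on a synthetic sine wave purely to show how SNR before and after filtering might be measured. All names and parameter values are hypothetical.

```python
import math
import random

def snr_db(clean, observed):
    """SNR in dB of an observed signal relative to a known clean reference."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)

def moving_average(x, width=5):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

random.seed(0)  # fixed seed so the sketch is repeatable
clean = [math.sin(2 * math.pi * i / 200) for i in range(1000)]  # stand-in signal
noisy = [c + random.gauss(0.0, 0.2) for c in clean]             # added random noise
filtered = moving_average(noisy)

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, filtered)
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

A real ECG has sharp QRS complexes that a moving average would smear, which is one reason purpose-built methods are used instead.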


Geophysics ◽  
2013 ◽  
Vol 78 (6) ◽  
pp. V229-V237 ◽  
Author(s):  
Hongbo Lin ◽  
Yue Li ◽  
Baojun Yang ◽  
Haitao Ma

Time-frequency peak filtering (TFPF) may efficiently suppress random noise and hence improve the signal-to-noise ratio. However, the errors are not always satisfactory when applying the TFPF to fast-varying seismic signals. We begin with an error analysis for the TFPF by using the spread factor of the phase and cumulants of noise. This analysis shows that the nonlinear signal component and non-Gaussian random noise lead to the deviation of the pseudo-Wigner-Ville distribution (PWVD) peaks from the instantaneous frequency. The deviation introduces the signal distortion and random oscillations in the result of the TFPF. We propose a weighted reassigned smoothed PWVD with less deviation than PWVD. The proposed method adopts a frequency window to smooth away the residual oscillations in the PWVD, and incorporates a weight function in the reassignment which sharpens the time-frequency distribution for reducing the deviation. Because the weight function is determined by the lateral coherence of seismic data, the smoothed PWVD is assigned to the accurate instantaneous frequency for desired signal components by weighted frequency reassignment. As a result, the TFPF based on the weighted reassigned PWVD (TFPF_WR) can be more effective in suppressing random noise and preserving signal as compared with the TFPF using the PWVD. We test the proposed method on synthetic and field seismic data, and compare it with a wavelet-transform method and [Formula: see text] prediction filter. The results show that the proposed method provides better performance over the other methods in signal preserving under low signal-to-noise ratio.


Perception ◽  
1985 ◽  
Vol 14 (2) ◽  
pp. 209-224 ◽  
Author(s):  
Andrea J van Doorn ◽  
Jan J Koenderink ◽  
Wim A van de Grind

The detection of spatiotemporal correlation in visual displays has been studied with stroboscopically presented random-noise patterns and with a signal-to-noise ratio paradigm in which the moving pattern was masked with spatiotemporal white noise. These methods reveal the ability of the visual system to detect correlation of spatiotemporal structures, rather than luminance contrast. The effects of stroboscopic rate, exposure duration, target size, and the extent of discrete spatial shifts were studied in both the central and the peripheral visual field. Evidence for orientation-selective and speed-selective mechanisms was found, as well as for extensive spatiotemporal integration. Bounds on parameters of spatial and temporal correlation and integration were obtained. The results are similar to those reported earlier, and also extend them. Their relation to results obtained through other paradigms (eg the motion aftereffect) is explored.


2015 ◽  
Vol 26 (06) ◽  
pp. 572-581 ◽  
Author(s):  
Stanley Sheft ◽  
Min-Yu Cheng ◽  
Valeriy Shafiro

Background: Past work has shown that low-rate frequency modulation (FM) may help preserve signal coherence, aid segmentation at word and syllable boundaries, and benefit speech intelligibility in the presence of a masker. Purpose: This study evaluated whether difficulties in speech perception by cochlear implant (CI) users relate to a deficit in the ability to discriminate among stochastic low-rate patterns of FM. Research Design: This is a correlational study assessing the association between the ability to discriminate stochastic patterns of low-rate FM and the intelligibility of speech in noise. Study Sample: Thirteen postlingually deafened adult CI users participated in this study. Data Collection and Analysis: Using modulators derived from 5-Hz lowpass noise applied to a 1-kHz carrier, thresholds were measured in terms of frequency excursion both in quiet and with a speech-babble masker present, stimulus duration, and signal-to-noise ratio in the presence of a speech-babble masker. Speech perception ability was assessed in the presence of the same speech-babble masker. Relationships were evaluated with Pearson product–moment correlation analysis with correction for family-wise error, and commonality analysis to determine the unique and common contributions across psychoacoustic variables to the association with speech ability. Results: Significant correlations were obtained between masked speech intelligibility and three metrics of FM discrimination involving either signal-to-noise ratio or stimulus duration, with shared variance among the three measures accounting for much of the effect. Compared to past results from young normal-hearing adults and older adults with either normal hearing or a mild-to-moderate hearing loss, mean FM discrimination thresholds obtained from CI users were higher in all conditions. 
Conclusions: The ability to process the pattern of frequency excursions of stochastic FM may, in part, have a common basis with speech perception in noise. Discrimination of differences in the temporally distributed place coding of the stimulus could serve as this common basis for CI users.
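
The correlational analysis named in this abstract can be sketched as follows. This is a minimal illustration of the Pearson product-moment correlation, with a Bonferroni-adjusted significance level assumed as one common family-wise correction; the abstract does not say which correction the authors used. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction (assumed)."""
    return alpha / n_tests

# Toy values for illustration only, not data from the study.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(bonferroni_alpha(0.05, 5))
```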

