No interaction between fundamental-frequency differences and spectral region when perceiving speech in a speech background

PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249654
Author(s):  
Sara M. K. Madsen ◽  
Torsten Dau ◽  
Andrew J. Oxenham

Differences in fundamental frequency (F0) or pitch between competing voices facilitate our ability to segregate a target voice from interferers, thereby enhancing speech intelligibility. Although lower-numbered harmonics elicit a stronger and more accurate pitch sensation than higher-numbered harmonics, it is unclear whether the stronger pitch leads to an increased benefit of pitch differences when segregating competing talkers. To answer this question, sentence recognition was tested in young normal-hearing listeners in the presence of a single competing talker. The stimuli were presented in a broadband condition or were highpass or lowpass filtered to manipulate the pitch accuracy of the voicing, while maintaining roughly equal speech intelligibility in the highpass and lowpass regions. Performance was measured with average F0 differences (ΔF0) between the target and single-talker masker of 0, 2, and 4 semitones. Pitch discrimination abilities were also measured to confirm that the lowpass-filtered stimuli elicited greater pitch accuracy than the highpass-filtered stimuli. No interaction was found between filter type and ΔF0 in the sentence recognition task, suggesting little or no effect of harmonic rank or pitch accuracy on the ability to use F0 to segregate natural voices, even when the average ΔF0 is relatively small. The results suggest that listeners are able to obtain some benefit of pitch differences between competing voices, even when pitch salience and accuracy are low.

The accuracy with which we are able to discriminate the pitch of a harmonic complex tone depends on the F0 and the harmonic numbers present. For F0s in the average range of speech (100–200 Hz), pitch discrimination is best (implying accurate F0 coding) when harmonics below about the 10th are present [6–10].
When these lower-numbered harmonics are present, pitch discrimination is also independent of the phase relationships between the harmonics, suggesting that these harmonics are spectrally resolved to some extent. In contrast, when only harmonics above the 10th are present in this range of F0s, pitch discrimination is poorer and is affected by the phase relationships between harmonics, suggesting that interactions occur between these spectrally unresolved harmonics [6–10]. Psychoacoustic studies of sound segregation have often been carried out with interleaved sequences of tones. Some of these studies have investigated segregation based on differences in pitch accuracy and have varied the accuracy by systematically varying whether resolved or only unresolved harmonics are present. Previous studies have found that stream segregation can occur with alternating sequences of tones, even if the tones consist only of unresolved harmonics [11–14]. However, the question of whether streaming is greater with resolved than unresolved harmonics has received mixed answers. In cases where the listeners’ task was to segregate the streams, some studies have shown little difference in streaming between conditions containing resolved or only unresolved harmonics [11, 15], whereas another study using a similar approach found significantly greater stream segregation when resolved harmonics were present than when only unresolved harmonics were present [12]. However, in situations where the task was either neutral or encouraged listeners to integrate the sequences into a single stream, the results have been consistent across studies in showing greater segregation for complex tones containing resolved harmonics than for tones containing only unresolved harmonics [13, 14]. These findings support the idea that pitch accuracy can affect our ability to segregate sounds. Less is known about the role of low-numbered harmonics in the context of segregating competing speech. 
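
The ~10th-harmonic boundary between resolved and unresolved harmonics described above can be roughly illustrated with the equivalent rectangular bandwidth (ERB) scale of Glasberg and Moore (1990). A minimal sketch; the exact resolvability criterion (one component per filter, 1.25, or some other bound) varies across studies:

```python
def erb(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def components_per_erb(f0_hz, k):
    """How many harmonics of f0 fall within one auditory filter centered
    on the k-th harmonic; roughly one or fewer suggests 'resolved'."""
    return erb(k * f0_hz) / f0_hz

# For F0 = 100 Hz, the filter at the 10th harmonic (1 kHz) spans about
# 1.3 harmonics, near the resolved/unresolved transition; at the 5th
# harmonic it spans well under one harmonic.
```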
Bird and Darwin [2] showed that lower harmonics dominate performance in a speech-segregation task based on F0 differences, but they did not test any conditions containing only high-numbered harmonics. Oxenham and Simonson [16] explored the effect of harmonic rank on speech intelligibility by comparing conditions in which the target and single-talker masker had been lowpass (LP) or highpass (HP) filtered to either retain (LP-filtered) or remove (HP-filtered) the spectrally resolved components. The LP and HP cutoff frequencies were selected to produce roughly equal performance in noise for both conditions. Surprisingly, performance in the LP and HP conditions improved by similar amounts when the noise masker was replaced by a single-talker masker with a different average F0, suggesting no clear benefit of having resolved harmonic components in the speech. However, that study only used relatively large values of average ΔF0, which according to recent F0 estimates were approximately 4 and 8 semitones (ST). Moreover, that study did not parametrically vary the ΔF0 between the target and masker. It may be that pitch accuracy is only relevant for more challenging conditions, i.e., conditions with smaller average values of ΔF0. Thus, it remains unclear whether the effect of ΔF0 on performance is affected by the presence or absence of low-numbered, spectrally resolved harmonics. The aim of the present study was to determine whether there is an effect of spectral region, and hence pitch coding accuracy, on the ability of listeners to use average F0 differences between a target and an interfering talker to understand natural speech.
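
For reference, a ΔF0 expressed in semitones corresponds to a frequency ratio of 2^(ΔF0/12). A small sketch of the masker F0s implied by the 2- and 4-semitone separations (the 100 Hz example F0 is illustrative, not a stimulus value from the study):

```python
def shift_semitones(f0_hz, delta_st):
    """Return f0 shifted by delta_st semitones (ratio 2**(delta_st/12))."""
    return f0_hz * 2.0 ** (delta_st / 12.0)

# A 100 Hz target with 2- and 4-semitone masker separations:
# shift_semitones(100.0, 2) ~ 112.25 Hz
# shift_semitones(100.0, 4) ~ 125.99 Hz
```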

2020 ◽  
Vol 63 (11) ◽  
pp. 3855-3864
Author(s):  
Wanting Huang ◽  
Lena L. N. Wong ◽  
Fei Chen ◽  
Haihong Liu ◽  
Wei Liang

Purpose Fundamental frequency (F0) is the primary acoustic cue for lexical tone perception in tonal languages but is processed in a limited way in cochlear implant (CI) systems. The aim of this study was to evaluate the importance of F0 contours in sentence recognition in Mandarin-speaking children with CIs and to determine whether this importance differs from that in age-matched normal-hearing (NH) peers. Method Age-appropriate sentences, with F0 contours manipulated to be either natural or flattened, were randomly presented to preschool children with CIs and their age-matched peers with NH under three test conditions: in quiet, in white noise, and with competing sentences at 0 dB signal-to-noise ratio. Results The neutralization of F0 contours resulted in a significant reduction in sentence recognition. While this was seen only in noise conditions among NH children, it was observed throughout all test conditions among children with CIs. Moreover, the F0 contour-induced accuracy reduction ratios (i.e., the reduction in sentence recognition resulting from the neutralization of F0 contours compared to the normal F0 condition) were significantly greater in children with CIs than in NH children in all test conditions. Conclusions F0 contours play a major role in sentence recognition in both quiet and noise among pediatric implantees, and the contribution of the F0 contour is even more salient than that in age-matched NH children. These results also suggest that there may be differences between children with CIs and NH children in how F0 contours are processed.
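
The "flattened" manipulation can be sketched as replacing each voiced frame of an extracted F0 track with the utterance's mean F0. The actual stimuli would have been produced with a speech resynthesis tool; this NumPy sketch only illustrates the contour neutralization itself, assuming an F0 track where unvoiced frames are coded as 0 Hz:

```python
import numpy as np

def flatten_f0(f0_track_hz):
    """Neutralize an F0 contour: voiced frames (> 0 Hz) are set to the
    geometric-mean F0; unvoiced frames (0 Hz) are left untouched."""
    f0 = np.asarray(f0_track_hz, dtype=float)
    voiced = f0 > 0
    mean_f0 = np.exp(np.log(f0[voiced]).mean())  # geometric mean
    flat = f0.copy()
    flat[voiced] = mean_f0
    return flat
```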


2020 ◽  
Vol 24 (4) ◽  
pp. 180-190
Author(s):  
Hyo Jeong Kim ◽  
Jae Hee Lee ◽  
Hyun Joon Shim

Background and Objectives: Although many studies have evaluated the effect of the digital noise reduction (DNR) algorithm of hearing aids (HAs) on speech recognition, there are few studies on the effect of DNR on music perception. Therefore, we aimed to evaluate the effect of DNR on music perception, in addition to speech perception, using objective and subjective measurements. Subjects and Methods: Sixteen HA users participated in this study (58.00±10.44 years; 3 males and 13 females). The objective assessment of speech and music perception was based on the Korean version of the Clinical Assessment of Music Perception test and word and sentence recognition scores. Meanwhile, for the subjective assessment, the quality rating of speech and music as well as self-reported HA benefits were evaluated. Results: There was no improvement conferred with DNR of HAs on the objective assessment tests of speech and music perception. The pitch discrimination at 262 Hz in the DNR-off condition was better than that in the unaided condition (p=0.024); however, the unaided and DNR-on conditions did not differ. In the Korean music background questionnaire, responses regarding ease of communication were better in the DNR-on condition than in the DNR-off condition (p=0.029). Conclusions: Speech and music perception or sound quality did not improve with the activation of DNR. However, DNR positively influenced the listener's subjective listening comfort. The DNR-off condition in HAs may be beneficial for pitch discrimination at some frequencies.


Author(s):  
Joseph D Wagner ◽  
Alice Gelman ◽  
Kenneth E. Hancock ◽  
Yoojin Chung ◽  
Bertrand Delgutte

The pitch of harmonic complex tones (HCT) common in speech, music, and animal vocalizations plays a key role in the perceptual organization of sound. Unraveling the neural mechanisms of pitch perception requires animal models, but little is known about complex pitch perception by animals, and some species appear to use different pitch mechanisms than humans. Here, we tested rabbits' ability to discriminate the fundamental frequency (F0) of HCTs with missing fundamentals using a behavioral paradigm inspired by foraging behavior, in which rabbits learned to harness a spatial gradient in F0 to find the location of a virtual target within a room for a food reward. Rabbits were initially trained to discriminate HCTs with F0s in the range 400-800 Hz and with harmonics covering a wide frequency range (800-16,000 Hz), and were then tested with stimuli differing either in spectral composition to test the role of harmonic resolvability (Experiment 1), in F0 range (Experiment 2), or in both F0 and spectral content (Experiment 3). Together, these experiments show that rabbits can discriminate HCTs over a wide F0 range (200-1600 Hz) encompassing the range of conspecific vocalizations, and can use either the spectral pattern of harmonics resolved by the cochlea for higher F0s or temporal envelope cues resulting from interactions between unresolved harmonics for lower F0s. The qualitative similarity of these results to human performance supports using rabbits as an animal model for studies of pitch mechanisms, provided that species differences in cochlear frequency selectivity and in the F0 range of vocalizations are taken into account.
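
A missing-fundamental HCT of the kind described for the training stimuli can be sketched in a few lines. The parameter values mirror the 400 Hz F0 and 800-16,000 Hz harmonic range quoted above; level calibration, onset ramps, and any background noise of the actual stimuli are not modeled:

```python
import numpy as np

def missing_fundamental_hct(f0, fmin, fmax, dur=0.5, sr=48000):
    """Equal-amplitude harmonic complex whose components all lie in
    [fmin, fmax]; the fundamental itself is absent from the spectrum,
    yet listeners (and rabbits) still hear a pitch at f0."""
    t = np.arange(int(dur * sr)) / sr
    freqs = [k * f0 for k in range(1, int(fmax // f0) + 1)
             if fmin <= k * f0 <= fmax]
    sig = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return sig / np.abs(sig).max()  # normalize to +/-1
```

With f0=400, fmin=800, the lowest component is the second harmonic, so the spectrum contains no energy at 400 Hz.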


2011 ◽  
Vol 105 (1) ◽  
pp. 188-199 ◽  
Author(s):  
Naoya Itatani ◽  
Georg M. Klump

It has been suggested that successively presented sounds that are perceived as separate auditory streams are represented by separate populations of neurons. Mostly, spectral separation in different peripheral filters has been identified as the cue for segregation. However, stream segregation based on temporal cues is also possible without spectral separation. Here we presented sequences of ABA- triplet stimuli providing only temporal cues to neurons in the European starling auditory forebrain. A and B sounds (125 ms duration) were harmonic complexes (fundamentals 100, 200, or 400 Hz; center frequency and bandwidth chosen to fit the neurons' tuning characteristic) with identical amplitude spectra but different phase relations between components (cosine, alternating, or random phase) and presented at different rates. Differences in both rate responses and temporal response patterns of the neurons when stimulated with harmonic complexes with different phase relations provide the first evidence for a mechanism allowing a separate neural representation of such stimuli. Recording sites responding to frequencies above 1 kHz showed enhanced rate and temporal differences compared with those responding at lower frequencies. These results demonstrate a neural correlate of streaming by temporal cues due to the variation of phase that shows striking parallels to observations in previous psychophysical studies.
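
The phase manipulation can be sketched as follows: cosine- and alternating-phase complexes share the same amplitude spectrum but differ in waveform and temporal envelope, which is the only cue left for the neurons. Bandpass filtering to the neuron's tuning, presentation rate, and triplet sequencing are omitted in this sketch:

```python
import numpy as np

def harmonic_complex(f0, n_harm, phase="cosine", dur=0.125, sr=48000, seed=0):
    """Sum of equal-amplitude harmonics 1..n_harm of f0 with the given
    starting-phase scheme: 'cosine' (all in phase), 'alternating'
    (odd harmonics cosine, even harmonics sine), or 'random'."""
    t = np.arange(int(dur * sr)) / sr
    rng = np.random.default_rng(seed)
    sig = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        if phase == "cosine":
            ph = 0.0
        elif phase == "alternating":
            ph = 0.0 if k % 2 else np.pi / 2
        else:  # random starting phases
            ph = rng.uniform(0.0, 2.0 * np.pi)
        sig += np.cos(2.0 * np.pi * k * f0 * t + ph)
    return sig

# Same magnitude spectrum, different crest factor:
a = harmonic_complex(100.0, 20, "cosine")
b = harmonic_complex(100.0, 20, "alternating")
```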


2000 ◽  
Vol 108 (1) ◽  
pp. 263-271 ◽  
Author(s):  
Nicolas Grimault ◽  
Christophe Micheyl ◽  
Robert P. Carlyon ◽  
Patrick Arthaud ◽  
Lionel Collet

2017 ◽  
Vol 344 ◽  
pp. 235-243 ◽  
Author(s):  
Marion David ◽  
Mathieu Lavandier ◽  
Nicolas Grimault ◽  
Andrew J. Oxenham

2010 ◽  
Vol 128 (4) ◽  
pp. 1930-1942 ◽  
Author(s):  
Christophe Micheyl ◽  
Kristin Divis ◽  
David M. Wrobleski ◽  
Andrew J. Oxenham

2017 ◽  
Vol 35 (2) ◽  
pp. 127-143
Author(s):  
Václav Vencovský ◽  
František Rund

This study focuses on the perceived roughness of two simultaneous harmonic complex tones with ratios between their fundamental frequencies set to create intervals on just-tempered (JT) and equal-tempered (ET) scales. According to roughness theories, ET intervals should produce more roughness. However, previous studies have shown the opposite for intervals in which the lower fundamental frequency of the complex was equal to 261.6 Hz. The aim of this study is to verify and explain these results by using intervals composed of complexes whose spectral components were generated with either a sine starting phase or with a random starting phase. Results of the current study showed the same phenomenon as previous studies. To examine whether the explanation of the phenomenon lies in the function of the peripheral ear, three roughness models based upon this function were used: the Daniel and Weber (1997) model, the synchronization index (SI) model, and the model based on a hydrodynamic cochlear model. For most of the corresponding JT and ET intervals, only the Daniel and Weber (1997) model predicted less roughness in the ET intervals. In addition, the intervals were analyzed by a model simulating the auditory periphery. The results showed that a possible cause for the roughness differences may be in the frequencies of fluctuations of the signal in the peripheral ear. For JT intervals, the fluctuations at adjacent places on the simulated basilar membrane had either the same frequency or integer multiples of that frequency and were synchronized. Since a previous study showed that synchronized fluctuations in adjacent auditory filters lead to higher roughness than out-of-phase fluctuations (Terhardt, 1974), this may explain the roughness differences between the JT and ET intervals.
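
The spectral situation behind these interval comparisons can be sketched numerically: for a just major third (ratio 5/4) on 261.6 Hz, low-order partials of the two complexes coincide exactly, whereas the equal-tempered ratio 2^(4/12) leaves them a few hertz apart, producing slow beats. The 40 Hz upper bound below is an arbitrary illustrative cutoff for "slow", not a roughness model:

```python
def beat_frequencies(f0_low_hz, ratio, n_harm=10, max_beat_hz=40.0):
    """Beat rates (Hz) between nearby partials of two harmonic complexes
    with fundamentals f0_low_hz and f0_low_hz * ratio."""
    low = [k * f0_low_hz for k in range(1, n_harm + 1)]
    high = [k * f0_low_hz * ratio for k in range(1, n_harm + 1)]
    # ignore exact coincidences (within float tolerance) and fast beats
    beats = {round(abs(fh - fl), 2)
             for fl in low for fh in high
             if 0.01 < abs(fh - fl) < max_beat_hz}
    return sorted(beats)

jt = beat_frequencies(261.6, 5 / 4)          # partials coincide: no slow beats
et = beat_frequencies(261.6, 2 ** (4 / 12))  # slow beats, e.g. near 1.3 kHz
```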


2006 ◽  
Vol 17 (04) ◽  
pp. 241-252 ◽  
Author(s):  
Kevin C.P. Yuen ◽  
Anna C.S. Kam ◽  
Polly S.H. Lau

The amplification outcomes of two hearing aid prescriptions, NAL-NL1 and Digital Perception Processing (DPP), were compared in the same digital hearing instrument for nine adults with moderate to moderately severe hearing loss. NAL-NL1 aims at optimizing speech intelligibility while amplifying the speech signal to a normal overall loudness level (Dillon, 1999). DPP focuses on restoring loudness based on normal and impaired cochlear excitation models (Launer and Moore, 2003). In this comparison, DPP resulted in better sentence recognition performance than the NAL-NL1 algorithm in the signal-front/noise-side condition, and the two prescriptions gave similar performance in the signal-front/noise-front condition. Subjective evaluations by the participants using the Abbreviated Profile for Hearing Aid Benefit and sound quality comparisons did not give conclusive results between the two prescriptions. With each hearing aid prescription, the ability of the hearing aid circuitry to reduce the effects of noise was evaluated by a sentence-in-noise test in three conditions: (1) adaptive directional microphone (DAZ), (2) multichannel noise reduction system (FNC), and (3) a combination of FNC and DAZ (FNC + DAZ). In the signal-front/noise-side condition, DAZ and FNC + DAZ gave better performance than FNC in nearly all participants, whereas in the signal-front/noise-front condition, no significant differences were found among the three conditions.

