Units of processing in perceptual normalization for speaking rate

2021 ◽  
Author(s):  
Meg Cychosz ◽  
Rochelle Newman

Because speaking rates are highly variable, listeners must use cues like phoneme or sentence duration to scale or normalize speech across different contexts. Scaling speech perception in this way allows listeners to distinguish between temporal contrasts, like voiced and voiceless stops, even at different speech speeds. It has long been assumed that this normalization or adjustment of speaking rate can occur over individual phonemes. However, phonemes are often undefined in running speech, so it is not clear that listeners can rely on them for normalization. To evaluate this, we isolate two potential processing units for speaking rate normalization, the phoneme and the syllable, by manipulating phoneme duration to cue speaking rate while holding syllable duration constant. In doing so, we show that changing the duration of phonemes both with unique acoustic signatures (/kɑ/) and overlapping acoustic signatures (/wɪ/) results in a speaking rate normalization effect. These results suggest that even absent clear acoustic boundaries within syllables, listeners can normalize for rate differences on the basis of individual phonemes.
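The core of this manipulation is a duration trade-off: one phoneme is stretched or compressed to cue rate, and the other segment absorbs the difference so the syllable's total duration never changes. A minimal sketch of that bookkeeping, with illustrative durations and a hypothetical `restretch` helper (not the authors' stimulus-editing procedure):

```python
# Sketch of the duration trade-off: the onset phoneme is rescaled to cue
# speaking rate, and the vowel absorbs the difference, so the syllable's
# total duration stays fixed. All durations are illustrative.
SYLLABLE_MS = 250.0

def restretch(onset_ms, rate_factor, syllable_ms=SYLLABLE_MS):
    """Scale the onset consonant; give the remainder to the vowel."""
    new_onset = onset_ms * rate_factor
    new_vowel = syllable_ms - new_onset
    assert new_vowel > 0, "onset cannot fill the whole syllable"
    return new_onset, new_vowel

fast = restretch(80.0, 0.6)   # shorter consonant cues a faster rate
slow = restretch(80.0, 1.4)   # longer consonant cues a slower rate
print(fast, slow)             # each pair still sums to 250 ms
```

Because the syllable duration is identical across conditions, any perceptual rate effect must come from the phoneme-internal durations rather than from the syllable as a whole.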

2018 ◽  
Vol 30 (11) ◽  
pp. 1704-1719 ◽  
Author(s):  
Anna Maria Alexandrou ◽  
Timo Saarinen ◽  
Jan Kujala ◽  
Riitta Salmelin

During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). As is inherent to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio–MEG coherence. Modulations in audio–MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
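Audio–MEG coherence is the standard magnitude-squared coherence between the speech amplitude envelope and a cortical channel, estimated over Welch-averaged segments. A minimal sketch using synthetic stand-ins for both signals (the 4 Hz modulator, noise model, and sampling rate are illustrative, not the study's data; the computation itself is `scipy.signal.coherence`):

```python
# Sketch of audio-brain coherence with synthetic signals standing in for
# a speech amplitude envelope and an MEG channel. Parameters illustrative.
import numpy as np
from scipy.signal import coherence

fs = 200.0                      # sampling rate in Hz
t = np.arange(0, 60, 1 / fs)    # 60 s of "recording"
rng = np.random.default_rng(0)

# Stand-in envelope: slow modulation near 4 Hz, roughly the syllable
# rate of speech at a normal/habitual speaking rate.
envelope = 1 + 0.5 * np.sin(2 * np.pi * 4 * t)
# Stand-in cortical signal that partially tracks the envelope plus noise.
meg = envelope + rng.normal(scale=2.0, size=t.size)

# Welch-averaged magnitude-squared coherence, 0 (no coupling) to 1 (perfect).
f, Cxy = coherence(envelope, meg, fs=fs, nperseg=1024)
peak = f[np.argmax(Cxy)]
print(f"peak coherence {Cxy.max():.2f} at {peak:.1f} Hz")
```

Coherence peaks at the modulation frequency the two signals share, which is why the measure can index how strongly cortex tracks the envelope at a given speaking rate.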


2017 ◽  
Vol 26 (2S) ◽  
pp. 631-640 ◽  
Author(s):  
Katarina L. Haley ◽  
Adam Jacks ◽  
Jessica D. Richardson ◽  
Julie L. Wambaugh

Purpose We sought to characterize articulatory distortions in apraxia of speech and in aphasia with phonemic paraphasia, and to evaluate the diagnostic validity of the frequencies of distortion and distorted-substitution errors in differentiating between these disorders. Method Study participants were 66 people with speech sound production difficulties after left-hemisphere stroke or trauma. They were divided into 2 groups on the basis of word syllable duration, which served as an external criterion for speaking rate in multisyllabic words and an index of likely speech diagnosis. Narrow phonetic transcriptions were completed for audio-recorded clinical motor speech evaluations, using 29 diacritic marks. Results Partial voicing and altered vowel tongue placement were common in both groups, and changes in consonant manner and place were also observed. The group with longer word syllable duration produced significantly more distortion and distorted-substitution errors than did the group with shorter word syllable duration, but variations were distributed on a performance continuum that overlapped substantially between groups. Conclusions Segment distortions in focal left-hemisphere lesions can be captured with a customized set of diacritic marks. Frequencies of distortions and distorted substitutions are valid diagnostic criteria for apraxia of speech, but further development of quantitative criteria and dynamic performance profiles is necessary for clinical utility.
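The grouping criterion here, word syllable duration, is simply the mean duration per syllable of a multisyllabic word: total word duration divided by syllable count, with longer values indexing a slower rate. A one-function sketch with illustrative numbers (the example word and duration are not from the study):

```python
# Sketch of the word syllable duration (WSD) metric used to index speaking
# rate in multisyllabic words: mean duration per syllable, in milliseconds.
def word_syllable_duration(word_ms, n_syllables):
    """Total word duration divided by its number of syllables."""
    return word_ms / n_syllables

# Illustrative: a 4-syllable word produced in 1.3 s.
wsd = word_syllable_duration(1300.0, 4)
print(wsd)  # 325.0 ms per syllable
```

Speakers whose WSD exceeds a cutoff would fall in the longer-duration (slower-rate) group; the cutoff itself is a clinical criterion, not shown here.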


2009 ◽  
Vol 125 (4) ◽  
pp. 2657-2657 ◽ 
Author(s):  
Eva Reinisch ◽  
Alexandra Jesse ◽  
James M. McQueen

1975 ◽  
Vol 18 (4) ◽  
pp. 739-753 ◽  
Author(s):  
John W. Folkins ◽  
Creighton J. Miller ◽  
Fred D. Minifie

The rhythm of syllables in repetitions of a phrase was measured with a finger-tapping task. These rhythm measurements were shown to vary with phrase-level stress patterning. However, this relationship was not invariant. Acoustic measurements of the time between syllables showed stress pattern relationships similar to those observed in the rhythm-tapping task. The temporal differences between stress patterns appear to be (1) evident even when acoustic measurements exclude syllable duration, (2) significant even at a fast speaking rate, and (3) variable between speakers.


1997 ◽  
Vol 40 (6) ◽  
pp. 1395-1405 ◽  
Author(s):  
Karen Iler Kirk ◽  
David B. Pisoni ◽  
R. Christopher Miyamoto

Traditional word-recognition tests typically use phonetically balanced (PB) word lists produced by one talker at one speaking rate. Intelligibility measures based on these tests may not adequately evaluate the perceptual processes used to perceive speech under more natural listening conditions involving many sources of stimulus variability. The purpose of this study was to examine the influence of stimulus variability and lexical difficulty on the speech-perception abilities of 17 adults with mild-to-moderate hearing loss. The effects of stimulus variability were studied by comparing word-identification performance in single-talker versus multiple-talker conditions and at different speaking rates. Lexical difficulty was assessed by comparing recognition of "easy" words (i.e., words that occur frequently and have few phonemically similar neighbors) with "hard" words (i.e., words that occur infrequently and have many similar neighbors). Subjects also completed a 20-item questionnaire to rate their speech understanding abilities in daily listening situations. Both sources of stimulus variability produced significant effects on speech intelligibility. Identification scores were poorer in the multiple-talker condition than in the single-talker condition, and word-recognition performance decreased as speaking rate increased. Lexical effects on speech intelligibility were also observed. Word-recognition performance was significantly higher for lexically easy words than lexically hard words. Finally, word-recognition performance was correlated with scores on the self-report questionnaire rating speech understanding under natural listening conditions. 
The pattern of results suggests that perceptually robust speech-discrimination tests are able to assess several underlying aspects of speech perception in the laboratory and clinic that appear to generalize to conditions encountered in natural listening situations where the listener is faced with many different sources of stimulus variability. That is, word-recognition performance measured under conditions where the talker varied from trial to trial was better correlated with self-reports of listening ability than was performance in a single-talker condition where variability was constrained.
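The easy/hard distinction above rests on phonological neighborhood density: a word's neighbors are the lexicon entries reachable by one segment substitution, deletion, or addition, so "easy" words have few neighbors and "hard" words many. A minimal sketch, using letters as stand-ins for phonemes and a toy lexicon (both illustrative, not the study's materials):

```python
# Sketch of neighborhood density: neighbors are lexicon entries exactly one
# segment substitution, deletion, or addition away from the target word.
# Letters stand in for phonemes; the lexicon is a toy example.

def neighbors(word, lexicon):
    """Return lexicon entries exactly one edit away from `word`."""
    found = set()
    alphabet = {ch for w in lexicon for ch in w}
    for i in range(len(word)):
        for ch in alphabet:                      # substitutions
            cand = word[:i] + ch + word[i + 1:]
            if cand != word and cand in lexicon:
                found.add(cand)
        cand = word[:i] + word[i + 1:]           # deletions
        if cand in lexicon:
            found.add(cand)
    for i in range(len(word) + 1):               # additions
        for ch in alphabet:
            cand = word[:i] + ch + word[i:]
            if cand in lexicon:
                found.add(cand)
    return found

lexicon = {"cat", "bat", "hat", "cut", "cast", "at", "dog"}
print(sorted(neighbors("cat", lexicon)))  # ['at', 'bat', 'cast', 'cut', 'hat']
```

A "hard" word in this sense is one that is both low in frequency and dense in neighbors, so many similar candidates compete during recognition.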


2004 ◽  
Vol 47 (5) ◽  
pp. 1103-1116 ◽  
Author(s):  
Caitlin M. Dillon ◽  
Rose A. Burkholder ◽  
Miranda Cleary ◽  
David B. Pisoni

Seventy-six children with cochlear implants completed a nonword repetition task. The children were presented with 20 nonword auditory patterns over a loudspeaker and were asked to repeat them aloud to the experimenter. The children's responses were recorded on digital audiotape and then played back to normal-hearing adult listeners to obtain accuracy ratings on a 7-point scale. The children's nonword repetition performance, as measured by these perceptual accuracy ratings, could be predicted in large part by their performance on independently collected measures of speech perception, verbal rehearsal speed, and speech production. The strongest contributing variable was speaking rate, which is widely argued to reflect verbal rehearsal speed in phonological working memory. Children who had become deaf at older ages received higher perceptual ratings. Children whose early linguistic experience and educational environments emphasized oral communication methods received higher perceptual ratings than children enrolled in total communication programs. The present findings suggest that individual differences in performance on nonword repetition are strongly related to variability observed in the component processes involved in language imitation tasks, including measures of speech perception, speech production, and especially verbal rehearsal speed in phonological working memory. In addition, onset of deafness at a later age and an educational environment emphasizing oral communication may be beneficial to the children's ability to develop the robust phonological processing skills necessary to accurately repeat novel, nonword sound patterns.


2020 ◽  
Vol 41 (3) ◽  
pp. 549-560 ◽  
Author(s):  
Mitchell S. Sommers ◽  
Brent Spehar ◽  
Nancy Tye-Murray ◽  
Joel Myerson ◽  
Sandra Hale
