Acoustic and Perceptual Analysis of Word-Initial Stop Consonants in Phonologically Disordered Children

1988 ◽  
Vol 31 (3) ◽  
pp. 449-459 ◽  
Author(s):  
Karen Forrest ◽  
Barbara K. Rockman

Spectrographic measures of voice onset time (VOT) were made for phonologically disordered children in whom a voicing contrast was just beginning to emerge. These temporal measures were related to adult listeners' perception of voicing of the initial stop consonant to determine how well VOT could predict perceived voicing. In general, the predictive utility of VOT was not very high. The relation between VOT as produced by the phonologically disordered children and perceived voicing ranged from 0.31 to 0.43. A finer-grained analysis was conducted to determine what other acoustic cues might have influenced the listeners' judgments of voicing. Although no one acoustic cue could be found to explain all listeners' responses, spectral cues such as fundamental and F1 frequencies at the onset of voicing, as well as the burst and aspiration amplitude relative to the vowel onset amplitude, accounted for the perceived voicing of about half of the tokens that were not differentiated by VOT. Rather than relying solely on the temporal characteristics of the VOT interval, a matrix of acoustic cues may influence how a listener perceives word-initial voicing as produced by phonologically disordered children.

2021 ◽  
pp. 026765832110089
Author(s):  
Daniel J Olson

Featural approaches to second language phonetic acquisition posit that the development of new phonetic norms relies on sub-phonemic features, expressed through a constellation of articulatory gestures and their corresponding acoustic cues, which may be shared across multiple phonemes. Within featural approaches, largely supported by research in speech perception, debate remains as to the fundamental scope or ‘size’ of featural units. The current study examines potential featural relationships between voiceless and voiced stop consonants, as expressed through the voice onset time cue. Native English-speaking learners of Spanish received targeted training on Spanish voiceless stop consonant production through a visual feedback paradigm. Analysis focused on the change in voice onset time, for both voiceless (i.e. trained) and voiced (i.e. non-trained) phonemes, across the pretest, posttest, and delayed posttest. The results demonstrated a significant improvement (i.e. reduction) in voice onset time for voiceless stops, which were subject to the training paradigm. In contrast, there was no significant change in the non-trained voiced stop consonants. These results suggest a limited featural relationship, with independent voice onset time (VOT) cues for voiceless and voiced phonemes. Possible underlying mechanisms that limit feature generalization in second language (L2) phonetic production, including gestural considerations and acoustic similarity, are discussed.


1999 ◽  
Vol 82 (5) ◽  
pp. 2346-2357 ◽  
Author(s):  
Mitchell Steinschneider ◽  
Igor O. Volkov ◽  
M. Daniel Noh ◽  
P. Charles Garell ◽  
Matthew A. Howard

Voice onset time (VOT) is an important parameter of speech that denotes the time interval between consonant onset and the onset of low-frequency periodicity generated by rhythmic vocal cord vibration. Voiced stop consonants (/b/, /g/, and /d/) in syllable initial position are characterized by short VOTs, whereas unvoiced stop consonants (/p/, /k/, and /t/) contain prolonged VOTs. As the VOT is increased in incremental steps, perception rapidly changes from a voiced stop consonant to an unvoiced consonant at an interval of 20–40 ms. This abrupt change in consonant identification is an example of categorical speech perception and is a central feature of phonetic discrimination. This study tested the hypothesis that VOT is represented within auditory cortex by transient responses time-locked to consonant and voicing onset. Auditory evoked potentials (AEPs) elicited by stop consonant-vowel (CV) syllables were recorded directly from Heschl's gyrus, the planum temporale, and the superior temporal gyrus in three patients undergoing evaluation for surgical remediation of medically intractable epilepsy. Voiced CV syllables elicited a triphasic sequence of field potentials within Heschl's gyrus. AEPs evoked by unvoiced CV syllables contained additional response components time-locked to voicing onset. Syllables with a VOT of 40, 60, or 80 ms evoked components time-locked to consonant release and voicing onset. In contrast, the syllable with a VOT of 20 ms evoked a markedly diminished response to voicing onset and elicited an AEP very similar in morphology to that evoked by the syllable with a 0-ms VOT. Similar response features were observed in the AEPs evoked by click trains. In this case, there was a marked decrease in amplitude of the transient response to the second click in trains with interpulse intervals of 20–25 ms.
Speech-evoked AEPs recorded from the posterior superior temporal gyrus lateral to Heschl's gyrus displayed comparable response features, whereas field potentials recorded from three locations in the planum temporale did not contain components time-locked to voicing onset. This study demonstrates that VOT is at least partially represented in primary and specific secondary auditory cortical fields by synchronized activity time-locked to consonant release and voicing onset. Furthermore, AEPs exhibit features that may facilitate categorical perception of stop consonants, and these response patterns appear to be based on temporal processing limitations within auditory cortex. Demonstrations of similar speech-evoked response patterns in animals support a role for these experimental models in clarifying selected features of speech encoding.
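
The definition of VOT in the abstract above, together with the reported 20–40 ms perceptual boundary, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the `StopToken` type, the function names, and the 30 ms default boundary (simply the midpoint of the 20–40 ms range) do not come from the study, and a hard threshold only approximates the abrupt identification change the abstract describes.

```python
from dataclasses import dataclass

# Assumed boundary: midpoint of the 20-40 ms range mentioned in the abstract.
# Not a value reported by the study.
BOUNDARY_MS = 30.0


@dataclass
class StopToken:
    """Hypothetical container for two hand-marked time points of one token."""
    release_ms: float        # time of consonant release (burst onset)
    voicing_onset_ms: float  # time at which periodic voicing begins


def vot_ms(token: StopToken) -> float:
    """VOT is the interval from consonant release to voicing onset.

    Negative values mean voicing starts before the release (prevoicing).
    """
    return token.voicing_onset_ms - token.release_ms


def classify_voicing(token: StopToken, boundary_ms: float = BOUNDARY_MS) -> str:
    """Label a token by comparing its VOT with a single category boundary."""
    return "voiced" if vot_ms(token) < boundary_ms else "voiceless"


if __name__ == "__main__":
    for token in (StopToken(100.0, 110.0), StopToken(100.0, 160.0)):
        print(vot_ms(token), classify_voicing(token))
```

A single threshold is the simplest possible model; several of the abstracts in this listing report that listeners also use cues other than VOT, so a sketch like this should not be read as a model of perception.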


2011 ◽  
Vol 15 (2) ◽  
pp. 275-287 ◽  
Author(s):  
SUE ANN S. LEE ◽  
GREGORY K. IVERSON

The purpose of this study was to conduct an acoustic examination of the obstruent stops produced by Korean–English bilingual children in connection with the question of whether bilinguals establish distinct categories of speech sounds across languages. Stop productions were obtained from ninety children in two age ranges, five and ten years: thirty Korean–English bilinguals, thirty monolingual Koreans and thirty monolingual English speakers. Voice-Onset-Time (VOT) lag at word-initial stop and fundamental frequency (f0) in the following vowel (hereafter vowel-onset f0) were measured. The bilingual children showed different patterns of VOT in comparison to both English and Korean monolinguals, with longer VOT in their production of Korean stop consonants and shorter VOT for English. Moreover, the ten-year-old bilinguals distinguished all stop categories using both VOT and vowel-onset f0, whereas the five-year-olds tended to make stop distinctions based on VOT but not vowel-onset f0. The results of this study suggest that bilingual children at around five years of age do not yet have fully separate stop systems, and that the systems continue to evolve during the developmental period.


1980 ◽  
Vol 23 (1) ◽  
pp. 152-161 ◽  
Author(s):  
Z. S. Bond ◽  
Howard F. Wilson

Voicing is a phonological contrast which emerges early in the speech of children. However, the acoustic correlates of the voicing contrast for stop consonants are fairly complex. In the initial position, voicing is cued primarily by the relative timing of articulatory versus laryngeal gestures. In the final position, the duration of the preceding vowel is associated with the voicing contrast of stop consonants. The purpose of this study was to examine the pattern of acquisition of the voicing contrast in the speech of ten children diagnosed as language-delayed in comparison with the acquisition of the voicing contrast by normal-speaking children. The language-delayed and normal-speaking children were matched according to mean length of utterance (MLU) and placed in one of Brown's five developmental stages. Each participant was first given a short test, using natural speech, to determine his or her ability to identify minimal pairs differing in the voicing of stop consonants. Those who passed the test were recorded under standard recording conditions repeating 12 test words. The test words contrasted voiced and voiceless stop consonants in initial and final positions. Spectrograms of the three best productions of each word were used to examine voice-onset time for stops in initial position and preceding vowel duration for stops in final position. Although the language-delayed and normal-speaking children showed equivalent linguistic sophistication (as measured by MLU), the language-delayed children's control of the acoustic-phonetic details of the voicing contrast was less mature than that of the normal-speaking children.


2018 ◽  
Vol 61 (3) ◽  
pp. 789-796 ◽  
Author(s):  
Shunsuke Tamura ◽  
Kazuhito Ito ◽  
Nobuyuki Hirose ◽  
Shuji Mori

Purpose: The purpose of this study was to investigate the psychophysical boundary used for categorization of voiced–voiceless stop consonants in native Japanese speakers. Method: Twelve native Japanese speakers participated in the experiment. The stimuli were synthetic stop consonant–vowel stimuli varying in voice onset time (VOT) with manipulation of the amplitude of the initial noise portion and the first formant (F1) frequency of the periodic portion. There were 3 tasks, namely, speech identification to either /d/ or /t/, detection of the noise portion, and simultaneity judgment of onsets of the noise and periodic portions. Results: The VOT boundaries of /d/–/t/ were close to the shortest VOT values that allowed for detection of the noise portion but not to those for perceived nonsimultaneity of the noise and periodic portions. The slopes of noise detection functions along VOT were as sharp as those of voiced–voiceless identification functions. In addition, the effects of manipulating the amplitude of the noise portion and the F1 frequency of the periodic portion on the detection of the noise portion were similar to those on voiced–voiceless identification. Conclusion: The psychophysical boundary of perception of the initial noise portion masked by the following periodic portion may be used for voiced–voiceless categorization by Japanese speakers.


1980 ◽  
Vol 7 (3) ◽  
pp. 433-458 ◽  
Author(s):  
Marlys A. Macken ◽  
David Barton

This paper reports on the acquisition of the voicing contrast in Mexican–Spanish word-initial stops. In Study 1, three monolingual children were recorded every two weeks for seven months, beginning when the children were about 1;7. In Study 2, four monolingual children about 3;10 were recorded once or twice. Two analyses were done. Instrumental analysis of the stop productions revealed that not even by age 3;10 were the children consistently distinguishing between voiced–voiceless stop cognate pairs on the basis of adult-like voice-onset time characteristics. The spirantization analysis, however, more clearly revealed the children's phonological knowledge. Discussion focuses on the implications of the data for phonological development in general and for the phonological description of voicing in Spanish.


2019 ◽  
Vol 62 (2) ◽  
pp. 434-441 ◽  
Author(s):  
Shunsuke Tamura ◽  
Kazuhito Ito ◽  
Nobuyuki Hirose ◽  
Shuji Mori

Purpose: The purpose of this study was to investigate whether speech perception would reflect small latency changes in subcortical speech representation. Method: Twelve native Japanese listeners participated in the experiment. Those listeners participated in a speech identification task and auditory brainstem response (ABR) measurement using /d/–/t/ continuum stimuli varying in voice onset time (VOT) with manipulation of the amplitude of the initial noise (consonant) portion, the duration of which corresponded to VOT. Results: Increasing the noise portion amplitude lengthened the subcortical representation of VOT, which is the latency difference between ABRs synchronizing to the onsets of the initial noise and following periodic (vowel) portions (VOT_ABR), and made listeners likely to perceive the stimuli with ambiguous VOT as a voiceless stop /t/. In addition, the amount of VOT_ABR lengthening was close to that of the VOT boundary shortening. Conclusion: A few milliseconds of difference in subcortical speech representation are important for the perception of speech sounds with ambiguous acoustic cues. Supplemental Material: https://doi.org/10.23641/asha.7728695


1980 ◽  
Vol 7 (1) ◽  
pp. 41-74 ◽  
Author(s):  
Marlys A. Macken ◽  
David Barton

This paper reports on a longitudinal study of the acquisition of the voicing contrast in American English word-initial stop consonants, as measured by voice onset time. Four monolingual children were recorded at two-week intervals, beginning when the children were about 1;6. Data provide evidence for three general stages: (1) the child has no contrast; (2) the child has a contrast but one that falls within the adult perceptual boundaries of one (usually voiced) phoneme and thus is presumably not perceptible to adults; and (3) the child has a contrast that resembles the adult contrast. The rate and nature of the developmental process are discussed in relation to two competing models for phonological acquisition and two hypotheses regarding the skills being learned.


2013 ◽  
Vol 56 (4) ◽  
pp. 1097-1107 ◽  
Author(s):  
Matthew B. Winn ◽  
Monita Chatterjee ◽  
William J. Idsardi

Purpose: The contributions of voice onset time (VOT) and fundamental frequency (F0) were evaluated for the perception of voicing in syllable-initial stop consonants in words that were low-pass filtered and/or masked by speech-shaped noise. It was expected that listeners would rely less on VOT and more on F0 in these degraded conditions. Method: Twenty young listeners with normal hearing identified modified natural speech tokens that varied by VOT and F0 in several conditions of low-pass filtering and masking noise. Stimuli included /b/–/p/ and /d/–/t/ continua that were presented in separate blocks. Identification results were modeled using mixed-effects logistic regression. Results: When speech was filtered and/or masked by noise, listeners' voicing perceptions were driven less by VOT and more by F0. Speech-shaped masking noise exerted greater effects on the /b/–/p/ contrast, while low-pass filtering exerted greater effects on the /d/–/t/ contrast, consistent with the acoustics of these contrasts. Conclusion: Listeners can adjust their use of acoustic-phonetic cues in a dynamic way that is appropriate for challenging listening conditions; cues that are less influential in ideal conditions can gain priority in challenging conditions.
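
The idea of cue weighting in the abstract above can be sketched as a two-cue logistic identification function. This is a rough illustration only: it is a plain fixed-effects logistic function, not the mixed-effects model the authors fitted, and every number in it (the cue weights, the centering values of 25 ms and 110 Hz, and the 100 Hz and 120 Hz probe values) is a hypothetical placeholder rather than a result from the study.

```python
import math


def p_voiceless(vot_ms: float, f0_hz: float, w_vot: float, w_f0: float,
                vot_center: float = 25.0, f0_center: float = 110.0) -> float:
    """Probability of a 'voiceless' response from two centered cues.

    The weights express how strongly each cue drives the response; the
    centering values are hypothetical and mark the ambiguous point.
    """
    z = w_vot * (vot_ms - vot_center) + w_f0 * (f0_hz - f0_center)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weight sets: VOT dominates in quiet, F0 gains weight when
# the signal is degraded. These are illustrative values, not fitted ones.
QUIET = {"w_vot": 0.30, "w_f0": 0.02}
DEGRADED = {"w_vot": 0.10, "w_f0": 0.08}


def f0_effect(weights: dict, vot_ms: float = 25.0) -> float:
    """Change in response probability when F0 rises from 100 to 120 Hz."""
    high = p_voiceless(vot_ms, 120.0, **weights)
    low = p_voiceless(vot_ms, 100.0, **weights)
    return high - low


if __name__ == "__main__":
    print("F0 effect in quiet:   ", round(f0_effect(QUIET), 3))
    print("F0 effect if degraded:", round(f0_effect(DEGRADED), 3))
```

With these placeholder weights, the same F0 change moves the response probability more in the degraded condition than in quiet, which is the qualitative pattern the abstract reports.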


2021 ◽  
pp. 002383092098682
Author(s):  
Vladimir Kulikov

The current study investigates multiple acoustic cues (voice onset time (VOT), spectral center of gravity (SCG) of burst, pitch (F0), and frequencies of the first (F1) and second (F2) formants at vowel onset) associated with phonological contrasts of voicing and emphasis in production of Arabic coronal stops. The analysis of the acoustic data collected from eight native speakers of the Qatari dialect showed that the three stops form three distinct modes on the VOT scale: [d] is (pre)voiced, voiceless [t] is aspirated, and emphatic [ṭ] is voiceless unaspirated. The contrast is also maintained in spectral cues. Each cue influences production of coronal stops while their relevance to phonological contrasts varies. VOT was most relevant for voicing, but F2 was mostly associated with emphasis. The perception experiment revealed that listeners were able to categorize ambiguous tokens correctly and compensate for phonological contrasts. The listeners’ results were used to evaluate three categorization models to predict the intended category of a coronal stop: a model with unweighted and unadjusted cues, a model with weighted cues compensating for phonetic context, and a model with weighted cues compensating for the voicing and emphasis contrasts. The findings suggest that the model with phonological compensation performed most similarly to human listeners both in terms of accuracy rate and error pattern.

