The role of spectral cues in discrimination of voice onset time differences

Problems in modelling categorical perception (CP) and attempts to apply signal detection theory (SDT) to CP are reviewed. An approach based on SDT supplemented by a theory of criterion setting is presented. Criterion setting theory (CST) postulates mechanisms that reset the response criterion on each trial, and it accounts for sequential dependencies. A criterion setting model for discrimination is shown to fit data from the literature. The hypothesis that “sharp” category boundaries may arise from the suppression of noise caused by intertrial dependencies was examined in an experiment on the identification of [ba] and [pa] syllables, and tone combinations of varying tone-onset time. However, it was shown that both positive and negative intertrial dependencies were present. They could be fitted by the criterion-setting model; in this respect, CP resembles standard psychophysical judgements. Examination of the psychometric functions from the two CP tasks shows that they are not normal ogives, as in standard psychophysical tasks: these curves are steeper centrally and flatter at the extremes than a Gaussian ogive; we describe them as “hypersigmoid”. The description of CP identification functions as hypersigmoid provides a new, qualitative characterization of the “sharp” category boundaries traditionally claimed for CP. Their causation remains to be determined.
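
For reference, the "normal ogive" that the abstract uses as its baseline is the cumulative Gaussian function. The following is a minimal Python sketch of that baseline only. The function name `gaussian_ogive`, the 25 ms boundary, and the 10 ms spread are illustrative assumptions, not values from the study, and the sketch does not attempt to model the "hypersigmoid" shape itself.

```python
import math


def gaussian_ogive(x, mu, sigma):
    """Cumulative Gaussian: probability of one response category at stimulus value x.

    mu is the category boundary (50% point); sigma controls the slope.
    """
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical continuum (e.g. onset time in ms) with an assumed boundary
# at 25 ms and an assumed spread of 10 ms.
for onset_ms in range(0, 60, 10):
    print(onset_ms, round(gaussian_ogive(onset_ms, mu=25.0, sigma=10.0), 3))
```

An identification function that is steeper than this curve near the boundary and flatter at the extremes would correspond to the qualitative description given in the abstract.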

Constructing two phonological systems: A phonetic analysis of /p/, /t/, /k/ among early Spanish–English bilingual speakers

International Journal of Bilingualism ◽

10.1177/1367006916651983 ◽

2016 ◽

Vol 22 (1) ◽

pp. 51-68

Author(s):

Earl K. Brown ◽

Mary T. Copple

Keyword(s):

Native Speakers ◽

Voice Onset Time ◽

First Language ◽

Onset Time ◽

Center Of Gravity ◽

Spanish Speakers ◽

Bilingual Speakers ◽

English Bilingual ◽

The Usa

Aims and objectives/purpose/research questions: Many early Spanish-English bilingual speakers in the USA learn Spanish as their first language at home and English in school. This paper seeks to elucidate whether these speakers develop a separate phonological system for English and, if so, the role of primary and secondary cues in the development of the second language (L2) system. Design/methodology/approach: The phonetic realization of the voiceless stops /p/, /t/, /k/ is analyzed among three groups: early Spanish-English bilinguals; L1 English speakers who are late learners of Spanish; and L1 Spanish speakers who are late learners of English. The participants (N = 15) engaged in a reading task and a conversation task in each language during a single recording session. Data and analysis: 1578 tokens of /p/, /t/, /k/ were extracted and analyzed using acoustic software. Voice onset time in milliseconds and center of gravity in Hertz were analyzed, and monofactorial and multifactorial analyses were performed to determine the role of linguistic background. Findings/conclusions: Evidence is found of two phonological systems among early bilingual speakers, with varying degrees of assimilation to the phonological systems of the native speakers of each language. Originality: We argue that early bilinguals construct their L2 system of /p/, /t/, /k/ in English based on the primary cue of voice onset time rather than the secondary cue of center of gravity, as they are accustomed to noticing differences in voice onset time in Spanish and because the center of gravity of /p/, /t/, /k/ in English is more variable than voice onset time, and is therefore a less predictable cue for early bilinguals as they construct their L2 system. Significance/implications: This paper contributes to the literature on the construction of phonological systems and to research detailing the speech of early Spanish-English bilinguals.
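
The abstract does not say how center of gravity was computed. One common definition is the power-weighted mean frequency of a spectrum, and the sketch below assumes that definition. The function name `center_of_gravity`, the `power` parameter, and the example spectrum are hypothetical and are not taken from the paper.

```python
def center_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Weighted mean frequency (Hz) of a spectrum.

    Each frequency bin is weighted by |magnitude| ** power;
    power=2.0 weights by spectral power (an assumed default).
    """
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


# Hypothetical two-bin spectrum: the stronger 3000 Hz component
# pulls the center of gravity above the 2000 Hz midpoint.
print(center_of_gravity([1000.0, 3000.0], [1.0, 2.0]))
```

A single number like this summarizes where the energy of a stop burst is concentrated, which is why it can serve as a spectral cue alongside the temporal cue of voice onset time.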

The role of code-switching and language context in bilingual phonetic transfer

Journal of the International Phonetic Association ◽

10.1017/s0025100315000468 ◽

2016 ◽

Vol 46 (3) ◽

pp. 263-285 ◽

Cited By ~ 8

Author(s):

Daniel J. Olson

Keyword(s):

Dual Language ◽

Voice Onset Time ◽

Onset Time ◽

Code Switching ◽

Additive Effect ◽

Switching Effect ◽

Oral Production ◽

Language Activation ◽

Language Context

The present study examines the effect of two potential catalysts for interlanguage phonetic interaction, code-switching and language mode, on the production of voice onset time (VOT) to better understand the role of (near) simultaneous dual language activation on phonetic production, as well as the nature of phonetic transfer. An oral production paradigm was carried out in which Spanish–English bilinguals produced words with initial voiceless stops as non-switched tokens, code-switched tokens in an otherwise monolingual context, and code-switched tokens in a bilingual context. Results demonstrated a degree of phonetic transfer associated with code-switching, either unidirectional or bi-directional. Specifically, English, with long lag VOT, was more susceptible to phonetic transfer than Spanish (short lag). Contrary to expectations, while the code-switching effect was present in both monolingual and bilingual mode, there was no additional transfer, or additive effect, of bilingual language mode. Differences in the effects of code-switching on English and Spanish are discussed with respect to the inherently different acceptable VOT ranges in the two languages. Furthermore, the lack of difference in VOT between the code-switched tokens in the monolingual and bilingual contexts is taken to suggest limits on phonetic transfer.

Transformation of a temporal speech cue to a spatial neural code in human auditory cortex

eLife ◽

10.7554/elife.53051 ◽

2020 ◽

Vol 9 ◽

Author(s):

Neal P Fox ◽

Matthew Leonard ◽

Matthias J Sjerps ◽

Edward F Chang

Keyword(s):

Auditory Cortex ◽

Voice Onset Time ◽

Onset Time ◽

Neural Code ◽

Neural Encoding ◽

Spectral Cues ◽

Temporal Cues ◽

Neural Populations ◽

Speech Cues ◽

Simple Neural Network

In speech, listeners extract continuously-varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically-tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population’s preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.

A Contrastive Study of the Voice Onset Time (VOT) in English and Arabic Languages

Al-Adab Journal ◽

10.31973/aj.v1i118.374 ◽

2018 ◽

Vol 1 (118) ◽

pp. 61-74

Author(s):

Rafida Mansoor Mahmood

Keyword(s):

Auditory System ◽

Categorical Perception ◽

Voice Onset Time ◽

Onset Time ◽

Stop Consonants ◽

Contrastive Study ◽

Human Auditory System ◽

Arabic And English ◽

The Voice

The speech signal contains many different features. One of these features is voice onset time (henceforth VOT), which distinguishes speakers of different languages by the way they articulate the stop consonants of their own language. VOT can be used by the human auditory system to distinguish between voiced and devoiced stops such as /p/ and /b/ in English and /t/ and /t?/ in Arabic. The study is divided into five sections. Section One is introductory and contains the introduction, the problem, the hypothesis, the aim, the limitation and the value of the study. Section Two presents the definitions and types of VOT (positive, negative and zero VOT) and the role of VOT. Section Three deals with the measurement and categorical perception of VOT; the methods of measurement are spectrograms, waveforms and lag time. Section Four investigates the VOT of the two languages, Arabic and English, in detail, with a comparison between them. The study ends with a number of conclusions, one of which is that Arabic VOT differs from English VOT, which confirms the hypothesis.

Effects of Sound Change on the Weighting of Acoustic Cues to the Three-Way Laryngeal Stop Contrast in Korean: Diachronic and Dialectal Comparisons

Language and Speech ◽

10.1177/0023830918786305 ◽

2018 ◽

Vol 62 (3) ◽

pp. 509-530 ◽

Cited By ~ 3

Author(s):

Hyunjung Lee ◽

Allard Jongman

Keyword(s):

Fundamental Frequency ◽

Voice Onset Time ◽

Onset Time ◽

Sound Change ◽

Acoustic Cues ◽

Dialectal Variation ◽

Cue Weighting ◽

Age Variation ◽

Korean Stops

Both segmental and suprasegmental properties of the South Kyungsang dialect of Korean have changed under the influence of standard Seoul Korean. This study examines how such sound change affects acoustic cues to the three-way laryngeal contrast among Korean stops across Kyungsang generations through a comparison with Seoul Korean. Thirty-nine female Korean speakers differing in dialect (Kyungsang, Seoul) and age (older, younger) produced words varying in initial stops and lexical accent patterns, for which voice onset time and fundamental frequency (F0) at vowel onset were measured. This study first confirms previous findings regarding age and dialectal variation in distinguishing the three Korean stops. In addition, we report age variation in the use of voice onset time and F0 for the stops in Kyungsang Korean, with younger speakers using F0 more than older speakers as a cue to the stop distinction. This age variation is accounted for by the reduced lexical tonal properties of Kyungsang Korean and the increased influence of Seoul Korean. A comparison of the specific cue weighting across speaker groups also reveals that younger Kyungsang speakers pattern with Seoul speakers who arguably follow the enhancing F0 role of the innovative younger Seoul speakers. The shared cue weighting pattern across generations and dialects suggests that each speaker group changes the acoustic cue weighting in a similar direction.

Voice onset time and global foreign accent in German–French simultaneous bilinguals during adulthood

International Journal of Bilingualism ◽

10.1177/1367006915589424 ◽

2016 ◽

Vol 20 (6) ◽

pp. 732-749 ◽

Cited By ~ 17

Author(s):

Tatjana Lein ◽

Tanja Kupisch ◽

Joost van de Weijer

Keyword(s):

Voice Onset Time ◽

Onset Time ◽

Foreign Accent ◽

Childhood Environment ◽

Voiceless Stop ◽

New Perspective ◽

Crosslinguistic Influence ◽

The Impact ◽

The Voice

Aims and objectives: In this study, we investigated crosslinguistic influence in the phonetic systems of simultaneous bilinguals (2L1s) during adulthood. Methodology: Specifically, we analyzed the voice onset time (VOT) of the voiceless stop /k/ in the spontaneous speech of 14 German–French bilinguals who grew up in France or Germany. We looked at both languages, first comparing the groups, second comparing their VOT to their global accent. Data and analysis: The material consisted of interviews, lasting for about half an hour. Findings/conclusions: Most 2L1s showed distinct VOT-ranges in their two languages, even if they were perceived to have a foreign accent in the minority language of their childhood environment. We conclude that the phonetic systems of 2L1s remain separate and stable throughout the lifespan. However, the 2L1s from France had significantly shorter VOTs in German than the 2L1s from Germany, and their speech was overall more accented. These findings are discussed with respect to the role of intra- and extra-linguistic factors. Originality: Our study adds a new perspective to existing VOT studies of bilinguals by using naturalistic speech data and by comparing two groups of 2L1s who have the same language combination but grew up in different countries, which allows us to evaluate the impact of their childhood environment on VOT development. Significance/implications: Language exposure during childhood seems to be beneficial for pronunciation during adulthood.

I Scream for Ice Cream: Resolving Lexical Ambiguity with Sub-phonemic Information

Language and Speech ◽

10.1177/0023830919866870 ◽

2019 ◽

Vol 63 (3) ◽

pp. 526-549

Author(s):

Yoonjeong Lee ◽

Elsi Kaiser ◽

Louis Goldstein

Keyword(s):

Voice Onset Time ◽

American English ◽

Onset Time ◽

Lexical Ambiguity ◽

Language Recognition ◽

Movement Trajectories ◽

Phonetic Information ◽

Lexical Ambiguity Resolution ◽

Mouse Movement

This study uses a response mouse-tracking paradigm to examine the role of sub-phonemic information in online lexical ambiguity resolution of continuous speech. We examine listeners’ sensitivity to the sub-phonemic information that is specific to the ambiguous internal open juncture /s/-stop sequences in American English (e.g., “place kin” vs. “play skin”), that is, voice onset time (VOT) indicating different degrees of aspiration (e.g., long VOT for “kin” vs. short VOT for “skin”) in connected speech contexts. A cross-splicing method was used to create two-word sequences (e.g., “place kin” or “play skin”) with matching VOTs (long for “kin”; short for “skin”) or mismatching VOTs (short for “kin”; long for “skin”). Participants (n = 20) heard the two-word sequences, while looking at computer displays with the second word in the left/right corner (“KIN” and “SKIN”). Then, listeners’ click responses and mouse movement trajectories were recorded. Click responses show significant effects of VOT manipulation, while mouse trajectories do not. Our results show that stop-release information, whether temporal or spectral, can (mis)guide listeners’ interpretation of the possible location of a word boundary between /s/ and a following stop, even when other aspects in the acoustic signal (e.g., duration of /s/) point to the alternative segmentation. Taken together, our results suggest that segmentation and lexical access are highly attuned to bottom-up phonetic information; our results have implications for a model of spoken language recognition with position-specific representations available at the prelexical level and also allude to the possibility that detailed phonetic information may be stored in the listeners’ lexicons.

Acoustic and Perceptual Analysis of Word-Initial Stop Consonants in Phonologically Disordered Children

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3103.449 ◽

1988 ◽

Vol 31 (3) ◽

pp. 449-459 ◽

Cited By ~ 24

Author(s):

Karen Forrest ◽

Barbara K. Rockman

Keyword(s):

Voice Onset Time ◽

Onset Time ◽

Stop Consonant ◽

Stop Consonants ◽

Acoustic Cues ◽

Perceptual Analysis ◽

Spectral Cues ◽

Voicing Contrast ◽

Temporal Measures ◽

Very High

Spectrographic measures of voice onset time (VOT) were made for phonologically disordered children in whom a voicing contrast was just beginning to emerge. These temporal measures were related to adult listeners' perception of voicing of the initial stop consonant to determine how well VOT could predict perceived voicing. In general, the predictive utility of VOT was not very high. The relation between VOT as produced by the phonologically disordered children and perceived voicing ranged from 0.31 to 0.43. A finer-grained analysis was conducted to determine what other acoustic cues might have influenced the listeners' judgments of voicing. Although no one acoustic cue could be found to explain all listeners' responses, spectral cues such as fundamental and F1 frequencies at the onset of voicing, as well as the burst and aspiration amplitude relative to the vowel onset amplitude, accounted for the perceived voicing of about half of the tokens that were not differentiated by VOT. Rather than relying solely on the temporal characteristics of the VOT interval, a matrix of acoustic cues may influence how a listener perceives word-initial voicing as produced by phonologically disordered children.
