Transformation of a temporal speech cue to a spatial neural code in human auditory cortex

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Neal P Fox ◽  
Matthew Leonard ◽  
Matthias J Sjerps ◽  
Edward F Chang

In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population’s preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
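The idea of mapping a temporal interval onto response amplitude can be illustrated with a toy sketch. This is not the authors' model: the function name `detector_responses`, the exponential form, and the time constant `tau_ms` are all assumptions chosen only to show how a coincidence-tuned unit and a gap-tuned unit could respond with different amplitudes to the same VOT.

```python
import math


def detector_responses(vot_ms, tau_ms=20.0):
    """Return (coincidence, gap) amplitudes for a VOT given in milliseconds.

    Hypothetical toy units, not taken from the paper:
    - the coincidence unit responds most when burst and voicing onset
      are close together in time (short VOT);
    - the gap unit responds more as the interval between them grows.
    tau_ms is an assumed time constant.
    """
    coincidence = math.exp(-vot_ms / tau_ms)
    gap = 1.0 - coincidence
    return coincidence, gap


# Print both amplitudes along an illustrative VOT continuum.
for vot in (0, 10, 20, 30, 40, 50):
    c, g = detector_responses(vot)
    print(f"VOT {vot:2d} ms  coincidence={c:.2f}  gap={g:.2f}")
```

In this sketch the two amplitudes change in opposite directions as VOT increases, so which unit responds more strongly, and by how much, carries information about the interval.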

1999 ◽  
Vol 82 (5) ◽  
pp. 2346-2357 ◽  
Author(s):  
Mitchell Steinschneider ◽  
Igor O. Volkov ◽  
M. Daniel Noh ◽  
P. Charles Garell ◽  
Matthew A. Howard

Voice onset time (VOT) is an important parameter of speech that denotes the time interval between consonant onset and the onset of low-frequency periodicity generated by rhythmic vocal cord vibration. Voiced stop consonants (/b/, /g/, and /d/) in syllable initial position are characterized by short VOTs, whereas unvoiced stop consonants (/p/, /k/, and /t/) contain prolonged VOTs. As the VOT is increased in incremental steps, perception rapidly changes from a voiced stop consonant to an unvoiced consonant at an interval of 20–40 ms. This abrupt change in consonant identification is an example of categorical speech perception and is a central feature of phonetic discrimination. This study tested the hypothesis that VOT is represented within auditory cortex by transient responses time-locked to consonant and voicing onset. Auditory evoked potentials (AEPs) elicited by stop consonant-vowel (CV) syllables were recorded directly from Heschl's gyrus, the planum temporale, and the superior temporal gyrus in three patients undergoing evaluation for surgical remediation of medically intractable epilepsy. Voiced CV syllables elicited a triphasic sequence of field potentials within Heschl's gyrus. AEPs evoked by unvoiced CV syllables contained additional response components time-locked to voicing onset. Syllables with a VOT of 40, 60, or 80 ms evoked components time-locked to consonant release and voicing onset. In contrast, the syllable with a VOT of 20 ms evoked a markedly diminished response to voicing onset and elicited an AEP very similar in morphology to that evoked by the syllable with a 0-ms VOT. Similar response features were observed in the AEPs evoked by click trains. In this case, there was a marked decrease in amplitude of the transient response to the second click in trains with interpulse intervals of 20–25 ms.
Speech-evoked AEPs recorded from the posterior superior temporal gyrus lateral to Heschl's gyrus displayed comparable response features, whereas field potentials recorded from three locations in the planum temporale did not contain components time-locked to voicing onset. This study demonstrates that VOT is at least partially represented in primary and specific secondary auditory cortical fields by synchronized activity time-locked to consonant release and voicing onset. Furthermore, AEPs exhibit features that may facilitate categorical perception of stop consonants, and these response patterns appear to be based on temporal processing limitations within auditory cortex. Demonstrations of similar speech-evoked response patterns in animals support a role for these experimental models in clarifying selected features of speech encoding.
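The abrupt shift in consonant identification described above is often pictured as a steep sigmoid over the VOT continuum. The following is a minimal sketch under stated assumptions, not a fit to any data in the study: the function name `p_unvoiced`, the logistic form, the boundary of 30 ms (chosen only because it is the midpoint of the 20–40 ms interval mentioned in the abstract), and the slope of 3 ms are all illustrative.

```python
import math


def p_unvoiced(vot_ms, boundary_ms=30.0, slope_ms=3.0):
    """Hypothetical probability of hearing an unvoiced stop at a given VOT.

    A logistic identification function. boundary_ms and slope_ms are
    assumed illustrative values, not measurements from the study.
    """
    return 1.0 / (1.0 + math.exp(-(vot_ms - boundary_ms) / slope_ms))


# Identification changes little within a category and sharply near the boundary.
for vot in (0, 20, 30, 40, 60, 80):
    print(f"VOT {vot:2d} ms  P(unvoiced) = {p_unvoiced(vot):.3f}")
```

With a small slope parameter, equal steps in VOT produce almost no change in identification far from the boundary and a large change close to it, which is the signature of categorical perception.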


Author(s):  
Mitchell Steinschneider ◽  
Charles E. Schroeder ◽  
Joseph C. Arezzo ◽  
Herbert G. Vaughan

1996 ◽  
Vol 49 (3) ◽  
pp. 745-764 ◽  
Author(s):  
Jörgen Pind

Speech segments are highly context-dependent and acoustically variable. One factor that contributes heavily to the variability of speech is speaking rate. Some speech cues are temporal in nature; that is, the distinctions that they signify are defined over time. How can temporal speech cues keep their distinctiveness in the face of extrinsic transformations, such as those wrought by different speaking rates? This issue is explored with respect to the perception, in Icelandic, of Voice Onset Time as a cue for word-initial stop voicing, word-initial aspiration as a cue for [h], and Voice Offset Time as a cue for pre-aspiration. All the speech cues show rate-dependent perception, though to different degrees, with Voice Offset Time being most sensitive to rate changes and Voice Onset Time least sensitive. The differences in the behaviour of these speech cues are related to their different positions in the syllable.


1988 ◽  
Vol 31 (3) ◽  
pp. 449-459 ◽  
Author(s):  
Karen Forrest ◽  
Barbara K. Rockman

Spectrographic measures of voice onset time (VOT) were made for phonologically disordered children in whom a voicing contrast was just beginning to emerge. These temporal measures were related to adult listeners' perception of voicing of the initial stop consonant to determine how well VOT could predict perceived voicing. In general, the predictive utility of VOT was not very high. The relation between VOT as produced by the phonologically disordered children and perceived voicing ranged from 0.31 to 0.43. A finer-grained analysis was conducted to determine what other acoustic cues might have influenced the listeners' judgments of voicing. Although no single acoustic cue could be found to explain all listeners' responses, spectral cues such as fundamental and F1 frequencies at the onset of voicing, as well as the burst and aspiration amplitude relative to the vowel onset amplitude, accounted for the perceived voicing of about half of the tokens that were not differentiated by VOT. Rather than relying solely on the temporal characteristics of the VOT interval, a matrix of acoustic cues may influence how a listener perceives word-initial voicing as produced by phonologically disordered children.

