Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception

2018 · Vol 30 (11) · pp. 1704-1719
Author(s): Anna Maria Alexandrou, Timo Saarinen, Jan Kujala, Riitta Salmelin

During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). As is inherent to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio–MEG coherence. Modulations in audio–MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
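The coupling measure named here, audio–MEG coherence, is the magnitude-squared coherence between the amplitude envelope of the speech audio and each MEG sensor signal. Below is a minimal sketch using scipy.signal.coherence on simulated stand-in data; the sampling rate, window length, and the 2-8 Hz summary band are illustrative assumptions, not the authors' analysis parameters.

```python
# Minimal sketch of audio-MEG coherence, assuming one MEG channel and
# the amplitude envelope of the speech stimulus, both resampled to a
# common rate. All signals below are random stand-ins.
import numpy as np
from scipy.signal import coherence, hilbert

fs = 200.0                                   # common sampling rate (Hz), assumed
rng = np.random.default_rng(0)
audio = rng.standard_normal(60 * int(fs))    # stand-in for 60 s of speech audio
meg = rng.standard_normal(60 * int(fs))      # stand-in for one MEG channel

# Amplitude envelope of the audio via the analytic signal
envelope = np.abs(hilbert(audio))

# Magnitude-squared coherence with 0.25 Hz frequency resolution
freqs, coh = coherence(envelope, meg, fs=fs, nperseg=int(4 * fs))

# Summarize coherence in a low-frequency band around typical syllable rates
band = (freqs >= 2) & (freqs <= 8)
print(f"mean 2-8 Hz audio-MEG coherence: {coh[band].mean():.3f}")
```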

2019
Author(s): Tobias Overath, Joon H. Paik

Speech perception entails the mapping of the acoustic waveform to linguistic representations. For this mapping to succeed, the speech signal needs to be tracked over various temporal windows at high temporal precision in order to decode linguistic units ranging from phonemes (tens of milliseconds) to sentences (seconds). Here, we tested the hypothesis that cortical processing of speech-specific temporal structure is modulated by higher-level linguistic analysis. Using fMRI, we measured BOLD signal changes to 4-s long speech quilts with variable temporal structure (30, 120, 480, 960 ms segment lengths), as well as natural speech, created from a familiar (English) or foreign (Korean) language. We found evidence for the acoustic analysis of temporal speech properties in superior temporal sulcus (STS): the BOLD signal increased as a function of temporal speech structure in both familiar and foreign languages. However, activity in left inferior frontal gyrus (IFG) revealed evidence for linguistic processing of temporal speech properties: the BOLD signal increased as a function of temporal speech structure only in familiar, but not in foreign, speech. Network analyses suggested that left IFG modulates processing of speech-specific temporal structure in primary auditory cortex, which in turn sensitizes processing of speech-specific temporal structure in STS. The results thus reveal a network for acousto-linguistic transformation consisting of primary and non-primary auditory cortex, STS, and left IFG.

Significance Statement: Where and how the acoustic information contained in complex speech signals is mapped to linguistic information is still not fully explained by current speech/language models. We dissociate acoustic from linguistic analyses of speech by comparing the same acoustic manipulation (varying the extent of temporal speech structure) in two languages (native, foreign). We show that acoustic temporal speech structure is analyzed in superior temporal sulcus (STS), while linguistic information is extracted in left inferior frontal gyrus (IFG). Furthermore, modulation from left IFG enhances sensitivity to temporal speech structure in STS. We propose a model for acousto-linguistic transformation of speech-specific temporal structure in the human brain that can account for these results.
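Speech quilts are built by cutting a recording into short segments and re-stitching them in a new order, so that longer segment lengths preserve more of the original temporal structure. The sketch below illustrates only that core shuffling idea on simulated data; the published quilting algorithm additionally selects segment orderings and aligns boundaries to minimize acoustic artifacts, which this deliberately naive version omits.

```python
# Naive sketch of a speech "quilt": cut the waveform into fixed-length
# segments and reorder them at random. Boundary matching, as used in
# the actual quilting procedure, is omitted for brevity.
import numpy as np

def make_quilt(signal: np.ndarray, fs: float, seg_ms: float,
               rng: np.random.Generator) -> np.ndarray:
    """Return a quilted version of `signal` with seg_ms-long segments."""
    seg_len = int(fs * seg_ms / 1000.0)
    n_segs = len(signal) // seg_len
    segments = signal[: n_segs * seg_len].reshape(n_segs, seg_len)
    order = rng.permutation(n_segs)          # random new segment order
    return segments[order].reshape(-1)

rng = np.random.default_rng(1)
fs = 16000.0
speech = rng.standard_normal(int(4 * fs))    # stand-in for a 4-s recording
for seg_ms in (30, 120, 480, 960):           # segment lengths from the study
    quilt = make_quilt(speech, fs, seg_ms, rng)
```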


2020
Author(s): Jonathan E Peelle, Brent Spehar, Michael S Jones, Sarah McConkey, Joel Myerson, ...

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, and extended to a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
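In a PPI analysis, the regressor of interest is the product of a psychological variable (here, audiovisual versus unimodal condition) and the physiological timecourse of a seed region. A minimal sketch follows, assuming simulated data and a simple block design; full fMRI pipelines also deconvolve the seed signal to the neural level before forming the interaction, a step omitted here.

```python
# Minimal PPI sketch: the interaction regressor is the element-wise
# product of the task (psychological) vector and the seed region's
# (physiological) timecourse. All data are simulated stand-ins.
import numpy as np

n_vols = 200
rng = np.random.default_rng(2)

task = np.tile([1.0] * 10 + [-1.0] * 10, n_vols // 20)  # AV vs. unimodal blocks
seed = rng.standard_normal(n_vols)                       # seed (e.g., A1) timecourse
ppi = task * seed                                        # interaction term

# GLM per voxel: y = b0 + b1*task + b2*seed + b3*ppi + noise
X = np.column_stack([np.ones(n_vols), task, seed, ppi])
voxel = 0.5 * ppi + rng.standard_normal(n_vols)          # simulated voxel data
beta, *_ = np.linalg.lstsq(X, voxel, rcond=None)
print(f"PPI (interaction) beta: {beta[3]:.2f}")
```

A significant interaction beta for a voxel means its coupling with the seed differs between conditions, which is how condition-dependent auditory-visual connectivity is assessed.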


1991 · Vol 3 (1) · pp. 9-24
Author(s): M. H. Harries, D. I. Perrett

Physiological recordings along the length of the upper bank of the superior temporal sulcus (STS) revealed cells each of which was selectively responsive to a particular view of the head and body. Such cells were grouped in large patches 3-4 mm across. The patches were separated by regions of cortex containing cells responsive to other stimuli. The distribution of cells projecting from temporal cortex to the posterior regions of the inferior parietal lobe was studied with retrogradely transported fluorescent dyes. A strong temporoparietal projection was found originating from the upper bank of the STS. Cells projecting to the parietal cortex occurred in large patches or bands. The size and periodicity of modules defined through anatomical connections matched the functional subdivisions of the STS cortex involved in face processing evident in physiological recordings. It is speculated that the temporoparietal projections could provide a route through which temporal lobe analysis of facial signals about the direction of others' attention can be passed to parietal systems concerned with spatial awareness.


1993 · Vol 10 (1) · pp. 59-72
Author(s): Joan S. Baizer, Robert Desimone, Leslie G. Ungerleider

To investigate the subcortical connections of the object vision and spatial vision cortical processing pathways, we injected the inferior temporal and posterior parietal cortex of six Rhesus monkeys with retrograde or anterograde tracers. The temporal injections included area TE on the lateral surface of the hemisphere and adjacent portions of area TEO. The parietal injections covered the posterior bank of the intraparietal sulcus, including areas VIP and LIP. Our results indicate that several structures project to both the temporal and parietal cortex, including the medial and lateral pulvinar, claustrum, and nucleus basalis. However, the cells in both the pulvinar and claustrum that project to the two systems are mainly located in different parts of those structures, as are the terminals which arise from the temporal and parietal cortex. Likewise, the projections from the temporal and parietal cortex to the caudate nucleus and putamen are largely segregated. Finally, we found projections to the pons and superior colliculus from parietal but not temporal cortex, whereas we found the lateral basal and medial basal nuclei of the amygdala to be reciprocally connected with temporal but not parietal cortex. Thus, the results show that, like the cortical connections of the two visual processing systems, the subcortical connections are remarkably segregated.


2003 · Vol 15 (7) · pp. 1002-1018
Author(s): Jeffrey M. Zacks, Jean M. Vettel, Pascale Michelon

Human spatial reasoning may depend in part on two dissociable types of mental image transformations: object-based transformations, in which an object is imagined to move in space relative to the viewer and the environment, and perspective transformations, in which the viewer imagines the scene from a different vantage point. This study measured local brain activity with event-related fMRI while participants were instructed to either imagine an array of objects rotating (an object-based transformation) or imagine themselves rotating around the array (a perspective transformation). Object-based transformations led to selective increases in right parietal cortex and decreases in left parietal cortex, whereas perspective transformations led to selective increases in left temporal cortex. These results argue against the view that mental image transformations are performed by a unitary neural processing system, and they suggest that different overlapping systems are engaged for different image transformations.


2013 · Vol 34 (3) · pp. 313-323
Author(s): Caili Ji, John J. Galvin, Anting Xu, Qian-Jie Fu

2019 · Vol 62 (9) · pp. 3290-3301
Author(s): Jingjing Guan, Chang Liu

Purpose: Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether second-formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH).

Method: Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI.

Results: The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age or their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners.

Conclusions: Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.
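The abstract does not specify the enhancement algorithm, but one simple way to realize F2 enhancement is to boost energy in a band around a nominal F2 frequency with a peaking IIR filter. The sketch below does this with scipy.signal.iirpeak on a stand-in signal; the center frequency, bandwidth, and gain are illustrative assumptions, and the study's actual procedure may differ.

```python
# Hedged sketch of second-formant (F2) enhancement: isolate a band
# around a nominal F2 frequency with a peaking IIR filter and mix it
# back into the original signal to boost F2 energy.
import numpy as np
from scipy.signal import iirpeak, lfilter

fs = 16000.0
f2_hz = 1700.0                 # assumed F2 center frequency for the token
Q = 5.0                        # quality factor: bandwidth = f2_hz / Q

b, a = iirpeak(f2_hz, Q, fs=fs)

rng = np.random.default_rng(3)
word = rng.standard_normal(int(0.5 * fs))   # stand-in for a target word
f2_band = lfilter(b, a, word)               # isolate the F2 region
enhanced = word + 2.0 * f2_band             # mix back in with extra gain
```

In practice F2 varies over time, so a real implementation would track the formant (e.g., via LPC analysis) and retune the filter frame by frame rather than fixing one center frequency.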

