Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution

Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception because audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time courses of the unisensory responses, and of the interaction between them, to be estimated independently. We applied this procedure to electrodes implanted over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception, in human epilepsy patients (both male and female). iEEG deconvolution revealed sustained, positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

Significance statement: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography (iEEG) deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
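The abstract describes iEEG deconvolution only at the level of its logic. The sketch below illustrates that logic under a simplifying assumption of ours, not necessarily the authors' exact model: the recorded signal is the sum of a visual-onset-locked and an auditory-onset-locked finite impulse response (FIR) kernel, and both kernels are recovered by ordinary least squares. The kernel length, trial spacing, jitter range and simulated kernel shapes are hypothetical, and the audiovisual interaction term is omitted.

```python
import numpy as np

KERNEL_LEN = 30     # samples per FIR kernel (hypothetical)
TRIAL_SPACING = 60  # samples between visual onsets, so trials never overlap (hypothetical)

def build_design(n_samples, onsets_a, onsets_v, kernel_len):
    # One column per post-onset lag for each modality (FIR basis):
    # columns [0, kernel_len) model the auditory response,
    # columns [kernel_len, 2 * kernel_len) model the visual response.
    X = np.zeros((n_samples, 2 * kernel_len))
    for block, onsets in enumerate((onsets_a, onsets_v)):
        for onset in onsets:
            for lag in range(kernel_len):
                t = onset + lag
                if t < n_samples:
                    X[t, block * kernel_len + lag] += 1.0
    return X

def deconvolve(signal, onsets_a, onsets_v, kernel_len=KERNEL_LEN):
    """Least-squares estimates of the auditory and visual kernels."""
    X = build_design(len(signal), onsets_a, onsets_v, kernel_len)
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return coef[:kernel_len], coef[kernel_len:]

def simulate(n_trials=40, max_jitter=20, seed=0):
    """Noise-free simulated recording; every trial contains both modalities."""
    rng = np.random.default_rng(seed)
    lags = np.arange(KERNEL_LEN)
    h_v = np.full(KERNEL_LEN, 0.5)                  # sustained visual response
    h_a = 2.0 * np.exp(-((lags - 8) ** 2) / 10.0)   # larger, phasic auditory response
    onsets_v = np.arange(n_trials) * TRIAL_SPACING
    # Jittered asynchrony: the voice starts 0..max_jitter samples after the face.
    onsets_a = onsets_v + rng.integers(0, max_jitter + 1, size=n_trials)
    X = build_design(n_trials * TRIAL_SPACING, onsets_a, onsets_v, KERNEL_LEN)
    signal = X @ np.concatenate([h_a, h_v])
    return signal, onsets_a, onsets_v, h_a, h_v

signal, onsets_a, onsets_v, h_a, h_v = simulate()
est_a, est_v = deconvolve(signal, onsets_a, onsets_v)
```

The jitter is what makes the two responses separable: with a constant asynchrony, each auditory column duplicates a shifted visual column, so the design matrix is rank-deficient and no amount of data can apportion the signal between the two kernels.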

Converging Evidence from Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing

DOI: 10.1101/272823 (2018)

Author(s): Muge Ozker, Michael S. Beauchamp

Keyword(s): Speech Perception, Speech Processing, Recognition Task, Superior Temporal Gyrus, Visual Speech, Audiovisual Speech, Auditory Modality, BOLD fMRI, Auditory Speech, Functional Boundary

Although humans can understand speech using the auditory modality alone, in noisy environments visual speech information from the talker’s mouth can rescue otherwise unintelligible auditory speech. To investigate the neural substrates of multisensory speech perception, we recorded neural activity from the human superior temporal gyrus (STG) using two very different techniques: either directly, using surface electrodes implanted in five participants with epilepsy (electrocorticography, ECOG), or indirectly, using blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) in six healthy control participants. Both ECOG and fMRI participants viewed the same clear and noisy audiovisual speech stimuli and performed the same speech recognition task. Both techniques demonstrated a sharp functional boundary in the STG, which corresponded to an anatomical boundary defined by the posterior edge of Heschl’s gyrus. On the anterior side of the boundary, cortex responded more strongly to clear audiovisual speech than to noisy audiovisual speech, suggesting that anterior STG is primarily involved in processing unisensory auditory speech. On the posterior side of the boundary, cortex either preferred noisy audiovisual speech or showed no preference, and responded robustly to both auditory-only and visual-only speech, suggesting that posterior STG is specialized for processing multisensory audiovisual speech. For both ECOG and fMRI, the transition between the functionally distinct regions occurred within 10 mm of anterior-to-posterior distance along the STG. We relate this boundary to the multisensory neural code underlying speech perception and propose that it represents an important functional division within the human speech perception network.

A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography

Journal of Cognitive Neuroscience, Vol 29 (6), pp. 1044-1060 (2017)

DOI: 10.1162/jocn_a_01110

Cited By ~ 16

Author(s): Muge Ozker, Inga M. Schepers, John F. Magnotti, Daniel Yoshor, Michael S. Beauchamp

Keyword(s): Speech Perception, Multisensory Integration, Neural Activity, Superior Temporal Gyrus, Visual Speech, Audiovisual Speech, Auditory Information, Neural Responses, Double Dissociation, Auditory Component

Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded, as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, the direct recording of neural activity with electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory components within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with a clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the auditory speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with a noisy auditory component. Taken together, these results suggest that posterior STG, but not anterior STG, is important for multisensory integration of noisy auditory and visual speech.
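The Bayesian prediction above (combining independent sources of information reduces variability) is commonly formalized as maximum-likelihood, reliability-weighted cue combination. The sketch below shows that textbook ideal-observer model, assuming each modality yields an independent Gaussian estimate; it is not necessarily the specific model used in the study, and the variances in the example are hypothetical (a noisy auditory cue with variance 4, a visual cue with variance 1).

```python
import math
import random
import statistics

def combine_gaussian_cues(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted (maximum-likelihood) fusion of two independent Gaussian cues."""
    rel_a, rel_v = 1.0 / var_a, 1.0 / var_v  # reliability = inverse variance
    w_a = rel_a / (rel_a + rel_v)
    w_v = rel_v / (rel_a + rel_v)
    mu = w_a * mu_a + w_v * mu_v
    var = 1.0 / (rel_a + rel_v)              # equals var_a * var_v / (var_a + var_v)
    return mu, var

def simulated_variances(true_value, var_a, var_v, n=20000, seed=1):
    """Monte Carlo check: variance of auditory-only, visual-only and fused single-trial estimates."""
    rng = random.Random(seed)
    a = [rng.gauss(true_value, math.sqrt(var_a)) for _ in range(n)]
    v = [rng.gauss(true_value, math.sqrt(var_v)) for _ in range(n)]
    fused = [combine_gaussian_cues(x, var_a, y, var_v)[0] for x, y in zip(a, v)]
    return statistics.variance(a), statistics.variance(v), statistics.variance(fused)

mu, var = combine_gaussian_cues(0.0, 4.0, 1.0, 1.0)  # fused variance 0.8 < min(4.0, 1.0)
```

The fused variance is always smaller than the smaller of the two unisensory variances, which is the sense in which an integrating region is predicted to show less trial-to-trial variability.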

Children perceive speech onsets by ear and eye

Journal of Child Language, Vol 44 (1), pp. 185-215 (2016)

DOI: 10.1017/s030500091500077x

Cited By ~ 8

Author(s): Susan Jerger, Markus F. Damian, Nancy Tye-Murray, Hervé Abdi

Keyword(s): Speech Perception, Growth Period, Visual Speech, Cognitive Resources, Audiovisual Speech, Auditory Input, Phonological Priming, Complex Tasks, Auditory Speech, Developmental Theories

Adults use vision to perceive low-fidelity speech, yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not a lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see a dynamic face articulate consonant/rhyme b/ag; hear non-intact onset/rhyme: –b/ag) vs. auditorily (see a still face; hear exactly the same auditory input). Audiovisual speech produced greater priming from four to fourteen years of age, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children, like adults, perceive speech onsets multimodally. These findings are critical for incorporating visual speech into developmental theories of speech perception.

Reading Fluent Speech from Talking Faces: Typical Brain Networks and Individual Differences

Journal of Cognitive Neuroscience, Vol 17 (6), pp. 939-953 (2005)

DOI: 10.1162/0898929054021175

Cited By ~ 71

Author(s): Deborah A. Hall, Clayton Fussell, A. Quentin Summerfield

Keyword(s): Individual Differences, Speech Processing, Superior Temporal Gyrus, Brain Regions, Receptive Language, Visual Input, Visual Speech, Middle Temporal, Blank Screen, Auditory Speech

Listeners are able to extract important linguistic information by viewing the talker's face, a process known as “speechreading.” Previous studies of speechreading presented small closed sets of simple words, and their results indicate that visual speech processing engages a wide network of brain regions in the temporal, frontal, and parietal lobes that are likely to underlie multiple stages of the receptive language system. The present study further explored this network in a large group of subjects by presenting naturally spoken sentences, which tap the richer complexities of visual speech processing. Four different baselines (blank screen, static face, nonlinguistic facial gurning, and auditory speech) enabled us to determine the hierarchy of neural processing involved in speechreading and to test the claim that visual input reliably accesses sound-based representations in the auditory cortex. In contrast to passively viewing a blank screen, the static-face condition evoked activation bilaterally across the border of the fusiform gyrus and cerebellum, and in the medial superior frontal gyrus and left precentral gyrus (p < .05, whole brain corrected). With the static face as baseline, the gurning face evoked bilateral activation in the motion-sensitive region of the occipital cortex, whereas visual speech additionally engaged the middle temporal gyrus, inferior and middle frontal gyri, and the inferior parietal lobe, particularly in the left hemisphere. These latter regions are implicated in lexical stages of spoken language processing. Although auditory speech generated extensive bilateral activation across both superior and middle temporal gyri, the group-averaged pattern of speechreading activation failed to include any auditory regions along the superior temporal gyrus, suggesting that fluent visual speech does not always involve sound-based coding of the visual input. An important finding from the individual subject analyses was that activation in the superior temporal gyrus did reach significance (p < .001, small-volume corrected) for a subset of the group. Moreover, the extent of the left-sided superior temporal gyrus activity was strongly correlated with speechreading performance. Skilled speechreading was also associated with activations and deactivations in other brain regions, suggesting that individual differences reflect the efficiency of a circuit linking sensory, perceptual, memory, cognitive, and linguistic processes rather than the operation of a single component process.

Speech Perception as a Multimodal Phenomenon

Current Directions in Psychological Science, Vol 17 (6), pp. 405-409 (2008)

DOI: 10.1111/j.1467-8721.2008.00615.x

Cited By ~ 79

Author(s): Lawrence D. Rosenblum

Keyword(s): Speech Perception, Semantic Context, Visual Speech, Audiovisual Speech, Lexical Status, Auditory Speech, Lip Reading, Imaging Research, Speech Information, The Brain

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal speech information could explain the reported automaticity, immediacy, and completeness of audiovisual speech integration. However, recent findings suggest that speech integration can be influenced by higher cognitive properties such as lexical status and semantic context. Proponents of amodal accounts will need to explain these results.

The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech

eLife, Vol 8 (2019)

DOI: 10.7554/elife.48116

Cited By ~ 8

Author(s): Patrick J Karas, John F Magnotti, Brian A Metzger, Lin L Zhu, Kristen B Smith, ...

Keyword(s): Head Start, Visual Information, Temporal Cortex, Superior Temporal Gyrus, Visual Speech, Association Cortex, Auditory Information, Neural Responses, Auditory Speech, Auditory Association Cortex

Visual information about speech content from the talker’s mouth is often available before auditory information from the talker’s voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker’s face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of patients with epilepsy. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.

Schizotypal traits are not related to multisensory integration or audiovisual speech perception

DOI: 10.31234/osf.io/9vqtf (2020)

Author(s): Anne-Marie Muller, Tyler C. Dalal, Ryan A Stevenson

Keyword(s): Speech Perception, Multisensory Integration, Temporal Processing, Visual Speech, Audiovisual Speech, Audiovisual Speech Perception, Speech In Noise, Auditory Speech, Synchrony Judgment, Schizophrenia Spectrum

Multisensory integration, the process by which information from different sensory modalities is bound together, is hypothesized to contribute to perceptual symptomatology in schizophrenia, including hallucinations and aberrant speech perception. Differences in multisensory integration and temporal processing, an important component of multisensory integration, have been consistently found among individuals with schizophrenia. Evidence is emerging that these differences extend across the schizophrenia spectrum, including individuals in the general population with higher levels of schizotypal traits. In the current study, we measured (1) multisensory integration using an audiovisual speech-in-noise task and the McGurk task. Using the speech-in-noise task, we also assessed (2) susceptibility to distracting auditory speech, to test the hypothesis that increased perception of distracting speech that is subsequently bound with mismatching visual speech contributes to hallucination-like experiences. As a measure of (3) temporal processing, we used the ternary synchrony judgment task. We measured schizotypal traits using the Schizotypal Personality Questionnaire (SPQ), hypothesizing that higher levels of schizotypal traits, specifically on the Unusual Perceptual Experiences and Odd Speech subscales, would be associated with (1) decreased multisensory integration, (2) increased susceptibility to distracting auditory speech, and (3) less precise temporal processing. Surprisingly, neither subscale was associated with any of the measures. These results suggest that these perceptual differences may not be present across the schizophrenia spectrum.

Increased connectivity among sensory and motor regions during visual and audiovisual speech perception

DOI: 10.1101/2020.12.15.422726 (2020)

Author(s): Jonathan E Peelle, Brent Spehar, Michael S Jones, Sarah McConkey, Joel Myerson, ...

Keyword(s): Visual Cortex, Speech Perception, Brain Activity, Premotor Cortex, Temporal Cortex, Primary Auditory Cortex, Visual Speech, Auditory Signal, Audiovisual Speech, Audiovisual Speech Perception

In everyday conversation, we usually process the talker's face as well as the sound of their voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here we used fMRI to monitor brain activity while adults (n = 60) were presented with visual-only, auditory-only, and audiovisual words. As expected, audiovisual speech perception recruited both auditory and visual cortex, with a trend towards increased recruitment of premotor cortex in more difficult conditions (for example, in substantial background noise). We then investigated neural connectivity using psychophysiological interaction (PPI) analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. Taken together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech.
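Psychophysiological interaction (PPI) analysis asks whether the coupling between a seed region and another region changes with task condition. A minimal sketch of the design matrix, assuming a simplified PPI in which the interaction term is the mean-centered seed time course multiplied by a task regressor coded +1 for audiovisual and -1 for unimodal volumes (a hypothetical coding). Standard implementations typically deconvolve the seed BOLD signal to estimate neural activity before forming the product; that step is omitted here, and the function name is illustrative.

```python
from statistics import mean

def ppi_design(seed, task):
    """Rows of [intercept, seed, task, seed x task] for a simplified PPI regression.

    seed: seed-region time course (e.g. primary auditory cortex), one value per volume.
    task: psychological regressor, e.g. +1 audiovisual, -1 unimodal (hypothetical coding).
    """
    if len(seed) != len(task):
        raise ValueError("seed and task must have the same length")
    seed_mean = mean(seed)
    centered = [s - seed_mean for s in seed]
    # The last column is the PPI term; its fitted coefficient estimates how much
    # the seed-to-target slope differs between the two task conditions.
    return [[1.0, s, p, s * p] for s, p in zip(centered, task)]

rows = ppi_design([1, 2, 3, 4], [1, 1, -1, -1])
```

Regressing a target region's time course on these columns yields a coefficient for the last column; a positive value would indicate stronger seed-target coupling in the condition coded +1.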

A speech envelope landmark for syllable encoding in human superior temporal gyrus

DOI: 10.1101/388280 (2018)

Cited By ~ 6

Author(s): Yulia Oganian, Edward F. Chang

Keyword(s): Speech Processing, Acoustic Analysis, Brain Area, Superior Temporal Gyrus, Rate Of Change, Local Maxima, Speech Stimuli, Neural Computations, Absolute Amplitude, Speech Envelope

Listeners use the slow amplitude modulations of speech, known as the envelope, to segment continuous speech into syllables. However, the underlying neural computations are heavily debated. We used high-density intracranial cortical recordings while participants listened to natural and synthesized control speech stimuli to determine how the envelope is represented in the human superior temporal gyrus (STG), a critical auditory brain area for speech processing. We found that the STG does not encode the instantaneous, moment-by-moment amplitude envelope of speech. Rather, a zone of the middle STG detects discrete acoustic onset edges, defined by local maxima in the rate of change of the envelope. Acoustic analysis demonstrated that acoustic onset edges reliably cue the information-rich transition between the consonant onset and the vowel nucleus of syllables. Furthermore, the steepness of the acoustic edge cued whether a syllable was stressed. Synthesized amplitude-modulated tone stimuli showed that steeper edges elicited monotonically greater cortical responses, confirming the encoding of relative but not absolute amplitude. Overall, encoding of the timing and magnitude of acoustic onset edges in STG underlies our perception of the syllabic rhythm of speech.
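The abstract defines acoustic onset edges as local maxima in the rate of change of the envelope. A minimal sketch of that definition, assuming the envelope has already been extracted and sampled at `fs` Hz; the envelope extraction method, any smoothing, and any peak threshold used in the study are not specified here, and the function names are illustrative.

```python
def envelope_rate(envelope, fs):
    """First difference of a sampled envelope, scaled to units per second."""
    return [(envelope[i + 1] - envelope[i]) * fs for i in range(len(envelope) - 1)]

def onset_edges(envelope, fs):
    """Return (index, peak_rate) for each local maximum of a positive envelope rate.

    Index i refers to the step from sample i to sample i + 1. A plateaued peak
    is reported once, at its first sample (strict > on the left, >= on the right).
    """
    rate = envelope_rate(envelope, fs)
    edges = []
    for i in range(1, len(rate) - 1):
        if rate[i] > 0 and rate[i] > rate[i - 1] and rate[i] >= rate[i + 1]:
            edges.append((i, rate[i]))
    return edges

# Toy envelope with two rising edges; the second rise is steeper.
envelope = [0, 0, 1, 3, 4, 4, 4, 2, 0, 0, 3, 5, 5]
edges = onset_edges(envelope, fs=100)
```

Each edge carries both a timing (approximately `(i + 0.5) / fs` seconds) and a magnitude (the peak rate), the two quantities the abstract reports as encoded in STG; the steeper second rise yields the larger peak rate.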

Lip movements entrain the observers’ low-frequency brain oscillations to facilitate speech intelligibility

eLife, Vol 5 (2016)

DOI: 10.7554/elife.14521

Cited By ~ 65

Author(s): Hyojin Park, Christoph Kayser, Gregor Thut, Joachim Gross

Keyword(s): Visual Cortex, Speech Processing, Speech Intelligibility, Brain Activity, Low Frequency, Visual Speech, Visual Signals, Partial Coherence, Auditory Speech, Oscillatory Brain Activity

During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and speaker’s lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending visual speech enhances the coherence between activity in visual cortex and the speaker’s lips. Further, we identified a significant partial coherence between left motor cortex and lip movements and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.
