Listeners modulate temporally selective attention during natural speech processing

2009 ◽  
Vol 80 (1) ◽  
pp. 23-34 ◽  
Author(s):  
Lori B. Astheimer ◽  
Lisa D. Sanders


2019 ◽  
Author(s):  
Shyanthony R. Synigal ◽  
Emily S. Teoh ◽  
Edmund C. Lalor

Abstract The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects, as expected, high gamma power also showed robust speech tracking in a minority of subjects. The same pattern held for attention decoding in a separate group of subjects who undertook a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power in EEG can carry useful information regarding speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
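The linear-regression decoding described above is commonly implemented as a backward (stimulus-reconstruction) model: the speech envelope at each moment is predicted from a window of time-lagged, multichannel EEG. The sketch below is a minimal ridge-regression version of that idea; the lag window, the regularization strength, and all variable names are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def reconstruct_envelope(eeg, envelope, lags=8, alpha=1.0):
    """Backward (decoding) model: ridge-regress the speech envelope
    from time-lagged multichannel EEG. Illustrative sketch only.

    eeg      : (n_samples, n_channels) band-limited EEG
    envelope : (n_samples,) speech amplitude envelope
    lags     : number of post-stimulus EEG samples used per prediction
    alpha    : ridge penalty
    """
    n, c = eeg.shape
    # Design matrix: row t concatenates the EEG at times t .. t+lags-1,
    # so the stimulus at time t is decoded from the EEG that follows it.
    X = np.zeros((n - lags, c * lags))
    for l in range(lags):
        X[:, l * c:(l + 1) * c] = eeg[l:n - lags + l]
    y = envelope[:n - lags]
    # Closed-form ridge solution: w = (X'X + alpha*I)^{-1} X'y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(c * lags), X.T @ y)
    # Reconstruction accuracy as the correlation between the true
    # and the decoded envelope
    r = np.corrcoef(y, X @ w)[0, 1]
    return w, r
```

In practice the same decoder can be trained on the attended talker's envelope in a cocktail party setting; whichever talker's envelope is reconstructed more accurately is taken as the attended one.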


2018 ◽  
Vol 366 ◽  
pp. 50-64 ◽  
Author(s):  
Lori L. Holt ◽  
Adam T. Tierney ◽  
Giada Guerra ◽  
Aeron Laffere ◽  
Frederic Dick

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Raphaël Thézé ◽  
Mehdi Ali Gadiri ◽  
Louis Albert ◽  
Antoine Provost ◽  
Anne-Lise Giraud ◽  
...  

Abstract Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has widely been applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e. /v/) with a bilabial occlusive phoneme (i.e. /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.


2021 ◽  
Author(s):  
Galit Agmon ◽  
Paz Har-Shai Yahav ◽  
Michal Ben-Shachar ◽  
Elana Zion Golumbic

Abstract Daily life is full of situations where many people converse at the same time. Under these noisy circumstances, individuals can employ different listening strategies to deal with the abundance of sounds around them. In this fMRI study we investigated how applying two different listening strategies – Selective vs. Distributed attention – affects the pattern of neural activity. Specifically, in a simulated ‘cocktail party’ paradigm, we compared brain activation patterns when listeners attend selectively to only one speaker and ignore all others, versus when they distribute their attention and attempt to follow two or four speakers at the same time. Results indicate that the two attention types activate a highly overlapping, bilateral fronto-temporal-parietal network of functionally connected regions. This network includes auditory association cortex (bilateral STG/STS) and higher-level regions related to speech processing and attention (bilateral IFG/insula, right MFG, left IPS). Within this network, responses in specific areas were modulated by the type of attention required. Specifically, auditory and speech-processing regions exhibited higher activity during Distributed attention, whereas fronto-parietal regions were activated more strongly during Selective attention. This pattern suggests that a common perceptual-attentional network is engaged when dealing with competing speech inputs, regardless of the specific task at hand. At the same time, local activity within nodes of this network varies when implementing different listening strategies, reflecting the different cognitive demands they impose. These results nicely demonstrate the system’s flexibility to adapt its internal computations to accommodate different task requirements and listener goals.

Significance Statement Hearing many people talk simultaneously poses substantial challenges for the human perceptual and cognitive systems. We compared neural activity when listeners applied two different listening strategies to deal with these competing inputs: attending selectively to one speaker vs. distributing attention among all speakers. A network of functionally connected brain regions, involved in auditory processing, language processing and attentional control, was activated when applying both attention types. However, activity within this network was modulated by the type of attention required and the number of competing speakers. These results suggest a common ‘attention to speech’ network, providing the computational infrastructure to deal effectively with multi-speaker input, but with sufficient flexibility to implement different prioritization strategies and to adapt to different listener goals.


Cortex ◽  
2020 ◽  
Vol 130 ◽  
pp. 387-400 ◽  
Author(s):  
Brigitta Tóth ◽  
Ferenc Honbolygó ◽  
Orsolya Szalárdy ◽  
Gábor Orosz ◽  
Dávid Farkas ◽  
...  

2021 ◽  
Vol 6 ◽  
Author(s):  
Nikole Giovannone ◽  
Rachel M. Theodore

Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced, as in natural speech. The results showed that (1) the Ganong effect was larger in the low conflict group compared to the high conflict group in single-word contexts, suggesting that cue conflict dynamically influences online speech perception; (2) the Ganong effect was larger for those with weaker compared to stronger receptive language; and (3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input.
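The Ganong effect described above is the lexical shift in phonetic categorization: listeners report more of a given phoneme when that percept forms a real word. A common way to quantify it is the difference in identification responses between two continua with opposite lexical biases. The toy sketch below computes such an effect size; the item names (gift/kift, giss/kiss) and the mean-difference definition are illustrative assumptions, not the authors' actual stimuli or analysis.

```python
import numpy as np

def ganong_effect(p_g_word_g, p_g_word_k):
    """Mean lexical shift in /g/ identification across a /g/-/k/ continuum.

    p_g_word_g : proportion of /g/ responses per continuum step when the
                 /g/ endpoint forms a word (e.g. "gift"-"kift")
    p_g_word_k : proportion of /g/ responses per continuum step when the
                 /k/ endpoint forms a word (e.g. "giss"-"kiss")
    A positive value means responses shifted toward the lexically
    consistent percept, i.e. greater reliance on lexical knowledge.
    """
    p1 = np.asarray(p_g_word_g, dtype=float)
    p2 = np.asarray(p_g_word_k, dtype=float)
    # Average the per-step difference in /g/ responses across the continuum
    return float(np.mean(p1 - p2))
```

A larger value of this statistic for one group than another (e.g. listeners with weaker vs. stronger receptive language) would correspond to the group difference in lexical reliance reported above.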

