Cortical Tracking of Surprisal during Continuous Speech Comprehension

2020 ◽ Vol 32 (1) ◽ pp. 155-166
Author(s): Hugo Weissbart, Katerina D. Kandylaki, Tobias Reichenbach

Speech comprehension requires rapid online processing of a continuous acoustic signal to extract structure and meaning. Previous studies on sentence comprehension have found neural correlates of the predictability of a word given its context, as well as of the precision of such a prediction. However, they have focused on single sentences and on particular words in those sentences, and they compared neural responses to words of low versus high predictability, as well as of low versus high precision. In continuous speech comprehension, by contrast, a listener hears many successive words whose predictability and precision vary over a large range. Here, we show that cortical activity in different frequency bands tracks word surprisal in continuous natural speech and that this tracking is modulated by precision. We obtain these results by quantifying surprisal and precision from naturalistic speech using a deep neural network and by relating these speech features to EEG responses of human volunteers acquired during auditory story comprehension. We find significant cortical tracking of surprisal at low frequencies, including the delta band, as well as in the higher-frequency beta and gamma bands, and observe that the tracking is modulated by precision. Our results pave the way for further investigation of the neurobiology of natural speech comprehension.
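The two word-level predictors used in this study can be made concrete with a small sketch. The snippet below assumes some language model that returns a next-word probability distribution; the specific model, the variable names and the convention of quantifying precision as negative entropy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of word-level surprisal and precision, assuming a language model
# that returns a probability distribution over the vocabulary for the next word.
import numpy as np

def surprisal_and_precision(next_word_probs: np.ndarray, word_index: int):
    """Surprisal = -log p(word | context); precision is taken here as the negative
    entropy of the predictive distribution (one common, assumed convention)."""
    p = next_word_probs / next_word_probs.sum()      # normalise for safety
    surprisal = -np.log(p[word_index])               # in nats
    entropy = -np.sum(p * np.log(p + 1e-12))         # uncertainty of the prediction
    precision = -entropy                             # sharper prediction -> higher precision
    return surprisal, precision

# Toy usage: a 5-word vocabulary where the heard word had probability 0.1
probs = np.array([0.6, 0.2, 0.1, 0.05, 0.05])
print(surprisal_and_precision(probs, word_index=2))
```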

2019
Author(s): Jiawei Li, Bo Hong, Guido Nolte, Andreas K. Engel, Dan Zhang

While human speech comprehension is thought to be an active process that involves top-down predictions, it remains unclear how predictive information is used to prepare for the processing of upcoming speech. We aimed to identify the neural signatures of this preparatory processing. Participants selectively attended to one of two competing naturalistic, narrative speech streams, and a temporal response function (TRF) method was applied to derive event-related-like neural responses from electroencephalographic data. The phase responses to the attended speech in the delta band (1–4 Hz) correlated with the comprehension performance of individual participants at latencies of −200 to 0 ms relative to speech onset over fronto-central and left-lateralized parietal regions. The phase responses to the attended speech in the alpha band also correlated with comprehension performance, but at latencies of 650–980 ms post-onset over fronto-central regions. Distinct neural signatures were found for attentional modulation, taking the form of TRF-based amplitude responses at latencies of 240–320 ms post-onset over left-lateralized fronto-central and occipital regions. Our findings reveal how the brain prepares to process upcoming speech in a continuous, naturalistic listening context.
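As a rough illustration of the temporal response function (TRF) approach mentioned above, the sketch below fits a single-channel TRF by ridge regression of the EEG onto time-lagged copies of a stimulus feature. The lag range, regularisation value and variable names are assumptions for illustration, not the authors' pipeline.

```python
# Hedged sketch of a TRF fit: regress one EEG channel onto time-lagged copies of a
# stimulus feature using ridge regression.
import numpy as np

def fit_trf(stimulus, eeg, fs, tmin=-0.2, tmax=1.0, lam=1.0):
    """stimulus, eeg: 1-D arrays sampled at fs (Hz); returns lags (s) and TRF weights."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Build the lagged design matrix: column i holds the stimulus shifted by lags[i]
    X = np.zeros((len(stimulus), len(lags)))
    for i, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, i] = stimulus[: len(stimulus) - lag]
        else:
            X[:lag, i] = stimulus[-lag:]
    # Ridge solution: w = (X'X + lam * I)^(-1) X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w
```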


2020
Author(s): Cheng Luo, Nai Ding

Speech contains rich acoustic and linguistic information. During speech comprehension, cortical activity tracks the acoustic envelope of speech. Recent studies have also observed cortical tracking of higher-level linguistic units, such as words and phrases, using synthesized speech deprived of the delta-band acoustic envelope. It remains unclear, however, how cortical activity jointly encodes the acoustic and linguistic information in natural speech. Here, we investigate the neural encoding of words and demonstrate that delta-band cortical activity tracks the rhythm of multi-syllabic words during natural listening to narratives. Furthermore, by dissociating the word rhythm from the acoustic envelope, we find that cortical activity primarily tracks the word rhythm during speech comprehension. When listeners' attention is diverted, however, neural tracking of words diminishes, and delta-band activity becomes phase-locked to the acoustic envelope. These results suggest that large-scale cortical dynamics in the delta band are primarily coupled to the rhythm of linguistic units during natural speech comprehension.
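A minimal sketch of the kind of delta-band phase analysis described here is given below: it band-passes multi-trial EEG into the delta range and quantifies phase consistency relative to an idealised word-rate oscillation. The filter settings, the phase-locking measure and the assumed word rate are illustrative choices, not the authors' exact method.

```python
# Sketch of delta-band phase locking to a word-rate rhythm (illustrative assumptions).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def delta_phase_locking(eeg, fs, word_rate_hz):
    """eeg: trials x samples array; returns a phase-locking value at the word rate."""
    b, a = butter(4, [1, 4], btype="bandpass", fs=fs)     # delta band (1-4 Hz)
    delta = filtfilt(b, a, eeg, axis=-1)
    phase = np.angle(hilbert(delta, axis=-1))
    # Phase of each trial relative to an ideal oscillation at the word rate
    t = np.arange(eeg.shape[-1]) / fs
    ref_phase = 2 * np.pi * word_rate_hz * t
    rel = phase - ref_phase
    # Inter-trial phase coherence: length of the mean unit vector across trials
    return np.abs(np.mean(np.exp(1j * rel), axis=0)).mean()
```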


2018 ◽ Vol 4 (1)
Author(s): Jona Sassenhagen, Ryan Blything, Elena V. M. Lieven, Ben Ambridge

How are verb-argument structure preferences acquired? Children typically receive very little negative evidence, raising the question of how they come to understand the restrictions on grammatical constructions. Statistical learning theories propose that stochastic patterns in the input contain sufficient clues. For example, if a verb is very common but never observed in transitive constructions, this would indicate that transitive usage of that verb is illegal. Ambridge et al. (2008) have shown that in offline grammaticality judgements of intransitive verbs used in transitive constructions, low-frequency verbs elicit higher acceptability ratings than high-frequency verbs, as predicted if relative frequency is a cue during statistical learning. Here, we investigate whether the same pattern also emerges in online processing of English sentences. EEG was recorded while healthy adults listened to sentences featuring transitive uses of semantically matched verb pairs of differing frequencies. We replicate the finding of higher acceptability ratings for transitive uses of low- versus high-frequency intransitive verbs. Event-related potentials indicate a similar result: early electrophysiological signals distinguish between misuse of high- versus low-frequency verbs. This indicates that online processing shows a sensitivity to frequency similar to that of offline judgements, consistent with a parser that reflects the statistical cues through which grammatical constructions were originally acquired. However, the observed neural responses were not of the expected, or an easily interpretable, form, motivating further work on the neural correlates of online processing of syntactic constructions.
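The statistical-learning argument about frequency can be illustrated with a toy calculation: if a verb freely allowed transitive use at some rate, the probability of never hearing it in a transitive frame shrinks rapidly with the number of encounters, so the absence of such evidence is far more informative for high-frequency verbs. The assumed rate below is purely hypothetical.

```python
# Toy illustration of the entrenchment logic: repeated absence of a transitive frame
# becomes strong implicit negative evidence for frequent verbs. Rate is an assumption.
p_transitive_if_allowed = 0.3            # assumed rate if the verb freely alternated
for n_observations in (10, 100, 1000):
    p_never_seen = (1 - p_transitive_if_allowed) ** n_observations
    print(n_observations, p_never_seen)  # shrinks rapidly as verb frequency grows
```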


2017 ◽ Vol 29 (7) ◽ pp. 1119-1131
Author(s): Katerina D. Kandylaki, Karen Henrich, Arne Nagels, Tilo Kircher, Ulrike Domahs, ...

While listening to continuous speech, humans process beat information to correctly identify word boundaries. The beats of language are stress patterns that are created by combining lexical (word-specific) stress patterns with the rhythm of a specific language. Sometimes, the lexical stress pattern needs to be altered to obey the rhythm of the language. This study investigated the interplay of lexical stress patterns and rhythmical well-formedness in natural speech using fMRI. Previous electrophysiological studies on cases in which a regular lexical stress pattern may be altered to obtain rhythmical well-formedness showed that even subtle rhythmic deviations are detected by the brain if attention is directed toward prosody. Here, we present a new approach to this phenomenon by having participants listen to contextually rich stories in the absence of a task targeting the manipulation. For the interaction of lexical stress and rhythmical well-formedness, we found one suprathreshold cluster localized between the cerebellum and the brain stem. For the main effect of lexical stress, we found higher BOLD responses to the retained lexical stress pattern in the bilateral SMA, bilateral postcentral gyrus, bilateral middle frontal gyrus, bilateral inferior and right superior parietal lobule, and right precuneus. These results support the view that lexical stress is processed as part of a sensorimotor network of speech comprehension. Moreover, our results connect beat processing in language to domain-independent timing perception.


2019
Author(s): Shyanthony R. Synigal, Emily S. Teoh, Edmund C. Lalor

The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in a minority of subjects. The same pattern was true for attention decoding using a separate group of subjects who undertook a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
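The envelope-decoding analysis described above can be sketched as a ridge regression that reconstructs the speech envelope from multichannel EEG features and scores the reconstruction by correlation. The feature extraction, train/test split and regularisation value below are assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch of stimulus reconstruction (backward/decoding model) by ridge regression.
import numpy as np

def decode_envelope(features, envelope, lam=1e2, train_frac=0.8):
    """features: samples x channels (e.g. low-frequency EEG or high-gamma power);
    envelope: samples; returns Pearson r between reconstructed and true envelope."""
    n_train = int(train_frac * len(envelope))
    Xtr, Xte = features[:n_train], features[n_train:]
    ytr, yte = envelope[:n_train], envelope[n_train:]
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    pred = Xte @ w
    return np.corrcoef(pred, yte)[0, 1]

# Combining frequency bands amounts to concatenating their feature columns, e.g.:
# r_combined = decode_envelope(np.hstack([low_freq_eeg, high_gamma_power]), envelope)
```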


Neurology ◽ 2017 ◽ Vol 88 (10) ◽ pp. 970-975
Author(s): Sara B. Pillay, Jeffrey R. Binder, Colin Humphries, William L. Gross, Diane S. Book

Objective: Voxel-based lesion-symptom mapping (VLSM) was used to localize impairments specific to multiword (phrase and sentence) spoken language comprehension. Methods: Participants were 51 right-handed patients with chronic left hemisphere stroke. They performed an auditory description naming (ADN) task requiring comprehension of a verbal description, an auditory sentence comprehension (ASC) task, and a picture naming (PN) task. Lesions were mapped using high-resolution MRI. VLSM analyses identified the lesion correlates of ADN and ASC impairment, first with no control measures, then adding PN impairment as a covariate to control for cognitive and language processes not specific to spoken language. Results: ADN and ASC deficits were associated with lesions in a distributed frontal-temporal-parietal language network. When PN impairment was included as a covariate, both ADN and ASC deficits were specifically correlated with damage localized to the mid-to-posterior portion of the middle temporal gyrus (MTG). Conclusions: Damage to the mid-to-posterior MTG is associated with an inability to integrate multiword utterances during comprehension of spoken language. Impairment of this integration process likely underlies the speech comprehension deficits characteristic of Wernicke aphasia.
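The VLSM logic, including the use of a covariate such as picture-naming impairment, can be sketched as a voxel-wise regression of the behavioural score on lesion status. The code below is an illustrative simplification; real VLSM analyses add appropriate test statistics and multiple-comparison correction.

```python
# Rough sketch of voxel-based lesion-symptom mapping with an optional covariate.
import numpy as np

def vlsm(lesion_maps, scores, covariate=None):
    """lesion_maps: patients x voxels (0/1); scores: per-patient deficit measure.
    Returns a voxel-wise coefficient for lesion status, adjusted for the covariate."""
    n = len(scores)
    betas = np.zeros(lesion_maps.shape[1])
    for v in range(lesion_maps.shape[1]):
        X = [np.ones(n), lesion_maps[:, v]]
        if covariate is not None:
            X.append(covariate)                    # e.g. picture-naming impairment
        X = np.column_stack(X)
        coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
        betas[v] = coef[1]                         # effect of damage at this voxel
    return betas
```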


2020 ◽ Vol 14
Author(s): Fengxiang Song, Yi Zhan, James C. Ford, Dan-Chao Cai, Abigail M. Fellows, ...

Purpose: Previous studies have revealed increased frontal brain activation during speech comprehension in background noise. Few, however, used tonal languages. The normal pattern of brain activation during a challenging speech-in-noise task using a tonal language remains unclear. The Mandarin Hearing-in-Noise Test (HINT) is a well-established test for assessing the ability to interpret speech in background noise. The current study used Mandarin HINT (MHINT) sentences and functional magnetic resonance imaging (fMRI) to assess brain activation during this task. Methods: Thirty native Mandarin-speaking subjects with normal peripheral hearing were recruited. Functional MRI was performed while subjects were presented with either HINT “clear” sentences with low-level background noise [signal-to-noise ratio (SNR) = +3 dB] or “noisy” sentences with high-level background noise (SNR = −5 dB). Subjects were instructed to indicate with a button press whether a visually presented target word was included in the sentence. Brain activation between noisy and clear sentences was compared. Activation in each condition was also compared to a resting (no sentence presentation) condition. Results: Noisy sentence comprehension showed increased activity in areas associated with tone processing and working memory, including the right superior and middle frontal gyri [Brodmann areas (BAs) 46, 10]. Reduced activity with noisy sentences was seen in auditory, language, memory and somatosensory areas, including the bilateral superior and middle temporal gyri and left Heschl’s gyrus (BAs 21, 22), right temporal pole (BA 38), bilateral amygdala-hippocampus junction and parahippocampal gyrus (BAs 28, 35), left inferior parietal lobule extending to the left postcentral gyrus (BAs 2, 40), and left putamen. Conclusion: Increased frontal activation in the right hemisphere occurred when comprehending noisy spoken sentences in Mandarin. Compared to studies using non-tonal languages, this activation was strongly right-sided and involved subregions not previously reported. These findings may reflect additional effort in lexical tone perception in this tonal language. Additionally, this continuous fMRI protocol may offer a time-efficient way to assess group differences in brain activation with a challenging speech-in-noise task.


2020
Author(s): Yingcan Carol Wang, Ediz Sohoglu, Rebecca A. Gilbert, Richard N. Henson, Matthew H. Davis

Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). Manipulations that increase lexical uncertainty should therefore increase neural responses associated with word recognition when words cannot yet be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will then increase prediction error, and hence neural activity, only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor-priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haɪdʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role. Significance Statement: Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units, such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
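The contrast between the two accounts can be illustrated with a toy Bayesian update over the two candidate words. The prior and likelihood values below are made up for illustration and are not taken from the study.

```python
# Toy Bayesian update over two lexical candidates, with illustrative numbers only.
from math import log

priors = {"hygiene": 0.5, "hijack": 0.5}   # assumed prior probabilities

def posterior(priors, likelihoods):
    """Bayes' rule: p(word | segment) is proportional to p(segment | word) * p(word)."""
    unnorm = {w: priors[w] * likelihoods[w] for w in priors}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# The first syllable is consistent with both candidates, so the posterior is
# unchanged and little prediction error is generated under either account.
post_syll1 = posterior(priors, {"hygiene": 1.0, "hijack": 1.0})

# The second syllable identifies "hygiene"; the prediction error is the surprisal of
# the heard segment under the current predictive distribution. Priming that raises
# the prior of the competitor ("hijack") would increase this error, as the study argues.
p_segment = post_syll1["hygiene"] * 1.0 + post_syll1["hijack"] * 0.0
prediction_error = -log(p_segment)
print(post_syll1, round(prediction_error, 3))
```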


PeerJ ◽ 2019 ◽ Vol 7 ◽ pp. e6437
Author(s): Xinmiao Liu, Wenbin Wang, Haiyan Wang

Animate nouns are preferred as grammatical subjects, whereas inanimate nouns are preferred as grammatical objects. Animacy thus provides important semantic cues for sentence comprehension. However, how individuals' ability to use this animacy cue changes with advancing age is still not clear. The current study investigated whether older adults and younger adults were differentially sensitive to this semantic constraint in processing Mandarin relative clauses, using a self-paced reading paradigm. The sentences used in the study contained subject relative clauses or object relative clauses and had animate or inanimate subjects. The results indicate that the animacy manipulation affected the younger adults more than the older adults in online processing. Younger adults had longer reading times for all segments in subject relative clauses than in object relative clauses when the subjects were inanimate, whereas there was no significant difference in reading times between subject and object relative clauses when the subjects were animate. In the older group, animacy was not found to influence the processing difficulty of subject and object relative clauses. Compared with younger adults, older adults were thus less sensitive to animacy constraints in relative clause processing. The findings indicate that the use of animacy cues becomes less efficient in the ageing population. The results can be explained by the capacity-constrained comprehension theory, according to which older adults have greater difficulty integrating semantic information with syntactic processing due to a lack of sufficient cognitive resources.


2021
Author(s): Octave Etard, Rémy Ben Messaoud, Gabriel Gaugain, Tobias Reichenbach

Speech and music are spectro-temporally complex acoustic signals that are highly relevant for humans. Both contain a temporal fine structure that is encoded in the neural responses of subcortical and cortical processing centres. The subcortical response to the temporal fine structure of speech has recently been shown to be modulated by selective attention to one of two competing voices. Music similarly often consists of several simultaneous melodic lines, and a listener can selectively attend to a particular one at a time. However, the neural mechanisms that enable such selective attention remain largely enigmatic, not least since most investigations to date have focussed on short and simplified musical stimuli. Here we study the neural encoding of classical musical pieces in human volunteers, using scalp electroencephalography (EEG) recordings. We presented volunteers with continuous musical pieces played by one or two instruments. In the latter case, the participants were asked to selectively attend to one of the two competing instruments and to perform a vibrato identification task. We used linear encoding and decoding models to relate the recorded EEG activity to the stimulus waveform. We show that we can measure neural responses to the temporal fine structure of melodic lines played by a single instrument, at the population level as well as for most individual subjects. The neural response peaks at a latency of 7.6 ms and is not measurable past 15 ms. When analysing the neural responses elicited by competing instruments, we find no evidence of attentional modulation. Our results show that, much like speech, the temporal fine structure of music is tracked by neural activity. In contrast to speech, however, this response appears unaffected by selective attention in the context of our experiment.
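As a simple stand-in for the linear encoding analysis of the temporal fine structure, the sketch below cross-correlates the stimulus waveform with the EEG over short latencies, so that a response peaking within roughly the first 15 ms would be visible. The lag range and normalisation are illustrative assumptions, not the authors' exact models.

```python
# Hedged sketch of a short-latency response measure: normalised cross-correlation
# between the stimulus waveform and the EEG over a small range of positive lags.
import numpy as np

def short_latency_response(stim, eeg, fs, max_lag_ms=30):
    """stim, eeg: 1-D arrays at the same sampling rate fs (Hz); returns lags (ms)
    and the cross-correlation at each lag (a simple stand-in for a forward model)."""
    max_lag = int(max_lag_ms * fs / 1000)
    lags = np.arange(0, max_lag + 1)
    stim = (stim - stim.mean()) / stim.std()
    eeg = (eeg - eeg.mean()) / eeg.std()
    xcorr = np.array([np.mean(stim[: len(stim) - lag] * eeg[lag:]) for lag in lags])
    return lags * 1000 / fs, xcorr
```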

