Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming

2020
Author(s):
Yingcan Carol Wang,
Ediz Sohoglu,
Rebecca A. Gilbert,
Richard N. Henson,
Matthew H. Davis

Abstract
Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will increase prediction error, and hence neural activity, only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the superior temporal gyrus (STG) showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haɪdʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords.
These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

Significance Statement
Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
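As a sketch of the Bayesian computation this abstract invokes (notation is illustrative, not taken from the paper): the posterior probability of a candidate word w given the speech input s heard so far is

```latex
P(w \mid s) = \frac{P(s \mid w)\,P(w)}{\sum_{w'} P(s \mid w')\,P(w')}
```

On this reading, a competitor prime such as hijack raises the prior P(w) of the prime, which lowers the posterior of the neighbouring target hygiene while only the shared first syllable has been heard, delaying the target's recognition.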

eLife
2020
Vol 9
Author(s):
Ediz Sohoglu,
Matthew H Davis

Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations: increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.


2020
Author(s):
Hans Rutger Bosker,
David Peeters

ABSTRACT
Beat gestures – spontaneously produced biphasic movements of the hand – are among the most frequently encountered co-speech gestures in human communication. They are closely aligned in time with the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across six experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g., distinguishing OBject from obJECT) and, in turn, can influence what vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: even the simplest 'flicks of the hands' influence which speech sounds we hear.

SIGNIFICANCE STATEMENT
Beat gestures are very common in human face-to-face communication. Yet we know little about their behavioral consequences for spoken language comprehension. We demonstrate that beat gestures influence the explicit and implicit perception of lexical stress and, in turn, can even shape what vowels we think we hear. This demonstration of a manual McGurk effect provides some of the first empirical support for a recent multimodal, situated psycholinguistic framework of human communication, while challenging current models of spoken word recognition that do not yet incorporate multimodal prosody. Moreover, it has the potential to enrich human-computer interaction and improve multimodal speech recognition systems.


Author(s):
James M. McQueen,
Laura Dilley

This chapter outlines a Bayesian model of spoken-word recognition and reviews how prosody is part of that model. The review focuses on the information that assists the listener in recognizing the prosodic structure of an utterance and on how spoken-word recognition is also constrained by prior knowledge about prosodic structure. Recognition is argued to be a process of perceptual inference that ensures that listening is robust to variability in the speech signal. In essence, the listener makes inferences about the segmental content of each utterance, about its prosodic structure (simultaneously at different levels in the prosodic hierarchy), and about the words it contains, and uses these inferences to form an utterance interpretation. Four characteristics of the proposed prosody-enriched recognition model are discussed: parallel uptake of different information types, high contextual dependency, adaptive processing, and phonological abstraction. The next steps that should be taken to develop the model are also discussed.


2020
Author(s):
Ediz Sohoglu,
Matthew H. Davis

Abstract
Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We use magnetoencephalographic recordings of brain responses to degraded spoken words as a function of signal quality and prior knowledge to demonstrate that spectrotemporal modulations in speech are more clearly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). We found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations: increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is already apparent in neural responses within 250 ms of speech input. Our findings contribute towards the detailed specification of a computational model of speech perception based on predictive coding frameworks.


2021
pp. JN-RM-1685-20
Author(s):
Yingcan Carol Wang,
Ediz Sohoglu,
Rebecca A. Gilbert,
Richard N. Henson,
Matthew H. Davis
