Prosody and Spoken-Word Recognition

This chapter outlines a Bayesian model of spoken-word recognition and reviews the role of prosody in that model. The review focuses on the information that helps the listener recognize the prosodic structure of an utterance, and on how prior knowledge about prosodic structure also constrains spoken-word recognition. Recognition is argued to be a process of perceptual inference that makes listening robust to variability in the speech signal. In essence, the listener makes inferences about three things and uses them to form an interpretation of the utterance: its segmental content, its prosodic structure (at several levels of the prosodic hierarchy at once), and the words it contains. Four characteristics of the proposed prosody-enriched recognition model are discussed: parallel uptake of different information types, high contextual dependency, adaptive processing, and phonological abstraction. The chapter also discusses next steps for developing the model.
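The inference described above can be illustrated roughly with Bayes' rule over two candidate words. Their onsets ("octo-") are segmentally near-identical but differ in which syllable carries stress. This is a toy sketch and not the chapter's model. The candidate set, the priors, and the hypothetical `p_stress_cue` reliability parameter are all assumptions. So is the premise that segmental and prosodic cues are conditionally independent given the word. The sketch also uses a single stress cue, whereas the chapter describes inference at several prosodic levels at once.

```python
# Hypothetical candidates for the fragment "octo-": segmentally near-identical,
# but differing in which syllable carries primary stress.
CANDIDATES = {
    # word: (prior probability (hypothetical), index of the stressed syllable)
    "octopus": (0.4, 0),
    "october": (0.6, 1),
}


def posterior(seg_likelihood, heard_stress, p_stress_cue=0.8):
    """Return P(word | segments, stress) over CANDIDATES via Bayes' rule.

    Assumes (hypothetically) that segmental and prosodic cues are
    conditionally independent given the word, and that the perceived stress
    position matches the word's lexical stress with probability p_stress_cue.
    """
    scores = {}
    for word, (prior, stress) in CANDIDATES.items():
        # Likelihood of the perceived stress position under this word.
        p_prosody = p_stress_cue if stress == heard_stress else 1 - p_stress_cue
        # Unnormalized posterior: likelihoods times prior.
        scores[word] = seg_likelihood[word] * p_prosody * prior
    total = sum(scores.values())
    # Normalize so the posteriors sum to 1.
    return {word: s / total for word, s in scores.items()}


# Equal segmental evidence for both words; stress heard on the first syllable.
result = posterior({"octopus": 0.5, "october": 0.5}, heard_stress=0)
print(result)
```

With equal segmental evidence, the prior alone favours "october" (0.6). Perceiving stress on the first syllable shifts the posterior to about 0.73 for "octopus". Setting `p_stress_cue=0.5` makes the stress cue uninformative, and the posterior then falls back to the prior.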

Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming

10.1101/2020.07.01.182717

2020

Author(s): Yingcan Carol Wang, Ediz Sohoglu, Rebecca A. Gilbert, Richard N. Henson, Matthew H. Davis

Keyword(s): Word Recognition, Prediction Error, Spoken Word Recognition, Predictive Coding, Spoken Word, Speech Comprehension, Speech Sounds, Neural Responses, Perceptual Inference, Neural Computations

Abstract

Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). On these accounts, manipulations that increase lexical uncertainty should increase neural responses associated with word recognition while words cannot yet be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive-Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. On these accounts, increased lexical uncertainty in words like hygiene and hijack will increase prediction error, and hence neural activity, only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor-priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the superior temporal gyrus (STG) were greater for word-primed words after the point at which they were uniquely identified (after /haɪdʒ/ in hygiene) but not before. Similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

Significance Statement

Effective speech perception is critical to daily life. It involves computations that combine speech signals with prior knowledge of spoken words, that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units, such that inhibition of irrelevant candidates leads to selection of the target word. Our results instead support predictive-selection theories (e.g. Predictive-Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
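The timing argument of the predictive-selection account can be sketched with a toy cohort. This is a minimal illustration and not the authors' analysis. The sketch measures prediction error as the surprisal of each heard segment, given the prior-weighted words still consistent with the input. The two-word lexicon and the ASCII segment codes are hypothetical assumptions. So are the prior values that stand in for the competitor-priming effect.

```python
import math

# Hypothetical two-word cohort sharing its first syllable; segments are
# simplified ASCII stand-ins for IPA (dZ = /dʒ/).
LEXICON = {
    "hygiene": ["h", "ai", "dZ", "i:", "n"],
    "hijack": ["h", "ai", "dZ", "a", "k"],
}


def prediction_errors(target, priors):
    """Return the surprisal, log2(1 / P(segment | segments so far)), of each
    segment of `target`, with predictions weighted by `priors` over LEXICON."""
    heard = LEXICON[target]
    errors = []
    for i, seg in enumerate(heard):
        # Words still consistent with the segments heard so far.
        cohort = {w: p for w, p in priors.items() if LEXICON[w][:i] == heard[:i]}
        total = sum(cohort.values())
        # Predicted probability of the heard segment: the share of the
        # cohort's prior mass held by words that continue with this segment.
        p_seg = sum(p for w, p in cohort.items()
                    if len(LEXICON[w]) > i and LEXICON[w][i] == seg) / total
        errors.append(math.log2(1 / p_seg))
    return errors


# Hypothetical priors: unprimed (equal) vs. after a "hijack" prime.
unprimed = prediction_errors("hygiene", {"hygiene": 0.5, "hijack": 0.5})
primed = prediction_errors("hygiene", {"hygiene": 0.2, "hijack": 0.8})
print(unprimed)  # [0.0, 0.0, 0.0, 1.0, 0.0]
print(primed)
```

Both words predict the same first three segments. Raising the prior of "hijack" therefore leaves prediction error at zero during /haɪdʒ/. It raises prediction error only at the first segment where "hygiene" diverges. This clean separation depends on restricting the lexicon to this one cohort. In a larger lexicon, a prime would also shift predictions for the shared segments.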