Three Functions of Prediction Error for Bayesian Inference in Speech Perception

Author(s):  
Matthew H. Davis ◽  
Ediz Sohoglu

Spoken language is one of the most important sounds that humans hear, yet also one of the most difficult for non-human listeners or machines to identify. In this chapter we explore different neuro-computational implementations of Bayesian inference for speech perception. We propose, in line with Predictive Coding (PC) principles, that Bayesian inference is based on neural computations of the difference between heard and expected speech segments (prediction error). We review three functions of these prediction error representations: (1) combining prior knowledge with degraded speech for optimal word identification, (2) supporting rapid learning processes so that perception remains optimal despite perceptual degradation or variation, and (3) ensuring that listeners detect instances of lexical novelty (previously unfamiliar words) so as to learn new words over the lifespan. Evidence from MEG and multivariate fMRI studies suggests computations of prediction error in the superior temporal gyrus (STG) during these three processes.
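To make the proposed computation concrete, here is a minimal Python sketch, not taken from the chapter; the candidate words and probabilities are invented. It shows how prior knowledge and degraded sensory evidence might combine under Bayes' rule, with prediction error as the mismatch between expected and heard input:

```python
import numpy as np

# Hypothetical sketch: Bayesian word identification from degraded speech.
# Prior knowledge over three candidate words combines with a noisy
# likelihood from the degraded signal via Bayes' rule; prediction error
# is the residual between the evidence and the prior expectation.
words = ["hygiene", "hijack", "highway"]
prior = np.array([0.6, 0.3, 0.1])          # expectations from context
likelihood = np.array([0.2, 0.5, 0.3])     # evidence from degraded audio

posterior = prior * likelihood
posterior /= posterior.sum()               # Bayes' rule (normalised)

# Prediction error: how far the evidence departs from expectation.
prediction_error = likelihood - prior
print(dict(zip(words, posterior.round(3))))
print("prediction error:", prediction_error.round(3))
```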

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Ediz Sohoglu ◽  
Matthew H Davis

Human speech perception can be described as Bayesian perceptual inference but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
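The reported interaction can be illustrated with a toy simulation. The sketch below is purely illustrative (the feature vectors and quality values are invented, not the study's stimuli or model): it treats the strength of a neural representation as the mean absolute prediction error between a quality-scaled input and a prediction, and reproduces the crossover pattern, with matching predictions suppressing the representation as quality increases while mismatching input is represented more strongly:

```python
import numpy as np

# Hypothetical simulation of the signal quality x prior knowledge
# interaction. Speech input is a feature vector scaled by signal quality;
# the prediction either matches or mismatches the input features.
rng = np.random.default_rng(0)
features = rng.standard_normal(50)

def representation(signal_quality, prediction):
    """Mean absolute prediction error, standing in for representational strength."""
    sensory_input = signal_quality * features
    return np.abs(sensory_input - prediction).mean()

for quality in (0.2, 0.5, 0.8):  # e.g. increasing vocoder channels
    matching = representation(quality, 0.8 * features)   # matching prior text
    mismatching = representation(quality, np.zeros(50))  # mismatching prior
    print(f"quality={quality}: matching={matching:.2f}, mismatching={mismatching:.2f}")
```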


2020 ◽  
Author(s):  
Ediz Sohoglu ◽  
Matthew H. Davis

Human speech perception can be described as Bayesian perceptual inference but how are these Bayesian computations instantiated neurally? We use magnetoencephalographic recordings of brain responses to degraded spoken words as a function of signal quality and prior knowledge to demonstrate that spectrotemporal modulations in speech are more clearly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). We found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is already apparent in neural responses within 250 ms of speech input. Our findings contribute towards the detailed specification of a computational model of speech perception based on predictive coding frameworks.


2016 ◽  
Vol 113 (12) ◽  
pp. E1747-E1756 ◽  
Author(s):  
Ediz Sohoglu ◽  
Matthew H. Davis

Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
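A toy implementation can show how one mechanism spans both timescales. In this hypothetical sketch (all parameters are invented, and it is far simpler than the simulations reported above), fast within-trial updates of a perceptual estimate produce immediate effects of predictions, while slow across-trial updates of a connection weight produce gradual learning; both follow the same prediction-error gradient:

```python
import numpy as np

# Hypothetical sketch of a single mechanism, prediction-error minimisation,
# producing both immediate effects (fast state updates within a trial) and
# perceptual learning (slow weight updates across trials).
rng = np.random.default_rng(1)
weight = 0.5                     # slow parameter: mapping from percept to input
lr_state, lr_weight = 0.5, 0.05  # fast vs slow learning rates

for trial in range(20):
    sensory = 1.0 + 0.1 * rng.standard_normal()  # degraded speech token
    percept = 0.0                                 # state, reset each trial
    for _ in range(10):                           # fast inference loop
        error = sensory - weight * percept        # prediction error
        percept += lr_state * weight * error      # immediate perceptual update
    weight += lr_weight * error * percept         # slow perceptual learning
print(f"learned weight={weight:.2f}, final trial error={error:.3f}")
```

As the weight improves across trials, the fast loop settles more quickly and residual prediction error shrinks, mirroring the learning-related reduction in STG activity described above.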


2020 ◽  
Author(s):  
Yingcan Carol Wang ◽  
Ediz Sohoglu ◽  
Rebecca A. Gilbert ◽  
Richard N. Henson ◽  
Matthew H. Davis

Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will increase prediction error and hence neural activity only at later time points, when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG were greater for word-primed words after the point at which they could be uniquely identified (after /haɪdʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

Significance Statement: Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
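The predictive-selection account can be illustrated with a toy segment-by-segment update. In the sketch below (the segment codes and probabilities are invented stand-ins, not the study's materials), two candidate words sharing an onset receive identical prediction errors until their segments diverge, so lexical probabilities, and hence error-driven activity, change only from the second syllable onward:

```python
# Hypothetical sketch of predictive-selection: lexical probabilities are
# reweighted by segment-by-segment prediction error, so competition between
# hygiene and hijack only matters after the shared onset /haIdZ/.
candidates = {"hygiene": ["h", "aI", "dZ", "i:", "n"],
              "hijack":  ["h", "aI", "dZ", "a", "k"]}
priors = {"hygiene": 0.3, "hijack": 0.7}   # e.g. after competitor priming

for pos, heard in enumerate(["h", "aI", "dZ", "i:", "n"]):  # input: "hygiene"
    # Likelihood is high for a word that predicted this segment, low otherwise.
    likelihood = {w: (0.95 if segs[pos] == heard else 0.05)
                  for w, segs in candidates.items()}
    total = sum(priors[w] * likelihood[w] for w in priors)
    priors = {w: priors[w] * likelihood[w] / total for w in priors}
    print(pos, {w: round(p, 3) for w, p in priors.items()})
```

Running this shows the probabilities are untouched over the first three shared segments and shift abruptly once the words diverge, which is where the prediction-error account locates the extra neural activity.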


2021 ◽  
Author(s):  
Tomaso Muzzu ◽  
Aman B. Saleem

Sensory experience is often dependent on one's own actions, including self-motion. Theories of predictive coding postulate that actions are regulated by calculating prediction error, which is the difference between sensory experience and expectation based on self-generated actions. Signals consistent with prediction error have been reported in mouse visual cortex (V1) when visual flow coupled to running is unexpectedly perturbed. Here, we show that such signals can be elicited by visual stimuli uncoupled from the animal's running. We recorded the activity of mouse V1 neurons while presenting drifting gratings that unexpectedly stopped. We found strong responses to visual perturbations, which were enhanced during running. If these perturbation responses are signals about sensorimotor mismatch, they should be largest for front-to-back visual flow expected from the animals' running. Responses, however, did not show a bias for front-to-back visual flow. Instead, perturbation responses were strongest in the preferred orientation of individual neurons, and perturbation-responsive neurons were more likely to prefer slow visual speeds. Our results therefore indicate that prediction error signals can be explained by the convergence of known motor and sensory signals in visual cortex, providing a purely sensory and motor explanation for purported mismatch signals.


Author(s):  
Christoph Mathys

Psychiatry has found it difficult to develop a nosology that allows for the targeted treatment of disorders of the mind. This article sets out a possible way forward: harnessing systems theory to provide the conceptual constraints needed to link clinical phenomena with neurobiology. This approach builds on the insight that the mind is a system which, to regulate its environment, needs to have a model of that environment and needs to update predictions about it using the rules of inductive logic. It can be shown that Bayesian inference can be reduced to updating beliefs based on precision-weighted prediction errors, where a prediction error is the difference between actual and predicted input, and precision is the confidence associated with the input prediction. Precision weighting of prediction errors entails that a given discrepancy between outcome and prediction means more, and leads to greater belief updates, the more confidently the prediction was made. This provides a conceptual framework linking clinical experience with the pathophysiology underlying disorders of the mind. Limitations of this approach are discussed and ways to work around them illustrated. Initial steps and possible future directions toward a nosology based on failures of precision weighting are discussed.
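A worked example of this precision-weighted update for a Gaussian belief (a standard result, not specific to this article; the numbers are arbitrary):

```python
# Worked example of precision-weighted prediction error updating.
# For Gaussian beliefs, the exact Bayesian update shifts the belief toward
# the input in proportion to the input's precision relative to the prior's.
def update_belief(mu_prior, pi_prior, x, pi_input):
    """Bayesian update of a Gaussian belief (mean mu, precision pi) given input x."""
    prediction_error = x - mu_prior
    weight = pi_input / (pi_input + pi_prior)  # precision weighting
    mu_posterior = mu_prior + weight * prediction_error
    pi_posterior = pi_prior + pi_input
    return mu_posterior, pi_posterior

# The same discrepancy (x - mu = 2) produces a large belief update when the
# input is predicted with high confidence (precision), and a small one when
# the prior dominates.
print(update_belief(mu_prior=0.0, pi_prior=1.0, x=2.0, pi_input=4.0))  # mu -> 1.6
print(update_belief(mu_prior=0.0, pi_prior=4.0, x=2.0, pi_input=1.0))  # mu -> 0.4
```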


2018 ◽  
Author(s):  
Benjamin Gagl ◽  
Jona Sassenhagen ◽  
Sophia Haan ◽  
Klara Gregorova ◽  
Fabio Richlan ◽  
...  

Most current models assume that the perceptual and cognitive processes of visual word recognition and reading operate upon neuronally coded domain-general low-level visual representations – typically oriented line representations. We here demonstrate, consistent with neurophysiological theories of Bayesian-like predictive neural computations, that prior visual knowledge of words may be utilized to 'explain away' redundant and highly expected parts of the visual percept. Subsequent processing stages, accordingly, operate upon an optimized representation of the visual input, the orthographic prediction error, highlighting only the visual information relevant for word identification. We show that this optimized representation is related to orthographic word characteristics, accounts for word recognition behavior, and is processed early in the visual processing stream, i.e., in V4 and before 200 ms after word-onset. Based on these findings, we propose that prior visual-orthographic knowledge is used to optimize the representation of visually presented words, which in turn allows for highly efficient reading processes.
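The 'explaining away' operation can be sketched as subtracting the average image over known words from the current input. The sketch below is a loose illustration with synthetic images, not the authors' pipeline; the image sizes and the shared-strokes construction are assumptions:

```python
import numpy as np

# Hypothetical sketch of an orthographic prediction error: the expected
# visual input is approximated by the average image over known words, and
# subtracting it removes the redundant, highly expected strokes.
rng = np.random.default_rng(2)
shared_strokes = rng.random((16, 64))                        # features common to words
lexicon = shared_strokes + 0.2 * rng.random((1000, 16, 64))  # plus word-specific parts
expected_input = lexicon.mean(axis=0)                        # prior visual knowledge

word_image = lexicon[0]
orthographic_pe = word_image - expected_input                # optimized representation

# The prediction error carries far less redundant signal than the raw input,
# leaving mostly the information that identifies this particular word.
print("input energy:", np.abs(word_image).mean().round(3))
print("PE energy:   ", np.abs(orthographic_pe).mean().round(3))
```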


2019 ◽  
Author(s):  
Lílian Rodrigues de Almeida ◽  
Paul A. Pope ◽  
Peter Hansen

In our previous studies we supported the claim that the motor theory is modulated by task load. Motoric participation in phonological processing increases from speech perception to speech production, with the endpoints of the dorsal stream having changing and complementary weightings for processing: the left inferior frontal gyrus (LIFG) becomes increasingly relevant and the left superior temporal gyrus (LSTG) decreasingly relevant. Our previous results for neurostimulation of the LIFG support this model. In this study we investigated whether our claim that the motor theory is modulated by task load holds in (frontal) aphasia. Persons with aphasia (PWA) after stroke typically have damage to brain areas responsible for phonological processing. They may present variable patterns of recovery and, consequently, variable strategies of phonological processing. Here these strategies were investigated in two PWA with simultaneous fMRI and tDCS of the LIFG during speech perception and speech production tasks. Anodal tDCS excitation and cathodal tDCS inhibition should increase with the relevance of the target for the task. Cathodal tDCS over a target of low relevance could also induce compensation by the remaining nodes. Responses of PWA to tDCS would further depend on their pattern of recovery: they would depend on the responsiveness of the perilesional area, and could be weaker than in controls due to an overall hypoactivation of the cortex. Results suggest that the analysis of motor codes for articulation during phonological processing persists in frontal aphasia and that tDCS is a promising diagnostic tool to investigate individual processing strategies.


2021 ◽  
Vol 92 (8) ◽  
pp. A3.3-A4
Author(s):  
Harriet Sharp ◽  
Kristy Themelis ◽  
Marisa Amato ◽  
Andrew Barritt ◽  
Kevin Davies ◽  
...  

Introduction: The aetiology and pathophysiology of fibromyalgia and ME/CFS are poorly characterised, but altered inflammatory, autonomic and interoceptive processes have been implicated. Interoception has been conceptualised as a predictive coding process, in which top-down prediction signals are compared with bottom-up afferents, yielding prediction error signals that indicate a mismatch between expected and actual bodily states. Chronic dyshomeostasis and elevated interoceptive prediction error signals have been theorised to contribute to the expression of pain and fatigue in fibromyalgia and ME/CFS.

Objectives/Aims: To investigate how altered interoception and prediction error relate to the baseline expression of pain and fatigue in fibromyalgia and ME/CFS, and to the response to an inflammatory challenge.

Methods: Sixty-five patients with a fibromyalgia and/or ME/CFS diagnosis and 26 matched controls underwent baseline assessment: self-report questionnaires assessing subjective pain and fatigue, and objective measurements of pressure-pain thresholds. Participants received injections of typhoid vaccine (inflammatory challenge) or saline (placebo) in a randomised, double-blind, crossover design, then completed a heartbeat tracking task (assessing interoceptive accuracy). The Porges Body Questionnaire assessed interoceptive sensibility. Interoceptive prediction error (IPE) was calculated as the discrepancy between objective accuracy and subjective sensibility.

Results: Patients with fibromyalgia and ME/CFS had significantly higher IPE (suggesting a tendency to over-estimate interoceptive ability) and interoceptive sensibility, despite no differences in interoceptive accuracy. IPE and sensibility correlated positively with all self-report fatigue and pain measures, and negatively with pain thresholds. Following the inflammatory challenge, IPE correlated negatively with the mismatch between subjective and objective measures of pain induced by inflammation.

Conclusions: This is the first study to reveal altered interoceptive processes in patients with fibromyalgia and ME/CFS, who are known to have dysregulated autonomic function. Notably, we found elevated IPE in patients, correlating with their subjective experiences of pain and fatigue. We hypothesise a predictive coding model in which a mismatch between expected and actual internal bodily states (linked to autonomic dysregulation) results in prediction error signalling that could be metacognitively interpreted as chronic pain and fatigue. Our results demonstrate the potential for further exploration of interoceptive processing in patients with fibromyalgia and ME/CFS, aiding understanding of these poorly defined conditions and providing potential new targets for diagnostic and therapeutic intervention.
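As a rough illustration of how such an IPE score could be computed (the exact scoring used in the study is not given here, so a standardised difference is assumed and the numbers are invented):

```python
import numpy as np

# Hypothetical sketch of the interoceptive prediction error (IPE) measure
# described above: the discrepancy between standardised subjective
# sensibility and objective accuracy scores.
def z(scores):
    """Standardise scores to zero mean and unit variance."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

accuracy = [0.55, 0.70, 0.60, 0.85, 0.50]   # heartbeat tracking performance
sensibility = [4.2, 3.1, 4.8, 3.5, 4.6]     # Porges Body Questionnaire ratings

ipe = z(sensibility) - z(accuracy)          # > 0: over-estimates interoceptive ability
print(ipe.round(2))
```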


2020 ◽  
Vol 32 (6) ◽  
pp. 1092-1103 ◽  
Author(s):  
Dan Kennedy-Higgins ◽  
Joseph T. Devlin ◽  
Helen E. Nuttall ◽  
Patti Adank

Successful perception of speech in everyday listening conditions requires effective listening strategies to overcome common acoustic distortions, such as background noise. Convergent evidence from neuroimaging and clinical studies identifies activation within the temporal lobes as key to successful speech perception. However, current neurobiological models disagree on whether the left temporal lobe is sufficient for successful speech perception or whether bilateral processing is required. We addressed this issue using TMS to selectively disrupt processing in either the left or right superior temporal gyrus (STG) of healthy participants to test whether the left temporal lobe is sufficient or whether both left and right STG are essential. Participants repeated keywords from sentences presented in background noise in a speech reception threshold task while receiving online repetitive TMS separately to the left STG, right STG, or vertex or while receiving no TMS. Results show an equal drop in performance following application of TMS to either left or right STG during the task. A separate group of participants performed a visual discrimination threshold task to control for the confounding side effects of TMS. Results show no effect of TMS on the control task, supporting the notion that the results of Experiment 1 can be attributed to modulation of cortical functioning in STG rather than to side effects associated with online TMS. These results indicate that successful speech perception in everyday listening conditions requires both left and right STG and thus have ramifications for our understanding of the neural organization of spoken language processing.

