Frontal Theta Oscillatory Activity Is a Common Mechanism for the Computation of Unexpected Outcomes and Learning Rate

2014 ◽  
Vol 26 (3) ◽  
pp. 447-458 ◽  
Author(s):  
Ernest Mas-Herrero ◽  
Josep Marco-Pallarés

In decision-making processes, the relevance of the information yielded by outcomes varies across time and situations. It increases when previous predictions are not accurate and in contexts with high environmental uncertainty. Previous fMRI studies have shown an important role of medial pFC in coding both reward prediction errors and the use of this information to guide future decisions. However, it is unclear whether these two processes are dissociated in time or occur simultaneously, suggesting that a common mechanism is engaged. In the present work, we studied the modulation of two electrophysiological responses associated with outcome processing—the feedback-related negativity ERP and frontocentral theta oscillatory activity—by the reward prediction error and the learning rate. Twenty-six participants performed two learning tasks differing in the degree of predictability of the outcomes: a reversal learning task and a probabilistic learning task with multiple blocks of novel cue–outcome associations. We implemented a reinforcement learning model to obtain the single-trial reward prediction error and the learning rate for each participant and task. Our results indicated that midfrontal theta activity and feedback-related negativity increased linearly with the unsigned prediction error. In addition, variations of frontal theta oscillatory activity predicted the learning rate across tasks and participants. These results support the existence of a common brain mechanism for the computation of unsigned prediction error and learning rate.
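A minimal sketch of the kind of reinforcement learning model described in this abstract is a Rescorla–Wagner delta rule, which yields the signed and unsigned prediction errors on each trial. The function name and parameter values below are illustrative, not the authors' implementation:

```python
def delta_rule(outcomes, alpha=0.3, v0=0.5):
    """Track a predicted value across trials and return the per-trial
    signed and unsigned reward prediction errors (RPEs)."""
    v = v0
    signed, unsigned = [], []
    for r in outcomes:
        pe = r - v            # signed RPE: outcome minus prediction
        v += alpha * pe       # value update, scaled by learning rate alpha
        signed.append(pe)
        unsigned.append(abs(pe))
    return signed, unsigned
```

In model fitting, alpha would be estimated separately for each participant and task; the abstract's key quantity, the unsigned prediction error, is simply the magnitude of the trialwise error.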

2018 ◽  
Author(s):  
Anthony I. Jang ◽  
Matthew R. Nassar ◽  
Daniel G. Dillon ◽  
Michael J. Frank

Abstract The dopamine system is thought to provide a reward prediction error signal that facilitates reinforcement learning and reward-based choice in corticostriatal circuits. While it is believed that similar prediction error signals are also provided to temporal lobe memory systems, the impact of such signals on episodic memory encoding has not been fully characterized. Here we develop an incidental memory paradigm that allows us to 1) estimate the influence of reward prediction errors on the formation of episodic memories, 2) dissociate this influence from other factors such as surprise and uncertainty, 3) test the degree to which this influence depends on temporal correspondence between prediction error and memoranda presentation, and 4) determine the extent to which this influence is consolidation-dependent. We find that when choosing to gamble for potential rewards during a primary decision making task, people encode incidental memoranda more strongly even though they are not aware that their memory will be subsequently probed. Moreover, this strengthened encoding scales with the reward prediction error, and not overall reward, experienced selectively at the time of memoranda presentation (and not before or after). Finally, this strengthened encoding is identifiable within a few minutes and is not substantially enhanced after twenty-four hours, indicating that it is not consolidation-dependent. These results suggest a computationally and temporally specific role for putative dopaminergic reward prediction error signaling in memory formation.


2019 ◽  
Author(s):  
Emma L. Roscow ◽  
Matthew W. Jones ◽  
Nathan F. Lepora

Abstract Neural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based reinforcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats’ behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared to the same model with no replay; random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.
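The model comparison described here turns on how replayed experiences are sampled. A hedged sketch of the three sampling schemes (biased by unsigned reward-prediction error, biased by reward, or uniform random), with a hypothetical memory-tuple layout:

```python
import random

def sample_replay(memory, k, bias="rpe", rng=None):
    """Sample k experiences for replay. Each memory entry is a
    (state, action, reward, rpe) tuple. 'rpe' weights sampling by
    unsigned reward-prediction error, 'reward' by reward magnitude;
    any other value gives uniform (random) replay."""
    rng = rng or random.Random()
    if bias == "rpe":
        weights = [abs(m[3]) + 1e-6 for m in memory]
    elif bias == "reward":
        weights = [abs(m[2]) + 1e-6 for m in memory]
    else:
        weights = [1.0] * len(memory)
    return rng.choices(memory, weights=weights, k=k)
```

Under the "rpe" scheme, high-surprise experiences dominate replay even when their rewards match those of other experiences, which is what lets the task dissociate reward from prediction error.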


2015 ◽  
Vol 112 (8) ◽  
pp. 2539-2544 ◽  
Author(s):  
Xiaosi Gu ◽  
Terry Lohrenz ◽  
Ramiro Salas ◽  
Philip R. Baldwin ◽  
Alireza Soltani ◽  
...  

Little is known about how prior beliefs impact biophysically described processes in the presence of neuroactive drugs, which presents a profound challenge to the understanding of the mechanisms and treatments of addiction. We engineered smokers’ prior beliefs about the presence of nicotine in a cigarette smoked before a functional magnetic resonance imaging session where subjects carried out a sequential choice task. Using a model-based approach, we show that smokers’ beliefs about nicotine specifically modulated learning signals (value and reward prediction error) defined by a computational model of mesolimbic dopamine systems. Belief of “no nicotine in cigarette” (compared with “nicotine in cigarette”) strongly diminished neural responses in the striatum to value and reward prediction errors and reduced the impact of both on smokers’ choices. These effects of belief could not be explained by global changes in visual attention and were specific to value and reward prediction errors. Thus, by modulating the expression of computationally explicit signals important for valuation and choice, beliefs can override the physical presence of a potent neuroactive compound like nicotine. These selective effects of belief demonstrate that belief can modulate model-based parameters important for learning. The implications of these findings may be far ranging because belief-dependent effects on learning signals could impact a host of other behaviors in addiction as well as in other mental health problems.


2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
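The three-way response pattern described here can be written as a toy firing-rate model; the baseline rate and gain below are illustrative values, not measured parameters:

```python
def dopamine_response(received, predicted, baseline=5.0, gain=2.0):
    """Toy phasic response: above baseline for positive prediction
    errors, at baseline when reward is fully predicted, and below
    baseline (floored at zero firing) for negative prediction errors."""
    pe = received - predicted
    return max(0.0, baseline + gain * pe)
```

The nonlinear utility coding the abstract mentions could be sketched by passing the reward through a concave utility function before computing the error, so equal objective rewards produce unequal subjective errors.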


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Alexandre Y. Dombrovski ◽  
Beatriz Luna ◽  
Michael N. Hallquist

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.


2020 ◽  
Vol 46 (6) ◽  
pp. 1535-1546 ◽ 
Author(s):  
Teresa Katthagen ◽  
Jakob Kaminski ◽  
Andreas Heinz ◽  
Ralph Buchert ◽  
Florian Schlagenhauf

Abstract Increased striatal dopamine synthesis capacity has consistently been reported in patients with schizophrenia. However, the mechanism translating this into behavior and symptoms remains unclear. It has been proposed that heightened striatal dopamine may blunt dopaminergic reward prediction error signaling during reinforcement learning. In this study, we investigated striatal dopamine synthesis capacity, reward prediction errors, and their association in unmedicated schizophrenia patients (n = 19) and healthy controls (n = 23). Participants underwent FDOPA-PET and functional magnetic resonance imaging (fMRI) scanning, during which they performed a reversal-learning paradigm. The groups were compared regarding dopamine synthesis capacity (Kicer), fMRI neural prediction error signals, and the correlation of both. Patients did not differ from controls with respect to striatal Kicer. Taking comorbid alcohol abuse into account revealed that patients without such abuse showed elevated Kicer in the associative striatum, while those with abuse did not differ from controls. Comparing all patients to controls, patients performed worse during reversal learning and displayed reduced prediction error signaling in the ventral striatum. In controls, Kicer in the limbic striatum correlated with higher reward prediction error signaling, while there was no significant association in patients. Kicer in the associative striatum correlated with higher positive symptoms, and blunted reward prediction error signaling was associated with negative symptoms. Our results suggest a dissociation between striatal subregions and symptom domains, with elevated dopamine synthesis capacity in the associative striatum contributing to positive symptoms and blunted prediction error signaling in the ventral striatum relating to negative symptoms.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Bastien Blain ◽  
Robb B Rutledge

Subjective well-being or happiness is often associated with wealth. Recent studies suggest that momentary happiness is associated with reward prediction error, the difference between experienced and predicted reward, a key component of adaptive behaviour. We tested subjects in a reinforcement learning task in which reward size and probability were uncorrelated, allowing us to dissociate between the contributions of reward and learning to happiness. Using computational modelling, we found convergent evidence across stable and volatile learning tasks that happiness, like behaviour, is sensitive to learning-relevant variables (i.e. probability prediction error). Unlike behaviour, happiness is not sensitive to learning-irrelevant variables (i.e. reward prediction error). Increasing volatility reduces how many past trials influence behaviour but not happiness. Finally, depressive symptoms reduce happiness more in volatile than stable environments. Our results suggest that how we learn about our world may be more important for how we feel than the rewards we actually receive.
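A schematic version of the momentary-happiness model used in this line of work is a regression over exponentially decaying sums of recent task variables; the weights, decay, and choice of regressors below are illustrative:

```python
def momentary_happiness(history, w0, w_ev, w_rpe, gamma):
    """history: (expected_value, rpe) pairs for past trials, oldest
    first. Happiness on the current trial is a constant plus
    exponentially decaying weighted sums of expectations and
    prediction errors; gamma in (0, 1) sets the forgetting rate."""
    t = len(history) - 1
    h = w0
    for j, (ev, rpe) in enumerate(history):
        decay = gamma ** (t - j)    # recent trials weigh more
        h += w_ev * decay * ev + w_rpe * decay * rpe
    return h
```

In the study's framing the decisive regressor is the probability prediction error rather than the reward prediction error, so testing that dissociation amounts to swapping which error term enters the sum.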


2017 ◽  
Author(s):  
Anna O Ermakova ◽  
Franziska Knolle ◽  
Azucena Justicia ◽  
Edward T Bullmore ◽  
Peter B Jones ◽  
...  

Abstract Ongoing research suggests preliminary, though not entirely consistent, evidence of neural abnormalities in signalling prediction errors in schizophrenia. Supporting theories suggest mechanistic links between the disruption of these processes and the generation of psychotic symptoms. However, it is not known at what stage in psychosis these impairments in prediction error signalling develop. One major confound in prior studies is the use of medicated patients with strongly varying disease durations. Our study aims to investigate the involvement of the meso-cortico-striatal circuitry during reward prediction error signalling in the earliest stages of psychosis. We studied patients with first episode psychosis (FEP) and help-seeking individuals at risk for psychosis due to subthreshold prodromal psychotic symptoms. Patients with FEP (n = 14), individuals at risk for developing psychosis (n = 30), and healthy volunteers (n = 39) performed a reinforcement learning task during fMRI scanning. ANOVA revealed significant (p < 0.05 family-wise error corrected) prediction error signalling differences between groups in the dopaminergic midbrain and right middle frontal gyrus (dorsolateral prefrontal cortex, DLPFC). Patients with FEP showed disrupted reward prediction error signalling compared to controls in both regions. At-risk patients showed intermediate activation in the midbrain that significantly differed from controls and from FEP patients, but DLPFC activation that did not differ from controls. Our study confirms that patients with FEP have abnormal meso-cortical signalling of reward prediction errors, whilst reward prediction error dysfunction in at-risk patients appears to show a more nuanced pattern of activation, with a degree of midbrain impairment but preserved cortical function.


2016 ◽  
Vol 22 (3) ◽  
pp. 303-313 ◽  
Author(s):  
Katherine Osborne-Crowley ◽  
Skye McDonald ◽  
Jacqueline A. Rushby

Abstract Objectives: The current study aimed to determine whether reversal learning impairments and feedback-related negativity (FRN), reflecting reward prediction error signals generated by negative feedback during the reversal learning tasks, were associated with social disinhibition in a group of participants with traumatic brain injury (TBI). Methods: The number of reversal errors on a social and a non-social reversal learning task and FRN were examined for 21 participants with TBI and 21 control participants matched for age. Participants with TBI were also divided into low and high disinhibition groups based on rated videotaped interviews. Results: Participants with TBI made more reversal errors and produced smaller amplitude FRNs than controls. Furthermore, participants with TBI high on social disinhibition made more reversal errors on the social reversal learning task than did those low on social disinhibition. FRN amplitude was not related to disinhibition. Conclusions: These results suggest that impairment in the ability to update behavior when social reinforcement contingencies change plays a role in social disinhibition after TBI. Furthermore, the social reversal learning task used in this study may be a useful neuropsychological tool for detecting susceptibility to acquired social disinhibition following TBI. Finally, that FRN amplitude was not associated with social disinhibition suggests that reward prediction error signals are not critical for behavioral adaptation in the social domain. (JINS, 2016, 22, 303–313)

