Behavioural and computational evidence for memory consolidation biased by reward-prediction errors

2019
Author(s): Emma L. Roscow, Matthew W. Jones, Nathan F. Lepora

Abstract Neural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based reinforcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats’ behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared to the same model with no replay; random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.
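
A minimal sketch of the model class compared here may help: tabular value learning in which offline replay is prioritised by unsigned reward-prediction error rather than by reward. Everything below (a four-armed bandit, epsilon-greedy choice, fixed parameters) is an illustrative assumption, not the authors' fitted model.

import numpy as np

rng = np.random.default_rng(0)
n_arms, alpha, eps = 4, 0.1, 0.1
true_p = np.array([0.2, 0.4, 0.6, 0.8])   # assumed reward probabilities
Q = np.zeros(n_arms)
buffer = []                               # (arm, reward, |RPE| at storage)

for t in range(200):
    arm = int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(Q))
    r = float(rng.random() < true_p[arm])
    rpe = r - Q[arm]
    Q[arm] += alpha * rpe                 # online update
    buffer.append((arm, r, abs(rpe)))
    # offline replay: revisit experiences with probability proportional to |RPE|
    w = np.array([e[2] for e in buffer]) + 1e-6
    for i in rng.choice(len(buffer), size=10, p=w / w.sum()):
        a, rr, _ = buffer[i]
        Q[a] += alpha * (rr - Q[a])       # replayed update

print(Q)  # arms should end up ranked roughly by true_p

Swapping the replay weights w for the stored rewards gives the reward-biased variant, and uniform weights give random replay, mirroring the model comparison described above.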

2014
Vol 26 (3), pp. 447-458
Author(s): Ernest Mas-Herrero, Josep Marco-Pallarés

In decision-making processes, the relevance of the information yielded by outcomes varies across time and situations. It increases when previous predictions are not accurate and in contexts with high environmental uncertainty. Previous fMRI studies have shown an important role of medial pFC in coding both reward prediction errors and the impact of this information to guide future decisions. However, it is unclear whether these two processes are dissociated in time or occur simultaneously, suggesting that a common mechanism is engaged. In the present work, we studied how two electrophysiological responses associated with outcome processing, the feedback-related negativity ERP and frontocentral theta oscillatory activity, are modulated by the reward prediction error and the learning rate. Twenty-six participants performed two learning tasks differing in the degree of predictability of the outcomes: a reversal learning task and a probabilistic learning task with multiple blocks of novel cue–outcome associations. We implemented a reinforcement learning model to obtain the single-trial reward prediction error and the learning rate for each participant and task. Our results indicated that midfrontal theta activity and feedback-related negativity increased linearly with the unsigned prediction error. In addition, variations of frontal theta oscillatory activity predicted the learning rate across tasks and participants. These results support the existence of a common brain mechanism for the computation of unsigned prediction error and learning rate.
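
A minimal sketch of how such single-trial quantities can be derived: a Rescorla-Wagner learner with a Pearce-Hall-style dynamic learning rate. The rule and parameters are illustrative assumptions, not the model fitted in the paper.

def prediction_errors(rewards, eta=0.3, alpha0=0.5):
    """Return single-trial signed PE, unsigned PE, and learning rate."""
    V, alpha = 0.0, alpha0
    signed, unsigned, lrates = [], [], []
    for r in rewards:
        pe = r - V
        signed.append(pe)
        unsigned.append(abs(pe))
        lrates.append(alpha)                       # rate in force on this trial
        V += alpha * pe                            # value update
        alpha = (1 - eta) * alpha + eta * abs(pe)  # associability tracks |PE|
    return signed, unsigned, lrates

print(prediction_errors([1, 1, 0, 1, 0, 0, 1]))

Trialwise regressors like these are what the unsigned-PE and learning-rate analyses above would be built on.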


2020
Vol 11 (1)
Author(s): Alexandre Y. Dombrovski, Beatriz Luna, Michael N. Hallquist

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.


eLife
2020
Vol 9
Author(s): Bastien Blain, Robb B Rutledge

Subjective well-being or happiness is often associated with wealth. Recent studies suggest that momentary happiness is associated with reward prediction error, the difference between experienced and predicted reward, a key component of adaptive behaviour. We tested subjects in a reinforcement learning task in which reward size and probability were uncorrelated, allowing us to dissociate between the contributions of reward and learning to happiness. Using computational modelling, we found convergent evidence across stable and volatile learning tasks that happiness, like behaviour, is sensitive to learning-relevant variables (i.e. probability prediction error). Unlike behaviour, happiness is not sensitive to learning-irrelevant variables (i.e. reward prediction error). Increasing volatility reduces how many past trials influence behaviour but not happiness. Finally, depressive symptoms reduce happiness more in volatile than stable environments. Our results suggest that how we learn about our world may be more important for how we feel than the rewards we actually receive.
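
Models of this kind typically express momentary happiness as a baseline plus an exponentially decaying sum of recent task variables. Below is a toy sketch under stated assumptions (a single probability-prediction-error term with made-up weights; the fitted model in the paper has more terms).

import numpy as np

def happiness(w0, w_pe, gamma, pe):
    """Momentary happiness as a decaying sum of past prediction errors."""
    h = []
    for t in range(len(pe)):
        decay = gamma ** np.arange(t, -1, -1)   # most recent trial weighted most
        h.append(w0 + w_pe * float(np.dot(decay, pe[: t + 1])))
    return np.array(h)

# probability prediction errors: outcome occurrence (0/1) minus predicted probability
prob_pe = np.array([1 - 0.5, 0 - 0.6, 1 - 0.4])
print(happiness(w0=50.0, w_pe=10.0, gamma=0.7, pe=prob_pe))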


2021
Vol 11 (1)
Author(s): Harry J. Stewardson, Thomas D. Sambrook

Abstract Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than the midbrain dopamine system.
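
The temporal-difference logic at issue can be shown in a few lines. In this deliberately minimal sketch (assumed parameters, not the study's model), the prediction error vanishes at the reward over training and reappears at the earliest predictive stimulus:

gamma, alpha, V_cs = 1.0, 0.2, 0.0
for episode in range(50):
    delta_cs = gamma * V_cs - 0.0   # PE at CS onset (the CS itself is unpredicted)
    delta_us = 1.0 - V_cs           # PE at reward (US) delivery
    V_cs += alpha * delta_us        # the CS acquires predictive value
    if episode in (0, 49):
        print(f"episode {episode}: PE at CS = {delta_cs:.2f}, PE at US = {delta_us:.2f}")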


2020
Author(s): Alexandre Y. Dombrovski, Beatriz Luna, Michael N. Hallquist

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Using a reinforcement learning task with a spatially structured reward function, we show that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.


2018
Author(s): Anthony I. Jang, Matthew R. Nassar, Daniel G. Dillon, Michael J. Frank

Abstract The dopamine system is thought to provide a reward prediction error signal that facilitates reinforcement learning and reward-based choice in corticostriatal circuits. While it is believed that similar prediction error signals are also provided to temporal lobe memory systems, the impact of such signals on episodic memory encoding has not been fully characterized. Here we develop an incidental memory paradigm that allows us to 1) estimate the influence of reward prediction errors on the formation of episodic memories, 2) dissociate this influence from other factors such as surprise and uncertainty, 3) test the degree to which this influence depends on temporal correspondence between prediction error and memoranda presentation, and 4) determine the extent to which this influence is consolidation-dependent. We find that when choosing to gamble for potential rewards during a primary decision-making task, people encode incidental memoranda more strongly even though they are not aware that their memory will be subsequently probed. Moreover, this strengthened encoding scales with the reward prediction error, and not overall reward, experienced selectively at the time of memoranda presentation (and not before or after). Finally, this strengthened encoding is identifiable within a few minutes and is not substantially enhanced after twenty-four hours, indicating that it is not consolidation-dependent. These results suggest a computationally and temporally specific role for putative dopaminergic reward prediction error signaling in memory formation.
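
A sketch of the kind of analysis this paradigm enables (the simulated data and regression setup are assumptions for illustration, not the authors' pipeline): regress subsequent memory on the signed RPE present when each memorandum appeared, with overall reward as a competing regressor.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
rpe = rng.normal(0, 1, n)                    # trialwise RPE at encoding
reward = rng.binomial(1, 0.5, n)             # trial outcome
p_remember = 1 / (1 + np.exp(-0.8 * rpe))    # simulated RPE-driven encoding
remembered = rng.binomial(1, p_remember)

X = np.column_stack([rpe, reward])
model = LogisticRegression().fit(X, remembered)
print(model.coef_)  # the RPE weight should dominate the reward weight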


2019
Author(s): Emilie Werlen, Soon-Lim Shin, Francois Gastambide, Jennifer Francois, Mark D Tricklebank, ...

Abstract In an uncertain world, the ability to predict and update the relationships between environmental cues and outcomes is a fundamental element of adaptive behaviour. This type of learning is typically thought to depend on prediction error, the difference between expected and experienced events, and in the reward domain this has been closely linked to mesolimbic dopamine. There is also increasing behavioural and neuroimaging evidence that disruption to this process may be a cross-diagnostic feature of several neuropsychiatric and neurological disorders in which dopamine is dysregulated. However, the precise relationship between haemodynamic measures, dopamine and reward-guided learning remains unclear. To help address this issue, we used a translational technique, oxygen amperometry, to record haemodynamic signals in the nucleus accumbens (NAc) and orbitofrontal cortex (OFC) while freely moving rats performed a probabilistic Pavlovian learning task. Using a model-based analysis approach to account for individual variations in learning, we found that the oxygen signal in the NAc correlated with a reward prediction error, whereas in the OFC it correlated with an unsigned prediction error or salience signal. Furthermore, an acute dose of amphetamine, creating a hyperdopaminergic state, disrupted rats’ ability to discriminate between cues associated with either a high or a low probability of reward and concomitantly corrupted prediction error signalling. These results demonstrate parallel but distinct prediction error signals in NAc and OFC during learning, both of which are affected by psychostimulant administration. Furthermore, they establish the viability of tracking and manipulating haemodynamic signatures of reward-guided learning observed in human fMRI studies using a proxy signal for BOLD in a freely behaving rodent.
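
The model-based approach can be sketched as follows (everything here, including the simulated signals, is an assumption for illustration, not the authors' analysis code): fit a simple learner to the cue-outcome sequence, then correlate the resulting signed and unsigned prediction-error regressors with each recorded signal.

import numpy as np

rng = np.random.default_rng(2)
n = 400
cues = rng.integers(0, 2, n)                      # low- vs high-probability cue
rewards = (rng.random(n) < np.where(cues == 1, 0.8, 0.2)).astype(float)

V, alpha = np.zeros(2), 0.15
signed, unsigned = [], []
for c, r in zip(cues, rewards):
    pe = r - V[c]
    signed.append(pe)
    unsigned.append(abs(pe))
    V[c] += alpha * pe

signed, unsigned = np.array(signed), np.array(unsigned)
nac = 0.9 * signed + rng.normal(0, 0.5, n)        # simulated NAc-like signal
ofc = 0.9 * unsigned + rng.normal(0, 0.5, n)      # simulated OFC-like signal
print(np.corrcoef(nac, signed)[0, 1], np.corrcoef(ofc, unsigned)[0, 1])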


2019
Author(s): Melissa J. Sharpe, Hannah M. Batchelor, Lauren E. Mueller, Chun Yun Chang, Etienne J.P. Maes, ...

Abstract Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.


2016
Vol 18 (1), pp. 23-32

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
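
In standard notation (a textbook restatement, not drawn from this article's equations), the prediction error and the value update it drives are:

\[
  \delta_t = r_t - \hat{r}_t, \qquad \hat{r}_{t+1} = \hat{r}_t + \alpha\,\delta_t ,
\]

so that $\delta_t > 0$ signals more reward than predicted, $\delta_t = 0$ a fully predicted reward, and $\delta_t < 0$ less reward than predicted. The nonlinear, utility-like coding described above can be written by passing reward through a utility function $u$, giving $\delta_t = u(r_t) - \mathbb{E}[u(r)]$.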

