Dopamine transients delivered in learning contexts do not act as model-free prediction errors

AbstractDopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquired value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.

Download Full-text

Dopamine transients do not act as model-free prediction errors during associative learning

Nature Communications ◽

10.1038/s41467-019-13953-1 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 6

Author(s):

Melissa J. Sharpe ◽

Hannah M. Batchelor ◽

Lauren E. Mueller ◽

Chun Yun Chang ◽

Etienne J. P. Maes ◽

...

Keyword(s):

Reinforcement Learning ◽

Associative Learning ◽

Prediction Error ◽

Intrinsic Value ◽

Learning Algorithms ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Excess Value

AbstractDopamine neurons are proposed to signal the reward prediction error in model-free reinforcement learning algorithms. This term represents the unpredicted or ‘excess’ value of the rewarding event, value that is then added to the intrinsic value of any antecedent cues, contexts or events. To support this proposal, proponents cite evidence that artificially-induced dopamine transients cause lasting changes in behavior. Yet these studies do not generally assess learning under conditions where an endogenous prediction error would occur. Here, to address this, we conducted three experiments where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into associations with the later events, whether valueless cues or valued rewards. These results show that in learning situations appropriate for the appearance of a prediction error, dopamine transients support associative, rather than model-free, learning.

Download Full-text

Rethinking dopamine as generalized prediction error

10.1101/239731 ◽

2017 ◽

Cited By ~ 2

Author(s):

Matthew P.H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

AbstractMidbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.

Download Full-text

Rethinking dopamine as generalized prediction error

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2018.1645 ◽

2018 ◽

Vol 285 (1891) ◽

pp. 20181645 ◽

Cited By ~ 32

Author(s):

Matthew P. H. Gardner ◽

Geoffrey Schoenbaum ◽

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Model Free ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Sensory Prediction ◽

Lines Of Evidence ◽

Midbrain Dopamine Neurons

Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here, we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and RPEs, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.

Download Full-text

How We Learn to Make Decisions: Rapid Propagation of Reinforcement Learning Prediction Errors in Humans

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00509 ◽

2014 ◽

Vol 26 (3) ◽

pp. 635-644 ◽

Cited By ~ 38

Author(s):

Olav E. Krigolson ◽

Cameron D. Hassall ◽

Todd C. Handy

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Human Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Neural Basis ◽

Error Related Negativity ◽

Reward Positivity ◽

Reward Prediction ◽

Feedback Error

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors—discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has a similar timing and topography as the feedback error-related negativity that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.

Download Full-text

Reward prediction error in the ERP following unconditioned aversive stimuli

Scientific Reports ◽

10.1038/s41598-021-99408-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Harry J. Stewardson ◽

Thomas D. Sambrook

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Temporal Difference ◽

Dopamine System ◽

Reward Prediction Error ◽

Reward Prediction ◽

Midbrain Dopamine ◽

Human Participants

AbstractReinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate other processes than the midbrain dopamine system.

Download Full-text

What is dopamine doing in model-based reinforcement learning?

10.31234/osf.io/z2fmw ◽

2020 ◽

Author(s):

Thomas Akam ◽

Mark Walton

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

State Representation ◽

Dopaminergic Activity ◽

Model Based ◽

Model Free ◽

Reward Prediction ◽

Predictive State Representation ◽

Teaching Signal

Experiments have implicated dopamine in model-based reinforcement learning (RL). These findings are unexpected as dopamine is thought to encode a reward prediction error (RPE), which is the key teaching signal in model-free RL. Here we examine two possible accounts for dopamine’s involvement in model-based RL: the first that dopamine neurons carry a prediction error used to update a type of predictive state representation called a successor representation, the second that two well established aspects of dopaminergic activity, RPEs and surprise signals, can together explain dopamine’s involvement in model-based RL.

Download Full-text

Prefrontal solution to the bias-variance tradeoff during reinforcement learning

10.1101/2020.12.23.424258 ◽

2020 ◽

Author(s):

Dongjae Kim ◽

Jaeseung Jeong ◽

Sang Wan Lee

Keyword(s):

Adaptive Control ◽

Reinforcement Learning ◽

Prediction Error ◽

Brain Regions ◽

Decision Task ◽

Prediction Errors ◽

Model Based ◽

Model Free ◽

Bias Variance ◽

The Brain

AbstractThe goal of learning is to maximize future rewards by minimizing prediction errors. Evidence have shown that the brain achieves this by combining model-based and model-free learning. However, the prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy’s performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts the baseline correction for prediction error reduces the lower bound of the bias–variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.One sentence summaryA theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.

Download Full-text

Dopamine reward prediction error coding

Dialogues in Clinical Neuroscience ◽

10.31887/dcns.2016.18.1/wschultz ◽

2016 ◽

Vol 18 (1) ◽

pp. 23-32 ◽

Cited By ~ 71

Keyword(s):

Prediction Error ◽

Dopamine Neurons ◽

Prediction Errors ◽

Reward Prediction Error ◽

Reward Prediction ◽

Negative Prediction ◽

Baseline Activity ◽

Error Coding ◽

Reward Value ◽

Dopamine Signal

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionary beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.

Download Full-text

Dopamine, Inference, and Uncertainty

Neural Computation ◽

10.1162/neco_a_01023 ◽

2017 ◽

Vol 29 (12) ◽

pp. 3311-3326 ◽

Cited By ~ 17

Author(s):

Samuel J. Gershman

Keyword(s):

Reinforcement Learning ◽

Learning Rule ◽

Dopamine Neurons ◽

Prediction Errors ◽

Stimulus Representation ◽

Reward Prediction ◽

Probabilistic Beliefs ◽

Bayesian Reinforcement Learning ◽

Simple Error ◽

The Relationship

The hypothesis that the phasic dopamine response reports a reward prediction error has become deeply entrenched. However, dopamine neurons exhibit several notable deviations from this hypothesis. A coherent explanation for these deviations can be obtained by analyzing the dopamine response in terms of Bayesian reinforcement learning. The key idea is that prediction errors are modulated by probabilistic beliefs about the relationship between cues and outcomes, updated through Bayesian inference. This account can explain dopamine responses to inferred value in sensory preconditioning, the effects of cue preexposure (latent inhibition), and adaptive coding of prediction errors when rewards vary across orders of magnitude. We further postulate that orbitofrontal cortex transforms the stimulus representation through recurrent dynamics, such that a simple error-driven learning rule operating on the transformed representation can implement the Bayesian reinforcement learning update.

Download Full-text

Behavioural and computational evidence for memory consolidation biased by reward-prediction errors

10.1101/716290 ◽

2019 ◽

Author(s):

Emma L. Roscow ◽

Matthew W. Jones ◽

Nathan F. Lepora

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Learning Task ◽

Male Rats ◽

Adaptive Behaviour ◽

Prediction Errors ◽

Reward Prediction Error ◽

Reward Prediction ◽

Per Se ◽

Reinforcement Learning Model

AbstractNeural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based rein-forcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats’ behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared to the same model with no replay; random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.

Download Full-text