Dopamine reward prediction error coding

2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards, an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
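The three cases described in this abstract (positive, zero, and negative prediction error) can be illustrated with a minimal sketch. The function names and the tolerance `tol` are hypothetical and chosen only for illustration; the abstract itself defines nothing beyond "received minus predicted".

```python
def reward_prediction_error(received: float, predicted: float) -> float:
    """Return the difference between received and predicted reward."""
    return received - predicted


def classify(rpe: float, tol: float = 1e-9) -> str:
    """Label the sign of a prediction error.

    `tol` is an illustrative threshold (an assumption, not from the source)
    below which an error is treated as zero.
    """
    if rpe > tol:
        return "positive"   # more reward than predicted
    if rpe < -tol:
        return "negative"   # less reward than predicted
    return "zero"           # fully predicted reward


if __name__ == "__main__":
    for received, predicted in [(1.0, 0.5), (1.0, 1.0), (0.0, 1.0)]:
        rpe = reward_prediction_error(received, predicted)
        print(received, predicted, rpe, classify(rpe))
```

The sketch covers only the sign of the error; the nonlinear scaling of the dopamine signal with reward value is not modelled here.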


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harry J. Stewardson ◽  
Thomas D. Sambrook

Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN’s response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate processes other than the midbrain dopamine system.
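The temporal difference account mentioned in this abstract, in which the prediction error moves to the earliest predictive event, can be sketched with a textbook TD(0) simulation. This is not the study's model: the one-state-per-time-step representation, the function name, and the parameters `n_steps`, `alpha`, and `gamma` are all assumptions made for illustration.

```python
def run_td_conditioning(n_trials=500, n_steps=5, alpha=0.1, gamma=1.0, reward=1.0):
    """Simulate TD(0) learning over repeated conditioning trials.

    Each trial has `n_steps` states following conditioned-stimulus (CS) onset;
    the outcome arrives on the last step. Returns one (cs_error, outcome_error)
    pair per trial.
    """
    # values[t] is the predicted future reward at step t after CS onset;
    # the extra last entry is the terminal state and stays at 0.
    values = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        # CS onset is assumed to be unpredictable, so the pre-CS value is 0
        # and the error at CS onset equals the discounted value of the first state.
        cs_error = gamma * values[0] - 0.0
        delta = 0.0
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            delta = r + gamma * values[t + 1] - values[t]  # TD error at step t
            values[t] += alpha * delta
        # After the loop, delta holds the error at the time of the outcome.
        history.append((cs_error, delta))
    return history


if __name__ == "__main__":
    history = run_td_conditioning()
    print("first trial (CS error, outcome error):", history[0])
    print("last trial  (CS error, outcome error):", history[-1])
```

Under these assumptions the error starts at the outcome and, with training, appears at CS onset instead. Passing a negative `reward` gives the corresponding sketch for an aversive outcome.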





2020 ◽  
Author(s):  
Pramod Kaushik ◽  
Jérémie Naudé ◽  
Surampudi Bapi Raju ◽  
Frédéric Alexandre

Classical conditioning is a fundamental learning mechanism in which the Ventral Striatum is generally thought to be the source of inhibition to Ventral Tegmental Area (VTA) dopamine neurons when a reward is expected. However, recent evidence points to a new candidate, VTA GABA neurons, as encoding the expectation used to compute the reward prediction error in the VTA. In this system-level computational model, the VTA GABA signal is hypothesised to be a combination of magnitude and timing, computed in the Pedunculopontine and Ventral Striatum respectively. This dissociation enables the model to explain recent results in which Ventral Striatum lesions affected the temporal expectation of the reward while leaving the expectation of its magnitude intact. The model also exhibits other features of classical conditioning, namely progressively decreasing firing for early rewards closer to the actual reward, twin peaks of VTA dopamine during training, and cancellation of US dopamine after training.



2010 ◽  
Vol 30 (34) ◽  
pp. 11447-11457 ◽  
Author(s):  
K. Oyama ◽  
I. Hernadi ◽  
T. Iijima ◽  
K.-I. Tsutsui


2014 ◽  
Vol 26 (3) ◽  
pp. 447-458 ◽  
Author(s):  
Ernest Mas-Herrero ◽  
Josep Marco-Pallarés

In decision-making processes, the relevance of the information yielded by outcomes varies across time and situations. It increases when previous predictions are not accurate and in contexts with high environmental uncertainty. Previous fMRI studies have shown an important role of medial pFC in coding both reward prediction errors and the impact of this information to guide future decisions. However, it is unclear whether these two processes are dissociated in time or occur simultaneously, which would suggest that a common mechanism is engaged. In the present work, we studied the modulation of two electrophysiological responses associated with outcome processing (the feedback-related negativity ERP and frontocentral theta oscillatory activity) by the reward prediction error and the learning rate. Twenty-six participants performed two learning tasks differing in the degree of predictability of the outcomes: a reversal learning task and a probabilistic learning task with multiple blocks of novel cue–outcome associations. We implemented a reinforcement learning model to obtain the single-trial reward prediction error and the learning rate for each participant and task. Our results indicated that midfrontal theta activity and feedback-related negativity increased linearly with the unsigned prediction error. In addition, variations of frontal theta oscillatory activity predicted the learning rate across tasks and participants. These results support the existence of a common brain mechanism for the computation of unsigned prediction error and learning rate.
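The abstract does not specify which reinforcement learning model was fitted. As a rough illustration of how single-trial signed and unsigned prediction errors can be derived from a sequence of choices and outcomes, the sketch below uses a simple delta rule with a fixed learning rate. The function name, the fixed `alpha`, and the two-option setup are assumptions; the study estimated the learning rate rather than fixing it.

```python
def single_trial_prediction_errors(choices, outcomes, n_options=2,
                                   alpha=0.3, initial_value=0.0):
    """Return per-trial signed and unsigned prediction errors and final values.

    choices  : index of the option chosen on each trial
    outcomes : outcome received on each trial (e.g. 1.0 for win, 0.0 for no win)
    alpha    : fixed learning rate (illustrative assumption)
    """
    values = [initial_value] * n_options
    signed, unsigned = [], []
    for choice, outcome in zip(choices, outcomes):
        delta = outcome - values[choice]   # signed prediction error
        values[choice] += alpha * delta    # delta-rule value update
        signed.append(delta)
        unsigned.append(abs(delta))        # magnitude regardless of sign
    return signed, unsigned, values


if __name__ == "__main__":
    signed, unsigned, values = single_trial_prediction_errors(
        choices=[0, 0, 1], outcomes=[1.0, 1.0, 0.0], alpha=0.5
    )
    print(signed, unsigned, values)
```

The unsigned series is the kind of trial-by-trial regressor that could be related to an EEG measure such as theta power; how the authors actually did this is not stated in the abstract.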



2018 ◽  
Author(s):  
Anthony I. Jang ◽  
Matthew R. Nassar ◽  
Daniel G. Dillon ◽  
Michael J. Frank

The dopamine system is thought to provide a reward prediction error signal that facilitates reinforcement learning and reward-based choice in corticostriatal circuits. While it is believed that similar prediction error signals are also provided to temporal lobe memory systems, the impact of such signals on episodic memory encoding has not been fully characterized. Here we develop an incidental memory paradigm that allows us to 1) estimate the influence of reward prediction errors on the formation of episodic memories, 2) dissociate this influence from other factors such as surprise and uncertainty, 3) test the degree to which this influence depends on temporal correspondence between prediction error and memoranda presentation, and 4) determine the extent to which this influence is consolidation-dependent. We find that when choosing to gamble for potential rewards during a primary decision-making task, people encode incidental memoranda more strongly even though they are not aware that their memory will be subsequently probed. Moreover, this strengthened encoding scales with the reward prediction error, and not overall reward, experienced selectively at the time of memoranda presentation (and not before or after). Finally, this strengthened encoding is identifiable within a few minutes and is not substantially enhanced after twenty-four hours, indicating that it is not consolidation-dependent. These results suggest a computationally and temporally specific role for putative dopaminergic reward prediction error signaling in memory formation.



2018 ◽  
Author(s):  
Rachel S. Lee ◽  
Marcelo G. Mattar ◽  
Nathan F. Parker ◽  
Ilana B. Witten ◽  
Nathaniel D. Daw

Although midbrain dopamine (DA) neurons have been thought to primarily encode reward prediction error (RPE), recent studies have also found movement-related DAergic signals. For example, we recently reported that DA neurons in mice projecting to dorsomedial striatum are modulated by choices contralateral to the recording side. Here, we introduce, and ultimately reject, a candidate resolution for the puzzling RPE vs movement dichotomy, by showing how seemingly movement-related activity might be explained by an action-specific RPE. By considering both choice and RPE on a trial-by-trial basis, we find that DA signals are modulated by contralateral choice in a manner that is distinct from RPE, implying that choice encoding is better explained by movement direction. This fundamental separation between RPE and movement encoding may help shed light on the diversity of functions and dysfunctions of the DA system.



2019 ◽  
Author(s):  
Melissa J. Sharpe ◽  
Hannah M. Batchelor ◽  
Lauren E. Mueller ◽  
Chun Yun Chang ◽  
Etienne J.P. Maes ◽  
...  

Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.



2014 ◽  
Vol 26 (3) ◽  
pp. 635-644 ◽  
Author(s):  
Olav E. Krigolson ◽  
Cameron D. Hassall ◽  
Todd C. Handy

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, that is, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the brain ERP technique to demonstrate that not only do rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component that has timing and topography similar to those of the feedback error-related negativity and that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.


