Learning with reward prediction errors in a model of the Drosophila mushroom body

2019 ◽  
Author(s):  
James E. M. Bennett ◽  
Andrew Philippides ◽  
Thomas Nowotny

Abstract
Effective decision making in a changing environment demands that accurate predictions are learned about decision outcomes. In Drosophila, such learning is orchestrated in part by the mushroom body (MB), where dopamine neurons (DANs) signal reinforcing stimuli to modulate plasticity presynaptic to MB output neurons (MBONs). Here, we extend previous MB models, in which DANs signal absolute rewards, proposing instead that DANs signal reward prediction errors (RPEs) by utilising feedback reward predictions from MBONs. We formulate plasticity rules that minimise RPEs, and use simulations to verify that MBONs learn accurate reward predictions. We postulate as yet unobserved connectivity, which not only overcomes limitations in the experimentally constrained model, but also explains additional experimental observations that connect MB physiology to learning. The original, experimentally constrained model and the augmented model capture a broad range of established fly behaviours, and together make five predictions that can be tested using established experimental methods.
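The core computation proposed here can be stated compactly: the DAN carries the difference between the delivered reward and the MBON's feedback prediction, and that difference gates plasticity at Kenyon cell (KC) to MBON synapses. The sketch below is a minimal illustration of that idea, not the paper's fitted model; the network size, learning rate, and sign convention of the weight change are assumptions (in the fly, DAN-gated plasticity is typically depression of KC-to-MBON synapses).

```python
import numpy as np

rng = np.random.default_rng(0)

n_kc = 100                    # number of Kenyon cells (illustrative)
w = np.zeros(n_kc)            # KC -> MBON synaptic weights
lr = 0.05                     # learning rate (illustrative)

odor = (rng.random(n_kc) < 0.1).astype(float)   # sparse KC pattern for one odor
reward = 1.0                                     # reward paired with that odor

for trial in range(50):
    mbon = w @ odor           # MBON output, read as a feedback reward prediction
    rpe = reward - mbon       # hypothesised DAN signal: reward prediction error
    w += lr * rpe * odor      # error-gated plasticity at active KC synapses

print(f"learned prediction: {w @ odor:.3f} (target {reward})")
```

Each pairing shrinks the error, so the MBON prediction converges to the delivered reward; this is the sense in which the plasticity rule "minimises RPEs".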

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
James E. M. Bennett ◽  
Andrew Philippides ◽  
Thomas Nowotny

Abstract
Effective decision making in a changing environment demands that accurate predictions are learned about decision outcomes. In Drosophila, such learning is orchestrated in part by the mushroom body, where dopamine neurons signal reinforcing stimuli to modulate plasticity presynaptic to mushroom body output neurons. Building on previous mushroom body models, in which dopamine neurons signal absolute reinforcement, we propose instead that dopamine neurons signal reinforcement prediction errors by utilising feedback reinforcement predictions from output neurons. We formulate plasticity rules that minimise prediction errors, verify that output neurons learn accurate reinforcement predictions in simulations, and postulate connectivity that explains more physiological observations than an experimentally constrained model. The constrained and augmented models reproduce a broad range of conditioning and blocking experiments, and we demonstrate that the absence of blocking does not imply the absence of prediction error dependent learning. Our results provide five predictions that can be tested using established experimental methods.
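Blocking is the classic behavioural signature of learning from a single shared prediction error: once cue A fully predicts the reward, the error on compound A+B trials is near zero, so B gains little associative strength. The Rescorla-Wagner-style sketch below reproduces that baseline expectation (learning rate and trial counts are arbitrary). As the abstract notes, however, a failure to observe blocking does not by itself rule out prediction-error-dependent learning, for example if errors are computed separately per cue or per compartment rather than pooled.

```python
lr, reward = 0.2, 1.0
V = {"A": 0.0, "B": 0.0}                 # associative strengths of the two cues

def trial(cues):
    pred = sum(V[c] for c in cues)       # prediction pooled over the cues present
    delta = reward - pred                # single shared prediction error
    for c in cues:
        V[c] += lr * delta               # every present cue learns from the same error

for _ in range(40):
    trial(["A"])                         # phase 1: A alone is paired with reward
for _ in range(40):
    trial(["A", "B"])                    # phase 2: the compound A+B is paired with reward

print(V)   # V["A"] is near 1, V["B"] stays near 0: learning about B is "blocked"
```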


2008 ◽  
Vol 20 (12) ◽  
pp. 3034-3054 ◽  
Author(s):  
Elliot A. Ludvig ◽  
Richard S. Sutton ◽  
E. James Kehoe

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
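A simplified rendering of the microstimulus idea helps make it concrete: each stimulus onset launches a decaying memory trace, a bank of basis functions over that trace yields microstimuli that grow weaker and more spread out with elapsed time, and TD(lambda) learns a value weight for each microstimulus. The sketch below implements only a conditioned stimulus followed by a reward (in the full model the reward also spawns its own microstimuli); the number of microstimuli, decay rates, and TD parameters are illustrative, not the paper's fitted values.

```python
import numpy as np

n_micro, sigma = 10, 0.08                 # basis functions per stimulus (illustrative)
centers = np.linspace(1.0, 0.1, n_micro)  # centres over the height of the memory trace
gamma, lam, lr = 0.97, 0.9, 0.05          # TD parameters (illustrative)

def microstimuli(steps_since_onset, decay=0.985):
    """Microstimulus levels a given number of steps after stimulus onset."""
    if steps_since_onset < 0:
        return np.zeros(n_micro)
    trace = decay ** steps_since_onset                        # decaying memory trace
    return trace * np.exp(-(trace - centers) ** 2 / (2 * sigma ** 2))

T, cs_on, reward_at = 200, 20, 120        # trial layout in time steps (illustrative)
w = np.zeros(n_micro)                     # value weight per microstimulus

for episode in range(200):
    e = np.zeros(n_micro)                 # eligibility traces
    x_prev = np.zeros(n_micro)
    for t in range(T):
        x = microstimuli(t - cs_on)
        r = 1.0 if t == reward_at else 0.0
        delta = r + gamma * (w @ x) - (w @ x_prev)   # TD error: the dopamine-like signal
        e = gamma * lam * e + x_prev                 # accumulate eligibility for the prior state
        w += lr * delta * e
        x_prev = x
```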


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Katrin Vogt ◽  
Christopher Schnaitmann ◽  
Kristina V Dylla ◽  
Stephan Knapek ◽  
Yoshinori Aso ◽  
...  

In nature, animals form memories associating reward or punishment with stimuli from different sensory modalities, such as smells and colors. It is unclear, however, how distinct sensory memories are processed in the brain. We established appetitive and aversive visual learning assays for Drosophila that are comparable to the widely used olfactory learning assays. These assays share critical features, such as reinforcing stimuli (sugar reward and electric shock punishment), and allow direct comparison of the cellular requirements for visual and olfactory memories. We found that the same subsets of dopamine neurons drive formation of both sensory memories. Furthermore, distinct yet partially overlapping subsets of mushroom body intrinsic neurons are required for visual and olfactory memories. Thus, our results suggest that distinct sensory memories are processed in a common brain center. Such centralization of related brain functions is an economical design that avoids the repetition of similar circuit motifs.


2019 ◽  
Author(s):  
Melissa J. Sharpe ◽  
Hannah M. Batchelor ◽  
Lauren E. Mueller ◽  
Chun Yun Chang ◽  
Etienne J.P. Maes ◽  
...  

Abstract
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
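The distinction the experiments target can be made concrete with a toy contrast: under a model-free reading, an artificially induced dopamine transient behaves like a reward prediction error and deposits "cached" value directly on the antecedent cue, whereas under a value-independent associative reading, the cue is linked to what follows it without itself becoming valuable. The sketch below is purely conceptual and assumes nothing about the optogenetic protocols used in the studies.

```python
lr = 0.3
cached_value = {"cue": 0.0}         # model-free account: value stored on the cue itself

def dopamine_transient(cue, magnitude=1.0):
    # model-free reading: the transient acts as an RPE, so the cue caches value
    cached_value[cue] += lr * (magnitude - cached_value[cue])

for _ in range(10):
    dopamine_transient("cue")
print(cached_value)                 # the cue becomes "intrinsically valuable" (~0.97)

# value-independent reading: the cue is linked to a specific outcome or other cue,
# with no value deposited on the cue itself
associations = {("cue", "outcome"): 0.0}
associations[("cue", "outcome")] += lr * 1.0
```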


2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
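The three response patterns described above follow directly from the definition of the prediction error as received minus predicted value. The short example below also folds in a concave utility to illustrate nonlinear scaling with reward size; the square-root utility is an illustrative assumption, not the function reported in the literature.

```python
import math

def utility(reward_size):
    return math.sqrt(reward_size)          # assumed concave utility (illustrative)

def prediction_error(received, predicted):
    return utility(received) - utility(predicted)

print(prediction_error(4.0, 1.0))   # positive: more reward than predicted (activation)
print(prediction_error(1.0, 1.0))   # zero: fully predicted reward (baseline activity)
print(prediction_error(0.25, 1.0))  # negative: less reward than predicted (depression)
```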


2021 ◽  
Vol 118 (42) ◽  
pp. e2023674118
Author(s):  
Jia Jia ◽  
Lei He ◽  
Junfei Yang ◽  
Yichun Shuai ◽  
Jingjing Yang ◽  
...  

Chronic stress can induce severe cognitive impairments. Despite extensive investigations in mammalian models, the underlying mechanisms remain obscure. Here, we show that chronic stress can induce dramatic learning and memory deficits in Drosophila melanogaster. The chronic stress-induced learning deficit (CSLD) is long lasting and associated with other depression-like behaviors. We demonstrate that excessive dopaminergic activity provokes susceptibility to CSLD. Remarkably, a pair of PPL1-γ1pedc dopaminergic neurons that project to the mushroom body (MB) γ1pedc compartment plays a key role in regulating susceptibility to CSLD, such that stress-induced PPL1-γ1pedc hyperactivity facilitates the development of CSLD. Consistently, the mushroom body output neurons (MBONs) of the γ1pedc compartment, the MBON-γ1pedc>α/β neurons, are important for modulating susceptibility to CSLD. Imaging studies showed that dopaminergic activity is necessary to provoke the development of chronic stress-induced maladaptations in the MB network. Together, our data support a model in which PPL1-γ1pedc neurons convey chronic stress signals to drive allostatic maladaptations in the MB network that lead to CSLD.


2014 ◽  
Vol 26 (3) ◽  
pp. 635-644 ◽  
Author(s):  
Olav E. Krigolson ◽  
Cameron D. Hassall ◽  
Todd C. Handy

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors: discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with timing and topography similar to those of the feedback error-related negativity, and which increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further evidence that the computations that underlie human learning and decision-making follow reinforcement learning principles.
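The qualitative pattern reported here, a feedback-locked error that shrinks with learning while a choice-locked signal grows, falls out of a simple value-learning model. The minimal sketch below illustrates this; it is not the authors' implemented model, and the learning rate and trial counts are arbitrary.

```python
lr, reward = 0.2, 1.0
V = 0.0                                     # learned value of the chosen option

for trial in range(1, 21):
    choice_pe = V - 0.0                     # error at choice presentation (vs. no prior expectation)
    feedback_pe = reward - V                # error at reward delivery
    V += lr * feedback_pe                   # simple delta-rule value update
    if trial in (1, 5, 10, 20):
        print(trial, round(choice_pe, 3), round(feedback_pe, 3))
```

With learning, the printed feedback-locked error decays toward zero while the choice-locked term approaches the reward value, mirroring the reported decrease of the feedback error-related negativity and growth of the reward positivity.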


2019 ◽  
Author(s):  
Kristin M. Scaplen ◽  
Mustafa Talay ◽  
Sarah Salamon ◽  
Kavin M. Nuñez ◽  
Amanda G. Waterman ◽  
...  

Abstract
Substance use disorders are chronic relapsing disorders often impelled by enduring memories and persistent cravings. Alcohol, as well as other addictive substances, remolds neural circuits important for memory to establish obstinate preference despite aversive consequences. How pertinent circuits are selected and shaped to result in these unchanging, inflexible memories is unclear. Using neurogenetic tools available in Drosophila melanogaster, we define how circuits required for alcohol-associated preference shift from population-level dopaminergic activation to select dopamine neurons that predict behavioral choice. During memory expression, these dopamine neurons directly, and indirectly via the mushroom body (MB), modulate the activity of interconnected glutamatergic and cholinergic output neurons. Transsynaptic tracing of these output neurons revealed at least two regions of convergence: 1) a center of memory consolidation within the MB implicated in arousal, and 2) a structure outside the MB implicated in integration of naïve and learned responses. These findings provide a circuit framework through which dopamine neuron activation shifts from reward delivery to cue onset, and provide insight into the inflexible, maladaptive nature of alcohol-associated memories.


2018 ◽  
Author(s):  
Stefania Sarno ◽  
Manuel Beirán ◽  
José Vergara ◽  
Román Rossi-Pool ◽  
Ranulfo Romo ◽  
...  

Abstract
Dopamine neurons produce reward-related signals that regulate learning and guide behavior. Prior expectations about forthcoming stimuli and internal biases can alter perception and choices and thus could influence dopamine signaling. We tested this hypothesis by studying dopamine neurons recorded in monkeys trained to discriminate between two tactile frequencies separated by a delay period, a task affected by the contraction bias. The bias greatly controlled the animals’ choices and their confidence in their decisions. During decision formation, the phasic activity reflected bias-induced modulations and simultaneously coded reward prediction errors. In contrast, the activity during the delay period was not affected by the bias and was not tuned to the value of the stimuli, but was temporally modulated, pointing to a role different from that of the phasic activity.


2017 ◽  
Author(s):  
Matthew P.H. Gardner ◽  
Geoffrey Schoenbaum ◽  
Samuel J. Gershman

Abstract
Midbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. This account remains consistent with current canon regarding the correspondence between dopamine transients and reward prediction errors, while also accounting for new data suggesting a role for these signals in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
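One simple way to realise a prediction error that spans both reward and sensory identity is to learn predictions over a small feature vector and let each feature's error drive learning; whether this matches the paper's specific formulation is not established here, so the sketch below is only a hedged illustration with hypothetical cue and flavour features. It also shows why such an error can support identity unblocking: when the flavour of an expected reward changes, the reward component of the error stays near zero while the sensory components do not.

```python
import numpy as np

cues = {"light_A": np.array([1.0, 0.0]), "light_B": np.array([0.0, 1.0])}
outcome_features = ["reward", "banana_flavor", "grape_flavor"]   # hypothetical features
W = np.zeros((len(outcome_features), 2))    # maps a cue to predicted outcome features
lr = 0.3

def trial(cue, outcome):
    x = cues[cue]
    error = outcome - W @ x                 # vector error: reward *and* sensory components
    W[:] += lr * np.outer(error, x)
    return error

# pair light_A with a banana-flavoured reward
for _ in range(20):
    trial("light_A", np.array([1.0, 1.0, 0.0]))

# switch to a grape-flavoured reward of the same value ("identity unblocking"):
# the reward component of the error is ~0, but the sensory components are not,
# so learning about the new identity can still proceed.
print(np.round(trial("light_A", np.array([1.0, 0.0, 1.0])), 2))
```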

