Rare Rewards Amplify Dopamine Learning Responses

2019 ◽  
Author(s):  
Kathryn M. Rothenhoefer ◽  
Tao Hong ◽  
Aydin Alikaya ◽  
William R. Stauffer

Dopamine neurons drive learning by coding reward prediction errors (RPEs), which are formalized as subtractions of predicted values from reward values. Subtractions accommodate point estimate predictions of value, such as the average value. However, point estimate predictions fail to capture many features of choice and learning behaviors. For instance, reaction times and learning rates consistently reflect higher moments of probability distributions. Here, we demonstrate that dopamine RPE responses code probability distributions. We presented monkeys with rewards that were drawn from the tails of normal and uniform reward size distributions to generate rare and common RPEs, respectively. Behavioral choices and pupil diameter measurements indicated that monkeys learned faster and registered greater arousal from rare RPEs, compared to common RPEs of identical magnitudes. Dopamine neuron recordings indicated that rare rewards amplified RPE responses. These results demonstrate that dopamine responses reflect probability distributions and suggest a neural mechanism for the amplified learning and enhanced arousal associated with rare events.
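A minimal sketch of the idea, not the authors' fitted model: the standard subtractive RPE (reward minus mean prediction) is scaled by how improbable the delivered reward is under the predicted distribution, so a tail reward from a normal distribution (rare) yields a larger signal than an identically sized reward from a uniform distribution (common). The distributions, reward values, and weighting constant `k` below are illustrative assumptions.

```python
# Hypothetical sketch: a distribution-sensitive RPE, not the authors' fitted model.
# The standard RPE (reward minus mean prediction) is scaled by how improbable the
# reward is under the predicted distribution, so tail ("rare") rewards of a given
# magnitude produce larger signals than common ones of the same magnitude.
from scipy.stats import norm, uniform

def standard_rpe(reward, predicted_mean):
    return reward - predicted_mean

def distribution_weighted_rpe(reward, predicted_dist, predicted_mean, k=1.0):
    # Surprise = negative log-density of the observed reward under the prediction
    # (note: can be negative when the density exceeds 1, as for narrow distributions).
    surprise = -predicted_dist.logpdf(reward)
    return standard_rpe(reward, predicted_mean) * (1.0 + k * surprise)

# Normal vs. uniform reward-size distributions with matched means (assumed values),
# probed with the same tail reward.
normal_dist = norm(loc=0.5, scale=0.1)      # tail rewards are rare
uniform_dist = uniform(loc=0.2, scale=0.6)  # flat: tail rewards are common
tail_reward = 0.75

print(distribution_weighted_rpe(tail_reward, normal_dist, 0.5))   # amplified
print(distribution_weighted_rpe(tail_reward, uniform_dist, 0.5))  # attenuated
```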

2016 ◽  
Author(s):  
Sara Matias ◽  
Eran Lottem ◽  
Guillaume P. Dugué ◽  
Zachary F. Mainen

Serotonin is implicated in mood and affective disorders1,2 but growing evidence suggests that its core endogenous role may be to promote flexible adaptation to changes in the causal structure of the environment3–8. This stems from two functions of endogenous serotonin activation: inhibiting learned responses that are not currently adaptive9,10 and driving plasticity to reconfigure them11–13. These mirror dual functions of dopamine in invigorating reward-related responses and promoting plasticity that reinforces new ones16,17. However, while dopamine neurons are known to be activated by reward prediction errors18,19, consistent with theories of reinforcement learning, the reported firing patterns of serotonin neurons21–23 do not accord with any existing theories1,24,25. Here, we used long-term photometric recordings in mice to study a genetically-defined population of dorsal raphe serotonin neurons whose activity we could link to normal reversal learning. We found that these neurons are activated by both positive and negative prediction errors, thus reporting the kind of surprise signal proposed to promote learning in conditions of uncertainty26,27. Furthermore, by comparing cue responses of serotonin and dopamine neurons we found differences in learning rates that could explain the importance of serotonin in inhibiting perseverative responding. Together, these findings show how the firing patterns of serotonin neurons support a role in cognitive flexibility and suggest a revised model of dopamine-serotonin opponency with potential clinical implications.


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sara Matias ◽  
Eran Lottem ◽  
Guillaume P Dugué ◽  
Zachary F Mainen

Serotonin is implicated in mood and affective disorders. However, growing evidence suggests that a core endogenous role is to promote flexible adaptation to changes in the causal structure of the environment, through behavioral inhibition and enhanced plasticity. We used long-term photometric recordings in mice to study a population of dorsal raphe serotonin neurons, whose activity we could link to normal reversal learning using pharmacogenetics. We found that these neurons are activated by both positive and negative prediction errors, and thus report signals similar to those proposed to promote learning in conditions of uncertainty. Furthermore, by comparing the cue responses of serotonin and dopamine neurons, we found differences in learning rates that could explain the importance of serotonin in inhibiting perseverative responding. Our findings show how the activity patterns of serotonin neurons support a role in cognitive flexibility, and suggest a revised model of dopamine–serotonin opponency with potential clinical implications.
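The contrast drawn in the two abstracts above can be summarized in a small illustrative sketch (toy values, not the recorded data): a signed prediction error, as in standard dopamine/RPE accounts, versus an unsigned |error| that is positive for both better- and worse-than-expected outcomes, the kind of surprise signal these serotonin recordings suggest.

```python
# Minimal sketch (not the authors' model): signed vs. unsigned prediction errors.
def signed_prediction_error(outcome, expectation):
    return outcome - expectation

def unsigned_prediction_error(outcome, expectation):
    # Responds to both better- and worse-than-expected outcomes.
    return abs(outcome - expectation)

# After a reversal (reward moves from cue A to cue B), the positive error on B and
# the negative error on A both produce a positive surprise signal, consistent with
# a role in promoting new learning under uncertainty.
print(signed_prediction_error(1.0, 0.2), unsigned_prediction_error(1.0, 0.2))  # rewarded cue B
print(signed_prediction_error(0.0, 0.8), unsigned_prediction_error(0.0, 0.8))  # unrewarded cue A
```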


2015 ◽  
Vol 113 (4) ◽  
pp. 1110-1123 ◽  
Author(s):  
Benjamin Pasquereau ◽  
Robert S. Turner

The capacity to anticipate the timing of events in a dynamic environment allows us to optimize the processes necessary for perceiving, attending to, and responding to them. Such anticipation requires neuronal mechanisms that track the passage of time and use this representation, combined with prior experience, to estimate the likelihood that an event will occur (i.e., the event's “hazard rate”). Although hazard-like ramps in activity have been observed in several cortical areas in preparation for movement, it remains unclear how such time-dependent probabilities are estimated to optimize response performance. We studied the spiking activity of dopamine neurons in the substantia nigra pars compacta of monkeys during an arm-reaching task for which the foreperiod preceding the “go” signal varied randomly along a uniform distribution. After extended training, the monkeys' reaction times correlated inversely with foreperiod duration, reflecting a progressive anticipation of the go signal according to its hazard rate. Many dopamine neurons modulated their firing rates as predicted by a succession of hazard-related prediction errors. First, as time passed during the foreperiod, slowly decreasing anticipatory activity tracked the elapsed time as if encoding negative prediction errors. Then, when the go signal appeared, a phasic response encoded the temporal unpredictability of the event, consistent with a positive prediction error. Neither the anticipatory nor the phasic signals were affected by the anticipated magnitudes of future reward or effort, or by parameters of the subsequent movement. These results are consistent with the notion that dopamine neurons encode hazard-related prediction errors independently of other information.
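For reference, a minimal sketch of the hazard rate invoked above, assuming the foreperiod is drawn uniformly from an interval [t_min, t_max] (the numerical bounds below are assumptions, not the task's actual values): the conditional probability that the go signal occurs now, given that it has not yet occurred, rises as the foreperiod elapses, which is why reaction times shorten with longer waits.

```python
# Hazard rate h(t) = f(t) / (1 - F(t)) for a uniform foreperiod distribution.
import numpy as np

def uniform_hazard_rate(t, t_min, t_max):
    pdf = np.where((t >= t_min) & (t < t_max), 1.0 / (t_max - t_min), 0.0)
    survival = np.clip((t_max - t) / (t_max - t_min), 1e-9, 1.0)  # 1 - CDF
    return pdf / survival

times = np.linspace(1.0, 2.9, 5)             # foreperiods between 1 and 3 s (assumed)
print(uniform_hazard_rate(times, 1.0, 3.0))  # hazard grows toward the end of the interval
```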


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Hideyuki Matsumoto ◽  
Ju Tian ◽  
Naoshige Uchida ◽  
Mitsuko Watabe-Uchida

Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically-identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering the contexts to examine the representation in dopamine neurons and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.
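A hedged illustration of the common-currency idea, not the study's fitted model: the value predicted by a cue that signals both reward and punishment is their probability-weighted sum on a single scale, and the value prediction error is the delivered outcome's value minus that prediction. The probabilities and magnitudes below are assumptions.

```python
# Illustrative common-currency value prediction error (VPE); values are assumed.
def integrated_value(p_reward, reward_value, p_punish, punish_cost):
    # Expected value combines both outcomes on one scale; punishment counts as negative.
    return p_reward * reward_value - p_punish * punish_cost

def value_prediction_error(outcome_value, predicted_value):
    return outcome_value - predicted_value

cue_value = integrated_value(p_reward=0.5, reward_value=1.0, p_punish=0.5, punish_cost=0.5)
print(cue_value)                                # 0.25: the odor is mildly positive overall
print(value_prediction_error(1.0, cue_value))   # reward delivered: positive VPE
print(value_prediction_error(-0.5, cue_value))  # punishment delivered: negative VPE
```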


2019 ◽  
Author(s):  
Erdem Pulcu

We are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty1 (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in behavioural2,3 and neural sciences4–6, as well as machine learning7,8. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), that can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.
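One way to realize such a nonlinear coupling between prediction errors and learning rates is sketched below; the sigmoid form and its parameters are assumptions for illustration, not the paper's proposed update rule. Small errors produce slow, stable updating, while large, surprising errors trigger rapid revision of beliefs.

```python
# Illustrative sketch only: learning rate as a nonlinear (sigmoid) function of |PE|.
import numpy as np

def nonlinear_learning_rate(prediction_error, slope=4.0, midpoint=0.5,
                            lr_min=0.05, lr_max=0.9):
    # Small errors -> slow, stable learning; large errors -> fast belief updates.
    surprise = abs(prediction_error)
    return lr_min + (lr_max - lr_min) / (1.0 + np.exp(-slope * (surprise - midpoint)))

def update_belief(value, outcome):
    pe = outcome - value
    return value + nonlinear_learning_rate(pe) * pe

value = 0.5
for outcome in [0.55, 0.45, 1.0, 1.0, 1.0]:  # stable period, then a sudden change
    value = update_belief(value, outcome)
    print(round(value, 3))
```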


2015 ◽  
Vol 113 (1) ◽  
pp. 200-205 ◽  
Author(s):  
Kenneth T. Kishida ◽  
Ignacio Saez ◽  
Terry Lohrenz ◽  
Mark R. Witcher ◽  
Adrian W. Laxton ◽  
...  

In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson’s disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in human striatum encodes reward prediction errors (RPEs), the difference between actual and expected outcomes, during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson’s disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Contrary to what a large body of work in model organisms would anticipate, dopamine fluctuations in the striatum did not encode RPEs alone. Instead, subsecond dopamine fluctuations encoded an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine the actual and counterfactual is unknown. One possibility is that this process is the normal behavior of reward-processing dopamine neurons, which previously had not been tested by experiments in animal models. Alternatively, this superposition of error terms may result from an additional yet-to-be-identified subclass of dopamine neurons.
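An illustrative sketch of the superposed error terms described above (the equal weighting is an assumption, not the study's fitted combination): the reward prediction error compares the outcome with what was expected, while the counterfactual prediction error compares it with what the forgone alternative would have paid.

```python
# Illustrative superposition of actual and counterfactual error terms; weights assumed.
def reward_prediction_error(outcome, expected):
    return outcome - expected

def counterfactual_prediction_error(outcome, best_alternative):
    # Positive when the experienced outcome beats the forgone alternative.
    return outcome - best_alternative

def combined_error(outcome, expected, best_alternative, w_cf=1.0):
    return reward_prediction_error(outcome, expected) + \
           w_cf * counterfactual_prediction_error(outcome, best_alternative)

# A modest win registers differently depending on what the bet could have returned.
print(combined_error(outcome=10, expected=0, best_alternative=10))   # no forgone gain
print(combined_error(outcome=10, expected=0, best_alternative=100))  # large forgone gain
```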


1976 ◽  
Vol 65 (1) ◽  
pp. 157-177 ◽  
Author(s):  
P. W. Webb

The fast-start (acceleration) performance of seven groups of rainbow trout from 9.6 to 38.7 cm total length was measured in response to d.c. electric shock stimuli. Two fast-start kinematic patterns, L- and S-start, were observed. In L-starts the body was bent into an L or U shape and a recoil turn normally accompanied acceleration. Free manoeuvre was not possible in L-starts without loss of speed. In S-starts the body was bent into an S-shape and fish accelerated without a recoil turn. The frequency of S-starts increased with size from 0 for the smallest fish to 60–65% for the largest fish. Acceleration turns were common. The radius of smallest turn for both fast-start patterns was proportional to length (L) with an overall radius of 0.17 L. The duration of the primary acceleration stages increased with size from 0.07 s for the group of smallest fish to 0.10 s for the group of largest fish. Acceleration rates were independent of size. The overall mean maximum rate was 3438 cm/s² and the average value to the end of the primary acceleration movements was 1562 cm/s². The distance covered and velocity attained after a given time for fish accelerating from rest were independent of size. The results are discussed in the context of interactions between a predator and prey fish following initial approach by the predator. It is concluded that the outcome of an interaction is likely to depend on reaction times of interacting fish responding to manoeuvres initiated by the predator or prey. The prey reaction time results in the performance of the predator exceeding that of the prey at any instant. The predator reaction time and predator error in responses to unpredictable prey manoeuvre are required for prey escape. It is predicted that a predator should strike the prey within 0.1 s if the fish are initially 5–15 cm apart as reported in the literature for predator-prey interactions. These distances would be increased for non-optimal prey escape behaviour and when the prey body was more compressed or depressed than the predator.


2009 ◽  
Vol 102 (6) ◽  
pp. 3384-3391 ◽  
Author(s):  
Vivian V. Valentin ◽  
John P. O'Doherty

Prediction error signals have been reported in human imaging studies in target areas of dopamine neurons such as ventral and dorsal striatum during learning with many different types of reinforcers. However, a key question that has yet to be addressed is whether prediction error signals recruit distinct or overlapping regions of striatum and elsewhere during learning with different types of reward. To address this, we scanned 17 healthy subjects with functional magnetic resonance imaging while they chose actions to obtain either a pleasant juice reward (1 ml apple juice), or a monetary gain (5 cents) and applied a computational reinforcement learning model to subjects' behavioral and imaging data. Evidence for an overlapping prediction error signal during learning with juice and money rewards was found in a region of dorsal striatum (caudate nucleus), while prediction error signals in a subregion of ventral striatum were significantly stronger during learning with money but not juice reward. These results provide evidence for partially overlapping reward prediction signals for different types of appetitive reinforcers within the striatum, a finding with important implications for understanding the nature of associative encoding in the striatum as a function of reinforcer type.
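For context, a minimal sketch of the kind of reinforcement learning model typically fit to trial-by-trial choices to generate prediction error regressors for imaging analyses; the learning rate here is an assumed value, not the study's fitted parameter.

```python
# Minimal Rescorla-Wagner-style sketch producing trial-by-trial prediction errors.
def rescorla_wagner(outcomes, alpha=0.2, v0=0.0):
    value, prediction_errors = v0, []
    for r in outcomes:
        pe = r - value        # prediction error on this trial
        value += alpha * pe   # value update with learning rate alpha
        prediction_errors.append(pe)
    return prediction_errors

# Outcomes coded 1 (juice or money delivered) / 0 (nothing), per trial.
print(rescorla_wagner([1, 1, 0, 1, 0, 0, 1]))
```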


1992 ◽  
Vol 13 (9) ◽  
pp. 553-555 ◽  
Author(s):  
Leon F. Burmeister ◽  
David Birnbaum ◽  
Samuel B. Sheps

A variety of statistical tests of a null hypothesis commonly are used in biomedical studies. While these tests are the mainstay for justifying inferences drawn from data, they have important limitations. This report discusses the relative merits of two different approaches to data analysis and display, and recommends the use of confidence intervals rather than classic hypothesis testing. Formulae for a confidence interval surrounding the point estimate of an average value take the form: d = ±zσ/√n, where “d” represents the average difference between central and extreme values, “z” is derived from the density function of a known distribution, and “σ/√n” represents the magnitude of sampling variability. Transposition of terms yields the familiar formula for hypothesis testing of normally distributed data (without applying the finite population correction factor): z = d/(σ/√n).
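A short worked example of the two formulae above, using assumed values for the mean, standard deviation, and sample size:

```python
# Confidence interval d = ±z·σ/√n around a point estimate, and the transposed
# hypothesis-test statistic z = d / (σ/√n). All numeric inputs are illustrative.
import math

def confidence_interval(mean, sigma, n, z=1.96):  # z = 1.96 gives a 95% interval
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

def z_statistic(d, sigma, n):
    return d / (sigma / math.sqrt(n))

print(confidence_interval(mean=10.0, sigma=2.0, n=25))  # (9.216, 10.784)
print(z_statistic(d=1.0, sigma=2.0, n=25))              # 2.5
```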


Author(s):  
J H Ham ◽  
S B Choi

This article presents a new sliding mode controller (SMC) for the position control of a robotic manipulator subjected to perturbations, such as parameter uncertainties and extraneous disturbances. The SMC is designed so that the sliding mode condition is satisfied and integrated with the perturbation estimator. The estimator is formulated by adopting a concept of the integrated average value of the imposed perturbation over a certain sampling period and realized using the Taylor series. In the formulation of the estimator, the relationship between control performance and sensor performance is established by adjusting the sampling ratio. Subsequently, in order to improve control performance, the actuating condition for the estimator is introduced: on-off switching condition (OSC). This condition is decided on the basis of the estimation error between actual and predicted values. By imposing the OSC, control accuracy can be enhanced when high frequency perturbations exist in the system. The benefits of the proposed methodology are demonstrated on a two-link planar manipulator. The position control performances of the manipulator are evaluated and compared between the proposed methodology and conventional control schemes.
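A heavily simplified sketch of the general scheme, assuming a one-degree-of-freedom plant rather than the article's two-link manipulator: the perturbation is estimated from the previous sample's measured acceleration and control input, the estimate is refreshed only when the estimation error exceeds a threshold (the on-off switching condition), and a smoothed switching term enforces the sliding mode. Gains, plant parameters, and the threshold are assumptions, not the article's design values.

```python
# Hedged sketch: sliding mode control with a one-step perturbation estimator
# for a 1-DOF plant m*x'' = u + d. Not the article's two-link implementation.
import numpy as np

m, dt, lam, K, threshold = 1.0, 0.001, 5.0, 2.0, 0.05
x, v, d_hat, u_prev, a_prev = 0.0, 0.0, 0.0, 0.0, 0.0
x_des = 1.0

for k in range(5000):
    t = k * dt
    d = 0.5 * np.sin(20 * t)                   # unknown perturbation
    e, e_dot = x - x_des, v                    # tracking errors (constant setpoint)
    s = e_dot + lam * e                        # sliding variable
    d_new = m * a_prev - u_prev                # perturbation estimate from last sample
    if abs(d_new - d_hat) > threshold:         # on-off switching condition (OSC)
        d_hat = d_new
    u = -m * lam * e_dot - d_hat - K * np.tanh(s / 0.01)  # smoothed switching term
    a = (u + d) / m                            # plant dynamics
    v += a * dt
    x += v * dt
    u_prev, a_prev = u, a

print(round(x, 3))                             # position settles near x_des = 1.0
```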

