Ramping and State Uncertainty in the Dopamine Signal

2019 ◽  
Author(s):  
John G. Mikhael ◽  
HyungGoo R. Kim ◽  
Naoshige Uchida ◽  
Samuel J. Gershman

Abstract
Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus reliably predicts a reward with fixed delay, dopamine activity during the delay period and at reward time should converge to baseline through learning. However, recent studies have found that dopamine exhibits a gradual ramp before reward in certain conditions even after extensive learning, such as when animals are trained to run to obtain the reward, thus challenging the conventional RPE models. In this work, we begin with the limitation of temporal uncertainty (animals cannot perfectly estimate time to reward), and show that sensory feedback, which reduces this uncertainty, will cause an unbiased learner to produce RPE ramps. On the other hand, in the absence of feedback, RPEs will be flat after learning. These results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.
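For orientation, here is a minimal tabular TD(0) sketch of the conventional account that the abstract contrasts against: with a perfectly timed sub-state representation between cue and reward and no sensory feedback, the RPE converges toward zero both during the delay and at reward time. The number of sub-states, discount factor, and learning rate are illustrative assumptions; this is not the authors' uncertainty model, only the flat-RPE baseline it departs from.

```python
import numpy as np

# Conventional TD(0) over a tapped delay line: cue at t=0, reward at t=T.
T, gamma, alpha, n_trials = 10, 0.95, 0.1, 500
V = np.zeros(T + 1)            # value of each sub-state between cue and reward

for _ in range(n_trials):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0          # reward arrives on the last step
        delta = r + gamma * V[t + 1] - V[t]     # reward prediction error
        V[t] += alpha * delta

# After learning, recompute RPEs on a probe trial: they are ~0 throughout the
# delay and at reward time, i.e., no ramp under perfect timing.
deltas = [(1.0 if t == T - 1 else 0.0) + gamma * V[t + 1] - V[t] for t in range(T)]
print(np.round(deltas, 3))
```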

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Nina Rouhani ◽  
Yael Niv

Memory helps guide behavior, but which experiences from the past are prioritized? Classic models of learning posit that events associated with unpredictable outcomes, as well as, paradoxically, with predictable outcomes, command more attention and learning. Here, we test reinforcement learning and subsequent memory for such events, treating signed and unsigned reward prediction errors (RPEs), experienced at the reward-predictive cue or at reward outcome, as drivers of these two seemingly contradictory signals. By fitting reinforcement learning models to behavior, we find that both RPEs contribute to learning by modulating a dynamically changing learning rate. We further characterize the effects of these RPE signals on memory and show that both signed and unsigned RPEs enhance memory, in line with midbrain dopamine and locus coeruleus modulation of hippocampal plasticity, thereby reconciling separate findings in the literature.
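One common way to let an unsigned RPE modulate a dynamically changing learning rate is a Pearce-Hall-style hybrid model; the sketch below is a generic illustration with assumed parameter names (eta, kappa), not the specific model fitted in the paper.

```python
def hybrid_update(V, r, alpha, eta=0.5, kappa=0.3):
    """One trial of a Rescorla-Wagner value update with a Pearce-Hall learning rate.

    V     : current value estimate of the cue
    r     : reward received on this trial
    alpha : current associability (dynamic learning rate)
    """
    delta = r - V                                       # signed RPE
    V_new = V + kappa * alpha * delta                   # value update scaled by associability
    alpha_new = eta * abs(delta) + (1 - eta) * alpha    # unsigned RPE drives the learning rate
    return V_new, alpha_new

V, alpha = 0.0, 1.0
for r in [1, 1, 0, 1, 0, 0, 1]:            # example reward sequence
    V, alpha = hybrid_update(V, r, alpha)
    print(f"V={V:.2f}  alpha={alpha:.2f}")
```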


2017 ◽  
Vol 47 (7) ◽  
pp. 1246-1258 ◽  
Author(s):  
T. U. Hauser ◽  
R. Iannaccone ◽  
R. J. Dolan ◽  
J. Ball ◽  
J. Hättenschwiler ◽  
...  

Background
Obsessive–compulsive disorder (OCD) has been linked to functional abnormalities in fronto-striatal networks as well as impairments in decision making and learning. Little is known about the neurocognitive mechanisms causing these decision-making and learning deficits in OCD, and how they relate to dysfunction in fronto-striatal networks.

Method
We investigated neural mechanisms of decision making in OCD patients, including patients with early and late disorder onset, in terms of reward prediction errors (RPEs) using functional magnetic resonance imaging. RPEs index a mismatch between expected and received outcomes, encoded by the dopaminergic system, and are known to drive learning and decision making in humans and animals. We used reinforcement learning models and RPE signals to infer the learning mechanisms and to compare behavioural parameters and neural RPE responses of the OCD patients with those of healthy matched controls.

Results
Patients with OCD showed significantly increased RPE responses in the anterior cingulate cortex (ACC) and the putamen compared with controls. OCD patients also had a significantly lower perseveration parameter than controls.

Conclusions
Enhanced RPE signals in the ACC and putamen extend previous findings of fronto-striatal deficits in OCD. These abnormally strong RPEs suggest a hyper-responsive learning network in patients with OCD, which might explain their indecisiveness and intolerance of uncertainty.
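As a rough illustration of how a perseveration parameter typically enters such model fits, the sketch below adds a choice-stickiness bonus to a softmax over learned action values; the abstract does not give the exact parameterization used in the study, so the form here is an assumption.

```python
import numpy as np

def choice_probabilities(Q, prev_choice, beta=3.0, persev=0.5):
    """Softmax over action values with a perseveration (stickiness) bonus.

    Q           : array of action values
    prev_choice : index of the previously chosen action (or None)
    beta        : inverse temperature
    persev      : perseveration weight; lower values mean less tendency to
                  repeat the previous choice, as reported for the patients
    """
    bonus = np.zeros_like(Q)
    if prev_choice is not None:
        bonus[prev_choice] = persev
    logits = beta * Q + bonus
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

print(choice_probabilities(np.array([0.2, 0.6]), prev_choice=0))
```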


2018 ◽  
Author(s):  
Li-Ann Leow ◽  
Welber Marinovic ◽  
Aymar de Rugy ◽  
Timothy J Carroll

Abstract
Perturbations of sensory feedback evoke sensory prediction errors (discrepancies between predicted and actual sensory outcomes of movements) and reward prediction errors (discrepancies between predicted and actual rewards). Sensory prediction errors result in obligatory remapping of the relationship between motor commands and predicted sensory outcomes. The role of reward prediction errors in sensorimotor adaptation is less clear. When moving towards a target, we expect to obtain the reward of hitting the target, and so we experience a reward prediction error if the perturbation causes us to miss it. These discrepancies between desired and actual task outcomes, or “task errors”, are thought to drive the use of strategic processes to restore success, although their role is not fully understood. Here, we investigated the role of task errors in sensorimotor adaptation: during target reaching, we either removed task errors by moving the target mid-movement to align with the cursor feedback of hand position, or enforced task errors by moving the target away from the cursor feedback of hand position. Removing task errors not only reduced the rate and extent of adaptation during exposure to the perturbation, but also reduced the amount of post-adaptation implicit remapping. Hence, task errors contribute to the implicit remapping that results from sensory prediction errors. This suggests that the system which implicitly acquires new sensorimotor maps via exposure to sensory prediction errors is also sensitive to reward prediction errors.
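As a rough sketch of the adaptation process being probed, the single-rate state-space model below updates an internal remapping from the residual cursor error on each reach; the `task_error_gain` factor is purely an illustrative assumption for how removing task errors might attenuate that update, not the authors' model.

```python
def adapt(x, rotation, A=0.98, B=0.2, task_error_gain=1.0):
    """One trial of a single-rate state-space model of visuomotor adaptation.

    x               : current internal remapping (deg), compensating the rotation
    rotation        : imposed cursor rotation (deg)
    task_error_gain : 1.0 when task errors are present; <1.0 as an illustrative
                      stand-in for the reduced adaptation seen when the target
                      is moved onto the cursor and task errors are removed
    """
    error = rotation - x                      # residual cursor error on this reach
    return A * x + B * task_error_gain * error

x = 0.0
for _ in range(60):                           # 60 perturbation trials
    x = adapt(x, rotation=30.0)
print(round(x, 1))                            # settles near partial compensation of the 30 deg rotation
```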


2016 ◽  
Vol 18 (1) ◽  
pp. 23-32 ◽  

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards—an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
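Restating that definition in standard notation (the symbol names are chosen here for illustration), where r_t is the received reward and V̂_t the predicted reward:

```latex
\delta_t = r_t - \hat{V}_t,
\qquad
\begin{cases}
\delta_t > 0 & \text{more reward than predicted: dopamine activation} \\
\delta_t = 0 & \text{fully predicted reward: baseline activity} \\
\delta_t < 0 & \text{less reward than predicted: depressed activity}
\end{cases}
```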


2018 ◽  
Author(s):  
Stefania Sarno ◽  
Manuel Beirán ◽  
José Vergara ◽  
Román Rossi-Pool ◽  
Ranulfo Romo ◽  
...  

Abstract
Dopamine neurons produce reward-related signals that regulate learning and guide behavior. Prior expectations about forthcoming stimuli and internal biases can alter perception and choices, and thus could influence dopamine signaling. We tested this hypothesis by studying dopamine neurons recorded in monkeys trained to discriminate between two tactile frequencies separated by a delay period, a task affected by the contraction bias. The bias greatly controlled the animals’ choices and their confidence in their decisions. During decision formation, the phasic activity reflected bias-induced modulations and simultaneously coded reward prediction errors. In contrast, the activity during the delay period was not affected by the bias and was not tuned to the value of the stimuli, but was temporally modulated, pointing to a role different from that of the phasic activity.
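The contraction bias mentioned here is commonly formalized as a pull of the remembered first frequency toward the mean of the stimulus set; the mixing weight λ below is an illustrative assumption, not a quantity from the study:

```latex
\hat{f}_1 = \lambda f_1 + (1 - \lambda)\,\bar{f}, \qquad 0 \le \lambda \le 1,
```

so a first frequency above the set mean f̄ is remembered as lower, and one below the mean as higher, shifting the comparison with the second frequency and hence the animal's choices and confidence.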


2010 ◽  
Vol 22 (5) ◽  
pp. 1149-1179 ◽  
Author(s):  
Tobias Larsen ◽  
David S. Leslie ◽  
Edmund J. Collins ◽  
Rafal Bogacz

Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
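A minimal sketch of the posterior-weighting idea described here, assuming Gaussian reward distributions per state and a discrete stimulus likelihood; the variable names and distributional choices are assumptions, not the letter's exact formulation.

```python
import numpy as np

def posterior_weighted_update(mu, prior, r, alpha=0.1, reward_sd=1.0):
    """Update per-state reward estimates under state uncertainty.

    mu     : current reward estimate for each candidate state
    prior  : P(state | stimulus), from the noisy stimulus alone
    r      : observed reward
    The posterior re-weights the prior by how well each state's current reward
    estimate explains the observed reward; each state's estimate is then
    updated in proportion to its posterior probability.
    """
    likelihood = np.exp(-0.5 * ((r - mu) / reward_sd) ** 2)   # Gaussian reward likelihood
    posterior = prior * likelihood
    posterior /= posterior.sum()
    mu = mu + alpha * posterior * (r - mu)                    # posterior-weighted RPE update
    return mu, posterior

mu = np.array([0.0, 0.0])                 # reward estimates for states A and B
prior = np.array([0.6, 0.4])              # stimulus is ambiguous between the two
mu, post = posterior_weighted_update(mu, prior, r=1.0)
print(np.round(mu, 3), np.round(post, 3))
```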


2020 ◽  
Author(s):  
Jascha Achterberg ◽  
Mikiko Kadohisa ◽  
Kei Watanabe ◽  
Makoto Kusunoki ◽  
Mark J Buckley ◽  
...  

Abstract
Much animal learning is slow, with cumulative changes in behavior driven by reward prediction errors. When the abstract structure of a problem is known, however, both animals and formal learning models can rapidly attach new items to their roles within this structure, sometimes in a single trial. Frontal cortex is likely to play a key role in this process. To examine information seeking and use in a known problem structure, we trained monkeys in a novel explore/exploit task, requiring the animal first to test objects for their association with reward, then, once rewarded objects were found, to re-select them on further trials for further rewards. Many cells in the frontal cortex showed an explore/exploit preference, changing activity in a single trial to align with one-shot learning in the monkeys’ behaviour. In contrast to this binary switch, these cells showed little evidence of continuous changes linked to expectancy or prediction error. Explore/exploit preferences were independent for two stages of the trial: object selection and receipt of feedback. Within an established task structure, frontal activity may control the separate operations of explore and exploit, switching in one trial between the two.

Significance statement
Much animal learning is slow, with cumulative changes in behavior driven by reward prediction errors. When the abstract structure of a problem is known, however, both animals and formal learning models can rapidly attach new items to their roles within this structure. To address transitions in neural activity during one-shot learning, we trained monkeys in an explore/exploit task using familiar objects and a highly familiar task structure. In contrast to continuous changes reflecting expectancy or prediction error, frontal neurons showed a binary, one-shot switch between explore and exploit. Within an established task structure, frontal activity may control the separate operations of exploring alternative objects to establish their current role, then exploiting this knowledge for further reward.


Neuron ◽  
2018 ◽  
Vol 98 (3) ◽  
pp. 616-629.e6 ◽  
Author(s):  
Clara Kwon Starkweather ◽  
Samuel J. Gershman ◽  
Naoshige Uchida

2022 ◽  
Vol 119 (2) ◽  
pp. e2113311119
Author(s):  
Stefania Sarno ◽  
Manuel Beirán ◽  
Joan Falcó-Roget ◽  
Gabriel Diaz-deLeon ◽  
Román Rossi-Pool ◽  
...  

Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias greatly controlled the DA phasic response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic dopamine activity during WM was not affected by the bias and did not encode the stored frequency. More interestingly, both delay-period activity and phasic responses before the second stimulus negatively correlated with reaction times of the animals after the trial start cue and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation positively correlated with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.


2021 ◽  
Author(s):  
Lena Esther Ptasczynski ◽  
Isa Steinecker ◽  
Philipp Sterzer ◽  
Matthias Guggenmos

Reinforcement learning algorithms have a long-standing success story in explaining the dynamics of instrumental conditioning in humans and other species. While normative reinforcement learning models are critically dependent on external feedback, recent findings in the field of perceptual learning point to a crucial role of internally generated reinforcement signals based on subjective confidence when external feedback is not available. Here, we investigated the existence of such confidence-based learning signals in a key domain of reinforcement-based learning: instrumental conditioning. We conducted a value-based decision-making experiment which included phases with and without external feedback and in which participants reported their confidence in addition to choices. Behaviorally, we found signatures of self-reinforcement in phases without feedback, reflected in an increase of subjective confidence and choice consistency. To clarify the mechanistic role of confidence in value-based learning, we compared a family of confidence-based learning models with more standard models predicting either no change in value estimates or a devaluation over time when no external reward is provided. We found that confidence-based models indeed outperformed these reference models, whereby the learning signal of the winning model was based on the prediction error between current confidence and a stimulus-unspecific average of previous confidence levels. Interestingly, individuals with more volatile reward-based value updates in the presence of feedback also showed more volatile confidence-based value updates when feedback was not available. Together, our results provide evidence that confidence-based learning signals affect instrumentally learned subjective values in the absence of external feedback.
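A sketch of the winning model's learning signal as described in the abstract (a prediction error between current confidence and a running, stimulus-unspecific average of past confidence), with assumed parameter names and update form.

```python
def confidence_update(V_chosen, conf, conf_bar, alpha_v=0.2, alpha_c=0.1):
    """Update the value of the chosen option from confidence alone (no feedback).

    V_chosen : current value estimate of the chosen option
    conf     : confidence reported on the current trial (0..1)
    conf_bar : running stimulus-unspecific average of previous confidence
    """
    delta_conf = conf - conf_bar                 # confidence prediction error
    V_chosen = V_chosen + alpha_v * delta_conf   # self-reinforcement of the chosen value
    conf_bar = conf_bar + alpha_c * (conf - conf_bar)
    return V_chosen, conf_bar

V, c_bar = 0.5, 0.6
for c in [0.7, 0.8, 0.75, 0.9]:                  # example confidence reports
    V, c_bar = confidence_update(V, c, c_bar)
print(round(V, 3), round(c_bar, 3))
```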

