Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference

2021 ◽  
Author(s):  
Lorenz Deserno ◽  
Rani Moran ◽  
Jochen Michely ◽  
Ying Lee ◽  
Peter Dayan ◽  
...  

Abstract: Dopamine is implicated in signalling model-free (MF) reward prediction errors and in various aspects of model-based (MB) credit assignment and choice. Recently, we showed that cooperative interactions between MB and MF systems include guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test the hypothesis that enhancing dopamine levels, using levodopa, boosts the guidance of MF credit assignment by MB inference. We found that levodopa enhanced retrospective guidance of MF credit assignment by MB inference, without impacting MF and MB influences per se. This drug effect correlated positively with working memory, but only in a context where reward needed to be recalled for MF credit assignment. The dopaminergic enhancement of MB-MF interactions correlated negatively with a dopamine-dependent change in MB credit assignment, possibly reflecting a trade-off between these two components of behavioural control. Thus, our findings demonstrate that dopamine boosts MB inference during guidance of MF learning, supported in part by working memory, but trading off against a dopaminergic enhancement of MB credit assignment. The findings highlight a novel role for dopamine in MB-MF interactions.

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Lorenz Deserno ◽  
Rani Moran ◽  
Jochen Michely ◽  
Ying Lee ◽  
Peter Dayan ◽  
...  

Dopamine is implicated in representing model-free (MF) reward prediction errors as well as in influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test the hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without impacting MF and MB influences directly. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our finding that dopamine boosts MB inference during its guidance of MF learning highlights a novel DA influence on cooperative MB-MF interactions.
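The guided-credit-assignment idea lends itself to a compact illustration. Below is a minimal Python sketch, assuming a toy setting in which MB inference yields a posterior over which past action produced a reward; the uniform placeholder posterior and the `guidance_strength` parameter (the hypothesized locus of the levodopa effect) are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

# Minimal sketch of model-based (MB) inference guiding model-free (MF)
# credit assignment. Names and the weighting scheme are illustrative
# assumptions, not the authors' fitted model.

alpha = 0.3            # MF learning rate
q_mf = np.zeros(2)     # MF action values

def mb_posterior(candidates):
    """Toy MB inference: posterior over which past action caused the
    outcome. A uniform placeholder here; a real model would use the
    task's transition structure to infer responsibility retrospectively."""
    p = np.ones(len(candidates))
    return p / p.sum()

def mf_update_guided(q, candidates, reward, guidance_strength=1.0):
    """Assign MF credit to candidate actions in proportion to the MB
    posterior, scaled by a dopamine-sensitive guidance strength."""
    for a, w in zip(candidates, mb_posterior(candidates)):
        q[a] += alpha * guidance_strength * w * (reward - q[a])
    return q

# Levodopa is hypothesized to increase guidance_strength, boosting the
# retrospective MB contribution to MF credit assignment.
q_mf = mf_update_guided(q_mf, candidates=[0, 1], reward=1.0,
                        guidance_strength=1.5)
```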


2020 ◽  
Author(s):  
Dongjae Kim ◽  
Jaeseung Jeong ◽  
Sang Wan Lee

Abstract: The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy's performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes so as to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning. One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
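Baseline correction of a prediction error is akin to baseline subtraction for variance reduction in policy-gradient methods. Below is a minimal sketch, assuming a slow running average of raw prediction errors as the baseline; the learning rates and update rules are illustrative assumptions, not the study's fitted model.

```python
import numpy as np

# Minimal sketch of prediction-error (PE) baseline correction. The baseline
# is a slow running average of recent PEs: near zero under stationary noise,
# but systematically offset after a context change. Subtracting it factors
# the shared offset out of the learning signal.

rng = np.random.default_rng(1)
alpha, beta = 0.2, 0.05    # fast value / slow baseline learning rates
value, baseline = 0.0, 0.0

rewards = np.concatenate([rng.normal(1.0, 0.2, 100),   # context A
                          rng.normal(0.0, 0.2, 100)])  # context B (change)

for r in rewards:
    pe = r - value
    corrected_pe = pe - baseline        # baseline-corrected learning signal
    baseline += beta * (pe - baseline)  # tracks systematic PE offsets
    value += alpha * corrected_pe
    # A sustained |baseline| offset flags a context change (not just noise)
    # and could be used to adjust the balance of MB and MF control.
```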


2021 ◽  
Vol 89 (9) ◽  
pp. S94
Author(s):  
Lorenz Deserno ◽  
Rani Moran ◽  
Ying Lee ◽  
Jochen Michely ◽  
Peter Dayan ◽  
...  

2019 ◽  
Vol 116 (32) ◽  
pp. 15871-15876 ◽  
Author(s):  
Nitzan Shahar ◽  
Rani Moran ◽  
Tobias U. Hauser ◽  
Rogier A. Kievit ◽  
Daniel McNamee ◽  
...  

Model-free learning enables an agent to make better decisions based on prior experience while representing only minimal knowledge about an environment's structure. It is generally assumed that model-free state representations are based on outcome-relevant features of the environment. Here, we challenge this assumption by providing evidence that a putative model-free system assigns credit to task representations that are irrelevant to an outcome. We examined data from 769 individuals performing a well-described 2-step reward decision task where stimulus identity, but not spatial-motor aspects of the task, predicted reward. We show that participants assigned value to spatial-motor representations despite these being outcome-irrelevant. Strikingly, spatial-motor value associations affected behavior across all outcome-relevant features and stages of the task, consistent with credit assignment to low-level, state-independent task representations. Individual-difference analyses suggested that the impact of spatial-motor value formation was attenuated in individuals who showed greater deployment of goal-directed (model-based) strategies. Our findings highlight the need to reconsider how model-free representations are formed and regulated according to the structure of the environment.
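One way to picture this result is a learner that caches value on two parallel representations, only one of which actually predicts reward. A minimal sketch follows; the additive value combination and the `w_key` weight (attenuated in more model-based individuals) are illustrative assumptions, not the paper's fitted model.

```python
# Minimal sketch of credit assignment to an outcome-irrelevant spatial-motor
# feature (the response key) alongside the outcome-relevant stimulus.

alpha = 0.2
q_stim = {}   # MF values keyed by stimulus identity (predicts reward)
q_key = {}    # MF values keyed by key press (outcome-irrelevant)

def choice_value(stimulus, key, w_key=0.3):
    """Decision value mixes stimulus value with a spurious key value;
    w_key would be smaller in strongly model-based individuals."""
    return q_stim.get(stimulus, 0.0) + w_key * q_key.get(key, 0.0)

def update(stimulus, key, reward):
    q_stim[stimulus] = q_stim.get(stimulus, 0.0) \
        + alpha * (reward - q_stim.get(stimulus, 0.0))
    # Credit also leaks to the low-level spatial-motor representation.
    q_key[key] = q_key.get(key, 0.0) + alpha * (reward - q_key.get(key, 0.0))

update("stim_A", "left", 1.0)
print(choice_value("stim_A", "left"))
```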


2017 ◽  
Author(s):  
R. Keiflin ◽  
H.J. Pribut ◽  
N.B. Shah ◽  
P.H. Janak

Abstract: Dopamine (DA) neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) encode reward prediction errors (RPEs) and are proposed to mediate error-driven learning. However, the learning strategy engaged by DA-RPEs remains controversial. Model-free associations imbue cues/actions with pure value, independently of representations of their associated outcome. In contrast, model-based associations support detailed representation of anticipated outcomes. Here we show that although both VTA and SNc DA neuron activation reinforces instrumental responding, only VTA DA neuron activation during consumption of expected sucrose reward restores error-driven learning and promotes formation of a new cue→sucrose association. Critically, expression of VTA DA-dependent Pavlovian associations is abolished following sucrose devaluation, a signature of model-based learning. These findings reveal that activation of VTA- or SNc-DA neurons engages largely dissociable learning processes, with VTA-DA neurons capable of participating in model-based predictive learning, while the role of SNc-DA neurons appears limited to reinforcement of instrumental responses.
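The devaluation signature that dissociates the two systems fits in a few lines. A minimal sketch with illustrative numbers: a cached model-free value is blind to devaluation until further learning, whereas model-based evaluation of the predicted outcome reflects devaluation immediately.

```python
# Minimal sketch of the devaluation test separating model-free (MF)
# from model-based (MB) control. All numbers are illustrative.

q_mf = 0.8                     # cached value of the cue (MF)
outcome_value = {"sucrose": 1.0}
cue_predicts = "sucrose"       # MB association: cue -> specific outcome

def respond_mf():
    # MF responding reads out the cached value; it cannot register the
    # devaluation until new experience updates the cache.
    return q_mf

def respond_mb():
    # MB responding evaluates the predicted outcome at choice time,
    # so devaluation is reflected immediately.
    return outcome_value[cue_predicts]

outcome_value["sucrose"] = 0.0   # devaluation
print(respond_mf())  # 0.8 -> devaluation-insensitive (SNc-like reinforcement)
print(respond_mb())  # 0.0 -> devaluation-sensitive (VTA-DA-dependent learning)
```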


2014 ◽  
Vol 5 ◽  
Author(s):  
Daniel J. Schad ◽  
Elisabeth Jünger ◽  
Miriam Sebold ◽  
Maria Garbusow ◽  
Nadine Bernhardt ◽  
...  

2020 ◽  
Author(s):  
Yinmei Ni ◽  
Sidong Wang ◽  
Jie Su ◽  
Jian Li ◽  
Xiaohong Wan

Abstract: The dopaminergic reward system, which encodes reward prediction error (PE) signals, is vital for reinforcement learning (RL). Although the reward PE hypothesis has been extensively validated, considerable debate remains over an alternative account based on motivation. In the current study, we diverted participants' motivation from the conditioned stimulus (CS)-associated valences to the CS-elicited actions in a variant of a Pavlovian conditioning task under appetitive and aversive conditions. We found that regions of the dopaminergic reward system did not encode such bidirectional reward PE signals, but rather the PE magnitudes, namely, motivation PE signals. These neural signals, which do not indicate the direction of learning, could not be directly used for model-free RL, but probably serve model-based control. Specifically, the ventral striatum during the feedback phase might encode the need to adjust the learning policy, while the putative substantia nigra pars compacta (SNc) in the midbrain and the putamen during the prediction phase might sustain the intended actions. Meanwhile, the primary motor cortex encoded salience PE signals for model-free RL. Therefore, our findings demonstrate that the human dopaminergic reward system can encode motivation PE signals that support model-based control, rather than model-free learning, suggesting that its involvement in RL is motivation-dependent.
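The distinction the study turns on is between a signed PE, which carries a direction for learning, and its magnitude, which does not. A minimal sketch; the function names are illustrative.

```python
# Signed vs. unsigned prediction errors (PEs). A signed PE can drive
# model-free value updates; its magnitude alone cannot, because it lacks
# a direction, but it can signal surprise/motivational salience.

def signed_pe(reward, expected):
    """Signed PE: the sign tells the learner which way to update values."""
    return reward - expected

def pe_magnitude(reward, expected):
    """Unsigned PE: indicates how surprising the outcome was; by itself it
    cannot direct value updates, but it could flag when a model-based
    controller should adjust its learning policy."""
    return abs(reward - expected)

print(signed_pe(0.0, 0.7))     # -0.7: decrease the value estimate
print(pe_magnitude(0.0, 0.7))  #  0.7: same salience as an equal-size gain
```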


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Rani Moran ◽  
Mehdi Keramati ◽  
Peter Dayan ◽  
Raymond J. Dolan

2019 ◽  
Author(s):  
Bryant Jongkees

Adaptive goal-directed behavior requires a dynamic balance between maintenance and updating within working memory (WM). This balance is controlled by an input-gating mechanism implemented by dopamine in the basal ganglia. Given that dopaminergic manipulations can modulate performance on WM-related tasks, it is important to gain mechanistic insight into whether such manipulations differentially affect updating (i.e., encoding and removal) and the closely related gate-opening/closing processes that respectively enable/prevent updating. To clarify this issue, 2.0 g of dopamine's precursor L-tyrosine was administered to healthy young adults (N = 45) in a double-blind, placebo-controlled, within-subjects study. WM processes were empirically distinguished using the reference-back paradigm, which isolates performance related to updating, gate opening, and gate closing. L-tyrosine had a selective, baseline-dependent effect only on gate opening: low-performing subjects improved whereas high-performing subjects were impaired on L-tyrosine. Importantly, this inverted-U-shaped pattern was not explained by regression to the mean. These results are consistent with an inverted-U relationship between dopamine and WM, and they indicate that updating and gating are differentially affected by a dopaminergic manipulation. This highlights the importance of distinguishing these processes when studying WM, for example in the context of WM deficits in disorders with a dopaminergic pathophysiology.
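For readers unfamiliar with the reference-back paradigm, the sketch below shows one plausible way to derive the three component costs from trial-level reaction times. The contrasts follow the paradigm's standard logic (reference trials require updating WM; comparison trials do not), but the function, labels, and data here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the reference-back contrasts isolating updating,
# gate opening, and gate closing from reaction times (RTs).

def component_costs(rt, trial_type, prev_type):
    """rt: RTs; trial_type/prev_type: 'ref' (update WM) or 'comp'
    (comparison only) for the current and preceding trial."""
    rt, trial_type, prev_type = map(np.asarray, (rt, trial_type, prev_type))
    mean = lambda t, p: rt[(trial_type == t) & (prev_type == p)].mean()
    return {
        # Gate opening: switching into updating mode.
        "gate_opening": mean("ref", "comp") - mean("ref", "ref"),
        # Gate closing: switching back into maintenance mode.
        "gate_closing": mean("comp", "ref") - mean("comp", "comp"),
        # Updating: repeated reference vs. repeated comparison trials.
        "updating": mean("ref", "ref") - mean("comp", "comp"),
    }

# Illustrative data covering the four trial-sequence cells.
rts = [620, 540, 600, 530]
cur = ["ref", "ref", "comp", "comp"]
prev = ["comp", "ref", "ref", "comp"]
print(component_costs(rts, cur, prev))
```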

