Nigrostriatal Dopamine Signals Sequence-Specific Action-Outcome Prediction Errors

2021 ◽  
Author(s):  
Nick G. Hollon ◽  
Elora W. Williams ◽  
Christopher D. Howard ◽  
Hao Li ◽  
Tavish I. Traut ◽  
...  

Dopamine has been suggested to encode cue-reward prediction errors during Pavlovian conditioning. While this theory has been widely applied to reinforcement learning concerning instrumental actions, whether dopamine represents action-outcome prediction errors and how it controls sequential behavior remain largely unknown. Here, by training mice to perform optogenetic intracranial self-stimulation, we examined how self-initiated goal-directed behavior influences nigrostriatal dopamine transmission during single as well as sequential instrumental actions. We found that dopamine release evoked by direct optogenetic stimulation was dramatically reduced when delivered as the consequence of the animal’s own action, relative to non-contingent passive stimulation. This action-induced dopamine suppression was specific to the reinforced action, temporally restricted to counteract the expected outcome, and exhibited sequence-selectivity consistent with hierarchical control of sequential behavior. Together, these findings demonstrate that nigrostriatal dopamine signals sequence-specific prediction errors in action-outcome associations, with fundamental implications for reinforcement learning and instrumental behavior in health and disease.
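
As a point of reference for the prediction-error framing above, the sketch below (not the authors’ model; the delta-rule form, parameter values, and variable names are assumptions) shows why a fully predicted, action-contingent outcome yields a small error while the same outcome delivered passively yields a large one.

```python
# A minimal, illustrative delta-rule sketch of an action-outcome prediction
# error. All names and parameter values are assumptions for illustration.

ALPHA = 0.1          # learning rate (assumed)
STIM_VALUE = 1.0     # subjective value of one stimulation train (assumed)

def prediction_error(outcome, expectation):
    """Classic delta rule: delta = received - expected."""
    return outcome - expectation

# Expected outcome of the reinforced action, learned over training trials.
expected_after_action = 0.0
for _ in range(200):
    delta = prediction_error(STIM_VALUE, expected_after_action)
    expected_after_action += ALPHA * delta

# After training, the same outcome evokes a small error when it is the
# consequence of the agent's own action (expectation is high) ...
contingent_delta = prediction_error(STIM_VALUE, expected_after_action)

# ... but a large error when delivered passively (no action, no expectation).
passive_delta = prediction_error(STIM_VALUE, 0.0)

print(f"contingent PE: {contingent_delta:.3f}, passive PE: {passive_delta:.3f}")
```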

eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Ida Momennejad ◽  
A Ross Otto ◽  
Nathaniel D Daw ◽  
Kenneth A Norman

Making decisions in sequentially structured tasks requires integrating distally acquired information. The extensive computational cost of such integration challenges planning methods that integrate online, at decision time. Furthermore, it remains unclear whether ‘offline’ integration during replay supports planning, and if so which memories should be replayed. Inspired by machine learning, we propose that (a) offline replay of trajectories facilitates integrating representations that guide decisions, and (b) unsigned prediction errors (uncertainty) trigger such integrative replay. We designed a 2-step revaluation task for fMRI, whereby participants needed to integrate changes in rewards with past knowledge to optimally replan decisions. As predicted, we found that (a) multi-voxel pattern evidence for off-task replay predicts subsequent replanning; (b) neural sensitivity to uncertainty predicts subsequent replay and replanning; (c) off-task hippocampus and anterior cingulate activity increase when revaluation is required. These findings elucidate how the brain leverages offline mechanisms in planning and goal-directed behavior under uncertainty.
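
For point (a), the machine-learning idea the authors draw on can be illustrated with a Dyna-style sketch in which unsigned prediction errors set replay priority. This is a generic illustration, not the fMRI analysis; the toy task, function names, and parameters are assumptions.

```python
# Prioritized offline replay: the most surprising remembered transitions
# (largest unsigned prediction errors) are replayed first.
import heapq

ALPHA, GAMMA = 0.5, 0.9
ACTIONS = ["L", "R"]
Q = {}            # (state, action) -> learned value
model = {}        # (state, action) -> remembered (reward, next_state)
queue = []        # max-heap of surprising transitions (negated priority)

def q(s, a):
    return Q.get((s, a), 0.0)

def td_update(s, a, r, s2):
    """One-step Q-learning update; returns the unsigned prediction error."""
    delta = r + GAMMA * max(q(s2, b) for b in ACTIONS) - q(s, a)
    Q[(s, a)] = q(s, a) + ALPHA * delta
    return abs(delta)

def experience(s, a, r, s2):
    """Store the transition and queue it for replay if it was surprising."""
    model[(s, a)] = (r, s2)
    surprise = td_update(s, a, r, s2)
    if surprise > 1e-3:
        heapq.heappush(queue, (-surprise, (s, a)))

def offline_replay(n_steps):
    """Replay the most surprising remembered transitions first."""
    for _ in range(min(n_steps, len(queue))):
        _, (s, a) = heapq.heappop(queue)
        r, s2 = model[(s, a)]
        td_update(s, a, r, s2)

# Toy 2-step revaluation: the reward at the second step changes, producing a
# large unsigned error; offline replay then pulls the stored value toward it.
experience("start", "L", 0.0, "mid")
experience("mid", "R", 1.0, "end")     # originally learned reward
experience("mid", "R", 5.0, "end")     # revaluation: reward has changed
offline_replay(10)
print(round(q("mid", "R"), 2))         # value updated toward the new reward
```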


PLoS Biology ◽  
2021 ◽  
Vol 19 (9) ◽  
pp. e3001119
Author(s):  
Joan Orpella ◽  
Ernest Mas-Herrero ◽  
Pablo Ripollés ◽  
Josep Marco-Pallarés ◽  
Ruth de Diego-Balaguer

Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental in the learning of words and structural rules. In the absence of reliable online measures, statistical word and rule learning have primarily been investigated using offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures, combined with computational modeling, to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate, in two different cohorts, that a temporal difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both the ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work therefore bridges the long-standing gap between language learning and reinforcement learning phenomena.
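
As a minimal illustration of the kind of model described, a delta-rule/temporal-difference learner whose prediction errors shrink as a dependency is acquired, the sketch below uses invented syllables and an assumed learning rate; it is not the authors’ fitted model or task.

```python
# Online statistical learning of a word-to-word dependency via prediction errors.
ALPHA = 0.2                      # learning rate (assumed)
pred = {}                        # prediction strength for (word_A, word_B) pairs

def learn_pair(a, b):
    """Delta-rule update: surprise shrinks as the A->B dependency is learned."""
    v = pred.get((a, b), 0.0)
    delta = 1.0 - v              # outcome "B followed A" minus current prediction
    pred[(a, b)] = v + ALPHA * delta
    return delta

# Familiarization stream with an embedded dependency A ... B (hypothetical syllables).
stream = [("tep", "rud")] * 15
errors = [learn_pair(a, b) for a, b in stream]
print([round(e, 2) for e in errors])   # prediction errors decrease trial by trial
```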


2020 ◽  
Author(s):  
Dongjae Kim ◽  
Jaeseung Jeong ◽  
Sang Wan Lee

The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy’s performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias–variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
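
The baseline idea can be sketched schematically as follows: a slowly tracked average of recent prediction errors is subtracted from each new error, so steady noise is factored out while a genuine context change still produces a large corrected error. This is an illustration of the concept only; the update rules and parameters are assumptions, not the authors’ model.

```python
# Baseline-corrected prediction errors as a schematic context-change signal.
import random

random.seed(0)
ALPHA_V, ALPHA_B = 0.1, 0.05
value, baseline = 0.0, 0.0

def step(reward):
    global value, baseline
    delta = reward - value                      # raw prediction error
    corrected = delta - baseline                # baseline-corrected error
    value += ALPHA_V * corrected                # learn from the corrected signal
    baseline += ALPHA_B * (delta - baseline)    # slowly track the average error
    return corrected

# Stable context: noisy rewards around 1.0 -> corrected errors hover near zero.
for _ in range(100):
    step(1.0 + random.gauss(0, 0.3))

# Context change: rewards jump to 3.0 -> a large corrected error flags it.
print(round(step(3.0), 2))
```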


2019 ◽  
Author(s):  
A. Wiehler ◽  
K. Chakroun ◽  
J. Peters

Gambling disorder is a behavioral addiction associated with impairments in decision-making and reduced behavioral flexibility. Decision-making in volatile environments requires a flexible trade-off between exploitation of options with high expected values and exploration of novel options to adapt to changing reward contingencies. This classical problem is known as the exploration-exploitation dilemma. We hypothesized gambling disorder to be associated with a specific reduction in directed (uncertainty-based) exploration compared to healthy controls, accompanied by changes in brain activity in a fronto-parietal exploration-related network.

Twenty-three frequent gamblers and nineteen matched controls performed a classical four-armed bandit task during functional magnetic resonance imaging. Computational modeling revealed that choice behavior in both groups contained signatures of directed exploration, random exploration and perseveration. Gamblers showed a specific reduction in directed exploration, while random exploration and perseveration were similar between groups.

Neuroimaging revealed no evidence for group differences in neural representations of expected value and reward prediction errors. Likewise, our hypothesis of attenuated fronto-parietal exploration effects in gambling disorder was not supported. However, during directed exploration, gamblers showed reduced parietal and substantia nigra / ventral tegmental area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of clinical status, suggesting alterations in network dynamics in gambling disorder.

In sum, we show that reduced flexibility during reinforcement learning in volatile environments in gamblers is attributable to a reduction in directed exploration rather than an increase in perseveration. Neuroimaging findings suggest that patterns of network connectivity might be more diagnostic of gambling disorder than univariate value and prediction error effects. We provide a computational account of flexibility impairments in gamblers during reinforcement learning that might arise as a consequence of dopaminergic dysregulation in this disorder.
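
The choice-model ingredients named above (an uncertainty-tracking learner, a directed-exploration bonus, softmax random exploration, and perseveration) can be sketched roughly as follows; parameter names and values are illustrative and are not the fitted quantities from the study.

```python
# Bandit learner with directed exploration (uncertainty bonus), random
# exploration (softmax), and perseveration (sticky-choice bonus).
import math
import random

N_ARMS = 4
means = [0.0] * N_ARMS          # posterior mean value per arm
variances = [10.0] * N_ARMS     # posterior uncertainty per arm
OBS_NOISE = 4.0                 # observation noise variance (assumed)
PHI = 1.0                       # directed-exploration weight (assumed)
BETA = 0.3                      # softmax inverse temperature (assumed)
RHO = 0.5                       # perseveration bonus (assumed)
prev_choice = None

def choose():
    utilities = []
    for a in range(N_ARMS):
        u = means[a] + PHI * math.sqrt(variances[a])   # value + uncertainty bonus
        if a == prev_choice:
            u += RHO                                   # sticky choice
        utilities.append(BETA * u)
    z = max(utilities)
    probs = [math.exp(u - z) for u in utilities]
    total = sum(probs)
    return random.choices(range(N_ARMS), weights=[p / total for p in probs])[0]

def update(arm, reward):
    gain = variances[arm] / (variances[arm] + OBS_NOISE)   # Kalman gain
    means[arm] += gain * (reward - means[arm])
    variances[arm] *= (1.0 - gain)

prev_choice = choose()
update(prev_choice, random.gauss(2.0, 2.0))
```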


2019 ◽  
Author(s):  
Erdem Pulcu

We are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty [1] (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in the behavioural [2,3] and neural sciences [4–6], as well as in machine learning [7,8]. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. the difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), that can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.
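
One way to express a nonlinear coupling between prediction errors and learning rates is a sigmoidal gain on the unsigned error, so that small fluctuations barely move beliefs while large surprises produce near-complete updates. The specific function and constants below are assumptions for illustration, not the paper’s exact update rule.

```python
# Learning rate as a sigmoidal (nonlinear) function of the unsigned prediction error.
import math

K, MID = 6.0, 0.5        # slope and midpoint of the nonlinearity (assumed)

def learning_rate(abs_pe):
    return 1.0 / (1.0 + math.exp(-K * (abs_pe - MID)))

belief = 0.5
for outcome in [0.5, 0.55, 0.45, 1.5, 1.5]:   # stable outcomes, then a sudden change
    pe = outcome - belief
    belief += learning_rate(abs(pe)) * pe
    print(round(belief, 3))    # belief barely moves until the large surprise arrives
```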


2018 ◽  
Vol 30 (10) ◽  
pp. 1422-1432 ◽  
Author(s):  
Anne G. E. Collins

Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus participant choices depended on reinforcement learning. Counterintuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned slower: Using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning and may have important applications for education and computational psychiatry.
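
The parallel-systems logic can be sketched as a mixture of a one-shot, capacity-limited working-memory store and an incremental delta-rule learner, with the working-memory weight falling as the number of stimuli exceeds capacity. This follows the general spirit of RL+WM models; the capacity, mixing rule, and parameters below are assumptions rather than the fitted model.

```python
# Schematic RL+WM mixture: fast one-shot WM blended with slow incremental RL.
ALPHA = 0.1       # incremental RL learning rate (assumed)
CAPACITY = 3      # working-memory capacity (assumed)

def policy_mix(rl_q, wm, stimulus, set_size, n_actions):
    """Blend WM and RL action values; the WM weight shrinks with set size."""
    w = min(1.0, CAPACITY / set_size)
    values = []
    for a in range(n_actions):
        if stimulus in wm:
            wm_val = 1.0 if wm[stimulus] == a else 0.0
        else:
            wm_val = 1.0 / n_actions
        rl_val = rl_q.get((stimulus, a), 1.0 / n_actions)
        values.append(w * wm_val + (1.0 - w) * rl_val)
    return values

def learn(rl_q, wm, stimulus, action, reward):
    """Slow RL update plus one-shot WM storage of the last rewarded action."""
    old = rl_q.get((stimulus, action), 1.0 / 3)
    rl_q[(stimulus, action)] = old + ALPHA * (reward - old)
    if reward > 0:
        wm[stimulus] = action

rl_q, wm = {}, {}
learn(rl_q, wm, "stim1", 2, 1.0)
# With a large set size WM contributes little, so the long-run association rests
# mostly on the still-weak RL value, which is the retention cost described above.
print(policy_mix(rl_q, wm, "stim1", set_size=6, n_actions=3))
```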


2020 ◽  
Vol 74 (7) ◽  
pp. 791-798
Author(s):  
Carl Emil Eskildsen ◽  
Tormod Næs

In applied spectroscopy, the purpose of multivariate calibration is almost exclusively to relate analyte concentrations to spectroscopic measurements. The multivariate calibration model provides estimates of analyte concentrations based on the spectroscopic measurements. Predictive performance is often evaluated using a mean squared error. While this average measure can be used in model selection, it is not satisfactory for evaluating the uncertainty of individual predictions. For a calibration, the uncertainties are sample-specific. This is especially true for multivariate calibration, where interfering compounds may be present. Consider in-line spectroscopic measurements during a chemical reaction, production process, etc. Here, reference values are not necessarily available. Hence, one should know the uncertainty of a given prediction in order to use that prediction to assess the state of the chemical reaction, adjust the process, etc. In this paper, we discuss the influence of variance and bias on sample-specific prediction errors in multivariate calibration. We compare theoretical formulae with results obtained on experimental data. The results indicate that the bias contribution cannot necessarily be neglected when assessing sample-specific prediction ability in practice.
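
The bias/variance point can be made numerically: for a single sample, the expected squared prediction error splits into the variance of the predictions plus the squared bias, so a small spread can coexist with a dominant bias term. The numbers below are invented purely for illustration.

```python
# Sample-specific MSE decomposed into variance and squared bias.
import statistics

true_value = 10.0
# Hypothetical predictions for the same sample from calibrations built on
# different training sets (e.g., when an interferent is unmodelled).
predictions = [10.8, 11.1, 10.6, 11.0, 10.9, 11.2]

mean_pred = statistics.fmean(predictions)
variance = statistics.pvariance(predictions)
bias_sq = (mean_pred - true_value) ** 2
mse = statistics.fmean((p - true_value) ** 2 for p in predictions)

print(f"variance = {variance:.3f}, bias^2 = {bias_sq:.3f}, MSE = {mse:.3f}")
# MSE equals variance + bias^2 (up to rounding); here the bias term dominates.
```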

