Validating the Representational Space of Deep Reinforcement Learning Models of Behavior with Neural Data

Deep Reinforcement Learning (Deep RL) agents have in recent years emerged as successful models of animal behavior in a variety of complex learning tasks, as exemplified by Song et al. [2017]. As agents are typically trained to mimic an animal subject, the emphasis in past studies on behavior as a means of evaluating the fitness of models to experimental data is only natural. But the true power of Deep RL agents lies in their ability to learn neural computations and codes that generate a particular behavior, factors that are also of great relevance and interest to computational neuroscience. On that basis, we believe that model evaluation should include an examination of neural representations and validation against neural recordings from animal subjects. In this paper, we introduce a procedure to test hypotheses about the relationship between internal representations of Deep RL agents and those in animal neural recordings. Taking a sequential learning task as a running example, we apply our method and show that the geometry of representations learnt by artificial agents is similar to that of the biological subjects', and that such similarities are driven by shared information in some latent space. Our method is applicable to any Deep RL agent that learns a Markov Decision Process, and as such enables researchers to assess the suitability of more advanced Deep Learning modules, or map hierarchies of representations to different parts of a circuit in the brain, and help shed light on their function. To demonstrate that point, we conduct an ablation study to deduce that, in the sequential task under consideration, temporal information plays a key role in molding a correct representation of the task.

The rational use of causal inference to guide reinforcement learning strengthens with age

10.31234/osf.io/j9zuk ◽

2019 ◽

Author(s):

Alexandra O. Cohen ◽

Kate Nussenbaum ◽

Hayley Dorfman ◽

Samuel J. Gershman ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Causal Structure ◽

Learning Task ◽

Negative Events ◽

Shape Learning ◽

Adolescents And Adults ◽

Bayesian Reinforcement Learning ◽

External Causes ◽

Reinforcement Learning Models ◽

Best Fit

Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults’ learning is modulated by beliefs about the causal structure of the environment such that they will update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. The present study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants’ beliefs about hidden-agent intervention aligned well with the true probabilities of positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18–25) and adolescents (ages 13–17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7–12) were best fit by a one-learning-rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.
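
As an illustration of what a model with a single learning rate that updates value estimates from choice outcomes alone might look like, here is a minimal Python sketch of a standard delta-rule update. The function names, the starting value of 0.5, and the default learning rate are assumptions made for this example and are not taken from the study; the Bayesian models with hidden-agent beliefs are not shown.

```python
def update_value(value, reward, learning_rate):
    """One delta-rule step: move the estimate toward the observed reward."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def learn_values(choices, rewards, learning_rate=0.3, n_options=2):
    """Track one value estimate per option across a sequence of trials.

    choices: index of the option picked on each trial.
    rewards: outcome observed on each trial (e.g. 1 for reward, 0 for none).
    The initial estimate of 0.5 is an arbitrary choice for this sketch.
    """
    values = [0.5] * n_options
    for choice, reward in zip(choices, rewards):
        # Only the chosen option is updated; no beliefs about hidden
        # causes enter the update.
        values[choice] = update_value(values[choice], reward, learning_rate)
    return values


if __name__ == "__main__":
    print(learn_values(choices=[0, 0, 1], rewards=[1, 1, 0], learning_rate=0.5))
```

In this sketch every outcome is weighted identically, whatever its cause, which is the contrast the abstract draws with models that discount outcomes attributed to a hidden agent.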

Modeling changes in probabilistic reinforcement learning during adolescence

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008524 ◽

2021 ◽

Vol 17 (7) ◽

pp. e1008524

Author(s):

Liyu Xia ◽

Sarah L. Master ◽

Maria K. Eckstein ◽

Beth Baribault ◽

Ronald E. Dahl ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Task ◽

Learning Rate ◽

Integration Time ◽

Daily Experience ◽

Salivary Testosterone ◽

Probabilistic Uncertainty ◽

Probabilistic Reinforcement ◽

Hierarchical Bayesian Methods ◽

Reinforcement Learning Models

In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youth aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by 1) an increase in learning rate (i.e., a decrease in integration time scale) and 2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
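
The two ingredients named above, a learning rate that can ignore negative outcomes and a degree of noisy or exploratory choice, can be sketched as follows. This is a hedged illustration only: the parameter names (`alpha_pos`, `alpha_neg`, `beta`, `epsilon`), the softmax-with-lapse choice rule, and the decision to treat a negative prediction error as the "negative outcome" are assumptions of this example, not the fitted models from the paper.

```python
import math


def choice_probabilities(values, beta, epsilon=0.0):
    """Softmax over values, mixed with uniform random choice.

    beta: inverse temperature (higher means less exploratory).
    epsilon: probability of a purely random (noisy) choice.
    """
    max_value = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - max_value)) for v in values]
    total = sum(weights)
    n = len(values)
    return [(1 - epsilon) * w / total + epsilon / n for w in weights]


def update_values(values, choice, reward, alpha_pos, alpha_neg=0.0):
    """Delta-rule update with separate learning rates by outcome sign.

    With alpha_neg = 0, worse-than-expected outcomes leave the
    estimate unchanged (assumption of this sketch).
    """
    prediction_error = reward - values[choice]
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    updated = list(values)
    updated[choice] += alpha * prediction_error
    return updated


if __name__ == "__main__":
    values = [0.5, 0.5]
    values = update_values(values, choice=0, reward=1, alpha_pos=0.5)
    values = update_values(values, choice=1, reward=0, alpha_pos=0.5)
    print(values, choice_probabilities(values, beta=5.0, epsilon=0.1))
```

A larger `alpha_pos` makes estimates depend on fewer recent trials, which is the sense in which a higher learning rate corresponds to a shorter integration time scale.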

L-DOPA Reduces Model-Free Control of Behavior by Attenuating the Transfer of Value to Action

10.1101/086116 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nils B. Kroemer ◽

Ying Lee ◽

Shakoor Pooseh ◽

Ben Eppinger ◽

Thomas Goschke ◽

...

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Behavioral Control ◽

Decision Task ◽

Learning Models ◽

Model Free ◽

Markov Decision ◽

Model Free Control ◽

Reinforcement Learning Models ◽

The Brain

Abstract Dopamine is a key neurotransmitter in reinforcement learning and action control. Recent findings suggest that these components are inherently entangled. Here, we tested if increases in dopamine tone by administration of L-DOPA upregulate deliberative “model-based” control of behavior or reflexive “model-free” control as predicted by dual-control reinforcement-learning models. Alternatively, L-DOPA may impair learning as suggested by “value” or “thrift” theories of dopamine. To this end, we employed a two-stage Markov decision-task to investigate the effect of L-DOPA (randomized cross-over) on behavioral control while brain activation was measured using fMRI. L-DOPA led to attenuated model-free control of behavior as indicated by the reduced impact of reward on choice and increased stochasticity of model-free choices. Correspondingly, in the brain, L-DOPA decreased the effect of reward while prediction-error signals were unaffected. Taken together, our results suggest that L-DOPA reduces model-free control of behavior by attenuating the transfer of value to action.

How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning

10.1101/421743 ◽

2018 ◽

Author(s):

C.M.C. Correa ◽

S. Noorman ◽

J. Jiang ◽

S. Palminteri ◽

M.X Cohen ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Present Report ◽

Subjective Evaluation ◽

Computational Modelling ◽

Learning Task ◽

Reward Processing ◽

Neural Activities ◽

Electrophysiological Recordings ◽

Neural Computations

Abstract The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement-learning framework, combining perceptual masking, computational modeling and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies. Significance statement: The notion of reward is strongly associated with subjective evaluation, related to conscious processes such as “pleasure”, “liking” and “wanting”. Here we show that degrading reward visibility in a reinforcement learning task decreases, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increases subjects' tendency to repeat previous choices. Electrophysiological recordings, in combination with computational modelling, show that neural activities were significantly modulated by reward visibility. Overall, we dissociate different neural computations underlying reward-based learning and decision-making, which highlights a beneficial role of reward awareness in adjusting decision-making strategies.

Ventral striatum’s role in learning from gains and losses

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1809833115 ◽

2018 ◽

Vol 115 (52) ◽

pp. E12398-E12406 ◽

Cited By ~ 12

Author(s):

Craig A. Taswell ◽

Vincent D. Costa ◽

Elisabeth A. Murray ◽

Bruno B. Averbeck

Keyword(s):

Reinforcement Learning ◽

Dopamine Release ◽

Ventral Striatum ◽

Learning Task ◽

Specific Role ◽

Aversive Stimuli ◽

Learning Rates ◽

Reinforcement Learning Models ◽

Gains And Losses

Adaptive behavior requires animals to learn from experience. Ideally, learning should both promote choices that lead to rewards and reduce choices that lead to losses. Because the ventral striatum (VS) contains neurons that respond to aversive stimuli and aversive stimuli can drive dopamine release in the VS, it is possible that the VS contributes to learning about aversive outcomes, including losses. However, other work suggests that the VS may play a specific role in learning to choose among rewards, with other systems mediating learning from aversive outcomes. To examine the role of the VS in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions and unoperated controls on a reinforcement learning task. In the task, the monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for choices. They learned over trials to choose cues associated with gains, and not choose cues associated with losses. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. By contrast, monkeys with VS lesions performed as well as controls when choices involved a potential loss. We also fit reinforcement learning models to the behavior and compared learning rates between groups. Relative to controls, the monkeys with VS lesions had reduced learning rates for gain cues. Therefore, in this task, the VS plays a specific role in learning to choose between rewarding options.

The rational use of causal inference to guide reinforcement learning strengthens with age

npj Science of Learning ◽

10.1038/s41539-020-00075-3 ◽

2020 ◽

Vol 5 (1) ◽

Cited By ~ 1

Author(s):

Alexandra O. Cohen ◽

Kate Nussenbaum ◽

Hayley M. Dorfman ◽

Samuel J. Gershman ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Causal Structure ◽

Learning Task ◽

Negative Events ◽

Shape Learning ◽

Adolescents And Adults ◽

Bayesian Reinforcement Learning ◽

External Causes ◽

Reinforcement Learning Models ◽

Best Fit

Abstract Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults’ learning is modulated by beliefs about the causal structure of the environment such that they update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. This study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants’ beliefs about hidden-agent intervention aligned with the true probabilities of the positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18–25) and adolescents (ages 13–17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7–12) were best fit by a one learning rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.

Surprise acts as a reducer of outcome value in human reinforcement learning

10.31234/osf.io/5ha3y ◽

2019 ◽

Author(s):

Motofumi Sumiya ◽

Kentaro Katahira

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Prediction Error ◽

Simulation Analysis ◽

Learning Task ◽

Risk Averse ◽

Choice Preference ◽

Proposed Model ◽

Modulation Parameter ◽

Reinforcement Learning Models

Surprise occurs because of differences between a decision outcome and its predicted outcome (prediction error), regardless of whether the error is positive or negative. It has recently been postulated that surprise affects the reward value of the action outcome itself; studies have indicated that increasing surprise, as the absolute value of prediction error, decreases the value of the outcome. However, how surprise affects the value of the outcome and subsequent decision making is unclear. We suggest that, on the assumption that surprise decreases the outcome value, agents will increase their risk-averse choices when an outcome is often surprising. Here, we propose the surprise-sensitive utility model, a reinforcement learning model that states that surprise decreases the outcome value, to explain how surprise affects subsequent decision-making. To investigate the assumption, we compared this model with previous reinforcement learning models on a risky probabilistic learning task with simulation analysis, and model selection with two experimental datasets with different tasks and populations. We further simulated a simple decision-making task to investigate how parameters within the proposed model modulate the choice preference. As a result, we found the proposed model explains the risk-averse choices in a manner similar to the previous models, and risk-averse choices increased as the surprise-based modulation parameter of outcome value increased. The model fits these datasets better than the other models, with the same free parameters, thus providing a more parsimonious and robust account for risk-averse choices. These findings indicate that surprise acts as a reducer of outcome value and decreases the action value for risky choices in which prediction error often occurs.
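
The idea that surprise, taken as the absolute prediction error, lowers the value of an outcome can be sketched in a few lines of Python. The exact form used here (utility = reward minus a weight times surprise, followed by a delta-rule update toward that utility) and the names `surprise_weight` and `simulate` are assumptions of this illustration; the abstract does not give the model equations, so this should not be read as the authors' specification.

```python
def surprise_sensitive_update(value, reward, learning_rate, surprise_weight):
    """One update in which surprise discounts the outcome before learning.

    surprise: absolute prediction error |reward - value|.
    utility: reward reduced in proportion to surprise (assumed form).
    """
    surprise = abs(reward - value)
    utility = reward - surprise_weight * surprise
    return value + learning_rate * (utility - value)


def simulate(rewards, learning_rate=0.2, surprise_weight=0.5, initial_value=0.5):
    """Apply the update over a reward sequence and return the final value."""
    value = initial_value
    for reward in rewards:
        value = surprise_sensitive_update(
            value, reward, learning_rate, surprise_weight
        )
    return value


if __name__ == "__main__":
    # Two options with the same mean reward (0.5): one certain, one risky.
    safe_value = simulate([0.5] * 20)
    risky_value = simulate([1.0, 0.0] * 10)
    print("safe:", safe_value, "risky:", risky_value)
```

Under these assumptions the risky option, whose outcomes are frequently surprising, ends with a lower learned value than the certain option despite the equal mean reward, which is the direction of the risk-averse effect described above.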

Stimulation of the vagus nerve reduces learning in a go/no-go reinforcement learning task

10.1101/535260 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anne Kühnel ◽

Vanessa Teckentrup ◽

Monja P. Neuser ◽

Quentin J. M. Huys ◽

Caroline Burrasch ◽

...

Keyword(s):

Reinforcement Learning ◽

Vagus Nerve ◽

Learning Task ◽

Afferent Input ◽

Learning Rate ◽

Action Execution ◽

Transcutaneous Vagus Nerve Stimulation ◽

Metabolic States ◽

Reinforcement Learning Models ◽

The Impact

Abstract When facing decisions to approach rewards or to avoid punishments, we often figuratively go with our gut, and the impact of metabolic states such as hunger on motivation is well documented. However, whether and how vagal feedback signals from the gut influence instrumental actions is unknown. Here, we investigated the effect of non-invasive transcutaneous vagus nerve stimulation (tVNS) vs. sham (randomized cross-over design) on approach and avoidance behavior using an established go/no-go reinforcement learning paradigm (Guitart-Masip et al., 2012) in 39 healthy participants after an overnight fast. First, mixed-effects logistic regression analysis of choice accuracy showed that tVNS acutely impaired decision-making, p = .045. Computational reinforcement learning models identified the cause of this as a reduction in the learning rate through tVNS (Δα = −0.092, p_boot = .002), particularly after punishment (Δα_Pun = −0.081, p_boot = .012 vs. Δα_Rew = −0.031, p = .22). However, tVNS had no effect on go biases, Pavlovian response biases or response time. Hence, tVNS appeared to influence learning rather than action execution. These results highlight a novel role of vagal afferent input in modulating reinforcement learning by tuning the learning rate according to homeostatic needs.
