The rational use of causal inference to guide reinforcement learning strengthens with age

Abstract Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults’ learning is modulated by beliefs about the causal structure of the environment such that they update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. This study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants’ beliefs about hidden-agent intervention aligned with the true probabilities of the positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18–25) and adolescents (ages 13–17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7–12) were best fit by a one learning rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.

Download Full-text

The rational use of causal inference to guide reinforcement learning strengthens with age

10.31234/osf.io/j9zuk ◽

2019 ◽

Author(s):

Alexandra O. Cohen ◽

Kate Nussenbaum ◽

Hayley Dorfman ◽

Samuel J. Gershman ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Causal Structure ◽

Learning Task ◽

Negative Events ◽

Shape Learning ◽

Adolescents And Adults ◽

Bayesian Reinforcement Learning ◽

External Causes ◽

Reinforcement Learning Models ◽

Best Fit

Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults’ learning is modulated by beliefs about the causal structure of the environment such that they will update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. The present study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants’ beliefs about hidden-agent intervention aligned well with the true probabilities of positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18 - 25) and adolescents (ages 13 - 17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden agent intervention, those of children (ages 7 - 12) were best fit by a one learning rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.

Download Full-text

Modeling changes in probabilistic reinforcement learning during adolescence

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008524 ◽

2021 ◽

Vol 17 (7) ◽

pp. e1008524

Author(s):

Liyu Xia ◽

Sarah L. Master ◽

Maria K. Eckstein ◽

Beth Baribault ◽

Ronald E. Dahl ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Task ◽

Learning Rate ◽

Integration Time ◽

Daily Experience ◽

Salivary Testosterone ◽

Probabilistic Uncertainty ◽

Probabilistic Reinforcement ◽

Hierarchical Bayesian Methods ◽

Reinforcement Learning Models

In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid adolescence. Here we used a probabilistic reinforcement learning task to test how youth age 8-17 (N = 187) and adults age 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through early-twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by 1) an increase in learning rate (i.e. decrease in integration time scale); 2) a decrease in noisy/exploratory choices. In mid-adolescence age 13-15, salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.

Download Full-text

Validating the Representational Space of Deep Reinforcement Learning Models of Behavior with Neural Data

10.1101/2021.06.15.448556 ◽

2021 ◽

Author(s):

Sebastian Bruch ◽

Patrick McClure ◽

Jingfeng Zhou ◽

Geoffrey Schoenbaum ◽

Francisco Pereira

Keyword(s):

Reinforcement Learning ◽

Learning Task ◽

Neural Recordings ◽

Learning Modules ◽

Representational Space ◽

Neural Computations ◽

Shared Information ◽

Markov Decision ◽

Ablation Study ◽

Reinforcement Learning Models

Deep Reinforcement Learning (Deep RL) agents have in recent years emerged as successful models of animal behavior in a variety of complex learning tasks, as exemplified by Song et al. [2017]. As agents are typically trained to mimic an animal subject, the emphasis in past studies on behavior as a means of evaluating the fitness of models to experimental data is only natural. But the true power of Deep RL agents lies in their ability to learn neural computations and codes that generate a particular behavior|factors that are also of great relevance and interest to computational neuroscience. On that basis, we believe that model evaluation should include an examination of neural representations and validation against neural recordings from animal subjects. In this paper, we introduce a procedure to test hypotheses about the relationship between internal representations of Deep RL agents and those in animal neural recordings. Taking a sequential learning task as a running example, we apply our method and show that the geometry of representations learnt by artificial agents is similar to that of the biological subjects', and that such similarities are driven by shared information in some latent space. Our method is applicable to any Deep RL agent that learns a Markov Decision Process, and as such enables researchers to assess the suitability of more advanced Deep Learning modules, or map hierarchies of representations to different parts of a circuit in the brain, and help shed light on their function. To demonstrate that point, we conduct an ablation study to deduce that, in the sequential task under consideration, temporal information plays a key role in molding a correct representation of the task.

Download Full-text

Ventral striatum’s role in learning from gains and losses

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1809833115 ◽

2018 ◽

Vol 115 (52) ◽

pp. E12398-E12406 ◽

Cited By ~ 12

Author(s):

Craig A. Taswell ◽

Vincent D. Costa ◽

Elisabeth A. Murray ◽

Bruno B. Averbeck

Keyword(s):

Reinforcement Learning ◽

Dopamine Release ◽

Ventral Striatum ◽

Learning Task ◽

Specific Role ◽

Aversive Stimuli ◽

Learning Rates ◽

Reinforcement Learning Models ◽

Gains And Losses

Adaptive behavior requires animals to learn from experience. Ideally, learning should both promote choices that lead to rewards and reduce choices that lead to losses. Because the ventral striatum (VS) contains neurons that respond to aversive stimuli and aversive stimuli can drive dopamine release in the VS, it is possible that the VS contributes to learning about aversive outcomes, including losses. However, other work suggests that the VS may play a specific role in learning to choose among rewards, with other systems mediating learning from aversive outcomes. To examine the role of the VS in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions and unoperated controls on a reinforcement learning task. In the task, the monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for choices. They learned over trials to choose cues associated with gains, and not choose cues associated with losses. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. By contrast, monkeys with VS lesions performed as well as controls when choices involved a potential loss. We also fit reinforcement learning models to the behavior and compared learning rates between groups. Relative to controls, the monkeys with VS lesions had reduced learning rates for gain cues. Therefore, in this task, the VS plays a specific role in learning to choose between rewarding options.

Download Full-text

Surprise acts as a reducer of outcome value in human reinforcement learning

10.31234/osf.io/5ha3y ◽

2019 ◽

Author(s):

Motofumi Sumiya ◽

Kentaro Katahira

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Prediction Error ◽

Simulation Analysis ◽

Learning Task ◽

Risk Averse ◽

Choice Preference ◽

Proposed Model ◽

Modulation Parameter ◽

Reinforcement Learning Models

Surprise occurs because of differences between a decision outcome and its predicted outcome (prediction error), regardless of whether the error is positive or negative. It has recently been postulated that surprise affects the reward value of the action outcome itself; studies have indicated that increasing surprise, as absolute value of prediction error, decreases the value of the outcome. However, how surprise affects the value of the outcome and subsequent decision making is unclear. We suggested that, on the assumption that surprise decreases the outcome value, agents will increase their risk averse choices when an outcome is often surprisal. Here, we propose the surprise-sensitive utility model, a reinforcement learning model that states that surprise decreases the outcome value, to explain how surprise affects subsequent decision-making. To investigate the assumption, we compared this model with previous reinforcement learning models on a risky probabilistic learning task with simulation analysis, and model selection with two experimental datasets with different tasks and population. We further simulated a simple decision-making task to investigate how parameters within the proposed model modulate the choice preference. As a result, we found the proposed model explains the risk averse choices in a manner similar to the previous models, and risk averse choices increased as the surprise-based modulation parameter of outcome value increased. The model fits these datasets better than the other models, with same free parameters, thus providing a more parsimonious and robust account for risk averse choices. These findings indicate that surprise acts as a reducer of outcome value and decreases the action value for risky choices in which prediction error often occurs.

Download Full-text

Stimulation of the vagus nerve reduces learning in a go/no-go reinforcement learning task

10.1101/535260 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anne Kühnel ◽

Vanessa Teckentrup ◽

Monja P. Neuser ◽

Quentin J. M. Huys ◽

Caroline Burrasch ◽

...

Keyword(s):

Reinforcement Learning ◽

Vagus Nerve ◽

Learning Task ◽

Afferent Input ◽

Learning Rate ◽

Action Execution ◽

Transcutaneous Vagus Nerve Stimulation ◽

Metabolic States ◽

Reinforcement Learning Models ◽

The Impact

AbstractWhen facing decisions to approach rewards or to avoid punishments, we often figuratively go with our gut, and the impact of metabolic states such as hunger on motivation are well documented. However, whether and how vagal feedback signals from the gut influence instrumental actions is unknown. Here, we investigated the effect of non-invasive transcutaneous vagus nerve stimulation (tVNS) vs. sham (randomized cross-over design) on approach and avoidance behavior using an established go/no-go reinforcement learning paradigm (Guitart-Masip et al., 2012) in 39 healthy, participants after an overnight fast. First, mixed-effects logistic regression analysis of choice accuracy showed that tVNS acutely impaired decision-making, p = .045. Computational reinforcement learning models identified the cause of this as a reduction in the learning rate through tVNS (Δα = −0.092, pboot= .002), particularly after punishment (ΔαPun= −0.081, pboot= .012 vs. ΔαRew= −0.031, p = .22). However, tVNS had no effect on go biases, Pavlovian response biases or response time. Hence, tVNS appeared to influence learning rather than action execution. These results highlight a novel role of vagal afferent input in modulating reinforcement learning by tuning the learning rate according to homeostatic needs.

Download Full-text

Blocking Is Sensitive to Causal Structure in 4-Year-Old and 8-Year-Old Children

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169.52.4.264 ◽

2005 ◽

Vol 52 (4) ◽

pp. 264-271 ◽

Cited By ~ 4

Author(s):

Tom Beckers ◽

Uschi Van den Broeck ◽

Marij Renne ◽

Stefaan Vandorpe ◽

Jan De Houwer ◽

...

Keyword(s):

Young Children ◽

Associative Learning ◽

Learning Theory ◽

Causal Structure ◽

Age Groups ◽

Learning Task ◽

Contingency Learning

Abstract. In a contingency learning task, 4-year-old and 8-year-old children had to predict the outcome displayed on the back of a card on the basis of cues presented on the front. The task was embedded in either a causal or a merely predictive scenario. Within this task, either a forward blocking or a backward blocking procedure was implemented. Blocking occurred in the causal but not in the predictive scenario. Moreover, blocking was affected by the scenario to the same extent in both age groups. The pattern of results was similar for forward and backward blocking. These results suggest that even young children are sensitive to the causal structure of a contingency learning task and that the occurrence of blocking in such a task defies an explanation in terms of associative learning theory.

Download Full-text