A nonlinear relationship between prediction errors and learning rates in human reinforcement learning

Mapping Intimacies ◽

10.1101/751222 ◽

2019 ◽

Author(s):

Erdem Pulcu

Keyword(s):

Reinforcement Learning ◽

Nonlinear Relationship ◽

Prediction Errors ◽

Learning Rates ◽

The Face ◽

In The Wild ◽

Actual Outcome ◽

Update Rules ◽

Different Sources

Abstract: We are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty [1] (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in behavioural [2,3] and neural sciences [4–6], as well as machine learning [7,8]. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), that can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.
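The abstract does not state the functional form of the proposed update rules, so the following is only a minimal sketch of the general idea: a learning rate that depends nonlinearly on the size of the prediction error. The saturating exponential mapping, the `slope` parameter, and both function names are assumptions made for illustration, not the paper's model.

```python
import math


def nonlinear_learning_rate(prediction_error, slope=2.0):
    """Hypothetical mapping from |prediction error| to a learning rate in [0, 1).

    The saturating exponential is an illustrative choice only; the abstract
    does not specify the nonlinear form actually used.
    """
    return 1.0 - math.exp(-slope * abs(prediction_error))


def update_expectation(expectation, outcome, slope=2.0):
    """One update step: the learning rate is recomputed from the current error."""
    prediction_error = outcome - expectation  # actual outcome minus expectation
    alpha = nonlinear_learning_rate(prediction_error, slope)
    return expectation + alpha * prediction_error


if __name__ == "__main__":
    expectation = 0.0
    for outcome in [1.0, 1.0, 0.0, 1.0]:
        expectation = update_expectation(expectation, outcome)
        print(round(expectation, 3))
```

Under this assumed mapping, large surprises produce large belief updates while small errors are mostly ignored, which is one way a learner could adapt quickly after an environmental change and remain stable otherwise.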

MEG signatures of long-term effects of agreement and disagreement with the majority

Scientific Reports ◽

10.1038/s41598-021-82670-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

A. Gorin ◽

V. Klucharev ◽

A. Ossadtchi ◽

I. Zubarev ◽

V. Moiseeva ◽

...

Keyword(s):

Reinforcement Learning ◽

Social Influence ◽

Temporal Dynamics ◽

Peer Group ◽

Long Term Effects ◽

Learning Mechanisms ◽

Source Imaging ◽

The Face ◽

First Session

Abstract: People often change their beliefs by succumbing to an opinion of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict-monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers at the parietal cortices 230 ms after the face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant’s rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after the face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism of long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process at the ventromedial prefrontal cortex.

The Tortoise and the Hare: Interactions between Reinforcement Learning and Working Memory

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01238 ◽

2018 ◽

Vol 30 (10) ◽

pp. 1422-1432 ◽

Cited By ~ 16

Author(s):

Anne G. E. Collins

Keyword(s):

Working Memory ◽

Reinforcement Learning ◽

Computational Modeling ◽

Behavioral Experiment ◽

Prediction Errors ◽

Multiple Systems ◽

Memory Interference ◽

Neural Representations ◽

The Cost

Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus participant choices depended on reinforcement learning. Counterintuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned slower: Using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning and may have important applications for education and computational psychiatry.
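The idea of running both systems in parallel with weighted contributions can be sketched as a mixture of two choice policies. This is a hedged illustration only: the function name `mix_policies` and the parameter `wm_weight` are hypothetical, and the abstract does not say how the weight is computed from performance, so it is passed in as a fixed number here.

```python
def mix_policies(p_rl, p_wm, wm_weight):
    """Blend two choice-probability vectors over the same set of actions.

    p_rl      -- action probabilities from the (slow) reinforcement learning system
    p_wm      -- action probabilities from the (fast) working memory system
    wm_weight -- hypothetical weight in [0, 1] given to working memory
    """
    if len(p_rl) != len(p_wm):
        raise ValueError("both policies must cover the same actions")
    if not 0.0 <= wm_weight <= 1.0:
        raise ValueError("wm_weight must lie in [0, 1]")
    return [wm_weight * w + (1.0 - wm_weight) * r for r, w in zip(p_rl, p_wm)]


if __name__ == "__main__":
    # Early in learning RL is still near chance, while working memory is confident.
    print(mix_policies([0.4, 0.3, 0.3], [0.9, 0.05, 0.05], wm_weight=0.8))
```

A mixture like this treats the two systems as independent. The interference reported in the abstract would require an additional coupling between them, which this sketch deliberately leaves out.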

Velocity estimation in reinforcement learning

10.1101/432492 ◽

2018 ◽

Author(s):

Carlos Velazquez ◽

Manuel Villarreal ◽

Arturo Bouzas

Keyword(s):

Reinforcement Learning ◽

Hierarchical Structure ◽

Predictive Accuracy ◽

Information Criterion ◽

Individual Variability ◽

Velocity Estimation ◽

Hyperbolic Function ◽

Prediction Errors ◽

Signal To Noise ◽

Learning Rates

The current work aims to study how people make predictions, under a reinforcement learning framework, in an environment that fluctuates from trial to trial and is corrupted with Gaussian noise. A computer-based experiment was developed where subjects were required to predict the future location of a spaceship that orbited around planet Earth. Its position was sampled from a Gaussian distribution with the mean changing at a variable velocity and four different values of variance that defined our signal-to-noise conditions. Three error-driven algorithms using a Bayesian approach were proposed as candidates to describe our data. The first is the standard delta-rule. The second and third models are delta rules incorporating a velocity component which is updated using prediction errors. The third model additionally assumes a hierarchical structure where individual learning rates for velocity and decision noise come from Gaussian distributions with means following a hyperbolic function. We used leave-one-out cross-validation and the Widely Applicable Information Criterion to compare the predictive accuracy of these models. In general, our results provided evidence in favor of the hierarchical model and highlight two main conclusions. First, when facing an environment that fluctuates from trial to trial, people can learn to estimate its velocity to make predictions. Second, learning rates for velocity and decision noise are influenced by uncertainty constraints represented by the signal-to-noise ratio. This higher order control was modeled using a hierarchical structure, which qualitatively accounts for individual variability and is able to generalize and make predictions about new subjects on each experimental condition.
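To make the difference between the first two candidate models concrete, here is a minimal sketch of a standard delta rule next to a delta rule with a velocity term that is itself updated from prediction errors. The exact update equations, the parameter names `alpha` and `beta`, and their values are assumptions for illustration; the abstract does not give them, and the Bayesian and hierarchical parts of the models are not reproduced.

```python
def delta_rule(observations, alpha=0.3):
    """Standard delta rule: move the prediction a fraction alpha toward each outcome."""
    prediction = 0.0
    predictions = []
    for x in observations:
        predictions.append(prediction)  # prediction made before seeing x
        prediction += alpha * (x - prediction)
    return predictions


def delta_rule_with_velocity(observations, alpha=0.3, beta=0.1):
    """Delta rule plus a velocity estimate updated by the same prediction error."""
    prediction, velocity = 0.0, 0.0
    predictions = []
    for x in observations:
        predictions.append(prediction)
        error = x - prediction
        velocity += beta * error  # learn how fast the target is drifting
        prediction += alpha * error + velocity
    return predictions


if __name__ == "__main__":
    ramp = [float(t) for t in range(200)]  # noise-free target moving one unit per trial
    plain = delta_rule(ramp)
    tracked = delta_rule_with_velocity(ramp)
    print("final lag, delta rule:", ramp[-1] - plain[-1])
    print("final lag, with velocity:", ramp[-1] - tracked[-1])
```

On a steadily drifting target the plain delta rule settles into a constant lag of drift divided by `alpha`, whereas the velocity term removes that lag, which is the intuition behind estimating velocity in a fluctuating environment.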

Firing patterns of serotonin neurons underlying cognitive flexibility

10.1101/059758 ◽

2016 ◽

Cited By ~ 3

Author(s):

Sara Matias ◽

Eran Lottem ◽

Guillaume P. Dugué ◽

Zachary F. Mainen

Keyword(s):

Cognitive Flexibility ◽

Causal Structure ◽

Dopamine Neurons ◽

Prediction Errors ◽

Firing Patterns ◽

Learning Rates ◽

Serotonin Neurons ◽

Endogenous Role ◽

Flexible Adaptation

Serotonin is implicated in mood and affective disorders [1,2] but growing evidence suggests that its core endogenous role may be to promote flexible adaptation to changes in the causal structure of the environment [3–8]. This stems from two functions of endogenous serotonin activation: inhibiting learned responses that are not currently adaptive [9,10] and driving plasticity to reconfigure them [11–13]. These mirror dual functions of dopamine in invigorating reward-related responses and promoting plasticity that reinforces new ones [16,17]. However, while dopamine neurons are known to be activated by reward prediction errors [18,19], consistent with theories of reinforcement learning, the reported firing patterns of serotonin neurons [21–23] do not accord with any existing theories [1,24,25]. Here, we used long-term photometric recordings in mice to study a genetically-defined population of dorsal raphe serotonin neurons whose activity we could link to normal reversal learning. We found that these neurons are activated by both positive and negative prediction errors, thus reporting the kind of surprise signal proposed to promote learning in conditions of uncertainty [26,27]. Furthermore, by comparing cue responses of serotonin and dopamine neurons we found differences in learning rates that could explain the importance of serotonin in inhibiting perseverative responding. Together, these findings show how the firing patterns of serotonin neurons support a role in cognitive flexibility and suggest a revised model of dopamine-serotonin opponency with potential clinical implications.

A Normative Account of Confirmation Bias During Reinforcement Learning

Neural Computation ◽

10.1162/neco_a_01455 ◽

2021 ◽

pp. 1-31

Author(s):

Germain Lefebvre ◽

Christopher Summerfield ◽

Rafal Bogacz

Keyword(s):

Reinforcement Learning ◽

Prediction Errors ◽

Option Value ◽

Confirmatory Bias ◽

Learning Rules ◽

Reward Prediction ◽

Wide Range ◽

The Face ◽

Paradoxical Finding ◽

Normative Account

Abstract: Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
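The confirmatory bias described above can be sketched as a value update with two learning rates. This is an illustrative sketch, not the authors' simulation code: the function name, the parameter names `alpha_confirm` and `alpha_disconfirm`, and their values are assumptions, and updating the unchosen option presumes that its outcome is also shown to the learner.

```python
def confirmatory_update(value, reward, chosen,
                        alpha_confirm=0.3, alpha_disconfirm=0.1):
    """Update one option's value estimate with a confirmatory bias.

    Evidence that supports the choice (a positive error on the chosen option,
    a negative error on the unchosen one) gets the larger learning rate.
    """
    prediction_error = reward - value
    if chosen:
        confirms_choice = prediction_error > 0
    else:
        confirms_choice = prediction_error < 0
    alpha = alpha_confirm if confirms_choice else alpha_disconfirm
    return value + alpha * prediction_error


if __name__ == "__main__":
    # Same starting estimate, opposite outcomes, for a chosen and an unchosen option.
    for chosen in (True, False):
        for reward in (1.0, 0.0):
            print(chosen, reward, confirmatory_update(0.5, reward, chosen))
```

Setting both learning rates to the same value recovers an unbiased updating rule, which is the comparison baseline mentioned in the abstract.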

The tortoise and the hare: interactions between reinforcement learning and working memory

10.1101/234724 ◽

2017 ◽

Author(s):

Anne G.E. Collins

Keyword(s):

Working Memory ◽

Reinforcement Learning ◽

Computational Modeling ◽

Behavioral Experiment ◽

Prediction Errors ◽

Multiple Systems ◽

Memory Interference ◽

Neural Representations ◽

The Cost

Abstract: Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment, and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus subject choices depended on reinforcement learning. Counter-intuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned slower: using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning, and may have important applications for education and computational psychiatry.

Neurocomputational mechanisms of prosocial learning and links to empathy

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1603198113 ◽

2016 ◽

Vol 113 (35) ◽

pp. 9763-9768 ◽

Cited By ~ 65

Author(s):

Patricia L. Lockwood ◽

Matthew A. J. Apps ◽

Vincent Valton ◽

Essi Viding ◽

Jonathan P. Roiser

Keyword(s):

Reinforcement Learning ◽

Prosocial Behavior ◽

Learning Theory ◽

Anterior Cingulate ◽

Prediction Errors ◽

Posterior Portion ◽

Subgenual Anterior Cingulate Cortex ◽

Actual Outcome ◽

The Difference ◽

Do So

Reinforcement learning theory powerfully characterizes how we learn to benefit ourselves. In this theory, prediction errors—the difference between a predicted and actual outcome of a choice—drive learning. However, we do not operate in a social vacuum. To behave prosocially we must learn the consequences of our actions for other people. Empathy, the ability to vicariously experience and understand the affect of others, is hypothesized to be a critical facilitator of prosocial behaviors, but the link between empathy and prosocial behavior is still unclear. During functional magnetic resonance imaging (fMRI) participants chose between different stimuli that were probabilistically associated with rewards for themselves (self), another person (prosocial), or no one (control). Using computational modeling, we show that people can learn to obtain rewards for others but do so more slowly than when learning to obtain rewards for themselves. fMRI revealed that activity in a posterior portion of the subgenual anterior cingulate cortex/basal forebrain (sgACC) drives learning only when we are acting in a prosocial context and signals a prosocial prediction error conforming to classical principles of reinforcement learning theory. However, there is also substantial variability in the neural and behavioral efficiency of prosocial learning, which is predicted by trait empathy. More empathic people learn more quickly when benefitting others, and their sgACC response is the most selective for prosocial learning. We thus reveal a computational mechanism driving prosocial learning in humans. This framework could provide insights into atypical prosocial behavior in those with disorders of social cognition.

Activity patterns of serotonin neurons underlying cognitive flexibility

eLife ◽

10.7554/elife.20552 ◽

2017 ◽

Vol 6 ◽

Cited By ~ 73

Author(s):

Sara Matias ◽

Eran Lottem ◽

Guillaume P Dugué ◽

Zachary F Mainen

Keyword(s):

Cognitive Flexibility ◽

Causal Structure ◽

Activity Patterns ◽

Dopamine Neurons ◽

Prediction Errors ◽

Learning Rates ◽

Serotonin Neurons ◽

Endogenous Role ◽

Flexible Adaptation

Serotonin is implicated in mood and affective disorders. However, growing evidence suggests that a core endogenous role is to promote flexible adaptation to changes in the causal structure of the environment, through behavioral inhibition and enhanced plasticity. We used long-term photometric recordings in mice to study a population of dorsal raphe serotonin neurons, whose activity we could link to normal reversal learning using pharmacogenetics. We found that these neurons are activated by both positive and negative prediction errors, and thus report signals similar to those proposed to promote learning in conditions of uncertainty. Furthermore, by comparing the cue responses of serotonin and dopamine neurons, we found differences in learning rates that could explain the importance of serotonin in inhibiting perseverative responding. Our findings show how the activity patterns of serotonin neurons support a role in cognitive flexibility, and suggest a revised model of dopamine–serotonin opponency with potential clinical implications.

A normative account of confirmatory biases during reinforcement learning

10.1101/2020.05.12.090134 ◽

2020 ◽

Cited By ~ 1

Author(s):

Germain Lefebvre ◽

Christopher Summerfield ◽

Rafal Bogacz

Keyword(s):

Reinforcement Learning ◽

Prediction Errors ◽

Option Value ◽

Confirmatory Bias ◽

Reward Prediction ◽

Wide Range ◽

The Face ◽

Paradoxical Finding ◽

Learning Policies ◽

Normative Account

Abstract: Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when updating the value of a chosen option, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximise reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases overestimate the value of more valuable bandits, and underestimate the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning policies can in fact be reward-maximising if decisions are made with finite computational precision.

Altered Reinforcement Learning from Reward and Punishment in Anorexia Nervosa: Evidence from Computational Modeling

Journal of the International Neuropsychological Society ◽

10.1017/s1355617721001326 ◽

2021 ◽

pp. 1-13

Author(s):

Christina E. Wierenga ◽

Erin Reilly ◽

Amanda Bischoff-Grethe ◽

Walter H. Kaye ◽

Gregory G. Brown

Keyword(s):

Anorexia Nervosa ◽

Reinforcement Learning ◽

Computational Models ◽

Learning Task ◽

Maladaptive Behavior ◽

Prediction Errors ◽

Negative Consequences ◽

Diagnostic And Statistical Manual ◽

Learning Rates ◽

Reward And Punishment

ABSTRACT Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior. Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge. Results: AN had lower learning rates than HC following both positive and negative PE (p < .02), and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome. Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.
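The finding that one group was "less likely to exploit what they had learned" is commonly expressed in reinforcement learning models through a softmax choice rule with an inverse temperature parameter. The abstract does not name the choice rule used in the study, so the sketch below is an assumption for illustration, as are the function name and parameter values.

```python
import math


def softmax_choice_probabilities(values, inverse_temperature):
    """Convert learned option values into choice probabilities.

    A higher inverse temperature means more exploitation of the best-valued
    option; zero gives uniformly random choice.
    """
    scaled = [inverse_temperature * v for v in values]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exponentials = [math.exp(s - largest) for s in scaled]
    total = sum(exponentials)
    return [e / total for e in exponentials]


if __name__ == "__main__":
    learned_values = [0.8, 0.2]
    print(softmax_choice_probabilities(learned_values, inverse_temperature=5.0))
    print(softmax_choice_probabilities(learned_values, inverse_temperature=1.0))
```

With identical learned values, the lower inverse temperature yields choice probabilities closer to chance, so reduced exploitation can be distinguished in a model from a reduced learning rate.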
