Explaining Valence Asymmetries in Value Learning: A Reinforcement Learning Account

2022 ◽  
Author(s):  
Chenxu Hao ◽  
Lilian E. Cabrera-Haro ◽  
Ziyong Lin ◽  
Patricia Reuter-Lorenz ◽  
Richard L. Lewis

To understand how acquired value impacts how we perceive and process stimuli, psychologists have developed the Value Learning Task (VLT; e.g., Raymond & O’Brien, 2009). The task consists of a series of trials in which participants attempt to maximize accumulated winnings as they make choices from a pair of presented images associated with probabilistic win, loss, or no-change outcomes. Despite the task having a symmetric outcome structure for win and loss pairs, people learn win associations better than loss associations (Lin, Cabrera-Haro, & Reuter-Lorenz, 2020). This asymmetry could lead to differences when the stimuli are probed in subsequent tasks, compromising inferences about how acquired value affects downstream processing. We investigate the nature of the asymmetry using a standard error-driven reinforcement learning model with a softmax choice rule. Despite having no special role for valence, the model yields the asymmetry observed in human behavior, whether the model parameters are set to maximize empirical fit or task payoff. The asymmetry arises from an interaction between a neutral initial value estimate and a choice policy that exploits while exploring, leading to more poorly discriminated value estimates for loss stimuli. We also show how differences in estimated individual learning rates help to explain individual differences in the observed win-loss asymmetries, and how the final value estimates produced by the model provide a simple account of a post-learning explicit value categorization task.
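The mechanism described here is simple enough to see in a few lines of simulation. The sketch below is our illustration, not the authors' code: a delta-rule learner with a softmax choice rule and neutral initial values on a symmetric win/loss pair structure, with all parameter values assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 5.0                          # learning rate, softmax inverse temperature
magnitudes = {"win": 1.0, "loss": -1.0}
V = {"win": np.zeros(2), "loss": np.zeros(2)}   # neutral initial value estimates

for _ in range(100):                            # trials per pair
    for pair, mag in magnitudes.items():
        v = V[pair]
        p0 = 1.0 / (1.0 + np.exp(-beta * (v[0] - v[1])))   # softmax over the two options
        choice = 0 if rng.random() < p0 else 1
        # Symmetric structure: option 0 delivers the pair's outcome with p = .8,
        # option 1 with p = .2; otherwise the outcome is no change (0).
        outcome = mag if rng.random() < (0.8 if choice == 0 else 0.2) else 0.0
        v[choice] += alpha * (outcome - v[choice])          # error-driven (delta-rule) update

print({k: np.round(v, 2) for k, v in V.items()})
```

Because the softmax policy exploits early estimates, the frequently losing option in a loss pair is quickly avoided and its value estimate stays near the neutral starting point, so the two loss values end up less well discriminated than the two win values.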

2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Lucio Marinelli ◽  
Carlo Trompetto ◽  
Stefania Canneva ◽  
Laura Mori ◽  
Flavio Nobili ◽  
...  

Learning new information is crucial in daily activities and occurs continuously during a subject’s lifetime. Retention of learned material is required for later recall and reuse, although learning capacity is limited and interference between consecutively learned information may occur. Learning processes are impaired in Parkinson’s disease (PD); however, little is known about the processes related to retention and interference. The aim of this study is to investigate retention and anterograde interference using a declarative sequence learning task in drug-naive patients in the disease’s early stages. Eleven patients with PD and eleven age-matched controls learned a visuomotor sequence, SEQ1, during Day 1; the following day, retention of SEQ1 was assessed and, immediately after, a new sequence of comparable complexity, SEQ2, was learned. The comparison of the learning rates of SEQ1 on Day 1 and SEQ2 on Day 2 assessed the anterograde interference of SEQ1 on SEQ2. We found that SEQ1 performance improved in both patients and controls on Day 2. Surprisingly, controls learned SEQ2 better than SEQ1, suggesting the absence of anterograde interference and the occurrence of learning optimization, a process that we defined as “learning how to learn.” Patients with PD lacked such improvement, suggesting defective performance optimization processes.


2018 ◽  
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Abstract
Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
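One way to picture the decision-stage result is a softmax choice rule in which agreement with the irrelevant distractor contributes a bonus (equivalently, conflict contributes a cost). The sketch below is our illustration of that idea, not the authors' fitted model; `beta` and `conflict_cost` are assumed values.

```python
import numpy as np

def choice_prob(q_left, q_right, distractor, beta=4.0, conflict_cost=0.3):
    """P(choose left) when an irrelevant distractor suggests 'left' or 'right'.

    The option that conflicts with the external suggestion is penalized by
    `conflict_cost`, so the distractor matters most when the value difference
    (the evidence) is weak. Illustrative sketch, not the paper's model.
    """
    bias = conflict_cost if distractor == "left" else -conflict_cost
    return 1.0 / (1.0 + np.exp(-beta * ((q_left - q_right) + bias)))

# With weak evidence the distractor sways the choice; with strong evidence it barely does.
print(choice_prob(0.52, 0.48, "right"))  # weak evidence, conflicting suggestion
print(choice_prob(0.90, 0.10, "right"))  # strong evidence, conflicting suggestion
```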


2020 ◽  
Vol 34 (07) ◽  
pp. 11205-11212 ◽  
Author(s):  
Ilchae Jung ◽  
Kihyun You ◽  
Hyeonwoo Noh ◽  
Minsu Cho ◽  
Bohyung Han

We propose a novel meta-learning framework for real-time object tracking with efficient model adaptation and channel pruning. Given an object tracker, our framework learns to fine-tune its model parameters in only a few gradient-descent iterations during tracking while pruning its network channels using the target ground-truth at the first frame. Such a learning problem is formulated as a meta-learning task, where a meta-tracker is trained by updating its meta-parameters for initial weights, learning rates, and pruning masks through carefully designed tracking simulations. The integrated meta-tracker greatly improves tracking performance by accelerating the convergence of online learning and reducing the cost of feature computation. Experimental evaluation on standard datasets demonstrates its outstanding accuracy and speed compared to state-of-the-art methods.
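To make the inner loop concrete, here is a schematic sketch (ours, not the released implementation) of test-time adaptation with meta-learned initial weights, per-parameter learning rates, and a channel-pruning mask; the toy quadratic loss stands in for the tracking loss on the first frame.

```python
import numpy as np

def inner_loop_adapt(weights, lrs, mask, grad_fn, n_steps=3):
    """Illustrative inner-loop adaptation of a meta-learned tracker.

    weights - model parameters (meta-learned initialization)
    lrs     - per-parameter learning rates (meta-learned)
    mask    - binary channel-pruning mask (meta-learned); pruned entries stay zero
    grad_fn - returns gradients of the tracking loss on the first frame
    """
    w = weights * mask                      # prune channels up front
    for _ in range(n_steps):                # only a few gradient steps at test time
        g = grad_fn(w)
        w = (w - lrs * g) * mask            # per-parameter step sizes; keep pruned entries at zero
    return w

# Toy usage: quadratic loss around a "target" parameter vector.
target = np.array([1.0, -2.0, 0.5, 3.0])
grad_fn = lambda w: 2 * (w - target)
w0 = np.zeros(4)
lrs = np.array([0.3, 0.3, 0.1, 0.1])        # would be meta-learned in practice
mask = np.array([1.0, 1.0, 0.0, 1.0])       # third channel pruned
print(inner_loop_adapt(w0, lrs, mask, grad_fn))
```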


2018 ◽  
Author(s):  
Vincent Moens ◽  
Alexandre Zénon

Abstract
Agents living in volatile environments must be able to detect changes in contingencies while refraining from adapting to unexpected events that are caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to the past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in a stable environment provides experimental evidence for such adaptive learning rates. However, in classical RL models, the learning rate is either fixed or scheduled and thus cannot adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection through Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the “critic” of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions of environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.
Author summary
In stable contexts, animals and humans exhibit automatic behaviour that allows them to make fast decisions. However, these automatic processes exhibit a lack of flexibility when environmental contingencies change. In the present paper, we propose a model of behavioural automatization that is based on adaptive forgetting and that emulates these properties. The model builds an estimate of the stability of the environment and uses this estimate to adjust its learning rate and the balance between exploration and exploitation policies. The model performs Bayesian inference on latent variables that represent relevant environmental properties, such as reward functions, optimal policies or environment stability. From there, the model makes decisions in order to maximize long-term rewards, with noise proportional to environmental uncertainty. This rich model encompasses many aspects of Reinforcement Learning (RL), such as Temporal Difference RL and counterfactual learning, and accounts for the reduced computational cost of automatic behaviour. Using simulations, we show that this model leads to interesting predictions about the efficiency with which subjects adapt to sudden changes in contingencies after prolonged training.
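The core update is easy to state: the effective prior at each step is a mixture of the fixed, initial prior and the previous posterior. A minimal sketch for a discrete latent variable follows (our illustration; in the paper the mixture weight is optimized alongside the other parameters rather than fixed, and inference is variational rather than exact).

```python
import numpy as np

def stabilized_forgetting_update(prev_post, fixed_prior, likelihood, w):
    """One stabilized-forgetting belief update over a discrete latent variable.

    The effective prior mixes the fixed, initial prior (weight w) with the
    previous posterior (weight 1 - w); all arguments are arrays over latent
    states. Illustrative sketch only.
    """
    prior = w * fixed_prior + (1.0 - w) * prev_post   # stabilized-forgetting prior
    post = prior * likelihood                         # Bayes rule (unnormalized)
    return post / post.sum()

# Toy usage: two latent "contingency" states and a surprising observation.
fixed_prior = np.array([0.5, 0.5])
prev_post = np.array([0.95, 0.05])    # strong belief in state 0 after prolonged training
likelihood = np.array([0.1, 0.9])     # new observation favors state 1
for w in (0.05, 0.5):                 # little vs much forgetting
    print(w, stabilized_forgetting_update(prev_post, fixed_prior, likelihood, w))
```

With a larger forgetting weight, the belief shifts toward the new contingency after a single surprising observation; with a small weight, the entrenched belief persists, mirroring the flexibility trade-off the paper describes.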


2021 ◽  
Author(s):  
Gabriele Chierchia ◽  
Magdaléna Soukupová ◽  
Emma J. Kilford ◽  
Cait Griffin ◽  
Jovita Tung Leung ◽  
...  

Confirmation bias, the widespread tendency to favour evidence that confirms rather than disconfirms one’s prior beliefs and choices, has been shown to play a role in the way decisions are shaped by rewards and punishments, known as confirmatory reinforcement learning. Given that exploratory tendencies change during adolescence, we investigated whether confirmatory learning also changes across this age range. In an instrumental learning task, participants aged 11-33 years attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Our results showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Across participants, there was a greater tendency to stay with an option that had delivered a reward on the immediately preceding trial than to switch away from an option that had just delivered a punishment, and this behavioural asymmetry also increased with age. Younger participants spent more time assessing the outcomes of their choices than did older participants, suggesting that their learning inefficiencies were not due to reduced attention. At a computational level, these decision patterns were best described by a model that assumes that people learn very little from disconfirmatory evidence and that they vary in the extent to which they learn from confirmatory evidence. Such confirmatory learning rates also increased with age. Overall, these findings are consistent with the hypothesis that the discrepancy between confirmatory and disconfirmatory learning increases with age during adolescence.
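A minimal sketch of the model class that best described the data, assuming feedback only on the chosen option: the learning rate depends on whether the outcome confirms the choice. The rates below are illustrative, not the fitted values.

```python
def confirmatory_update(q_chosen, outcome, alpha_conf=0.30, alpha_disc=0.05):
    """Update the chosen option's value with confirmation-dependent learning rates:
    outcomes that confirm the choice (positive prediction errors) are learned at
    alpha_conf; disconfirming outcomes at the much smaller alpha_disc.
    Illustrative sketch; the fitted rates varied across participants."""
    pe = outcome - q_chosen
    return q_chosen + (alpha_conf if pe >= 0 else alpha_disc) * pe

q = 0.0
print(confirmatory_update(q, +1.0))   # confirming reward: large update
print(confirmatory_update(q, -1.0))   # disconfirming punishment: small update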


2020 ◽  
Author(s):  
Joana Carvalheiro ◽  
Vasco A. Conceição ◽  
Ana Mesquita ◽  
Ana Seara-Cardoso

Abstract
Acute stress is ubiquitous in everyday life, but the extent to which acute stress affects how people learn from the outcomes of their choices is still poorly understood. Here, we investigate how acute stress impacts reward and punishment learning in men using a reinforcement-learning task. Sixty-two male participants performed the task under both stress and control conditions. We observed that acute stress impaired participants’ choice performance towards monetary gains, but not losses. To unravel the mechanism(s) underlying such impairment, we fitted a reinforcement-learning model to participants’ trial-by-trial choices. Computational modeling indicated that under acute stress participants learned more slowly from positive prediction errors (when the outcomes were better than expected), consistent with stress-induced dopamine disruptions. Such mechanistic understanding of how acute stress impairs reward learning is particularly important given the pervasiveness of stress in our daily life and the impact that stress can have on our wellbeing and mental health.
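The fitted model family separates learning rates by the sign of the prediction error. The simulation below (ours, with made-up parameter values) illustrates how selectively reducing the positive-PE learning rate, as reported under stress, degrades accuracy mainly in the gain condition.

```python
import numpy as np

def run_block(alpha_pos, alpha_neg, beta=5.0, n=120, seed=1):
    """Simulate choice accuracy in gain (+1) and loss (-1) conditions of a
    two-option task under valence-specific learning rates. A rough sketch of
    the fitted model class; all parameter values are assumptions."""
    rng = np.random.default_rng(seed)
    accuracy = {}
    for mag in (+1.0, -1.0):
        q, correct = np.zeros(2), 0
        for _ in range(n):
            p0 = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))
            c = 0 if rng.random() < p0 else 1
            # Option 0 is better in both conditions: more frequent wins for
            # gains, less frequent losses for losses.
            p_out = (0.8, 0.2)[c] if mag > 0 else (0.2, 0.8)[c]
            outcome = mag if rng.random() < p_out else 0.0
            pe = outcome - q[c]
            q[c] += (alpha_pos if pe > 0 else alpha_neg) * pe
            correct += (c == 0)
        accuracy[mag] = correct / n
    return accuracy

print("control:", run_block(alpha_pos=0.30, alpha_neg=0.20))
print("stress: ", run_block(alpha_pos=0.08, alpha_neg=0.20))  # slower learning from positive PEs
```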


2018 ◽  
Vol 115 (52) ◽  
pp. E12398-E12406 ◽  
Author(s):  
Craig A. Taswell ◽  
Vincent D. Costa ◽  
Elisabeth A. Murray ◽  
Bruno B. Averbeck

Adaptive behavior requires animals to learn from experience. Ideally, learning should both promote choices that lead to rewards and reduce choices that lead to losses. Because the ventral striatum (VS) contains neurons that respond to aversive stimuli and aversive stimuli can drive dopamine release in the VS, it is possible that the VS contributes to learning about aversive outcomes, including losses. However, other work suggests that the VS may play a specific role in learning to choose among rewards, with other systems mediating learning from aversive outcomes. To examine the role of the VS in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions and unoperated controls on a reinforcement learning task. In the task, the monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for their choices. Over trials, they learned to choose cues associated with gains and to avoid cues associated with losses. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. By contrast, monkeys with VS lesions performed as well as controls when choices involved a potential loss. We also fit reinforcement learning models to the behavior and compared learning rates between groups. Relative to controls, the monkeys with VS lesions had reduced learning rates for gain cues. Therefore, in this task, the VS plays a specific role in learning to choose between rewarding options.
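Comparing learning rates between groups requires fitting the model to each animal's trial-by-trial choices. The following sketch shows a standard maximum-likelihood fit of a delta-rule/softmax model; the data layout and synthetic data are our illustrative assumptions, not the study's dataset.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, choices, outcomes, n_options=2):
    """Negative log-likelihood of a basic delta-rule/softmax model, the kind of
    fit used to compare learning rates between groups."""
    alpha, beta = params
    q = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, outcomes):
        p = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax choice probabilities
        nll -= np.log(p[c] + 1e-12)
        q[c] += alpha * (r - q[c])                      # update the chosen cue only
    return nll

# Toy data: an agent that mostly picks option 0, which usually pays a token.
rng = np.random.default_rng(2)
choices = rng.choice(2, size=200, p=[0.8, 0.2])
outcomes = np.where(choices == 0, rng.random(200) < 0.75, rng.random(200) < 0.25).astype(float)

fit = minimize(neg_log_lik, x0=[0.2, 2.0], args=(choices, outcomes),
               bounds=[(1e-3, 1.0), (1e-2, 20.0)])
print(fit.x)   # estimated learning rate and inverse temperature
```

Fitting the same model separately to gain and loss cues, and separately per group, yields the learning-rate comparison the abstract reports.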


Author(s):  
Christina E. Wierenga ◽  
Erin Reilly ◽  
Amanda Bischoff-Grethe ◽  
Walter H. Kaye ◽  
Gregory G. Brown

Abstract
Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty with set-shifting, and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior.
Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted body mass index (BMI) at discharge.
Results: Individuals with AN had lower learning rates than HCs following both positive and negative PEs (p < .02), and were less likely to exploit what they had learned. Negative PEs on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome.
Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain the persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.
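The "less likely to exploit" finding maps onto the softmax inverse temperature: with identical learned values, a lower inverse temperature yields more random choices. A small illustration with assumed values:

```python
import numpy as np

# With the same learned values, the softmax inverse temperature controls how
# reliably the better option is exploited; lower values mean more random choice.
q_better, q_worse = 0.7, 0.3
for beta in (1.0, 3.0, 10.0):
    p_better = 1.0 / (1.0 + np.exp(-beta * (q_better - q_worse)))
    print(f"beta={beta:4.1f}  P(choose better)={p_better:.2f}")
```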


2018 ◽  
Author(s):  
Erdem Pulcu ◽  
Lorika Shkreli ◽  
Carolina Guzman Holst ◽  
Marcella L. Woud ◽  
Michelle G. Craske ◽  
...  

Abstract
Exposure therapy is a first-line treatment for anxiety disorders but remains ineffective in a large proportion of patients. A proposed mechanism of exposure involves a form of inhibitory learning where the association between a stimulus and an aversive outcome is suppressed by a new association with an appetitive or neutral outcome. The blood pressure medication losartan augments fear extinction in rodents and might have similar synergistic effects on human exposure therapy, but the exact cognitive mechanisms underlying these effects remain unknown. In this study, we used a reinforcement learning paradigm with compound rewards and punishments to test the prediction that losartan augments learning from appetitive relative to aversive outcomes. Healthy volunteers (N=53) were randomly assigned to single-dose losartan (50 mg) versus placebo. Participants then performed a reinforcement learning task which simultaneously probes appetitive and aversive learning. Participant choice behaviour was analysed using both a standard reinforcement learning model and an analysis of choice switching behaviour. Losartan significantly reduced learning rates from aversive events (losses) when participants were first exposed to the novel task environment, while preserving learning from positive outcomes. The same effect was seen in choice switching behaviour. Thus, losartan enhances learning from positive relative to negative events. This effect may represent a computationally defined neurocognitive mechanism by which the drug could enhance the effect of exposure in clinical populations.
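Alongside the model fits, the paper analyses choice switching directly. A model-agnostic version of such an analysis might look like the sketch below; the input coding (option ids and signed outcomes) is our assumption.

```python
import numpy as np

def switch_rates(choices, outcomes):
    """How often does a participant switch away after a loss vs after a win?
    choices: option ids per trial; outcomes: +1 for win, -1 for loss."""
    choices, outcomes = np.asarray(choices), np.asarray(outcomes)
    switched = choices[1:] != choices[:-1]        # did the next choice differ?
    after_loss = outcomes[:-1] < 0                # trials following a loss
    return {"switch_after_loss": switched[after_loss].mean(),
            "switch_after_win": switched[~after_loss].mean()}

# Toy sequence: losses often provoke a switch on the next trial.
print(switch_rates([0, 0, 1, 1, 0, 0, 0], [+1, -1, +1, -1, +1, +1, +1]))
```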


2019 ◽  
Author(s):  
Sarah L. Master ◽  
Maria K. Eckstein ◽  
Neta Gotlieb ◽  
Ronald Dahl ◽  
Linda Wilbrecht ◽  
...  

Abstract
Multiple neurocognitive systems contribute simultaneously to learning. For example, dopamine and basal ganglia (BG) systems are thought to support reinforcement learning (RL) by incrementally updating the value of choices, while the prefrontal cortex (PFC) contributes different computations, such as actively maintaining precise information in working memory (WM). It is commonly thought that WM and PFC show more protracted development than RL and BG systems, yet their contributions are rarely assessed in tandem. Here, we used a simple learning task to test how RL and WM contribute to changes in learning across adolescence. We tested 187 subjects aged 8 to 17 and 53 adults (aged 25-30). Participants learned stimulus-action associations from feedback; the learning load was varied to be within or to exceed WM capacity. Participants aged 8-12 learned more slowly than participants aged 13-17, and were more sensitive to load. We used computational modeling to estimate subjects’ use of WM and RL processes. Surprisingly, we found more robust changes in RL than WM during development. The RL learning rate increased significantly with age across adolescence, and WM parameters showed more subtle changes, many of them early in adolescence. These results underscore the importance of changes in RL processes for the developmental science of learning.
Highlights
- Subjects combine reinforcement learning (RL) and working memory (WM) to learn
- Computational modeling shows RL learning rates grew with age during adolescence
- When load was beyond WM capacity, weaker RL compensated less in younger adolescents
- WM parameters showed subtler and more puberty-related changes
- WM reliance, maintenance, and capacity had separable developmental trajectories
- Underscores the importance of RL processes in developmental changes in learning
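As a rough sketch of the RL-plus-WM model class used in this literature (parameter names and values are our assumptions, not the paper's fitted model), the policy can be written as a capacity-weighted mixture of a WM module and an RL module:

```python
import numpy as np

def rlwm_policy(q_rl, wm_memory, stimulus, set_size, capacity=3.0, rho=0.9, beta=6.0):
    """Mixture policy of an RL module and a capacity-limited WM module.

    q_rl      - incrementally learned action values per stimulus (RL)
    wm_memory - action probabilities held in WM per stimulus
    rho       - overall reliance on WM; capacity/set_size scales it down
                when the load exceeds WM capacity. Illustrative sketch only.
    """
    p_rl = np.exp(beta * q_rl[stimulus])
    p_rl /= p_rl.sum()                         # softmax over RL values
    p_wm = wm_memory[stimulus]                 # near-perfect one-shot memory
    w = rho * min(1.0, capacity / set_size)    # WM weight shrinks with load
    return w * p_wm + (1.0 - w) * p_rl

# Toy usage: 3 actions, 6 stimuli (load beyond capacity), WM remembers stimulus 0.
n_stim, n_act = 6, 3
q_rl = np.full((n_stim, n_act), 1 / n_act)
wm = np.full((n_stim, n_act), 1 / n_act)
wm[0] = [0.9, 0.05, 0.05]                      # last correct action for stimulus 0 held in WM
print(rlwm_policy(q_rl, wm, stimulus=0, set_size=n_stim))
```

With the set size above capacity, the WM weight shrinks and behavior leans on the slower RL values, which is the regime where the reported age differences in RL mattered most.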

