Choice-confirmation bias in reinforcement learning changes with age during adolescence

2021 ◽  
Author(s):  
Gabriele Chierchia ◽  
Magdaléna Soukupová ◽  
Emma J. Kilford ◽  
Cait Griffin ◽  
Jovita Tung Leung ◽  
...  

Confirmation bias, the widespread tendency to favour evidence that confirms rather than disconfirms one’s prior beliefs and choices, has been shown to play a role in the way decisions are shaped by rewards and punishment, known as confirmatory reinforcement learning. Given that exploratory tendencies change during adolescence, we investigated whether confirmatory learning also changes during this age. In an instrumental learning task, participants aged 11-33 years attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Our results showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Across participants, there was a greater tendency to stay with an option that had delivered a reward on the immediately preceding trial than to switch away from an option that had just delivered a punishment, and this behavioural asymmetry also increased with age. Younger participants spent more time assessing the outcomes of their choices than did older participants, suggesting that their learning inefficiencies were not due to reduced attention. At a computational level, these decision patterns were best described by a model that assumes that people learn very little from disconfirmatory evidence and that they vary in the extent to which they learn from confirmatory evidence. Such confirmatory learning rates also increased with age. Overall, these findings are consistent with the hypothesis that a discrepancy between confirmatory and disconfirmatory learning increases with age during adolescence.
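The valence-specific model this abstract describes can be sketched as a delta-rule update with separate learning rates for confirmatory and disconfirmatory outcomes. A minimal illustration, assuming a two-option task; the function name and rate values are illustrative, not the paper's fitted estimates:

```python
import numpy as np

def confirmatory_update(q, choice, reward, alpha_conf=0.4, alpha_disc=0.1):
    """One trial of a confirmatory Q-value update.

    An outcome is 'confirmatory' when it supports the choice made
    (a positive prediction error for the chosen option). Confirmatory
    outcomes are learned with the larger rate alpha_conf; disconfirmatory
    ones with the smaller rate alpha_disc, so choices tend to persist.
    """
    pe = reward - q[choice]                       # prediction error
    alpha = alpha_conf if pe > 0 else alpha_disc  # valence-specific rate
    q = q.copy()
    q[choice] += alpha * pe
    return q
```

With alpha_conf > alpha_disc, a rewarded choice gains value faster than a punished choice loses it, reproducing the stay/switch asymmetry described above.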

2018 ◽  
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Abstract
Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.


2022 ◽  
Author(s):  
Chenxu Hao ◽  
Lilian E. Cabrera-Haro ◽  
Ziyong Lin ◽  
Patricia Reuter-Lorenz ◽  
Richard L. Lewis

To understand how acquired value impacts how we perceive and process stimuli, psychologists have developed the Value Learning Task (VLT; e.g., Raymond & O’Brien, 2009). The task consists of a series of trials in which participants attempt to maximize accumulated winnings as they make choices from a pair of presented images associated with probabilistic win, loss, or no-change outcomes. Despite the task having a symmetric outcome structure for win and loss pairs, people learn win associations better than loss associations (Lin, Cabrera-Haro, & Reuter-Lorenz, 2020). This asymmetry could lead to differences when the stimuli are probed in subsequent tasks, compromising inferences about how acquired value affects downstream processing. We investigate the nature of the asymmetry using a standard error-driven reinforcement learning model with a softmax choice rule. Despite having no special role for valence, the model yields the asymmetry observed in human behavior, whether the model parameters are set to maximize empirical fit, or task payoff. The asymmetry arises from an interaction between a neutral initial value estimate and a choice policy that exploits while exploring, leading to more poorly discriminated value estimates for loss stimuli. We also show how differences in estimated individual learning rates help to explain individual differences in the observed win-loss asymmetries, and how the final value estimates produced by the model provide a simple account of a post-learning explicit value categorization task.
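The mechanism this abstract proposes — a neutral initial value plus an exploiting choice policy starving loss options of samples — can be illustrated with a minimal simulation of one win pair and one loss pair. The function names and parameter values are hypothetical, not the published model fits:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=5.0):
    """Softmax choice probabilities with inverse temperature beta."""
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

def simulate_pair(p_outcome, outcome_val, alpha=0.2, beta=5.0, n_trials=100):
    """Simulate error-driven learning on one stimulus pair.

    p_outcome: probability that each option yields outcome_val
               (+1 for a win pair, -1 for a loss pair); otherwise 0.
    Values start at a neutral 0, as in the model described above.
    """
    q = np.zeros(2)
    for _ in range(n_trials):
        choice = rng.choice(2, p=softmax(q, beta))
        outcome = outcome_val if rng.random() < p_outcome[choice] else 0.0
        q[choice] += alpha * (outcome - q[choice])  # delta-rule update
    return q

# Win pair: 80% vs 20% chance of +1; loss pair: 80% vs 20% chance of -1.
q_win = simulate_pair(np.array([0.8, 0.2]), +1.0)
q_loss = simulate_pair(np.array([0.8, 0.2]), -1.0)
```

Because the policy exploits, the better win option is sampled often and its value is estimated well, while both loss options are avoided once their values dip below neutral, leaving loss-pair estimates more poorly discriminated — the asymmetry the abstract attributes to the interaction of neutral priors and exploitation.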


2021 ◽  
Author(s):  
Stefano Palminteri

Do we preferentially learn from outcomes that confirm our choices? This is one of the most basic, and yet consequence-bearing, questions concerning reinforcement learning. In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. This result is robust across a broad range of outcome contingencies and response modalities. One of the prominent behavioral consequences of the confirmatory learning rate pattern is choice hysteresis: the tendency to repeat previous choices despite contradictory evidence. As robust and replicable as they have proven to be, these findings were (legitimately) challenged by a couple of studies pointing out that a choice-confirmatory pattern of learning rates may spuriously arise from not taking into consideration an explicit choice autocorrelation term in the model. In the present study, we re-analyze data from four previously published papers (in total nine experiments; N=363), originally included in the studies demonstrating (or criticizing) the choice-confirmation bias in human participants. We fitted two models: one featured valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally including an explicit choice autocorrelation process (gradual perseveration). Our analysis confirms that the inclusion of the gradual perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, in all considered experiments, the choice-confirmation bias remains present at the meta-analytical level, and significantly different from zero in most experiments.
Our results demonstrate that the choice-confirmation bias resists the inclusion of an explicit choice autocorrelation term, thus proving to be a robust feature of human reinforcement learning. We conclude by discussing the psychological plausibility of the gradual perseveration process in the context of these behavioral paradigms and by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny.
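The gradual-perseveration process discussed here is commonly implemented as an outcome-independent choice trace added to the softmax decision rule. A minimal sketch, assuming a two-option task; names and parameter values are illustrative, not the paper's specification:

```python
import numpy as np

def choice_probs(q, c, beta=5.0, phi=1.0):
    """Softmax over Q-values plus a gradual-perseveration trace c.

    c tracks recent choices independently of their outcomes; phi scales
    how strongly past choices attract repetition. With phi > 0 the model
    can repeat choices even without any confirmatory asymmetry in the
    learning rates, which is why the two mechanisms can be confounded.
    """
    v = beta * q + phi * c
    p = np.exp(v - v.max())
    return p / p.sum()

def update_trace(c, choice, tau=0.3):
    """Decay the perseveration trace toward the most recent choice."""
    target = np.zeros_like(c)
    target[choice] = 1.0
    return c + tau * (target - c)
```

Fitting both this trace and valence-specific learning rates in the same model is what allows the re-analysis above to separate genuine confirmatory updating from mere choice autocorrelation.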


2021 ◽  
Author(s):  
Bianca Westhoff ◽  
Neeltje E. Blankenstein ◽  
Elisabeth Schreuders ◽  
Eveline A. Crone ◽  
Anna C. K. van Duijvenvoorde

Abstract
Learning which of our behaviors benefit others contributes to social bonding and being liked by others. An important period for the development of (pro)social behavior is adolescence, in which peers become more salient and relationships intensify. It is, however, unknown how learning to benefit others develops across adolescence and what the underlying cognitive and neural mechanisms are. In this functional neuroimaging study, we assessed learning for self and others (i.e., prosocial learning) and the concurrent neural tracking of prediction errors across adolescence (ages 9-21, N=74). Participants performed a two-choice probabilistic reinforcement learning task in which outcomes resulted in monetary consequences for themselves, an unknown other, or no one. Participants from all ages were able to learn for themselves and others, but learning for others showed a more protracted developmental trajectory. Prediction errors for self were observed in the ventral striatum and showed no age-related differences. However, prediction error coding for others was specifically observed in the ventromedial prefrontal cortex and showed age-related increases. These results reveal insights into the computational mechanisms of learning for others across adolescence, and highlight that learning for self and others shows different age-related patterns.


2020 ◽  
Author(s):  
Jil Humann ◽  
Adrian Georg Fischer ◽  
Markus Ullsperger

Research suggests that working memory (WM) has an important role in instrumental learning in changeable environments when reinforcement histories of multiple options must be tracked. Working memory capacity (WMC) not only reflects the ability to maintain items, but also to update and shield items against interference in a context-dependent manner; functions conceivably also essential to instrumental learning. To address the relationship of WMC and instrumental learning, we studied choice behavior and EEG of participants performing a probabilistic reversal learning task. Their separately measured WMC positively correlated with reversal learning performance. Computational modeling revealed that low-capacity participants modulated learning rates less dynamically around value reversals. Their choices were more stochastic and less guided by learnt values, resulting in less stable performance and higher susceptibility to misleading probabilistic feedback. Single-trial model-based EEG analysis revealed that prediction errors and learning rates were less strongly represented in cortical activity of low-capacity participants, while the centroparietal positivity, a general correlate of adaptation, was independent of WMC. In conclusion, cognitive functions tackled by WMC tasks are also necessary in instrumental learning. We suggest that noisier representations render items held in WM as well as tracked values in instrumental learning less stable and more susceptible to distractors.


2018 ◽  
Vol 115 (52) ◽  
pp. E12398-E12406 ◽  
Author(s):  
Craig A. Taswell ◽  
Vincent D. Costa ◽  
Elisabeth A. Murray ◽  
Bruno B. Averbeck

Adaptive behavior requires animals to learn from experience. Ideally, learning should both promote choices that lead to rewards and reduce choices that lead to losses. Because the ventral striatum (VS) contains neurons that respond to aversive stimuli and aversive stimuli can drive dopamine release in the VS, it is possible that the VS contributes to learning about aversive outcomes, including losses. However, other work suggests that the VS may play a specific role in learning to choose among rewards, with other systems mediating learning from aversive outcomes. To examine the role of the VS in learning from gains and losses, we compared the performance of macaque monkeys with VS lesions and unoperated controls on a reinforcement learning task. In the task, the monkeys gained or lost tokens, which were periodically cashed out for juice, as outcomes for choices. They learned over trials to choose cues associated with gains, and not choose cues associated with losses. We found that monkeys with VS lesions had a deficit in learning to choose between cues that differed in reward magnitude. By contrast, monkeys with VS lesions performed as well as controls when choices involved a potential loss. We also fit reinforcement learning models to the behavior and compared learning rates between groups. Relative to controls, the monkeys with VS lesions had reduced learning rates for gain cues. Therefore, in this task, the VS plays a specific role in learning to choose between rewarding options.


2021 ◽  
Author(s):  
Leor M Hackel ◽  
Drew Kogon ◽  
David Amodio ◽  
Wendy Wood

How do group-based interaction tendencies form through encounters with individual group members? In three experiments, in which participants interacted with group members in a reinforcement learning task presented as a money sharing game, participants formed instrumental reward associations with individual group members through direct interaction and feedback. Results revealed that individual-level reward learning generalized to a group-based representation, as indicated in self-reported group attitudes, trait impressions, and the tendency to choose subsequent interactions with novel members of the group. Moreover, group-based reward values continued to predict interactions with novel members after controlling for explicit attitudes and impressions, suggesting that instrumental learning contributes to an implicit form of group-based choice. Experiment 3 further demonstrated that group-based reward effects on interaction choices persisted even when group reward value was no longer predictive of positive outcomes, consistent with a habit-like expression of group bias. These results demonstrate a novel process of prejudice formation based on instrumental reward learning from direct interactions with individual group members. We discuss implications for existing theories of prejudice, the role of habit in intergroup bias, and intervention strategies to reduce prejudice.


Author(s):  
Christina E. Wierenga ◽  
Erin Reilly ◽  
Amanda Bischoff-Grethe ◽  
Walter H. Kaye ◽  
Gregory G. Brown

ABSTRACT Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty with set-shifting, and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior. Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge. Results: The AN group had lower learning rates than HCs following both positive and negative PEs (p < .02), and were less likely to exploit what they had learned. Negative PEs on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome. Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.


2018 ◽  
Author(s):  
Erdem Pulcu ◽  
Lorika Shkreli ◽  
Carolina Guzman Holst ◽  
Marcella L. Woud ◽  
Michelle G. Craske ◽  
...  

Abstract
Exposure therapy is a first-line treatment for anxiety disorders but remains ineffective in a large proportion of patients. A proposed mechanism of exposure involves a form of inhibitory learning where the association between a stimulus and an aversive outcome is suppressed by a new association with an appetitive or neutral outcome. The blood pressure medication losartan augments fear extinction in rodents and might have similar synergistic effects on human exposure therapy, but the exact cognitive mechanisms underlying these effects remain unknown. In this study, we used a reinforcement learning paradigm with compound rewards and punishments to test the prediction that losartan augments learning from appetitive relative to aversive outcomes. Healthy volunteers (N=53) were randomly assigned to single-dose losartan (50 mg) versus placebo. Participants then performed a reinforcement learning task which simultaneously probes appetitive and aversive learning. Participant choice behaviour was analysed using both a standard reinforcement learning model and by analysis of choice switching behaviour. Losartan significantly reduced learning rates from aversive events (losses) when participants were first exposed to the novel task environment, while preserving learning from positive outcomes. The same effect was seen in choice switching behaviour. Losartan thus enhances learning from positive relative to negative events. This effect may represent a computationally defined neurocognitive mechanism by which the drug could enhance the effect of exposure in clinical populations.


2019 ◽  
Author(s):  
Sarah L. Master ◽  
Maria K. Eckstein ◽  
Neta Gotlieb ◽  
Ronald Dahl ◽  
Linda Wilbrecht ◽  
...  

Abstract
Multiple neurocognitive systems contribute simultaneously to learning. For example, dopamine and basal ganglia (BG) systems are thought to support reinforcement learning (RL) by incrementally updating the value of choices, while the prefrontal cortex (PFC) contributes different computations, such as actively maintaining precise information in working memory (WM). It is commonly thought that WM and PFC show more protracted development than RL and BG systems, yet their contributions are rarely assessed in tandem. Here, we used a simple learning task to test how RL and WM contribute to changes in learning across adolescence. We tested 187 subjects ages 8 to 17 and 53 adults (25-30). Participants learned stimulus-action associations from feedback; the learning load was varied to be within or exceed WM capacity. Participants age 8-12 learned slower than participants age 13-17, and were more sensitive to load. We used computational modeling to estimate subjects’ use of WM and RL processes. Surprisingly, we found more robust changes in RL than WM during development. RL learning rate increased significantly with age across adolescence and WM parameters showed more subtle changes, many of them early in adolescence. These results underscore the importance of changes in RL processes for the developmental science of learning.

Highlights
- Subjects combine reinforcement learning (RL) and working memory (WM) to learn
- Computational modeling shows RL learning rates grew with age during adolescence
- When load was beyond WM capacity, weaker RL compensated less in younger adolescents
- WM parameters showed subtler and more puberty-related changes
- WM reliance, maintenance, and capacity had separable developmental trajectories
- Underscores importance of RL processes in developmental changes in learning

