Reinforcement learning modeling reveals a reward-history-dependent strategy underlying reversal learning in squirrel monkeys.

2021 ◽  
Author(s):  
Bilal A. Bari ◽  
Megan J. Moerke ◽  
Hank P. Jedema ◽  
Devin P. Effinger ◽  
Jeremiah Y. Cohen ◽  
...  
2018 ◽  
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Abstract Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision stage showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
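
A minimal sketch of how decision-stage conflict can be modelled as a cost traded off against expected reward, in the spirit of the models described above but not the authors' exact specification: a fixed penalty (here `conflict_cost`, an assumed name and value) is subtracted from the value of the option that conflicts with the external suggestion before a softmax choice, so the distractor biases choice only when value evidence is weak.

```python
import numpy as np

def softmax_choice_prob(q_values, beta=3.0, conflict_cost=0.5, conflict_option=None):
    """Softmax choice probabilities, with an optional cost subtracted from the
    option that conflicts with an external suggestion (e.g. a flanker cue).
    Parameter names and values are illustrative assumptions, not fitted."""
    v = np.array(q_values, dtype=float)
    if conflict_option is not None:
        v[conflict_option] -= conflict_cost   # conflict treated as a subjective cost
    exp_v = np.exp(beta * (v - v.max()))      # subtract max for numerical stability
    return exp_v / exp_v.sum()

# With weak value evidence, the conflict cost dominates the choice...
print(softmax_choice_prob([0.52, 0.48], conflict_option=0))
# ...with strong evidence, internal values trump the distractor.
print(softmax_choice_prob([0.90, 0.10], conflict_option=0))
```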


2011 ◽  
Vol 23 (4) ◽  
pp. 936-946 ◽  
Author(s):  
Henry W. Chase ◽  
Rachel Swainson ◽  
Lucy Durham ◽  
Laura Benham ◽  
Roshan Cools

We assessed electrophysiological activity over the medial frontal cortex (MFC) during outcome-based behavioral adjustment using a probabilistic reversal learning task. During recording, participants were presented with two abstract visual patterns on each trial and had to select the stimulus rewarded on 80% of trials while avoiding the stimulus rewarded on 20% of trials. These contingencies were reversed frequently during the experiment. Previous EEG work has revealed feedback-locked electrophysiological responses over the MFC (the feedback-related negativity; FRN), which correlate with the negative prediction error [Holroyd, C. B., & Coles, M. G. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002] and which predict outcome-based adjustment of decision values [Cohen, M. X., & Ranganath, C. Reinforcement learning signals predict future decisions. Journal of Neuroscience, 27, 371–378, 2007]. Unlike previous paradigms, ours enabled us to disentangle, on the one hand, mechanisms related to the reward prediction error, derived from reinforcement learning (RL) modeling, and, on the other hand, mechanisms related to explicit rule-based adjustment of actual behavior. Our results demonstrate greater FRN amplitudes with greater RL model-derived prediction errors. Conversely, expected negative outcomes that preceded rule-based behavioral reversal were not accompanied by an FRN. This pattern contrasted markedly with that of the P3 amplitude, which was significantly greater for expected negative outcomes that preceded rule-based behavioral reversal than for unexpected negative outcomes that did not precede behavioral reversal. These data suggest that the FRN reflects the prediction error and associated RL-based adjustment of decision values, whereas the P3 reflects adjustment of behavior on the basis of explicit rules.
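
For readers unfamiliar with the quantity the FRN is argued to track, here is a minimal Rescorla-Wagner sketch of the reward prediction error (delta = outcome minus expectation); the learning rate and initial value are assumptions for illustration, not the paper's fitted parameters.

```python
import numpy as np

def rescorla_wagner(rewards, alpha=0.2, v0=0.5):
    """Track expected value V and the reward prediction error delta = r - V.
    In the RL theory of the FRN (Holroyd & Coles, 2002), FRN amplitude is
    argued to scale with the negative prediction error. Illustrative sketch."""
    v = v0
    deltas = []
    for r in rewards:
        delta = r - v          # prediction error: outcome minus expectation
        deltas.append(delta)
        v += alpha * delta     # delta-rule value update
    return np.array(deltas)

# A surprising loss after a run of wins yields a large negative delta
# (a large predicted FRN); a fully expected loss would yield a small one.
print(rescorla_wagner([1, 1, 1, 1, 0]))
```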


2020 ◽  
Vol 4 ◽  
pp. 239821282090717 ◽  
Author(s):  
Matthew P. Wilkinson ◽  
John P. Grogan ◽  
Jack R. Mellor ◽  
Emma S. J. Robinson

Deficits in reward processing are a central feature of major depressive disorder, with patients exhibiting decreased reward learning and altered feedback sensitivity in probabilistic reversal learning tasks. Methods to quantify probabilistic learning in both rodents and humans have been developed, providing translational paradigms for depression research. We utilised a probabilistic reversal learning task to investigate potential differences between conventional and rapid-acting antidepressants in their effects on reward learning and feedback sensitivity. We trained 12 rats in a touchscreen probabilistic reversal learning task before investigating the effects of acute administration of citalopram, venlafaxine, reboxetine, ketamine or scopolamine. Data were also analysed using a Q-learning reinforcement learning model to understand the effects of antidepressant treatment on underlying reward-processing parameters. Citalopram administration decreased the number of trials taken to learn the first rule and increased win-stay probability. Reboxetine decreased win-stay behaviour while also decreasing the number of rule changes animals performed in a session. Venlafaxine had no effect. Ketamine and scopolamine both decreased win-stay probability, the number of rule changes performed and motivation in the task. Insights from the reinforcement learning model suggested that reboxetine led animals to choose a less optimal strategy, while ketamine decreased the model-free learning rate. These results suggest that reward learning and feedback sensitivity are not differentially modulated by conventional and rapid-acting antidepressant treatments in the probabilistic reversal learning task.
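
A minimal sketch, under assumed parameter values, of the kind of Q-learning agent and win-stay readout the abstract describes: a softmax policy over two Q-values on an 80/20 task whose contingencies reverse mid-session. This is a generic illustration, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_session(alpha=0.3, beta=5.0, n_trials=400, reversal=200):
    """Q-learning on a two-choice 80/20 probabilistic task whose
    contingencies reverse mid-session; returns win-stay probability.
    All parameter values are illustrative assumptions."""
    q = np.zeros(2)
    choices, rewards = [], []
    for t in range(n_trials):
        p_reward = [0.8, 0.2] if t < reversal else [0.2, 0.8]
        p_choice = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax policy
        c = rng.choice(2, p=p_choice)
        r = float(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])        # model-free Q-value update
        choices.append(c)
        rewards.append(r)
    # win-stay: probability of repeating a choice that was just rewarded
    stays = [choices[t] == choices[t - 1]
             for t in range(1, n_trials) if rewards[t - 1] == 1]
    return np.mean(stays)

print(f"win-stay probability: {run_session():.2f}")
```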


Author(s):  
Mojtaba Rostami Kandroodi ◽  
Jennifer L. Cook ◽  
Jennifer C. Swart ◽  
Monja I. Froböse ◽  
Dirk E. M. Geurts ◽  
...  

Abstract
Rationale: Brain catecholamines have long been implicated in reinforcement learning, exemplified by catecholamine drug and genetic effects on probabilistic reversal learning. However, the mechanisms underlying such effects are unclear.
Objectives and methods: Here we investigated effects of an acute catecholamine challenge with methylphenidate (20 mg, oral) on a novel probabilistic reversal learning paradigm in a within-subject, double-blind randomised design. The paradigm was designed to disentangle effects on punishment avoidance from effects on reward perseveration. Given the known large individual variability in methylphenidate’s effects, we stratified our analyses by working memory capacity and trait impulsivity, which putatively modulate the effects of methylphenidate, in a large sample (n = 102) of healthy volunteers.
Results: Contrary to our prediction, methylphenidate did not alter performance in the reversal phase of the task. Our key finding is that methylphenidate altered learning of choice-outcome contingencies in a manner that depended on individual variability in working memory span. Specifically, methylphenidate improved performance by adaptively reducing the effective learning rate in participants with higher working memory capacity.
Conclusions: This finding emphasises the important role of working memory in reinforcement learning, as reported in influential recent computational modelling and behavioural work, and highlights the dependence of this interplay on catecholaminergic function.
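
As an illustration of why a lower effective learning rate can improve performance under probabilistic feedback, the toy simulation below compares a low and a high learning rate on an 80/20 bandit: the high-alpha learner chases single misleading outcomes, while the low-alpha learner averages over them. This is a generic RL property, not a model of the paper's paradigm; all values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def accuracy(alpha, beta=5.0, n_trials=200, p=(0.8, 0.2), n_runs=500):
    """Mean accuracy (choices of the better option 0) of a Q-learner under
    80/20 probabilistic feedback. Illustrative values, not the paper's task."""
    correct = 0
    for _ in range(n_runs):
        q = np.zeros(2)
        for _ in range(n_trials):
            pc = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax policy
            c = rng.choice(2, p=pc)
            r = float(rng.random() < p[c])
            q[c] += alpha * (r - q[c])   # high alpha overweights the last outcome
            correct += (c == 0)
    return correct / (n_runs * n_trials)

for a in (0.1, 0.9):
    print(f"alpha={a}: accuracy={accuracy(a):.3f}")
```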


2020 ◽  
Vol 31 (1) ◽  
pp. 529-546 ◽  
Author(s):  
Craig A Taswell ◽  
Vincent D Costa ◽  
Benjamin M Basile ◽  
Maia S Pujara ◽  
Breonda Jones ◽  
...  

Abstract The neural systems that underlie reinforcement learning (RL) allow animals to adapt to changes in their environment. In the present study, we examined the hypothesis that the amygdala has a preferential role in learning the values of visual objects. We compared a group of monkeys (Macaca mulatta) with amygdala lesions to a group of unoperated controls on a two-armed bandit reversal learning task. The task had two conditions. In the What condition, the animals had to learn to select a visual object, independent of its location; in the Where condition, they had to learn to saccade to a location, independent of the object at that location. In both conditions, choice-outcome mappings reversed in the middle of the block. We found that monkeys with amygdala lesions had learning deficits in both conditions. Monkeys with amygdala lesions did not have deficits in learning to reverse choice-outcome mappings. Rather, amygdala lesions caused the monkeys to become overly sensitive to negative feedback, which impaired their ability to consistently select the more highly valued action or object. These results imply that the amygdala is generally necessary for RL.
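
Oversensitivity to negative feedback is commonly formalized with separate learning rates for positive and negative prediction errors; the sketch below (names and values are assumptions, not the authors' fitted model) shows how an inflated negative learning rate lets a single probabilistic loss erase the value of the better option, producing inconsistent choices.

```python
import numpy as np

def asymmetric_update(q, choice, reward, alpha_pos=0.2, alpha_neg=0.2):
    """Q-value update with separate learning rates for positive and negative
    prediction errors; one common way to formalize 'oversensitivity to
    negative feedback' is an inflated alpha_neg. Illustrative sketch."""
    delta = reward - q[choice]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q = q.copy()
    q[choice] += alpha * delta
    return q

q = np.array([0.8, 0.2])   # option 0 is the better object/location
print(asymmetric_update(q, 0, 0.0, alpha_neg=0.2))  # modest devaluation after a loss
print(asymmetric_update(q, 0, 0.0, alpha_neg=0.9))  # one loss nearly wipes out q[0]
```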


2020 ◽  
Author(s):  
Dahlia Mukherjee ◽  
Alexandre Leo Stephen Filipowicz ◽  
Khoi D. Vo ◽  
Theodore Satterthwaite ◽  
Joe Kable

Depression has been associated with impaired reward and punishment processing, but the specific nature of these deficits is less understood and still widely debated. We analyzed reinforcement-based decision-making in individuals diagnosed with major depressive disorder (MDD) to identify the specific decision mechanisms contributing to poorer performance. Individuals with MDD (n = 64) and matched healthy controls (n = 64) performed a probabilistic reversal learning task in which they used feedback to identify which of two stimuli had the highest probability of reward (reward condition) or the lowest probability of punishment (punishment condition). Learning differences were characterized using a hierarchical Bayesian reinforcement learning model. While both groups showed reinforcement learning-like behavior, depressed individuals made fewer optimal choices and adjusted more slowly to reversals in both the reward and punishment conditions. Our computational modeling analysis found that depressed individuals showed lower learning rates and, to a lesser extent, lower value sensitivity in both the reward and punishment conditions. Learning rates also predicted depression more accurately than simple performance metrics. These results demonstrate that depression is characterized by a hyposensitivity to positive outcomes, which influences the rate at which depressed individuals learn from feedback, but not by a hypersensitivity to negative outcomes, as has previously been suggested. Additionally, we demonstrate that computational modeling provides a more precise characterization of the dynamics contributing to these learning deficits and offers stronger insights into the mechanistic processes affected by depression.
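
As a minimal illustration of how a learning rate and value sensitivity (softmax inverse temperature) are recovered from choice data, here is a per-subject maximum-likelihood sketch; the paper itself uses hierarchical Bayesian estimation, which pools information across subjects, and the toy data below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards):
    """Negative log-likelihood of two-choice data under a simple RL model
    with learning rate alpha and value sensitivity beta. Illustrative
    per-subject sketch, not the paper's hierarchical Bayesian model."""
    alpha, beta = params
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax choice rule
        nll -= np.log(p[c] + 1e-12)
        q[c] += alpha * (r - q[c])                     # value update
    return nll

# Hypothetical data; in practice these come from the reversal task.
choices = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rewards = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
fit = minimize(neg_log_likelihood, x0=[0.5, 1.0], args=(choices, rewards),
               bounds=[(1e-3, 1.0), (1e-2, 20.0)])
alpha_hat, beta_hat = fit.x
print(f"alpha={alpha_hat:.2f}, beta={beta_hat:.2f}")
```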

