Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks

2021, pp. 1-20
Author(s):
Franz Wurm
Wioleta Walentowska
Benjamin Ernst
Mario Carlo Severo
Gilles Pourtois
et al.

Abstract: The goal of temporal difference (TD) reinforcement learning is to maximize outcomes and improve future decision-making. It does so by utilizing a prediction error (PE), which quantifies the difference between the expected and the obtained outcome. In gambling tasks, however, decision-making cannot be improved because of the lack of learnability. On the basis of the idea that TD utilizes two independent bits of information from the PE (valence and surprise), we asked which of these aspects is affected when a task is not learnable. We contrasted behavioral data and ERPs in a learning variant and a gambling variant of a simple two-armed bandit task, in which outcome sequences were matched across tasks. Participants were explicitly informed that feedback could be used to improve performance in the learning task but not in the gambling task, and we predicted a corresponding modulation of the aspects of the PE. We used a model-based analysis of ERP data to extract the neural footprints of the valence and surprise information in the two tasks. Our results revealed that task learnability modulates reinforcement learning via the suppression of surprise processing but leaves the processing of valence unaffected. On the basis of our model and the data, we propose that task learnability can selectively suppress TD learning as well as alter behavioral adaptation based on a flexible cost–benefit arbitration.
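
To make the two aspects of the PE concrete, here is a minimal Python sketch of a TD-style value update that separates valence (the sign of the PE) from surprise (its magnitude); the learning rate is an illustrative assumption, not the authors' fitted parameter.

```python
import numpy as np

def td_update(expected, obtained, alpha=0.3):
    """One TD-style update; alpha is an illustrative learning rate."""
    pe = obtained - expected       # prediction error (PE)
    valence = np.sign(pe)          # valence: better (+1) or worse (-1) than expected
    surprise = abs(pe)             # surprise: unsigned magnitude of the PE
    return expected + alpha * pe, valence, surprise

value, valence, surprise = td_update(expected=0.5, obtained=1.0)
print(value, valence, surprise)    # 0.65 1.0 0.5
```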

2020, Vol 10 (8), pp. 508
Author(s):
Hiroyoshi Ogishima
Shunta Maeda
Yuki Tanaka
Hironori Shimada

Background: In this study, we examined the relationships between reward-based decision-making (learning rate, memory rate, and exploration rate) and depression-related subjective emotional experience (interoception and feelings), to understand how reward-based decision-making is impaired in depression. Methods: In all, 52 university students were randomly assigned to an experimental group and a control group. To manipulate interoception, the participants in the experimental group were instructed to tune their internal somatic sense to the skin-conductance-response waveform presented on a display. The participants in the control group were only instructed to stay relaxed. Before and after the manipulation, the participants completed a probabilistic reversal-learning task to assess reward-based decision-making using reinforcement learning modeling. Participants also completed a probe-detection task, a heartbeat-detection task, and self-rated scales. Results: The experimental manipulation of interoception was not successful. In the baseline testing, reinforcement learning modeling indicated a marginally significant correlation between the exploration rate and depressive symptoms. However, the exploration rate was significantly associated with lower interoceptive attention and higher depressive feeling. Conclusions: The findings suggest that situational characteristics may be closely involved in reward exploration and highlight the clinically meaningful possibility that intervention for affective processes may impact reward-based decision-making in those with depression.
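
As a rough illustration of the three model parameters named above, the sketch below implements one common parameterization for a reversal-learning task: a delta-rule learning rate, a memory (retention) rate that decays values between trials, and a softmax inverse temperature governing exploration. This is a generic model under stated assumptions, not the authors' exact specification.

```python
import numpy as np

def softmax_choice(q, beta, rng):
    """Exploration via inverse temperature beta (higher = greedier)."""
    p = np.exp(beta * (q - q.max()))
    return rng.choice(len(q), p=p / p.sum())

def update_values(q, choice, reward, alpha, memory):
    q = q * memory                             # memory rate: values decay each trial
    q[choice] += alpha * (reward - q[choice])  # learning rate: delta-rule update
    return q

rng = np.random.default_rng(0)
q, p_reward = np.zeros(2), np.array([0.8, 0.2])
for t in range(200):
    if t == 100:
        p_reward = p_reward[::-1]              # mid-session reversal
    c = softmax_choice(q, beta=3.0, rng=rng)
    r = float(rng.random() < p_reward[c])
    q = update_values(q, c, r, alpha=0.3, memory=0.9)
```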


2018
Author(s):
Nura Sidarus
Stefano Palminteri
Valérian Chambon

Abstract: Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision stage showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one's actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making and highlight that different types of conflict may have different effects on learning about action outcomes.
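
One simple way to express "conflict as a cost traded off against expected rewards" at the decision stage is to subtract a penalty from the option that conflicts with the distractor before the softmax. This hypothetical sketch (the structure and parameter values are assumptions, not the fitted model) shows how weak value evidence lets the conflict cost dominate the choice.

```python
import numpy as np

def choice_probs(q, distractor, beta=3.0, conflict_cost=0.2):
    """Softmax over action values, penalizing the distractor-incongruent option."""
    q = np.array(q, dtype=float)
    q[1 - distractor] -= conflict_cost   # cost of acting against the distractor
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

# With nearly equal values, the conflict cost biases choice toward option 0:
print(choice_probs([0.50, 0.52], distractor=0))
```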


2019
Author(s):
Lidia Cabeza
Julie Giustiniani
Thibault Chabin
Bahrie Ramadan
Coralie Joucla
et al.

Abstract: Decision-making is an evolutionarily conserved process enabling the choice of one option among several alternatives, relying on reward and cognitive control systems. The Iowa Gambling Task assesses human decision-making under uncertainty by presenting four card decks with various cost-benefit probabilities. Participants seek to maximize their monetary gains by developing long-term optimal choice strategies. Animal versions have been adapted with nutritional rewards, but interspecies data comparisons remain scarce. Our study directly compared physiological decision-making performance between humans and wild-type C57BL/6 mice. Human subjects completed an electronic version of the Iowa Gambling Task, while mice performed a maze-based adaptation with four arms baited probabilistically. Our data show closely matching performance between species, with similar patterns of choice behavior. Moreover, both populations clustered into good, intermediate, and poor decision-making categories in similar proportions. Remarkably, good decision-making mice behaved like humans of the same category, whereas slight between-species differences emerged for the other two subpopulations. Overall, our direct comparative study confirms the good face validity of the rodent gambling task. Extended behavioral characterization and pathological animal models should help strengthen its construct validity and disentangle the determinants of decision-making in animals and humans.


2017, Vol 46 (10), pp. 2620-2628
Author(s):
M. L. Daniel
P. J. Cocker
J. Lacoste
A. C. Mar
J. L. Houeto
et al.

2018, Vol 30 (10), pp. 1391-1404
Author(s):
Wouter Kool
Samuel J. Gershman
Fiery A. Cushman

Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks offer the same rewards for accuracy but one of them is much more demanding, we might expect people to rely on habit more in the difficult task: Devoting significant computation to achieve slight marginal accuracy gains would not be “worth it.” We test and verify this prediction in a sequential reinforcement learning task. Because our paradigm is amenable to formal analysis, it contributes to the development of a computational model of how people balance the costs and benefits of different decision-making processes in a task-specific manner; in other words, how we decide when hard thinking is worth it.
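
The standard formalization behind this tradeoff blends model-free and model-based value estimates with a weight reflecting how much costly computation is worth; the sketch below shows that common mixture (a generic formulation, not necessarily this paper's exact model).

```python
import numpy as np

def blended_values(q_mf, q_mb, w):
    """w = 0: pure model-free habit; w = 1: pure model-based planning."""
    return w * np.asarray(q_mb) + (1.0 - w) * np.asarray(q_mf)

# In a demanding task with small accuracy stakes, a lower w (more habit)
# can maximize reward net of computational effort:
print(blended_values(q_mf=[0.2, 0.8], q_mb=[0.6, 0.4], w=0.3))
```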


2020
Author(s):
Lidia Cabeza
Bahrie Ramadan
Julie Giustiniani
Christophe Houdayer
Yann Pellequer
et al.

Anxio-depressive symptoms as well as severe cognitive dysfunction, including aberrant decision-making (DM), are documented in neuropsychiatric patients with hypercortisolaemia. Yet the influence of the hypothalamo-pituitary-adrenal (HPA) axis on DM processes remains poorly understood. As a tractable means of approaching this human condition, adult male C57BL/6JRj mice were chronically treated with corticosterone (CORT) prior to behavioural, physiological and neurobiological evaluation. The behavioural data indicate that chronic CORT delays the acquisition of the contingencies required to orient responding towards optimal DM performance in a mouse Gambling Task (mGT). Specifically, CORT-treated animals show longer exploration and a delayed onset of optimal DM performance. Remarkably, the proportion of individuals performing suboptimally in the mGT is increased in the CORT condition. This variability seems to be better accounted for by variation in sensitivity to negative rather than to positive outcomes. In addition, CORT-treated animals perform worse than control animals in a spatial working memory (WM) paradigm and in a motor learning task. Finally, Western blot analyses show that chronic CORT downregulates glucocorticoid receptor expression in the medial prefrontal cortex (mPFC), and corticotropin-releasing factor signalling in the mPFC of CORT individuals negatively correlates with their DM performance. Collectively, this study describes how chronic exposure to glucocorticoids induces suboptimal DM under uncertainty in a mGT and hampers WM and motor learning, thus affecting specific emotional, motor, cognitive and neurobiological endophenotypic dimensions relevant for precision medicine in biological psychiatry.


2021
Author(s):
Monja P. Neuser
Franziska Kräutlein
Anne Kühnel
Vanessa Teckentrup
Jennifer Svaldi
et al.

Abstract: Reinforcement learning is a core facet of motivation, and alterations in it have been associated with various mental disorders. To build better models of individual learning, repeated measurement of value-based decision-making is crucial. However, the focus on lab-based assessment of reward learning has limited the number of measurements, and the test-retest reliability of many decision-related parameters is therefore unknown. Here, we developed Influenca, an open-source, cross-platform application that provides a novel reward learning task complemented by ecological momentary assessment (EMA) for repeated assessment over weeks. In this task, players have to identify the most effective medication by selecting the best option after integrating offered points with changing probabilities (which follow random Gaussian walks). Participants can complete up to 31 levels with 150 trials each. To encourage replay on their preferred device, in-game screens provide feedback on their progress. Using an initial validation sample of 127 players (2904 runs), we found that reinforcement learning parameters such as the learning rate and reward sensitivity show low to medium intra-class correlations (ICC: 0.22-0.52), indicating substantial within- and between-subject variance. Notably, state items showed ICCs comparable to those of the reinforcement learning parameters. To conclude, our innovative and openly customizable app framework provides a gamified task that optimizes repeated assessments of reward learning to better quantify intra- and inter-individual differences in value-based decision-making over time.
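
For illustration, reward probabilities drifting according to random Gaussian walks can be generated as below; the step size and bounds are illustrative assumptions rather than the app's actual settings.

```python
import numpy as np

def reward_probability_walk(n_trials=150, start=0.7, sd=0.05, lo=0.05, hi=0.95, seed=0):
    """Bounded Gaussian random walk for one option's reward probability."""
    rng = np.random.default_rng(seed)
    p = np.empty(n_trials)
    p[0] = start
    for t in range(1, n_trials):
        p[t] = np.clip(p[t - 1] + rng.normal(0.0, sd), lo, hi)
    return p

probs = reward_probability_walk()  # one 150-trial level's drifting probability
```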


2018
Author(s):
Samuel D. McDougle
Peter A. Butcher
Darius Parvin
Faisal Mushtaq
Yael Niv
et al.

Abstract: Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should not only be sensitive to whether the choice itself was suboptimal, but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated if negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
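
A minimal sketch of the credit-assignment idea described here: attenuate the negative reward prediction error when the unrewarded outcome stems from an execution error rather than a poor choice. The gating factor is an illustrative assumption, not a value estimated from the fMRI data.

```python
def update_value(value, reward, exec_error, alpha=0.3, gate=0.4):
    """Delta-rule update with attenuated negative RPE after execution failures."""
    rpe = reward - value                 # reward prediction error
    if exec_error and rpe < 0:
        rpe *= gate                      # discount blame for motor failures
    return value + alpha * rpe, rpe

# A miss from a motor slip is penalized less than a withheld reward:
v_slip, _ = update_value(value=0.6, reward=0.0, exec_error=True)
v_choice, _ = update_value(value=0.6, reward=0.0, exec_error=False)
print(v_slip, v_choice)                  # 0.528 0.42
```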


2018
Author(s):
Joanne C. Van Slooten
Sara Jahfari
Tomas Knapen
Jan Theeuwes

Abstract: Pupil responses have been used to track cognitive processes during decision-making. Studies have shown that in these cases the pupil reflects the joint activation of many cortical and subcortical brain regions, including those traditionally implicated in value-based learning. However, how the pupil tracks value-based decisions and reinforcement learning is unknown. We combined a reinforcement learning task with a computational model to study pupil responses during value-based decisions and decision evaluations. We found that the pupil closely tracks reinforcement learning both across trials and across participants. Prior to choice, the pupil dilated as a function of trial-by-trial fluctuations in value beliefs. After feedback, early dilation scaled with value uncertainty, whereas later constriction scaled with reward prediction errors. Our computational approach systematically implicates the pupil in value-based decisions and in the subsequent processing of violated value beliefs. These dissociable influences provide an exciting possibility to non-invasively study ongoing reinforcement learning via the pupil.

