How the level of reward awareness changes the computational and electrophysiological signatures of reinforcement learning

2018 ◽  
Author(s):  
C.M.C. Correa ◽  
S. Noorman ◽  
J. Jiang ◽  
S. Palminteri ◽  
M.X. Cohen ◽  
...  

Abstract
The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement-learning framework, combining perceptual masking, computational modeling and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies.
Significance statement
The notion of reward is strongly associated with subjective evaluation, related to conscious processes such as “pleasure”, “liking” and “wanting”. Here we show that degrading reward visibility in a reinforcement learning task decreases, without completely obliterating, participants’ ability to learn from outcomes, but concurrently increases their tendency to repeat previous choices. Electrophysiological recordings, in combination with computational modeling, show that neural activities were significantly modulated by reward visibility. Overall, we dissociate different neural computations underlying reward-based learning and decision-making, which highlights a beneficial role of reward awareness in adjusting decision-making strategies.
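The two behavioral effects described above, slower learning and a stronger tendency to repeat previous choices under degraded reward visibility, can be illustrated with a simple simulation. The sketch below is not the authors' fitted model; the visibility scaling of the learning rate, the perseveration bonus `kappa`, and all parameter values are illustrative assumptions.

```python
# Minimal sketch: a Q-learner on a two-armed bandit whose learning rate is scaled
# by reward visibility and whose choice rule adds a repetition (perseveration)
# bonus that grows as the reward becomes harder to see.
import numpy as np

def simulate(trials, visibility, alpha=0.3, beta=5.0, kappa=1.0, rng=None):
    """visibility in [0, 1]: 1 = fully visible reward, 0 = fully masked."""
    rng = rng or np.random.default_rng()
    q = np.zeros(2)                       # action values
    reward_prob = np.array([0.7, 0.3])    # illustrative bandit payoffs
    prev_choice = None
    choices, rewards = [], []
    for _ in range(trials):
        logits = beta * q.copy()
        if prev_choice is not None:
            logits[prev_choice] += kappa * (1.0 - visibility)  # more repetition when masked
        p = np.exp(logits - logits.max())
        p /= p.sum()
        choice = int(rng.choice(2, p=p))
        reward = float(rng.random() < reward_prob[choice])
        q[choice] += visibility * alpha * (reward - q[choice])  # degraded visibility slows learning
        prev_choice = choice
        choices.append(choice); rewards.append(reward)
    return choices, rewards
```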

2020 ◽  
Vol 10 (8) ◽  
pp. 508
Author(s):  
Hiroyoshi Ogishima ◽  
Shunta Maeda ◽  
Yuki Tanaka ◽  
Hironori Shimada

Background: In this study, we examined the relationships between reward-based decision-making in terms of learning rate, memory rate, exploration rate, and depression-related subjective emotional experience, in terms of interoception and feelings, to understand how reward-based decision-making is impaired in depression. Methods: In all, 52 university students were randomly assigned to an experimental group and a control group. To manipulate interoception, the participants in the experimental group were instructed to tune their internal somatic sense to the skin-conductance-response waveform presented on a display. The participants in the control group were only instructed to stay relaxed. Before and after the manipulation, the participants completed a probabilistic reversal-learning task to assess reward-based decision-making using reinforcement learning modeling. Similarly, participants completed a probe-detection task, a heartbeat-detection task, and self-rated scales. Results: The experimental manipulation of interoception was not successful. In the baseline testing, reinforcement learning modeling indicated a marginally-significant correlation between the exploration rate and depressive symptoms. However, the exploration rate was significantly associated with lower interoceptive attention and higher depressive feeling. Conclusions: The findings suggest that situational characteristics may be closely involved in reward exploration and highlight the clinically-meaningful possibility that intervention for affective processes may impact reward-based decision-making in those with depression.
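A common way to parameterize the three quantities named above (learning rate, memory rate, exploration rate) in a reversal-learning task is a delta-rule model with value decay and a softmax choice rule. The sketch below is a generic likelihood of that form, not the authors' exact model; the decay-toward-prior implementation of the memory rate and the 0.5 prior are assumptions.

```python
# Minimal sketch: negative log-likelihood of choices under a Q-learner with
# learning rate (alpha), memory rate (decay toward the prior), and exploration
# rate (softmax inverse temperature).
import numpy as np

def negative_log_likelihood(params, choices, rewards, n_options=2):
    alpha, memory, beta = params
    q = np.full(n_options, 0.5)               # prior value of each option
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        nll -= np.log(p[c] + 1e-12)            # likelihood of the observed choice
        q[c] += alpha * (r - q[c])             # update the chosen option
        q = memory * q + (1 - memory) * 0.5    # imperfect memory: values drift back to the prior
    return nll
```

Such a likelihood could be minimized per participant with, for example, scipy.optimize.minimize to recover the three parameters.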


2018 ◽  
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Abstract
Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
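The decision-stage and learning-stage effects described above can be sketched with two small functions: a choice rule in which a distractor bias matters most when the value difference between options is small, and an update rule whose learning rate depends on whether the trial was free or instructed and on whether the instruction conflicted with subjective values. All parameter names and values below are illustrative assumptions, not the authors' fitted model.

```python
# Minimal sketch of conflict cost traded off against expected reward, plus
# condition-dependent learning rates for free vs. instructed trials.
import numpy as np

def prob_follow_distractor(q_suggested, q_other, beta=5.0, conflict_bias=0.5):
    """Softmax over action values with a bonus for the distractor-suggested action."""
    logits = np.array([beta * q_suggested + conflict_bias, beta * q_other])
    p = np.exp(logits - logits.max())
    return (p / p.sum())[0]   # dominates only when q_suggested and q_other are close

def update(q, choice, reward, free_trial, instruction_conflict,
           alpha_free=0.4, alpha_instructed=0.2, conflict_penalty=0.1):
    alpha = alpha_free if free_trial else alpha_instructed
    if not free_trial and instruction_conflict:      # instruction disagrees with subjective values
        alpha = max(alpha - conflict_penalty, 0.0)
    q[choice] += alpha * (reward - q[choice])
    return q
```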


2019 ◽  
Author(s):  
Charles Findling ◽  
Nicolas Chopin ◽  
Etienne Koechlin

Abstract
Everyday life features uncertain and ever-changing situations. In such environments, optimal adaptive behavior requires higher-order inferential capabilities to grasp the volatility of external contingencies. These capabilities, however, involve complex computations that rapidly become intractable, so we poorly understand how humans develop efficient adaptive behaviors in such environments. Here we demonstrate this counterintuitive result: simple, low-level inferential processes involving imprecise computations conforming to the psychophysical Weber Law actually lead to near-optimal adaptive behavior, regardless of the environment's volatility. Using volatile experimental settings, we further show that such imprecise, low-level inferential processes accounted for observed human adaptive performance, unlike optimal adaptive models involving higher-order inferential capabilities, their biologically more plausible algorithmic approximations, or non-inferential adaptive models such as reinforcement learning. Thus, minimal inferential capabilities may have evolved along with imprecise neural computations to support near-optimal adaptive behavior in real-life environments, while leading humans to make suboptimal choices in canonical decision-making tasks.
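One way to picture "imprecise computations conforming to the Weber Law" is a simple tracker whose update is corrupted by noise whose standard deviation scales with the size of the update itself. The sketch below is only meant to convey that intuition; the delta-rule form, the noise model, and the parameter values are assumptions, not the authors' model.

```python
# Minimal sketch: a delta-rule tracker of a drifting reward probability whose
# update noise follows Weber-like scaling (noise proportional to update size).
import numpy as np

def weber_noisy_learner(outcomes, alpha=0.3, weber=0.5, rng=None):
    rng = rng or np.random.default_rng()
    estimate = 0.5
    estimates = []
    for o in outcomes:
        update = alpha * (o - estimate)
        noise_sd = weber * abs(update)              # larger updates are computed less precisely
        estimate += update + rng.normal(0.0, noise_sd)
        estimate = min(max(estimate, 0.0), 1.0)     # keep the belief in [0, 1]
        estimates.append(estimate)
    return estimates
```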


2020 ◽  
Author(s):  
Milena Rmus ◽  
Samuel McDougle ◽  
Anne Collins

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and behavior.


2018 ◽  
Vol 30 (10) ◽  
pp. 1391-1404 ◽  
Author(s):  
Wouter Kool ◽  
Samuel J. Gershman ◽  
Fiery A. Cushman

Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks offer the same rewards for accuracy but one of them is much more demanding, we might expect people to rely on habit more in the difficult task: Devoting significant computation to achieve slight marginal accuracy gains would not be “worth it.” We test and verify this prediction in a sequential reinforcement learning task. Because our paradigm is amenable to formal analysis, it contributes to the development of a computational model of how people balance the costs and benefits of different decision-making processes in a task-specific manner; in other words, how we decide when hard thinking is worth it.
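The model-free versus model-based tradeoff described above is conventionally formalized as a weighted mixture of the two value estimates, where the weight can be interpreted as the outcome of a cost-benefit arbitration. The sketch below shows that standard mixture; it is a generic formulation, not the authors' specific task model, and the parameter names are illustrative.

```python
# Minimal sketch: hybrid model-based / model-free valuation with a mixing weight w.
import numpy as np

def hybrid_values(q_model_based, q_model_free, w):
    """w in [0, 1]: 1 = pure planning (accurate, costly), 0 = pure habit (cheap, inaccurate)."""
    return w * np.asarray(q_model_based) + (1 - w) * np.asarray(q_model_free)

def softmax_choice_probs(q, beta=3.0):
    z = beta * np.asarray(q)
    p = np.exp(z - z.max())
    return p / p.sum()
```

On this view, the prediction tested above amounts to w being lower in the demanding task, where the marginal accuracy gained by model-based computation is not "worth" its cost.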


2021 ◽  
Author(s):  
Monja P. Neuser ◽  
Franziska Kräutlein ◽  
Anne Kühnel ◽  
Vanessa Teckentrup ◽  
Jennifer Svaldi ◽  
...  

Abstract
Reinforcement learning is a core facet of motivation, and its alteration has been associated with various mental disorders. To build better models of individual learning, repeated measurement of value-based decision-making is crucial. However, the focus on lab-based assessment of reward learning has limited the number of measurements, and the test-retest reliability of many decision-related parameters is therefore unknown. Here, we developed Influenca, an open-source, cross-platform application that provides a novel reward learning task complemented by ecological momentary assessment (EMA) for repeated assessment over weeks. In this task, players have to identify the most effective medication by selecting the best option after integrating offered points with changing probabilities (which follow random Gaussian walks). Participants can complete up to 31 levels with 150 trials each. To encourage replay on their preferred device, in-game screens provide feedback on participants' progress. Using an initial validation sample of 127 players (2904 runs), we found that reinforcement learning parameters such as the learning rate and reward sensitivity show low to medium intra-class correlations (ICC: 0.22-0.52), indicating substantial within- and between-subject variance. Notably, state items showed ICCs comparable to those of the reinforcement learning parameters. To conclude, our innovative and openly customizable app framework provides a gamified task that optimizes repeated assessments of reward learning to better quantify intra- and inter-individual differences in value-based decision-making over time.
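The task environment described above, reward probabilities that drift across trials according to random Gaussian walks, can be generated with a few lines of code. The bounds, step size, and reflection-by-clipping used below are illustrative assumptions rather than the app's actual settings.

```python
# Minimal sketch: bounded Gaussian random walks for per-option reward probabilities.
import numpy as np

def gaussian_walk_probabilities(n_trials, n_options=2, step_sd=0.05,
                                lower=0.2, upper=0.8, rng=None):
    rng = rng or np.random.default_rng()
    p = rng.uniform(lower, upper, size=n_options)        # starting probabilities
    walk = np.empty((n_trials, n_options))
    for t in range(n_trials):
        p = np.clip(p + rng.normal(0.0, step_sd, size=n_options), lower, upper)
        walk[t] = p
    return walk
```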


2021 ◽  
Author(s):  
Sebastian Bruch ◽  
Patrick McClure ◽  
Jingfeng Zhou ◽  
Geoffrey Schoenbaum ◽  
Francisco Pereira

Deep Reinforcement Learning (Deep RL) agents have in recent years emerged as successful models of animal behavior in a variety of complex learning tasks, as exemplified by Song et al. [2017]. As agents are typically trained to mimic an animal subject, the emphasis in past studies on behavior as a means of evaluating the fitness of models to experimental data is only natural. But the true power of Deep RL agents lies in their ability to learn neural computations and codes that generate a particular behavior: factors that are also of great relevance and interest to computational neuroscience. On that basis, we believe that model evaluation should include an examination of neural representations and validation against neural recordings from animal subjects. In this paper, we introduce a procedure to test hypotheses about the relationship between the internal representations of Deep RL agents and those in animal neural recordings. Taking a sequential learning task as a running example, we apply our method and show that the geometry of the representations learnt by artificial agents is similar to that of the biological subjects', and that such similarities are driven by shared information in some latent space. Our method is applicable to any Deep RL agent that learns a Markov Decision Process; as such, it enables researchers to assess the suitability of more advanced Deep Learning modules, map hierarchies of representations to different parts of a circuit in the brain, and help shed light on their function. To demonstrate that point, we conduct an ablation study and deduce that, in the sequential task under consideration, temporal information plays a key role in molding a correct representation of the task.
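A generic way to compare representational geometry between an agent and neural recordings is representational similarity analysis: build a dissimilarity matrix over matched task conditions for each system and correlate them. The sketch below shows that generic analysis; it is not necessarily the authors' exact procedure, and the distance metric and rank correlation are illustrative choices.

```python
# Minimal sketch: compare the representational geometry of Deep RL hidden states
# and neural recordings by correlating their representational dissimilarity matrices.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations):
    """activations: (n_conditions, n_units) array -> condensed dissimilarity vector."""
    return pdist(activations, metric="correlation")

def geometry_similarity(agent_hidden_states, neural_responses):
    # Both inputs must index the same task conditions in the same order.
    rho, p_value = spearmanr(rdm(agent_hidden_states), rdm(neural_responses))
    return rho, p_value
```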


2018 ◽  
Author(s):  
Samuel D. McDougle ◽  
Peter A. Butcher ◽  
Darius Parvin ◽  
Faisal Mushtaq ◽  
Yael Niv ◽  
...  

Abstract
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive not only to whether the choice itself was suboptimal, but also to whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
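The credit-assignment idea above, that a missed reward attributed to a failed movement should count less against the chosen option, can be written as a gated prediction-error update. The gating parameter and its multiplicative form below are illustrative assumptions, not the authors' fitted model.

```python
# Minimal sketch: attenuate negative reward prediction errors when the unrewarded
# outcome was caused by an execution error rather than a bad choice.
def update_value(q, reward, execution_error, alpha=0.3, execution_gate=0.3):
    prediction_error = reward - q
    if execution_error and prediction_error < 0:
        prediction_error *= execution_gate   # discount errors attributable to failed execution
    return q + alpha * prediction_error
```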


2018 ◽  
Author(s):  
Joanne C. Van Slooten ◽  
Sara Jahfari ◽  
Tomas Knapen ◽  
Jan Theeuwes

Abstract
Pupil responses have been used to track cognitive processes during decision-making. Studies have shown that in these cases the pupil reflects the joint activation of many cortical and subcortical brain regions, including those traditionally implicated in value-based learning. However, how the pupil tracks value-based decisions and reinforcement learning is unknown. We combined a reinforcement learning task with a computational model to study pupil responses during value-based decisions and decision evaluations. We found that the pupil closely tracks reinforcement learning both across trials and participants. Prior to choice, the pupil dilated as a function of trial-by-trial fluctuations in value beliefs. After feedback, early dilation scaled with value uncertainty, whereas later constriction scaled with reward prediction errors. Our computational approach systematically implicates the pupil in value-based decisions and the subsequent processing of violated value beliefs. These dissociable influences provide an exciting possibility to non-invasively study ongoing reinforcement learning in the pupil.
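Analyses of this kind typically derive trial-by-trial model quantities, value beliefs before choice, value uncertainty and prediction errors at feedback, and regress pupil size against them. The sketch below generates such regressors from a simple Q-learner; it is a generic illustration, not the authors' model, and the Bernoulli-variance definition of uncertainty is an assumed choice.

```python
# Minimal sketch: trial-by-trial regressors (value belief, value uncertainty,
# reward prediction error) from a two-option Q-learner, for use in pupil regressions.
import numpy as np

def model_regressors(choices, rewards, alpha=0.3):
    q = np.full(2, 0.5)
    value_belief, uncertainty, rpe = [], [], []
    for c, r in zip(choices, rewards):
        value_belief.append(q[c])               # pre-choice belief about the chosen option
        uncertainty.append(q[c] * (1 - q[c]))   # Bernoulli-variance proxy for value uncertainty
        delta = r - q[c]
        rpe.append(delta)                       # reward prediction error at feedback
        q[c] += alpha * delta
    return np.array(value_belief), np.array(uncertainty), np.array(rpe)
```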


2021 ◽  
Vol 92 (8) ◽  
pp. A4.2-A5
Author(s):  
Luis Manssuer ◽  
Ding Qiong ◽  
Yijie Zhao ◽  
Rocky Yang ◽  
ChenCheng Zhang ◽  
...  

Objectives/Aims: To examine the temporal and spectral characteristics of local field potentials recorded from the amygdala in epilepsy patients in the context of the anticipation and receipt of rewards and losses, using an incentive learning task and a risky decision-making task. Methods: 16 epilepsy patients completed two tasks. In the monetary incentive delay (MID) task, patients saw reward and loss cues which indicated whether money could be won or lost depending on whether a subsequent response was or was not quick/accurate enough, respectively. This was compared with neutral cues, for which responses were neither rewarded nor punished. In the risk task, patients were presented with two face-down cards with values ranging from 1 to 10. When the first card is revealed, patients have to choose whether or not to bet that the second card is higher. After the second card is revealed, patients receive a monetary reward if it is higher and a loss if it is lower. If patients do not bet, they receive nothing. Results: In both tasks, patients showed larger left amygdala theta band oscillatory activity to the receipt of monetary rewards compared to no money. In contrast, there were no significant responses to monetary losses. During the decision phase of the risk task, there was increased theta activity when patients chose to bet instead of not betting and when the decision had low risk (first card <= 5) compared to high risk (first card above 5). There were no effects of uncertainty. Conclusions: The combined results of these two studies enrich our understanding of the role of the amygdala in motivation and decision-making processes and lend further support for its role in reward-related processes rather than its often-cited fear-related functions (Baxter & Murray, 2002; Murray, 2007). Theta activation is linked to cognitive processes in frontal cortices and coupled to MTL activity (Helfrich & Knight, 2016). As left amygdala theta activation was only recruited when patients were making their bet and not just anticipating reward, the pattern of results lends support to its role in cognition-emotion interactions specific to risk and reward but not uncertainty. Indeed, the hemispheric asymmetry is highly consistent with EEG studies showing left prefrontal dominance in reward processing (Manssuer et al., 2021).
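Theta-band activity of the kind reported above is commonly quantified by band-pass filtering the local field potential and taking the Hilbert envelope as instantaneous power. The sketch below shows that generic pipeline; the sampling rate, filter order, and band edges are illustrative assumptions, not the study's analysis settings.

```python
# Minimal sketch: theta-band (4-8 Hz) power of a local field potential via
# band-pass filtering and the Hilbert envelope.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_power(lfp, fs=1000.0, band=(4.0, 8.0)):
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, lfp)          # zero-phase band-pass filter
    return np.abs(hilbert(filtered)) ** 2   # instantaneous theta power
```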

