Contextual influence on confidence judgments in human reinforcement learning

2018 ◽  
Author(s):  
Maël Lebreton ◽  
Karin Bacily ◽  
Stefano Palminteri ◽  
Jan B. Engelmann

Abstract
The ability to correctly estimate the probability of one’s choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, referred to as a confidence judgment, is susceptible to numerous biases. We investigated the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial-and-error. In two experiments, we demonstrate that participants are more confident in their choices when learning to seek gains than when learning to avoid losses. Importantly, these differences in confidence were observed despite objectively equal choice difficulty and similar observed performance between the two contexts. Using computational modelling, we show that this bias is driven by the context value, a dynamically updated estimate of the average expected value of the choice options that has previously been shown to be necessary to explain equal performance in the gain and loss domains. The biasing effect of context value on confidence, also recently observed in incentivized perceptual decision-making, is therefore domain-general, with likely important functional consequences.
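The mechanism described above can be sketched with a simple relative-value learner. This is an illustrative form (the variable names, the confidence read-out, and all parameter values are assumptions, not the authors' fitted model): a context value v tracks the average outcome, option values are learned from rewards relative to v, and confidence is biased upward by v, so gain contexts yield higher confidence than loss contexts even though the value difference between options is matched.

```python
import math

def simulate_context(outcomes, alpha=0.3, bias=0.2):
    """Toy relative-value learner: v tracks the average outcome of the
    context; option values q are learned from rewards relative to v;
    the confidence read-out adds a context-value bias."""
    q = {"good": 0.0, "bad": 0.0}
    v = 0.0
    for option, reward in outcomes:
        v += alpha * (reward - v)                        # context value
        q[option] += alpha * ((reward - v) - q[option])  # relative option value
    evidence = q["good"] - q["bad"]  # matched across the two contexts
    confidence = 1.0 / (1.0 + math.exp(-(evidence + bias * v)))
    return v, confidence

# gain context: outcomes 1.0 / 0.0; loss context: 0.0 / -1.0
gain_trials = [("good", 1.0), ("bad", 0.0)] * 50
loss_trials = [("good", 0.0), ("bad", -1.0)] * 50
v_gain, conf_gain = simulate_context(gain_trials)
v_loss, conf_loss = simulate_context(loss_trials)
```

With matched evidence, confidence is nonetheless higher in the gain context, mirroring the dissociation between equal performance and biased confidence reported above.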


2021 ◽  
Author(s):  
Miguel Barretto Garcia ◽  
Marcus Grueschow ◽  
Marius Moisa ◽  
Rafael Polania ◽  
Christian Carl Ruff

Humans and animals can flexibly choose their actions based on different information, ranging from objective states of the environment (e.g., apples are bigger than cherries) to subjective preferences (e.g., cherries are tastier than apples). Whether the brain instantiates these different choices by recruiting either specialized or shared neural circuitry remains debated. Specifically, domain-general theories of prefrontal cortex (PFC) function propose that prefrontal areas flexibly process either perceptual or value-based evidence depending on what is required for the present choice, whereas domain-specific theories posit that PFC sub-areas, such as the left superior frontal sulcus (SFS), selectively integrate evidence relevant for perceptual decisions. Here we comprehensively test the functional role of the left SFS for choices based on perceptual and value-based evidence, by combining fMRI with a behavioural paradigm, computational modelling, and transcranial magnetic stimulation. Confirming predictions by a sequential sampling model, we show that TMS-induced excitability reduction of the left SFS selectively changes the processing of decision-relevant perceptual information and associated neural processes. In contrast, value-based decision making and associated neural processes remain unaffected. This specificity of SFS function is evident at all levels of analysis (behavioural, computational, and neural, including functional connectivity), demonstrating that the left SFS causally contributes to evidence integration for perceptual but not value-based decisions.



2018 ◽  
Author(s):  
C.M.C. Correa ◽  
S. Noorman ◽  
J. Jiang ◽  
S. Palminteri ◽  
M.X Cohen ◽  
...  

Abstract
The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement-learning framework, combining perceptual masking, computational modelling and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating it, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices, and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies.
Significance statement
The notion of reward is strongly associated with subjective evaluation, related to conscious processes such as “pleasure”, “liking” and “wanting”. Here we show that degrading reward visibility in a reinforcement-learning task decreases, without completely obliterating it, the ability of participants to learn from outcomes, but concurrently increases their tendency to repeat previous choices. Electrophysiological recordings, in combination with computational modelling, show that neural activities were significantly modulated by reward visibility. Overall, we dissociate different neural computations underlying reward-based learning and decision-making, highlighting a beneficial role of reward awareness in adjusting decision-making strategies.
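The dissociation between outcome learning and reward-independent choice repetition is commonly modelled by adding a "choice kernel" to the value-based policy. A minimal sketch (the parameter values and functional form are illustrative assumptions, not the paper's fitted model) showing how a repetition weight biases choice independently of reward:

```python
import math

def softmax2(x_first, x_second):
    """Two-option softmax; returns P(choose the first option)."""
    m = max(x_first, x_second)
    e_f, e_s = math.exp(x_first - m), math.exp(x_second - m)
    return e_f / (e_f + e_s)

def p_repeat(q_prev, q_other, kappa, beta=3.0):
    """P(repeating the previous choice) under a value-plus-choice-kernel
    policy: kappa adds a reward-independent bonus to the option chosen
    on the last trial, on top of the value-weighted evidence."""
    return softmax2(beta * q_prev + kappa, beta * q_other)

# with equal learned values, any kappa > 0 biases toward repetition,
# mimicking the increased perseveration seen under degraded visibility
p_no_kernel = p_repeat(0.0, 0.0, kappa=0.0)
p_kernel = p_repeat(0.0, 0.0, kappa=1.5)
```

Fitting kappa alongside the learning rate is one way such models can separate reward-driven updating from blind perseveration.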



Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is a key part of research on longitudinal autonomous driving. Taking the behaviour of human drivers into account when designing autonomous driving decision-making strategies is a current research hotspot. Among longitudinal decision-making strategies, traditional rule-based approaches are difficult to apply to complex scenarios. Current methods based on reinforcement learning and deep reinforcement learning construct reward functions around safety, comfort, and economy, but the resulting decision strategies still differ considerably from those of human drivers. To address these problems, this paper uses driver behaviour data to design the reward function of a deep reinforcement learning algorithm, fitted with a BP neural network, and uses the deep reinforcement learning DQN and DDPG algorithms to establish two driver-like longitudinal autonomous driving decision-making models. A simulation experiment compares the decisions of the two models with the driver's curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behaviour, and therefore performs better, than the DQN algorithm.
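The pipeline above, a fitted reward function feeding a value-learning update, can be sketched as follows. Everything here is a stand-in: a hypothetical quadratic penalty replaces the paper's BP-network reward fit, and a tabular temporal-difference update stands in for the DQN/DDPG function approximators.

```python
def fitted_reward(gap_error, speed_error):
    """Stand-in for the reward fitted to driver data; the paper fits a
    BP neural network, here a quadratic penalty is assumed purely for
    illustration (smaller gap/speed errors -> higher reward)."""
    return -(gap_error ** 2) - 0.5 * (speed_error ** 2)

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update, the tabular analogue of the
    DQN target r + gamma * max_a' Q(s', a')."""
    best_next = max(q[s_next].values())
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])

# toy car-following states: "far" (gap too large) and "ok"
q = {s: {"accelerate": 0.0, "hold": 0.0} for s in ("far", "ok")}
transitions = {
    ("far", "accelerate"): ("ok", fitted_reward(0.0, 0.0)),
    ("far", "hold"): ("far", fitted_reward(1.0, 0.0)),
    ("ok", "accelerate"): ("far", fitted_reward(1.0, 0.5)),
    ("ok", "hold"): ("ok", fitted_reward(0.0, 0.0)),
}
for _ in range(200):
    for (s, a), (s_next, r) in transitions.items():
        q_update(q, s, a, r, s_next)
```

After convergence the learned policy closes the gap when it is too large and then holds, i.e., the driver-derived reward shapes a driver-like longitudinal behaviour.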





2018 ◽  
Author(s):  
Sophie Bavard ◽  
Maël Lebreton ◽  
Mehdi Khamassi ◽  
Giorgio Coricelli ◽  
Stefano Palminteri

Abstract
In economics and in perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet in the reinforcement-learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparatively little attention. Here, in an attempt to fill this gap, we investigated reinforcement-learning behaviour and its computational substrates in a task where we orthogonally manipulated both outcome valence and magnitude, resulting in systematic variations in state values. Over two experiments, model comparison indicated that subjects’ behaviour is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we found that state-dependent outcome valuation progressively emerges over time, is favoured by increasing outcome information, and correlates with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
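Range adaptation can be sketched with a delta rule applied to outcomes rescaled by the context's reference point and range. This is an illustrative form under assumed parameters, not the exact model compared in the paper:

```python
def range_adapted_value(reward, r_min, r_max, alpha=0.2, n_trials=100):
    """Learn the value of an option whose outcome is first rescaled
    to the [r_min, r_max] outcome range of its learning context."""
    rel = (reward - r_min) / (r_max - r_min)  # range-adapted outcome
    v = 0.0
    for _ in range(n_trials):
        v += alpha * (rel - v)  # standard delta rule on the rescaled outcome
    return v

# best option of a large-magnitude context vs. best option of a
# small-magnitude context (objectively worth ten times less)
v_big = range_adapted_value(1.0, r_min=0.0, r_max=1.0)
v_small = range_adapted_value(0.1, r_min=0.0, r_max=0.1)
```

Both values converge to the top of their respective ranges, so when the two options are later paired outside their original contexts the learner is indifferent between them despite the tenfold objective difference: locally adaptive, globally irrational, as described above.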



2021 ◽  
Vol 288 (1961) ◽  
Author(s):  
Uri Hertz ◽  
Vaughan Bell ◽  
Nichola Raihani

Social learning underpins our species' extraordinary success. Learning through observation has been investigated in several species, but learning from advice, where information is intentionally broadcast, is less understood. We used a pre-registered, online experiment (n = 1492) combined with computational modelling to examine learning through observation and advice. Participants were more likely to immediately follow advice than to copy an observed choice, but this was dependent upon trust in the adviser: highly paranoid participants were less likely to follow advice in the short term. Reinforcement learning modelling revealed two distinct patterns regarding the long-term effects of social information: some individuals relied fully on social information, whereas others reverted to trial-and-error learning. This variation may affect the prevalence and fidelity of socially transmitted information. Our results highlight the privileged status of advice relative to observation and how the assimilation of intentionally broadcast information is affected by trust in others.
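The two long-term patterns can be caricatured with an advice bonus on the recommended option: a one-shot bonus is eventually overwritten by trial-and-error, whereas a persistent bonus can lock the learner onto the advised choice. All outcome values, the bonus sizes, and the deterministic setup below are hypothetical, chosen only to make the contrast visible, and are not the fitted model.

```python
def trials_until_switch(bonus, persistent, alpha=0.2, n_trials=100):
    """Deterministic sketch of advice-taking: the advised option
    actually pays 0.2, the alternative 0.8 (bad advice), and both
    outcomes are observed on every trial. Returns the first trial on
    which the learner prefers the alternative (n_trials if never)."""
    q_adv, q_alt = bonus, 0.0  # advice taken as an initial value bonus
    for t in range(n_trials):
        bias = bonus if persistent else 0.0  # persistent = full reliance
        if q_adv + bias <= q_alt:
            return t
        q_adv += alpha * (0.2 - q_adv)
        q_alt += alpha * (0.8 - q_alt)
    return n_trials

switch_trial_and_error = trials_until_switch(0.5, persistent=False)
switch_persistent = trials_until_switch(0.5, persistent=True)
never_switches = trials_until_switch(0.7, persistent=True)
```

A trial-and-error learner abandons bad advice quickly; a fully reliant learner with a large enough persistent bias never does, which is one way such individual variation could affect the fidelity of socially transmitted information.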



2017 ◽  
Author(s):  
Tobias U. Hauser ◽  
Micah Allen ◽  
Geraint Rees ◽  
Raymond J. Dolan

Abstract
Awareness of one’s own abilities is of paramount importance in adaptive decision making. Psychotherapeutic theories assume that such metacognitive insight is impaired in compulsivity, though this is supported by scant empirical evidence. In this study, we investigated metacognitive abilities in compulsive participants using computational models that enable a segregation between metacognitive and perceptual decision-making impairments. We examined twenty low-compulsive and twenty high-compulsive participants, recruited from a large population-based sample and matched on other psychiatric and cognitive dimensions. Hierarchical computational modelling of the participants’ metacognitive abilities on a visual global-motion detection paradigm revealed that high-compulsive participants had reduced metacognitive ability. This impairment was accompanied by a perceptual decision-making deficit, whereby motion-related evidence was accumulated more slowly in high-compulsive participants. Our study shows that the compulsivity spectrum is associated with a reduced ability to monitor one’s own performance, over and above any perceptual decision-making difficulty.



2019 ◽  
Author(s):  
Benjamin James Dyson ◽  
Ben Albert Steward ◽  
Tea Meneghetti ◽  
Lewis Forder

Abstract
To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to earlier indices of visual attention (N1), later indices of motivational significance (P3), and eventual behaviour. In Experiment 1 (n = 36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n = 36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures in the expression of reinforcement learning were revealed, including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by the FRN. Against exploitable opponents, the FRN determined the extent to which the P3 generated neural gain for future events. Differential activation of the N1 – FRN – P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.



2020 ◽  
Vol 123 (6) ◽  
pp. 2235-2248
Author(s):  
Deborah A. Barany ◽  
Ana Gómez-Granados ◽  
Margaret Schrayer ◽  
Sarah A. Cutts ◽  
Tarkeshwar Singh

Visual processing for perception and for action is thought to be mediated by two specialized neural pathways. Using a visuomotor decision-making task, we show that participants differentially utilized online perceptual decision-making in reaching and interception and that eye movements necessary for perception influenced motor decision strategies. These results provide evidence that task complexity modulates how pathways processing perception versus action information interact during the visual control of movement.



2018 ◽  
Author(s):  
Stefano Palminteri ◽  
Laura Fontanesi ◽  
Maël Lebreton

When humans and animals learn by trial-and-error to select the most advantageous action, the progressive increase in action selection accuracy due to learning is typically accompanied by a decrease in the time needed to execute this action. Both choice and response time (RT) data can thus provide information about decision and learning processes. However, traditional reinforcement learning (RL) models focus exclusively on the increase in choice accuracy and ignore RTs. Consequently, they neither decompose the interactions between choices and RTs, nor investigate how these interactions are influenced by contextual factors. Yet, at least in the field of perceptual decision-making, such interactions have proven to be important to dissociate between the underlying processes. Here, we analyzed such interactions in behavioral data from four experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy. To disentangle the processes underlying the observed behavioral patterns, we jointly fitted choices and RTs across all experiments with a single, Bayesian, hierarchical diffusion decision model (DDM). In punishment-avoidance contexts, compared to reward-seeking contexts, participants consistently slowed down without any loss of accuracy. The DDM explained these effects by shifts in the non-decision time and threshold parameters. The reduced motor facilitation may represent the basis of Pavlovian-to-instrumental transfer biases, while the increased cautiousness might be induced by the expectation of losses and be consistent with the loss attention framework.
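The DDM account can be illustrated by simulating standard diffusion trials: raising the threshold and non-decision time, as in the punishment-avoidance context, slows responses without hurting accuracy. The parameter values below are illustrative assumptions, not the fitted ones.

```python
import random
import statistics

def ddm_trial(drift, threshold, ndt, rng, dt=0.001, noise=1.0):
    """One diffusion-decision trial: accumulate noisy evidence until a
    symmetric bound is hit; RT = decision time + non-decision time."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return x > 0, ndt + t  # (hit the correct upper bound, response time)

def simulate(threshold, ndt, n_trials=300, seed=1):
    rng = random.Random(seed)
    outcomes = [ddm_trial(1.0, threshold, ndt, rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in outcomes) / n_trials
    mean_rt = statistics.mean(rt for _, rt in outcomes)
    return accuracy, mean_rt

acc_gain, rt_gain = simulate(threshold=1.0, ndt=0.30)  # reward-seeking
acc_loss, rt_loss = simulate(threshold=1.3, ndt=0.40)  # punishment-avoidance
```

With the same drift rate, the wider bound and longer non-decision time produce markedly slower responses while accuracy stays high, reproducing the qualitative slowing-without-error pattern reported above.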


