Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences

2018 ◽  
Author(s):  
Sophie Bavard ◽  
Maël Lebreton ◽  
Mehdi Khamassi ◽  
Giorgio Coricelli ◽  
Stefano Palminteri

Abstract In economics and in perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparably little attention. Here, in an attempt to fill this gap, we investigated reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulated both outcome valence and magnitude, resulting in systematic variations in state values. Over two experiments, model comparison indicated that subjects’ behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we found state-dependent outcome valuation to emerge progressively over time, to be favored by increasing outcome information, and to be correlated with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
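
The state-dependent valuation described above can be illustrated with a toy update rule. The sketch below is a minimal, hypothetical implementation (not the authors' exact model): each outcome is first centered on a learned context value (reference-point dependence) and then rescaled by a learned context range (range adaptation) before entering a standard delta-rule update; all parameter names are assumptions for illustration.

```python
def relative_q_update(q, outcome, context_value, context_range,
                      alpha_q=0.3, alpha_v=0.3, alpha_r=0.3):
    """One trial of a hypothetical reference-point-centered, range-adapted update."""
    # Reference-point centering: express the outcome relative to the context value
    centered = outcome - context_value
    # Range adaptation: rescale by the learned context range (guard against zero)
    relative_outcome = centered / max(context_range, 1e-6)
    # Delta-rule update on the *relative*, not the absolute, outcome
    q += alpha_q * (relative_outcome - q)
    # Update the context-level statistics from the same (pre-update) reference point
    context_value += alpha_v * centered
    context_range += alpha_r * (abs(centered) - context_range)
    return q, context_value, context_range
```

Because q is learned on a normalized scale, options moved out of their original context can be ranked incorrectly in absolute terms, which is the source of the seemingly irrational preferences reported above.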



2020 ◽  
Vol 15 (6) ◽  
pp. 695-707 ◽  
Author(s):  
Lei Zhang ◽  
Lukas Lengersdorff ◽  
Nace Mikus ◽  
Jan Gläscher ◽  
Claus Lamm

Abstract Recent years have witnessed a dramatic increase in the use of reinforcement learning (RL) models in social, cognitive and affective neuroscience. This approach, in combination with neuroimaging techniques such as functional magnetic resonance imaging, enables quantitative investigations into latent mechanistic processes. However, the increased use of relatively complex computational approaches has led to potential misconceptions and imprecise interpretations. Here, we present a comprehensive framework for the examination of (social) decision-making with the simple Rescorla–Wagner RL model. We discuss common pitfalls in its application and provide practical suggestions. First, with simulation, we unpack the functional role of the learning rate and pinpoint what could easily go wrong when interpreting differences in the learning rate. Then, we discuss the inevitable collinearity between outcome and prediction error in RL models and provide suggestions on how to justify whether the observed neural activation is related to the prediction error rather than to outcome valence. Finally, we suggest that a posterior predictive check is a crucial step after model comparison, and we recommend employing hierarchical modeling for parameter estimation. We aim to provide simple and scalable explanations and practical guidelines for employing RL models to assist both beginners and advanced users in better implementing and interpreting their model-based analyses.
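
As a point of reference for the pitfalls discussed above, here is a generic Rescorla–Wagner learner with softmax choice (a textbook sketch, not the authors' code); note how the simulated prediction error is strongly correlated with the outcome itself, which is the collinearity issue raised in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    x = beta * np.asarray(values, dtype=float)
    x -= x.max()                               # numerical stability
    p = np.exp(x)
    return p / p.sum()

def simulate_rescorla_wagner(reward_probs, n_trials=100, alpha=0.3, beta=3.0):
    """Simulate a multi-armed bandit with a Rescorla–Wagner (delta-rule) learner."""
    v = np.zeros(len(reward_probs))            # option values
    choices, outcomes, prediction_errors = [], [], []
    for _ in range(n_trials):
        c = rng.choice(len(v), p=softmax(v, beta))     # sample a choice
        r = float(rng.random() < reward_probs[c])      # binary outcome
        pe = r - v[c]                                  # prediction error
        v[c] += alpha * pe                             # learning-rate-weighted update
        choices.append(c); outcomes.append(r); prediction_errors.append(pe)
    return np.array(choices), np.array(outcomes), np.array(prediction_errors)
```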



2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Sophie Bavard ◽  
Maël Lebreton ◽  
Mehdi Khamassi ◽  
Giorgio Coricelli ◽  
Stefano Palminteri


2018 ◽  
Author(s):  
Maël Lebreton ◽  
Karin Bacily ◽  
Stefano Palminteri ◽  
Jan B. Engelmann

Abstract The ability to correctly estimate the probability of one’s choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, referred to as a confidence judgment, is susceptible to numerous biases. We investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial-and-error. In two experiments, we demonstrate that participants are more confident in their choices when learning to seek gains compared to avoiding losses. Importantly, these differences in confidence were observed despite objectively equal choice difficulty and similar observed performance between those two contexts. Using computational modelling, we show that this bias is driven by the context-value, a dynamically updated estimate of the average expected-value of choice options that has previously been demonstrated to be necessary to explain equal performance in the gain and loss domain. The biasing effect of context-value on confidence, also recently observed in the context of incentivized perceptual decision-making, is therefore domain-general, with likely important functional consequences.
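
A hedged illustration of the mechanism described above: if confidence reads out not only the value difference between options but also the context-value (the running average of available outcomes), then a gain context inflates confidence relative to a loss context at matched difficulty. The functional form and names below are assumptions for illustration, not the fitted model.

```python
from math import exp

def confidence_proxy(q_chosen, q_unchosen, context_value, bias_weight=0.5):
    """Hypothetical confidence readout: the chosen-minus-unchosen value
    difference plus a context-value bias, squashed onto a (0, 1) scale.
    With identical value differences, a positive context value (gain
    context) yields higher confidence than a negative one (loss context)."""
    evidence = (q_chosen - q_unchosen) + bias_weight * context_value
    return 1.0 / (1.0 + exp(-evidence))
```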



2018 ◽  
Author(s):  
Stefano Palminteri ◽  
Laura Fontanesi ◽  
Maël Lebreton

When humans and animals learn by trial-and-error to select the most advantageous action, the progressive increase in action selection accuracy due to learning is typically accompanied by a decrease in the time needed to execute this action. Both choice and response time (RT) data can thus provide information about decision and learning processes. However, traditional reinforcement learning (RL) models focus exclusively on the increase in choice accuracy and ignore RTs. Consequently, they neither decompose the interactions between choices and RTs, nor investigate how these interactions are influenced by contextual factors. Yet, at least in the field of perceptual decision-making, such interactions have proven important for dissociating the underlying processes. Here, we analyzed such interactions in behavioral data from four experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differentially affect RTs and accuracy. To disentangle the processes underlying the observed behavioral patterns, we jointly fitted choices and RTs across all experiments with a single, Bayesian, hierarchical diffusion decision model (DDM). In punishment-avoidance contexts, compared to reward-seeking contexts, participants consistently slowed down without any loss of accuracy. The DDM explained these effects by shifts in the non-decision time and threshold parameters. The reduced motor facilitation may represent the basis of Pavlovian-to-instrumental transfer biases, while the increased cautiousness might be induced by the expectation of losses and be consistent with the loss attention framework.
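
To make the roles of the threshold and non-decision-time parameters concrete, the snippet below simulates single trials of a generic diffusion decision model (a standard random-walk approximation, not the hierarchical Bayesian DDM fitted in the study). At a fixed drift rate, raising the threshold produces slower but not less accurate choices, while lengthening the non-decision time shifts all RTs with accuracy unchanged.

```python
import numpy as np

def simulate_ddm_trial(drift, threshold, non_decision_time,
                       dt=0.001, noise_sd=1.0, rng=None):
    """Simulate one DDM trial with symmetric bounds at +/- threshold.

    Returns (choice, rt): choice is 1 if the upper bound is hit, 0 otherwise;
    rt is the accumulation time plus the non-decision time, in seconds."""
    rng = rng or np.random.default_rng()
    evidence, t = 0.0, 0.0
    while abs(evidence) < threshold:
        evidence += drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return int(evidence > 0), t + non_decision_time
```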



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Batel Yifrah ◽  
Ayelet Ramaty ◽  
Genela Morris ◽  
Avi Mendelsohn

Abstract Decision making can be shaped both by trial-and-error experiences and by memory of unique contextual information. Moreover, these types of information can be acquired either through active experience or by observing others behave in similar situations. The interactions between the reinforcement learning parameters that inform decision updating and the memory formation of declarative information in experienced and observational learning settings are, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded similar outcomes to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we discerned participants who learned similarly from experience and observation from those who assigned different weights to learning signals from these two sources. Participants who assigned different weights to their own experience versus that of others displayed enhanced memory performance as well as subjective memory strength for episodes involving significant reward prospects. Conversely, the memory performance of participants who did not prioritize their own experience over that of others did not seem to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on the means by which individuals weigh relevant information conveyed via experience and observation.
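
A minimal sketch of the kind of model space described above: a delta-rule learner whose learning rate depends on whether the outcome was experienced or observed. The two-learning-rate form is an illustrative assumption, not necessarily the exact models compared in the study.

```python
def update_value(v, outcome, source, alpha_self=0.3, alpha_observed=0.3):
    """Delta-rule update with source-specific learning rates.

    source: 'self' for outcomes the participant experienced directly,
            'observed' for outcomes of the observed player.
    Setting alpha_self == alpha_observed recovers a single-source learner;
    fitting the two rates separately identifies participants who weight
    their own experience differently from observation."""
    alpha = alpha_self if source == "self" else alpha_observed
    return v + alpha * (outcome - v)
```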



2018 ◽  
Vol 232 ◽  
pp. 04002 ◽  
Author(s):  
Fang Dong ◽  
Ou Li ◽  
Min Tong

With the rapid development and widespread use of MANETs, the quality-of-service requirements of various services are much higher than before. Targeting adaptive, multi-parameter routing control for general scenarios, we propose an intelligent routing control algorithm for MANETs based on reinforcement learning, which continually optimizes the node-selection strategy through interaction with the environment and gradually converges to the optimal transmission paths. There is no need to update the network state frequently, which saves routing-maintenance overhead while improving transmission performance. Simulation results show that, compared with other algorithms, the proposed approach chooses appropriate paths under the given constraints and achieves a better optimization objective.
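
The paper's algorithm is not spelled out here, so the snippet below is only a generic Q-routing-style sketch of how reinforcement learning can drive next-hop selection in a MANET: each node keeps a Q-value per (destination, neighbor) pair and updates it from the cost feedback of forwarded packets. All names and the cost definition are illustrative assumptions.

```python
import random
from collections import defaultdict

class QRouter:
    """Toy per-node Q-routing agent: learns which neighbor to use for each
    destination from observed transmission costs (e.g., delay)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (destination, neighbor) -> estimated cost
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_next_hop(self, destination, neighbors):
        # Epsilon-greedy: mostly pick the neighbor with the lowest estimated cost
        if random.random() < self.epsilon:
            return random.choice(neighbors)
        return min(neighbors, key=lambda n: self.q[(destination, n)])

    def update(self, destination, neighbor, link_cost, neighbor_best_estimate):
        # Target = observed link cost plus the neighbor's own best
        # remaining-cost estimate toward the destination
        target = link_cost + self.gamma * neighbor_best_estimate
        key = (destination, neighbor)
        self.q[key] += self.alpha * (target - self.q[key])
```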



2020 ◽  
Vol 30 (6) ◽  
pp. 3573-3589 ◽  
Author(s):  
Rick A Adams ◽  
Michael Moutoussis ◽  
Matthew M Nour ◽  
Tarik Dahoun ◽  
Declan Lewis ◽  
...  

Abstract Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear—especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability—similar to decision ‘noise’ parameters in RL—and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a ‘go/no-go’ task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision ‘noise’ (P = 0.020), and this relationship with D2/3R availability was confirmed with a ‘decision stochasticity’ factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.
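
The quantity shared by the two frameworks can be illustrated with a generic choice rule: a softmax whose precision (inverse temperature) sharpens action selection, mixed with an irreducible lapse ('noise') term. This is a hedged sketch of the two model components related to D2/3R availability above, not the paper's exact likelihood.

```python
import numpy as np

def choice_probabilities(action_values, precision=2.0, lapse=0.05):
    """Map action values to choice probabilities.

    precision : softmax inverse temperature; higher precision means less
                variable (more deterministic) action selection.
    lapse     : irreducible decision 'noise' mixing the softmax output
                with a uniform random choice."""
    x = precision * np.asarray(action_values, dtype=float)
    x -= x.max()                       # numerical stability
    p = np.exp(x) / np.exp(x).sum()
    return (1 - lapse) * p + lapse / len(p)
```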



Author(s):  
Hossein Esfandiari ◽  
MohammadTaghi HajiAghayi ◽  
Brendan Lucier ◽  
Michael Mitzenmacher

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
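
For context on the offline baseline, Weitzman's classic policy assigns each box a reservation value sigma solving E[(v - sigma)+] = c, inspects boxes in decreasing order of sigma, and stops once the best prize found exceeds every unopened box's reservation value. The sketch below computes that reservation value for a discrete prize distribution by bisection; it illustrates the classic baseline, not the online approximation algorithms of this paper.

```python
def reservation_value(prizes, probs, cost, tol=1e-9):
    """Solve E[max(v - sigma, 0)] = cost for sigma by bisection, given a
    discrete prize distribution (prizes, probs). Assumes cost <= E[v],
    so a nonnegative solution exists."""
    def expected_excess(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(prizes, probs))
    lo, hi = 0.0, max(prizes)          # expected_excess is decreasing in sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, a box costing 2 whose prize is 10 with probability 1/2 (and 0 otherwise) has reservation value 6.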


