Context-dependent outcome encoding in human reinforcement learning
A wealth of evidence in perceptual and economic decision-making research suggests that the subjective value of one option is determined by the other available options (i.e. the context). A series of studies provides evidence that the same coding principles apply to situations where decisions are shaped by past outcomes, i.e. in reinforcement-learning situations. In bandit tasks, human behavior is explained by models assuming that individuals do not learn the objective value of an outcome, but rather its subjective, context-dependent representation. We argue that, while such outcome context-dependence may be informationally or ecologically optimal, it concomitantly undermines the capacity to generalize value-based knowledge to new contexts, sometimes creating apparent decision paradoxes.
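The core contrast between objective and context-dependent outcome encoding can be illustrated with a toy simulation. The sketch below is not the specific model from any of the cited studies; it assumes a simple delta-rule (Q-learning) update and, for the context-dependent learner, a divisive range normalization of outcomes within each bandit context. Both the normalization rule and the parameter values are illustrative assumptions.

```python
import random

def train(alpha=0.3, n_trials=200, relative=False):
    """Delta-rule learning on two separate two-armed bandit contexts.
    The 'rich' context offers outcomes 10 or 0; the 'poor' context
    offers 1 or 0. If relative=True, each outcome is normalized by the
    best outcome available in its context (a divisive-normalization
    assumption), so the learner encodes subjective, context-dependent
    values rather than objective ones."""
    contexts = {"rich": [10.0, 0.0], "poor": [1.0, 0.0]}
    Q = {c: [0.0, 0.0] for c in contexts}
    rng = random.Random(0)
    for _ in range(n_trials):
        for c, outcomes in contexts.items():
            a = rng.randrange(2)       # random exploration during learning
            r = outcomes[a]
            if relative:
                r = r / max(outcomes)  # context-dependent encoding
            Q[c][a] += alpha * (r - Q[c][a])
    return Q

abs_Q = train(relative=False)
rel_Q = train(relative=True)

# Transfer test: pit the best option of each context against the other.
# The objective learner values the rich-context best far above the
# poor-context best (about 10 vs 1), whereas the context-dependent
# learner values both near 1.0 and is roughly indifferent between them.
```

The transfer test makes the generalization failure concrete: because the context-dependent learner has stored only within-context relative values, it cannot rank options across contexts, which is one way an apparent decision paradox can arise when options learned in different contexts are later compared.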