Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies

2013 ◽  
Vol 37 (7) ◽  
pp. 1297-1310 ◽  
Author(s):  
Jane Garrison ◽  
Burak Erdeniz ◽  
John Done
2021 ◽  
Vol 12 ◽  
Author(s):  
Marcel Schulze ◽  
David Coghill ◽  
Silke Lux ◽  
Alexandra Philipsen

Background: Deficient decision-making (DM) in attention-deficit/hyperactivity disorder (ADHD) is marked by altered reward sensitivity, higher risk taking, and aberrant reinforcement learning. Previous meta-analyses mostly aggregated findings for the ADHD combined presentation (ADHD-C), while the predominantly inattentive (ADHD-I) and predominantly hyperactive/impulsive (ADHD-H) presentations were not disentangled. The objective of the current meta-analysis was to aggregate DM findings for each presentation separately.

Methods: A comprehensive literature search of the PubMed (Medline) and Web of Science databases was conducted using the keywords “ADHD,” “attention-deficit/hyperactivity disorder,” “decision-making,” “risk-taking,” “reinforcement learning,” and “risky.” Random-effects models based on correlational effect sizes were fitted. Heterogeneity and sensitivity/outlier analyses were performed, and publication bias was assessed with funnel plots and the Egger intercept.

Results: Of 1,240 candidate articles, seven fulfilled the criteria for analysis of ADHD-C (N = 193), seven for ADHD-I (N = 256), and eight for ADHD-H (N = 231). A moderate effect size was found for ADHD-C (r = 0.34; p = 0.0001; 95% CI = [0.19, 0.49]). Small effect sizes were found for ADHD-I (r = 0.09; p = 0.0001; 95% CI = [0.008, 0.25]) and for ADHD-H (r = 0.1; p = 0.0001; 95% CI = [−0.012, 0.32]). Heterogeneity was moderate for ADHD-H. Sensitivity analyses showed the analysis to be robust, and no outliers were detected. No publication bias was evident.

Conclusion: This is the first study to use a meta-analytic approach to investigate DM in the different presentations of ADHD separately. These findings provide first evidence of less pronounced impairment in DM for ADHD-I and ADHD-H compared to ADHD-C. While the exact factors remain elusive, the current study can be considered a starting point for characterizing the relationship between ADHD presentations and DM in more detail.
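The random-effects pooling step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the Fisher z-transform, the DerSimonian-Laird estimate of between-study variance, and the example effect sizes and sample sizes are all assumptions for demonstration.

```python
# Minimal sketch of random-effects pooling of correlational effect sizes.
# Estimator choice (DerSimonian-Laird) and the example data are assumptions.
import numpy as np

def random_effects_pooled_r(r, n):
    """Pool study correlations r with sample sizes n under a random-effects model."""
    r, n = np.asarray(r, float), np.asarray(n, float)
    z = np.arctanh(r)                 # Fisher z-transform of each correlation
    v = 1.0 / (n - 3.0)               # approximate within-study variance of z
    w = 1.0 / v                       # fixed-effect weights
    z_fixed = np.sum(w * z) / np.sum(w)
    # DerSimonian-Laird estimate of between-study variance tau^2
    q = np.sum(w * (z - z_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(z) - 1)) / c)
    # Random-effects weights, pooled estimate, and 95% CI back-transformed to r
    w_re = 1.0 / (v + tau2)
    z_re = np.sum(w_re * z) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    ci = np.tanh([z_re - 1.96 * se, z_re + 1.96 * se])
    return np.tanh(z_re), ci, tau2

# Hypothetical study-level correlations and sample sizes, for illustration only
pooled_r, ci, tau2 = random_effects_pooled_r([0.25, 0.40, 0.31], [30, 25, 40])
print(pooled_r, ci, tau2)
```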


2020 ◽  
Author(s):  
Dongjae Kim ◽  
Jaeseung Jeong ◽  
Sang Wan Lee

Abstract The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy's performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
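One way to picture the prediction error baseline idea is sketched below. This is an assumed reading of the abstract, not the authors' model: a running average of recent prediction errors is tracked alongside the value estimate, the corrected error (raw PE minus baseline) factors out systematic noise, and the baseline itself flags context changes. Update rules and parameters are hypothetical.

```python
# Illustrative sketch of a prediction-error baseline signal in a model-free update.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.3, 0.05        # value learning rate, baseline learning rate
V, baseline = 0.0, 0.0

for t in range(200):
    # Noisy reward whose mean jumps at t = 100 (a context change)
    reward = (1.0 if t < 100 else 3.0) + rng.normal(0.0, 0.5)
    pe = reward - V                       # raw reward prediction error
    corrected_pe = pe - baseline          # baseline-corrected prediction error
    V += alpha * pe                       # standard model-free value update
    baseline += beta * (pe - baseline)    # baseline tracks the average recent PE
    # A transient, large |baseline| after t = 100 signals the context change,
    # which could in principle re-weight model-based vs. model-free control.
```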


2015 ◽  
Vol 113 (10) ◽  
pp. 3459-3461 ◽  
Author(s):  
Chong Chen

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly in recent years. However, the overlap between these two lines of research, namely how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in the dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.


2021 ◽  
Author(s):  
Philip R. Corlett ◽  
Jessica A Mollick ◽  
Hedy Kober

Prediction errors (PEs) are a keystone for computational neuroscience. Their association with midbrain neural firing has been confirmed across species and has inspired the construction of artificial intelligence that can outperform humans. However, there is still much to learn. Here, we leverage the wealth of human PE data acquired in the functional neuroimaging setting in service of a deeper understanding, using meta-analysis. Across 263 PE studies that have focused on reward, punishment, action, cognition, and perception, we found consistent region-PE associations that were posited theoretically or evinced in preclinical studies, but not yet established in humans, including midbrain PE signals during perceptual and Pavlovian tasks. Further, we found evidence for PEs over successor representations in orbitofrontal cortex, and for default mode network PE signals. By combining functional imaging meta-analysis with theory and basic research, we provide new insights into learning in machines, humans, and other animals.
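The "PEs over successor representations" mentioned above can be illustrated with a standard successor representation (SR) temporal-difference update, in which the error is a vector over expected future state occupancies rather than a scalar reward prediction error. This is a generic textbook sketch with a hypothetical state space, not the analysis used in the meta-analysis.

```python
# Minimal sketch of a prediction error over a successor representation (SR).
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
M = np.eye(n_states)   # SR matrix: rows hold expected discounted future state occupancies

def sr_update(s, s_next):
    """TD-style update of the SR row for state s after observing s -> s_next."""
    one_hot = np.eye(n_states)[s]
    # Vector-valued prediction error over future state occupancies
    sr_pe = one_hot + gamma * M[s_next] - M[s]
    M[s] += alpha * sr_pe
    return sr_pe

pe = sr_update(0, 1)   # a single illustrative transition
print(pe)
```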


2019 ◽  
Author(s):  
Melissa J. Sharpe ◽  
Hannah M. Batchelor ◽  
Lauren E. Mueller ◽  
Chun Yun Chang ◽  
Etienne J.P. Maes ◽  
...  

Abstract Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or ‘excess’ value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies in which we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
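The "cached value" account that this work tests can be sketched with a simple temporal-difference update in which the prediction error produced by reward is written back into the value of the antecedent cue. The two-step cue-reward structure and parameters below are purely illustrative, not the experimental design.

```python
# Minimal sketch of model-free value caching: a reward prediction error
# accumulates in the value of the antecedent cue across trials.
import numpy as np

alpha, gamma = 0.2, 1.0
V = {"cue": 0.0, "reward_state": 0.0}

for trial in range(50):
    # The cue is followed by a state delivering 1 unit of reward
    pe_cue = 0.0 + gamma * V["reward_state"] - V["cue"]
    V["cue"] += alpha * pe_cue                 # cue caches value trial by trial
    pe_reward = 1.0 - V["reward_state"]        # reward prediction error
    V["reward_state"] += alpha * pe_reward

# After training, V["cue"] > 0: under this account the cue is intrinsically
# valuable, independent of which reward produced the original prediction error.
print(V)
```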


2020 ◽  
Vol 15 (6) ◽  
pp. 695-707 ◽  
Author(s):  
Lei Zhang ◽  
Lukas Lengersdorff ◽  
Nace Mikus ◽  
Jan Gläscher ◽  
Claus Lamm

Abstract Recent years have witnessed a dramatic increase in the use of reinforcement learning (RL) models in social, cognitive and affective neuroscience. This approach, in combination with neuroimaging techniques such as functional magnetic resonance imaging, enables quantitative investigation of latent mechanistic processes. However, the increased use of relatively complex computational approaches has led to potential misconceptions and imprecise interpretations. Here, we present a comprehensive framework for the examination of (social) decision-making with the simple Rescorla–Wagner RL model. We discuss common pitfalls in its application and provide practical suggestions. First, with simulation, we unpack the functional role of the learning rate and pinpoint what could easily go wrong when interpreting differences in the learning rate. Then, we discuss the inevitable collinearity between outcome and prediction error in RL models and suggest how to justify whether observed neural activation is related to the prediction error rather than to outcome valence. Finally, we argue that posterior predictive checks are a crucial step after model comparison, and we advocate hierarchical modeling for parameter estimation. We aim to provide simple and scalable explanations and practical guidelines for employing RL models to assist both beginners and advanced users in better implementing and interpreting their model-based analyses.
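Two of the points above, the role of the learning rate and the built-in collinearity between outcome and prediction error, can be demonstrated with a few lines of simulation. This is a generic Rescorla-Wagner sketch with an invented task (a single cue rewarded with fixed probability), not the simulations reported in the paper.

```python
# Minimal Rescorla-Wagner simulation: learning rate and outcome/PE collinearity.
import numpy as np

rng = np.random.default_rng(1)

def simulate_rw(alpha, n_trials=200, p_reward=0.75):
    V = 0.5
    outcomes, pes = [], []
    for _ in range(n_trials):
        outcome = float(rng.random() < p_reward)   # binary outcome
        pe = outcome - V                           # prediction error
        V += alpha * pe                            # Rescorla-Wagner update
        outcomes.append(outcome)
        pes.append(pe)
    return np.array(outcomes), np.array(pes)

for alpha in (0.1, 0.7):
    outcomes, pes = simulate_rw(alpha)
    # PE = outcome - V, so outcome and PE are strongly correlated by construction,
    # which is why disentangling their neural correlates needs care.
    print(alpha, np.corrcoef(outcomes, pes)[0, 1])
```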


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Alexandre Y. Dombrovski ◽  
Beatriz Luna ◽  
Michael N. Hallquist

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.
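The contrast drawn above between a local, phasic reward prediction error and a slower, sustained estimate of the global value maximum can be illustrated on a one-dimensional spatially structured reward function. The landscape, sampling policy, and parameters below are hypothetical and serve only to show the two quantities side by side.

```python
# Illustrative sketch: local RPE vs. a running estimate of the global value
# maximum on a smooth, spatially structured reward landscape.
import numpy as np

rng = np.random.default_rng(2)
positions = np.linspace(0, 1, 50)
true_value = np.exp(-((positions - 0.7) ** 2) / 0.02)   # smooth reward landscape
V = np.zeros_like(positions)
alpha = 0.3

for trial in range(300):
    i = rng.integers(len(positions))                 # sample a location to visit
    reward = true_value[i] + rng.normal(0.0, 0.1)
    local_pe = reward - V[i]                         # local, phasic prediction error
    V[i] += alpha * local_pe
    global_max_location = positions[np.argmax(V)]    # slower, sustained estimate of the value maximum

print(global_max_location)   # converges toward 0.7 as exploration proceeds
```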

