Testing the reinforcement learning hypothesis of social conformity

2020 ◽  
Author(s):  
Marie Levorsen ◽  
Ayahito Ito ◽  
Shinsuke Suzuki ◽  
Keise Izuma

Abstract
Our preferences are influenced by the opinions of others. Past human neuroimaging studies on social conformity have identified a network of brain regions related to social conformity, including the posterior medial frontal cortex (pMFC), anterior insula, and striatum. Because these brain regions are also known to play important roles in reinforcement learning (i.e., processing prediction error), it has been hypothesized that social conformity and reinforcement learning share a common neural mechanism. However, these two processes had never been directly compared, so the extent to which they share a common neural mechanism remained unclear. This study aimed to formally test the hypothesis. The same group of participants (n = 25) performed social conformity and reinforcement learning tasks inside a functional magnetic resonance imaging (fMRI) scanner. Univariate fMRI analyses revealed activation overlaps in the pMFC and bilateral insula between social conflict and unsigned prediction error, and in the striatum between social conflict and signed prediction error. For more direct evidence of a shared neural mechanism, we further conducted multi-voxel pattern analysis (MVPA). MVPA revealed no evidence supporting the hypothesis in any of these regions; instead, activation patterns for social conflict and prediction error in these regions were largely distinct. Taken together, the present study provides no clear evidence of a common neural mechanism between social conformity and reinforcement learning.
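The distinction between signed and unsigned prediction errors is central to this comparison. As a minimal sketch (not the paper's actual model), assuming a simple Rescorla-Wagner learner, the two signals can be derived as follows; all names and parameter values are illustrative:

```python
import numpy as np

def rescorla_wagner(rewards, alpha=0.2, v0=0.5):
    """Track expected value and return signed and unsigned prediction errors.

    The signed PE (reward minus expectation) is the classic striatal learning
    signal; the unsigned PE (its magnitude) indexes surprise, the kind of
    signal associated with pMFC/insula responses.
    """
    v = v0
    signed_pe, unsigned_pe = [], []
    for r in rewards:
        pe = r - v                    # signed prediction error
        signed_pe.append(pe)
        unsigned_pe.append(abs(pe))   # unsigned prediction error (surprise)
        v += alpha * pe               # value update
    return np.array(signed_pe), np.array(unsigned_pe)

# Example: a short run of probabilistic rewards
rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.7, size=20)
spe, upe = rescorla_wagner(rewards)
print(spe[:5], upe[:5])
```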

2020 ◽  
Author(s):  
Dongjae Kim ◽  
Jaeseung Jeong ◽  
Sang Wan Lee

Abstract
The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy's performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we found behavioral evidence of such adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.
One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning.
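To make baseline correction concrete, here is a minimal sketch (assumptions mine, not the paper's actual model) in which a slowly tracked running average of past prediction errors is subtracted from each new prediction error before the value update; a jump in this baseline can double as a context-change signal:

```python
import numpy as np

def baseline_corrected_pe(rewards, alpha=0.2, beta=0.1, v0=0.5):
    """Illustrative baseline correction: subtract a running average of past
    prediction errors (the 'baseline') so that noise common to all trials
    is factored out before the value update."""
    v, baseline = v0, 0.0
    corrected = []
    for r in rewards:
        pe = r - v
        corrected_pe = pe - baseline        # factor out the estimated baseline
        corrected.append(corrected_pe)
        v += alpha * corrected_pe           # learn from the corrected signal
        baseline += beta * (pe - baseline)  # slowly track the PE baseline;
                                            # an abrupt shift here can flag a
                                            # context change
    return np.array(corrected)
```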


2020 ◽  
Author(s):  
Brónagh McCoy ◽  
Rebecca P. Lawson ◽  
Jan Theeuwes

Abstract
Dopamine is known to be involved in several important cognitive processes, most notably learning from rewards and the ability to attend to task-relevant aspects of the environment. Both of these features of dopaminergic signalling have been studied separately in research involving Parkinson's disease (PD) patients, who exhibit diminished levels of dopamine. Here, we tie together commonalities in the effects of dopamine on these aspects of cognition by having PD patients (ON and OFF dopaminergic medication) and healthy controls (HCs) perform two tasks that probe these processes. Within-patient behavioural measures of distractibility, from an attentional capture task, and learning performance, from a probabilistic classification reinforcement learning task, were included in one model to assess the role of distractibility during learning. Dopamine medication state and distractibility level had an interactive effect on learning performance: lower distractibility in PD ON was associated with higher accuracy during learning, and this relationship was altered in PD OFF. Functional magnetic resonance imaging (fMRI) data acquired during the learning task furthermore allowed us to assess multivariate activation patterns evoked by positive and negative outcomes in fronto-striatal and visual brain regions involved in both learning and the executive control of attention. We demonstrate that while PD ON show a clearer distinction between outcomes than PD OFF in dorsolateral prefrontal cortex (DLPFC) and putamen, PD OFF show better distinction of activation patterns in visual regions that respond to the stimuli presented during the task. These results demonstrate that dopamine plays a key role in modulating the interaction between attention and learning, at the level of both behaviour and activation patterns in the brain.
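One simple way to express the reported interaction between medication state and distractibility is a regression with an interaction term. The sketch below is illustrative only (synthetic data, ordinary least squares); the paper's actual within-patient model may differ:

```python
import numpy as np

# Illustrative design: learning accuracy ~ distractibility * medication state.
# Columns: intercept, distractibility (z-scored), ON-medication indicator,
# and their interaction; coefficients are recovered by ordinary least squares.
rng = np.random.default_rng(1)
n = 50
distract = rng.normal(size=n)          # within-patient distractibility measure
on_med = rng.integers(0, 2, size=n)    # 1 = ON dopaminergic medication
# Synthetic accuracy in which distractibility only hurts performance when ON
accuracy = 0.7 - 0.05 * distract * on_med + rng.normal(0, 0.02, n)

X = np.column_stack([np.ones(n), distract, on_med, distract * on_med])
beta, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
print("interaction coefficient:", beta[3])  # nonzero -> medication state
                                            # modulates the distractibility effect
```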


2021 ◽  
Author(s):  
Anthony M.V. Jakob ◽  
John G Mikhael ◽  
Allison E Hamilos ◽  
John A Assad ◽  
Samuel J Gershman

The role of dopamine (DA) as a reward prediction error signal in reinforcement learning tasks has been well established over the past decades. Recent work has shown that the reward prediction error interpretation can also account for the effects of dopamine on interval timing by controlling the speed of subjective time. According to this theory, the timing of the DA signal relative to reward delivery dictates whether subjective time speeds up or slows down: early DA signals speed up subjective time, and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of reinforcement learning and interval timing.
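The core of the reanalysis is a lagged trial-by-trial relationship: the previous trial's DA signal timing should predict the current trial's ramp slope. A minimal sketch of that regression (variable names are mine, not the authors'):

```python
import numpy as np

def lagged_slope_regression(da_timing, ramp_slope):
    """Regress trial t's dopamine ramp slope on trial t-1's DA signal timing.

    Under the theory, an early DA signal on the previous trial should speed
    up subjective time on the next trial, and a late signal should slow it
    down; the slope of the returned fit indicates the direction of the effect.
    """
    x = da_timing[:-1]   # previous-trial DA signal timing
    y = ramp_slope[1:]   # current-trial ramp slope (subjective-time readout)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]       # regression slope: timing -> next-trial ramp slope
```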


2015 ◽  
Vol 113 (9) ◽  
pp. 3056-3068 ◽  
Author(s):  
Kentaro Katahira ◽  
Yoshi-Taka Matsuda ◽  
Tomomi Fujimura ◽  
Kenichi Ueno ◽  
Takeshi Asamizuya ◽  
...  

Emotional events resulting from a choice influence an individual's subsequent decision making. Although the relationship between emotion and decision making has been widely discussed, previous studies have mainly investigated decision outcomes that map easily onto reward and punishment, such as monetary gain/loss, gustatory stimuli, and pain. These studies treat emotion as a modulator of decision making that could otherwise proceed rationally in the absence of emotion. In our daily lives, however, we often encounter emotional events that affect decisions by themselves, and mapping such events to a reward or punishment is often not straightforward. In this study, we investigated the neural substrates of how such emotional decision outcomes affect subsequent decision making. Using functional magnetic resonance imaging (fMRI), we measured the brain activity of humans during a stochastic decision-making task in which various emotional pictures were presented as decision outcomes. We found that, compared with neutral pictures, pleasant pictures differentially activated the midbrain, fusiform gyrus, and parahippocampal gyrus, whereas unpleasant pictures differentially activated the ventral striatum. We assumed that emotional decision outcomes affect subsequent decisions by updating the value of the options, a process captured by reinforcement learning models, and that the brain regions representing the prediction error driving this learning are involved in guiding subsequent decisions. We found that some regions of the striatum and the insula were separately correlated with the prediction error for either pleasant or unpleasant pictures, whereas the precuneus was correlated with prediction errors for both.
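If each outcome picture's subjective valence is treated as the reward term, the model-based analysis reduces to standard Q-learning with prediction error regressors split by outcome valence. A minimal sketch under that assumption (not the authors' exact model):

```python
import numpy as np

def emotional_rl(choices, valence, n_options=2, alpha=0.3):
    """Q-learning in which the 'reward' is the valence of the outcome picture
    (positive = pleasant, negative = unpleasant; neutral ~ 0).

    Returns prediction errors split by outcome valence, mirroring the idea of
    separate fMRI regressors for pleasant vs. unpleasant outcomes."""
    q = np.zeros(n_options)
    pe_pleasant, pe_unpleasant = [], []
    for c, v in zip(choices, valence):
        pe = v - q[c]                                     # prediction error
        (pe_pleasant if v > 0 else pe_unpleasant).append(pe)
        q[c] += alpha * pe                                # value update
    return pe_pleasant, pe_unpleasant
```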


2016 ◽  
Author(s):  
Stefano Palminteri ◽  
Germain Lefebvre ◽  
Emma J. Kilford ◽  
Sarah-Jayne Blakemore

Abstract
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased: participants preferentially take into account positive, as compared to negative, prediction errors. However, whether prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two cohorts of participants on reinforcement learning tasks using a computational model adapted to test whether prediction error valence influences learning. For factual learning, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account relative to positive ones. Considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
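The mirrored biases can be captured by a Q-learning update with four learning rates, one per combination of prediction error valence and factual/counterfactual outcome. A minimal sketch (learning-rate values and names are illustrative, not fitted parameters from the paper):

```python
import numpy as np

def valence_biased_update(q, chosen, unchosen, r_obtained, r_forgone,
                          a_fact_pos=0.4, a_fact_neg=0.2,
                          a_cf_pos=0.2, a_cf_neg=0.4):
    """One trial of Q-learning with valence-dependent learning rates.

    Factual: positive PEs are weighted more (a good obtained outcome
    confirms the choice). Counterfactual: negative PEs are weighted more
    (a poor forgone outcome also confirms the choice)."""
    pe_f = r_obtained - q[chosen]                 # factual prediction error
    q[chosen] += (a_fact_pos if pe_f > 0 else a_fact_neg) * pe_f
    pe_cf = r_forgone - q[unchosen]               # counterfactual prediction error
    q[unchosen] += (a_cf_pos if pe_cf > 0 else a_cf_neg) * pe_cf
    return q

# Example: option 0 chosen and rewarded, option 1 forgone and unrewarded
q = valence_biased_update(np.zeros(2), chosen=0, unchosen=1,
                          r_obtained=1.0, r_forgone=0.0)
print(q)
```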


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Arian Ashourvan ◽  
Preya Shah ◽  
Adam Pines ◽  
Shi Gu ◽  
Christopher W. Lynn ◽  
...  

Abstract
A major challenge in neuroscience is determining a quantitative relationship between the brain's white matter structural connectivity and emergent activity. We seek to uncover the intrinsic relationships among brain regions that are fundamental to their functional activity by constructing a pairwise maximum entropy model (MEM) of the inter-ictal activation patterns of five patients with medically refractory epilepsy, using an average of ~14 hours of band-passed intracranial EEG (iEEG) recordings per patient. We find that the pairwise MEM accurately predicts the probability of iEEG electrodes' activation patterns as well as their pairwise correlations. In most patients, the estimated pairwise MEM's interaction weights predict structural connectivity and its strength across several frequency bands, significantly beyond what is expected from the distance between sampled regions alone. Together, the pairwise MEM offers a framework for explaining iEEG functional connectivity and provides insight into how the brain's structural connectome gives rise to large-scale activation patterns by promoting co-activation between connected structures.
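A pairwise MEM over binarized electrode states is the Ising model P(s) ∝ exp(Σᵢ hᵢsᵢ + Σᵢ<ⱼ Jᵢⱼsᵢsⱼ); fitting amounts to matching the model's first and second moments to the data. The sketch below uses exact enumeration (feasible only for a handful of electrodes) and plain gradient ascent; the paper's fitting procedure may differ:

```python
import numpy as np
from itertools import product

def fit_pairwise_mem(data, n_iter=2000, lr=0.1):
    """Fit a pairwise maximum entropy (Ising) model to binary patterns.

    Rows of `data` are activation patterns with entries in {0, 1}, mapped to
    {-1, +1}. Fields h and couplings J are learned by matching the model's
    moments <s_i> and <s_i s_j> to the empirical ones (Boltzmann learning).
    Uses exact enumeration over all 2^n states, so only small n is feasible.
    """
    s = 2 * data - 1
    n = s.shape[1]
    emp_m = s.mean(0)                    # empirical <s_i>
    emp_c = (s.T @ s) / len(s)           # empirical <s_i s_j>
    states = np.array(list(product([-1, 1], repeat=n)))
    h, J = np.zeros(n), np.zeros((n, n))
    for _ in range(n_iter):
        E = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
        p = np.exp(E - E.max()); p /= p.sum()       # model distribution
        mod_m = p @ states                          # model <s_i>
        mod_c = states.T @ (states * p[:, None])    # model <s_i s_j>
        h += lr * (emp_m - mod_m)                   # moment-matching updates
        J += lr * (emp_c - mod_c)
        np.fill_diagonal(J, 0.0)
    return h, J

# Usage: fit to synthetic binary patterns from 4 'electrodes'
rng = np.random.default_rng(2)
h, J = fit_pairwise_mem(rng.integers(0, 2, size=(500, 4)))
```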


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter Morales ◽  
Rajmonda Sulo Caceres ◽  
Tina Eliassi-Rad

Abstract
Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
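At its core, NAC pairs a policy over candidate nodes with a learned value baseline. The following is a minimal actor-critic sketch in that spirit; the state summary, node features, and update rule are simplifications of mine, not NAC's published architecture:

```python
import numpy as np

class NetworkActorCritic:
    """Minimal actor-critic sketch for node selection: a softmax policy over
    candidate nodes' embedding features (actor) plus a linear value baseline
    (critic). Feature and embedding choices are illustrative only."""

    def __init__(self, dim, lr_actor=0.01, lr_critic=0.05):
        self.w_pi = np.zeros(dim)   # actor (policy) weights
        self.w_v = np.zeros(dim)    # critic (value) weights
        self.lr_a, self.lr_c = lr_actor, lr_critic

    def act(self, cand_feats):
        """Sample a candidate node from a softmax over feature scores."""
        logits = cand_feats @ self.w_pi
        p = np.exp(logits - logits.max()); p /= p.sum()
        return np.random.choice(len(p), p=p), p

    def update(self, cand_feats, action, p, reward):
        """One policy-gradient step with a learned value baseline."""
        state_feat = cand_feats.mean(0)           # crude state summary
        td_err = reward - state_feat @ self.w_v   # advantage estimate
        self.w_v += self.lr_c * td_err * state_feat
        grad = cand_feats[action] - p @ cand_feats  # grad of log-softmax
        self.w_pi += self.lr_a * td_err * grad
```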

