The structure of reinforcement-learning mechanisms in the human brain

2015 ◽  
Vol 1 ◽  
pp. 94-100 ◽  
Author(s):  
John P O’Doherty ◽  
Sang Wan Lee ◽  
Daniel McNamee


2021 ◽
Vol 11 (1) ◽  
Author(s):  
A. Gorin ◽  
V. Klucharev ◽  
A. Ossadtchi ◽  
I. Zubarev ◽  
V. Moiseeva ◽  
...  

People often change their beliefs by yielding to the opinions of others; such changes are commonly referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated the faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant's rating had previously differed from the ratings of the peers. These early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism underlying the long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.
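As a rough illustration of the reinforcement-learning account of conformity assumed in this literature, the minimal sketch below shifts an individual's rating toward the group rating in proportion to the individual-group mismatch, which acts like a prediction error. The function name, learning rate, and rating scale are illustrative assumptions, not the model fitted in the study.

```python
def conformity_update(own_rating, group_rating, alpha=0.3):
    """Shift a trustworthiness rating toward the group opinion by a fraction
    alpha of the individual-group mismatch (alpha = 0 means no social influence)."""
    mismatch = group_rating - own_rating  # disagreement acts like a prediction error
    return own_rating + alpha * mismatch

# Example: an initial rating of 3 and a group rating of 6 drift to 3.9
print(conformity_update(3.0, 6.0))
```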


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Sven Collette ◽  
Wolfgang M Pauli ◽  
Peter Bossaerts ◽  
John O'Doherty

In inverse reinforcement learning, an observer infers the reward distribution available for actions in the environment solely by observing the actions implemented by another agent. To address whether this computational process is implemented in the human brain, participants underwent fMRI while learning about slot machines yielding hidden preferred and non-preferred food outcomes with varying probabilities, through observing the repeated slot choices of agents with similar and dissimilar food preferences. Using formal model comparison, we found that participants implemented inverse RL rather than a simple imitation strategy, in which the other agent's actions are copied instead of being used to infer the underlying reward structure of the decision problem. Our computational fMRI analysis revealed that the anterior dorsomedial prefrontal cortex encoded inferences about action values within the value space of the agent rather than that of the observer, demonstrating that inverse RL is an abstract cognitive process that can be divorced from the observer's own values and concerns.
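To make the contrast between the two model classes concrete, the sketch below infers hidden machine values by maximizing the likelihood of the demonstrator's choices under a softmax policy (a minimal inverse-RL estimator) and compares it with an imitation baseline that simply copies choice frequencies. All names, parameter values, and the gradient-ascent fitting procedure are assumptions for illustration, not the models fitted in the study.

```python
import numpy as np

def softmax(values, beta=3.0):
    e = np.exp(beta * (values - values.max()))
    return e / e.sum()

def infer_values_from_choices(observed_choices, n_machines, beta=3.0, lr=0.05, n_iter=500):
    """Maximum-likelihood inverse RL: find hidden machine values that make the
    demonstrator's observed choices most likely under a softmax choice rule."""
    v = np.zeros(n_machines)
    for _ in range(n_iter):
        grad = np.zeros(n_machines)
        for c in observed_choices:
            p = softmax(v, beta)
            grad += beta * (np.eye(n_machines)[c] - p)  # gradient of the log-likelihood
        v += lr * grad / len(observed_choices)
    return v

def imitation_policy(observed_choices, n_machines):
    """Imitation baseline: copy the demonstrator's empirical choice frequencies."""
    counts = np.bincount(observed_choices, minlength=n_machines) + 1.0  # add-one smoothing
    return counts / counts.sum()

# Example: a demonstrator who mostly picks machine 2
choices = np.array([2, 2, 0, 2, 1, 2, 2])
print(infer_values_from_choices(choices, 3))  # highest inferred value for machine 2
print(imitation_policy(choices, 3))           # imitation: smoothed choice frequencies
```

The key difference is that the inverse-RL estimator recovers a value structure that can be re-evaluated under the observer's own preferences, whereas imitation only reproduces the demonstrator's action frequencies.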


2021 ◽  
Author(s):  
Daniel Hasegan ◽  
Matt Deible ◽  
Christopher Earl ◽  
David D'Onofrio ◽  
Hananel Hazan ◽  
...  

Biological learning operates at multiple interlocking timescales, from long evolutionary stretches down to the relatively short span of an individual's life. While each process has been simulated individually as a basic learning algorithm in the context of spiking neuronal networks (SNNs), the integration of the two has remained limited. In this study, we first train SNNs separately with spike-timing-dependent reinforcement learning (STDP-RL) and with an evolutionary (EVOL) learning algorithm to solve the CartPole reinforcement learning (RL) control problem. We then develop an interleaved algorithm, inspired by biological evolution, that combines EVOL and STDP-RL learning in sequence. We use the NEURON simulator with NetPyNE to create an SNN interfaced with the CartPole environment from OpenAI's Gym. In CartPole, the goal is to balance a vertical pole by moving the cart left or right along a one-dimensional track. Our SNN contains multiple populations of neurons organized in three layers: a sensory layer, an association/hidden layer, and a motor layer, where neurons are connected by excitatory (AMPA/NMDA) and inhibitory (GABA) synapses. The association and motor layers each contain one excitatory (E) population and two inhibitory (I) populations with different synaptic time constants. Each neuron is an event-based integrate-and-fire model with plastic connections between excitatory neurons. In our SNN, the environment activates sensory neurons tuned to specific features of the game state. We split the motor population into subsets representing each movement choice; the subset with more spiking over an interval determines the action. During STDP-RL, we supply intermediary evaluations (reward or punishment) of each action by judging the effectiveness of a move (e.g., moving the cart toward a balanced pole position). During EVOL, updates consist of adding together many random perturbations of the connection weights, with each set of perturbations weighted by the total episodic reward it achieves when applied independently. We evaluate the performance of each algorithm after training and through the creation of sensory/motor action maps that delineate the network's transformation of sensory inputs into higher-order representations and eventual motor decisions. Both EVOL and STDP-RL training produce SNNs capable of moving the cart left and right and keeping the pole vertical. Compared with the STDP-RL and EVOL algorithms operating on their own, our interleaved training paradigm produced enhanced robustness in performance, with different strategies revealed through analysis of the sensory/motor mappings. Analysis of synaptic weight matrices also shows distributed versus clustered representations after the EVOL and STDP-RL algorithms, respectively. These weight differences also manifest as diffuse versus synchronized firing patterns. Our modeling opens up new capabilities for SNNs in RL and could serve as a testbed for neurobiologists aiming to understand multi-timescale learning mechanisms and dynamics in neuronal circuits.
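The EVOL update described above resembles an evolution-strategies rule; a minimal sketch follows. Here `evaluate_episode` is a stand-in for running one CartPole episode with the candidate weights and returning its total reward; it is not the NEURON/NetPyNE model used in the study, and all parameter values are illustrative.

```python
import numpy as np

def evol_update(weights, evaluate_episode, n_perturb=20, sigma=0.05, lr=0.01):
    """One EVOL generation: sample random weight perturbations, score each by the
    episodic reward it earns when applied on its own, then move the weights along
    the reward-weighted sum of perturbations."""
    noise = [np.random.randn(*weights.shape) * sigma for _ in range(n_perturb)]
    rewards = np.array([evaluate_episode(weights + eps) for eps in noise])
    # Normalize rewards so the step size is insensitive to reward scale and offset
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    step = sum(a * eps for a, eps in zip(advantages, noise)) / n_perturb
    return weights + lr * step / sigma
```

In the interleaved paradigm described in the text, generations of this kind would alternate with blocks of reward-modulated STDP-RL plasticity applied to the same connection weights.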


2014 ◽  
Vol 45 (6) ◽  
pp. 466-478 ◽  
Author(s):  
Robert Schnuerch ◽  
Henning Gibbons

In groups, individuals often adjust their behavior to that of the majority. Here, we provide a brief introduction to the research on social conformity and review the first, very recent investigations elucidating the underlying neurocognitive mechanisms. Multiple studies suggest that conformity is a behavioral adjustment based on reinforcement-learning mechanisms in the posterior medial frontal cortex and ventral striatum. It has also been suggested that the detection of cognitive inconsistency and the modulation of basic encoding processes are involved. Together, recent findings provide valuable insight into the neural and cognitive mechanisms underlying social conformity and clearly point to the need for further studies in this field.


2017 ◽  
Vol 41 (S1) ◽  
pp. S11-S11
Author(s):  
M. Sebold ◽  
S. Nebe ◽  
M. Garbusow ◽  
D. Schad ◽  
C. Sommer ◽  
...  

The mesolimbic dopaminergic system has been implicated in two kinds of reward processing: one in reinforcement learning (e.g., prediction error) and another in incentive salience attribution (e.g., cue reactivity). Both functions have been implicated in alcohol dependence, with the former contributing to the persistence of chronic alcohol intake despite severe negative consequences and the latter playing a crucial role in cue-induced craving and relapse. The bicentric study "Learning in alcohol dependence (LeAD)" aims to bridge the gap between these processes by investigating reinforcement learning mechanisms and the influence that Pavlovian cues exert over behavior. We demonstrate that alcohol-dependent subjects show alterations in goal-directed, model-based reinforcement learning (Sebold et al., 2014) and that prospectively relapsing patients show reduced medial prefrontal cortex activation during goal-directed control. Moreover, we show that Pavlovian cues exert more pronounced control over behavior in alcohol-dependent patients than in healthy controls (Garbusow et al., 2016). Again, prospectively relapsing patients showed increased nucleus accumbens activation during these cue-induced responses. These findings point to an important role of the mesolimbic dopaminergic system as a predictor of treatment outcome in alcohol dependence.
Disclosure of interest: The authors have not supplied their declaration of competing interest.
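A minimal sketch of how model-based and model-free valuation are commonly combined in this literature (e.g., in two-step tasks): a weighting parameter w indexes the degree of goal-directed, model-based control. This is an illustrative formulation under assumed parameter names, not the exact model fitted in the cited studies.

```python
import numpy as np

def hybrid_values(q_model_free, q_model_based, w):
    """Weighted mixture of action values: w near 1 reflects goal-directed
    (model-based) control, w near 0 reflects habitual (model-free) control."""
    return w * np.asarray(q_model_based) + (1.0 - w) * np.asarray(q_model_free)

def choice_probabilities(q, beta=2.0):
    """Softmax over the mixed action values."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

# Example: with low w (reduced model-based control), choices track model-free values
print(choice_probabilities(hybrid_values([0.6, 0.4], [0.2, 0.8], w=0.2)))
```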


2016 ◽  
Vol 28 (2) ◽  
pp. 333-349 ◽  
Author(s):  
Matthew Balcarras ◽  
Salva Ardid ◽  
Daniel Kaping ◽  
Stefan Everling ◽  
Thilo Womelsdorf

Attention includes processes that evaluate stimulus relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior, but required a value-independent stickiness selection process to explain selection errors at asymptotic performance. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections, akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important for understanding how attentional subprocesses are implemented in primate brain networks.
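The three-component account above can be sketched as follows: prediction-error learning of feature values restricted to the relevant dimension, a softmax selection over those values, and a value-independent stickiness bonus for the previously selected feature. Parameter names and values are illustrative assumptions, not the fitted model from the paper.

```python
import numpy as np

def choose_feature(values, prev_choice, beta=3.0, stickiness=0.5, rng=np.random):
    """Stochastic (softmax) selection over feature values, with a value-independent
    perseveration bonus for the feature chosen on the previous trial."""
    logits = beta * np.asarray(values, dtype=float).copy()
    if prev_choice is not None:
        logits[prev_choice] += stickiness  # stickiness enters outside the value term
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(values), p=p)

def update_feature_values(values, choice, reward, alpha=0.2):
    """Prediction-error update of the chosen feature's value; features from
    irrelevant dimensions are simply not represented in the value space."""
    values = np.asarray(values, dtype=float).copy()
    values[choice] += alpha * (reward - values[choice])
    return values
```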

