Credit assignment in movement-dependent reinforcement learning

2016 · Vol 113 (24) · pp. 6797-6802
Author(s): Samuel D. McDougle, Matthew J. Boggess, Matthew J. Crossley, Darius Parvin, Richard B. Ivry, et al.

When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
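
The gating account lends itself to a compact simulation. Below is a minimal sketch, assuming a standard Q-learning rule for a two-armed bandit in which the reward prediction error is down-weighted by a gating factor on execution-error trials; the parameter names (`alpha`, `gate`) and values are illustrative, not the authors' fitted model.

```python
import numpy as np

def update_q(q, choice, reward, exec_error, alpha=0.3, gate=0.2):
    """One Q-learning step; gate the RPE when the movement itself failed."""
    rpe = reward - q[choice]                  # reward prediction error
    weight = gate if exec_error else 1.0      # gating on execution errors
    q[choice] += alpha * weight * rpe
    return q

q = np.full(2, 0.5)                           # prior value estimates for both arms
q = update_q(q, choice=0, reward=0.0, exec_error=True)   # motor slip: value barely drops
q = update_q(q, choice=0, reward=0.0, exec_error=False)  # true omission: larger drop
```

Under this rule, unrewarded trials attributed to motor error leave option values largely intact, which is one way to produce the reversal of risk aversion observed in the reaching condition.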

2019
Author(s): A. Wiehler, K. Chakroun, J. Peters

Abstract
Gambling disorder is a behavioral addiction associated with impairments in decision-making and reduced behavioral flexibility. Decision-making in volatile environments requires a flexible trade-off between exploitation of options with high expected values and exploration of novel options to adapt to changing reward contingencies. This classical problem is known as the exploration-exploitation dilemma. We hypothesized gambling disorder to be associated with a specific reduction in directed (uncertainty-based) exploration compared to healthy controls, accompanied by changes in brain activity in a fronto-parietal exploration-related network.

Twenty-three frequent gamblers and nineteen matched controls performed a classical four-armed bandit task during functional magnetic resonance imaging. Computational modeling revealed that choice behavior in both groups contained signatures of directed exploration, random exploration and perseveration. Gamblers showed a specific reduction in directed exploration, while random exploration and perseveration were similar between groups.

Neuroimaging revealed no evidence for group differences in neural representations of expected value and reward prediction errors. Likewise, our hypothesis of attenuated fronto-parietal exploration effects in gambling disorder was not supported. However, during directed exploration, gamblers showed reduced parietal and substantia nigra / ventral tegmental area activity. Cross-validated classification analyses revealed that connectivity in an exploration-related network was predictive of clinical status, suggesting alterations in network dynamics in gambling disorder.

In sum, we show that reduced flexibility during reinforcement learning in volatile environments in gamblers is attributable to a reduction in directed exploration rather than an increase in perseveration. Neuroimaging findings suggest that patterns of network connectivity might be more diagnostic of gambling disorder than univariate value and prediction error effects. We provide a computational account of flexibility impairments in gamblers during reinforcement learning that might arise as a consequence of dopaminergic dysregulation in this disorder.
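
For readers unfamiliar with how directed exploration, random exploration, and perseveration are separated computationally, here is a hedged sketch of a typical choice rule for such a bandit model: utility combines the posterior mean value of each arm, an uncertainty bonus weighted by `phi` (directed exploration), and a stickiness bonus `rho` for the previous choice (perseveration), passed through a softmax whose temperature captures random exploration. The exact parameterization in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def choose(mean, sd, last_choice, beta=3.0, phi=1.0, rho=0.5):
    """Softmax over value + uncertainty bonus + perseveration bonus."""
    util = mean + phi * sd                # phi > 0: directed exploration
    if last_choice is not None:
        util[last_choice] += rho          # rho > 0: perseveration
    z = beta * util                       # beta: inverse of random exploration
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(mean), p=p)

mean = np.zeros(4)                        # posterior means per arm (e.g., from a Kalman filter)
sd = np.full(4, 4.0)                      # posterior uncertainties per arm
arm = choose(mean, sd, last_choice=None)
```

In this framing, the gamblers' deficit corresponds to a reduced `phi`, with `beta` and `rho` comparable across groups.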


Author(s): Mohammed Wahba, Amer Shalaby

This paper presents an operational prototype of an innovative multiagent framework for the transit assignment problem, inspired by a learning-based approach. The proposed framework represents passengers, and their learning and decision-making activities, explicitly. The underlying hypothesis is that individual passengers are expected to adjust their behavior (i.e., trip choices) according to their experience with transit system performance. A hypothetical transit network of 22 routes and 194 stops was developed within a microsimulation platform (Paramics), and a population of 3,000 passengers was generated and synthesized to model the transit assignment process in the morning peak period. Using reinforcement learning to represent passengers' adaptation, and accounting for differences in passengers' preferences and the dynamics of the transit network, the prototype demonstrated that the proposed approach can simultaneously predict how passengers choose their routes and estimate both the total passenger travel cost in a congested network and the loads on different transit routes.
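
As a sketch of the learning rule such a passenger agent might use (the prototype's actual specification is not reproduced here, so the cost components and parameters below are assumptions): each agent maintains a value per candidate route, chooses by softmax, and updates the chosen route's value toward the experienced generalized travel cost of that day's trip.

```python
import numpy as np

rng = np.random.default_rng(1)

class Passenger:
    """Agent that learns day-to-day which transit route to take."""

    def __init__(self, n_routes, alpha=0.2, beta=0.5):
        self.v = np.zeros(n_routes)       # learned utility (negative cost) per route
        self.alpha, self.beta = alpha, beta

    def choose_route(self):
        p = np.exp(self.beta * self.v)
        return rng.choice(len(self.v), p=p / p.sum())

    def learn(self, route, wait, in_vehicle, crowding):
        experienced = -(wait + in_vehicle + crowding)   # generalized trip cost
        self.v[route] += self.alpha * (experienced - self.v[route])

p = Passenger(n_routes=3)
for day in range(30):                     # one morning-peak trip per simulated day
    r = p.choose_route()
    p.learn(r, wait=rng.exponential(5.0), in_vehicle=20.0 + 2.0 * r, crowding=float(r))
```

Aggregating such agents over a simulated network yields the route loads and total travel costs the prototype reports.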


2018 · Vol 38 (19) · pp. 4521-4530
Author(s): Darius E. Parvin, Samuel D. McDougle, Jordan A. Taylor, Richard B. Ivry

2020 · Vol 1 (1)
Author(s): Graham Findlay, Giulio Tononi, Chiara Cirelli

Abstract
The term hippocampal replay originally referred to the temporally compressed reinstantiation, during rest, of sequential neural activity observed during prior active wake. Since its description in the 1990s, hippocampal replay has often been viewed as the key mechanism by which a memory trace is repeatedly rehearsed at high speeds during sleep and gradually transferred to neocortical circuits. However, the methods used to measure the occurrence of replay remain debated, and it is now clear that the underlying neural events are considerably more complicated than the traditional narratives had suggested. “Replay-like” activity happens during wake, can play out in reverse order, may represent trajectories never taken by the animal, and may have additional functions beyond memory consolidation, from learning values and solving the problem of credit assignment to decision-making and planning. Still, we know little about the role of replay in cognition and to what extent it differs between wake and sleep. This may soon change, however, because decades-long efforts to explain replay in terms of reinforcement learning (RL) have started to yield testable predictions and possible explanations for a diverse set of observations. Here, we (1) survey the diverse features of replay, focusing especially on the latest findings; (2) discuss recent attempts at unifying disparate experimental results and putatively different cognitive functions under the banner of RL; (3) discuss methodological issues and theoretical biases that impede progress or may warrant a partial re-evaluation of the current literature; and, finally, (4) highlight areas of considerable uncertainty and promising avenues of inquiry.
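
One concrete way RL accounts make replay testable is Dyna-style offline replay: stored transitions are reactivated to propagate value backwards through the state space much faster than online experience alone, directly addressing credit assignment. The sketch below is a generic illustration of that idea, not a model taken from the review.

```python
import random
from collections import defaultdict

alpha, gamma = 0.5, 0.95
q = defaultdict(float)            # Q-values keyed by (state, action)
memory = []                       # stored transitions: candidate replay content

def online_step(s, a, r, s_next, actions):
    memory.append((s, a, r, s_next))
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def replay(n, actions):
    """Offline reactivation of stored transitions speeds credit assignment."""
    for _ in range(n):
        s, a, r, s_next = random.choice(memory)
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

actions = ["left", "right"]
online_step("s0", "left", 0.0, "s1", actions)
online_step("s1", "right", 1.0, "s2", actions)
replay(20, actions)               # reward at s1 propagates back to the s0 choice
```

Variants of this scheme (reverse replay, prioritized replay) map naturally onto several of the empirical observations surveyed above.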


2018
Author(s): Samuel D. McDougle, Peter A. Butcher, Darius Parvin, Faisal Mushtaq, Yael Niv, et al.

Abstract
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive not only to whether the choice itself was suboptimal, but also to whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked whether prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors than when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
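
A hedged sketch of the model-driven analysis logic: generate trialwise prediction errors from a simple learner, attenuating negative RPEs on execution-error trials, and use the resulting series as a parametric regressor for striatal BOLD. The attenuation form and parameter names are assumptions for illustration, not the paper's fitted model.

```python
import numpy as np

def rpe_regressor(choices, rewards, exec_errors, alpha=0.3, gate=0.2):
    """Trialwise RPEs for a parametric fMRI regressor; negative RPEs
    on execution-error trials are attenuated, mirroring the reduced
    striatal signal reported for those trials."""
    q = np.full(2, 0.5)               # prior value estimates (assumption)
    rpes = []
    for c, r, e in zip(choices, rewards, exec_errors):
        rpe = r - q[c]
        if e and rpe < 0:
            rpe *= gate               # gate negative RPE after motor failure
        q[c] += alpha * rpe
        rpes.append(rpe)
    return np.array(rpes)

rpes = rpe_regressor(choices=[0, 0, 1], rewards=[0.0, 0.0, 1.0],
                     exec_errors=[True, False, False])
```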


2018
Author(s): Joanne C. Van Slooten, Sara Jahfari, Tomas Knapen, Jan Theeuwes

Abstract
Pupil responses have been used to track cognitive processes during decision-making. Studies have shown that in these cases the pupil reflects the joint activation of many cortical and subcortical brain regions, including those traditionally implicated in value-based learning. However, how the pupil tracks value-based decisions and reinforcement learning is unknown. We combined a reinforcement learning task with a computational model to study pupil responses during value-based decisions and decision evaluations. We found that the pupil closely tracks reinforcement learning both across trials and across participants. Prior to choice, the pupil dilated as a function of trial-by-trial fluctuations in value beliefs. After feedback, early dilation scaled with value uncertainty, whereas later constriction scaled with reward prediction errors. Our computational approach systematically implicates the pupil in value-based decisions and in the subsequent processing of violated value beliefs. These dissociable influences provide an exciting possibility to non-invasively study ongoing reinforcement learning in the pupil.
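
To make the tracked quantities concrete, here is a minimal sketch assuming a Kalman-filter-style learner (an illustrative choice, not necessarily the paper's model): each trial yields a value belief (linked above to pre-choice dilation), a belief uncertainty (early post-feedback dilation), and a reward prediction error (later constriction).

```python
import numpy as np

def kalman_trial(mu, var, reward, obs_noise=1.0):
    """One Bayesian value update; returns the quantities the pupil may track."""
    belief, uncertainty = mu, np.sqrt(var)   # pre-choice belief and its uncertainty
    gain = var / (var + obs_noise)           # adaptive learning rate
    rpe = reward - mu                        # reward prediction error
    mu = mu + gain * rpe
    var = (1.0 - gain) * var
    return mu, var, belief, uncertainty, rpe

mu, var = 0.0, 1.0
for r in [1.0, 0.0, 1.0]:
    mu, var, belief, uncertainty, rpe = kalman_trial(mu, var, r)
```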


2009 · Vol 21 (7) · pp. 1332-1345
Author(s): Thorsten Kahnt, Soyoung Q Park, Michael X Cohen, Anne Beck, Andreas Heinz, et al.

It has been suggested that the target areas of dopaminergic midbrain neurons, the dorsal striatum (DS) and ventral striatum (VS), are differentially involved in reinforcement learning, playing the roles of actor and critic, respectively. Whereas the critic learns to predict rewards, the actor maintains action values to guide future decisions. The distinct midbrain connections to the DS and the VS seem to play a critical role in this functional distinction. Here, subjects performed a dynamic, reward-based decision-making task during fMRI acquisition. A computational model of reinforcement learning was used to estimate the different effects of positive and negative reinforcements on future decisions for each subject individually. We found that activity in both the DS and the VS correlated with reward prediction errors. Using functional connectivity, we show that the DS and the VS are differentially connected to different midbrain regions (possibly corresponding to the substantia nigra [SN] and the ventral tegmental area [VTA], respectively). However, only functional connectivity between the DS and the putative SN predicted the impact of different reinforcement types on future behavior. These results suggest that connections between the putative SN and the DS are critical for modulating action values in the DS according to both positive and negative reinforcements to guide future decision making.
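
The actor-critic division of labor described here has a standard textbook form, sketched below: the critic (VS-like) learns to predict reward and emits the prediction error, and that same error trains the actor's (DS-like) action preferences. This is the generic algorithm, not the study's fitted model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_actions = 2
v = 0.0                               # critic: predicted reward ("VS")
prefs = np.zeros(n_actions)           # actor: action preferences ("DS")
alpha_v, alpha_p = 0.1, 0.1

for t in range(200):
    p = np.exp(prefs) / np.exp(prefs).sum()
    a = rng.choice(n_actions, p=p)
    reward = float(rng.random() < (0.8 if a == 0 else 0.2))
    delta = reward - v                # prediction error, computed by the critic
    v += alpha_v * delta              # critic learns to predict reward
    prefs[a] += alpha_p * delta       # actor updates the chosen action's preference
```

The paper's connectivity result fits this picture: an SN-to-DS pathway would carry the teaching signal (`delta` above) that updates the actor's action values.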


2015 · Vol 113 (9) · pp. 3056-3068
Author(s): Kentaro Katahira, Yoshi-Taka Matsuda, Tomomi Fujimura, Kenichi Ueno, Takeshi Asamizuya, et al.

Emotional events resulting from a choice influence an individual's subsequent decision making. Although the relationship between emotion and decision making has been widely discussed, previous studies have mainly investigated decision outcomes that can easily be mapped to reward and punishment, such as monetary gain/loss, gustatory stimuli, and pain. These studies treat emotion as a modulator of decisions that could otherwise be made rationally in its absence. In our daily lives, however, we often encounter emotional events that affect decisions by themselves, and mapping such events to a reward or punishment is often not straightforward. In this study, we investigated the neural substrates of how such emotional decision outcomes affect subsequent decision making. Using functional magnetic resonance imaging (fMRI), we measured brain activity in humans during a stochastic decision-making task in which various emotional pictures were presented as decision outcomes. We found that pleasant pictures differentially activated the midbrain, fusiform gyrus, and parahippocampal gyrus, whereas unpleasant pictures differentially activated the ventral striatum, compared with neutral pictures. We assumed that emotional decision outcomes affect subsequent decisions by updating the values of the options, a process captured by reinforcement learning models, and that the brain regions representing the prediction error that drives this learning are involved in guiding subsequent decisions. We found that some regions of the striatum and the insula were separately correlated with prediction errors for either pleasant or unpleasant pictures, whereas the precuneus was correlated with prediction errors for both pleasant and unpleasant pictures.
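
The assumed learning process can be sketched as a Rescorla-Wagner update in which the outcome is a subjective valence rather than an explicit reward (pleasant positive, unpleasant negative); prediction errors can then be split by outcome sign, paralleling the separate pleasant/unpleasant RPE regressors. The valence coding below is an illustrative assumption.

```python
import numpy as np

def emotional_rw(choices, valences, n_options=2, alpha=0.2):
    """Rescorla-Wagner learning where outcomes are emotional valences.
    Returns prediction errors split by outcome sign."""
    v = np.zeros(n_options)
    pos_rpes, neg_rpes = [], []
    for c, val in zip(choices, valences):
        rpe = val - v[c]              # valence stands in for reward
        v[c] += alpha * rpe
        (pos_rpes if val > 0 else neg_rpes).append(rpe)
    return pos_rpes, neg_rpes

pos, neg = emotional_rw(choices=[0, 1, 0], valences=[+1.0, -1.0, +0.5])
```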

