Integration of Reinforcement Learning and Optimal Decision-Making Theories of the Basal Ganglia

2011 ◽  
Vol 23 (4) ◽  
pp. 817-851 ◽  
Author(s):  
Rafal Bogacz ◽  
Tobias Larsen

This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing how the basal ganglia learn which actions to select to maximize reward, and decision-making theories proposing that the basal ganglia select actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning with a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
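The actor-critic scheme with bounded weights described above can be sketched in a few lines. Everything below (a two-armed bandit, the reward probabilities, the learning rates, and the weight cap `w_max`) is a hypothetical toy setting for illustration, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: action 0 is rewarded with p = 0.8, action 1 with p = 0.2.
reward_prob = np.array([0.8, 0.2])

w = np.zeros(2)          # actor: cortico-striatal weights (action preferences)
v = 0.0                  # critic: estimated value of the state
alpha, beta = 0.1, 0.1   # actor and critic learning rates (assumed)
w_max = 1.0              # illustrative upper limit on synaptic weights

for _ in range(2000):
    p = np.exp(w) / np.exp(w).sum()            # softmax action selection
    a = rng.choice(2, p=p)
    r = float(rng.random() < reward_prob[a])   # stochastic binary reward
    delta = r - v                              # dopamine-like prediction error
    v += beta * delta                          # critic update
    w[a] = np.clip(w[a] + alpha * delta, 0.0, w_max)  # actor update, weights bounded

print(w, v)
```

The clipping step is the point of contact with the abstract: without a bound, the preferences can grow without limit, whereas bounded weights settle at values shaped by the reward statistics.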

2010 ◽  
Vol 22 (5) ◽  
pp. 1113-1148 ◽  
Author(s):  
Jiaxiang Zhang ◽  
Rafal Bogacz

Experimental data indicate that perceptual decision making involves integration of sensory evidence in certain cortical areas. Theoretical studies have proposed that the computation in neural decision circuits approximates statistically optimal decision procedures (e.g., the sequential probability ratio test) that maximize the reward rate in sequential choice tasks. However, these previous studies assumed that the sensory evidence was represented by continuous values from gaussian distributions with the same variance across alternatives. In this article, we make a more realistic assumption that sensory evidence is represented in spike trains described by Poisson processes, which naturally satisfy the mean-variance relationship observed in sensory neurons. We show that for such a representation, the neural circuits involving cortical integrators and the basal ganglia can approximate the optimal decision procedures for two- and multiple-alternative choice tasks.
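A minimal sketch of sequential testing on Poisson spike counts, under assumed firing rates and an illustrative threshold: for Poisson observations the log-likelihood ratio is linear in the spike count, so evidence accumulation reduces to counting spikes with a constant offset per time bin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical firing rates (Hz) of a sensory neuron under the two hypotheses.
r_high, r_low = 40.0, 20.0
dt = 0.01      # time bin (s)
theta = 3.0    # illustrative decision threshold on the log-likelihood ratio

def sprt_poisson(true_rate):
    """Accumulate the log-likelihood ratio from Poisson spike counts until threshold."""
    llr = 0.0
    while abs(llr) < theta:
        k = rng.poisson(true_rate * dt)  # spike count in this bin
        # log P(k | r_high) - log P(k | r_low) for a Poisson count k
        llr += k * np.log(r_high / r_low) - (r_high - r_low) * dt
    return llr > 0                       # True -> choose the "high rate" hypothesis

choices = [sprt_poisson(r_high) for _ in range(200)]
print(np.mean(choices))  # fraction of correct "high" decisions
```

With symmetric thresholds at plus or minus theta, the error probability of the test is roughly e to the minus theta, which is why most of the 200 simulated decisions come out correct.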


2007 ◽  
Vol 19 (2) ◽  
pp. 442-477 ◽  
Author(s):  
Rafal Bogacz ◽  
Kevin Gurney

Neurophysiological studies have identified a number of brain regions critically involved in solving the problem of action selection or decision making. In the case of highly practiced tasks, these regions include cortical areas hypothesized to integrate evidence supporting alternative actions and the basal ganglia, hypothesized to act as a central switch in gating behavioral requests. However, despite our relatively detailed knowledge of basal ganglia biology and its connectivity with the cortex, and despite numerical simulation studies demonstrating selective function, no formal theoretical framework exists that supplies an algorithmic description of these circuits. This article shows how many aspects of the anatomy and physiology of the circuit involving the cortex and basal ganglia are exactly those required to implement the computation defined by an asymptotically optimal statistical test for decision making: the multihypothesis sequential probability ratio test (MSPRT). The resulting model of the basal ganglia provides a new framework for understanding the computation in the basal ganglia during decision making in highly practiced tasks. The predictions of the theory concerning the properties of particular neuronal populations are validated in existing experimental data. Further, we show that this neurobiologically grounded implementation of MSPRT outperforms other candidates for neural decision making, that it is structurally and parametrically robust, and that it can accommodate cortical mechanisms for decision making in a way that complements those in basal ganglia.
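The MSPRT computation can be sketched as a race between noisy integrators normalized by a log-sum-exp term, the operation this theory ascribes to the basal ganglia. The drift rates, noise level, and posterior stopping criterion `p_star` below are illustrative assumptions, not parameters from the article.

```python
import numpy as np

rng = np.random.default_rng(2)

def msprt(drifts, p_star=0.95, dt=0.1, sigma=1.0):
    """Sketch of the MSPRT: noisy cortical integrators accumulate evidence, and
    the softmax of their states (equivalently, -y_i + log-sum-exp(y) in log
    space) gives a posterior over alternatives; stop when one exceeds p_star."""
    y = np.zeros(len(drifts))  # cortical evidence integrators
    while True:
        y += np.asarray(drifts) * dt + sigma * np.sqrt(dt) * rng.standard_normal(len(y))
        post = np.exp(y - y.max())
        post /= post.sum()     # softmax normalization over alternatives
        if post.max() > p_star:
            return int(post.argmax())

# Alternative 0 has the largest drift, so it should usually win.
print(msprt([1.0, 0.0, 0.0]))
```

The stopping rule on the normalized quantity, rather than on any single integrator, is what makes the test sequential and multihypothesis at once: each channel's decision variable depends on all the others through the log-sum-exp term.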


2017 ◽  
Author(s):  
Julie J. Lee ◽  
Mehdi Keramati

Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective, goal-directed model-based (MB) strategy and a fast, retrospective, habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. We therefore developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes to determine whether the volatility of contingency changes would shift subjects between MB and MF strategies. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning. Furthermore, subjects did not remain at one level of MB/MF behavior but shifted toward more MB behavior over the first two blocks, a shift attributable not to the rate of contingency changes but to the extent of training. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects using a hybrid strategy that shifts toward more MB behavior over blocks and consequently earns a higher payoff.

Author Summary: To make good decisions, we must learn to associate actions with their true outcomes. Flexibility to changes in action-outcome relationships is therefore essential for optimal decision making. For example, actions can lead to outcomes that change in value: one day, your favorite food is poorly made and thus less pleasant. Alternatively, changes can occur in contingencies: ordering a dish of one kind and instead receiving another. How we respond to such changes is indicative of our decision-making strategy; habitual learners will continue to choose their favorite food even if its quality has gone down, whereas goal-directed learners will soon learn that it is better to choose another dish. A popular paradigm probes the effect of value changes on decision making, but the effect of contingency changes remains largely unexplored, so we developed a novel task to study it. We find that humans used a mixed habitual/goal-directed strategy in which they became more goal-directed over the course of the task and earned more rewards as goal-directed behavior increased. This shows that flexibility to contingency changes is adaptive for learning from rewards and indicates that it can reveal which decision-making strategy is in use.
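The hybrid controller described above is commonly written as a weighted mix of model-based and model-free values. The sketch below assumes that form with toy sizes, a shared learning rate, and a fixed mixing weight; none of these are the study's fitted parameters.

```python
import numpy as np

n_actions, n_states = 2, 2
T = np.full((n_actions, n_states), 0.5)  # learned transition model P(state | action)
q_mf = np.zeros(n_actions)               # cached model-free action values
v_state = np.zeros(n_states)             # values of the second-level states
alpha, w_mb = 0.3, 0.5                   # learning rate and MB weight (assumed)

def choose():
    q_mb = T @ v_state                       # MB values: expectation under the model
    q = w_mb * q_mb + (1 - w_mb) * q_mf      # hybrid valuation
    return int(np.argmax(q))

def update(a, s, r):
    T[a] += alpha * (np.eye(n_states)[s] - T[a])  # move model toward observed transition
    q_mf[a] += alpha * (r - q_mf[a])              # MF update from reward prediction error
    v_state[s] += alpha * (r - v_state[s])        # state-value update
```

After a contingency change, a single observed transition moves `T` and hence the MB values, while `q_mf` adapts only through repeated reward feedback; that asymmetry is what lets a contingency-change task dissociate the two controllers.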


2012 ◽  
Vol 24 (11) ◽  
pp. 2924-2945 ◽  
Author(s):  
Nathan F. Lepora ◽  
Kevin N. Gurney

The basal ganglia are a subcortical group of interconnected nuclei involved in mediating action selection within cortex. A recent proposal is that this selection leads to optimal decision making over multiple alternatives because the basal ganglia anatomy maps onto a network implementation of an optimal statistical method for hypothesis testing, assuming that cortical activity encodes evidence for constrained gaussian-distributed alternatives. This letter demonstrates that this model of the basal ganglia extends naturally to encompass general Bayesian sequential analysis over arbitrary probability distributions, which raises the proposal to a practically realizable theory over generic perceptual hypotheses. We also show that the evidence in this model can represent either log likelihoods, log-likelihood ratios, or log odds, all leading proposals for the cortical processing of sensory data. For these reasons, we claim that the basal ganglia optimize decision making over general perceptual hypotheses represented in cortex. The relation of this theory to cortical encoding, cortico-basal ganglia anatomy, and reinforcement learning is discussed.
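The generalization to arbitrary distributions amounts to accumulating log-likelihoods (or log odds) of each observation under every hypothesis. The sketch below uses exponential densities with assumed rates purely as a stand-in for "arbitrary probability distributions"; the stopping criterion `p_star` is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical generative models: exponential densities with different rates.
rates = np.array([1.0, 2.0, 4.0])

def loglik(x):
    """Log-likelihood of one observation under each hypothesis."""
    return np.log(rates) - rates * x

def bayes_sequential(true_rate, p_star=0.99):
    """Accumulate log-likelihoods and stop when one posterior exceeds p_star."""
    log_post = np.full(len(rates), -np.log(len(rates)))  # uniform prior
    while True:
        x = rng.exponential(1.0 / true_rate)
        log_post += loglik(x)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if post.max() > p_star:
            return int(post.argmax())

print(bayes_sequential(2.0))  # index of the winning hypothesis
```

Only `loglik` depends on the assumed evidence distribution; swapping in any other density leaves the accumulation and stopping rule unchanged, which is the sense in which the scheme covers generic perceptual hypotheses.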


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Batel Yifrah ◽  
Ayelet Ramaty ◽  
Genela Morris ◽  
Avi Mendelsohn

Decision making can be shaped both by trial-and-error experiences and by memory of unique contextual information. Moreover, these types of information can be acquired either by means of active experience or by observing others behave in similar situations. The interactions between reinforcement learning parameters that inform decision updating and memory formation of declarative information in experienced and observational learning settings are, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded similar outcomes to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we distinguished participants who learned similarly from experience and observation from those who assigned different weights to learning signals from these two sources. Participants who assigned different weights to their own experience versus that of others displayed enhanced memory performance, as well as subjective memory strength, for episodes involving significant reward prospects. Conversely, memory performance of participants who did not prioritize their own experience over that of others did not seem to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on the means by which individuals weigh relevant information conveyed via experience and observation.


Stat ◽  
2021 ◽  
Author(s):  
Hengrui Cai ◽  
Rui Song ◽  
Wenbin Lu
