Integration of Reinforcement Learning and Optimal Decision-Making Theories of the Basal Ganglia

2011 ◽  
Vol 23 (4) ◽  
pp. 817-851 ◽  
Author(s):  
Rafal Bogacz ◽  
Tobias Larsen

This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing how the basal ganglia learn which actions to select to maximize reward, and decision-making theories proposing that the basal ganglia select actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning with a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
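The actor-critic scheme with bounded weights described above can be sketched in a few lines. Everything below (a two-armed bandit, the reward probabilities, the learning rates, and the weight cap `w_max`) is a hypothetical toy setting for illustration, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: action 0 is rewarded with p = 0.8, action 1 with p = 0.2.
reward_prob = np.array([0.8, 0.2])

w = np.zeros(2)          # actor: cortico-striatal weights (action preferences)
v = 0.0                  # critic: estimated value of the state
alpha, beta = 0.1, 0.1   # actor and critic learning rates (assumed)
w_max = 1.0              # illustrative upper limit on synaptic weights

for _ in range(2000):
    p = np.exp(w) / np.exp(w).sum()            # softmax action selection
    a = rng.choice(2, p=p)
    r = float(rng.random() < reward_prob[a])   # stochastic binary reward
    delta = r - v                              # dopamine-like prediction error
    v += beta * delta                          # critic update
    w[a] = np.clip(w[a] + alpha * delta, 0.0, w_max)  # actor update, weights bounded

print(w, v)
```

The clipping step is the point of contact with the abstract: without a bound, the preferences can grow without limit, whereas bounded weights settle at values shaped by the reward statistics.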

2010 ◽  
Vol 22 (5) ◽  
pp. 1113-1148 ◽  
Author(s):  
Jiaxiang Zhang ◽  
Rafal Bogacz

Experimental data indicate that perceptual decision making involves integration of sensory evidence in certain cortical areas. Theoretical studies have proposed that the computation in neural decision circuits approximates statistically optimal decision procedures (e.g., the sequential probability ratio test) that maximize the reward rate in sequential choice tasks. However, these previous studies assumed that the sensory evidence was represented by continuous values from gaussian distributions with the same variance across alternatives. In this article, we make a more realistic assumption that sensory evidence is represented in spike trains described by Poisson processes, which naturally satisfy the mean-variance relationship observed in sensory neurons. We show that for such a representation, the neural circuits involving cortical integrators and the basal ganglia can approximate the optimal decision procedures for two- and multiple-alternative choice tasks.
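A minimal sketch of sequential testing on Poisson spike counts, under assumed firing rates and an illustrative threshold: for Poisson observations the log-likelihood ratio is linear in the spike count, so evidence accumulation reduces to counting spikes with a constant offset per time bin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical firing rates (Hz) of a sensory neuron under the two hypotheses.
r_high, r_low = 40.0, 20.0
dt = 0.01      # time bin (s)
theta = 3.0    # illustrative decision threshold on the log-likelihood ratio

def sprt_poisson(true_rate):
    """Accumulate the log-likelihood ratio from Poisson spike counts until threshold."""
    llr = 0.0
    while abs(llr) < theta:
        k = rng.poisson(true_rate * dt)  # spike count in this bin
        # log P(k | r_high) - log P(k | r_low) for a Poisson count k
        llr += k * np.log(r_high / r_low) - (r_high - r_low) * dt
    return llr > 0                       # True -> choose the "high rate" hypothesis

choices = [sprt_poisson(r_high) for _ in range(200)]
print(np.mean(choices))  # fraction of correct "high" decisions
```

With symmetric thresholds at plus or minus theta, the error probability of the test is roughly e to the minus theta, which is why most of the 200 simulated decisions come out correct.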


2007 ◽  
Vol 19 (2) ◽  
pp. 442-477 ◽  
Author(s):  
Rafal Bogacz ◽  
Kevin Gurney

Neurophysiological studies have identified a number of brain regions critically involved in solving the problem of action selection or decision making. In the case of highly practiced tasks, these regions include cortical areas hypothesized to integrate evidence supporting alternative actions and the basal ganglia, hypothesized to act as a central switch in gating behavioral requests. However, despite our relatively detailed knowledge of basal ganglia biology and its connectivity with the cortex, and despite numerical simulation studies demonstrating selective function, no formal theoretical framework exists that supplies an algorithmic description of these circuits. This article shows how many aspects of the anatomy and physiology of the circuit involving the cortex and basal ganglia are exactly those required to implement the computation defined by an asymptotically optimal statistical test for decision making: the multihypothesis sequential probability ratio test (MSPRT). The resulting model of the basal ganglia provides a new framework for understanding the computation in the basal ganglia during decision making in highly practiced tasks. The predictions of the theory concerning the properties of particular neuronal populations are validated in existing experimental data. Further, we show that this neurobiologically grounded implementation of MSPRT outperforms other candidates for neural decision making, that it is structurally and parametrically robust, and that it can accommodate cortical mechanisms for decision making in a way that complements those in basal ganglia.
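The MSPRT computation can be sketched as a race between noisy integrators normalized by a log-sum-exp term, the operation this theory ascribes to the basal ganglia. The drift rates, noise level, and posterior stopping criterion `p_star` below are illustrative assumptions, not parameters from the article.

```python
import numpy as np

rng = np.random.default_rng(2)

def msprt(drifts, p_star=0.95, dt=0.1, sigma=1.0):
    """Sketch of the MSPRT: noisy cortical integrators accumulate evidence, and
    the softmax of their states (equivalently, -y_i + log-sum-exp(y) in log
    space) gives a posterior over alternatives; stop when one exceeds p_star."""
    y = np.zeros(len(drifts))  # cortical evidence integrators
    while True:
        y += np.asarray(drifts) * dt + sigma * np.sqrt(dt) * rng.standard_normal(len(y))
        post = np.exp(y - y.max())
        post /= post.sum()     # softmax normalization over alternatives
        if post.max() > p_star:
            return int(post.argmax())

# Alternative 0 has the largest drift, so it should usually win.
print(msprt([1.0, 0.0, 0.0]))
```

The stopping rule on the normalized quantity, rather than on any single integrator, is what makes the test sequential and multihypothesis at once: each channel's decision variable depends on all the others through the log-sum-exp term.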


2017 ◽  
Author(s):  
Julie J. Lee ◽  
Mehdi Keramati

Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective, goal-directed model-based (MB) strategy and a fast, retrospective, habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. We therefore developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes to determine whether the volatility of contingency changes would shift subjects between MB and MF strategies. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning. Furthermore, subjects did not remain at one level of MB/MF behavior but shifted toward more MB behavior over the first two blocks, a shift attributable not to the rate of contingency changes but to the extent of training. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects using a hybrid strategy that shifts toward more MB behavior over blocks and consequently earns a higher payoff.

Author Summary: To make good decisions, we must learn to associate actions with their true outcomes. Flexibility to changes in action-outcome relationships is therefore essential for optimal decision making. For example, actions can lead to outcomes that change in value: one day, your favorite food is poorly made and thus less pleasant. Alternatively, changes can occur in contingencies: ordering a dish of one kind and instead receiving another. How we respond to such changes is indicative of our decision-making strategy; habitual learners will continue to choose their favorite food even if its quality has gone down, whereas goal-directed learners will soon learn that it is better to choose another dish. A popular paradigm probes the effect of value changes on decision making, but the effect of contingency changes remains largely unexplored, so we developed a novel task to study it. We find that humans used a mixed habitual/goal-directed strategy in which they became more goal-directed over the course of the task and earned more rewards as goal-directed behavior increased. This shows that flexibility to contingency changes is adaptive for learning from rewards and indicates that it can reveal which decision-making strategy is in use.
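The hybrid controller described above is commonly written as a weighted mix of model-based and model-free values. The sketch below assumes that form with toy sizes, a shared learning rate, and a fixed mixing weight; none of these are the study's fitted parameters.

```python
import numpy as np

n_actions, n_states = 2, 2
T = np.full((n_actions, n_states), 0.5)  # learned transition model P(state | action)
q_mf = np.zeros(n_actions)               # cached model-free action values
v_state = np.zeros(n_states)             # values of the second-level states
alpha, w_mb = 0.3, 0.5                   # learning rate and MB weight (assumed)

def choose():
    q_mb = T @ v_state                       # MB values: expectation under the model
    q = w_mb * q_mb + (1 - w_mb) * q_mf      # hybrid valuation
    return int(np.argmax(q))

def update(a, s, r):
    T[a] += alpha * (np.eye(n_states)[s] - T[a])  # move model toward observed transition
    q_mf[a] += alpha * (r - q_mf[a])              # MF update from reward prediction error
    v_state[s] += alpha * (r - v_state[s])        # state-value update
```

After a contingency change, a single observed transition moves `T` and hence the MB values, while `q_mf` adapts only through repeated reward feedback; that asymmetry is what lets a contingency-change task dissociate the two controllers.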


2012 ◽  
Vol 24 (11) ◽  
pp. 2924-2945 ◽  
Author(s):  
Nathan F. Lepora ◽  
Kevin N. Gurney

The basal ganglia are a subcortical group of interconnected nuclei involved in mediating action selection within cortex. A recent proposal is that this selection leads to optimal decision making over multiple alternatives because the basal ganglia anatomy maps onto a network implementation of an optimal statistical method for hypothesis testing, assuming that cortical activity encodes evidence for constrained gaussian-distributed alternatives. This letter demonstrates that this model of the basal ganglia extends naturally to encompass general Bayesian sequential analysis over arbitrary probability distributions, which raises the proposal to a practically realizable theory over generic perceptual hypotheses. We also show that the evidence in this model can represent either log likelihoods, log-likelihood ratios, or log odds, all leading proposals for the cortical processing of sensory data. For these reasons, we claim that the basal ganglia optimize decision making over general perceptual hypotheses represented in cortex. The relation of this theory to cortical encoding, cortico-basal ganglia anatomy, and reinforcement learning is discussed.
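The generalization to arbitrary distributions amounts to accumulating log-likelihoods (or log odds) of each observation under every hypothesis. The sketch below uses exponential densities with assumed rates purely as a stand-in for "arbitrary probability distributions"; the stopping criterion `p_star` is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical generative models: exponential densities with different rates.
rates = np.array([1.0, 2.0, 4.0])

def loglik(x):
    """Log-likelihood of one observation under each hypothesis."""
    return np.log(rates) - rates * x

def bayes_sequential(true_rate, p_star=0.99):
    """Accumulate log-likelihoods and stop when one posterior exceeds p_star."""
    log_post = np.full(len(rates), -np.log(len(rates)))  # uniform prior
    while True:
        x = rng.exponential(1.0 / true_rate)
        log_post += loglik(x)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if post.max() > p_star:
            return int(post.argmax())

print(bayes_sequential(2.0))  # index of the winning hypothesis
```

Only `loglik` depends on the assumed evidence distribution; swapping in any other density leaves the accumulation and stopping rule unchanged, which is the sense in which the scheme covers generic perceptual hypotheses.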


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Batel Yifrah ◽  
Ayelet Ramaty ◽  
Genela Morris ◽  
Avi Mendelsohn

Decision making can be shaped both by trial-and-error experiences and by memory of unique contextual information. Moreover, these types of information can be acquired either by means of active experience or by observing others behave in similar situations. The interactions between reinforcement learning parameters that inform decision updating and memory formation of declarative information in experienced and observational learning settings are, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded similar outcomes to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we distinguished participants who learned similarly from experience and observation from those who assigned different weights to learning signals from these two sources. Participants who assigned different weights to their own experience versus that of others displayed enhanced memory performance, as well as subjective memory strength, for episodes involving significant reward prospects. Conversely, memory performance of participants who did not prioritize their own experience over that of others did not seem to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on the means by which individuals weigh relevant information conveyed via experience and observation.


Stat ◽  
2021 ◽  
Author(s):  
Hengrui Cai ◽  
Rui Song ◽  
Wenbin Lu
