Believer-Skeptic meets Actor-Critic: Rethinking the role of basal ganglia pathways in decision-making and reinforcement learning

2016 ◽  
Author(s):  
Kyle Dunovan ◽  
Timothy Verstynen

Abstract: The flexibility of behavioral control is a testament to the brain’s capacity for dynamically resolving uncertainty during goal-directed actions. This ability to select actions and learn from immediate feedback is driven by the dynamics of basal ganglia (BG) pathways. A growing body of empirical evidence conflicts with the traditional view that these pathways act as independent levers for facilitating (i.e., direct pathway) or suppressing (i.e., indirect pathway) motor output, suggesting instead that they engage in a dynamic competition during action decisions that computationally captures action uncertainty. Here we discuss the utility of encoding action uncertainty as a dynamic competition between opposing control pathways and provide evidence that this simple mechanism may have powerful implications for bridging neurocomputational theories of decision making and reinforcement learning.

2018 ◽  
Author(s):  
Kyle Dunovan ◽  
Catalina Vich ◽  
Matthew Clapp ◽  
Timothy Verstynen ◽  
Jonathan Rubin

Abstract: Cortico-basal-ganglia-thalamic (CBGT) networks are critical for adaptive decision-making, yet how changes to circuit-level properties impact cognitive algorithms remains unclear. Here we explore how dopaminergic plasticity at corticostriatal synapses alters competition between striatal pathways, impacting the evidence accumulation process during decision-making. Spike-timing dependent plasticity simulations showed that dopaminergic feedback based on rewards modified the ratio of direct and indirect corticostriatal weights within opposing action channels. Using the learned weight ratios in a full spiking CBGT network model, we simulated neural dynamics and decision outcomes in a reward-driven decision task and fit them with a drift diffusion model. Fits revealed that the rate of evidence accumulation varied with inter-channel differences in direct pathway activity while boundary height varied with overall indirect pathway activity. This multi-level modeling approach demonstrates how complementary learning and decision computations can emerge from corticostriatal plasticity.

Author summary: Cognitive process models such as reinforcement learning (RL) and the drift diffusion model (DDM) have helped to elucidate the basic algorithms underlying error-corrective learning and the evaluation of accumulating decision evidence leading up to a choice. While these relatively abstract models help to guide experimental and theoretical probes into associated phenomena, they remain uninformative about the actual physical mechanics by which learning and decision algorithms are carried out in a neurobiological substrate during adaptive choice behavior. Here we present an “upwards mapping” approach to bridging neural and cognitive models of value-based decision-making, showing how dopaminergic feedback alters the network-level dynamics of cortico-basal-ganglia-thalamic (CBGT) pathways during learning to bias behavioral choice towards more rewarding actions. By mapping “up” the levels of analysis, this approach yields specific predictions about aspects of neuronal activity that map to the quantities appearing in the cognitive decision-making framework.
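
For readers unfamiliar with the drift diffusion model mentioned in this abstract, the following is a minimal generic sketch of how its two key parameters, drift rate and boundary height, shape choices and decision times. It is not the authors' spiking network or their fitting procedure; the function names, parameter values, and the Euler-Maruyama discretization are illustrative assumptions.

```python
import random


def simulate_ddm(drift, boundary, noise=1.0, dt=0.002, max_time=5.0, rng=None):
    """Simulate one trial of a two-boundary drift diffusion process.

    Evidence starts at 0 and is integrated until it crosses +boundary
    (choice 1) or -boundary (choice 0), or until max_time elapses.
    Returns (choice, decision_time); choice is None on timeout.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5  # noise scaling for an Euler-Maruyama step
    while t < max_time:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
        if x >= boundary:
            return 1, t
        if x <= -boundary:
            return 0, t
    return None, t


def summarize(drift, boundary, n_trials=300, seed=0):
    """Return (fraction of upper-boundary choices, mean decision time).

    Trials that time out are excluded from both statistics.
    """
    rng = random.Random(seed)
    trials = [simulate_ddm(drift, boundary, rng=rng) for _ in range(n_trials)]
    finished = [(c, t) for c, t in trials if c is not None]
    p_upper = sum(c for c, _ in finished) / len(finished)
    mean_rt = sum(t for _, t in finished) / len(finished)
    return p_upper, mean_rt


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the qualitative trends.
    for drift, boundary in [(0.5, 1.0), (1.5, 1.0), (1.0, 1.0), (1.0, 1.5)]:
        p, rt = summarize(drift, boundary)
        print(f"drift={drift} boundary={boundary} p_upper={p:.2f} mean_rt={rt:.2f}s")
```

In this toy setting, raising the drift rate makes the favored choice more likely, while raising the boundary slows decisions. The abstract reports that these two quantities tracked direct-pathway differences and overall indirect-pathway activity, respectively; the sketch does not model that mapping.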


2015 ◽  
Author(s):  
Thiago S. Gouvêa ◽  
Tiago Monteiro ◽  
Asma Motiwala ◽  
Sofia Soares ◽  
Christian K. Machens ◽  
...  

The striatum is an input structure of the basal ganglia implicated in several time-dependent functions including reinforcement learning, decision making, and interval timing. To determine whether striatal ensembles drive subjects' judgments of duration, we manipulated and recorded from striatal neurons in rats performing a duration categorization psychophysical task. We found that the dynamics of striatal neurons predicted duration judgments, and that simultaneously recorded ensembles could judge duration as well as the animal. Furthermore, striatal neurons were necessary for duration judgments, as muscimol infusions produced a specific impairment in animals' duration sensitivity. Lastly, we show that time as encoded by striatal populations ran faster or slower when rats judged a duration as longer or shorter, respectively. These results demonstrate that the speed with which striatal population state changes supports the fundamental ability of animals to judge the passage of time.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Thiago S Gouvêa ◽  
Tiago Monteiro ◽  
Asma Motiwala ◽  
Sofia Soares ◽  
Christian Machens ◽  
...  



2015 ◽  
Vol 112 (45) ◽  
pp. 13817-13822 ◽  
Author(s):  
Fiery Cushman ◽  
Adam Morris

Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task.


2021 ◽  
Author(s):  
Catalina Vich ◽  
Matthew Clapp ◽  
Timothy Verstynen ◽  
Jonathan Rubin

During action selection, mammals exhibit a high degree of flexibility in adapting their decisions in response to environmental changes. Although the cortico-basal ganglia-thalamic (CBGT) network is implicated in this adaptation, it features a synaptic architecture comprising multiple feed-forward, reciprocal, and feedback pathways, complicating efforts to elucidate the roles of specific CBGT populations in the process of evidence accumulation during decision-making. In this paper we apply a strategic sampling approach, based on Latin hypercube sampling, to explore how CBGT network properties, including subpopulation firing rates and synaptic weights, map to parameters of a normative drift diffusion model (DDM) representing algorithmic aspects of information accumulation during decision-making. Through the application of canonical correlation analysis, we find that this relationship can be characterized in terms of three low-dimensional control ensembles impacting specific qualities of the emergent decision policy: responsiveness (associated with overall activity in corticothalamic and direct pathways), pliancy (associated largely with overall activity in components of the indirect pathway of the basal ganglia), and choice (associated with differences in direct and indirect pathways across action channels). These analyses provide key mechanistic predictions about the roles of specific CBGT network elements in shifting different aspects of decision policies.
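
As an illustration of the sampling strategy named in this abstract, here is a minimal standard-library sketch of Latin hypercube sampling. The parameter names and ranges in the usage example are hypothetical placeholders, not values from the paper, and the canonical correlation step is not shown.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points from the box described by bounds.

    Each dimension's range is split into n_samples equal strata, and every
    stratum is used exactly once per dimension.
    bounds is a list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum, then shuffle the stratum order
        # so that dimensions are paired at random.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose per-dimension columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical ranges: a firing rate in Hz and a unitless synaptic weight.
    bounds = [(1.0, 30.0), (0.0, 1.0)]
    for rate, weight in latin_hypercube(5, bounds, seed=42):
        print(f"rate={rate:5.2f} Hz  weight={weight:.3f}")
```

Compared with independent uniform draws, this stratification guarantees that each parameter's range is covered evenly even when the number of samples is small, which is the usual reason for choosing it when each sample requires an expensive simulation.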


2011 ◽  
Vol 23 (4) ◽  
pp. 817-851 ◽  
Author(s):  
Rafal Bogacz ◽  
Tobias Larsen

This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
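
To make the idea of an actor-critic learner with limits on its weights concrete, the following is a generic single-state sketch. It is not the model from this article: the softmax policy, the learning rates, the clipping range, and the reward probabilities are all illustrative assumptions.

```python
import math
import random


def run_actor_critic(reward_probs, n_trials=2000, alpha_actor=0.1,
                     alpha_critic=0.1, w_min=0.0, w_max=3.0, seed=0):
    """Train a single-state actor-critic on a Bernoulli bandit.

    The critic tracks the expected reward; the actor keeps one weight per
    action, updated by the critic's prediction error and clipped to
    [w_min, w_max] as a stand-in for bounded synaptic weights.
    Returns (actor_weights, critic_value).
    """
    rng = random.Random(seed)
    weights = [0.0] * len(reward_probs)  # actor weights, one per action
    value = 0.0                          # critic's reward estimate
    for _ in range(n_trials):
        # Softmax policy over the actor weights.
        exps = [math.exp(w) for w in weights]
        total = sum(exps)
        probs = [e / total for e in exps]
        action = rng.choices(range(len(weights)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        delta = reward - value           # reward prediction error
        value += alpha_critic * delta
        updated = weights[action] + alpha_actor * delta
        weights[action] = min(w_max, max(w_min, updated))  # enforce bounds
    return weights, value


if __name__ == "__main__":
    # Hypothetical task: action 0 pays off 80% of the time, action 1 20%.
    weights, value = run_actor_critic([0.8, 0.2])
    print("actor weights:", [round(w, 2) for w in weights])
    print("critic value:", round(value, 2))
```

Without the clipping step the actor weights in this kind of scheme can grow without limit; bounding them keeps the learned preferences in a fixed range, which is the general property the abstract's convergence claim depends on.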

