Biological Reinforcement Learning via Predictive Spacetime Encoding

2020
Author(s): Minsu Abel Yang, Jee Hang Lee, Sang Wan Lee

Abstract: Recent advances in reinforcement learning (RL) have successfully addressed several challenges associated with the use of this technology, such as performance, scalability, and sample efficiency. Although RL algorithms bear relevance to psychology and neuroscience in a broader context, they lack biological plausibility. Motivated by recent neural findings demonstrating the capacity of the hippocampus and prefrontal cortex to gather spatial and temporal information from the environment, this study presents a novel RL model, called spacetime Q-Network (STQN), that exploits predictive spatiotemporal encoding to learn reliably in highly uncertain environments. The proposed method consists of two primary components. The first component, the successor representation with theta phase precession, implements hippocampal spacetime encoding and acts as a rollout prediction. The second component, called Q switch ensemble, implements prefrontal population coding for reliable reward prediction. We also implement a single learning rule that accommodates both hippocampal-prefrontal replay and synaptic homeostasis, which subserves confidence-based metacognitive learning. To demonstrate the capacity of our model, we designed a set of tasks simulating various levels of environmental uncertainty and complexity. Results show that our model significantly outperforms a few state-of-the-art RL models. In a subsequent ablation study, we showed the unique contribution of each component to resolving task uncertainty and complexity. Our study has two important implications. First, it provides theoretical groundwork for closely linking the unique characteristics of distinct brain regions in the context of RL. Second, our implementation takes a simple matrix form that accommodates expansion into biologically plausible, highly scalable, and generalizable neural architectures.
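The abstract builds STQN on the successor representation (SR) but does not give its update rules. As a hedged point of reference only, the sketch below shows the standard tabular SR learned by TD(0), where M[s][j] estimates the discounted future occupancy of state j starting from state s. It is not the authors' model: it has no theta phase precession, no Q switch ensemble, and no replay or homeostasis rule. The toy chain task, the function names, and the parameter values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sr_td_update(M: Matrix, s: int, s_next: int, alpha: float,
                 gamma: float, terminal: bool = False) -> None:
    """One TD(0) update of row s of a tabular successor representation.

    Target for row s: indicator(s) + gamma * M[s_next], with no bootstrap
    term at a terminal transition.
    """
    n = len(M)
    for j in range(n):
        indicator = 1.0 if j == s else 0.0
        future = 0.0 if terminal else M[s_next][j]
        target = indicator + gamma * future
        M[s][j] += alpha * (target - M[s][j])


def learn_chain_sr(n_states: int = 5, gamma: float = 0.9,
                   alpha: float = 0.1, episodes: int = 2000) -> Matrix:
    """Learn the SR of a hypothetical deterministic left-to-right chain."""
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(episodes):
        for s in range(n_states):
            terminal = s == n_states - 1          # episode ends after the last state
            s_next = s if terminal else s + 1     # always step one state to the right
            sr_td_update(M, s, s_next, alpha, gamma, terminal)
    return M


def values_from_sr(M: Matrix, w: List[float]) -> List[float]:
    """State values as V[s] = sum_j M[s][j] * w[j] (w = per-state reward)."""
    return [sum(m * r for m, r in zip(row, w)) for row in M]


M = learn_chain_sr()
V = values_from_sr(M, [0.0, 0.0, 0.0, 0.0, 1.0])  # hypothetical reward on the last state
print([round(v, 4) for v in V])
```

Because the chain is deterministic, the learned entries approach gamma**(j - i) for j >= i (and stay zero for j < i), so with reward only on the final state the values approach 0.9**(4 - s). Separating occupancy (M) from reward (w) is what makes the SR a cheap stand-in for a rollout: changing w revalues every state without relearning M.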

2016
Author(s): David Tingley, Andrew A. Alexander, Laleh K. Quinn, Andrea A. Chiba, Douglas Nitz

Abstract: Complex behaviors demand temporal coordination among functionally distinct brain regions. The afferent and efferent structure of the basal forebrain suggests a capacity for mediating such coordination. During performance of a selective attention task, synaptic activity in this region was dominated by four amplitude-independent oscillations temporally organized by the phase of the slowest, a theta rhythm. Further, oscillatory amplitudes were precisely organized by task epoch, and a robust input/output transform, from synchronous synaptic activity to the spiking rates of basal forebrain neurons, was identified. For many neurons, spiking was temporally organized as phase-precessing sequences against theta-band field potential oscillations. Remarkably, theta phase precession advanced in parallel with task progression rather than with absolute spatial location or time. Together, the findings reveal a process by which associative brain regions can integrate independent oscillatory inputs and transform them into sequence-specific, rate-coded outputs that adapt to the pace at which organisms interact with their environment.


2020
Author(s): Dongjae Kim, Jaeseung Jeong, Sang Wan Lee

Abstract: The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which constrains the performance of each strategy. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
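The abstract states that a prediction error baseline signals context changes but does not say how that baseline is computed. As a hedged illustration only, the sketch below assumes the baseline is an exponentially weighted running mean of the prediction errors of a simple delta-rule (model-free) learner. It is not the authors' model: there is no model-based component and no arbitration between strategies. The function name, learning rates, and two-context reward schedule are hypothetical.

```python
from typing import List


def track_pe_baseline(rewards: List[float], alpha: float = 0.1,
                      eta: float = 0.05) -> List[float]:
    """Delta-rule value learner that also tracks a prediction-error (PE) baseline.

    Assumption: the baseline is an exponentially weighted running mean of
    recent PEs. Returns the baseline after every trial.
    """
    value = 0.0      # current reward prediction
    baseline = 0.0   # running mean of recent prediction errors
    history = []
    for r in rewards:
        pe = r - value                     # reward prediction error
        value += alpha * pe                # model-free delta-rule update
        baseline += eta * (pe - baseline)  # slower average of the PE itself
        history.append(baseline)
    return history


# Hypothetical two-context task: reward 1.0 for 200 trials, then 3.0.
rewards = [1.0] * 200 + [3.0] * 200
history = track_pe_baseline(rewards)
print(f"baseline before switch: {history[199]:.4f}")
print(f"peak baseline after switch: {max(history[200:]):.3f}")
```

In a stable context the prediction errors shrink toward zero, so the baseline stays near zero; after the hypothetical switch the errors are persistently positive for a while, and the baseline deflects before decaying as the value estimate catches up. Keeping `eta` smaller than `alpha` is the design choice that lets the baseline summarize recent errors rather than follow each one.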

