Biological Reinforcement Learning via Predictive Spacetime Encoding

Abstract: Recent advances in reinforcement learning (RL) have addressed several challenges associated with the technology, such as performance, scalability, and sample efficiency. Although RL algorithms bear relevance to psychology and neuroscience in a broader context, they lack biological plausibility. Motivated by recent neural findings demonstrating the capacity of the hippocampus and prefrontal cortex to gather space and time information from the environment, this study presents a novel RL model, called spacetime Q-Network (STQN), that exploits predictive spatiotemporal encoding to reliably learn highly uncertain environments. The proposed method consists of two primary components. The first component, the successor representation with theta phase precession, implements hippocampal spacetime encoding and acts as a rollout prediction. The second component, called the Q switch ensemble, implements prefrontal population coding for reliable reward prediction. We also implement a single learning rule to accommodate both hippocampal-prefrontal replay and synaptic homeostasis, which subserves confidence-based metacognitive learning. To demonstrate the capacity of our model, we design a task array simulating various levels of environmental uncertainty and complexity. Results show that our model significantly outperforms a few state-of-the-art RL models. In the subsequent ablation study, we show the unique contribution of each component to resolving task uncertainty and complexity. Our study has two important implications. First, it provides the theoretical groundwork for closely linking unique characteristics of distinct brain regions in the context of RL. Second, our implementation is performed in a simple matrix form that accommodates expansion into biologically plausible, highly scalable, and generalizable neural architectures.
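For readers unfamiliar with the successor representation (SR) named in the abstract, the following is a minimal sketch of the standard tabular SR only. It is not the STQN model: it includes no theta phase precession, no Q switch ensemble, and no replay rule. The 5-state ring environment, the function names, and the values of `gamma` and `alpha` are all assumptions chosen for illustration. The sketch shows how the SR matrix summarizes expected discounted future state occupancy, which is one sense in which it can serve as a rollout prediction.

```python
import numpy as np

def successor_representation(P, gamma):
    """Closed-form SR for a fixed policy: M = (I - gamma * P)^-1."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def td_update_sr(M, s, s_next, gamma, alpha):
    """One TD(0) update of row s of the SR after observing s -> s_next."""
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    target = onehot + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    return M

# Hypothetical environment (an assumption, not from the paper):
# a 5-state ring in which state i always moves to state i + 1 (mod 5).
n_states, gamma, alpha = 5, 0.9, 0.1
P = np.roll(np.eye(n_states), 1, axis=1)

M_exact = successor_representation(P, gamma)

# Learn the same matrix from experience by walking around the ring.
M_td = np.zeros((n_states, n_states))
s = 0
for _ in range(20000):
    s_next = (s + 1) % n_states
    M_td = td_update_sr(M_td, s, s_next, gamma, alpha)
    s = s_next

# With reward only in the last state, state values follow as V = M r.
r = np.zeros(n_states)
r[-1] = 1.0
V = M_exact @ r
```

Because values are a linear readout of the SR (`V = M @ r`), changing the reward vector does not require relearning `M`, which is the usual argument for the SR as a compromise between model-free and model-based prediction.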


Transformation of Independent Oscillatory Inputs into Temporally Precise Rate Codes

10.1101/054163 ◽

2016 ◽

Author(s):

David Tingley ◽

Andrew A. Alexander ◽

Laleh K. Quinn ◽

Andrea A. Chiba ◽

Douglas Nitz

Keyword(s):

Field Potential ◽

Spatial Location ◽

Brain Regions ◽

Synaptic Activity ◽

Specific Rate ◽

Attention Task ◽

Potential Oscillations ◽

Phase Precession ◽

Theta Phase ◽

Theta Phase Precession

Abstract: Complex behaviors demand temporal coordination among functionally distinct brain regions. The basal forebrain’s afferent and efferent structure suggests a capacity for mediating such coordination. During performance of a selective attention task, synaptic activity in this region was dominated by four amplitude-independent oscillations temporally organized by the phase of the slowest, a theta rhythm. Further, oscillatory amplitudes were precisely organized by task epoch and a robust input/output transform, from synchronous synaptic activity to spiking rates of basal forebrain neurons, was identified. For many neurons, spiking was temporally organized as phase precessing sequences against theta band field potential oscillations. Remarkably, theta phase precession advanced in parallel to task progression, rather than absolute spatial location or time. Together, the findings reveal a process by which associative brain regions can integrate independent oscillatory inputs and transform them into sequence-specific, rate-coded outputs that are adaptive to the pace with which organisms interact with their environment.


A learning rule for place fields in a cortical model: Theta phase precession as a network effect

Hippocampus ◽

10.1002/hipo.20124 ◽

2005 ◽

Vol 15 (7) ◽

pp. 979-989 ◽

Cited By ~ 17

Author(s):

Silvia Scarpetta ◽

Maria Marinaro

Keyword(s):

Learning Rule ◽

Network Effect ◽

Cortical Model ◽

Phase Precession ◽

Place Fields ◽

Theta Phase ◽

Theta Phase Precession


Input-dependent learning rule for the memory of spatiotemporal sequences in hippocampal network with theta phase precession

Biological Cybernetics ◽

10.1007/s00422-003-0454-2 ◽

2004 ◽

Vol 90 (2) ◽

pp. 113-124 ◽

Cited By ~ 17

Author(s):

Zhihua Wu ◽

Yoko Yamaguchi

Keyword(s):

Learning Rule ◽

Phase Precession ◽

Hippocampal Network ◽

Theta Phase ◽

Dependent Learning ◽

Theta Phase Precession


Input-dependent learning rule for the memory of spatiotemporal sequences in hippocampal network with theta phase precession

Biological Cybernetics ◽

10.1007/s00422-004-0480-8 ◽

2004 ◽

Vol 90 (4) ◽

pp. 310-310

Author(s):

Zhihua Wu ◽

Yoko Yamaguchi

Keyword(s):

Learning Rule ◽

Phase Precession ◽

Hippocampal Network ◽

Theta Phase ◽

Dependent Learning ◽

Theta Phase Precession


Faculty Opinions recommendation of Bimodality of theta phase precession in hippocampal place cells in freely running rats.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1007844.98757 ◽

2002 ◽

Author(s):

Michael E Hasselmo

Keyword(s):

Place Cells ◽

Phase Precession ◽

Theta Phase ◽

Theta Phase Precession


Faculty Opinions recommendation of Spatial selectivity and theta phase precession in CA1 interneurons.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1060763.512688 ◽

2007 ◽

Author(s):

Edvard I Moser

Keyword(s):

Spatial Selectivity ◽

Phase Precession ◽

Theta Phase ◽

Theta Phase Precession


Defining latent place fields based on theta phase precession in the subiculum

IBRO Reports ◽

10.1016/j.ibror.2019.07.204 ◽

2019 ◽

Vol 6 ◽

pp. S62

Author(s):

Su-Min Lee ◽

Hyun-Woo Lee ◽

Inah Lee

Keyword(s):

Phase Precession ◽

Place Fields ◽

Theta Phase ◽

Theta Phase Precession


Prefrontal solution to the bias-variance tradeoff during reinforcement learning

10.1101/2020.12.23.424258 ◽

2020 ◽

Author(s):

Dongjae Kim ◽

Jaeseung Jeong ◽

Sang Wan Lee

Keyword(s):

Adaptive Control ◽

Reinforcement Learning ◽

Prediction Error ◽

Brain Regions ◽

Decision Task ◽

Prediction Errors ◽

Model Based ◽

Model Free ◽

Bias Variance ◽

The Brain

Abstract: The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy’s performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction for prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we showed behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
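As background for the terms used in this abstract, the textbook bias-variance decomposition of squared prediction error is reproduced below. It is the standard statistical identity, not the paper's own derivation, and the symbols are assumptions introduced here: a target $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and a learned predictor $\hat{f}$.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The $\sigma^2$ term does not depend on the predictor, so it sets a floor on the expected error in this standard setting; it corresponds to the "irreducible noise" the abstract refers to.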
