Entorhinal and ventromedial prefrontal cortices abstract and generalise the structure of reinforcement learning problems

2019
Author(s): Alon B Baram, Timothy H Muller, Hamed Nili, Mona Garvert, Timothy E J Behrens

Abstract: Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalise it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by grid cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their structure across different environments. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations play a much broader role in generalising the structure of problems. We introduce a task-remapping paradigm, in which subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC, representations of standard RL signals such as prediction error also vary as a function of task structure.
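As a rough illustration of the logic behind such a task-remapping comparison (a hypothetical sketch, not the authors' analysis pipeline), one can ask whether condition-wise activity patterns measured in two tasks correlate more strongly when the tasks share underlying structure than when they do not. All data below are made up for illustration.

```python
# Hypothetical sketch: correlate matched condition patterns across tasks.
# Higher similarity is expected when two tasks share structure, lower when they differ.
import numpy as np

def pattern_similarity(patterns_a, patterns_b):
    """Mean correlation between matched condition patterns of two tasks."""
    sims = [np.corrcoef(a, b)[0, 1] for a, b in zip(patterns_a, patterns_b)]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
task_a = rng.normal(size=(4, 50))                            # 4 conditions x 50 voxels
task_b_same = task_a + 0.5 * rng.normal(size=task_a.shape)   # same structure, new sensory surface
task_c_diff = rng.normal(size=task_a.shape)                  # different structure

print(pattern_similarity(task_a, task_b_same))   # relatively high
print(pattern_similarity(task_a, task_c_diff))   # near zero
```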

Neuron
2020
Author(s): Alon Boaz Baram, Timothy Howard Muller, Hamed Nili, Mona Maria Garvert, Timothy Edward John Behrens

Author(s): Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and then introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised learning and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback versus anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions at the level of robot learning and control with insights from biology.
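To make the mapping concrete, the toy sketch below (an assumed illustration, not taken from the chapter) reduces each regime to the signal that drives its update: a target output for supervised learning, a scalar reward for reinforcement learning, and the input statistics alone for unsupervised learning.

```python
# Assumed toy illustration of the three learning regimes discussed above.
import numpy as np

rng = np.random.default_rng(0)
lr = 0.1
x = np.array([1.0, 0.5, -0.2])                 # a made-up input "stimulus"

# Supervised learning (cf. classical conditioning): a teacher provides the target.
w = np.zeros(3)
target = 1.0
w += lr * (target - w @ x) * x                 # delta rule

# Reinforcement learning (cf. operant conditioning): only a scalar reward follows the action.
q = np.zeros(2)                                # value estimates for two possible actions
action = int(rng.integers(2))                  # explore
reward = 1.0 if action == 0 else 0.0           # hypothetical payoff
q[action] += lr * (reward - q[action])         # learn from reward alone

# Unsupervised learning (cf. perceptual learning): no feedback; input statistics drive change.
v = rng.normal(scale=0.1, size=3)
v += lr * (v @ x) * x                          # Hebbian-style update
```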


2020
Author(s): Dongjae Kim, Jaeseung Jeong, Sang Wan Lee

Abstract: The goal of learning is to maximize future rewards by minimizing prediction errors. Evidence has shown that the brain achieves this by combining model-based and model-free learning. However, prediction error minimization is challenged by a bias-variance tradeoff, which imposes constraints on each strategy's performance. We provide new theoretical insight into how this tradeoff can be resolved through the adaptive control of model-based and model-free learning. The theory predicts that baseline correction of the prediction error reduces the lower bound of the bias-variance error by factoring out irreducible noise. Using a Markov decision task with context changes, we show behavioral evidence of adaptive control. Model-based behavioral analyses show that the prediction error baseline signals context changes to improve adaptability. Critically, the neural results support this view, demonstrating multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free learning.

One-sentence summary: A theoretical, behavioral, computational, and neural account of how the brain resolves the bias-variance tradeoff during reinforcement learning is described.
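The sketch below shows one way the baseline-correction idea could look in a simple temporal-difference learner; the specific form (a running average of recent prediction errors subtracted before the value update) is an assumption for illustration, not the authors' exact model.

```python
# Assumed sketch: subtract a running prediction-error baseline before updating values,
# so a persistent offset (e.g. after a context change) is factored out of learning.
import numpy as np

gamma, alpha, beta = 0.9, 0.1, 0.05
V = np.zeros(5)              # state values for a toy 5-state task
baseline = 0.0               # running mean of recent prediction errors

def td_update(s, r, s_next):
    """One baseline-corrected temporal-difference update."""
    global baseline
    delta = r + gamma * V[s_next] - V[s]       # standard TD prediction error
    baseline += beta * (delta - baseline)      # track the prediction-error baseline
    V[s] += alpha * (delta - baseline)         # learn from the corrected error
    return delta - baseline

# Example transition: state 0 -> state 1 with reward 1.
corrected_error = td_update(0, 1.0, 1)
```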


Author(s): Carlos Diuk, Michael Littman

Reinforcement learning (RL) deals with the problem of an agent that must learn how to behave, through its interactions with an environment, so as to maximize its utility (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDPs), which consist of a finite set of states and a finite set of actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It then observes the new state this action leads to and receives a reward signal. The goal of the agent is to maximize its long-term reward (see the sketch below). In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can enable such generalization and has long been recognized as an important aspect of representing sequential decision tasks (Boutilier et al., 1999).

Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure. Two main ideas come into play in hierarchical RL. The first is to break a task into a hierarchy of smaller subtasks, each of which can be learned faster and more easily than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second idea is to use state abstraction within subtasks: not every subtask needs to be concerned with every aspect of the state space, so some states can be abstracted away and treated as the same for the purposes of the given subtask.
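As a concrete instance of the flat formalization described above (the environment and parameters are made up for illustration), the sketch below runs tabular Q-learning on a tiny chain MDP with a finite state set, a finite action set, and a reward signal, with the goal of maximizing long-term reward. It does not include the hierarchical extensions.

```python
# Minimal tabular Q-learning sketch on a made-up 3-state chain MDP.
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))            # action values per state
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    for t in range(20):
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r = env_step(s, a)
        # Q-learning update toward the reward plus discounted best next value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(np.argmax(Q, axis=1))   # greedy action per state (1 = move right)
```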

