scholarly journals Forgetting Enhances Episodic Control with Structured Memories

2021 ◽  
Author(s):  
Annik Yalnizyan-Carson ◽  
Blake A Richards

Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairments, if they utilize mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and states which are not frequently visited on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.

Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 631
Author(s):  
Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve the effective control of the learning agent for the confrontation in the multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve confrontation decision-making of multi-agent. In the process of training, the information of other agents is introduced to the critic network to improve the strategy of confrontation. The parameter sharing mechanism can reduce the loss of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time action and Q-value function respectively and use a momentum mechanism to optimize the training process to accelerate the convergence rate for the neural network. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to achieve the assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance losses of enemies and our side. Furthermore, this paper also uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent can successfully learn to fight with the competitors and achieve a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.


2021 ◽  
Author(s):  
Kyra Schapiro ◽  
Kresimir Josic ◽  
Zachary Kilpatrick ◽  
Joshua I Gold

Deliberative decisions based on an accumulation of evidence over time depend on working memory, and working memory has limitations, but how these limitations affect deliberative decision-making is not understood. We used human psychophysics to assess the impact of working-memory limitations on the fidelity of a continuous decision variable. Participants decided the average location of multiple visual targets. This computed, continuous decision variable degraded with time and capacity in a manner that depended critically on the strategy used to form the decision variable. This dependence reflected whether the decision variable was computed either: 1) immediately upon observing the evidence, and thus stored as a single value in memory; or 2) at the time of the report, and thus stored as multiple values in memory. These results provide important constraints on how the brain computes and maintains temporally dynamic decision variables.


2019 ◽  
Vol 29 (11) ◽  
pp. 4850-4862 ◽  
Author(s):  
Sebastian Weissengruber ◽  
Sang Wan Lee ◽  
John P O’Doherty ◽  
Christian C Ruff

Abstract While it is established that humans use model-based (MB) and model-free (MF) reinforcement learning in a complementary fashion, much less is known about how the brain determines which of these systems should control behavior at any given moment. Here we provide causal evidence for a neural mechanism that acts as a context-dependent arbitrator between both systems. We applied excitatory and inhibitory transcranial direct current stimulation over a region of the left ventrolateral prefrontal cortex previously found to encode the reliability of both learning systems. The opposing neural interventions resulted in a bidirectional shift of control between MB and MF learning. Stimulation also affected the sensitivity of the arbitration mechanism itself, as it changed how often subjects switched between the dominant system over time. Both of these effects depended on varying task contexts that either favored MB or MF control, indicating that this arbitration mechanism is not context-invariant but flexibly incorporates information about current environmental demands.


2020 ◽  
Author(s):  
Milena Rmus ◽  
Samuel McDougle ◽  
Anne Collins

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and behavior.


2020 ◽  
Vol 11 ◽  
Author(s):  
Christian Balkenius ◽  
Trond A. Tjøstheim ◽  
Birger Johansson ◽  
Annika Wallin ◽  
Peter Gärdenfors

Reinforcement learning systems usually assume that a value function is defined over all states (or state-action pairs) that can immediately give the value of a particular state or action. These values are used by a selection mechanism to decide which action to take. In contrast, when humans and animals make decisions, they collect evidence for different alternatives over time and take action only when sufficient evidence has been accumulated. We have previously developed a model of memory processing that includes semantic, episodic and working memory in a comprehensive architecture. Here, we describe how this memory mechanism can support decision making when the alternatives cannot be evaluated based on immediate sensory information alone. Instead we first imagine, and then evaluate a possible future that will result from choosing one of the alternatives. Here we present an extended model that can be used as a model for decision making that depends on accumulating evidence over time, whether that information comes from the sequential attention to different sensory properties or from internal simulation of the consequences of making a particular choice. We show how the new model explains both simple immediate choices, choices that depend on multiple sensory factors and complicated selections between alternatives that require forward looking simulations based on episodic and semantic memory structures. In this framework, vicarious trial and error is explained as an internal simulation that accumulates evidence for a particular choice. We argue that a system like this forms the “missing link” between more traditional ideas of semantic and episodic memory, and the associative nature of reinforcement learning.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Payam Piray ◽  
Nathaniel D. Daw

AbstractIt is thought that the brain’s judicious reuse of previous computation underlies our ability to plan flexibly, but also that inappropriate reuse gives rise to inflexibilities like habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, here we introduce a model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a default policy. This solution demonstrates connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also provides insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.


2019 ◽  
Author(s):  
Zhewei Zhang ◽  
Huzi Cheng ◽  
Tianming Yang

AbstractThe brain makes flexible and adaptive responses in the complicated and ever-changing environment for the organism’s survival. To achieve this, the brain needs to choose appropriate actions flexibly in response to sensory inputs. Moreover, the brain also has to understand how its actions affect future sensory inputs and what reward outcomes should be expected, and adapts its behavior based on the actual outcomes. A modeling approach that takes into account of the combined contingencies between sensory inputs, actions, and reward outcomes may be the key to understanding the underlying neural computation. Here, we train a recurrent neural network model based on sequence learning to predict future events based on the past event sequences that combine sensory, action, and reward events. We use four exemplary tasks that have been used in previous animal and human experiments to study different aspects of decision making and learning. We first show that the model reproduces the animals’ choice and reaction time pattern in a probabilistic reasoning task, and its units’ activities mimics the classical findings of the ramping pattern of the parietal neurons that reflects the evidence accumulation process during decision making. We further demonstrate that the model carries out Bayesian inference and may support meta-cognition such as confidence with additional tasks. Finally, we show how the network model achieves adaptive behavior with an approach distinct from reinforcement learning. Our work pieces together many experimental findings in decision making and reinforcement learning and provides a unified framework for the flexible and adaptive behavior of the brain.


2019 ◽  
Author(s):  
Payam Piray ◽  
Nathaniel D. Daw

AbstractIt is thought that the brain’s judicious allocation and reuse of computation underlies our ability to plan flexibly, but also failures to do so as in habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, we introduce a new model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a learned default policy. This solution exposes connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also gives new insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.


2020 ◽  
Author(s):  
Seng Bum Michael Yoo ◽  
Benjamin Hayden ◽  
John Pearson

Real-world decisions take place over time, in dynamics contexts, and with frequently shifting choice sets. In contrast, most studies of decision making have focused on single, abstracted choices among small numbers of discrete options. Here, we advocate for a shift in emphasis to what we call continuous decisions, which better capture the complexity of decision-making in the wild. Continuous decisions involve a continuum of possible responses and take place over an extended period of time during which the response is continuously subject to modification. The range of options available at any time can fluctuate and is affected by recent responses, making consideration of feedback between choices and the environment essential. The study of continuous decisions emphasizes distinct questions not easily captured by discrete decisions, including questions about how the brain integrates choices with movement, bridges representations through time, and maintains hierarchically organized information. While microeconomic theory has proven invaluable for discrete decisions, we propose that engineering control theory may serve as a better foundation for continuous ones. And while the concept of value has proven foundational for discrete decisions, goal states and policies may prove more useful for continuous ones.


Sign in / Sign up

Export Citation Format

Share Document