The Missing Link Between Memory and Reinforcement Learning

2020 ◽  
Vol 11 ◽  
Author(s):  
Christian Balkenius ◽  
Trond A. Tjøstheim ◽  
Birger Johansson ◽  
Annika Wallin ◽  
Peter Gärdenfors

Reinforcement learning systems usually assume that a value function is defined over all states (or state-action pairs) and can immediately give the value of a particular state or action. These values are used by a selection mechanism to decide which action to take. In contrast, when humans and animals make decisions, they collect evidence for the different alternatives over time and act only when sufficient evidence has been accumulated. We have previously developed a model of memory processing that includes semantic, episodic and working memory in a comprehensive architecture. Here, we describe how this memory mechanism can support decision making when the alternatives cannot be evaluated from immediate sensory information alone: instead, we first imagine, and then evaluate, a possible future that would result from choosing one of the alternatives. We present an extended model of decision making that depends on accumulating evidence over time, whether that evidence comes from sequential attention to different sensory properties or from internal simulation of the consequences of making a particular choice. We show how the new model explains simple immediate choices, choices that depend on multiple sensory factors, and complicated selections between alternatives that require forward-looking simulations based on episodic and semantic memory structures. In this framework, vicarious trial and error is explained as an internal simulation that accumulates evidence for a particular choice. We argue that a system like this forms the “missing link” between more traditional ideas of semantic and episodic memory and the associative nature of reinforcement learning.
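
As a rough illustration of the accumulate-to-threshold mechanism this abstract describes, the sketch below implements a minimal race model in Python. The threshold, the Gaussian noise model, and names such as `accumulate_and_decide` are illustrative assumptions, not the authors' architecture; in their model the evidence stream would come from sequential attention or from internally simulated futures.

```python
import random

def accumulate_and_decide(sample_evidence, n_options, threshold=5.0, max_steps=1000):
    """Race model: accumulate noisy evidence per option until one crosses threshold."""
    evidence = [0.0] * n_options
    for step in range(max_steps):
        for i in range(n_options):
            evidence[i] += sample_evidence(i)   # one evidence sample per option per step
        best = max(range(n_options), key=lambda i: evidence[i])
        if evidence[best] >= threshold:
            return best, step + 1               # committed choice and decision time
    return None, max_steps                      # no commitment within the time budget

# Example: option 1 yields slightly stronger mean evidence (e.g., from simulation)
means = [0.10, 0.15, 0.05]
choice, t = accumulate_and_decide(lambda i: random.gauss(means[i], 1.0), 3)
print(f"chose option {choice} after {t} steps")
```

In this reading, vicarious trial and error corresponds to running the same sampler over internally simulated outcomes rather than immediate sensory input.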

2006 ◽  
Vol 04 (06) ◽  
pp. 1071-1083 ◽  
Author(s):  
C. L. CHEN ◽  
D. Y. DONG ◽  
Z. H. CHEN

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in an RL system is represented as a quantum superposition state. The probability of an action eigenvalue is encoded by its probability amplitude, which is updated according to rewards, and action selection is carried out by observing the quantum state according to the collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be applied effectively to action selection and decision making by speeding up learning. The method also strikes a good tradeoff between exploration and exploitation in RL by exploiting the probabilistic character of quantum theory.
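
The collapse-based selection rule can be caricatured classically: hold one amplitude per action, sample actions with probability |amplitude|² (the Born rule), and grow the amplitude of rewarded actions. This is a hedged sketch, not the paper's algorithm; the actual update would use quantum-mechanical operations (e.g., Grover-style amplitude amplification), and the constant `k` is an invented parameter.

```python
import math, random

class QuantumInspiredSelector:
    """Actions held in superposition; selection simulates measurement collapse."""
    def __init__(self, n_actions):
        amp = 1.0 / math.sqrt(n_actions)        # start in a uniform superposition
        self.amps = [amp] * n_actions

    def probs(self):
        return [a * a for a in self.amps]       # Born rule: P(action) = |amplitude|^2

    def select(self):
        return random.choices(range(len(self.amps)), weights=self.probs())[0]

    def reinforce(self, action, reward, k=0.1):
        """Grow the chosen amplitude in proportion to reward, then renormalize."""
        self.amps[action] += k * reward
        norm = math.sqrt(sum(a * a for a in self.amps))
        self.amps = [a / norm for a in self.amps]

sel = QuantumInspiredSelector(4)
a = sel.select()                                # "measure" the superposition
sel.reinforce(a, reward=1.0)                    # rewarded actions collapse more often
```

Because unrewarded actions retain nonzero amplitude, exploration never fully vanishes, which is the exploration-exploitation tradeoff the authors highlight.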


2021 ◽  
Author(s):  
Annik Yalnizyan-Carson ◽  
Blake A Richards

Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required by limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that forgetting episodic memories over time may be beneficial for decision making. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, provided it uses mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a performance benefit compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which helps to explain why the brain forgets more than is required by capacity limitations.
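
A minimal sketch of an episodic control cache with age-based forgetting is shown below, assuming a dictionary keyed by some state representation. In the paper the representations carry structural information about space and the forgetting schedule is an experimental manipulation; the fixed capacity here is an illustrative simplification.

```python
from collections import OrderedDict

class EpisodicCache:
    """Episodic control table with a hard capacity: the oldest entries are forgotten."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.table = OrderedDict()   # state key -> best return observed from that state

    def write(self, state_key, value):
        old = self.table.pop(state_key, float("-inf"))
        self.table[state_key] = max(old, value)    # keep the best observed return
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)         # forget the oldest memory

    def read(self, state_key, default=0.0):
        return self.table.get(state_key, default)  # unvisited states fall back to default
```

Dropping stale entries is precisely what removes the influence of outdated information on the policy derived from the cache.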


Information ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 341 ◽  
Author(s):  
Hu ◽  
Xu

Multi-robot confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have been able to handle considerably complex scenarios in the robot confrontation system when the agents face multiple opponents. Meanwhile, current confrontation decision-making systems suffer from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement in the robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function. Then, fuzzy logic is used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.
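
The fuzzy regulation of the learning rate can be sketched with a single toy membership rule: the larger the temporal-difference error, the higher the rate. This is a hedged, one-rule caricature; the paper's controller is a full fuzzy-logic system inside a semi-Markov formulation, and the bounds `lo` and `hi` are invented parameters.

```python
def fuzzy_learning_rate(td_error, lo=0.05, hi=0.5):
    """Toy fuzzy rule: membership of 'error is large' pushes the rate toward hi."""
    x = min(abs(td_error), 1.0)        # membership degree, clipped to [0, 1]
    return (1 - x) * lo + x * hi       # defuzzify by interpolating the two rates

def q_update(Q, s, a, r, s_next, gamma=0.95):
    """One tabular Q-learning step whose learning rate is set by the fuzzy rule."""
    td = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += fuzzy_learning_rate(td) * td
    return td

Q = [[0.0] * 4 for _ in range(10)]     # 10 states x 4 actions
q_update(Q, s=0, a=1, r=1.0, s_next=2)
```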


2019 ◽  
Vol 13 (1) ◽  
Author(s):  
Jiunyan Wu ◽  
Tomoki Sekiguchi

Although intragroup conflict is both multilevel and dynamic in nature, less attention has been paid to establishing a holistic model of intragroup conflict that emerges across levels and unfolds over time. To address this research gap, we extend the multilevel view of intragroup conflict (Korsgaard et al. 2008) into a multilevel and dynamic model that explicitly includes (1) the role of time and (2) a feedback loop that captures the dynamic aspect of intragroup conflict. We further instantiate the extended model in the context of team decision-making. To achieve this and to systematically examine the complex relationships involved, we use agent-based modeling and simulation (ABMS). We directly investigate how two types of intragroup conflict (task and relationship conflict) interplay with cross-level antecedents, interrelate and develop over time, and affect team outcomes. This study adds to intragroup conflict research by extending the field with multilevel and dynamic views.
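
To make the feedback-loop structure concrete, here is a deliberately tiny agent-based sketch in which task conflict spills over into relationship conflict, which in turn feeds back to suppress team performance. Every parameter (spillover rate, decay, noise) is a hypothetical placeholder, not a value from the study.

```python
import random

def simulate_team(n_agents=5, steps=50, spillover=0.3, decay=0.1):
    """Minimal ABM: task conflict spills into relationship conflict over time,
    and relationship conflict feeds back to suppress team performance."""
    task, rel, performance = 0.2, 0.0, []
    for _ in range(steps):
        # fraction of agents voicing task disagreement this round
        disagreement = sum(random.random() < task for _ in range(n_agents)) / n_agents
        rel = max(0.0, rel + spillover * disagreement - decay)    # the feedback loop
        task = max(0.0, min(1.0, task + random.gauss(0, 0.05)))   # drifting task conflict
        performance.append(disagreement * (1 - rel))  # debate helps until it turns personal
    return performance

print(simulate_team()[-5:])   # team outcome at the end of the run
```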


2020 ◽  
Vol 14 (2) ◽  
pp. 179-193
Author(s):  
Rahul Shrivastava ◽  
Prabhat Kumar ◽  
Sudhakar Tripathi

Background: The cognitive-model-based agents proposed in existing patents are not able to create knowledge by themselves. They also lack an inference mechanism for taking decisions and planning in novel situations. Objective: This patent proposes a method that mimics the human memory process for decision making. Methods: The proposed model simulates the functionality of episodic, semantic and procedural memory, along with their interaction system. Sensory information activates activity nodes, each of which is a binding of a concept and a sensory value. These activated activity nodes are captured by episodic memory in the form of an event node. Each activity node has some participation strength in each event, depending upon its involvement in other events. Recall of events and frequent use of coactive activity nodes constitute semantic knowledge in the form of associations between activity nodes. The model also learns actions in the context of activity nodes by using reinforcement learning, and it uses an energy-based inference mechanism for planning and decision making. Results: The proposed model is validated by deploying it in a virtual war-game agent and analysing the results. The obtained results show that the proposed model is consistent with biological findings and theories related to memory. Conclusion: Implementing this model allows humanoid and game agents to take decisions and plan in novel situations.
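
The activity-node and event-node bookkeeping the Methods section describes might look roughly like the following sketch. The participation-strength rule (diluting strength as a node joins more events) and the coactivation counter for semantic associations are guesses at the mechanism, not the patented design.

```python
class ActivityNode:
    """Binding of a concept to a sensory value."""
    def __init__(self, concept, value):
        self.concept, self.value = concept, value
        self.event_strength = {}           # event id -> participation strength

class EpisodicMemory:
    """Captures coactive activity nodes as event nodes; coactivation builds semantics."""
    def __init__(self):
        self.events = []
        self.associations = {}             # (concept, concept) -> coactivation count

    def capture_event(self, active_nodes):
        event_id = len(self.events)
        self.events.append(active_nodes)
        for node in active_nodes:
            # strength diluted across the events a node participates in
            node.event_strength[event_id] = 1.0 / (len(node.event_strength) + 1)
        for a in active_nodes:             # frequent coactivation -> semantic association
            for b in active_nodes:
                if a is not b:
                    key = (a.concept, b.concept)
                    self.associations[key] = self.associations.get(key, 0) + 1
```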


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 227
Author(s):  
Gone Neelakantam ◽  
Djeane Debora Onthoni ◽  
Prasan Kumar Sahoo

Wastage of perishable and non-perishable products due to manual monitoring in shopping malls creates huge revenue losses in the supermarket industry. Besides, internal and external factors such as calendar events and weather conditions contribute to excess wastage of products in different regions of a supermarket. Manually tracking product wastage region-wise across supermarkets is a challenging job; therefore, supermarket management needs to take appropriate decisions and actions to prevent it. Fog computing data centers located in each region can collect, process and analyze data for demand prediction and decision making. In this paper, a product-demand prediction model is designed using integrated Principal Component Analysis (PCA) and K-means Unsupervised Learning (UL) algorithms, and a decision-making model is developed using the State-Action-Reward-State-Action (SARSA) Reinforcement Learning (RL) algorithm. Our proposed method can cluster products into low-, medium-, and high-demand products by learning from the designed features. Taking the derived cluster model, decisions for distributing low-demand to high-demand products can be made using SARSA. Experimental results show that our proposed method clusters the datasets well, with a Silhouette score of ≥60%. Besides, our SARSA-based decision-making model outperforms the Q-Learning, Monte-Carlo, Deep Q-Network (DQN), and Actor-Critic algorithms in terms of maximum cumulative reward, average cumulative reward, and execution time.
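
The prediction-plus-decision pipeline decomposes naturally into two stages. A hedged sketch is given below using scikit-learn for PCA and K-means plus a hand-rolled SARSA update; the synthetic feature matrix, cluster count, and state/action encoding are all assumptions standing in for the paper's fog-collected supermarket data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in: rows = products, columns = engineered demand features
X = np.random.rand(300, 8)

reduced = PCA(n_components=3).fit_transform(X)           # compress correlated features
kmeans = KMeans(n_clusters=3, n_init=10).fit(reduced)    # low / medium / high demand
print("silhouette:", silhouette_score(reduced, kmeans.labels_))

# On-policy SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

Q = np.zeros((3, 2))   # e.g., demand cluster as state, {hold, redistribute} as actions
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
```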


2020 ◽  
Author(s):  
Seng Bum Michael Yoo ◽  
Benjamin Hayden ◽  
John Pearson

Humans and other animals evolved to make decisions that extend over time with continuous and ever-changing options. Nonetheless, the academic study of decision-making is mostly limited to the simple case of choice between two options. Here we advocate that the study of choice should expand to include continuous decisions. Continuous decisions, by our definition, involve a continuum of possible responses and take place over an extended period of time during which the response is continuously subject to modification. In most continuous decisions, the range of options can fluctuate and is affected by recent responses, making consideration of reciprocal feedback between choices and the environment essential. The study of continuous decisions raises new questions, such as how abstract processes of valuation and comparison are co-implemented with action planning and execution, how we simulate the large number of possible futures our choices lead to, and how our brains employ hierarchical structure to make choices more efficiently. While microeconomic theory has proven invaluable for discrete decisions, we propose that engineering control theory may serve as a better foundation for continuous ones. And while the concept of value has proven foundational for discrete decisions, goal states and policies may prove more useful for continuous ones.
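
The control-theoretic framing the authors advocate can be conveyed by a toy feedback loop: instead of a one-shot choice among discrete options, the response is revised at every step toward a (possibly moving) goal state. The proportional gain and goal trajectory below are illustrative only.

```python
def track_goal(goal_trajectory, gain=0.5):
    """Continuous decision as feedback control: the ongoing response is
    continuously revised toward the current goal state."""
    response, trace = 0.0, []
    for goal in goal_trajectory:
        error = goal - response
        response += gain * error     # policy: close a fraction of the gap each step
        trace.append(round(response, 3))
    return trace

# The goal shifts mid-stream, and the in-flight response bends toward the new goal
print(track_goal([1.0] * 10 + [3.0] * 10))
```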

