Benefits of combining dimensional attention and working memory for partially observable reinforcement learning problems

Author(s):  
Ngozi Omatu ◽  
Joshua L. Phillips


Author(s):  
Yu. V. Dubenko

This paper addresses the problem of collective artificial intelligence, in which intelligent agents solve problems in external environments that may be fully or partially observable, deterministic or stochastic, static or dynamic, and discrete or continuous. The paper identifies the difficulties of collective interaction among intelligent agents when they solve tasks that require coordinating the actions of a group of agents, e.g., exploring the territory of a complex infrastructure facility. It observes that reinforcement learning in multi-agent systems is poorly covered in the literature, especially in Russian-language publications. The article analyzes reinforcement learning, describes hierarchical reinforcement learning, and presents the basic methods used to implement it. The concept of a macro-action performed by agents organized into groups is introduced. The main problems of collective interaction among intelligent agents (i.e., computing individual rewards for each agent, coordinating agents, applying macro-actions by agents organized into groups, and sharing the experience generated by different agents while solving a collective task) are identified. The multi-agent reinforcement learning model is described in detail, along with the difficulties of building this approach on existing solutions. The basic open problems of multi-agent reinforcement learning are formulated in the conclusion.
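
The individual-reward and coordination issues listed above can be made concrete with a toy example. Below is a minimal sketch of independent Q-learning for two agents exploring a small one-dimensional "territory", with each agent receiving its own reward for visiting new cells; the environment, reward scheme, and hyperparameters are illustrative assumptions and not the method developed in the paper.

```python
# Minimal sketch: independent Q-learning for two agents exploring a toy
# 1-D territory. Environment, reward split, and hyperparameters are
# illustrative assumptions only.
import random
from collections import defaultdict

N_CELLS = 10          # toy "territory" to cover
ACTIONS = [-1, +1]    # move left / right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q_tables = [defaultdict(float), defaultdict(float)]   # one Q-table per agent

def choose(q, pos):
    """Epsilon-greedy action selection for one agent."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(pos, a)])

for episode in range(500):
    positions = [0, N_CELLS - 1]
    visited = set(positions)
    for t in range(30):
        actions = [choose(q, p) for q, p in zip(q_tables, positions)]
        for i, (q, pos, act) in enumerate(zip(q_tables, positions, actions)):
            new_pos = max(0, min(N_CELLS - 1, pos + act))
            # Individual reward: +1 for discovering a new cell, small cost otherwise.
            reward = 1.0 if new_pos not in visited else -0.01
            visited.add(new_pos)
            best_next = max(q[(new_pos, a)] for a in ACTIONS)
            q[(pos, act)] += ALPHA * (reward + GAMMA * best_next - q[(pos, act)])
            positions[i] = new_pos

print("cells covered in the last episode:", len(visited))
```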


2020 ◽  
Vol 34 (02) ◽  
pp. 1324-1331
Author(s):  
Arthur Williams ◽  
Joshua Phillips

Transfer learning allows knowledge to generalize across tasks, resulting in increased learning speed and/or performance. These tasks must have commonalities that allow knowledge to be transferred. The main goal of transfer learning in the reinforcement learning domain is to train on one or more source tasks so that a target task can be learned with better performance than if transfer were not used (Taylor and Stone 2009). Furthermore, the use of output-gated neural network models of working memory has been shown to increase generalization for supervised learning tasks (Kriete and Noelle 2011; Kriete et al. 2013). We propose that working memory-based generalization plays a significant role in a model's ability to transfer knowledge successfully across tasks. Thus, we extended the Holographic Working Memory Toolkit (HWMtk) (Dubois and Phillips 2017; Phillips and Noelle 2005) to utilize the generalization benefits of output gating within a working memory system. Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn three tasks over the duration of the training period. The results indicate that the addition of output gating increases the agent's initial learning performance on target tasks and decreases the learning time required to reach a fixed performance threshold.
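
To illustrate why gating information into working memory matters under partial observability, here is a tiny tabular sketch of a cue-recall task: the agent succeeds only if it learns to store the cue in a memory slot before the cue disappears. The task, state encoding, and learner are hypothetical simplifications; they are not the HWMtk or the output-gated network model used in the paper.

```python
# Minimal sketch of a gated working-memory agent on a toy cue-recall task.
# Task, gating scheme, and tabular learner are illustrative assumptions.
import random

CUES = ["L", "R"]
GATE_ACTIONS = ["store", "hold"]     # whether to copy the visible cue into memory
MOTOR_ACTIONS = ["go-L", "go-R"]
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
q = {}

def q_get(state, action):
    return q.get((state, action), 0.0)

def run_episode(learn=True):
    cue = random.choice(CUES)
    memory = None
    # Step 1: the cue is visible; the agent decides whether to gate it in.
    state = ("cue", cue, memory)
    gate = (random.choice(GATE_ACTIONS) if learn and random.random() < EPS
            else max(GATE_ACTIONS, key=lambda a: q_get(state, a)))
    if gate == "store":
        memory = cue
    # Step 2: the cue is gone; only memory distinguishes the two contexts.
    state2 = ("choice", None, memory)
    motor = (random.choice(MOTOR_ACTIONS) if learn and random.random() < EPS
             else max(MOTOR_ACTIONS, key=lambda a: q_get(state2, a)))
    reward = 1.0 if motor[-1] == cue else 0.0
    if learn:
        # Backup for the terminal motor step, then for the earlier gating step.
        q[(state2, motor)] = q_get(state2, motor) + ALPHA * (reward - q_get(state2, motor))
        best_next = max(q_get(state2, a) for a in MOTOR_ACTIONS)
        q[(state, gate)] = q_get(state, gate) + ALPHA * (GAMMA * best_next - q_get(state, gate))
    return reward

for _ in range(5000):
    run_episode()
print("greedy accuracy:", sum(run_episode(learn=False) for _ in range(200)) / 200)
```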


Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and then introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised learning and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback versus anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions, at the level of robot learning and control, with insights coming from biology.
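
As a small illustration of the feedback vs. feed-forward distinction discussed in the chapter, the sketch below compares a purely reactive proportional controller with one that also anticipates a known, repeating disturbance; the plant, gains, and disturbance are illustrative assumptions only.

```python
# Minimal sketch contrasting feedback (reactive) and feed-forward (anticipatory)
# control on a toy integrator plant with a known, repeating disturbance.

def simulate(use_feedforward, steps=60):
    x, target = 0.0, 1.0
    total_abs_error = 0.0
    for t in range(steps):
        disturbance = 0.4 if t % 10 == 5 else 0.0    # predictable, repeating push
        error = target - x
        u = 0.5 * error                               # reactive (feedback) term
        if use_feedforward:
            u -= disturbance                          # anticipatory cancellation of the known push
        x = x + u + disturbance                       # simple integrator plant
        total_abs_error += abs(error)
    return total_abs_error / steps

print("feedback only     :", round(simulate(False), 3))
print("with feed-forward :", round(simulate(True), 3))
```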


Neuron ◽  
2020 ◽  
Author(s):  
Alon Boaz Baram ◽  
Timothy Howard Muller ◽  
Hamed Nili ◽  
Mona Maria Garvert ◽  
Timothy Edward John Behrens

2021 ◽  
Author(s):  
Wenjie Shang ◽  
Qingyang Li ◽  
Zhiwei Qin ◽  
Yang Yu ◽  
Yiping Meng ◽  
...  

Author(s):  
Jan Leike ◽  
Tor Lattimore ◽  
Laurent Orseau ◽  
Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
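
For intuition, here is a minimal sketch of the Thompson sampling loop over a countable (here finite) hypothesis class, using a two-armed Bernoulli bandit as a stand-in environment; the candidate class and reward model are illustrative assumptions, far simpler than the general stochastic environments treated in the paper.

```python
# Minimal sketch of Thompson sampling over a finite (hence countable) class of
# candidate environments, here two-armed Bernoulli bandits.
import random

# Hypothesis class: each candidate specifies the success probability of each arm.
CANDIDATES = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]
TRUE_ENV = CANDIDATES[0]
posterior = [1.0 / len(CANDIDATES)] * len(CANDIDATES)

def pull(arm):
    return 1 if random.random() < TRUE_ENV[arm] else 0

total_reward = 0
for t in range(2000):
    # 1. Sample an environment hypothesis from the posterior.
    env = random.choices(CANDIDATES, weights=posterior)[0]
    # 2. Act optimally for the sampled hypothesis.
    arm = 0 if env[0] >= env[1] else 1
    # 3. Observe the outcome and update the posterior by Bayes' rule.
    r = pull(arm)
    total_reward += r
    likelihoods = [c[arm] if r == 1 else 1 - c[arm] for c in CANDIDATES]
    unnorm = [p * l for p, l in zip(posterior, likelihoods)]
    z = sum(unnorm)
    posterior = [u / z for u in unnorm]

print("average reward:", total_reward / 2000)
print("posterior over candidates:", [round(p, 3) for p in posterior])
```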

