Hierarchical Reinforcement Learning

2021 ◽  
Vol 54 (5) ◽  
pp. 1-35
Author(s):  
Shubham Pateria ◽  
Budhitama Subagdja ◽  
Ah-hwee Tan ◽  
Chai Quek

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.

2015 ◽  
Vol 53 ◽  
pp. 659-697 ◽  
Author(s):  
Daan Bloembergen ◽  
Karl Tuyls ◽  
Daniel Hennes ◽  
Michael Kaisers

The interaction of multiple autonomous agents gives rise to highly dynamic and nondeterministic environments, contributing to the complexity in applications such as automated financial markets, smart grids, or robotics. Due to the sheer number of situations that may arise, it is not possible to foresee and program the optimal behaviour for all agents beforehand. Consequently, it becomes essential for the success of the system that the agents can learn their optimal behaviour and adapt to new situations or circumstances. The past two decades have seen the emergence of reinforcement learning, both in single and multi-agent settings, as a strong, robust and adaptive learning paradigm. Progress has been substantial, and a wide range of algorithms are now available. An important challenge in the domain of multi-agent learning is to gain qualitative insights into the resulting system dynamics. In the past decade, tools and methods from evolutionary game theory have been successfully employed to study multi-agent learning dynamics formally in strategic interactions. This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively. Furthermore, new learning algorithms that have been introduced using these evolutionary game theoretic tools are reviewed. The evolutionary models can be used to study complex strategic interactions. Examples of such analysis are given for the domains of automated trading in stock markets and collision avoidance in multi-robot systems. The paper provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning by highlighting the main results and accomplishments.


Author(s):  
Shihui Li ◽  
Yi Wu ◽  
Xinyue Cui ◽  
Honghua Dong ◽  
Fei Fang ◽  
...  

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting so that the trained agents can still generalize when its opponents’ policies alter. To tackle this problem, we proposed a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments and the agents trained by our method significantly outperforms existing baselines.


2020 ◽  
Vol 34 (05) ◽  
pp. 7253-7260 ◽  
Author(s):  
Yuhang Song ◽  
Andrzej Wojcicki ◽  
Thomas Lukasiewicz ◽  
Jianyi Wang ◽  
Abi Aryan ◽  
...  

Learning agents that are not only capable of taking tests, but also innovating is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for others. However, existing evaluation platforms are either not compatible with multi-agent settings, or limited to a specific game. That is, there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at the stage where many problems remain unexplored. Therefore, we provide a building toolkit for researchers to easily invent and build novel multi-agent problems from the provided game set based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams that we can train with different training schemes for each game, as the base for evaluating agents with population performance. As such, the research community can perform comparisons under a stable and uniform standard. All the implementations and accompanied tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/.


Author(s):  
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to be able to operate in highly dynamic environments. A common approach is to use reinforcement learning to allow an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system that cooperates to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents’ strategies. The agents learn through reinforcement learning, and are loosely coupled by their reward functions. Each agent controls a different aspect of the robot’s behavior. In addition, the problem of delayed reward is addressed through a time-averaged reward applied to several sequential actions at once. This system was evaluated in a simulated melee combat scenario and was shown to learn to improve its performance over time. This was accomplished by each agent learning to pick specific battle strategies for each different opponent it faced.


Author(s):  
Yu. V. Dubenko

This paper is devoted to the problem of collective artificial intelligence in solving problems by intelligent agents in external environments. The environments may be: fully or partially observable, deterministic or stochastic, static or dynamic, discrete or continuous. The paper identifies problems of collective interaction of intelligent agents when they solve a class of tasks, which need to coordinate actions of agent group, e. g. task of exploring the territory of a complex infrastructure facility. It is revealed that the problem of reinforcement training in multi-agent systems is poorly presented in the press, especially in Russian-language publications. The article analyzes reinforcement learning, describes hierarchical reinforcement learning, presents basic methods to implement reinforcement learning. The concept of macro-action by agents integrated in groups is introduced. The main problems of intelligent agents collective interaction for problem solving (i. e. calculation of individual rewards for each agent; agent coordination issues; application of macro actions by agents integrated into groups; exchange of experience generated by various agents as part of solving a collective problem) are identified. The model of multi-agent reinforcement learning is described in details. The article describes problems of this approach building on existing solutions. Basic problems of multi-agent reinforcement learning are formulated in conclusion.


2020 ◽  
Author(s):  
Clay B. Holroyd ◽  
Tom Verguts

Despite continual debate for the past thirty years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. Here we review computational models that illustrate three core principles of ACC function (related to hierarchy, world models and cost), as well as four constraints on the neural implementation of these principles (related to modularity, binding, encoding and learning and regulation). These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning, which instantiates a mechanism for motivating the execution of high-level plans.


1995 ◽  
Vol 2 ◽  
pp. 475-500 ◽  
Author(s):  
A. Schaerf ◽  
Y. Shoham ◽  
M. Tennenholtz

We study the process of multi-agent reinforcement learning in the context ofload balancing in a distributed system, without use of either centralcoordination or explicit communication. We first define a precise frameworkin which to study adaptive load balancing, important features of which are itsstochastic nature and the purely local information available to individualagents. Given this framework, we show illuminating results on the interplaybetween basic adaptive behavior parameters and their effect on systemefficiency. We then investigate the properties of adaptive load balancing inheterogeneous populations, and address the issue of exploration vs.exploitation in that context. Finally, we show that naive use ofcommunication may not improve, and might even harm system efficiency.


2021 ◽  
Author(s):  
Amjad Yousef Majid ◽  
Serge Saaybi ◽  
Tomas van Rietbergen ◽  
Vincent Francois-Lavet ◽  
R Venkatesha Prasad ◽  
...  

<div>Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems, yet many open challenges still exist.</div><div>To get insights into the strengths and weaknesses of DRL versus ESs, an analysis of their respective capabilities and limitations is provided. </div><div>After presenting their fundamental concepts and algorithms, a comparison is provided on key aspects such as scalability, exploration, adaptation to dynamic environments, and multi-agent learning. </div><div>Then, the benefits of hybrid algorithms that combine concepts from DRL and ESs are highlighted. </div><div>Finally, to have an indication about how they compare in real-world applications, a survey of the literature for the set of applications they support is provided.</div>


This new textbook provides a comprehensive overview of the scientific and clinical aspects of rheumatoid arthritis (RA). Split into eight sections—history, diagnosis, and epidemiology; pathogenesis; clinical presentation; disease assessment; impact on life; non-drug treatments; drug treatments; and management and outcomes—it collects the contemporary ideas about RA and explains the revolutionary changes that have taken place over the past two decades, and indicates areas of future research. Witten by leading clinicians and scientists in the field, each chapter gives a detailed background, key recent advances, areas of doubt, and future directions of research.


Sign in / Sign up

Export Citation Format

Share Document