Automatic Curriculum Learning For Deep RL: A Short Survey

Author(s):  
Rémy Portelas ◽  
Cédric Colas ◽  
Lilian Weng ◽  
Katja Hofmann ◽  
Pierre-Yves Oudeyer

Automatic Curriculum Learning (ACL) has become a cornerstone of recent successes in Deep Reinforcement Learning (DRL). These methods shape the learning trajectories of agents by challenging them with tasks adapted to their capacities. In recent years, they have been used to improve sample efficiency and asymptotic performance, to organize exploration, to encourage generalization or to solve sparse reward problems, among others. To do so, ACL mechanisms can act on many aspects of learning problems. They can optimize domain randomization for Sim2Real transfer, organize task presentations in multi-task robotic settings, order sequences of opponents in multi-agent scenarios, etc. The ambition of this work is dual: 1) to present a compact and accessible introduction to the Automatic Curriculum Learning literature and 2) to draw a bigger picture of the current state of the art in ACL to encourage the cross-breeding of existing concepts and the emergence of new ideas.
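One common ACL mechanism surveyed in this line of work is sampling tasks in proportion to the agent's absolute learning progress (ALP), so the curriculum concentrates on tasks where performance is changing fastest. The sketch below is illustrative, not the survey's API; the class name, window size, and ε-mixing are assumptions.

```python
import random
from collections import deque

class LPTaskSampler:
    """Sample tasks in proportion to absolute learning progress (ALP).

    Tasks where performance is changing fastest (improving or collapsing)
    are proposed more often; mastered or impossible tasks fade out.
    """

    def __init__(self, n_tasks, window=10, eps=0.1):
        self.returns = [deque(maxlen=2 * window) for _ in range(n_tasks)]
        self.window = window
        self.eps = eps  # residual uniform exploration over tasks

    def _alp(self, task):
        hist = self.returns[task]
        if len(hist) < 2 * self.window:
            return 1.0  # optimistic: under-explored tasks look interesting
        old = list(hist)[: self.window]
        new = list(hist)[self.window:]
        return abs(sum(new) / self.window - sum(old) / self.window)

    def sample(self):
        """Draw the next task: ε-uniform, else proportional to ALP."""
        n = len(self.returns)
        if random.random() < self.eps:
            return random.randrange(n)
        scores = [self._alp(t) for t in range(n)]
        total = sum(scores)
        if total == 0:
            return random.randrange(n)
        r = random.uniform(0, total)
        acc = 0.0
        for t, s in enumerate(scores):
            acc += s
            if r <= acc:
                return t
        return n - 1

    def update(self, task, episode_return):
        self.returns[task].append(episode_return)
```

After each episode the trainer calls `update(task, return)`; the sampler then biases future task choices toward the frontier of the agent's competence.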

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1043
Author(s):  
Zijian Gao ◽  
Kele Xu ◽  
Bo Ding ◽  
Huaimin Wang

Recently, deep reinforcement learning (RL) algorithms have achieved significant progress in the multi-agent domain. However, training for increasingly complex tasks would be time-consuming and resource intensive. To alleviate this problem, efficient leveraging of historical experience is essential, which is under-explored in previous studies because most existing methods fail to achieve this goal in a continuously dynamic system owing to their complicated design. In this paper, we propose a method for knowledge reuse called “KnowRU”, which can be easily deployed in the majority of multi-agent reinforcement learning (MARL) algorithms without requiring complicated hand-coded design. We employ the knowledge distillation paradigm to transfer knowledge among agents to shorten the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments on state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results show that KnowRU outperforms recently reported methods and not only successfully accelerates the training phase, but also improves the training performance, emphasizing the importance of the proposed knowledge reuse for MARL.
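KnowRU's central ingredient, transferring a previously trained policy to a new agent via knowledge distillation, rests on the standard distillation loss: a KL divergence between softened teacher and student action distributions, added to the usual RL objective. The NumPy sketch below is a minimal illustration under assumed names; the temperature and mixing weight are placeholders, and the paper's actual architecture is not reproduced.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened action distributions.

    Added to the RL objective, this pulls the student's policy toward a
    teacher trained on an earlier, related task, shortening training on
    the new task.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Schematic total objective: loss = rl_loss + beta * distillation_loss(...)
```

The loss is zero when the student matches the teacher and grows as the policies diverge, so annealing `beta` lets the student eventually outgrow the teacher.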


2020 ◽  
Vol 34 (05) ◽  
pp. 7253-7260 ◽  
Author(s):  
Yuhang Song ◽  
Andrzej Wojcicki ◽  
Thomas Lukasiewicz ◽  
Jianyi Wang ◽  
Abi Aryan ◽  
...  

Learning agents that are not only capable of taking tests, but also innovating is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for others. However, existing evaluation platforms are either not compatible with multi-agent settings, or limited to a specific game. That is, there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at the stage where many problems remain unexplored. Therefore, we provide a building toolkit for researchers to easily invent and build novel multi-agent problems from the provided game set based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams that we can train with different training schemes for each game, as the base for evaluating agents with population performance. As such, the research community can perform comparisons under a stable and uniform standard. All the implementations and accompanied tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/.


Author(s):  
Yu. V. Dubenko

This paper is devoted to the problem of collective artificial intelligence in solving problems by intelligent agents in external environments. The environments may be fully or partially observable, deterministic or stochastic, static or dynamic, and discrete or continuous. The paper identifies the problems of collective interaction among intelligent agents when they solve a class of tasks requiring coordinated group actions, e.g., exploring the territory of a complex infrastructure facility. It is noted that the problem of reinforcement learning in multi-agent systems is poorly covered in the literature, especially in Russian-language publications. The article analyzes reinforcement learning, describes hierarchical reinforcement learning, and presents the basic methods for implementing reinforcement learning. The concept of macro-actions performed by agents integrated into groups is introduced. The main problems of collective interaction among intelligent agents (i.e., calculating individual rewards for each agent; coordinating agents; applying macro-actions by agents integrated into groups; and exchanging experience generated by different agents while solving a collective problem) are identified. The model of multi-agent reinforcement learning is described in detail, along with the problems of building this approach on existing solutions. The basic problems of multi-agent reinforcement learning are formulated in the conclusion.


Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1133
Author(s):  
Shanzhi Gu ◽  
Mingyang Geng ◽  
Long Lan

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations providing a partial view of the true state of the environment. However, in realistic settings, a harsh environment might cause one or more agents to exhibit arbitrarily faulty or malicious behavior, which may suffice to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the premise that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
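The message-selection step of an attention-based model like FT-Attn reduces, at its core, to scaled dot-product attention over teammates' messages: senders whose keys do not match the receiver's query get near-zero weight, which is how faulty or irrelevant messages can be suppressed. The sketch below omits the learned query/key/value projections and uses illustrative names throughout.

```python
import numpy as np

def select_messages(query, keys, values, scale=None):
    """Scaled dot-product attention over incoming messages.

    `query` is the receiving agent's embedding, `keys`/`values` come
    from the other agents' messages. Learned projections (omitted here)
    would let training drive the weights on faulty senders toward zero.
    """
    keys = np.asarray(keys, dtype=float)
    values = np.asarray(values, dtype=float)
    query = np.asarray(query, dtype=float)
    if scale is None:
        scale = np.sqrt(keys.shape[-1])
    scores = keys @ query / scale     # one relevance score per sender
    scores -= scores.max()            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()          # softmax over senders
    aggregated = weights @ values     # weighted sum of message values
    return weights, aggregated
```

A multi-head variant simply runs several such attention maps in parallel on different projections and concatenates the aggregated results.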


Author(s):  
Yu Wang ◽  
Hongxia Jin

In this paper, we present a multi-step coarse-to-fine question answering (MSCQA) system that can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic-based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models that target datasets containing mainly either short or long documents, our multi-step coarse-to-fine model combines the merits of multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN, and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to baselines using state-of-the-art models.


Author(s):  
Pericles A. Mitkas ◽  
Paraskevi Nikolaidou

This chapter discusses the current state of the art of agents and multi-agent systems (MAS) in supply chain management (SCM). Following a general description of SCM and the challenges it currently faces, we present MAS as a possible solution to these challenges. We argue that an application involving multiple autonomous actors, such as SCM, is best served by a software paradigm that relies on multiple independent software entities, like agents. We then review the most significant current trends in this area, focusing on potential directions for further research. The authors believe that a clearer view of the current state of the art and of future extensions will help researchers improve existing standards and resolve remaining issues, eventually enabling MAS-based SCM systems to replace legacy ERP software while also giving a boost to both areas of research separately.


2020 ◽  
Vol 12 (16) ◽  
pp. 6373 ◽  
Author(s):  
Magdalena Ramirez-Peña ◽  
Francisco J. Abad Fraga ◽  
Jorge Salguero ◽  
Moises Batista

The supply chain currently plays a very important role in organizations seeking to improve the competitiveness and profitability of the company; its transversal character places it in an unbeatable position to fulfill this role. Through a study of each of the key enabling technologies of Industry 4.0, this article aims to provide a general overview of the current state of the art in shipbuilding as adapted to these technologies. To do so, a systematic review of the scientific literature is carried out, dividing each of the technologies into different categories. In addition, the global landscape of countries interested in each of the enabling technologies is also studied. Both studies give companies a general view of the scientific community's concerns, thus encouraging research on the subject focused on the sustainability of the shipbuilding supply chain.


Author(s):  
Adrián Ramírez ◽  
Rifat Sipahi ◽  
Sabine Mondié ◽  
Rubén Garrido

This article is on fast-consensus reaching in a class of multi-agent systems (MAS). We present an analytical approach to tune controllers for the agents based on the premise that delayed measurements in the controller can be preferable to standard controllers relying only on current measurements. Controller tuning in this setting is however challenging due to the presence of delays. To tackle this problem, we propose an analytic geometry approach. The key contribution is that the tuning can be implemented for complex eigenvalues of the arising graph Laplacian of the network, complementing the current state of the art, which is limited to real eigenvalues. Results, therefore, extend our knowledge beyond symmetric graphs and enable the study of the MAS under directed graphs. This article is part of the theme issue ‘Nonlinear dynamics of delay systems’.
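The premise that delayed measurements can be preferable becomes clearer when the delayed term is viewed as a surrogate for derivative action. Under assumed gains $k_p$, $k_r$ and delay $h$ (this is a generic proportional-retarded sketch, not the article's specific controller), each agent's control can take the form

```latex
u_i(t) = -k_p\, x_i(t) + k_r\, x_i(t - h),
\qquad
k_r\, x_i(t - h) \;\approx\; k_r\, x_i(t) - k_r h \, \dot{x}_i(t),
```

so for $k_r h > 0$ the delayed term injects damping much like the derivative part of a PD controller, without requiring a noise-sensitive differentiator; tuning $(k_p, k_r, h)$ then hinges on the eigenvalues of the network's graph Laplacian.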


Author(s):  
Qiming Fu ◽  
Quan Liu ◽  
Shan Zhong ◽  
Heng Luo ◽  
Hongjie Wu ◽  
...  

In reinforcement learning (RL), the exploration/exploitation (E/E) dilemma is a crucial issue: the agent must balance exploring the environment to find more profitable actions against exploiting the best empirical actions for the current state. We focus on the single-trajectory RL problem, where an agent interacts with a partially unknown MDP over single trajectories, and address the E/E dilemma in this setting. Given the reward function, we seek a good E/E strategy for MDPs drawn from some MDP distribution. This is achieved by selecting, from a large set of candidates, the strategy with the best mean performance over a potential MDP distribution, exploiting single trajectories drawn from many MDPs. In this paper, we make the following contributions: (1) we discuss a strategy-selector algorithm based on a formula set and a polynomial function; (2) we provide theoretical and experimental regret analyses of the learned strategy under a given MDP distribution; and (3) we compare these methods experimentally with the state-of-the-art Bayesian RL method.
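The selection step described above, picking the candidate strategy whose mean return over MDPs sampled from the distribution is highest, can be sketched as follows. The function names and signatures (`sample_mdp`, `run_trajectory`) are assumptions of this sketch, not the paper's interface.

```python
def select_strategy(candidates, sample_mdp, run_trajectory, n_mdps=100):
    """Pick the E/E strategy with the best mean return over an MDP
    distribution, estimated from single trajectories.

    `candidates` is a list of strategies, `sample_mdp()` draws an MDP
    from the distribution, and `run_trajectory(strategy, mdp)` returns
    the return of one trajectory under that strategy.
    """
    means = []
    for strat in candidates:
        total = 0.0
        for _ in range(n_mdps):
            total += run_trajectory(strat, sample_mdp())  # one trajectory per MDP
        means.append(total / n_mdps)
    best = max(range(len(candidates)), key=lambda i: means[i])
    return best, means
```

The estimate uses only single trajectories per sampled MDP, matching the single-trajectory setting; its quality improves with `n_mdps` at the usual Monte-Carlo rate.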


2008 ◽  
Vol 42 (1) ◽  
pp. 44-51 ◽  
Author(s):  
J. W. Nicholson ◽  
A. J. Healey

AUVs have proved their usefulness in recent years and continue to do so. This paper is a review of the current state of the art of AUVs. Present AUV capabilities are reviewed through a discussion of feasible present-day AUV missions. The state of key AUV design features and sensor technologies is also addressed, identifying those areas most critical to continued future progress in AUV development.

