KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning

Entropy, 2021, Vol 23 (8), pp. 1043
Author(s): Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang

Recently, deep reinforcement learning (RL) algorithms have made significant progress in the multi-agent domain. However, training for increasingly complex tasks is time-consuming and resource-intensive. To alleviate this problem, it is essential to leverage historical experience efficiently. This has been under-explored in previous studies, because most existing methods rely on complicated designs that fail to achieve this goal in a continuously dynamic system. In this paper, we propose a knowledge reuse method called “KnowRU”, which can be easily deployed in most multi-agent reinforcement learning (MARL) algorithms without requiring complicated hand-coded design. We employ the knowledge distillation paradigm to transfer knowledge among agents, shortening the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments with state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results show that KnowRU outperforms recently reported methods: it not only accelerates training but also improves training performance, emphasizing the importance of the proposed knowledge reuse for MARL.
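A common form of policy distillation trains a student policy to match a teacher's temperature-softened action distribution. The sketch below is a generic illustration under that assumption, not KnowRU's published objective: the temperature `tau`, the weight `alpha` that mixes the distillation term with the agent's ordinary RL loss, and all function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], tau: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger tau gives a softer distribution."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      rl_loss: float, tau: float = 2.0, alpha: float = 0.5) -> float:
    """Hypothetical combined objective: the agent's own RL loss plus a term
    pulling the student's softened policy toward the teacher's.
    Multiplying by tau**2 is a common distillation convention that keeps
    gradient magnitudes comparable across temperatures."""
    p_teacher = softmax(teacher_logits, tau)
    p_student = softmax(student_logits, tau)
    kd = kl_divergence(p_teacher, p_student) * tau ** 2
    return (1 - alpha) * rl_loss + alpha * kd
```

With identical teacher and student logits the distillation term vanishes and only the weighted RL loss remains. One could also decay `alpha` during training so the agent relies less on the teacher over time; that schedule is an assumption, not a detail given in the abstract.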

Author(s): Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, Pierre-Yves Oudeyer

Automatic Curriculum Learning (ACL) has become a cornerstone of recent successes in Deep Reinforcement Learning (DRL). These methods shape the learning trajectories of agents by challenging them with tasks adapted to their capacities. In recent years, they have been used to improve sample efficiency and asymptotic performance, to organize exploration, to encourage generalization, and to solve sparse reward problems, among other uses. To do so, ACL mechanisms can act on many aspects of learning problems: they can optimize domain randomization for Sim2Real transfer, organize task presentations in multi-task robotic settings, order sequences of opponents in multi-agent scenarios, and so on. The ambition of this work is twofold: 1) to present a compact and accessible introduction to the Automatic Curriculum Learning literature, and 2) to draw a bigger picture of the current state of the art in ACL to encourage the cross-breeding of existing concepts and the emergence of new ideas.
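One family of ACL mechanisms chooses the next training task according to the learning progress the agent currently shows on it. The sketch below illustrates that idea only: the class name, window size, epsilon value and the half-window progress estimate are hypothetical choices, not a specific method from the survey.

```python
import random
from collections import deque
from typing import Dict, List


class LearningProgressCurriculum:
    """Toy teacher that samples tasks in proportion to absolute learning
    progress (ALP): the gap between the agent's mean return on the older and
    newer halves of a sliding window of recent episodes. Hypothetical sketch."""

    def __init__(self, tasks: List[str], window: int = 10,
                 epsilon: float = 0.2, seed: int = 0):
        self.tasks = tasks
        self.epsilon = epsilon  # probability of picking a uniformly random task
        # deque(maxlen=...) keeps only the most recent `window` returns per task
        self.history: Dict[str, deque] = {t: deque(maxlen=window) for t in tasks}
        self.rng = random.Random(seed)

    def learning_progress(self, task: str) -> float:
        h = list(self.history[task])
        if len(h) < 2:
            return 0.0
        half = len(h) // 2
        old, new = h[:half], h[half:]
        return abs(sum(new) / len(new) - sum(old) / len(old))

    def sample_task(self) -> str:
        lps = [self.learning_progress(t) for t in self.tasks]
        # Explore uniformly with probability epsilon, or when no task shows progress
        if self.rng.random() < self.epsilon or sum(lps) == 0:
            return self.rng.choice(self.tasks)
        return self.rng.choices(self.tasks, weights=lps, k=1)[0]

    def update(self, task: str, episode_return: float) -> None:
        self.history[task].append(episode_return)
```

Tasks the agent has already mastered, or cannot yet solve at all, show little progress and are then picked mainly through the epsilon exploration branch.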


2020, Vol 34 (05), pp. 7253-7260
Author(s): Yuhang Song, Andrzej Wojcicki, Thomas Lukasiewicz, Jianyi Wang, Abi Aryan, ...

Learning agents that can not only take tests but also innovate is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for the others. However, existing evaluation platforms are either incompatible with multi-agent settings or limited to a specific game; there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at a stage where many problems remain unexplored. We therefore provide a building toolkit that lets researchers easily invent and build novel multi-agent problems from the provided game set, based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release, for each game, a set of the 100 best agents/teams we were able to train with different training schemes, which serve as the basis for evaluating agents by population performance. The research community can thus perform comparisons under a stable and uniform standard. All the implementations and accompanying tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/.


Entropy, 2021, Vol 23 (9), pp. 1133
Author(s): Shanzhi Gu, Mingyang Geng, Long Lan

The aim of multi-agent reinforcement learning systems is to give interacting agents the ability to learn collaboratively and adapt to the behavior of other agents. Typically, an agent receives private observations that provide a partial view of the true state of the environment. However, in realistic settings, a harsh environment may cause one or more agents to exhibit arbitrarily faulty or malicious behavior, which can be enough to make current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues raised by agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work on extremely noisy environments assumed that the noise intensity of the environment was known in advance. When the noise intensity changes, that method has to reconfigure the model to learn in the new environment, which limits its practical application. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which selects information that is not only correct but also relevant for each agent at every time step in noisy environments. The multihead attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
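The abstract does not specify FT-Attn's architecture, so the sketch below shows only the generic mechanism it names: scaled dot-product attention, where a receiving agent's query scores each incoming message and a message whose key is far from the query gets a small weight. The function names and the plain-list vectors are hypothetical, and the multi-head variant omits the learned projections a real layer would apply.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector],
           values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Single-head scaled dot-product attention: weight each incoming message
    (value) by softmax(q . k / sqrt(d)) and return the weighted sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights


def multi_head_attend(query: Vector, keys: List[Vector], values: List[Vector],
                      n_heads: int) -> Vector:
    """Split every vector into n_heads equal chunks, attend within each chunk,
    and concatenate the per-head outputs (learned projections omitted)."""
    d = len(query)
    if d % n_heads != 0:
        raise ValueError("vector size must be divisible by n_heads")
    h = d // n_heads
    out: Vector = []
    for i in range(n_heads):
        sl = slice(i * h, (i + 1) * h)
        o, _ = attend(query[sl], [k[sl] for k in keys], [v[sl] for v in values])
        out.extend(o)
    return out
```

Because the weights are a softmax over all senders, down-weighting an unreliable message automatically shifts weight toward the others; in a trained model the keys and queries would be learned so that this happens for faulty senders.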


2020, Vol 12 (17), pp. 2770
Author(s): Yajie Chai, Kun Fu, Xian Sun, Wenhui Diao, Zhiyuan Yan, ...

Deep convolutional neural networks have made significant progress in cloud detection. However, balancing model compactness against high accuracy has always been a challenge in cloud detection for large-scale remote sensing imagery. A promising way to tackle this problem is knowledge distillation, which usually lets a compact model mimic the output of a cumbersome model to achieve better generalization. However, vanilla knowledge distillation methods cannot properly distill the characteristics of clouds in remote sensing images. In this paper, we propose a novel self-attention knowledge distillation approach for compact and accurate cloud detection, named Bidirectional Self-Attention Distillation (Bi-SAD). Bi-SAD lets a model learn from itself without adding parameters or supervision. With bidirectional layer-wise feature learning, the model obtains a better representation of the cloud’s textural and semantic information, so that cloud boundaries become more detailed and predictions more reliable. Experiments on a dataset acquired by the GaoFen-1 satellite show that Bi-SAD strikes a good balance between compactness and accuracy and outperforms vanilla distillation methods. Compared with state-of-the-art cloud detection models, its parameter size and FLOPs are reduced by factors of 100 and 400, respectively, with a small drop in accuracy.
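Self-attention distillation derives an attention map from each layer's activations and trains layers to mimic the maps of neighbouring layers, so the model learns from itself without a separate teacher. The sketch below computes such maps in plain Python using the channel-wise sum of |F|^p followed by a spatial softmax, as in earlier self-attention distillation work for lane detection; this map definition is an assumption, since the abstract does not give Bi-SAD's. It also assumes all maps already share one spatial size (real networks would resize them first), and the function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column]


def attention_map(features: FeatureMap, p: int = 2) -> List[float]:
    """Collapse channels with sum_c |F_c|^p, then apply a spatial softmax.
    Returns the map flattened to a list of length H*W."""
    c, h, w = len(features), len(features[0]), len(features[0][0])
    flat = [sum(abs(features[k][i][j]) ** p for k in range(c))
            for i in range(h) for j in range(w)]
    m = max(flat)  # stabilise the softmax
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    return [e / total for e in exps]


def layerwise_mimic_loss(maps: List[List[float]]) -> float:
    """Sum of squared L2 distances between attention maps of adjacent layers."""
    loss = 0.0
    for a, b in zip(maps, maps[1:]):
        loss += sum((x - y) ** 2 for x, y in zip(a, b))
    return loss
```

In an autograd framework, the direction of distillation is set by which map in each adjacent pair is treated as a fixed (detached) target; a bidirectional scheme would use both directions. That choice affects gradients only, not the scalar value computed here.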


Author(s): Man Luo, Wenzhe Zhang, Tianyou Song, Kun Li, Hongming Zhu, ...

Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the world. One of the key challenges in their operation is vehicle rebalancing, i.e., repositioning EVs across stations to better satisfy future user demand. This is particularly challenging in the shared EV context because i) the range of EVs is limited while charging time is substantial, which constrains the rebalancing options; and ii) as a new mobility trend, most current EV sharing systems are still expanding their station networks, so the targets for rebalancing can change over time. To tackle these challenges, we model the rebalancing task as a Multi-Agent Reinforcement Learning (MARL) problem that directly takes the range and charging properties of the EVs into account. We propose a novel policy optimization approach with action cascading, which isolates the non-stationarity locally, and use two connected networks to solve the formulated MARL problem. We evaluate the proposed approach using a simulator calibrated with one year of operation data from a real EV sharing system. Results show that our approach significantly outperforms the state of the art, offering up to a 14% gain in order satisfaction rate and a 12% increase in net revenue.


Author(s): Cheng Li, Levi Fussell, Taku Komura

Simultaneous control of multiple characters has been extensively researched for computer games and computer animation, with applications such as crowd simulation, controlling two characters carrying objects or fighting each other, and controlling a team of characters playing collective sports. With advances in deep learning and reinforcement learning, there is growing interest in applying multi-agent reinforcement learning (MARL) to control characters intelligently and produce realistic movements. In this paper, we survey the state-of-the-art MARL techniques that are applicable to character control. We then survey papers that use MARL for multi-character control and discuss possible future research directions.


2021
Author(s): Nikolaos Al. Papadopoulos, Marti Sanchez-Fibla

Reductionist Multi-Agent Reinforcement Learning simulations offer a range of opportunities for modeling and understanding complex social phenomena such as common-pool appropriation. In this paper, we propose a multiplayer variant of Battle-of-the-Exes as a suitable testbed for experimenting with fair and efficient coordination and turn-taking among selfish agents. Going beyond the fairness and efficiency measures found in the literature, we propose a novel measure for evaluating turn-taking coordination that is robust to the number of agents and episodes in a system. Six variants of this measure are defined, called Alternation Measures (ALT). Compared with state-of-the-art measures, the ALT measures were found sufficient to capture the desired properties (alternation, and fair and efficient distribution); they were therefore benchmarked and tested in a series of experiments with Reinforcement Learning agents, with the aim of contributing novel tools for a deeper understanding of emergent social outcomes.


Author(s): Wei Qiu, Haipeng Chen, Bo An

Over the past decades, Electronic Toll Collection (ETC) systems have proven capable of alleviating traffic congestion in urban areas. Dynamic Electronic Toll Collection (DETC) was recently proposed to further improve the efficiency of ETC by setting tolls dynamically based on traffic dynamics. However, computing the optimal DETC scheme is computationally difficult, and existing approaches are limited to small-scale or partial road networks, which significantly restricts the adoption of DETC. To this end, we propose a novel multi-agent reinforcement learning (RL) approach for DETC. We make several key contributions: i) an enhancement of the state-of-the-art RL-based method, with a deep neural network representation of the policy and value functions and a temporal difference learning framework to accelerate the update of target values; ii) a novel edge-based graph convolutional neural network (eGCN) to extract the spatio-temporal correlations of the road network state features; and iii) a novel cooperative multi-agent reinforcement learning (MARL) scheme that divides the whole road network into partitions according to their geographic and economic characteristics and trains a tolling agent for each partition. Experimental results show that our approach scales up to realistic-sized problems with robust performance and significantly outperforms the state-of-the-art method.
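The abstract does not describe eGCN's layers. As a minimal sketch of what an edge-based graph convolution can look like, the code below treats road segments (graph edges) as the units that carry features, links two segments when traffic can flow from one into the other (adjacency in the line graph), and applies one mean-aggregation step followed by a linear map and ReLU. The aggregation rule, the weight matrix and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # directed road segment (from_node, to_node)


def edge_neighbors(edges: List[Edge]) -> Dict[Edge, List[Edge]]:
    """Two segments are neighbours if one ends where the other starts,
    i.e. they are adjacent in the line graph of the road network."""
    nbrs: Dict[Edge, List[Edge]] = {e: [] for e in edges}
    for a in edges:
        for b in edges:
            if a != b and (a[1] == b[0] or b[1] == a[0]):
                nbrs[a].append(b)
    return nbrs


def edge_conv(features: Dict[Edge, List[float]],
              nbrs: Dict[Edge, List[Edge]],
              weight: List[List[float]]) -> Dict[Edge, List[float]]:
    """One convolution step over edges: average each segment's features with
    its neighbours', apply the linear map `weight`, then ReLU."""
    out: Dict[Edge, List[float]] = {}
    for e, f in features.items():
        group = [f] + [features[n] for n in nbrs[e]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out[e] = [max(0.0, sum(w * x for w, x in zip(row, mean))) for row in weight]
    return out
```

Stacking such steps lets each segment's representation absorb traffic state from segments further away, which is the kind of spatial correlation the abstract attributes to eGCN; the temporal part would need additional machinery not shown here.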


Author(s): Yue Hu, Juntao Li, Xi Li, Gang Pan, Mingliang Xu

As an important and challenging problem in artificial intelligence (AI) game playing, StarCraft micromanagement involves a dynamic, adversarial game-playing process with complex multi-agent control over a large action space. In this paper, we propose a novel knowledge-guided, agent-tactic-aware learning scheme, opponent-guided tactic learning (OGTL), to cope with this micromanagement problem. The proposed scheme adopts a two-stage cascaded learning strategy that not only transfers human tactical knowledge from human-made opponent agents to our AI agents but also improves their adversarial ability. Powered by reinforcement learning, this knowledge-guided, agent-tactic-aware scheme guides the AI agents to achieve high win rates while accelerating policy exploration in a tactic-interpretable fashion. Experimental results demonstrate the effectiveness of the proposed scheme against state-of-the-art approaches in several benchmark combat scenarios.


2005, Vol 20 (1), pp. 63-90
Author(s): Karl Tuyls, Ann Nowé

In this paper, we survey the basics of reinforcement learning and (evolutionary) game theory, as applied to the field of multi-agent systems. The paper has three parts. We start with an overview of the fundamentals of reinforcement learning. Next, we summarize the most important aspects of evolutionary game theory. Finally, we discuss the state of the art in multi-agent reinforcement learning and its mathematical connection with evolutionary game theory.
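A well-known instance of the connection between reinforcement learning and evolutionary game theory is that, in the continuous-time limit, Cross's learning model follows the replicator dynamics dx_i/dt = x_i((Ax)_i - x^T A x), where x is the population (or mixed) strategy and A the payoff matrix. The sketch below integrates these dynamics with a simple Euler step; the step size, horizon and the Prisoner's Dilemma payoffs (R=3, S=0, T=5, P=1) are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def replicator_step(x: List[float], A: Matrix, dt: float) -> List[float]:
    """One Euler step of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    fitness = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # (A x)_i
    avg = sum(xi * fi for xi, fi in zip(x, fitness))                # x^T A x
    return [xi + dt * xi * (fi - avg) for xi, fi in zip(x, fitness)]


def simulate(x0: List[float], A: Matrix, dt: float = 0.01,
             steps: int = 5000) -> List[float]:
    """Integrate the replicator dynamics from x0 for steps * dt time units."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, A, dt)
    return x


# Row player's payoffs in the Prisoner's Dilemma; strategy 0 = cooperate, 1 = defect.
PRISONERS_DILEMMA: Matrix = [[3.0, 0.0], [5.0, 1.0]]
```

Starting from 90% cooperation, `simulate([0.9, 0.1], PRISONERS_DILEMMA)` drives the cooperator share toward zero, because defection earns more than cooperation against every population mix. The update also keeps the strategy shares summing to one, since the growth terms x_i((Ax)_i - x^T A x) cancel when summed over i.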

