Robust experience replay sampling for multi-agent reinforcement learning

This work presents SMIX(λ), a sample-efficient and effective value-based method for reinforcement learning in multi-agent environments (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method combines three elements: 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markovian property, and 3) adopting experience-replay-style off-policy training. Interestingly, we show that there is an inherent connection between SMIX(λ) and the earlier off-policy Q(λ) approach for single-agent learning. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that SMIX(λ) outperforms several state-of-the-art MARL methods by a large margin, and that it can be used as a general tool to improve the overall performance of a CTDE-type method by enhancing the evaluation quality of its CVF. We open-source our code at: https://github.com/chaovven/SMIX.
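As a rough illustration of the λ-return mentioned in the abstract, the sketch below computes textbook λ-returns for one finite episode by backward recursion. It is not taken from the SMIX code base: the function name `lambda_returns`, its arguments, and the default `gamma` and `lam` values are assumptions made for this example, and the abstract does not say how SMIX(λ) builds its exact training target.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Textbook lambda-returns for one finite episode (illustrative only).

    rewards[t]     : reward received after step t
    next_values[t] : value estimate of the state reached after step t
                     (use 0.0 when that state is terminal)
    """
    returns = [0.0] * len(rewards)
    # At the last step there is no later return, so bootstrap from the value.
    next_return = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda-return.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns
```

Setting `lam=0` reduces this to one-step TD targets (lower variance, more bias), while `lam=1` with a terminal value of zero gives Monte Carlo returns (no bootstrap bias, more variance); intermediate values trade the two off.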

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i10.7247 ◽ 2020 ◽ Vol 34 (10) ◽ pp. 13949-13950

Author(s): Wang Qisheng ◽ Wang Qichao ◽ Li Xiao

Keyword(s): Reinforcement Learning ◽ Supervised Learning ◽ Experimental Results ◽ State Action ◽ Reward Function ◽ Current State ◽ Learning Speed ◽ Communication Method ◽ Experience Replay ◽ Multi Agent

Exploration efficiency is a challenge for multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the interaction among agents. Less informative rewards also restrict the learning speed of MARL in comparison with the informative labels in supervised learning. This paper proposes a novel communication method that helps agents focus on different exploration subareas, guiding MARL to accelerate exploration. We propose a predictive network to forecast the reward of the current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to help agents take better advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.
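The abstract does not describe how its improved prioritized experience replay works, so the sketch below shows only the standard proportional scheme, in which transition i is drawn with probability p_i^α / Σ_k p_k^α. The function names, the `alpha` default, and the plain-list buffer are assumptions made for illustration, not the authors' implementation.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw buffer indices (with replacement) according to their priority."""
    rng = rng or random.Random()
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

With `alpha=0` every transition is equally likely, which recovers uniform replay; larger values concentrate sampling on high-priority transitions, for example those with large TD error.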

Multi-Agent Deep Reinforcement Learning for Decentralized Cooperative Traffic Signal Control

CICTP 2020 ◽ 10.1061/9780784483053.039 ◽ 2020

Author(s): Yang Zhao ◽ Jian-Ming Hu ◽ Ming-Yang Gao ◽ Zuo Zhang

Keyword(s): Reinforcement Learning ◽ Traffic Signal ◽ Signal Control ◽ Traffic Signal Control ◽ Multi Agent

Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems

IET Control Theory and Applications ◽ 10.1049/iet-cta.2018.6266 ◽ 2019 ◽ Vol 13 (17) ◽ pp. 2866-2876

Author(s): Syed Ali Asad Rizvi ◽ Zongli Lin

Keyword(s): Reinforcement Learning ◽ Discrete Time ◽ Output Feedback ◽ Multi Agent Systems ◽ Agent Systems ◽ Optimal Output ◽ Multi Agent

Multi-agent deep reinforcement learning with type-based hierarchical group communication

Applied Intelligence ◽ 10.1007/s10489-020-02065-9 ◽ 2021

Author(s): Hao Jiang ◽ Dianxi Shi ◽ Chao Xue ◽ Yajie Wang ◽ Gongju Wang ◽ ...

Keyword(s): Reinforcement Learning ◽ Group Communication ◽ Multi Agent ◽ Hierarchical Group

Multi-Agent Deep Reinforcement Learning Based Cooperative Edge Caching for Ultra-Dense Next-Generation Networks

IEEE Transactions on Communications ◽ 10.1109/tcomm.2020.3044298 ◽ 2020 ◽ pp. 1-1

Author(s): Shuangwu Chen ◽ Zhen Yao ◽ Xiaofeng Jiang ◽ Jian Yang ◽ Lajos Hanzo

Keyword(s): Reinforcement Learning ◽ Next Generation Networks ◽ Next Generation ◽ Multi Agent ◽ Edge Caching

Coordinated Ramp Metering Control Based on Multi-Agent Reinforcement Learning

2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC) ◽ 10.1109/yac51587.2020.9337711 ◽ 2020

Author(s): Jiyuan Tan ◽ Qianqian Qiu ◽ Weiwei Guo

Keyword(s): Reinforcement Learning ◽ Ramp Metering ◽ Multi Agent

Multi-Agent Deep Reinforcement Learning for Vehicular Computation Offloading in IoT

IEEE Internet of Things Journal ◽ 10.1109/jiot.2020.3040768 ◽ 2020 ◽ pp. 1-1 ◽ Cited By ~ 1

Author(s): Xiaoyu Zhu ◽ Yueyi Luo ◽ Anfeng Liu ◽ Md Zakirul Alam Bhuiyan ◽ Shaobo Zhang

Keyword(s): Reinforcement Learning ◽ Computation Offloading ◽ Multi Agent

Multi-Agent Reinforcement Learning: A Review of Challenges and Applications

Applied Sciences ◽ 10.3390/app11114948 ◽ 2021 ◽ Vol 11 (11) ◽ pp. 4948

Author(s): Lorenzo Canese ◽ Gian Carlo Cardarilli ◽ Luca Di Nunzio ◽ Rocco Fazzolari ◽ Daniele Giardino ◽ ...

Keyword(s): Reinforcement Learning ◽ Mathematical Models ◽ Learning Algorithms ◽ Single Agent ◽ Critical Issues ◽ Multi Agent ◽ Pros And Cons ◽ Application Fields

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.

Robust experience replay sampling for multi-agent reinforcement learning

Parallelized Synchronous Multi-agent Deep Reinforcement Learning with Experience Replay Memory

SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
