Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Many real-world problems, such as robot control and soccer game, are naturally modeled as sparse-interaction multi-agent systems. Reutilizing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining the MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks called direct value function transfer and NSR-based value function transfer. We conduct experiments in image-based grid world, multi-agent particle environment (MPE) and Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning and meanwhile get better asymptotic performance.

Download Full-text

A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems

Machine Learning and Knowledge Extraction ◽

10.3390/make1020035 ◽

2019 ◽

Vol 1 (2) ◽

pp. 590-610

Author(s):

Zohreh Akbari ◽

Rainer Unland

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Single Agent ◽

Sequential Decision Making ◽

Multi Agent Systems ◽

Sequential Decision ◽

Agent Systems ◽

Novel Approach ◽

Markov Decision ◽

Multi Agent

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or multi-agent systems that either consist of agents with individual goals and decision making capabilities, which are influenced by other agent’s decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms and so far, systems introduced as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three sub-swarms of homogeneous agents, which either, according to their capabilities, deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.

Download Full-text

Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems

IET Control Theory and Applications ◽

10.1049/iet-cta.2018.6266 ◽

2019 ◽

Vol 13 (17) ◽

pp. 2866-2876

Author(s):

Syed Ali Asad Rizvi ◽

Zongli Lin

Keyword(s):

Reinforcement Learning ◽

Discrete Time ◽

Output Feedback ◽

Multi Agent Systems ◽

Agent Systems ◽

Optimal Output ◽

Multi Agent

Download Full-text

A novel optimal bipartite consensus control scheme for unknown multi-agent systems via model-free reinforcement learning

Applied Mathematics and Computation ◽

10.1016/j.amc.2019.124821 ◽

2020 ◽

Vol 369 ◽

pp. 124821 ◽

Cited By ~ 10

Author(s):

Zhinan Peng ◽

Jiangping Hu ◽

Kaibo Shi ◽

Rui Luo ◽

Rui Huang ◽

...

Keyword(s):

Reinforcement Learning ◽

Multi Agent Systems ◽

Consensus Control ◽

Agent Systems ◽

Model Free ◽

Control Scheme ◽

Multi Agent ◽

Bipartite Consensus

Download Full-text

Formation Control using Simplified Reinforcement Learning for Multi-agent systems with State Delay

10.23919/ccc52363.2021.9549357 ◽

2021 ◽

Author(s):

Wentai Shao ◽

Yutao Chen ◽

Jie Huang

Keyword(s):

Reinforcement Learning ◽

Formation Control ◽

Multi Agent Systems ◽

State Delay ◽

Agent Systems ◽

Multi Agent

Download Full-text

Optimized Backstepping Consensus Control Using Reinforcement Learning for a Class of Nonlinear Strict-Feedback-Dynamic Multi-Agent Systems

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3105548 ◽

2021 ◽

pp. 1-13

Author(s):

Guoxing Wen ◽

C. L. Philip Chen

Keyword(s):

Reinforcement Learning ◽

Multi Agent Systems ◽

Consensus Control ◽

Agent Systems ◽

Multi Agent ◽

Strict Feedback

Download Full-text

Optimal robust formation control for heterogeneous multi‐agent systems based on reinforcement learning

International Journal of Robust and Nonlinear Control ◽

10.1002/rnc.5828 ◽

2021 ◽

Author(s):

Bing Yan ◽

Peng Shi ◽

Cheng‐Chew Lim ◽

Zhiyuan Shi

Keyword(s):

Reinforcement Learning ◽

Formation Control ◽

Multi Agent Systems ◽

Agent Systems ◽

Multi Agent

Download Full-text

Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.566.572 ◽

2012 ◽

Vol 566 ◽

pp. 572-579

Author(s):

Abdolkarim Niazi ◽

Norizah Redzuan ◽

Raja Ishak Raja Hamzah ◽

Sara Esfandiari

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Multi Agent Systems ◽

Combined Model ◽

Q Learning ◽

Agent Systems ◽

Multi Agent ◽

Case Base ◽

Case Base Reasoning ◽

Robotic Tool

In this paper, a new algorithm based on case base reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of the reinforcement learning algorithms. RL algorithms are very useful for solving wide variety decision problems when their models are not available and they must make decision correctly in every state of system, such as multi agent systems, artificial control systems, robotic, tool condition monitoring and etc. In the propose method, we investigate how making improved action selection in reinforcement learning (RL) algorithm. In the proposed method, the new combined model using case base reasoning systems and a new optimized function is proposed to select the action, which led to an increase in algorithms based on Q-learning. The algorithm mentioned was used for solving the problem of cooperative Markov’s games as one of the models of Markov based multi-agent systems. The results of experiments Indicated that the proposed algorithms perform better than the existing algorithms in terms of speed and accuracy of reaching the optimal policy.

Download Full-text