Evaluating Strategic Structures in Multi-Agent Inverse Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12594 ◽

2021 ◽

Vol 71 ◽

pp. 925-951

Author(s):

Justin Fu ◽

Andrea Tacchetti ◽

Julien Perolat ◽

Yoram Bachrach

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Utility Functions ◽

Decision Makers ◽

Multi Agent Systems ◽

Inverse Reinforcement Learning ◽

Agent Behavior ◽

Markov Games ◽

Multi Agent ◽

Reward Functions

A core question in multi-agent systems is understanding the motivations for an agent's actions based on their behavior. Inverse reinforcement learning provides a framework for extracting utility functions from observed agent behavior, casting the problem as finding domain parameters which induce such a behavior from rational decision makers. We show how to efficiently and scalably extend inverse reinforcement learning to multi-agent settings, by reducing the multi-agent problem to N single-agent problems while still satisfying rationality conditions such as strong rationality. However, we observe that rewards learned naively tend to lack insightful structure, which causes them to produce undesirable behavior when optimized in games with different players from those encountered during training. We further investigate conditions under which rewards or utility functions can be precisely identified, on problem domains such as normal-form and Markov games, as well as auctions, where we show we can learn reward functions that properly generalize to new settings.
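
The reduction of a multi-agent problem to N single-agent problems can be illustrated on a two-player normal-form game. The sketch below is not the authors' algorithm. It assumes logit (maximum-entropy) rationality, a fixed and observed opponent strategy, and hypothetical payoffs (`u_row`, `opponent`). Holding the opponent's strategy fixed turns each player's problem into a single-agent choice, and inverting the logit model recovers expected payoffs only up to an additive constant, which is one simple instance of the identifiability issue the abstract raises.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def expected_payoffs(payoff: Matrix, opponent: Sequence[float]) -> List[float]:
    """Expected payoff of each own action when the opponent's mixed strategy is fixed.

    Fixing the opponent turns this player's problem into a single-agent choice.
    """
    return [sum(u * p for u, p in zip(row, opponent)) for row in payoff]


def logit_response(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Assumed rationality model: P(a) is proportional to exp(value(a) / temperature)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def recover_values(strategy: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Invert the logit model. Values are identified only up to an additive
    constant, so they are reported relative to action 0."""
    base = math.log(strategy[0])
    return [temperature * (math.log(p) - base) for p in strategy]


# Hypothetical row-player payoffs (prisoner's-dilemma style) and observed opponent play.
u_row = [[3.0, 0.0], [5.0, 1.0]]
opponent = [0.6, 0.4]

observed = logit_response(expected_payoffs(u_row, opponent))
print(recover_values(observed))  # action 1 is worth about 1.6 more than action 0

# Shifting every payoff by the same constant leaves observed behavior unchanged,
# so the shift cannot be identified from this data alone.
shifted = [[u + 10.0 for u in row] for row in u_row]
print(logit_response(expected_payoffs(shifted, opponent)))
```

Only the two expected payoffs are recovered here, not the four entries of `u_row`. Observing the same player against several different opponent strategies adds equations and narrows the payoff matrix down further, although in this toy model a constant added to any single column (one opponent action) still remains unidentified.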

Multi-agent reinforcement learning using ordinal action selection and approximate policy iteration

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691316500533 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1650053

Author(s):

Daxue Liu ◽

Jun Wu ◽

Xin Xu

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Action Selection ◽

Policy Iteration ◽

Common Interest ◽

Policy Space ◽

Markov Games ◽

Approximate Policy Iteration ◽

Multi Agent ◽

Agent Coordination

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain, dynamic environments. However, generalization and scalability to large problem sizes, already problematic in single-agent RL, are even more formidable obstacles in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability of MARL algorithms in common-interest Markov games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration, not only to simplify the policy space and eliminate conflicts in multi-agent coordination, but also to approximate near-optimal policies for Markov games with large state spaces. Based on the policy space simplified by ordinal action selection, OAPI implements distributed approximate policy iteration using online least-squares policy iteration (LSPI). This yields multi-agent coordination with good convergence properties and reduced computational complexity. Simulation results on a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.
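
OAPI builds on least-squares policy iteration. The sketch below shows only that building block, in its standard batch (offline) form with tabular one-hot features. It does not reproduce OAPI's ordinal action selection, its distributed scheme, or the online LSPI variant the abstract mentions. The toy two-state chain and the helper names (`solve`, `lstdq`, `lspi`, `one_hot`) are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[int, int, float, int]  # (state, action, reward, next_state)
Features = Callable[[int, int], List[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lstdq(samples: Sequence[Sample], phi: Features, policy: Dict[int, int],
          k: int, gamma: float, reg: float = 1e-6) -> List[float]:
    """LSTD-Q: fit weights w with Q(s, a) ~ w . phi(s, a) for a fixed policy."""
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]  # small ridge term
    b = [0.0] * k
    for s, act, r, s2 in samples:
        f, f2 = phi(s, act), phi(s2, policy[s2])
        diff = [f[j] - gamma * f2[j] for j in range(k)]
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                a[i][j] += f[i] * diff[j]
    return solve(a, b)


def lspi(samples: Sequence[Sample], phi: Features, states: Sequence[int],
         actions: Sequence[int], k: int, gamma: float, max_iter: int = 20):
    """Alternate LSTD-Q evaluation and greedy improvement until the policy is stable."""
    policy = {s: actions[0] for s in states}
    w: List[float] = [0.0] * k
    for _ in range(max_iter):
        w = lstdq(samples, phi, policy, k, gamma)

        def q(s: int, act: int) -> float:
            return sum(wi * fi for wi, fi in zip(w, phi(s, act)))

        new_policy = {s: max(actions, key=lambda act: q(s, act)) for s in states}
        if new_policy == policy:
            break
        policy = new_policy
    return w, policy


# Hypothetical 2-state chain: action 0 stays, action 1 switches state,
# and reward 1 is earned whenever the transition lands in state 1.
def one_hot(s: int, act: int) -> List[float]:
    v = [0.0] * 4
    v[2 * s + act] = 1.0
    return v


data = [(s, act, 1.0 if (s ^ act) == 1 else 0.0, s ^ act) for s in (0, 1) for act in (0, 1)]
weights, greedy = lspi(data, one_hot, states=[0, 1], actions=[0, 1], k=4, gamma=0.9)
print(greedy)  # {0: 1, 1: 0}: move to state 1, then stay there
```

With one-hot features and one sample per state-action pair, LSTD-Q reduces to exact policy evaluation, so the learned weights match the tabular Q-values (about 9 and 10 here). With richer features, the same loop performs approximate policy iteration.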

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/65 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yong Liu ◽

Yujing Hu ◽

Yang Gao ◽

Yingfeng Chen ◽

Changjie Fan

Keyword(s):

Reinforcement Learning ◽

Knowledge Transfer ◽

Value Function ◽

Single Agent ◽

Multi Agent Systems ◽

Agent Systems ◽

Markov Decision ◽

Dimensional State Space ◽

Multi Agent ◽

Function Transfer

Many real-world problems, such as robot control and soccer games, are naturally modeled as multi-agent systems with sparse interactions. Reusing single-agent knowledge in such systems can greatly accelerate multi-agent learning. Previous works rely on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and is not suitable for problems with high-dimensional state spaces. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks, called direct value function transfer and NSR-based value function transfer. We conduct experiments in an image-based grid world, the multi-agent particle environment (MPE), and the Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning while also achieving better asymptotic performance.
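
For reference, an N-step return discounts the next N rewards and bootstraps from a value estimate at step N. The sketch below computes it. The helper `nsr_gap` is a hypothetical similarity proxy added for illustration; it is not the paper's definition of NSR-based MDP similarity, which the abstract does not spell out.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """N-step return with N = len(rewards):
    G = r_t + gamma*r_{t+1} + ... + gamma^(N-1)*r_{t+N-1} + gamma^N * V(s_{t+N})."""
    g = bootstrap_value
    for r in reversed(rewards):  # fold from the end so each step applies one discount
        g = r + gamma * g
    return g


def nsr_gap(returns_a: Sequence[float], returns_b: Sequence[float]) -> float:
    """Hypothetical similarity proxy: mean absolute difference between N-step
    returns estimated in two MDPs on matched start states (smaller = more similar)."""
    if len(returns_a) != len(returns_b) or not returns_a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - b) for a, b in zip(returns_a, returns_b)) / len(returns_a)


print(n_step_return([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5))  # 2.0
print(nsr_gap([2.0, 1.0], [1.5, 1.0]))  # 0.25
```

Because an N-step return needs only short rollouts and a value estimate, a comparison of this kind is cheap to evaluate even in high-dimensional state spaces, which is the scalability argument the abstract makes against the bisimulation metric.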

A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems

Machine Learning and Knowledge Extraction ◽

10.3390/make1020035 ◽

2019 ◽

Vol 1 (2) ◽

pp. 590-610

Author(s):

Zohreh Akbari ◽

Rainer Unland

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Single Agent ◽

Sequential Decision Making ◽

Multi Agent Systems ◽

Sequential Decision ◽

Agent Systems ◽

Novel Approach ◽

Markov Decision ◽

Multi Agent

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenario and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems, which either consist of agents with individual goals and decision-making capabilities that are influenced by other agents’ decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, a closer look at the available swarm RL algorithms reveals areas that still require attention. Most studies focus on homogeneous swarms, and so far, systems introduced as Heterogeneous Swarms (HetSs) include only a few (i.e., two or three) sub-swarms of homogeneous agents, which, according to their capabilities, either deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents that were originally designed to solve different problems, and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. Specifically, the affinity between two agents, which measures how compatible they are at working together on a specific sub-problem, is used to design a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.

Bi-Level Actor-Critic for Multi-Agent Coordination

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6226 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7325-7332

Author(s):

Haifeng Zhang ◽

Weizhe Chen ◽

Zeren Huang ◽

Minne Li ◽

Yaodong Yang ◽

...

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Learning Algorithm ◽

Stackelberg Equilibrium ◽

Multi Agent Systems ◽

Matrix Games ◽

Markov Games ◽

The Arts ◽

Convergence Point ◽

Multi Agent

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, so these methods lack a mechanism for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against state-of-the-art methods. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment.
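
The difference between arbitrary NE selection and Stackelberg selection can be seen in a small cooperative matrix game. The sketch below enumerates pure strategies exhaustively; it is not the bi-level actor-critic method, only a brute-force illustration of the two solution concepts. It assumes a two-player bimatrix game, pure strategies only, and ties in the follower's best response broken in the leader's favor (the strong Stackelberg convention). The game `coord` is hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def follower_best_responses(u_f: Matrix, a: int) -> List[int]:
    """All follower actions maximizing the follower's payoff given leader action a."""
    best = max(u_f[a])
    return [b for b, v in enumerate(u_f[a]) if v == best]


def stackelberg(u_l: Matrix, u_f: Matrix) -> Tuple[int, int]:
    """Pure-strategy strong Stackelberg equilibrium: the leader commits first,
    the follower best-responds, and follower ties are broken in the leader's favor."""
    best: Optional[Tuple[int, int]] = None
    for a in range(len(u_l)):
        for b in follower_best_responses(u_f, a):
            if best is None or u_l[a][b] > u_l[best[0]][best[1]]:
                best = (a, b)
    assert best is not None, "payoff matrices must be non-empty"
    return best


def pure_nash(u_l: Matrix, u_f: Matrix) -> List[Tuple[int, int]]:
    """All pure-strategy Nash equilibria: no player gains by deviating alone."""
    n, m = len(u_l), len(u_l[0])
    eqs = []
    for a, b in product(range(n), range(m)):
        leader_ok = all(u_l[a][b] >= u_l[x][b] for x in range(n))
        follower_ok = all(u_f[a][b] >= u_f[a][y] for y in range(m))
        if leader_ok and follower_ok:
            eqs.append((a, b))
    return eqs


# Hypothetical cooperative coordination game: both players receive the same payoff.
coord = [[2.0, 0.0], [0.0, 1.0]]
print(pure_nash(coord, coord))    # [(0, 0), (1, 1)]
print(stackelberg(coord, coord))  # (0, 0)
```

In `coord`, both (0, 0) and (1, 1) are pure Nash equilibria, so an NE-seeking learner may settle on either one. Letting the leader commit first selects (0, 0), which Pareto-dominates (1, 1).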

Multi-agent deep reinforcement learning: a survey

Artificial Intelligence Review ◽

10.1007/s10462-021-09996-w ◽

2021 ◽

Author(s):

Sven Gronauer ◽

Klaus Diepold

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Single Agent ◽

Research Area ◽

Learning Gains ◽

Multiple Agents ◽

Agent Behavior ◽

Multi Agent ◽

Training Schemes ◽

Future Work

Reinforcement learning has achieved remarkable success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning is rapidly gaining traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, we divide the main content into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. Second, we consider the emergent patterns of agent behavior in cooperative, competitive, and mixed scenarios. Third, we systematically enumerate challenges that arise exclusively in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area.

Training Coordination Proxy Agents Using Reinforcement Learning

Handbook of Research on Agent-Based Societies ◽

10.4018/978-1-60566-236-7.ch011 ◽

2009 ◽

pp. 158-172 ◽

Cited By ~ 1

Author(s):

Myriam Abramson

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Machine Learning Techniques ◽

Multi Agent Systems ◽

Adjustable Autonomy ◽

Novel Approach ◽

Learning Techniques ◽

Multi Agent ◽

Mixed Initiative ◽

Learning Team

In heterogeneous multi-agent systems, where human and non-human agents coexist, intelligent proxy agents can help smooth out fundamental differences. In this context, delegating the coordination role to proxy agents can improve the overall outcome of a task at the expense of human cognitive overload due to switching subtasks. Stability and commitment are characteristics of human teamwork, but they must not prevent the detection of better opportunities. In addition, coordination proxy agents must be trained from examples as a single agent, but must interact with multiple agents. We apply machine learning techniques to the task of learning team preferences from mixed-initiative interactions and compare the outcomes of different simulated user patterns. This chapter introduces a novel approach to the adjustable autonomy of coordination proxies based on reinforcement learning of abstract actions. In conclusion, some consequences of the symbiotic relationship that such an approach suggests are discussed.

Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems

IET Control Theory and Applications ◽

10.1049/iet-cta.2018.6266 ◽

2019 ◽

Vol 13 (17) ◽

pp. 2866-2876

Author(s):

Syed Ali Asad Rizvi ◽

Zongli Lin

Keyword(s):

Reinforcement Learning ◽

Discrete Time ◽

Output Feedback ◽

Multi Agent Systems ◽

Agent Systems ◽

Optimal Output ◽

Multi Agent

Multi-Agent Reinforcement Learning: A Review of Challenges and Applications

Applied Sciences ◽

10.3390/app11114948 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4948

Author(s):

Lorenzo Canese ◽

Gian Carlo Cardarilli ◽

Luca Di Nunzio ◽

Rocco Fazzolari ◽

Daniele Giardino ◽

...

Keyword(s):

Reinforcement Learning ◽

Mathematical Models ◽

Learning Algorithms ◽

Single Agent ◽

Critical Issues ◽

Multi Agent ◽

Pros And Cons ◽

Application Fields

In this review, we present an analysis of the most widely used multi-agent reinforcement learning algorithms. Starting with single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account when extending them to multi-agent scenarios. The analyzed algorithms are grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields and point out its pros and cons. The described multi-agent algorithms are compared in terms of the characteristics most important for multi-agent reinforcement learning applications: nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performance of the considered methods.

A novel optimal bipartite consensus control scheme for unknown multi-agent systems via model-free reinforcement learning

Applied Mathematics and Computation ◽

10.1016/j.amc.2019.124821 ◽

2020 ◽

Vol 369 ◽

pp. 124821 ◽

Cited By ~ 10

Author(s):

Zhinan Peng ◽

Jiangping Hu ◽

Kaibo Shi ◽

Rui Luo ◽

Rui Huang ◽

...

Keyword(s):

Reinforcement Learning ◽

Multi Agent Systems ◽

Consensus Control ◽

Agent Systems ◽

Model Free ◽

Control Scheme ◽

Multi Agent ◽

Bipartite Consensus

On-Demand Channel Bonding in Heterogeneous WLANs: A Multi-Agent Deep Reinforcement Learning Approach

Sensors ◽

10.3390/s20102789 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2789 ◽

Cited By ~ 1

Author(s):

Hang Qi ◽

Hao Huang ◽

Zhiqun Hu ◽

Xiangming Wen ◽

Zhaoming Lu

Keyword(s):

Reinforcement Learning ◽

Transmission Rate ◽

Single Agent ◽

Time Of Day ◽

Action Space ◽

Traffic Load ◽

Traffic Demand ◽

Channel Bonding ◽

On Demand ◽

Multi Agent

To meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding was introduced in the IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we propose an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs, in which APs have different channel bonding capabilities, to reduce transmission delay. In this problem, the state space is continuous and the action space is discrete. However, with single-agent DRL, the size of the action space increases exponentially with the number of APs, which severely slows down learning. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results show that the proposed algorithm converges well and achieves lower delay than other algorithms.
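
The exponential growth of the action space can be made concrete with a little arithmetic. The per-AP action counts below are hypothetical (not taken from the paper), and the sketch only compares output sizes; it does not implement MADDPG or O-DCB.

```python
import math
from typing import Sequence


def joint_action_space(per_ap_actions: Sequence[int]) -> int:
    """Single-agent formulation: one policy picks a joint action, so the sizes multiply."""
    return math.prod(per_ap_actions)


def per_agent_outputs(per_ap_actions: Sequence[int]) -> int:
    """Multi-agent formulation: each AP's actor scores only its own actions, so the sizes add."""
    return sum(per_ap_actions)


# Hypothetical counts: each entry is (primary-channel choices x bonding bandwidths)
# for one AP; APs with fewer bonding capabilities have fewer options.
aps = [12, 12, 8, 4]
print(joint_action_space(aps), per_agent_outputs(aps))  # 4608 36
```

With five identical APs of 12 actions each, the joint space already holds 12^5 = 248,832 actions, whereas five per-AP actors need only 60 outputs in total.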
