Bi-Level Actor-Critic for Multi-Agent Coordination

2020 ◽  
Vol 34 (05) ◽  
pp. 7325-7332
Author(s):  
Haifeng Zhang ◽  
Weizhe Chen ◽  
Zeren Huang ◽  
Minne Li ◽  
Yaodong Yang ◽  
...  

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We find that the proposed bi-level actor-critic algorithm successfully converges to the Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway merge environment.
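
The Stackelberg idea above can be made concrete in a two-player matrix game. Below is a minimal sketch, not the paper's bi-level actor-critic: the leader's commitment is found by enumerating its actions and assuming the follower best-responds, with both payoff matrices invented for illustration.

```python
import numpy as np

# Toy bimatrix game (hypothetical payoffs): the Stackelberg equilibrium
# lets the leader commit first, assuming the follower best-responds.
R_leader = np.array([[5, 0],
                     [8, 1]])
R_follower = np.array([[5, 0],
                       [0, 1]])

def stackelberg(R_l, R_f):
    """Enumerate leader actions; the follower best-responds to each."""
    best = None
    for a in range(R_l.shape[0]):
        b = int(np.argmax(R_f[a]))   # follower's best response to a
        if best is None or R_l[a, b] > best[2]:
            best = (a, b, R_l[a, b])
        # note: follower ties are broken by argmax's first index
    return best

a, b, payoff = stackelberg(R_leader, R_follower)
print(f"leader plays {a}, follower replies {b}, leader payoff {payoff}")
```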

2012 ◽  
Vol 566 ◽  
pp. 572-579
Author(s):  
Abdolkarim Niazi ◽  
Norizah Redzuan ◽  
Raja Ishak Raja Hamzah ◽  
Sara Esfandiari

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are very useful for solving a wide variety of decision problems when models are not available and a correct decision must be made in every state of the system, as in multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in an RL algorithm: a new combined model, using a case-based reasoning system and a new optimized function, is proposed to select actions, which increases the convergence rate of algorithms based on Q-learning. The algorithm was used to solve cooperative Markov games, one of the models of Markov-based multi-agent systems. The experimental results indicated that the proposed algorithm outperforms existing algorithms in terms of the speed and accuracy of reaching the optimal policy.
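
A rough sketch of the general idea, combining a case base with epsilon-greedy Q-learning. This is not the authors' exact model; the case-retention rule, the threshold, and all names here are illustrative assumptions.

```python
import random
from collections import defaultdict

Q = defaultdict(float)    # (state, action) -> estimated value
case_base = {}            # state -> action that previously paid off

def select_action(state, actions, epsilon=0.1):
    if state in case_base and random.random() > epsilon:
        return case_base[state]                        # reuse a stored case
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # exploit Q

def update(state, action, reward, next_state, actions,
           alpha=0.1, gamma=0.95):
    # standard Q-learning backup
    target = reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    if reward > 0:
        case_base[state] = action   # retain the case for faster reuse
```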


Respuestas ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 53-61
Author(s):  
David Luviano Cruz ◽  
Francesco José García Luna ◽  
Luis Asunción Pérez Domínguez

This paper presents a hybrid control proposal for multi-agent systems that exploits the advantages of reinforcement learning and nonparametric functions. A modified version of the Q-learning algorithm provides training data for a kernel, and this approach yields a suboptimal set of actions to be used by the agents. The proposed algorithm is experimentally tested in a path-generation task for mobile robots in an unknown environment.
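
One plausible reading of that pipeline, sketched under stated assumptions: tabular Q-learning yields (state, action, Q-value) samples, and a kernel regressor (scikit-learn's KernelRidge here, standing in for whatever kernel the authors used) generalizes those values to unvisited states. All data below is synthetic.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
states = rng.uniform(0, 1, size=(200, 2))   # visited 2-D states
actions = rng.integers(0, 4, size=200)      # 4 discrete actions
q_values = rng.normal(size=200)             # stand-in for learned Q-values

# features: state coordinates plus the action index
X = np.column_stack([states, actions])
model = KernelRidge(kernel="rbf", alpha=1e-2).fit(X, q_values)

def greedy_action(state, n_actions=4):
    """Pick the action whose kernel-predicted Q-value is highest."""
    candidates = np.array([[*state, a] for a in range(n_actions)])
    return int(np.argmax(model.predict(candidates)))

print(greedy_action([0.3, 0.7]))
```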


2021 ◽  
Author(s):  
Zikai Feng ◽  
Yuanyuan Wu ◽  
Mengxing Huang ◽  
Di Wu

To avoid malicious jamming by an intelligent unmanned aerial vehicle (UAV) against ground users in downlink communications, a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning is studied in this paper. In this method, ground users aim to learn the best mobility strategies to avoid the UAV's jamming. The problem is modeled as a Stackelberg game to describe the competitive interaction between the UAV jammer (leader) and the ground users (followers). To reduce the computational cost of solving for the equilibrium of this complex game with a large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed that decouples the hybrid game into several sub-Markov games and updates the actor and critic networks of the UAV jammer and the ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves comparable performance with lower time complexity than the benchmark strategies. The well-trained HMAPPO can obtain the optimal jamming strategy and the optimal anti-jamming strategies, which approximate the Stackelberg equilibrium (SE).
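
The two-timescale structure can be sketched schematically. The stub agents below stand in for the PPO actor-critics; this is not the paper's HMAPPO code, and the update period, action space, and observation handling are placeholder assumptions.

```python
import random

class StubAgent:
    """Placeholder for a PPO actor-critic: acts randomly, 'update' is a no-op."""
    def act(self, obs):
        return random.choice([-1, 0, 1])   # e.g. move left / stay / right
    def ppo_update(self, *transition):
        pass                               # real code would run PPO here

LEADER_PERIOD = 10                          # leader updates K times slower
leader = StubAgent()                        # UAV jammer (Stackelberg leader)
followers = [StubAgent() for _ in range(3)] # ground users (followers)

for step in range(100):
    obs = step                               # placeholder observation
    jam = leader.act(obs)                    # leader commits first
    moves = [f.act(obs) for f in followers]  # followers respond
    for f, m in zip(followers, moves):       # fast timescale: every step
        f.ppo_update(obs, m)
    if step % LEADER_PERIOD == 0:            # slow timescale: every K steps
        leader.ppo_update(obs, jam)
```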


2012 ◽  
Vol 27 (1) ◽  
pp. 1-31 ◽  
Author(s):  
Laetitia Matignon ◽  
Guillaume J. Laurent ◽  
Nadine Le Fort-Piat

In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
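
Among the surveyed variants, hysteretic Q-learning has a particularly compact update rule: two learning rates, with the smaller one applied to negative temporal-difference errors, keep each independent learner optimistic so that a good joint action is not unlearned merely because teammates are still exploring. A minimal sketch:

```python
from collections import defaultdict

ALPHA, BETA = 0.1, 0.01   # BETA < ALPHA: penalties are absorbed slowly
GAMMA = 0.95
Q = defaultdict(float)

def hysteretic_update(state, action, reward, next_state, actions):
    # temporal-difference error of the standard Q-learning target
    delta = (reward + GAMMA * max(Q[(next_state, a)] for a in actions)
             - Q[(state, action)])
    rate = ALPHA if delta >= 0 else BETA   # asymmetric step sizes
    Q[(state, action)] += rate * delta
```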


2021 ◽  
Vol 71 ◽  
pp. 925-951
Author(s):  
Justin Fu ◽  
Andrea Tacchetti ◽  
Julien Perolat ◽  
Yoram Bachrach

A core question in multi-agent systems is understanding the motivations for an agent's actions based on their behavior. Inverse reinforcement learning provides a framework for extracting utility functions from observed agent behavior, casting the problem as finding domain parameters which induce such a behavior from rational decision makers. We show how to efficiently and scalably extend inverse reinforcement learning to multi-agent settings, by reducing the multi-agent problem to N single-agent problems while still satisfying rationality conditions such as strong rationality. However, we observe that rewards learned naively tend to lack insightful structure, which causes them to produce undesirable behavior when optimized in games with different players from those encountered during training. We further investigate conditions under which rewards or utility functions can be precisely identified, on problem domains such as normal-form and Markov games, as well as auctions, where we show we can learn reward functions that properly generalize to new settings.
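
The reduction to N single-agent problems can be illustrated schematically. In the sketch below, a toy feature-expectation heuristic stands in for a real single-agent IRL solver, and all demonstration data is synthetic; only the decomposition structure reflects the idea described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_features = 3, 4
# joint demonstrations: per agent, per timestep, a feature vector phi(s, a_i);
# the other agents' observed actions are folded into the environment
demos = rng.normal(size=(n_agents, 500, n_features))

def single_agent_irl(features):
    """Toy stand-in: reward weights proportional to empirical feature
    expectations (a real solver would match them against a policy's
    expectations under rationality constraints)."""
    mu = features.mean(axis=0)
    return mu / np.linalg.norm(mu)

# one independent single-agent IRL problem per agent
rewards = [single_agent_irl(demos[i]) for i in range(n_agents)]
for i, w in enumerate(rewards):
    print(f"agent {i} reward weights: {np.round(w, 3)}")
```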


2021 ◽  
Vol 22 (2) ◽  
pp. 1-38
Author(s):  
Julian Gutierrez ◽  
Paul Harrenstein ◽  
Giuseppe Perelli ◽  
Michael Wooldridge

We define and investigate a novel notion of expressiveness for temporal logics that is based on game-theoretic equilibria of multi-agent systems. We use iterated Boolean games as our abstract model of multi-agent systems [Gutierrez et al. 2013, 2015a]. In such a game, each agent i has a goal γi, represented using (a fragment of) Linear Temporal Logic (LTL). The goal γi captures agent i's preferences, in the sense that the models of γi represent system behaviours that would satisfy i. Each player i controls a subset Φi of the Boolean variables, and at each round in the game, player i is at liberty to choose values for the variables in Φi in any way that she sees fit. Play continues for an infinite sequence of rounds, and so as players act they collectively trace out a model for LTL, which for every player will either satisfy or fail to satisfy their goal. Players are assumed to act strategically, taking into account the goals of other players, in an attempt to bring about computations satisfying their goal. In this setting, we apply the standard game-theoretic concept of (pure) Nash equilibria. The (possibly empty) set of Nash equilibria of an iterated Boolean game can be understood as inducing a set of computations, each computation representing one way the system could evolve if players chose strategies that together constitute a Nash equilibrium. Such a set of equilibrium computations expresses a temporal property, which may or may not be expressible within a particular fragment of LTL. The new notion of expressiveness that we formally define and investigate is then as follows: what temporal properties are characterised by the Nash equilibria of games in which agent goals are expressed in specific fragments of LTL? We formally define and investigate this notion of expressiveness for a range of LTL fragments. For example, a very natural question is the following: suppose we have an iterated Boolean game in which every goal is represented using a particular fragment L of LTL: is it then always the case that the equilibria of the game can be characterised within L? We show that this is not true in general.
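
A toy instance makes the equilibrium-induced computations tangible. In the sketch below (not from the paper), each player plays a constant strategy for the one variable she controls, so the infinite trace repeats a single valuation and goals such as "always phi" reduce to phi holding in that valuation; pure Nash equilibria are then found by checking unilateral deviations.

```python
from itertools import product

# Player 0 controls p, player 1 controls q; goals are properties the
# repeated valuation must satisfy.
goals = [
    lambda p, q: p == q,    # player 0 wants "always (p <-> q)"
    lambda p, q: p and q,   # player 1 wants "always (p and q)"
]

def nash_equilibria():
    eqs = []
    for p, q in product([False, True], repeat=2):
        # a profitable deviation: the goal fails here but a unilateral
        # change of the player's own variable would satisfy it
        dev0 = goals[0](not p, q) and not goals[0](p, q)
        dev1 = goals[1](p, not q) and not goals[1](p, q)
        if not dev0 and not dev1:
            eqs.append((p, q))
    return eqs

# two equilibria, hence two distinct equilibrium computations
print(nash_equilibria())   # [(False, False), (True, True)]
```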


Energies ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 3654
Author(s):  
Nastaran Gholizadeh ◽  
Petr Musilek

In recent years, machine learning methods have found numerous applications in power systems for load forecasting, voltage control, power quality monitoring, anomaly detection, etc. Distributed learning is a subfield of machine learning and a descendant of the multi-agent systems field. It is a decentralized, collaborative approach to machine learning designed to handle large data sizes, solve complex learning problems, and increase privacy. Moreover, it can reduce the risk of a single point of failure compared to fully centralized approaches and lower the bandwidth and central storage requirements. This paper introduces three existing distributed learning frameworks and reviews the applications that have been proposed for them in power systems so far. It summarizes the methods, benefits, and challenges of distributed learning frameworks in power systems and identifies gaps in the literature for future studies.
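
As a concrete example of the privacy argument, here is a minimal sketch of one round of federated-style distributed learning, with a toy least-squares objective and synthetic client data standing in for, say, per-site load measurements: clients train locally, and only model parameters are shared and averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 5, 8
global_w = np.zeros(dim)

def local_update(w, data, lr=0.1, epochs=3):
    """Toy local SGD on a least-squares objective; raw data stays on-site."""
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# synthetic per-client datasets (never sent to the server)
clients = [(rng.normal(size=(50, dim)), rng.normal(size=50))
           for _ in range(n_clients)]

for round_ in range(10):
    local = [local_update(global_w.copy(), d) for d in clients]
    global_w = np.mean(local, axis=0)   # server averages parameters only

print(np.round(global_w, 3))
```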

