Multi-agent Inverse Reinforcement Learning for Certain General-sum Stochastic Games

2019 ◽ Vol 66 ◽ pp. 473-502
Author(s): Xiaomin Lin ◽ Stephen C. Adams ◽ Peter A. Beling

This paper addresses the problem of multi-agent inverse reinforcement learning (MIRL) in a two-player general-sum stochastic game framework. Five variants of MIRL are considered: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL, each distinguished by its solution concept. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that aim to maximize the total game value. In problem uCE-MIRL, agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing the total game value. Problem uNE-MIRL is similar to uCE-MIRL in that the total game value is maximized, but the agents are assumed to play a Nash equilibrium. Problems advE-MIRL and cooE-MIRL assume agents are playing an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to these five problems under the assumption that the game observer either knows or is able to accurately estimate the players' policies and solution concepts. For uCS-MIRL, we first develop a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method. For uCE-MIRL, we formulate a linear program subject to constraints that are necessary and sufficient for the observed policies to be a correlated equilibrium. The objective is to choose a solution that not only minimizes the difference in total game value between the observed bi-policy and a local uCS, but also maximizes the scale of the solution. We apply a similar treatment to uNE-MIRL. The remaining two problems can be solved efficiently by exploiting the uniqueness of their solutions and setting up a convex optimization problem. Results are validated on various benchmark grid-world games.
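To make the correlated-equilibrium component concrete, the sketch below sets up a single-stage, max-margin variant of the uCE-MIRL linear program: the observed joint policy is held fixed, the two players' stage-game rewards are the unknowns, the CE incentive constraints are required to hold with a common margin that is maximized (a proxy for "maximizing the scale of the solution"), and rewards are bounded in [-1, 1]. This is a minimal sketch using scipy, not the authors' full formulation over the stochastic game; the function name inverse_ce_rewards, the margin device, and the reward bounds are assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def inverse_ce_rewards(p_obs, n1, n2):
    """Max-margin LP recovering stage-game rewards (R1, R2) under which the
    observed joint policy p_obs (shape (n1, n2)) is a correlated equilibrium.

    Variables are the flattened reward matrices plus a margin t; the CE
    incentive constraints must hold with slack at least t, and t is
    maximized. Rewards are bounded in [-1, 1] to fix the scale.
    """
    nR = n1 * n2
    nvar = 2 * nR + 1                        # [R1, R2, t]
    A_ub, b_ub = [], []

    def idx1(i, j):                          # position of R1[i, j]
        return i * n2 + j

    def idx2(i, j):                          # position of R2[i, j]
        return nR + i * n2 + j

    # Player 1: obeying a recommended row i beats deviating to i_dev by >= t.
    for i in range(n1):
        for i_dev in range(n1):
            if i_dev == i:
                continue
            row = np.zeros(nvar)
            for j in range(n2):
                row[idx1(i, j)] -= p_obs[i, j]
                row[idx1(i_dev, j)] += p_obs[i, j]
            row[-1] = 1.0                    # ... + t <= 0
            A_ub.append(row)
            b_ub.append(0.0)

    # Player 2: symmetric constraints on the column player's rewards R2.
    for j in range(n2):
        for j_dev in range(n2):
            if j_dev == j:
                continue
            row = np.zeros(nvar)
            for i in range(n1):
                row[idx2(i, j)] -= p_obs[i, j]
                row[idx2(i, j_dev)] += p_obs[i, j]
            row[-1] = 1.0
            A_ub.append(row)
            b_ub.append(0.0)

    c = np.zeros(nvar)
    c[-1] = -1.0                             # linprog minimizes, so maximize t
    bounds = [(-1, 1)] * (2 * nR) + [(0, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:nR].reshape(n1, n2), res.x[nR:2 * nR].reshape(n1, n2)

# Example: an observed joint policy over a 2x2 stage game.
p_obs = np.array([[0.50, 0.25],
                  [0.25, 0.00]])
R1_hat, R2_hat = inverse_ce_rewards(p_obs, 2, 2)
```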

2008 ◽ Vol 33 ◽ pp. 575-613
Author(s): I. Ashlagi ◽ D. Monderer ◽ M. Tennenholtz

Correlated equilibrium generalizes Nash equilibrium by allowing correlation devices. It captures the idea that in many systems there exists a trusted administrator who can recommend behavior to a set of agents but cannot enforce it, which makes the solution concept especially appropriate to the study of multi-agent systems in AI. Aumann gave an example of a game with a correlated equilibrium in which the agents' welfare (the expected sum of the players' utilities) is greater than their welfare in every mixed-strategy equilibrium. Following the idea initiated by the price-of-anarchy literature, this suggests the study of two major measures of the value of correlation in a game with nonnegative payoffs: 1. The ratio of the maximal welfare obtained in a correlated equilibrium to the maximal welfare obtained in a mixed-strategy equilibrium, which we call the mediation value. 2. The ratio of the maximal welfare to the maximal welfare obtained in a correlated equilibrium, which we call the enforcement value. In this work we initiate the study of the mediation and enforcement values, providing several general results on the value of correlation as captured by these concepts. We also present a set of results for the more specialized case of congestion games, a class of games that has received considerable attention in the recent literature.
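For a concrete feel for the mediation value's numerator, the maximal welfare attainable in a correlated equilibrium of a bimatrix game can be computed with a small linear program: maximize the expected total payoff over joint-action distributions subject to the CE incentive constraints. Below is a minimal sketch using scipy, run on an Aumann-style game of Chicken; the function name max_welfare_ce is chosen here for illustration. Computing the denominator (the maximal welfare over mixed-strategy Nash equilibria) additionally requires enumerating Nash equilibria, e.g. with a dedicated game-theory solver, and the enforcement value's numerator is simply the largest entry of the welfare matrix.

```python
import numpy as np
from scipy.optimize import linprog

def max_welfare_ce(A, B):
    """Maximal-welfare correlated equilibrium of a bimatrix game via an LP.

    A, B: payoff matrices of the row and column player (same shape).
    Returns the CE distribution over joint actions and its welfare.
    """
    n1, n2 = A.shape
    A_ub, b_ub = [], []

    # Row player: no gain from deviating from a recommended row i to i_dev.
    for i in range(n1):
        for i_dev in range(n1):
            if i_dev == i:
                continue
            g = np.zeros((n1, n2))
            g[i, :] = A[i_dev, :] - A[i, :]       # expected deviation gain <= 0
            A_ub.append(g.ravel())
            b_ub.append(0.0)

    # Column player: no gain from deviating from a recommended column j to j_dev.
    for j in range(n2):
        for j_dev in range(n2):
            if j_dev == j:
                continue
            g = np.zeros((n1, n2))
            g[:, j] = B[:, j_dev] - B[:, j]
            A_ub.append(g.ravel())
            b_ub.append(0.0)

    res = linprog(
        c=-(A + B).ravel(),                       # maximize total welfare
        A_ub=np.array(A_ub), b_ub=np.array(b_ub),
        A_eq=np.ones((1, n1 * n2)), b_eq=[1.0],   # probabilities sum to 1
        bounds=[(0, None)] * (n1 * n2),
        method="highs",
    )
    return res.x.reshape(n1, n2), float((A + B).ravel() @ res.x)

# Aumann-style game of Chicken: the best correlated equilibrium achieves
# strictly higher welfare than the best mixed-strategy Nash equilibrium.
A = np.array([[6, 2],
              [7, 0]])
B = A.T
p_ce, ce_welfare = max_welfare_ce(A, B)
print(p_ce, ce_welfare)
```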


2021 ◽ Vol 71 ◽ pp. 925-951
Author(s): Justin Fu ◽ Andrea Tacchetti ◽ Julien Perolat ◽ Yoram Bachrach

A core question in multi-agent systems is understanding the motivations for an agent's actions based on its behavior. Inverse reinforcement learning provides a framework for extracting utility functions from observed agent behavior, casting the problem as finding domain parameters that induce such behavior from rational decision makers. We show how to efficiently and scalably extend inverse reinforcement learning to multi-agent settings by reducing the multi-agent problem to N single-agent problems while still satisfying rationality conditions such as strong rationality. However, we observe that rewards learned naively tend to lack insightful structure, which causes them to produce undesirable behavior when optimized in games with different players from those encountered during training. We further investigate conditions under which rewards or utility functions can be precisely identified in problem domains such as normal-form games, Markov games, and auctions, where we show that we can learn reward functions that generalize properly to new settings.
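The reduction to N single-agent problems can be pictured as follows: for each agent, hold the other agents' observed policies fixed, marginalize them out of the joint transition model, and run any standard single-agent IRL method on the induced MDP. The sketch below shows this marginalization step for a two-player Markov game; the function name, tensor shapes, and the commented-out IRL call are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def induced_mdp_for_agent1(T, pi2):
    """Reduce a two-player Markov game to a single-agent MDP for agent 1
    by marginalizing out agent 2's fixed (observed) policy.

    T:   joint transition tensor, shape (S, A1, A2, S)
    pi2: agent 2's observed policy, shape (S, A2)

    Returns P1 with P1[s, a1, s'] = sum_a2 pi2(s, a2) * T[s, a1, a2, s'].
    """
    return np.einsum("sabt,sb->sat", T, pi2)

# The same construction, applied per agent, yields N induced MDPs:
#   P1 = induced_mdp_for_agent1(T, pi2)
#   r1_hat = single_agent_irl(P1, demos_agent1)  # any standard IRL routine
#                                                 # (hypothetical helper)
# and symmetrically for agent 2, which is the per-agent subproblem of the
# reduction described above.
```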


Author(s): Ritesh Noothigattu ◽ Djallel Bouneffouf ◽ Nicholas Mattei ◽ Rachita Chandra ◽ Piyush Madan ◽ ...

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents not only to maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual-bandit-based orchestrator then picks between the two policies, constraint-based and environment-reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either the reward-maximizing or the constrained policy. In addition, the orchestrator is transparent about which policy is employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, to act within the demonstrated constraints, and to mix these two behaviors in complex ways.
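As a simplified illustration of the orchestration step, the sketch below uses a plain epsilon-greedy bandit (context-free, for brevity) to pick between the constraint-based and environment-reward-based policies at every time step and to report which one was used. The class name, interface, and the epsilon-greedy rule are assumptions made here; the paper's orchestrator is a contextual bandit rather than this context-free stand-in.

```python
import numpy as np

class PolicyOrchestrator:
    """Epsilon-greedy orchestrator over two fixed policies (a context-free
    stand-in for a contextual-bandit orchestrator)."""

    def __init__(self, policies, epsilon=0.1):
        self.policies = policies          # e.g. [constrained_policy, reward_policy]
        self.epsilon = epsilon
        self.counts = np.zeros(len(policies))
        self.values = np.zeros(len(policies))

    def act(self, state, rng):
        """Pick a policy (arm), query it for an action, and expose the arm
        index so the choice is transparent at every time step."""
        if rng.random() < self.epsilon:
            arm = int(rng.integers(len(self.policies)))
        else:
            arm = int(np.argmax(self.values))
        return arm, self.policies[arm](state)

    def update(self, arm, reward):
        """Incremental mean update of the chosen arm's value estimate."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Minimal usage with two placeholder policies:
rng = np.random.default_rng(0)
orch = PolicyOrchestrator([lambda s: "safe_action", lambda s: "greedy_action"])
arm, action = orch.act(state=None, rng=rng)
orch.update(arm, reward=1.0)
```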

