CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

In many real-world scenarios, a team of agents must coordinate with each other to compete against an opponent. The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in the inefficiency of the existing algorithms, e.g., Counterfactual Regret Minimization (CFR). To address this problem, we propose a new framework of CFR: CFR-MIX. Firstly, we propose a new strategy representation that represents a joint action strategy using individual strategies of all agents and a consistency relationship to maintain the cooperation between agents. To compute the equilibrium with individual strategies under the CFR framework, we transform the consistency relationship between strategies to the consistency relationship between the cumulative regret values. Furthermore, we propose a novel decomposition method over cumulative regret values to guarantee the consistency relationship between the cumulative regret values. Finally, we introduce our new algorithm CFR-MIX which employs a mixing layer to estimate cumulative regret values of joint actions as a non-linear combination of cumulative regret values of individual actions. Experimental results show that CFR-MIX outperforms existing algorithms on various games significantly.

Download Full-text

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/66 ◽

2019 ◽

Author(s):

Edward Lockhart ◽

Marc Lanctot ◽

Julien Pérolat ◽

Jean-Baptiste Lespiau ◽

Dustin Morrill ◽

...

Keyword(s):

Function Approximation ◽

Imperfect Information ◽

Convergence Rates ◽

Convergence Result ◽

Extensive Form ◽

Worst Case ◽

Regret Minimization ◽

Imperfect Information Games ◽

Policy Optimization ◽

Zero Sum

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms.

Download Full-text

Parallel Counterfactual Regret Minimization in Crowdsourcing Imperfect-information Expanded Game

10.1109/ispa-bdcloud-socialcom-sustaincom52081.2021.00195 ◽

2021 ◽

Author(s):

Jie Zhang ◽

Kefan Li ◽

Baoming Zhang ◽

Ming Xu ◽

Chongjun Wang

Keyword(s):

Imperfect Information ◽

Regret Minimization

Download Full-text

Solving Large Extensive-Form Games with Strategy Constraints

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33011861 ◽

2019 ◽

Vol 33 ◽

pp. 1861-1868

Author(s):

Trevor Davis ◽

Kevin Waugh ◽

Michael Bowling

Keyword(s):

Private Information ◽

Imperfect Information ◽

Risk Mitigation ◽

Solution Concept ◽

Optimal Strategies ◽

Linear Constraints ◽

Convex Constraints ◽

Extensive Form ◽

Extensive Form Games ◽

Large Extensive Form Games

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zerosum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.

Download Full-text

An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information

Journal of Artificial Intelligence Research ◽

10.1613/jair.4477 ◽

2014 ◽

Vol 51 ◽

pp. 829-866 ◽

Cited By ~ 14

Author(s):

B. Bosansky ◽

C. Kiekintveld ◽

V. Lisy ◽

M. Pechoucek

Keyword(s):

Nash Equilibrium ◽

Imperfect Information ◽

Search Algorithm ◽

Main Idea ◽

Substantial Improvement ◽

Extensive Form ◽

Extensive Form Games ◽

Solution Algorithms ◽

Restricted Game ◽

Zero Sum

Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; (5) experimental analysis of our algorithm on several games, including an approximate version of the algorithm.

Download Full-text

Smoothing Method for Approximate Extensive-Form Perfect Equilibrium

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/42 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christian Kroer ◽

Gabriele Farina ◽

Tuomas Sandholm

Keyword(s):

Nash Equilibrium ◽

Imperfect Information ◽

Large Scale ◽

Nash Equilibria ◽

Convex Polytope ◽

Solution Concept ◽

Extensive Form ◽

Game Tree ◽

Equilibrium Refinements ◽

Perfect Equilibria

Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms.Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games) where some probabilities are low while possibly heading toward zero in the limit, and exact equilibrium computation where the low probabilities are actually zero.

Download Full-text

Equilibrium in behavior strategies in infinite extensive form games with imperfect information

Economic Theory ◽

10.1007/bf01212472 ◽

1992 ◽

Vol 2 (4) ◽

pp. 481-494

Author(s):

Subir K. Chakrabarti

Keyword(s):

Imperfect Information ◽

Extensive Form ◽

Extensive Form Games ◽

Behavior Strategies

Download Full-text

Double-Oracle Sampling Method for Stackelberg Equilibrium Approximation in General-Sum Extensive-Form Games

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i02.5578 ◽

2020 ◽

Vol 34 (02) ◽

pp. 2054-2061 ◽

Cited By ~ 1

Author(s):

Jan Karwowski ◽

Jecek Mańdziuk

Keyword(s):

Imperfect Information ◽

Sampling Method ◽

Stackelberg Equilibrium ◽

Test Cases ◽

Extensive Form ◽

Strategy Space ◽

Monte Carlo Tree Search ◽

Sequential Games ◽

Simulation Based ◽

Time Requirements

The paper presents a new method for approximating Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic as it does not rely on any specific properties of a particular game model. The method is based on iterative interleaving of the two following phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy is an optimal response. The above solution scheme is evaluated with respect to expected Leader's utility and time requirements on three sets of interception games with variable characteristics, played on graphs. A comparison with three state-of-the-art MILP/LP-based methods shows that in vast majority of test cases proposed simulation-based approach leads to optimal Leader's strategies, while excelling the competitive methods in terms of better time scalability and lower memory requirements.

Download Full-text

A Game-Theoretic Account of Responsibility Allocation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/244 ◽

2021 ◽

Author(s):

Christel Baier ◽

Florian Funke ◽

Rupak Majumdar

Keyword(s):

Imperfect Information ◽

Fundamental Problem ◽

Multi Agent Systems ◽

Extensive Form ◽

Responsibility Attribution ◽

Agent Interaction ◽

Responsibility Ascription ◽

Multi Agent ◽

Game Theoretic ◽

Extensive Form Game

When designing or analyzing multi-agent systems, a fundamental problem is responsibility ascription: to specify which agents are responsible for the joint outcome of their behaviors and to which extent. We model strategic multi-agent interaction as an extensive form game of imperfect information and define notions of forward (prospective) and backward (retrospective) responsibility. Forward responsibility identifies the responsibility of a group of agents for an outcome along all possible plays, whereas backward responsibility identifies the responsibility along a given play. We further distinguish between strategic and causal backward responsibility, where the former captures the epistemic knowledge of players along a play, while the latter formalizes which players – possibly unknowingly – caused the outcome. A formal connection between forward and backward notions is established in the case of perfect recall. We further ascribe quantitative responsibility through cooperative game theory. We show through a number of examples that our approach encompasses several prior formal accounts of responsibility attribution.

Download Full-text

Learning a humanoid robot interface by embedding a low-dimensional command manifold into a high-dimensional joint action space

2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids) ◽

10.1109/humanoids.2013.7030022 ◽

2013 ◽

Cited By ~ 2

Author(s):

Yuka Ariki ◽

Tetsunari Inamura ◽

Jun Morimoto

Keyword(s):

Joint Action ◽

Humanoid Robot ◽

Action Space ◽

High Dimensional ◽

Low Dimensional

Download Full-text

Automated Construction of Bounded-Loss Imperfect-Recall Abstractions in Extensive-Form Games (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/701 ◽

2020 ◽

Author(s):

Jiří Čermák ◽

Viliam Lisý ◽

Branislav Bošanský

Keyword(s):

Extensive Form ◽

Original Game ◽

Extensive Form Games ◽

Regret Minimization ◽

Imperfect Recall ◽

Domain Specific ◽

Approximate Nash Equilibrium ◽

Information Sets ◽

History Of ◽

Large Extensive Form Games

Information abstraction is one of the methods for tackling large extensive-form games (EFGs). Removing some information available to players reduces the memory required for computing and storing strategies. We present novel domain-independent abstraction methods for creating very coarse abstractions of EFGs that still compute strategies that are (near) optimal in the original game. First, the methods start with an arbitrary abstraction of the original game (domain-specific or the coarsest possible). Next, they iteratively detect which information is required in the abstract game so that a (near) optimal strategy in the original game can be found and include this information into the abstract game. Moreover, the methods are able to exploit imperfect-recall abstractions where players can even forget the history of their own actions. We present two algorithms that follow these steps -- FPIRA, based on fictitious play, and CFR+IRA, based on counterfactual regret minimization. The experimental evaluation confirms that our methods can closely approximate Nash equilibrium of large games using abstraction with only 0.9% of information sets of the original game.

Download Full-text