Smoothing Method for Approximate Extensive-Form Perfect Equilibrium

Author(s):  
Christian Kroer ◽  
Gabriele Farina ◽  
Tuomas Sandholm

Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms. Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games) where some probabilities are low while possibly heading toward zero in the limit, and exact equilibrium computation where the low probabilities are actually zero.

Author(s):  
Trevor Davis ◽  
Kevin Waugh ◽  
Michael Bowling

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.


2014 ◽  
Vol 51 ◽  
pp. 829-866 ◽  
Author(s):  
B. Bosansky ◽  
C. Kiekintveld ◽  
V. Lisy ◽  
M. Pechoucek

Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; (5) experimental analysis of our algorithm on several games, including an approximate version of the algorithm.


Author(s):  
Alfredo Garro

Game Theory (Von Neumann & Morgenstern, 1944) is a branch of applied mathematics and economics that studies situations (games) in which self-interested, interacting players act to maximize their returns; the return of each player therefore depends on his own behaviour and on the behaviour of the other players. Game Theory, which plays an important role in the social and political sciences, has recently drawn attention in new academic fields ranging from algorithmic mechanism design to cybernetics. However, a fundamental problem that must be solved to apply Game Theory effectively in real-world applications is the definition of well-founded solution concepts for a game and the design of efficient algorithms for computing them. A widely accepted solution concept for games in which any cooperation among the players must be self-enforcing (non-cooperative games) is the Nash Equilibrium. In particular, a Nash Equilibrium is a set of strategies, one for each player of the game, such that no player can benefit by changing his strategy unilaterally, i.e. while the other players keep their strategies unchanged (Nash, 1951). The problem of computing Nash Equilibria in non-cooperative games is considered one of the most important open problems in Complexity Theory (Papadimitriou, 2001). Daskalakis, Goldberg, and Papadimitriou (2005) showed that the problem of computing a Nash equilibrium in a game with four or more players is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) (Papadimitriou, 1991); moreover, Chen and Deng extended this result to 2-player games (Chen & Deng, 2005).
However, even in the two-player case, the best known algorithm has an exponential worst-case running time (Savani & von Stengel, 2004); furthermore, if the computation of equilibria with simple additional properties is required, the problem immediately becomes NP-hard (Bonifaci, Di Iorio, & Laura, 2005) (Conitzer & Sandholm, 2003) (Gilboa & Zemel, 1989) (Gottlob, Greco, & Scarcello, 2003). Motivated by these results, recent studies have dealt with the problem of efficiently computing Nash Equilibria by exploiting approaches based on the concepts of learning and evolution (Fudenberg & Levine, 1998) (Maynard Smith, 1982). In these approaches the Nash Equilibria of a game are not statically computed but are the result of the evolution of a system composed of agents playing the game. In particular, after a number of rounds each agent will learn to play a strategy that, under the hypothesis of the agents’ rationality, will be one of the Nash equilibria of the game (Benaim & Hirsch, 1999) (Carmel & Markovitch, 1996). This article presents SALENE, a Multi-Agent System (MAS) for learning Nash Equilibria in non-cooperative games, which is based on the above-mentioned concepts.
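
The Nash equilibrium definition quoted in this abstract (no player can benefit from a unilateral change of strategy) can be checked directly for pure strategy profiles of a small two-player game. The Python sketch below is only an illustration under stated assumptions: the function names and the Prisoner's Dilemma payoffs are hypothetical, it enumerates pure profiles only, and it is not how SALENE learns equilibria.

```python
def is_pure_nash(payoff_row, payoff_col, i, j):
    """Return True if the pure profile (i, j) is a Nash equilibrium.

    payoff_row[a][b] and payoff_col[a][b] are the payoffs to the row and
    column player when the row player picks a and the column player picks b.
    """
    # The row player must not gain by switching rows while column j is fixed.
    row_ok = all(payoff_row[a][j] <= payoff_row[i][j]
                 for a in range(len(payoff_row)))
    # The column player must not gain by switching columns while row i is fixed.
    col_ok = all(payoff_col[i][b] <= payoff_col[i][j]
                 for b in range(len(payoff_col[i])))
    return row_ok and col_ok


def pure_nash_equilibria(payoff_row, payoff_col):
    """Enumerate all pure-strategy Nash equilibria by brute force."""
    return [(i, j)
            for i in range(len(payoff_row))
            for j in range(len(payoff_row[i]))
            if is_pure_nash(payoff_row, payoff_col, i, j)]


# Hypothetical Prisoner's Dilemma payoffs (action 0 = cooperate, 1 = defect).
PD_ROW = [[-1, -3], [0, -2]]
PD_COL = [[-1, 0], [-3, -2]]
```

Brute-force enumeration like this is only practical for tiny games and finds pure equilibria only; the hardness results cited in the abstract concern the general (mixed) problem.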


Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 782 ◽  
Author(s):  
Christos Papadimitriou ◽  
Georgios Piliouras

In 1950, Nash proposed a natural equilibrium solution concept for games, since called the Nash equilibrium, and proved that all finite games have at least one. The proof is through a simple yet ingenious application of Brouwer’s (or, in another version, Kakutani’s) fixed point theorem, the most sophisticated result in his era’s topology; in fact, recent algorithmic work has established that Nash equilibria are computationally equivalent to fixed points. In this paper, we propose a new class of universal non-equilibrium solution concepts arising from an important theorem in the topology of dynamical systems that was unavailable to Nash. This approach starts with both a game and a learning dynamics, defined over mixed strategies. The Nash equilibria are fixpoints of the dynamics, but the system behavior is captured by an object far more general than the Nash equilibrium that is known in dynamical systems theory as the chain recurrent set. Informally, once we focus on this solution concept (this notion of “the outcome of the game”), every game behaves like a potential game with the dynamics converging to these states. In other words, unlike Nash equilibria, this solution concept is algorithmic in the sense that it has a constructive proof of existence. We characterize this solution for simple benchmark games under replicator dynamics, arguably the best known evolutionary dynamics in game theory. For (weighted) potential games, the new concept coincides with the fixpoints/equilibria of the dynamics. However, in (variants of) zero-sum games with fully mixed (i.e., interior) Nash equilibria, it covers the whole state space, as the dynamics satisfy specific information theoretic constants of motion. We discuss numerous novel computational, as well as structural, combinatorial questions raised by this chain recurrence conception of games.
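
To make the statement that Nash equilibria are fixpoints of the dynamics concrete, here is a minimal Python sketch of one explicit Euler step of two-population replicator dynamics in a zero-sum game. The function name `replicator_step`, the step size `dt`, and the Matching Pennies matrix are assumptions made for illustration and do not come from the paper; the Euler discretization only approximates the continuous-time flow the abstract refers to.

```python
def replicator_step(x, y, A, dt=0.01):
    """One explicit Euler step of two-population replicator dynamics.

    x, y: mixed strategies of the row and column player (each sums to 1).
    A: payoff matrix of the row player; the column player receives -A.
    """
    n, m = len(x), len(y)
    # Expected payoff of each pure action against the opponent's mixture.
    u_row = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    u_col = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    # Average payoff of each population under its current mixture.
    avg_row = sum(x[i] * u_row[i] for i in range(n))
    avg_col = sum(y[j] * u_col[j] for j in range(m))
    # Actions that beat the population average grow, the others shrink.
    new_x = [x[i] + dt * x[i] * (u_row[i] - avg_row) for i in range(n)]
    new_y = [y[j] + dt * y[j] * (u_col[j] - avg_col) for j in range(m)]
    return new_x, new_y


# Matching Pennies (illustrative): the unique Nash equilibrium is uniform play.
MATCHING_PENNIES = [[1, -1], [-1, 1]]
```

Starting from the uniform profile the step returns the same profile, which is the fixpoint property; starting elsewhere the state moves, and in continuous time it cycles around the interior equilibrium rather than converging to it.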


2021 ◽  
Vol 14 ◽  
pp. 257-272
Author(s):  
Denis Kuzyutin ◽  
Yulia Skorodumova ◽  
Nadezhda Smirnova ◽  
...  

A novel approach to sustainable cooperation called the subgame-perfect core (S-P Core) was introduced by P. Chander and M. Wooders in 2020 for n-person extensive-form games with terminal payoffs. This solution concept incorporates both subgame perfection and cooperation incentives and implies a certain distribution of the players' total payoff at the terminal node of the cooperative history. In this paper we use an extension of the S-P Core to the class of extensive games with payoffs defined at all nodes of the game tree, based on designing an appropriate payoff distribution procedure β and implementing it as the game unfolds along the cooperative history. The difference is that under this so-called β-subgame-perfect core the players can redistribute the total current payoff at each node of the cooperative path. Moreover, a payoff distribution procedure from the β-S-P Core satisfies a number of good properties such as subgame efficiency, non-negativity and the strict balance condition. We examine different properties of the β-S-P Core, introduce several refinements of this cooperative solution and provide examples of its implementation in extensive-form games. Finally, we consider an application of the β-S-P Core to the symmetric discrete-time alternating-move model of fishery management.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Yossi Feinberg

We provide a tool to model and solve strategic situations where players’ perceptions are limited, as well as situations where players realize that other players’ perceptions may be limited and so on. We define normal, repeated, incomplete information, and extensive form games with unawareness using a unified methodology. A game with unawareness is defined as a collection of standard games (of the corresponding form). The collection specifies how each player views the game, how she views the other players’ perceptions of the game and so on. The modeler’s description of perceptions, the players’ description of other players’ perceptions, etc. are shown to have consistent representations. We extend solution concepts such as rationalizability and Nash equilibrium to these games and study their properties. It is shown that while unawareness in normal form games can be mapped to incomplete information games, the extended Nash equilibrium solution is not mapped to a known solution concept in the equivalent incomplete information games, implying that games with unawareness generate novel types of behavior.


Author(s):  
Gabriele Farina ◽  
Christian Kroer ◽  
Tuomas Sandholm

Regret minimization is a powerful tool for solving large-scale extensive-form games. State-of-the-art methods rely on minimizing regret locally at each decision point. In this work we derive a new framework for regret minimization on sequential decision problems and extensive-form games with general compact convex sets at each decision point and general convex losses, as opposed to prior work which has been for simplex decision points and linear losses. We call our framework laminar regret decomposition. It generalizes the CFR algorithm to this more general setting. Furthermore, our framework enables a new proof of CFR even in the known setting, which is derived from a perspective of decomposing polytope regret, thereby leading to an arguably simpler interpretation of the algorithm. Our generalization to convex compact sets and convex losses allows us to develop new algorithms for several problems: regularized sequential decision making, regularized Nash equilibria in zero-sum extensive-form games, and computing approximate extensive-form perfect equilibria. Our generalization also leads to the first regret-minimization algorithm for computing reduced-normal-form quantal response equilibria based on minimizing local regrets. Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games. Our algorithms for (quadratically) regularized equilibrium finding are orders of magnitude faster than the fastest algorithms for Nash equilibrium finding; this suggests regret-minimization algorithms based on decreasing regularization for Nash equilibrium finding as future work. Finally we show that our framework enables a new kind of scalable opponent exploitation approach.


Author(s):  
Noam Brown ◽  
Tuomas Sandholm

Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games. In this paper we introduce novel CFR variants that 1) discount regrets from earlier iterations in various ways (in some cases differently for positive and negative regrets), 2) reweight iterations in various ways to obtain the output strategies, 3) use a non-standard regret minimizer and/or 4) leverage “optimistic regret matching”. They lead to dramatically improved performance in many settings. For one, we introduce a variant that outperforms CFR+, the prior state-of-the-art algorithm, in every game tested, including large-scale realistic settings. CFR+ is a formidable benchmark: no other algorithm has been able to outperform it. Finally, we show that, unlike CFR+, many of the important new variants are compatible with modern imperfect-information game pruning techniques and one is also compatible with sampling in the game tree.
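
As background for the variants listed in this abstract, the Python sketch below shows regret matching, the standard local regret minimizer used at each decision point in CFR, together with the zero-floored regret update used by regret matching+ (the minimizer behind CFR+). It is a minimal illustration only: the function names are hypothetical, and the paper's specific discounting and reweighting schemes are not reproduced here.

```python
def regret_matching(cum_regrets):
    """Map cumulative regrets to a strategy over the available actions.

    Each action is played in proportion to its positive cumulative regret;
    if no action has positive regret, the strategy is uniform.
    """
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regrets)] * len(cum_regrets)


def rm_plus_update(cum_regrets, instant_regrets):
    """Regret matching+ style update: add the new regrets, then floor at zero."""
    return [max(r + d, 0.0) for r, d in zip(cum_regrets, instant_regrets)]
```

A full CFR implementation would call such a minimizer at every information set and feed it counterfactual regrets computed by a game-tree traversal, which this sketch omits.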


2021 ◽  
Vol 66 (2) ◽  
pp. 51
Author(s):  
T.-V. Pricope

Imperfect information games describe many practical applications found in the real world, as the information space is rarely fully available. This particular set of problems is challenging due to the random factor that makes even adaptive methods fail to correctly model the problem and find the best solution. Neural Fictitious Self Play (NFSP) is a powerful algorithm for learning approximate Nash equilibria of imperfect information games from self-play. However, it uses only crude data as input and its most successful experiment was on the limit version of Texas Hold’em Poker. In this paper, we develop a new variant of NFSP that combines the established fictitious self-play with neural gradient play in an attempt to improve the performance on large-scale zero-sum imperfect information games and to solve the more complex no-limit version of Texas Hold’em Poker using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold’em Poker, the agents trained through self-play outperformed the ones that used fictitious play with a normal-form single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within the limited training process of our agents with very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human level.


2018 ◽  
Vol 20 (03) ◽  
pp. 1840001
Author(s):  
Stefanos Leonardos ◽  
Costis Melolidakis

Given a bimatrix game, the associated leadership or commitment games are defined as the games in which one player, the leader, commits to a (possibly mixed) strategy and the other player, the follower, chooses his strategy after being informed of the irrevocable commitment of the leader (but not of its realization in case it is mixed). Based on a result by Von Stengel and Zamir [2010], the notions of commitment value and commitment optimal strategies for each player are discussed as a possible solution concept. It is shown that in nondegenerate bimatrix games (a) pure commitment optimal strategies together with the follower’s best response constitute Nash equilibria, and (b) strategies that participate in a completely mixed Nash equilibrium are strictly worse than commitment optimal strategies, provided they are not matrix game optimal. For various classes of bimatrix games that generalize zero-sum games, the relationship between the maximin value of the leader’s payoff matrix, the Nash equilibrium payoff and the commitment optimal value is discussed. For the Traveler’s Dilemma, the commitment optimal strategy and commitment value for the leader are evaluated and seem more acceptable as a solution than the unique Nash equilibrium. Finally, the relationship between commitment optimal strategies and Nash equilibria in [Formula: see text] bimatrix games is thoroughly examined and, in addition, necessary and sufficient conditions for the follower to be worse off at the equilibrium of the leadership game than at any Nash equilibrium of the simultaneous move game are provided.
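
The leadership game described in this abstract can be illustrated for the restricted case where the leader commits to a pure strategy only. The Python sketch below is a simplification under stated assumptions, not the paper's method: the function name and the example payoffs are hypothetical, mixed commitments (which the abstract allows and which can do strictly better) are ignored, and ties among the follower's best responses are broken in the leader's favor.

```python
def pure_commitment_value(A, B):
    """Best payoff the leader (row player) can secure by a pure commitment.

    A[i][j] / B[i][j]: leader / follower payoff when the leader commits to
    row i and the follower answers with column j.
    Returns (leader_payoff, i, j).
    """
    best = None
    for i in range(len(A)):
        top = max(B[i])
        # All follower best responses to the commitment i.
        responses = [j for j in range(len(B[i])) if B[i][j] == top]
        # Assumption: ties are resolved in favor of the leader.
        j = max(responses, key=lambda c: A[i][c])
        if best is None or A[i][j] > best[0]:
            best = (A[i][j], i, j)
    return best


# Hypothetical example: row 0 strictly dominates row 1 for the leader, so
# simultaneous play gives the leader 2, while committing to row 1 gives 3.
LEADER = [[2, 4], [1, 3]]
FOLLOWER = [[1, 0], [0, 1]]
```

In this example `pure_commitment_value(LEADER, FOLLOWER)` selects row 1 with follower response column 1, showing how commitment can change the outcome relative to the simultaneous-move game.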

