Double-Oracle Sampling Method for Stackelberg Equilibrium Approximation in General-Sum Extensive-Form Games

2020 ◽  
Vol 34 (02) ◽  
pp. 2054-2061 ◽  
Author(s):  
Jan Karwowski ◽  
Jacek Mańdziuk

The paper presents a new method for approximating the Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic, as it does not rely on any specific properties of a particular game model. The method iteratively interleaves the following two phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy is an optimal response. The solution scheme is evaluated with respect to the Leader's expected utility and time requirements on three sets of interception games with varying characteristics, played on graphs. A comparison with three state-of-the-art MILP/LP-based methods shows that in the vast majority of test cases the proposed simulation-based approach leads to optimal Leader strategies, while outperforming the competing methods in time scalability and memory requirements.
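The commitment idea behind a Strong Stackelberg Equilibrium (SSE) can be seen in a toy normal-form game. The sketch below is not the paper's MCTS-based method for extensive-form games; it is a minimal brute-force illustration, assuming a leader with exactly two actions whose mixed commitment is searched on a grid of step 1/n, with follower ties broken in the leader's favor (the "strong" part of SSE). The function name `strong_stackelberg_grid` and the payoff matrices are hypothetical.

```python
from fractions import Fraction


def strong_stackelberg_grid(leader, follower, n=300):
    """Grid-search approximation of a Strong Stackelberg Equilibrium (SSE).

    leader[i][j] and follower[i][j] are the payoffs when the leader plays row i
    and the follower plays column j. The leader has exactly two rows and commits
    to the mixed strategy (p, 1 - p), with p restricted to the grid k / n.
    Returns (p, leader_value) for the best commitment found.
    """
    cols = range(len(follower[0]))
    best = None
    for k in range(n + 1):
        p = Fraction(k, n)  # exact arithmetic, so follower ties are detected exactly
        mix = (p, 1 - p)
        f_util = [mix[0] * follower[0][j] + mix[1] * follower[1][j] for j in cols]
        l_util = [mix[0] * leader[0][j] + mix[1] * leader[1][j] for j in cols]
        top = max(f_util)
        # Follower best-responds; "strong" means ties are broken in the leader's favor.
        value = max(l_util[j] for j in cols if f_util[j] == top)
        if best is None or value > best[1]:
            best = (p, value)
    return best


if __name__ == "__main__":
    leader = [[2, 4], [1, 3]]    # rows U, D; columns L, R (hypothetical payoffs)
    follower = [[1, 0], [0, 2]]  # follower payoffs for the same cells
    p, value = strong_stackelberg_grid(leader, follower)
    print(f"commit to U with probability {p}; leader gets {value}")
```

In this example the leader's row U strictly dominates D, so the unique Nash equilibrium (U, L) gives the leader 2. Committing to U with probability 2/3 leaves the follower indifferent, the tie is resolved toward R, and the leader's payoff rises to 11/3.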

Author(s):  
Alberto Marchesi ◽  
Gabriele Farina ◽  
Christian Kroer ◽  
Nicola Gatti ◽  
Tuomas Sandholm

Equilibrium refinements are important in extensive-form (i.e., tree-form) games, where they amend weaknesses of the Nash equilibrium concept by requiring sequential rationality and other beneficial properties. One of the most attractive refinement concepts is quasi-perfect equilibrium. While quasi-perfection has been studied in extensive-form games, it is poorly understood in Stackelberg settings (that is, settings where a leader can commit to a strategy), which are important for modeling, for example, security games. In this paper, we introduce the axiomatic definition of quasi-perfect Stackelberg equilibrium. We develop a broad class of game perturbation schemes that lead to such equilibria in the limit. Our class of perturbation schemes strictly generalizes prior perturbation schemes introduced for the computation of (non-Stackelberg) quasi-perfect equilibria. Based on our perturbation schemes, we develop a branch-and-bound algorithm for computing a quasi-perfect Stackelberg equilibrium. It leverages a perturbed variant of the linear program for computing a Stackelberg extensive-form correlated equilibrium. Experiments show that our algorithm can be used to find an approximate quasi-perfect Stackelberg equilibrium in games with thousands of nodes.


Author(s):  
Trevor Davis ◽  
Kevin Waugh ◽  
Michael Bowling

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
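CFR minimizes regret separately at each information set, and its per-information-set update is regret matching: play each action with probability proportional to its positive cumulative regret. On a one-shot zero-sum matrix game, where each player has a single information set, CFR reduces to that update, so the sketch below shows only unconstrained regret matching in self-play; it does not implement the paper's handling of convex constraints. The payoff matrix and the names `regret_matching` and `self_play` are hypothetical. Note that the averaged strategies, not the per-iteration ones, approach equilibrium.

```python
def regret_matching(regrets):
    """Mix over actions in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(payoff, iterations=20000):
    """Regret-matching self-play on a zero-sum matrix game.

    payoff[i][j] is the row player's utility; the column player gets -payoff[i][j].
    Returns the row and column players' average strategies.
    """
    n, m = len(payoff), len(payoff[0])
    reg_row, reg_col = [0.0] * n, [0.0] * m
    avg_row, avg_col = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x, y = regret_matching(reg_row), regret_matching(reg_col)
        # Expected utility of each pure action against the opponent's current mix.
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        v_row = sum(p * u for p, u in zip(x, u_row))
        v_col = sum(q * u for q, u in zip(y, u_col))
        # Accumulate instantaneous regrets and the running strategy sums.
        for i in range(n):
            reg_row[i] += u_row[i] - v_row
            avg_row[i] += x[i]
        for j in range(m):
            reg_col[j] += u_col[j] - v_col
            avg_col[j] += y[j]
    return [s / iterations for s in avg_row], [s / iterations for s in avg_col]


if __name__ == "__main__":
    x, y = self_play([[2, -1], [-1, 1]])  # hypothetical 2x2 zero-sum game
    print([round(p, 3) for p in x], [round(q, 3) for q in y])
```

For this matrix the unique equilibrium has each player choosing its first action with probability 2/5, so both averaged strategies should land near `[0.4, 0.6]`.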


2014 ◽  
Vol 51 ◽  
pp. 829-866 ◽  
Author(s):  
B. Bosansky ◽  
C. Kiekintveld ◽  
V. Lisy ◽  
M. Pechoucek

Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; (5) experimental analysis of our algorithm on several games, including an approximate version of the algorithm.
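The double-oracle loop can be illustrated on a small zero-sum matrix game instead of the sequence form used in the paper. This is a minimal sketch under stated assumptions: the restricted game is solved approximately with regret matching rather than a linear program, best responses are found by enumerating pure actions, and the loop stops once neither player's best response lies outside the restricted game. All function names and the payoff matrix are hypothetical; the third row and third column are never useful, so the algorithm should finish without adding them, which mirrors the small-support case described in the abstract.

```python
def regret_matching(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [r / total for r in positive] if total > 0 else [1.0 / len(regrets)] * len(regrets)


def solve_restricted(payoff, rows, cols, iterations=20000):
    """Approximate equilibrium of the zero-sum subgame rows x cols (row player maximizes),
    computed with regret-matching self-play as a stand-in for an exact LP."""
    sub = [[payoff[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    reg_x, reg_y = [0.0] * n, [0.0] * m
    sum_x, sum_y = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x, y = regret_matching(reg_x), regret_matching(reg_y)
        u_x = [sum(sub[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_y = [-sum(sub[i][j] * x[i] for i in range(n)) for j in range(m)]
        v_x = sum(p * u for p, u in zip(x, u_x))
        v_y = sum(q * u for q, u in zip(y, u_y))
        for i in range(n):
            reg_x[i] += u_x[i] - v_x
            sum_x[i] += x[i]
        for j in range(m):
            reg_y[j] += u_y[j] - v_y
            sum_y[j] += y[j]
    return [s / iterations for s in sum_x], [s / iterations for s in sum_y]


def double_oracle(payoff):
    """Grow restricted action sets with best responses until neither player can add one."""
    rows, cols = [0], [0]  # start from one arbitrary action per player
    while True:
        x, y = solve_restricted(payoff, rows, cols)
        # Best responses over the FULL action sets against the restricted solution.
        row_vals = [sum(payoff[i][c] * q for c, q in zip(cols, y)) for i in range(len(payoff))]
        col_vals = [sum(payoff[r][j] * p for r, p in zip(rows, x)) for j in range(len(payoff[0]))]
        br_row = max(range(len(row_vals)), key=row_vals.__getitem__)
        br_col = min(range(len(col_vals)), key=col_vals.__getitem__)
        if br_row in rows and br_col in cols:
            return rows, cols, x, y  # no improving action outside the restricted game
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)


if __name__ == "__main__":
    payoff = [[2, -1, 5], [-1, 1, 5], [-3, -3, -3]]  # hypothetical row-player payoffs
    rows, cols, x, y = double_oracle(payoff)
    print("rows", rows, "cols", cols, [round(p, 3) for p in x], [round(q, 3) for q in y])
```

Because the action sets only grow and are finite, the loop always terminates; at that point the restricted solution is an approximate equilibrium of the full game, with error limited by the accuracy of the restricted-game solver.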


Author(s):  
Christian Kroer ◽  
Gabriele Farina ◽  
Tuomas Sandholm

Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms. Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games), where some probabilities are low while possibly heading toward zero in the limit, and in exact equilibrium computation, where the low probabilities are actually zero.


Author(s):  
Shuxin Li ◽  
Youzhi Zhang ◽  
Xinrun Wang ◽  
Wanqi Xue ◽  
Bo An

In many real-world scenarios, a team of agents must coordinate to compete against an opponent. The challenge in solving this type of game is that the team's joint action space grows exponentially with the number of agents, which makes existing algorithms, e.g., Counterfactual Regret Minimization (CFR), inefficient. To address this problem, we propose CFR-MIX, a new CFR-based framework. First, we propose a new strategy representation that represents a joint action strategy using the individual strategies of all agents together with a consistency relationship that maintains cooperation between agents. To compute the equilibrium with individual strategies under the CFR framework, we transform the consistency relationship between strategies into a consistency relationship between cumulative regret values. Furthermore, we propose a novel decomposition method over cumulative regret values that guarantees this consistency relationship. Finally, we introduce our new algorithm CFR-MIX, which employs a mixing layer to estimate the cumulative regret values of joint actions as a non-linear combination of the cumulative regret values of individual actions. Experimental results show that CFR-MIX significantly outperforms existing algorithms on various games.


Author(s):  
Christel Baier ◽  
Florian Funke ◽  
Rupak Majumdar

When designing or analyzing multi-agent systems, a fundamental problem is responsibility ascription: specifying which agents are responsible for the joint outcome of their behaviors and to what extent. We model strategic multi-agent interaction as an extensive-form game of imperfect information and define notions of forward (prospective) and backward (retrospective) responsibility. Forward responsibility identifies the responsibility of a group of agents for an outcome along all possible plays, whereas backward responsibility identifies the responsibility along a given play. We further distinguish between strategic and causal backward responsibility: the former captures the epistemic knowledge of players along a play, while the latter formalizes which players, possibly unknowingly, caused the outcome. A formal connection between the forward and backward notions is established in the case of perfect recall. We further ascribe quantitative responsibility through cooperative game theory. We show through a number of examples that our approach encompasses several prior formal accounts of responsibility attribution.
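One standard tool from cooperative game theory for splitting responsibility among agents is the Shapley value: each agent's share is its average marginal contribution over all orders in which the agents could join a coalition. The abstract does not say which cooperative solution concept the paper uses, so the following is only a hedged sketch; the three agents and the rule that the outcome occurs exactly when agent A acts together with B or C are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def shapley(players, value):
    """Exact Shapley value: average marginal contribution over all join orders."""
    shares = {p: Fraction(0) for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            shares[p] += value(joined) - value(coalition)
            coalition = joined
    orders = factorial(len(players))
    return {p: s / orders for p, s in shares.items()}


def outcome_occurs(coalition):
    """Hypothetical outcome rule: A must act together with at least one of B, C."""
    return 1 if "A" in coalition and {"B", "C"} & coalition else 0


if __name__ == "__main__":
    for agent, share in shapley(["A", "B", "C"], outcome_occurs).items():
        print(agent, share)
```

Here A is pivotal in four of the six join orders and B and C in one each, giving shares 2/3, 1/6, and 1/6. The shares sum to the value of the full coalition, which is the efficiency property of the Shapley value.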


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Zhenyang Guo ◽  
Xuan Wang ◽  
Shuhan Qi ◽  
Tao Qian ◽  
Jiajia Zhang

Imperfect-information games have served as benchmarks and milestones in artificial intelligence (AI) and game theory for decades. Besides computing or approximating an optimal strategy, sensing and exploiting information to describe the game environment effectively is critical for game solving. Reconnaissance blind chess (RBC), a new variant of chess, is a quintessential game of imperfect information in which a player's moves are not observed by the opponent. This characteristic of RBC makes information sets exponentially larger and introduces substantial uncertainty about the game environment. In this paper, we introduce a novel sensing method, Heuristic Search of Uncertainty Control (HSUC), to significantly reduce the uncertainty of the information set in real time. The key idea of HSUC is to consider the overall uncertainty of the environment rather than to predict the opponent's strategy. Furthermore, we implement a practical framework for RBC that combines HSUC with Monte Carlo Tree Search (MCTS). In experiments, HSUC showed better effectiveness and robustness in information sensing than the comparison opponents. Notably, our RBC agent won first place in uncertainty management at the NeurIPS 2019 RBC tournament.
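A generic way to make "reducing the uncertainty of the information set" concrete is to score each possible sensing action by the expected number of candidate states that would remain after observing its result. This is not HSUC, whose details the abstract does not give; it is a hypothetical sketch that assumes every candidate board is equally likely and represents a toy board as a string with one character per square. The names `expected_remaining` and `best_sense` are illustrative only.

```python
from collections import Counter


def expected_remaining(candidates, window):
    """Expected number of candidates left after sensing the squares in `window`,
    assuming each candidate is equally likely to be the true board."""
    groups = Counter("".join(board[i] for i in window) for board in candidates)
    # The true board lands in a group of size g with probability g / N,
    # leaving g candidates, so the expectation is sum(g * g) / N.
    return sum(g * g for g in groups.values()) / len(candidates)


def best_sense(candidates, windows):
    """Pick the sensing window that minimizes the expected remaining candidates."""
    return min(windows, key=lambda w: expected_remaining(candidates, w))


if __name__ == "__main__":
    boards = ["KQ..", "K.Q.", "K..Q", ".KQ."]  # hypothetical 4-square boards
    windows = [(0, 1), (1, 2)]                # hypothetical sensing windows (square indices)
    for w in windows:
        print(w, expected_remaining(boards, w))
    print("sense", best_sense(boards, windows))
```

Window `(1, 2)` distinguishes all four toy boards, whereas `(0, 1)` cannot separate `"K.Q."` from `"K..Q"`, so the heuristic prefers the former.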

