Mixed Policies

Author(s):  
João P. Hespanha

This chapter explores the concept of mixed policies and shows how the notions developed for pure policies can be adapted to this more general type of policy. A pure policy consists of choosing particular actions (perhaps based on some observation), whereas a mixed policy involves choosing a probability distribution from which actions are selected (perhaps as a function of observations). The idea behind mixed policies is that the players select their actions randomly according to a previously chosen probability distribution. The chapter first considers the rock-paper-scissors game as an example of a mixed policy before discussing mixed action spaces, mixed security policies and saddle-point equilibria, mixed saddle-point equilibria vs. average security levels, and general zero-sum games. It concludes with practice exercises with corresponding solutions and an additional exercise.
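
As a concrete illustration, the short Python sketch below (not from the chapter; the sign convention, with Player 1 minimizing over rows and Player 2 maximizing over columns, is an assumption stated in the comments) verifies that the uniform distribution is a mixed security policy for rock-paper-scissors: its worst-case expected outcome against any response is zero.

```python
# A minimal sketch (not from the chapter): checking that the uniform
# distribution is a mixed security policy for rock-paper-scissors.
# Convention assumed here: A[i, j] is the outcome when P1 (minimizer)
# plays row i and P2 (maximizer) plays column j.
import numpy as np

A = np.array([[ 0,  1, -1],   # rock     vs rock / paper / scissors
              [-1,  0,  1],   # paper
              [ 1, -1,  0]])  # scissors

y = np.full(3, 1 / 3)         # candidate mixed policy for P1

# Against a mixed y, P2 has a pure best response, so the worst-case
# expected outcome is the largest entry of y @ A.
print((y @ A).max())          # 0.0: uniform play guarantees the value 0
```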

Author(s):  
João P. Hespanha

This chapter discusses a number of key concepts for zero-sum matrix games. A zero-sum matrix game is played by two players, each with a finite set of actions. Player 1 wants to minimize the outcome and Player 2 wants to maximize it. After providing an overview of how zero-sum matrix games are played, the chapter considers the security levels and policies involved and how they can be computed using MATLAB. It then examines the case of a matrix game with alternate play and one with simultaneous play to determine whether rational players will regret their decision to play a security policy. It also describes the saddle-point equilibrium and its relation to the security levels for the two players, as well as the order interchangeability property and computational complexity of a matrix game before concluding with a practice exercise with the corresponding solution and an additional exercise.
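
For readers who want to reproduce the security-level computation outside MATLAB, here is a minimal Python sketch (the game matrix is illustrative; following the chapter's convention, Player 1 minimizes over rows and Player 2 maximizes over columns):

```python
# Pure security levels and security policies of a zero-sum matrix game.
# Illustrative sketch; A is a made-up game matrix.
import numpy as np

A = np.array([[3, 1],
              [4, 2]])

V_bar = A.max(axis=1).min()      # P1's security level: smallest row maximum
V_low = A.min(axis=0).max()      # P2's security level: largest column minimum
i_star = A.max(axis=1).argmin()  # a security policy (row) for P1
j_star = A.min(axis=0).argmax()  # a security policy (column) for P2

# V_low <= V_bar always; equality signals a (pure) saddle-point equilibrium.
print(V_low, V_bar, (i_star, j_star))   # here 3, 3, (0, 0)
```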


Author(s):  
João P. Hespanha

This chapter defines a number of key concepts for non-zero-sum games involving two players. It begins by considering a two-player game G in which players P₁ and P₂ are allowed to select policies within the action spaces Γ₁ and Γ₂, respectively. Each player wants to minimize their own outcome and does not care about the outcome of the other player. The chapter proceeds by discussing security policies and Nash equilibria for two-player non-zero-sum games, bimatrix games, admissible Nash equilibria, and mixed policies. It also explores the order interchangeability property for Nash equilibria in best-response equivalent games before concluding with practice exercises and their corresponding solutions, along with additional exercises.
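
The notions of Nash equilibrium and admissibility can be made concrete with a small sketch (illustrative, not from the chapter) that enumerates the pure Nash equilibria of a bimatrix game by checking unilateral deviations:

```python
# Enumerating pure Nash equilibria of a bimatrix game in which,
# per the chapter's convention, each player minimizes their own cost.
# The cost matrices are made up for illustration.
import numpy as np

A = np.array([[2, 3],   # costs for P1, who chooses the row
              [1, 4]])
B = np.array([[2, 1],   # costs for P2, who chooses the column
              [3, 4]])

equilibria = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        # (i, j) is a Nash equilibrium iff neither player can lower
        # their own cost by deviating unilaterally.
        if A[i, j] == A[:, j].min() and B[i, j] == B[i, :].min():
            equilibria.append((i, j))

print(equilibria)   # [(0, 1), (1, 0)]
```

In this example both equilibria are admissible: they yield the cost pairs (3, 1) and (1, 3), and neither pair dominates the other.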


Author(s):  
João P. Hespanha

This chapter discusses two types of stochastic policies for games in extensive form, as well as the existence and computation of saddle-point equilibria. For games in extensive form, a mixed policy corresponds to selecting a pure policy at random, according to a probability distribution chosen before the game starts, and then playing that policy throughout the game. It is assumed that the random selections by both players are done statistically independently and that the players try to optimize the expected outcome of the game. After providing an overview of mixed policies and saddle-point equilibria, the chapter considers behavioral policies for games in extensive form. It also explores behavioral saddle-point equilibria, behavioral vs. mixed policies, recursive computation of equilibria for feedback games, mixed vs. behavioral order interchangeability, and non-feedback games. It concludes with practice exercises and their corresponding solutions, along with additional exercises.
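
The relation between the two types of policies can be sketched in a few lines of Python (an illustrative construction; the information-set names and probabilities are hypothetical, and the equivalence relies on the perfect-recall assumption behind Kuhn's theorem): the mixed policy equivalent to a behavioral policy weights each pure policy by the product of the behavioral probabilities of the actions that the pure policy prescribes.

```python
# From a behavioral policy (one distribution per information set) to
# the equivalent mixed policy (one distribution over pure policies).
# Hypothetical player with two information sets, two actions each.
from itertools import product

behavioral = {
    "info_set_1": {"a": 0.7, "b": 0.3},
    "info_set_2": {"c": 0.4, "d": 0.6},
}

mixed = {}
for choice in product(*(d.items() for d in behavioral.values())):
    pure_policy = tuple(action for action, _ in choice)
    prob = 1.0
    for _, p in choice:
        prob *= p            # independent choices => product of probabilities
    mixed[pure_policy] = prob

print(mixed)   # 4 pure policies, probabilities summing to 1
```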


Author(s):  
Levi H. S. Lelis

In this paper we review several planning algorithms developed for zero-sum games with exponential action spaces, i.e., spaces that grow exponentially with the number of game components that can act simultaneously at a given game state. As an example, real-time strategy games have exponential action spaces because the number of available actions grows exponentially with the number of units controlled by the player. We also present a unifying perspective in which several existing algorithms can be described as instantiations of a variant of NaiveMCTS. In addition to describing several existing planning algorithms for exponential action spaces, we show that other instantiations of this variant of NaiveMCTS represent novel and promising algorithms to be studied in future work.
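
A hedged sketch of the naive-sampling idea behind NaiveMCTS follows (the function name, the epsilon parameters, and the epsilon-greedy local rule are illustrative assumptions, not the paper's exact algorithm): a combined action over many units is generated either by letting each unit choose independently through its own local bandit, or by reusing a promising combined action through a global bandit over the combinations seen so far.

```python
import random

def naive_sample(unit_actions, local_stats, global_stats, eps0=0.3, eps_l=0.2):
    """Sample one combined action (a tuple with one action per unit).

    unit_actions: list, one entry per unit, of that unit's legal actions.
    local_stats:  list of dicts, action -> (total_reward, count).
    global_stats: dict, combined-action tuple -> (total_reward, count).
    """
    def avg(stat):
        total, count = stat
        return total / count if count else float("inf")  # optimistic for unseen

    if random.random() < eps0 or not global_stats:
        # Explore: each unit chooses independently via its local bandit.
        combo = tuple(
            random.choice(actions) if random.random() < eps_l
            else max(actions, key=lambda a: avg(local_stats[u].get(a, (0, 0))))
            for u, actions in enumerate(unit_actions)
        )
        global_stats.setdefault(combo, (0.0, 0))
        return combo
    # Exploit: reuse the best combined action tried so far (global bandit).
    return max(global_stats, key=lambda c: avg(global_stats[c]))

# Illustrative use with made-up per-unit action sets:
units = [["move", "attack"], ["harvest", "idle"]]
print(naive_sample(units, [dict() for _ in units], {}))
```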


Author(s):  
João P. Hespanha

This chapter focuses on the computation of the saddle-point equilibrium of a zero-sum continuous-time dynamic game in state-feedback policies. It begins by considering the solution of two-player zero-sum dynamic games in continuous time, assuming a finite-horizon integral cost that Player 1 wants to minimize and Player 2 wants to maximize, and taking into account a state-feedback information structure. Continuous-time dynamic programming can also be used to construct saddle-point equilibria in state-feedback policies. The discussion then turns to continuous-time linear quadratic dynamic games and the use of dynamic programming to construct a saddle-point equilibrium in a state-feedback policy for a two-player zero-sum differential game with variable termination time. The chapter also describes pursuit-evasion games before concluding with a practice exercise and the corresponding solution.
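
For the linear quadratic case, the state-feedback saddle point reduces to integrating a game Riccati differential equation backward from the terminal time. The scalar Python sketch below is a hedged illustration (made-up parameters, a plain backward Euler step, and unit control weights; it is not the chapter's derivation):

```python
# Scalar zero-sum LQ differential game:  dx/dt = a*x + b1*u + b2*d,
# running cost q*x^2 + u^2 - d^2; P1 picks u to minimize, P2 picks d
# to maximize. The game Riccati equation is
#   -dP/dt = 2*a*P + q - (b1**2 - b2**2)*P**2,  P(T) = 0,
# and the saddle-point feedback is u = -b1*P(t)*x, d = b2*P(t)*x.

a, b1, b2, q = 0.0, 1.0, 0.5, 1.0  # illustrative; b1**2 > b2**2 keeps P bounded
T, N = 1.0, 1000                   # horizon and number of Euler steps
dt = T / N

P = 0.0                            # terminal condition P(T) = 0
for _ in range(N):                 # march backward from t = T to t = 0
    P += dt * (2 * a * P + q - (b1**2 - b2**2) * P**2)

print(P)   # P(0); feedback gains at t = 0: u = -b1*P*x, d = b2*P*x
```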


2021
Vol 0 (0)
pp. 0
Author(s):  
Chandan Pal
Somnath Pradhan

In this paper we study zero-sum stochastic games for pure jump processes on a general state space with risk-sensitive discounted criteria. We establish a saddle-point equilibrium in Markov strategies for bounded cost functions. We achieve our results by studying the relevant Hamilton-Jacobi-Isaacs equations.


Author(s):  
João P. Hespanha

This chapter focuses on the computation of mixed saddle-point equilibrium policies. In view of the Minimax Theorem, mixed saddle-point equilibria can be determined by computing the mixed security policies for both players. For 2 × 2 games, the mixed security policies can be computed in closed form using the “graphical method.” After providing an overview of the graphical method, the chapter presents a systematic numerical procedure that formulates the computation of mixed saddle-point equilibria as a linear program, and shows how MATLAB's Optimization Toolbox can be used to solve such linear programs numerically. It then describes strictly dominating and “weakly” dominating policies before concluding with practice exercises and their corresponding solutions, along with an additional exercise.
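
In place of the MATLAB Optimization Toolbox call, the same linear program can be sketched with SciPy (the game matrix is illustrative; the formulation is the standard LP for a minimizing row player, with decision variables z = (y, v)):

```python
# Mixed security policy for P1 (rows, minimizer) via linear programming:
#   minimize v   subject to  (A^T y)_j <= v for every column j,
#                            sum(y) = 1,  y >= 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, 0.0],
              [1.0, 2.0]])          # illustrative game matrix
m, n = A.shape

c = np.r_[np.zeros(m), 1.0]                     # objective: v
A_ub = np.c_[A.T, -np.ones(n)]                  # A^T y - v <= 0
b_ub = np.zeros(n)
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)    # sum(y) = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]       # y >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
y, v = res.x[:m], res.x[m]
print(y, v)   # here y = [0.25, 0.75] and mixed value v = 1.5
```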


2013
Vol 23 (4)
pp. 473-493
Author(s):  
Muhammad Wakhid Musthofa
Jacob C. Engwerda
Ari Suparwanto

In this paper the feedback saddle-point equilibria of soft-constrained zero-sum linear quadratic differential games for descriptor systems of index one are studied, for both finite and infinite planning horizons. Both necessary and sufficient conditions for the existence of a feedback saddle-point equilibrium are considered.

