Zero and non-zero sum risk-sensitive Semi-Markov games

Author(s): Arnab Bhabak, Subhamay Saha

2020, Vol. 58(1), pp. 580-604
Author(s): Arnab Basu, Łukasz Stettner

2017, Vol. 49(3), pp. 826-849
Author(s): Prasenjit Mondal

Abstract: Zero-sum two-person finite undiscounted (limiting ratio average) semi-Markov games (SMGs) are considered under a general multichain structure. We derive the strategy evaluation equations for stationary strategies of the players. A relation is established between the payoff in the multichain SMG and that in the associated stochastic game (SG) obtained by a data transformation. We prove that the multichain optimality equations (OEs) for an SMG have a solution if and only if the associated SG has optimal stationary strategies. Although a solution of the OEs need not be optimal for an SMG, we establish the significance of studying the OEs for a multichain SMG. We provide an illustrative example of an SMG in which one player has no optimal stationary strategy but does have an optimal semistationary strategy (one that depends only on the initial and current states of the game). For an SMG with absorbing states, we prove that solutions of the game in which both players are restricted to semistationary strategies are also solutions of the unrestricted game. Finally, we prove the existence of stationary optimal strategies for unichain SMGs and conclude that the unichain condition is equivalent to requiring that the game satisfy certain recurrence, ergodicity, or weak-communication conditions.
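
For orientation, the limiting ratio average criterion and the data transformation referred to above can be written out as follows. The notation (reward r, mean sojourn time τ, normalizing constant τ₀) is ours, and the transformation is given in the standard Schweitzer form, so read this as a sketch of the setup rather than the paper's exact statement.

```latex
% Limiting ratio average payoff from initial state i under strategies (\pi, \sigma):
% expected accumulated reward divided by expected accumulated sojourn time.
\phi(i,\pi,\sigma) = \liminf_{n\to\infty}
  \frac{\mathbb{E}_i^{\pi,\sigma}\bigl[\sum_{m=0}^{n-1} r(X_m, A_m, B_m)\bigr]}
       {\mathbb{E}_i^{\pi,\sigma}\bigl[\sum_{m=0}^{n-1} \tau(X_m, A_m, B_m)\bigr]}

% Data transformation to the associated stochastic game: rewards are
% normalized by mean sojourn times, and transitions are rescaled by a
% constant 0 < \tau_0 \le \min_{i,a,b} \tau(i,a,b).
\widetilde{r}(i,a,b) = \frac{r(i,a,b)}{\tau(i,a,b)}, \qquad
\widetilde{p}(j \mid i,a,b) = \frac{\tau_0}{\tau(i,a,b)}
  \bigl( p(j \mid i,a,b) - \delta_{ij} \bigr) + \delta_{ij}
```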


2005, Vol. 61(3), pp. 437-454
Author(s): Tomás Prieto-Rumeau, Onésimo Hernández-Lerma

1999, Vol. 11(8), pp. 2017-2060
Author(s): Csaba Szepesvári, Michael L. Littman

Abstract: Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
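
As a concrete instance of the value-function-based algorithms the theorem covers, the sketch below implements asynchronous tabular Q-learning in Python. The toy MDP (3 states, 2 actions, random transitions), the ε-greedy exploration rule, and the 1/n per-pair step sizes are illustrative assumptions of ours, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy MDP (an assumption, not from the paper):
# 3 states, 2 actions, random transition kernel, fixed reward table.
n_states, n_actions = 3, 2
gamma = 0.9  # discount factor
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))  # per-pair update counts

s = 0
for t in range(200_000):
    # Epsilon-greedy exploration keeps every (s, a) pair visited
    # infinitely often, a standard hypothesis in convergence proofs.
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))

    s_next = rng.choice(n_states, p=P[s, a])
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]  # step sizes obeying the Robbins-Monro conditions

    # Asynchronous Q-learning update: only the visited pair changes.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 3))  # approximate optimal action values
```

The asynchronous character of the loop, where only the visited state-action pair is updated at each step, is exactly what the theorem handles by reducing the analysis to a synchronous counterpart that updates all pairs at once.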


Bernoulli, 2005, Vol. 11(6), pp. 1009-1029
Author(s): Xianping Guo, Onésimo Hernández-Lerma
