A Policy Improvement Algorithm for Solving a Mixture Class of Perfect Information and AR-AT Semi-Markov Games

Zero-sum two-person discounted semi-Markov games with finite state and action spaces are studied where a collection of states having Perfect Information (PI) property is mixed with another collection of states having Additive Reward–Additive Transition and Action Independent Transition Time (AR-AT-AITT) property. For such a PI/AR-AT-AITT mixture class of games, we prove the existence of an optimal pure stationary strategy for each player. We develop a policy improvement algorithm for solving discounted semi-Markov decision processes (one player version of semi-Markov games) and using it we obtain a policy-improvement type algorithm for computing an optimal strategy pair of a PI/AR-AT-AITT mixture semi-Markov game. Finally, we extend our results when the states having PI property are replaced by a subclass of Switching Control (SC) states.

Download Full-text

EXISTENCE OF OPTIMAL STATIONARY POLICIES IN FINITE DYNAMIC PROGRAMS WITH NONNEGATIVE REWARDS

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964801154082 ◽

2001 ◽

Vol 15 (4) ◽

pp. 557-564 ◽

Cited By ~ 1

Author(s):

Rolando Cavazos-Cadena ◽

Raúl Montes-de-Oca

Keyword(s):

Control Policy ◽

Stationary Policy ◽

Reward Function ◽

Total Reward ◽

Dynamic Programs ◽

Finite State ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Action Spaces ◽

Discounted Criterion

This article concerns Markov decision chains with finite state and action spaces, and a control policy is graded via the expected total-reward criterion associated to a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result that is obtained via a limit process using the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.

Download Full-text

On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes

INFORMS Journal on Computing ◽

10.1287/ijoc.6.2.188 ◽

1994 ◽

Vol 6 (2) ◽

pp. 188-192 ◽

Cited By ~ 16

Author(s):

Mary Melekopoglou ◽

Anne Condon

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Policy Improvement ◽

Improvement Algorithm ◽

Markov Decision

Download Full-text

ORDERED FIELD PROPERTY IN A SUBCLASS OF FINITE SER-SIT SEMI-MARKOV GAMES

International Game Theory Review ◽

10.1142/s0219198913400264 ◽

2013 ◽

Vol 15 (04) ◽

pp. 1340026 ◽

Cited By ~ 6

Author(s):

PRASENJIT MONDAL ◽

SAGNIK SINHA

Keyword(s):

Stochastic Game ◽

Transition Probability ◽

Pollution Tax ◽

Markov Games ◽

Ordered Field ◽

Markov Game ◽

Ordered Field Property ◽

Single Matrix ◽

Zero Sum ◽

Field Property

In this paper, we deal with a subclass of two-person finite SeR-SIT (Separable Reward-State Independent Transition) semi-Markov games which can be solved by solving a single matrix/bimatrix game under discounted as well as limiting average (undiscounted) payoff criteria. A SeR-SIT semi-Markov game does not satisfy the so-called (Archimedean) ordered field property in general. Besides, the ordered field property does not hold even for a SeR-SIT-PT (Separable Reward-State-Independent Transition Probability and Time) semi-Markov game, which is a natural version of a SeR-SIT stochastic (Markov) game. However by using an additional condition, we have shown that a subclass of finite SeR-SIT-PT semi-Markov games have the ordered field property for both discounted and undiscounted semi-Markov games with both players having state-independent stationary optimals. The ordered field property also holds for the nonzero-sum case under the same assumptions. We find a relation between the values of the discounted and the undiscounted zero-sum semi-Markov games for this modified subclass. We propose a more realistic pollution tax model for this subclass of SeR-SIT semi-Markov games than pollution tax model for SeR-SIT stochastic game. Finite step algorithms are given for the discounted and for the zero-sum undiscounted cases.

Download Full-text

Linear Programming and Zero-Sum Two-Person Undiscounted Semi-Markov Games

Asia Pacific Journal of Operational Research ◽

10.1142/s0217595915500438 ◽

2015 ◽

Vol 32 (06) ◽

pp. 1550043 ◽

Cited By ~ 6

Author(s):

Prasenjit Mondal

Keyword(s):

Linear Programming ◽

Transition Probabilities ◽

Optimal Solution ◽

Programming Algorithm ◽

Stationary Strategy ◽

Markov Games ◽

Markov Decision ◽

Transition Times ◽

Optimal Stationary Strategies ◽

Zero Sum

In this paper, zero-sum two-person finite undiscounted (limiting average) semi-Markov games (SMGs) are considered. We prove that the solutions of the game when both players are restricted to semi-Markov strategies are solutions for the original game. In addition, we show that if one player fixes a stationary strategy, then the other player can restrict himself in solving an undiscounted semi-Markov decision process associated with that stationary strategy. The undiscounted SMGs are also studied when the transition probabilities and the transition times are controlled by a fixed player in all states. If such games are unichain, we prove that the value and optimal stationary strategies of the players can be obtained from an optimal solution of a linear programming algorithm. We propose a realistic and generalized traveling inspection model that suitably fits into the class of one player control undiscounted unichain semi-Markov games.

Download Full-text

A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information

Mathematical Programming ◽

10.1007/s10107-002-0312-3 ◽

2003 ◽

Vol 95 (3) ◽

pp. 513-532 ◽

Cited By ~ 20

Author(s):

T.E.S. Raghavan ◽

Zamir Syed

Keyword(s):

Stochastic Games ◽

Perfect Information ◽

Policy Improvement ◽

Type Algorithm ◽

Zero Sum

Download Full-text

Policy improvement algorithm for continuous time Markov decision processes with switching costs

Stochastic Control Theory and Stochastic Differential Systems - Lecture Notes in Control and Information Sciences ◽

10.1007/bfb0009393 ◽

2005 ◽

pp. 320-331 ◽

Cited By ~ 1

Author(s):

Bharat Doshi

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Switching Costs ◽

Decision Processes ◽

Policy Improvement ◽

Improvement Algorithm ◽

Markov Decision

Download Full-text

Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs

Journal of Applied Probability ◽

10.1239/jap/1118777172 ◽

2005 ◽

Vol 42 (2) ◽

pp. 303-320 ◽

Cited By ~ 14

Author(s):

Xianping Guo ◽

Onésimo Hernández-Lerma

Keyword(s):

Markov Chains ◽

Continuous Time ◽

Queueing System ◽

Transition Rates ◽

Markov Game ◽

Continuous Time Markov Chains ◽

Payoff Functions ◽

Zero Sum ◽

Action Spaces ◽

Uniformly Bounded

In this paper, we study two-person nonzero-sum games for continuous-time Markov chains with discounted payoff criteria and Borel action spaces. The transition rates are possibly unbounded, and the payoff functions might have neither upper nor lower bounds. We give conditions that ensure the existence of Nash equilibria in stationary strategies. For the zero-sum case, we prove the existence of the value of the game, and also provide arecursiveway to compute it, or at least to approximate it. Our results are applied to a controlled queueing system. We also show that if the transition rates areuniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.

Download Full-text

Converging Coevolutionary Algorithm for Two-Person Zero-Sum Discounted Markov Games With Perfect Information

IEEE Transactions on Automatic Control ◽

10.1109/tac.2007.914299 ◽

2008 ◽

Vol 53 (2) ◽

pp. 596-601 ◽

Cited By ~ 4

Author(s):

Hyeong Soo Chang

Keyword(s):

Perfect Information ◽

Markov Games ◽

Coevolutionary Algorithm ◽

Zero Sum

Download Full-text

Perfect information two-person zero-sum markov games with imprecise transition probabilities

Mathematical Methods of Operations Research ◽

10.1007/s00186-006-0081-5 ◽

2006 ◽

Vol 64 (2) ◽

pp. 335-351 ◽

Cited By ~ 2

Author(s):

Hyeong Soo Chang

Keyword(s):

Transition Probabilities ◽

Perfect Information ◽

Markov Games ◽

Zero Sum

Download Full-text

Adaptive Stochastic Resource Control: A Machine Learning Approach

Journal of Artificial Intelligence Research ◽

10.1613/jair.2548 ◽

2008 ◽

Vol 32 ◽

pp. 453-486 ◽

Cited By ~ 7

Author(s):

B. C. Csaji ◽

L. Monostori

Keyword(s):

Control Policy ◽

Natural Generalization ◽

Support Vector ◽

Control Policies ◽

Related Data ◽

Stochastic Resource Allocation ◽

Finite State ◽

Markov Decision ◽

Distributed Sampling ◽

Action Spaces

The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemtive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties, such as it has finite state and action spaces, it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly, nu-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling are investigated, too. Finally, experimental results on both benchmark and industry-related data are presented.

Download Full-text