The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.
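
The truncation idea in this abstract can be sketched numerically. The example below is a minimal illustration, not the paper's construction: the queue model, its parameters, the rule for handling probability mass that would leave the truncated state space (here it is simply kept at the boundary state N), and the function names `truncated_queue` and `relative_value_iteration` are all assumptions made for the sketch. It solves each finite truncation by relative value iteration and prints the average cost for growing N so the stabilisation can be observed.

```python
import numpy as np

def truncated_queue(N, p=0.3, mus=(0.4, 0.8), service_costs=(0.0, 2.0), hold=1.0):
    """Transition tensor P[a, i, j] and costs C[a, i] on states 0..N.

    Hypothetical model: one Bernoulli(p) arrival per slot; action a completes
    a service with probability mus[a]. Truncation rule (an assumption of this
    sketch): an arrival that would leave state N stays in N.
    """
    n = N + 1
    P = np.zeros((len(mus), n, n))
    C = np.zeros((len(mus), n))
    for a, (mu, c) in enumerate(zip(mus, service_costs)):
        for i in range(n):
            C[a, i] = hold * i + c
            s = mu if i > 0 else 0.0            # nothing to serve when empty
            up, down = p * (1 - s), (1 - p) * s
            P[a, i, min(i + 1, N)] += up        # excess mass is kept at N
            P[a, i, max(i - 1, 0)] += down
            P[a, i, i] += 1.0 - up - down
    return P, C

def relative_value_iteration(P, C, tol=1e-9, max_iter=200_000):
    """Return (average cost, greedy policy) for a unichain, aperiodic MDP."""
    h = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = C + P @ h                           # Q[a, i]
        Th = Q.min(axis=0)
        g = Th[0]                               # state 0 is the reference state
        h_new = Th - g
        if np.max(np.abs(h_new - h)) < tol:
            return g, Q.argmin(axis=0)
        h = h_new
    raise RuntimeError("relative value iteration did not converge")

# Average cost of the truncated problems for increasing truncation level N.
for N in (10, 20, 40):
    g, policy = relative_value_iteration(*truncated_queue(N))
    print(N, round(float(g), 6), policy[:6])
```

Keeping the excess mass at the boundary is only one possible redistribution rule; other rules give different truncated chains, and whether the costs converge depends on conditions of the kind the paper establishes.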


Computation of Average Cost Optimal Policies for Infinite State Spaces

Stochastic Dynamic Programming and the Control of Queueing Systems - Wiley Series in Probability and Statistics ◽

10.1002/9780470317037.ch8 ◽

2008 ◽

pp. 168-198

Keyword(s):

Average Cost ◽

State Spaces ◽

Optimal Policies ◽

Infinite State


Vanishing discount approximations in controlled Markov chains with risk-sensitive average criterion

Advances in Applied Probability ◽

10.1017/apr.2018.10 ◽

2018 ◽

Vol 50 (01) ◽

pp. 204-230 ◽

Cited By ~ 3

Author(s):

Rolando Cavazos-Cadena ◽

Daniel Hernández-Hernández

Keyword(s):

Average Cost ◽

Control Policy ◽

Sensitivity Coefficient ◽

Value Functions ◽

Risk Sensitive ◽

Finite State ◽

Markov Decision ◽

Functional Part ◽

Optimal Average ◽

Average Cost Optimality Equation

This work concerns Markov decision chains on a finite state space. The decision-maker has a constant and nonnull risk sensitivity coefficient, and the performance of a control policy is measured by two different indices, namely, the discounted and average criteria. Motivated by well-known results for the risk-neutral case, the problem of approximating the optimal risk-sensitive average cost in terms of the optimal risk-sensitive discounted value functions is addressed. Under suitable communication assumptions, it is shown that, as the discount factor increases to 1, appropriate normalizations of the optimal discounted value functions converge to the optimal average cost, and to the functional part of the solution of the risk-sensitive average cost optimality equation.
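
The risk-neutral result that motivates this abstract can be checked numerically on a toy problem. The sketch below covers only the risk-neutral case, not the risk-sensitive normalizations studied in the paper; the two-state, two-action data and the function name `discounted_values` are assumptions. It computes the optimal discounted value function V_β by policy iteration and prints (1 - β)·V_β(0), which approximates the optimal average cost, and V_β(1) - V_β(0), which approximates the relative value function, as β increases to 1.

```python
import numpy as np

def discounted_values(P, C, beta):
    """Optimal beta-discounted cost by policy iteration; P[a, i, j], C[a, i]."""
    n = P.shape[1]
    idx = np.arange(n)
    policy = np.zeros(n, dtype=int)
    while True:
        P_pi, c_pi = P[policy, idx], C[policy, idx]
        # Policy evaluation: solve (I - beta * P_pi) V = c_pi.
        V = np.linalg.solve(np.eye(n) - beta * P_pi, c_pi)
        Q = C + beta * (P @ V)                  # Q[a, i]
        # Switch actions only on strict improvement, to avoid cycling on ties.
        better = Q.min(axis=0) < Q[policy, idx] - 1e-12
        if not better.any():
            return V
        policy = np.where(better, Q.argmin(axis=0), policy)

# Hypothetical 2-state, 2-action chain (illustrative numbers only).
P = np.array([[[0.5, 0.5], [0.2, 0.8]],
              [[0.9, 0.1], [0.6, 0.4]]])
C = np.array([[1.0, 3.0],
              [2.0, 4.0]])

for beta in (0.9, 0.99, 0.999, 0.9999):
    V = discounted_values(P, C, beta)
    print(beta, (1 - beta) * V[0], V[1] - V[0])
```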


Mixed risk-neutral/minimax control of discrete-time, finite-state Markov decision processes

IEEE Transactions on Automatic Control ◽

10.1109/9.847737 ◽

2000 ◽

Vol 45 (3) ◽

pp. 528-532 ◽

Cited By ~ 18

Author(s):

S.P. Coraluppi ◽

S.I. Marcus

Keyword(s):

Markov Decision Processes ◽

Discrete Time ◽

Decision Processes ◽

Minimax Control ◽

Risk Neutral ◽

Finite State ◽

Markov Decision


On the total reward variance for continuous-time Markov reward chains

Journal of Applied Probability ◽

10.1017/s0021900200002412 ◽

2006 ◽

Vol 43 (04) ◽

pp. 1044-1052 ◽

Cited By ~ 2

Author(s):

Nico M. Van Dijk ◽

Karel Sladký

Keyword(s):

Growth Rate ◽

Discrete Time ◽

Continuous Time ◽

Asymptotically Linear ◽

State Spaces ◽

Total Reward ◽

Finite State ◽

Markov Reward ◽

Discrete Time Case

As an extension of the discrete-time case, this note investigates the variance of the total cumulative reward for continuous-time Markov reward chains with finite state spaces. The results correspond to discrete-time results. In particular, the variance growth rate is shown to be asymptotically linear in time. Expressions are provided to compute this growth rate.
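
For orientation, the discrete-time counterpart of this growth rate can be computed from a Poisson equation. The sketch below is the standard discrete-time formula, not the continuous-time expressions of the note, and the function name `variance_growth_rate` is an assumption. With stationary distribution π, mean reward g = π·r, centred reward f = r - g, and h solving (I - P)h = f, the variance of the total reward grows asymptotically like σ²·t with σ² = Σ_i π_i (2 f_i h_i - f_i²), for an irreducible finite chain.

```python
import numpy as np

def variance_growth_rate(P, r):
    """Return (mean reward g, asymptotic variance rate) for an irreducible chain."""
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    # Stationary distribution: pi P = pi together with sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    g = pi @ r
    f = r - g
    # Poisson equation (I - P) h = f, solved with the normalisation pi @ h = 0.
    h = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return g, pi @ (2 * f * h - f ** 2)

# Hypothetical two-state chain with rewards 0 and 1.
print(variance_growth_rate([[0.9, 0.1], [0.4, 0.6]], [0.0, 1.0]))
```

When every row of P equals π the rewards are independent draws, and the rate reduces to the ordinary variance of r under π, which gives a quick sanity check.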


Constrained Discounted Markov Decision Chains

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964800002230 ◽

1991 ◽

Vol 5 (4) ◽

pp. 463-475 ◽

Cited By ~ 26

Author(s):

Linn I. Sennott

Keyword(s):

State Space ◽

Discrete Time ◽

Queueing Systems ◽

Operating Cost ◽

Countable State Space ◽

Countable State ◽

Markov Decision ◽

Holding Cost

A Markov decision chain with countable state space incurs two types of costs: an operating cost and a holding cost. The objective is to minimize the expected discounted operating cost, subject to a constraint on the expected discounted holding cost. The existence of an optimal randomized simple policy is proved. This is a policy that randomizes between two stationary policies that differ in at most one state. Several examples from the control of discrete-time queueing systems are discussed.


INDEXABILITY OF BANDIT PROBLEMS WITH RESPONSE DELAYS

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964810000021 ◽

2010 ◽

Vol 24 (3) ◽

pp. 349-374 ◽

Cited By ~ 5

Author(s):

Felipe Caro ◽

Onesun Steve Yoo

Keyword(s):

Discrete Time ◽

Bayesian Learning ◽

The Other ◽

Independent Random Variables ◽

Important Class ◽

Marginal Productivity ◽

Bandit Problems ◽

Theoretical Justification ◽

State Spaces ◽

Infinite State

This article considers an important class of discrete-time restless bandits, given by the discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, in which the delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions (i.e., the bandit satisfies the indexability criterion introduced by Whittle (1988)). The result allows for infinite or finite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.


Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Mathematical Methods of Operations Research ◽

10.1007/s001860200256 ◽

2003 ◽

Vol 57 (2) ◽

pp. 263-285 ◽

Cited By ~ 10

Author(s):

Rolando Cavazos-Cadena

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimality Equation ◽

Risk Sensitive ◽

Finite State ◽

Markov Decision ◽

Average Cost Optimality Equation ◽

Cost Optimality ◽

Finite State Space


Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Mathematical Methods of Operations Research ◽

10.1007/bf01193864 ◽

1997 ◽

Vol 45 (2) ◽

pp. 245-263

Author(s):

Evgueni Gordienko ◽

Raúl Montes-de-Oca ◽

Adolfo Minjárez-Sosa

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision


Average Cost Semi-Markov Decision Processes and the Control of Queueing Systems

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964800001121 ◽

1989 ◽

Vol 3 (2) ◽

pp. 247-272 ◽

Cited By ~ 47

Author(s):

Linn I. Sennott

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Queueing Systems ◽

Decision Processes ◽

Single Server ◽

Stationary Policy ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Poisson Arrivals ◽

Action Spaces

Semi-Markov decision processes underlie the control of many queueing systems. In this paper, we deal with infinite state semi-Markov decision processes with nonnegative, unbounded costs and finite action sets. Axioms for the existence of an expected average cost optimal stationary policy are presented. These conditions generalize the work in Sennott [22] for Markov decision processes. Verifiable conditions for the axioms to hold are obtained. The theory is applied to control of the M/G/1 queue with variable service parameter, with on-off server, and with batch processing, and to control of the G/M/m queue with variable arrival parameter and customer rejection. It is applied to a timesharing network of queues with a single server and finally to optimal routing of Poisson arrivals to parallel exponential servers. The final section extends the existence result to compact action spaces.
