On the Set of Optimal Policies in Variance Penalized Markov Decision Chains

Abstract: The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.

A Markov Decision Process to Determine Optimal Policies in Moving Target

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security ◽

10.1145/3243734.3278489 ◽

2018 ◽

Cited By ~ 5

Author(s):

Jianjun Zheng ◽

Akbar Siami Namin

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Moving Target ◽

Optimal Policies ◽

Markov Decision

The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains

Advances in Applied Probability ◽

10.1017/s0001867800027816 ◽

1997 ◽

Vol 29 (1) ◽

pp. 114-137

Author(s):

Linn I. Sennott

Keyword(s):

Discrete Time ◽

Average Cost ◽

Queueing Systems ◽

State Spaces ◽

Original Process ◽

Optimal Policies ◽

Finite State ◽

Markov Decision ◽

Optimal Average ◽

Infinite State

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.
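
The truncation idea in this abstract can be illustrated with a minimal sketch. It is not the construction from the paper: it assumes a hypothetical single-server discrete-time queue (arrival probability `p`, two service rates with made-up costs, holding cost equal to the queue length), truncates the state space at `n_max` by blocking arrivals, and uses standard relative value iteration to estimate the optimal average cost of each truncated model. All names and parameter values are assumptions chosen for illustration.

```python
def truncated_gain(n_max, p=0.3, rates=((0.4, 0.0), (0.8, 2.0)),
                   tol=1e-9, max_iter=100_000):
    """Optimal average cost of the queue truncated to states 0..n_max.

    rates holds (service probability, per-slot service cost) pairs.
    Arrivals that would push the queue above n_max are blocked.
    """
    h = [0.0] * (n_max + 1)  # relative value function
    for _ in range(max_iter):
        th = []
        for x in range(n_max + 1):
            best = float("inf")
            for mu, cost in rates:
                d = mu if x > 0 else 0.0  # no departure from an empty queue
                q = float(x) + cost       # holding cost plus service cost
                for arr, pa in ((1, p), (0, 1.0 - p)):
                    for dep, pd in ((1, d), (0, 1.0 - d)):
                        nxt = min(max(x + arr - dep, 0), n_max)
                        q += pa * pd * h[nxt]
                best = min(best, q)
            th.append(best)
        diffs = [t - v for t, v in zip(th, h)]
        h = [t - th[0] for t in th]  # renormalise at reference state 0
        # min(diffs) and max(diffs) bracket the optimal average cost
        if max(diffs) - min(diffs) < tol:
            return 0.5 * (max(diffs) + min(diffs))
    raise RuntimeError("relative value iteration did not converge")


# Average costs of successively larger truncations
for n_max in (5, 10, 20, 40):
    print(n_max, round(truncated_gain(n_max), 6))
```

In this toy model the printed values should stabilise as `n_max` grows, which is the behaviour the abstract describes for the sequence of finite truncations; the conditions under which such convergence holds in general are the subject of the paper itself.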

Average and blackwell optimal policies in denumerable Markov decision chains

10.1109/cdc.1986.267436 ◽

1986 ◽

Author(s):

Arie Hordijk

Keyword(s):

Optimal Policies ◽

Markov Decision

Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards

Mathematics of Operations Research ◽

10.1287/moor.13.3.395 ◽

1988 ◽

Vol 13 (3) ◽

pp. 395-420 ◽

Cited By ~ 42

Author(s):

Rommert Dekker ◽

Arie Hordijk

Keyword(s):

Optimal Policies ◽

Markov Decision

Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Advances in Applied Probability ◽

10.2307/1426437 ◽

1983 ◽

Vol 15 (2) ◽

pp. 274-303 ◽

Cited By ~ 28

Author(s):

Arie Hordijk ◽

Frank A. Van Der Duyn Schouten

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Time Parameter ◽

Queueing Model ◽

Replacement Model ◽

Optimal Policies ◽

Markov Decision

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
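
The vanishing-discount argument mentioned in this abstract can be illustrated numerically. The sketch below is not the drift-process setting of the paper: it assumes a hypothetical finite discrete-time queue (arrival probability `p`, two service rates with made-up costs, holding cost equal to the queue length) and runs plain discounted value iteration. As the discount factor `beta` approaches 1, the scaled values `(1 - beta) * V_beta(x)` are expected to flatten across states towards a common constant, the average cost.

```python
def discounted_values(beta, n_max=20, p=0.3,
                      rates=((0.4, 0.0), (0.8, 2.0)), tol=1e-9):
    """Discounted optimal cost V_beta(x) for x = 0..n_max by value iteration.

    rates holds (service probability, per-slot service cost) pairs.
    Arrivals that would push the queue above n_max are blocked.
    """
    v = [0.0] * (n_max + 1)
    while True:
        new_v = []
        for x in range(n_max + 1):
            best = float("inf")
            for mu, cost in rates:
                d = mu if x > 0 else 0.0  # no departure from an empty queue
                expected = 0.0
                for arr, pa in ((1, p), (0, 1.0 - p)):
                    for dep, pd in ((1, d), (0, 1.0 - d)):
                        nxt = min(max(x + arr - dep, 0), n_max)
                        expected += pa * pd * v[nxt]
                best = min(best, x + cost + beta * expected)
            new_v.append(best)
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            return v


# Scaled values at the empty and the full state for increasing beta
for beta in (0.9, 0.99):
    v = discounted_values(beta)
    print(beta, (1 - beta) * v[0], (1 - beta) * v[-1])
```

The gap between the two printed scaled values should narrow as `beta` increases. The paper's contribution concerns the conditions under which limits of the corresponding discounted optimal policies are average optimal, which this finite example does not address.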

Conditions for the uniqueness of optimal policies of discounted Markov decision processes

Mathematical Methods of Operations Research ◽

10.1007/s001860400372 ◽

2004 ◽

Vol 60 (3) ◽

pp. 415-436 ◽

Cited By ~ 12

Author(s):

Daniel Cruz-Suárez ◽

Raúl Montes-de-Oca ◽

Francisco Salem-Silva

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision

Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Mathematical Methods of Operations Research ◽

10.1007/bf01193864 ◽

1997 ◽

Vol 45 (2) ◽

pp. 245-263

Author(s):

Evgueni Gordienko ◽

Raúl Montes-De-Oca ◽

Adolfo Minjárez-Sosa

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision

The Determination of Approximately Optimal Policies in Markov Decision Processes by the Use of Bounds

Journal of the Operational Research Society ◽

10.2307/2581490 ◽

1982 ◽

Vol 33 (3) ◽

pp. 253

Author(s):

D. J. White

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision

Blackwell optimal policies in a Markov decision process with a Borel state space

Mathematical Methods of Operations Research ◽

10.1007/bf01432969 ◽

1994 ◽

Vol 40 (3) ◽

pp. 253-288 ◽

Cited By ~ 8

Author(s):

A. A. Yushkevich

Keyword(s):

State Space ◽

Markov Decision Process ◽

Decision Process ◽

Borel State Space ◽

Optimal Policies ◽

Markov Decision
