Optimal Policies for Quantum Markov Decision Processes
Abstract: Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
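For orientation, the finite-horizon dynamic programming that the abstract refers to can be illustrated on an ordinary (classical) MDP. The sketch below is not the paper's qMDP algorithm, which is not reproduced here. It is a minimal backward-induction example under assumed conventions: the function name `backward_induction`, the layout `P[a][s][t]` for transition probabilities, `R[a][s]` for immediate rewards, and the two-state toy model are all hypothetical choices made for illustration.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon backward induction for a classical MDP (illustrative only).

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[a][s]:    immediate reward for taking action a in state s.
    Returns (V, policy): V[s] is the optimal expected total reward from s
    over `horizon` steps; policy[k][s] is an optimal action at time step k.
    """
    n_actions = len(P)
    n_states = len(P[0])
    V = [0.0] * n_states  # value with zero steps remaining
    policy = []
    for _ in range(horizon):
        new_V, step_policy = [], []
        for s in range(n_states):
            # Expected return of each action: reward now plus value of the successor.
            q = [
                R[a][s] + sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            new_V.append(q[best])
            step_policy.append(best)
        V = new_V
        policy.append(step_policy)
    # Decisions were computed from the last time step backwards; put them in time order.
    policy.reverse()
    return V, policy


# Hypothetical toy model: action 0 keeps the current state, action 1 switches it.
# Being in state 1 pays a reward of 1 regardless of the action taken.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [0.0, 1.0],  # action 0
    [0.0, 1.0],  # action 1
]
V, policy = backward_induction(P, R, horizon=3)
```

The policy returned here depends on the time step, which is the usual situation for finite-horizon problems: near the end of the horizon, switching from state 0 no longer pays off because no reward can be collected afterwards.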