Deterministic policies based on maximum regrets in MDPs with imprecise rewards

2021 ◽  
pp. 1-16
Author(s):  
Pegah Alizadeh ◽  
Emiliano Traversi ◽  
Aomar Osmani

Markov Decision Processes (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, which are often used to model situations where the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce, for the first time, a method to compute an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because they prescribe a unique choice for each state. To further motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of "determinizing" the optimal stochastic policy leads to a policy far from the exact deterministic one.
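
To make the regret criterion concrete, here is a minimal sketch that treats the imprecise reward as a finite set of candidate reward vectors and finds a minimax-regret deterministic policy by brute-force enumeration. The toy MDP, the candidate reward set, and the enumeration are all assumptions standing in for the paper's exact optimization approach.

```python
import itertools
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[a, s, s'] : transition probabilities of a hypothetical toy MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# Finite set of candidate reward matrices r[s, a] standing in for the
# imprecise reward specification.
rewards = [rng.uniform(0, 1, size=(n_states, n_actions)) for _ in range(4)]

def policy_value(pi, r):
    """Exact value of deterministic policy pi under reward r."""
    P_pi = np.array([P[pi[s], s] for s in range(n_states)])
    r_pi = np.array([r[s, pi[s]] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def optimal_value(r, iters=500):
    """Optimal value function under reward r via value iteration."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.array([[r[s, a] + gamma * P[a, s] @ V
                       for a in range(n_actions)] for s in range(n_states)])
        V = Q.max(axis=1)
    return V

opt_values = [optimal_value(r) for r in rewards]

def max_regret(pi):
    """Worst-case regret of pi over all candidate rewards."""
    return max(np.max(Vr - policy_value(pi, r))
               for r, Vr in zip(rewards, opt_values))

best = min(itertools.product(range(n_actions), repeat=n_states),
           key=max_regret)
print("minimax-regret deterministic policy:", best)
```

For realistic state spaces the enumeration over all deterministic policies is intractable, which is what motivates the exact optimization-based procedure the paper develops.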

Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
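
As a point of reference, the sketch below shows classical finite-horizon backward induction, the special case that qMDP policy evaluation and optimization generalize; the toy sizes, dynamics, and rewards are assumptions, not the paper's quantum model.

```python
import numpy as np

n_states, n_actions, horizon = 4, 2, 10
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0, 1, size=(n_states, n_actions))

V = np.zeros(n_states)          # terminal values V_T(s) = 0
policy = []                     # time-dependent optimal policy
for t in reversed(range(horizon)):
    # Q_t(s, a) = R(s, a) + E[V_{t+1}(s')]
    Q = R + np.einsum('asx,x->sa', P, V)
    policy.append(Q.argmax(axis=1))
    V = Q.max(axis=1)
policy.reverse()                # policy[t][s] = optimal action at time t
print("value at t=0:", V)
```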


2019 ◽  
Vol 1 (2) ◽  
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenario and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems, which either consist of agents with individual goals and decision-making capabilities that are influenced by the other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, a survey of the available swarm RL algorithms gives a clear view of the areas that still require attention. Most studies in this area focus on homogeneous swarms, and so far the systems introduced as Heterogeneous Swarms (HetSs) include only a few, i.e., two or three, sub-swarms of homogeneous agents, which either deal with a specific sub-problem of the general problem according to their capabilities or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents that were originally designed to solve different problems, and hence have a higher degree of heterogeneity, to behave as a swarm when addressing identical sub-problems. Specifically, the affinity between two agents, which measures their compatibility to work together on a specific sub-problem, is used to design a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
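
The abstract does not define how affinity is computed, so the sketch below uses cosine similarity between hypothetical agent capability vectors purely to illustrate how an affinity measure could group heterogeneous agents into sub-swarms around shared sub-problems.

```python
import numpy as np

rng = np.random.default_rng(2)
capabilities = rng.uniform(size=(6, 4))   # 6 agents, 4 capability features

def affinity(a, b):
    """Hypothetical compatibility of two agents: cosine similarity."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Greedily attach each agent to the sub-swarm whose members it is most
# compatible with, opening a new sub-swarm below a threshold.
threshold, swarms = 0.9, []
for agent in capabilities:
    scores = [np.mean([affinity(agent, m) for m in s]) for s in swarms]
    if scores and max(scores) >= threshold:
        swarms[int(np.argmax(scores))].append(agent)
    else:
        swarms.append([agent])
print(f"{len(swarms)} sub-swarms formed")
```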


2009 ◽  
Vol 30 (4) ◽  
pp. 474-483 ◽  
Author(s):  
Oguzhan Alagoz ◽  
Heather Hsu ◽  
Andrew J. Schaefer ◽  
Mark S. Roberts

We provide a tutorial on the construction and evaluation of Markov decision processes (MDPs), powerful analytical tools for sequential decision making under uncertainty that have been widely used in industrial and manufacturing applications but are underutilized in medical decision making (MDM). We demonstrate the use of an MDP to solve a sequential clinical treatment problem under uncertainty. Markov decision processes generalize standard Markov models in that a decision process is embedded in the model and multiple decisions are made over time. Furthermore, they have significant advantages over standard decision analysis. We compare MDPs to standard Markov-based simulation models by solving the problem of the optimal timing of living-donor liver transplantation using both methods. Both models result in the same optimal transplantation policy and the same total life expectancies for the same patient and living donor, but the computation time for solving the MDP model is significantly smaller than that for solving the Markov model. We briefly describe the growing literature of MDPs applied to medical decisions.
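
A toy optimal-stopping model in the spirit of the transplant-timing example: states are hypothetical disease-severity levels and the action is to wait or transplant. All numbers are illustrative assumptions, not the paper's clinical model.

```python
import numpy as np

n = 5                                  # severity levels, 0 mild .. 4 severe
rng = np.random.default_rng(3)
# Waiting drifts toward higher severity (hypothetical transition matrix).
P_wait = np.triu(rng.uniform(size=(n, n)))
P_wait /= P_wait.sum(axis=1, keepdims=True)
post_tx = np.linspace(10.0, 4.0, n)    # life-years if transplanting now
gamma = 0.97

V = np.zeros(n)
for _ in range(1000):                  # value iteration to convergence
    # Compare transplanting now with one more period of waiting.
    V = np.maximum(post_tx, 1.0 + gamma * P_wait @ V)
policy = np.where(post_tx >= 1.0 + gamma * P_wait @ V, "transplant", "wait")
print(dict(zip(range(n), policy)))
```

The same model could be evaluated by Monte Carlo simulation of the Markov chain, but the dynamic-programming solution computes the optimal policy directly, which mirrors the computation-time advantage the tutorial reports.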


10.28945/2750 ◽  
2004 ◽  
Author(s):  
Abdullah Gani ◽  
Omar Zakaria ◽  
Nor Badrul Anuar Jumaat

This paper presents an application of the Markov Decision Process (MDP) to the provision of traffic prioritisation in best-effort networks. MDP was used because it is a standard, general formalism for modelling stochastic, sequential decision problems. The implementation of traffic prioritisation involves a series of decision-making processes by which packets are marked and classified before being dispatched to their destinations. The application of MDP was driven by the objective of ensuring that higher-priority packets are not delayed by lower-priority ones. MDP is believed to be applicable to improving traffic prioritisation arbitration.
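
One plausible way to cast prioritisation arbitration as an MDP (not necessarily the paper's formulation): the state is the pair of queue lengths, the action is which queue to serve, and the cost penalises holding high-priority packets more heavily. Arrival rates, capacities, and costs below are illustrative assumptions.

```python
import itertools
import numpy as np

cap = 3                                    # max queue length per class
states = list(itertools.product(range(cap + 1), repeat=2))
idx = {s: i for i, s in enumerate(states)}
p_hi, p_lo, gamma = 0.3, 0.5, 0.95         # per-slot arrival probabilities

def step(s, a):
    """Next-state distribution after serving queue a (0=high, 1=low)."""
    h, l = s
    h, l = (max(h - 1, 0), l) if a == 0 else (h, max(l - 1, 0))
    dist = {}
    for ah, al in itertools.product([0, 1], repeat=2):
        ns = (min(h + ah, cap), min(l + al, cap))
        p = (p_hi if ah else 1 - p_hi) * (p_lo if al else 1 - p_lo)
        dist[ns] = dist.get(ns, 0.0) + p
    return dist

cost = lambda s: 5.0 * s[0] + 1.0 * s[1]   # holding cost per time slot
V = np.zeros(len(states))
for _ in range(500):                       # value iteration (cost-minimizing)
    V = np.array([min(cost(s) + gamma * sum(p * V[idx[ns]]
                  for ns, p in step(s, a).items()) for a in (0, 1))
                  for s in states])
greedy = {s: min((0, 1), key=lambda a: sum(p * V[idx[ns]]
          for ns, p in step(s, a).items())) for s in states}
print("serve-high states:", [s for s, a in greedy.items() if a == 0])
```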


Author(s):  
Pascal Poupart

The goal of this chapter is to provide an introduction to Markov decision processes as a framework for sequential decision making under uncertainty. The aim of this introduction is to provide practitioners with a basic understanding of the common modeling and solution techniques. Hence, we will not delve into the details of the most recent algorithms, but rather focus on the main concepts and the issues that impact deployment in practice. More precisely, we will review fully and partially observable Markov decision processes, describe basic algorithms to find good policies and discuss modeling/computational issues that arise in practice.
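
As an example of the basic solution techniques such an introduction covers, here is a minimal policy-iteration sketch for a fully observable MDP; the toy dynamics and rewards are assumptions.

```python
import numpy as np

n_s, n_a, gamma = 4, 3, 0.95
rng = np.random.default_rng(4)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))   # P[a, s, s']
R = rng.uniform(size=(n_s, n_a))

pi = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[pi, np.arange(n_s)]
    R_pi = R[np.arange(n_s), pi]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * np.einsum('asx,x->sa', P, V)
    new_pi = Q.argmax(axis=1)
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
print("optimal policy:", pi)
```

Policy iteration terminates after finitely many improvement steps because each step yields a strictly better policy until the greedy policy reproduces itself.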

