Elaboration Tolerant Representation of Markov Decision Process via Decision-Theoretic Extension of Probabilistic Action Language pBC+

Author(s):  
Yi Wang ◽  
Joohyung Lee

Abstract We extend the probabilistic action language $p\mathcal{BC}+$ with the notion of utility from decision theory. The semantics of the extended $p\mathcal{BC}+$ can be defined as a shorthand notation for a decision-theoretic extension of the probabilistic answer set programming language $\mathrm{LP}^{\mathrm{MLN}}$. Alternatively, the semantics of $p\mathcal{BC}+$ can also be defined in terms of Markov decision processes (MDPs), which in turn allows for representing an MDP in a succinct and elaboration-tolerant way, as well as for leveraging an MDP solver to compute a $p\mathcal{BC}+$ action description. The idea led to the design of the system pbcplus2mdp, which can find an optimal policy of a $p\mathcal{BC}+$ action description using an MDP solver.
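
As a rough illustration of the target of such a translation, the sketch below builds a hypothetical two-state, two-action MDP with utilities and solves it by value iteration; the domain, the numbers, and the solver choice are invented for illustration and are not the paper's $p\mathcal{BC}+$ encoding or the actual pbcplus2mdp pipeline.

```python
# Minimal sketch of the kind of finite MDP an action description could compile to,
# using a hypothetical two-state, two-action domain (not the paper's encoding).
# Plain value iteration stands in for the external MDP solver that pbcplus2mdp calls.

# P[s][a] -> list of (next_state, probability); U[s][a] -> immediate utility
P = {
    0: {"wait": [(0, 1.0)], "act": [(1, 0.8), (0, 0.2)]},
    1: {"wait": [(1, 1.0)], "act": [(1, 1.0)]},
}
U = {
    0: {"wait": 0.0, "act": -1.0},
    1: {"wait": 5.0, "act": 5.0},
}
gamma = 0.9

V = {s: 0.0 for s in P}
for _ in range(500):  # value iteration to (numerical) convergence
    V = {s: max(U[s][a] + gamma * sum(p * V[t] for t, p in P[s][a]) for a in P[s])
         for s in P}

policy = {s: max(P[s], key=lambda a, s=s: U[s][a] + gamma * sum(p * V[t] for t, p in P[s][a]))
          for s in P}
print(V, policy)
```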

2017 ◽  
Vol 14 (5) ◽  
pp. 467-472 ◽  
Author(s):  
Mohammad M. Hamasha ◽  
George Rumbe

Purpose Emergency departments (EDs) face the challenge of capacity planning caused by high patient demand and limited resources. Inadequate resources lead to increased delays, reduced quality of care and higher health-care costs. Such circumstances necessitate operational research models, such as the Markov decision process (MDP), to enable better decision-making. The purpose of this paper is to demonstrate the applicability and use of MDP in the ED.

Design/methodology/approach Adopting MDP provides invaluable insight into system operations across the different system states (e.g. from very busy to unoccupied), supporting optimal assignment of resources at reduced cost. In this paper, a descriptive health system model based on MDP is presented, and a numerical example illustrates its use in determining an optimal policy.

Findings Faced with numerous decisions, hospital managers have to ensure that the appropriate technique is used to minimize undesired outcomes. MDP is shown to be a robust approach that supports these critical decision-making processes. Additionally, MDP provides insight into the associated costs, enabling hospital managers to allocate resources efficiently, ensuring quality health care and increased throughput while minimizing costs.

Originality/value Applying MDP in the ED is a novel and promising starting point. MDP is a powerful tool that helps in making decisions in critical situations, and the ED needs such a tool.
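
As a purely illustrative sketch of the kind of model described (not the paper's numerical example), the following value iteration solves a small congestion MDP whose states range from unoccupied to very busy and whose actions are hypothetical staffing levels; all probabilities and costs below are made up.

```python
# Illustrative sketch only: a three-state congestion MDP for an ED (unoccupied, busy,
# very busy) with two hypothetical staffing actions; all numbers are invented and are
# not the paper's numerical example. Value iteration finds the minimum-cost policy.
states = ["unoccupied", "busy", "very_busy"]
actions = ["base_staff", "extra_staff"]

# P[a][s] -> probability distribution over next states (same order as `states`)
P = {
    "base_staff":  {"unoccupied": [0.6, 0.3, 0.1],
                    "busy":       [0.2, 0.5, 0.3],
                    "very_busy":  [0.1, 0.3, 0.6]},
    "extra_staff": {"unoccupied": [0.8, 0.2, 0.0],
                    "busy":       [0.5, 0.4, 0.1],
                    "very_busy":  [0.3, 0.5, 0.2]},
}
staff_cost = {"base_staff": 1.0, "extra_staff": 3.0}              # staffing cost per period
delay_cost = {"unoccupied": 0.0, "busy": 2.0, "very_busy": 8.0}   # congestion (delay) cost
gamma = 0.95

def q(s, a, V):
    return (staff_cost[a] + delay_cost[s]
            + gamma * sum(p * V[t] for p, t in zip(P[a][s], states)))

V = {s: 0.0 for s in states}
for _ in range(1000):                                             # value iteration (minimization)
    V = {s: min(q(s, a, V) for a in actions) for s in states}

policy = {s: min(actions, key=lambda a: q(s, a, V)) for s in states}
print(policy)   # cost-minimizing staffing action in each congestion state
```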


Author(s):  
Thiago Freitas dos Santos ◽  
Paulo E. Santos ◽  
Leonardo Anjoletto Ferreira ◽  
Reinaldo A. C. Bianchi ◽  
Pedro Cabalar

Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2437
Author(s):  
Kausthub Keshava ◽  
Alain Jean-Marie ◽  
Sara Alouf

We propose and analyze a model for optimizing the prefetching of documents, in the situation where the connection between documents is discovered progressively. A random surfer moves along the edges of a random tree representing possible sequences of documents, which is known to a controller only up to depth d. A quantity k of documents can be prefetched between two movements. The question is to determine which nodes of the known tree should be prefetched so as to minimize the probability of the surfer moving to a node that has not been prefetched. We analyze the model with the tools of Markov decision process theory, formally identifying the optimal policy in several situations and identifying it numerically in others.
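
A much-simplified sketch of the prefetching trade-off (not the paper's full MDP, which prefetches k documents between every pair of movements) might enumerate a single batch of k nodes to prefetch in a small tree known to depth 2, assuming the surfer moves uniformly at random over children; the tree and the budget below are hypothetical.

```python
# Simplified sketch, not the paper's full MDP: the tree is known up to depth 2, a single
# batch of k nodes is prefetched, and the surfer then makes two uniformly random moves
# along the tree. We enumerate which k nodes to prefetch so as to maximize the
# probability that every visited node was prefetched (i.e. minimize the miss probability).
from itertools import combinations

tree = {"r": ["a", "b", "c"], "a": ["a1", "a2"], "b": ["b1"], "c": []}  # hypothetical tree
k = 3

def survive(node, depth, prefetched):
    """Probability the surfer only visits prefetched nodes for `depth` more moves."""
    children = tree.get(node, [])
    if depth == 0 or not children:
        return 1.0
    return sum(survive(c, depth - 1, prefetched) / len(children)
               for c in children if c in prefetched)

nodes = set(tree) | {c for cs in tree.values() for c in cs}
candidates = sorted(nodes - {"r"})                        # the root itself is already loaded

best = max(combinations(candidates, k), key=lambda pf: survive("r", 2, set(pf)))
print(best, 1.0 - survive("r", 2, set(best)))             # chosen nodes and miss probability
```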


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
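
A hypothetical illustration, not taken from the paper, of a reward that depends on the whole history yet is regular: a two-state Mealy machine (a finite transducer) tracks the parity of how many times one action has been taken and emits the reward.

```python
# Hypothetical example (not from the paper) of a history-dependent but regular reward:
# a two-state Mealy machine tracks the parity of how many times action "a" has been
# taken and emits the reward, so a finite transducer suffices even though no fixed
# window over recent actions does.
delta = {("even", "a"): "odd",  ("even", "b"): "even",
         ("odd",  "a"): "even", ("odd",  "b"): "odd"}      # state transitions
reward = {("even", "a"): 0.0, ("even", "b"): 1.0,
          ("odd",  "a"): 1.0, ("odd",  "b"): 0.0}          # transducer output

def total_reward(history, state="even"):
    total = 0.0
    for action in history:
        total += reward[(state, action)]
        state = delta[(state, action)]
    return total

print(total_reward(["a", "b", "a", "b"]))   # reward depends on the parity state, not on time
```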


1983 ◽  
Vol 20 (2) ◽  
pp. 368-379
Author(s):  
Lam Yeh ◽  
L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
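
A discrete-time admission-control analogue (not the paper's continuous-time formulation) can make the monotone-policy phenomenon concrete: value iteration on a truncated, slotted M/M/1-style queue with hypothetical rates and costs yields a policy that, for these numbers, admits up to a queue-length threshold and rejects beyond it.

```python
# Illustrative sketch, not the paper's continuous-time formulation: a slotted
# admission-control analogue of an M/M/1 queue with discounted costs, solved by value
# iteration on a truncated state space. All rates and costs are hypothetical.
lam, mu, gamma = 0.4, 0.5, 0.95     # arrival prob., service prob., discount factor
N, hold, reject = 30, 1.0, 8.0      # truncation level, holding cost per slot, rejection penalty

def q_values(n, V):
    dep = mu if n > 0 else 0.0
    stay = 1.0 - lam - dep
    down = V[n - 1] if n > 0 else V[0]
    admit  = n * hold + gamma * (lam * V[min(n + 1, N)] + dep * down + stay * V[n])
    refuse = n * hold + lam * reject + gamma * (lam * V[n] + dep * down + stay * V[n])
    return admit, refuse

V = [0.0] * (N + 1)
for _ in range(3000):
    V = [min(q_values(n, V)) for n in range(N + 1)]

policy = []
for n in range(N + 1):
    admit, refuse = q_values(n, V)
    policy.append("admit" if admit <= refuse else "reject")
print(policy)   # admits up to a queue-length threshold, then rejects (monotone in n)
```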


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (referred to as the original process). In this way, a new Markov decision process is established that conforms to the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability and strict convexity of its optimal value function and uniqueness of the optimal policy; moreover, the optimal value function and the optimal policy of the two processes coincide. To complement the theory presented, an example is provided.
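
The regularization involved is the Moreau envelope, $c_\lambda(x) = \inf_y \{\, c(y) + \|x-y\|^2/(2\lambda) \,\}$; the sketch below evaluates it numerically for a one-dimensional, non-differentiable stage cost ($|x|$), as a generic illustration rather than the paper's Markov control model.

```python
# Sketch of the Moreau-Yosida regularization of a one-dimensional cost function,
# c_lambda(x) = min_y [ c(y) + (x - y)^2 / (2*lambda) ], evaluated on a grid.
# The regularized cost is smooth even where the original cost has a kink (e.g. |x|).
import numpy as np

def moreau_envelope(c, lam, grid):
    xs = grid[:, None]                        # evaluation points x
    ys = grid[None, :]                        # candidate minimizers y
    return np.min(c(ys) + (xs - ys) ** 2 / (2 * lam), axis=1)

grid = np.linspace(-2.0, 2.0, 401)
c = np.abs                                    # original (non-differentiable) stage cost
c_lam = moreau_envelope(c, lam=0.5, grid=grid)
# Here c_lam is (approximately) the Huber function: quadratic near 0, linear farther out.
print(c_lam[200], c_lam[0])                   # values at x = 0 and x = -2
```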


Author(s):  
Sanmit Narvekar ◽  
Jivko Sinapov ◽  
Peter Stone

Transfer learning is a method where an agent reuses knowledge learned in a source task to improve learning on a target task. Recent work has shown that transfer learning can be extended to the idea of curriculum learning, where the agent incrementally accumulates knowledge over a sequence of tasks (i.e. a curriculum). In most existing work, such curricula have been constructed manually. Furthermore, they are fixed ahead of time, and do not adapt to the progress or abilities of the agent. In this paper, we formulate the design of a curriculum as a Markov Decision Process, which directly models the accumulation of knowledge as an agent interacts with tasks, and propose a method that approximates an execution of an optimal policy in this MDP to produce an agent-specific curriculum. We use our approach to automatically sequence tasks for 3 agents with varying sensing and action capabilities in an experimental domain, and show that our method produces curricula customized for each agent that improve performance relative to learning from scratch or using a different agent's curriculum.
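
As a schematic sketch only (not the paper's formulation or experiments), one can picture a tiny curriculum MDP whose states are the sets of source tasks trained on so far, whose actions pick the next task, and whose rewards are made-up improvements on the target task; a few backward sweeps then recover a task ordering.

```python
# Schematic sketch (not the paper's formulation): a tiny "curriculum MDP" in which the
# state is the set of tasks trained on so far, the action is the next source task, and
# the reward is a hypothetical improvement on the target task given what has already
# been learned. Finite-horizon value iteration over this DAG recovers a task ordering.
from itertools import combinations

tasks = ("grid_small", "grid_walls", "grid_target")   # hypothetical tasks

def benefit(done, task):
    # Made-up marginal benefit: the target task pays off more after an easier precursor.
    base = {"grid_small": 0.3, "grid_walls": 0.2, "grid_target": 1.0}[task]
    bonus = 0.4 if task == "grid_target" and "grid_small" in done else 0.0
    return base + bonus

states = [frozenset(s) for r in range(len(tasks) + 1) for s in combinations(tasks, r)]

V = {s: 0.0 for s in states}
for _ in range(len(tasks)):                           # backward sweeps over the DAG
    V = {s: max((benefit(s, t) + V[s | {t}] for t in tasks if t not in s), default=0.0)
         for s in states}

s, curriculum = frozenset(), []
while len(s) < len(tasks):                            # greedy rollout of the optimal policy
    t = max((t for t in tasks if t not in s), key=lambda t: benefit(s, t) + V[s | {t}])
    curriculum.append(t)
    s = s | {t}
print(curriculum)                                      # easier source tasks come before the target
```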

