Risk-Sensitive Multiagent Decision-Theoretic Planning Based on MDP and One-Switch Utility Functions

2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Wei Zeng ◽  
Hongtao Zhou ◽  
Mingshan You

In high-stakes situations, decision-makers are often risk-averse, and decision-making frequently takes place in group settings. This paper studies multiagent decision-theoretic planning within the Markov decision process (MDP) framework while accounting for changes in an agent's risk attitude as its wealth level varies. Based on the one-switch utility function, which describes how an agent's risk attitude changes with its wealth level, we give additive and multiplicative aggregation models of group utility and adopt maximizing expected group utility as the planning objective. As the wealth level approaches infinity, the characteristics of the optimal policy are analyzed for the additive and multiplicative aggregation models, respectively. A backward-induction method is then proposed that divides the wealth-level interval from negative infinity to the initial wealth level into subintervals and determines the optimal policy for each state and subinterval. The proposed method is illustrated with numerical examples, and the influence of the agents' risk-aversion parameters and weights on group decision-making is also analyzed.
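
As a rough illustration of the ingredients above, the sketch below implements a linear-plus-exponential one-switch utility (one common member of the one-switch family) and the additive aggregation of member utilities into a group utility. All parameter values and member weights are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
# Sketch of a one-switch utility and additive group aggregation (hypothetical values).

def one_switch_utility(w, a, b):
    """Linear-plus-exponential one-switch form u(w) = w - b * a**w.

    With 0 < a < 1 and b > 0 the agent is strongly risk-averse at low
    wealth and approaches risk neutrality as wealth w grows.
    """
    return w - b * a ** w

def additive_group_utility(w, members, weights):
    """Weighted sum of the members' one-switch utilities."""
    return sum(k * one_switch_utility(w, a, b)
               for (a, b), k in zip(members, weights))

if __name__ == "__main__":
    members = [(0.8, 2.0), (0.6, 1.0)]   # (a, b) per member, hypothetical
    weights = [0.5, 0.5]
    for w in (-5.0, 0.0, 5.0, 20.0):
        print(w, round(additive_group_utility(w, members, weights), 3))
```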

Author(s):  
A. V. Lachikhin

Currently, the paradigm of intelligent agents and multi-agent systems is actively developing. An agent's course of action can be represented as a Markov decision process, and such agents need methods for computing optimal policies. The purpose of this study is to review existing techniques and to determine the possibility and conditions of their application. The main approaches, based on linear and dynamic programming, are considered, together with the specific algorithms used to find the extreme value of utility: the simplex method for linear programming and value iteration for dynamic programming. The equations needed to find the optimal policy for an intelligent agent's actions are given, and the restrictions on applying the various algorithms are discussed. The conclusion is that the most suitable method for finding the optimal policy is value iteration.
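
To make the dynamic-programming route concrete, here is a minimal value-iteration sketch for a finite MDP. The toy transition and reward numbers are hypothetical and serve only to show the mechanics the review refers to.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP.

    P[a][s, s'] holds transition probabilities, R[s, a] immediate rewards.
    Returns the optimal value function and a greedy (optimal) policy.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = np.array([R[:, a] + gamma * P[a] @ V for a in range(n_actions)]).T
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

if __name__ == "__main__":
    # Toy 2-state, 2-action MDP (hypothetical numbers).
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.5, 0.5], [0.4, 0.6]])]
    R = np.array([[1.0, 0.5],
                  [0.0, 2.0]])
    V, policy = value_iteration(P, R)
    print("V:", V, "policy:", policy)
```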


1984 ◽  
Vol 28 (1) ◽  
pp. 191-193 ◽  
Author(s):  
T. M. Vinogradskaya ◽  
B. A. Geninson ◽  
A. A. Rubchinskii

2019 ◽  
Vol 9 (2) ◽  
pp. 43-61 ◽  
Author(s):  
Sérgio Guerreiro

Decision-making processes are of the utmost importance for steering organizational change whenever business process workarounds are attempted during operation. However, to decide on non-compliant situations, e.g., bypasses, social resistance, or collusion, the business manager needs contextualized and correct interpretations of the available business process redesign options for coping with workarounds. This article explores the need to support the decision-making process with a full constructional perspective in order to optimize business process redesign. To that end, the Markov decision process is combined with the body of knowledge on business processes, specifically the concepts used in designing enterprise-wide business transactions. This methodology gives management initiatives more knowledge about the value of business process redesign. A classical Order-to-Cash chain of business processes (the ordering, production, distribution, and selling of goods) illustrates the benefits of this quantitative approach, and results obtained for business process redesign in reaction to workarounds are reported. The analysis shows that the approach can anticipate sub-optimal solutions before actions are taken and highlights the impact of the discount factor on the final value obtained. The contribution of this novel conceptual integration to the business process community is the forecasting of the value function of business transaction redesign options when facing non-compliant workarounds. In the related literature, business process compliance usually involves offline computation, and redesign is only considered for forthcoming business process instances. This article is innovative in that it anticipates the value impact of a redesign, allowing more effective decisions to be taken.
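
The role of the discount factor in the value of a redesign option can be illustrated with a small policy-evaluation sketch. The Order-to-Cash stages, transition probabilities, and costs below are hypothetical placeholders, not figures from the article.

```python
import numpy as np

def policy_value(P, r, gamma):
    """Exact discounted value V = (I - gamma * P)^(-1) r for a fixed redesign option."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)

if __name__ == "__main__":
    # Hypothetical Order-to-Cash stages: order -> production -> distribution -> selling.
    P = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.1, 0.0, 0.9, 0.0],   # 10% rework loops back to ordering
                  [0.0, 0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0]])  # a new instance starts after selling
    r = np.array([-1.0, -3.0, -2.0, 10.0])  # illustrative stage costs and revenue
    for gamma in (0.5, 0.9, 0.99):
        print(f"gamma = {gamma}: value at 'order' = {policy_value(P, r, gamma)[0]:.2f}")
```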


2012 ◽  
Vol 6-7 ◽  
pp. 267-272
Author(s):  
Ming Shan You ◽  
Wei Zeng ◽  
Hong Tao Zhou

The one-switch utility function describes how a decision maker's risk attitude changes with his wealth level. In this paper, an additive decision rule is used to aggregate the members' utilities, each represented by a one-switch utility function. Based on Markov decision processes (MDPs) and group utility, a dynamic, multi-stage, risk-sensitive group decision model is proposed. The model augments the MDP state with the wealth level, so a policy is defined as an action executed in a state and a wealth-level interval. A backward-induction algorithm is given to solve for the model's optimal policy. Numerical examples show that personal risk attitude has a great influence on group decision-making when the members' risk attitudes differ, while the members' weights play the critical role when their risk attitudes are similar.
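
A minimal backward-induction sketch over a wealth-augmented state is shown below. It follows the general idea of maximizing the expected group utility of final wealth, but it recurses on the exact accumulated wealth rather than reproducing the paper's interval partition, and all numbers are hypothetical.

```python
import numpy as np

def one_switch_utility(w, a, b):
    """Linear-plus-exponential one-switch utility u(w) = w - b * a**w."""
    return w - b * a ** w

def group_utility(w, members, weights):
    """Additive aggregation of the members' one-switch utilities."""
    return sum(k * one_switch_utility(w, a, b)
               for (a, b), k in zip(members, weights))

def backward_induction(P, rew, horizon, w0, members, weights):
    """Finite-horizon backward induction on the augmented state (s, wealth).

    P[a][s, s'] holds transition probabilities, rew[s, a] monetary payoffs.
    The expected group utility of terminal wealth is maximized.
    """
    n_states, n_actions = rew.shape

    def V(s, w, t):
        if t == horizon:
            return group_utility(w, members, weights), None
        best, best_a = -np.inf, None
        for a in range(n_actions):
            val = sum(P[a][s, s2] * V(s2, w + rew[s, a], t + 1)[0]
                      for s2 in range(n_states))
            if val > best:
                best, best_a = val, a
        return best, best_a

    return V(0, w0, 0)

if __name__ == "__main__":
    P = [np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([[0.2, 0.8], [0.9, 0.1]])]
    rew = np.array([[1.0, 3.0], [0.0, -1.0]])   # hypothetical payoffs
    members = [(0.8, 2.0), (0.6, 1.0)]          # (a, b) per member
    weights = [0.6, 0.4]
    value, first_action = backward_induction(P, rew, horizon=3, w0=0.0,
                                             members=members, weights=weights)
    print("expected group utility:", round(value, 3), "first action:", first_action)
```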


2017 ◽  
Vol 14 (5) ◽  
pp. 467-472 ◽  
Author(s):  
Mohammad M. Hamasha ◽  
George Rumbe

Purpose
Emergency departments (EDs) face a capacity-planning challenge caused by high patient demand and limited resources. Inadequate resources lead to increased delays, affect the quality of care, and increase health-care costs. Such circumstances call for operational research models, such as the Markov decision process (MDP), to enable better decision-making. The purpose of this paper is to demonstrate the applicability and use of the MDP in the ED.

Design/methodology/approach
Adopting an MDP provides invaluable insight into system operation across the different system states (from very busy to unoccupied), ensuring optimal assignment of resources and reduced costs. In this paper, a descriptive health-system model based on the MDP is presented, and a numerical example illustrates its appropriateness for determining an optimal policy.

Findings
Faced with numerous decisions, hospital managers must ensure that an appropriate technique is used to minimize undesired outcomes. The MDP is shown to be a robust approach that supports critical decision-making processes. It also provides insight into the associated costs, enabling hospital managers to allocate resources efficiently, ensuring quality health care and increased throughput while minimizing costs.

Originality/value
Applying the MDP in the ED is a novel and useful starting point. The MDP is a powerful tool for making decisions in critical situations, and the ED needs such a tool.
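
A toy version of such an ED model might look like the sketch below: the states run from unoccupied to very busy, the action is whether to call in an extra team, and value iteration minimizes discounted cost. Every state, probability, and cost figure is hypothetical and serves only to show how an optimal staffing policy would be computed.

```python
import numpy as np

# Toy ED capacity model (all numbers hypothetical).
# States: 0 = unoccupied, 1 = moderately busy, 2 = very busy.
# Actions: 0 = baseline staffing, 1 = call in an extra team.
P = [np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.3, 0.6]]),
     np.array([[0.8, 0.2, 0.0],
               [0.5, 0.4, 0.1],
               [0.3, 0.4, 0.3]])]
# Per-period cost combining patient-delay penalties and staffing cost.
C = np.array([[0.0, 2.0],
              [3.0, 4.0],
              [8.0, 6.0]])

gamma, V = 0.9, np.zeros(3)
for _ in range(500):                       # value iteration on costs (minimize)
    Q = np.array([C[:, a] + gamma * P[a] @ V for a in range(2)]).T
    V = Q.min(axis=1)

print("expected discounted cost per state:", np.round(V, 2))
print("policy (0 = baseline, 1 = extra team):", Q.argmin(axis=1))
```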


Author(s):  
Thomas Boraud

This chapter assesses alternative approaches to reinforcement learning developed in machine learning. The initial goal of this branch of artificial intelligence, which appeared in the middle of the twentieth century, was to develop and implement algorithms that allow a machine to learn. Originally, the machines were computers or more or less autonomous robotic automata. As artificial intelligence has developed and cross-fertilized with neuroscience, it has come to be used to model the learning and decision-making processes of biological agents, broadening the meaning of the word ‘machine’. Theoreticians of the discipline define several categories of learning, but this chapter deals only with those related to reinforcement learning. To understand how these algorithms work, it is necessary first to explain the Markov chain and the Markov decision process. The chapter then goes on to examine model-free reinforcement learning algorithms, the actor-critic model, and finally model-based reinforcement learning algorithms.
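
For readers who want a concrete reference point, a textbook tabular Q-learning routine, the simplest of the model-free algorithms the chapter surveys, can be sketched as follows. The corridor environment at the bottom is a made-up example, not one used in the chapter.

```python
import random

def q_learning(n_states, n_actions, step, episodes=2000,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; `step(s, a)` must return (next_state, reward, done)."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            a = (random.randrange(n_actions) if random.random() < epsilon
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

if __name__ == "__main__":
    # A 5-state corridor: action 1 moves right toward the goal at state 4.
    def step(s, a):
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

    Q = q_learning(5, 2, step, epsilon=0.3)
    print("greedy policy:", [max(range(2), key=lambda a: Q[s][a]) for s in range(5)])
```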


2019 ◽  
Author(s):  
Ryan Smith ◽  
Sahib Khalsa ◽  
Martin Paulus

Background
Antidepressant medication adherence is among the most important problems in health care worldwide. Interventions designed to increase adherence have largely failed, pointing towards a critical need to better understand the underlying decision-making processes that contribute to adherence. A computational decision-making model that integrates empirical data with a fundamental action selection principle could be pragmatically useful in 1) making individual level predictions about adherence, and 2) providing an explanatory framework that improves our understanding of non-adherence.

Methods
Here we formulate a partially observable Markov decision process model based on the active inference framework that can simulate several processes that plausibly influence adherence decisions.

Results
Using model simulations of the day-to-day decisions to take a prescribed selective serotonin reuptake inhibitor (SSRI), we show that several distinct parameters in the model can influence adherence decisions in predictable ways. These parameters include differences in policy depth (i.e., how far into the future one considers when deciding), decision uncertainty, beliefs about the predictability (stochasticity) of symptoms, beliefs about the magnitude and time course of symptom reductions and side effects, and the strength of medication-taking habits that one has acquired.

Conclusions
Clarifying these influential factors will be an important first step toward empirically determining which are contributing to non-adherence to antidepressants in individual patients. The model can also be seamlessly extended to simulate adherence to other medications (by incorporating the known symptom reduction and side effect trajectories of those medications), with the potential promise of identifying which medications may be best suited for different patients.
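
The policy-depth effect described in the results can be caricatured in a few lines of code: immediate side-effect costs are weighed against symptom-reduction benefits that only appear after some delay, so a shallow lookahead favours skipping the medication while a deeper one favours adhering. This is only a hand-rolled illustration of that single mechanism, with made-up parameters; it is not the authors' active inference model.

```python
def value_of_adhering(depth, side_effect_cost=1.0, benefit=0.5,
                      benefit_delay=3, gamma=0.9):
    """Net discounted value of taking the SSRI every day over a `depth`-day lookahead.

    Side effects are immediate; symptom-reduction benefits only start after
    `benefit_delay` days and then grow with time on the medication.
    """
    value = 0.0
    for t in range(depth):
        gain = benefit * (t + 1) if t >= benefit_delay else 0.0
        value += gamma ** t * (gain - side_effect_cost)
    return value

if __name__ == "__main__":
    for d in (1, 3, 7, 14, 28):
        print(f"policy depth {d:2d} days -> value of adhering: {value_of_adhering(d):7.2f}")
```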


10.28945/2750 ◽  
2004 ◽  
Author(s):  
Abdullah Gani ◽  
Omar Zakaria ◽  
Nor Badrul Anuar Jumaat

This paper presents an application of the Markov decision process (MDP) to the provision of traffic prioritisation in best-effort networks. MDP was used because it is a standard, general formalism for modelling stochastic, sequential decision problems. The implementation of traffic prioritisation involves a series of decision-making processes by which packets are marked and classified before being dispatched to their destinations. The application of MDP was driven by the objective of ensuring that higher-priority packets are not delayed by lower-priority ones. MDP is believed to be applicable to improving traffic prioritisation arbitration.
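
A bare-bones sketch of such a prioritisation MDP is given below: the state records whether the high- and low-priority queues hold a packet, the action chooses which queue to serve, and value iteration recovers the policy that never delays high-priority packets behind low-priority ones. All arrival probabilities and rewards are hypothetical and are not taken from the paper.

```python
import numpy as np

# Toy packet-scheduling MDP (hypothetical numbers, for illustration only).
# State encodes whether the high- and low-priority queues hold a packet:
# 0 = (empty, empty), 1 = (empty, low), 2 = (high, empty), 3 = (high, low).
# Actions: 0 = serve the high-priority queue, 1 = serve the low-priority queue.
p_hi, p_lo = 0.3, 0.5          # per-slot arrival probabilities

def next_state_dist(s, a):
    """Distribution over next states after serving one queue and new arrivals."""
    hi, lo = s // 2, s % 2
    if a == 0 and hi:
        hi = 0
    elif a == 1 and lo:
        lo = 0
    dist = np.zeros(4)
    for ah in (0, 1):
        for al in (0, 1):
            prob = (p_hi if ah else 1 - p_hi) * (p_lo if al else 1 - p_lo)
            dist[2 * min(hi + ah, 1) + min(lo + al, 1)] += prob
    return dist

def reward(s, a):
    """Delivering a high-priority packet is worth more; leaving one waiting costs."""
    hi, lo = s // 2, s % 2
    served = 5.0 if (a == 0 and hi) else (1.0 if (a == 1 and lo) else 0.0)
    return served - 2.0 * (hi and a != 0)

gamma, V = 0.9, np.zeros(4)
for _ in range(500):                       # value iteration
    Q = np.array([[reward(s, a) + gamma * next_state_dist(s, a) @ V
                   for a in range(2)] for s in range(4)])
    V = Q.max(axis=1)

print("policy (0 = serve high, 1 = serve low):", Q.argmax(axis=1))
```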

