Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes

Author(s):  
Shun Zhang ◽  
Edmund H. Durfee ◽  
Satinder Singh

As it achieves a goal on behalf of its human user, an autonomous agent's actions may have side effects that change features of its environment in ways that negatively surprise its user. An agent that can be trusted to operate safely should thus only change features the user has explicitly permitted. We formalize this problem, and develop a planning algorithm that avoids potentially negative side effects given what the agent knows about (un)changeable features. Further, we formulate a provably minimax-regret querying strategy for the agent to selectively ask the user about features that it hasn't explicitly been told about. We empirically show how much faster it is than a more exhaustive approach and how much better its queries are than those found by the best known heuristic.
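As an illustration only (the notation here is assumed, not drawn from the abstract), the minimax-regret criterion for choosing a query q from a candidate set Q can be written as

q^* \in \arg\min_{q \in Q} \max_{r} \big[ V(\pi^*_r) - V(\pi_{q,r}) \big],

where r ranges over the user's possible responses, \pi^*_r is the best safe policy an agent with full knowledge of r could follow, and \pi_{q,r} is the best safe policy the agent can commit to after asking q and observing r.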

Author(s):  
Sandhya Saisubramanian ◽  
Ece Kamar ◽  
Shlomo Zilberstein

Agents operating in unstructured environments often create negative side effects (NSE) that may not be easy to identify at design time. We examine how various forms of human feedback or autonomous exploration can be used to learn a penalty function associated with NSE during system deployment. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. The slack denotes the maximum deviation from an optimal policy with respect to the agent's primary objective allowed in order to mitigate NSE as a secondary objective. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE and that different feedback mechanisms introduce different biases, which influence the identification of NSE.
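In outline, and with notation assumed here for illustration, the lexicographic objective with slack \delta keeps only policies that are near-optimal on the primary objective and, among them, minimizes the expected NSE penalty:

\pi^* \in \arg\min_{\pi \in \Pi_\delta} V_{\mathrm{NSE}}(\pi), \qquad \Pi_\delta = \{\pi : V_1(\pi) \ge \max_{\pi'} V_1(\pi') - \delta\},

so the slack \delta directly bounds how much primary-objective value may be sacrificed in order to mitigate side effects.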


2017 ◽  
Vol 59 ◽  
pp. 229-264 ◽  
Author(s):  
Asrar Ahmed ◽  
Pradeep Varakantham ◽  
Meghna Lowalekar ◽  
Yossiri Adulyasak ◽  
Patrick Jaillet

Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives for solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies, which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only, and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), which can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR, respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, is better than directly optimizing the regret.
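For a set \Xi of sampled (transition, reward) realisations, the regret objective being minimized can be stated as follows (illustrative notation, not taken verbatim from the paper):

\mathrm{reg}(\pi) = \max_{\xi \in \Xi} \big[ V^*_\xi - V_\xi(\pi) \big], \qquad \pi^{\mathrm{mmr}} \in \arg\min_{\pi} \mathrm{reg}(\pi),

where V^*_\xi is the optimal value under sample \xi and V_\xi(\pi) is the value of policy \pi under that sample; the mixed integer linear program encodes this min-max over the sampled models.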


2020 ◽  
Vol 34 (03) ◽  
pp. 2552-2559
Author(s):  
Shun Zhang ◽  
Edmund Durfee ◽  
Satinder Singh

An autonomous agent acting on behalf of a human user has the potential to cause side effects that surprise the user in unsafe ways. When the agent cannot formulate a policy with only side effects it knows are safe, it needs to selectively query the user about whether other useful side effects are safe. Our goal is an algorithm that queries about as few potential side effects as possible to find a safe policy, or to prove that none exists. We extend prior work on irreducible infeasible sets to also handle our problem's complication that a constraint to avoid a side effect cannot be relaxed without user permission. By proving that our objectives are also adaptive submodular, we devise a querying algorithm that we empirically show finds nearly-optimal queries with much less computation than a guaranteed-optimal approach, and outperforms competing approximate approaches.
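A minimal sketch of the greedy rule that adaptive submodularity justifies is given below, assuming hypothetical expected_gain and ask_user callbacks; it is not the authors' implementation.

def greedy_adaptive_queries(candidates, expected_gain, ask_user, budget):
    # Greedily ask about the feature with the largest expected marginal gain
    # given the answers observed so far (the adaptive-submodular greedy rule).
    answers = {}
    for _ in range(budget):
        remaining = [f for f in candidates if f not in answers]
        if not remaining:
            break
        best = max(remaining, key=lambda f: expected_gain(f, answers))
        answers[best] = ask_user(best)  # e.g., True if the side effect is permitted
    return answers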


2020 ◽  
Vol 34 (06) ◽  
pp. 9794-9801
Author(s):  
Tomáš Brázdil ◽  
Krishnendu Chatterjee ◽  
Petr Novotný ◽  
Jiří Vahala

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
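As a rough sketch of the linear-programming step only (names and inputs are assumed here; in the algorithm the per-action value and failure-probability estimates would come from the search and the learned predictor), one can select an action distribution that maximizes expected value subject to a risk bound:

import numpy as np
from scipy.optimize import linprog

def risk_constrained_distribution(values, risks, threshold):
    # Maximize sum_a p(a) * values[a] subject to sum_a p(a) * risks[a] <= threshold,
    # with p a probability vector over the available actions.
    values = np.asarray(values, dtype=float)
    risks = np.asarray(risks, dtype=float)
    n = len(values)
    result = linprog(
        c=-values,                    # linprog minimizes, so negate the values
        A_ub=risks.reshape(1, n),     # expected failure probability <= threshold
        b_ub=[threshold],
        A_eq=np.ones((1, n)),         # probabilities sum to one
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * n,
    )
    return result.x if result.success else None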


1983 ◽  
Vol 20 (04) ◽  
pp. 835-842
Author(s):  
David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.
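The underlying fact is that a convex function on a compact convex set attains its maximum at an extreme point: for a polytope P with vertex set \mathrm{vert}(P),

\max_{x \in P} f(x) = \max_{v \in \mathrm{vert}(P)} f(v) \quad \text{for convex } f,

so once convexity is established the search can be restricted to the finitely many extreme points.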

