The Choice Function Framework for Online Policy Improvement

2020 · Vol. 34 (06) · pp. 10178–10185
Author(s): Murugeswari Issakkimuthu, Alan Fern, Prasad Tadepalli

There are notable examples of online search improving over hand-coded or learned policies (e.g., AlphaZero) for sequential decision making. It is not clear, however, whether policy improvement is guaranteed for many of these approaches, even when given a perfect leaf evaluation function and transition model. Indeed, simple counterexamples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions under which stationary and non-stationary choice functions guarantee that the value achieved by online search is no worse than that of the original policy. In addition, we describe a general parametric class of choice functions that satisfies those conditions and present an illustrative use case demonstrating the empirical utility of the framework.
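
As a concrete illustration of the framework's central object, the sketch below implements a depth-limited online search in which a choice function selects the actions considered at each node and everything else is pruned. This is a minimal sketch under simplifying assumptions (deterministic transitions, a stationary choice function, a perfect leaf evaluation function); the interface names `choice_fn`, `transitions`, and `leaf_value` are hypothetical, and the paper's sufficient conditions for improvement are more subtle than anything encoded here.

```python
# Illustrative sketch only: depth-limited online search in which a choice
# function selects the actions to consider at every node; all other actions
# are pruned. Interface names (choice_fn, transitions, leaf_value) are
# hypothetical; the paper's framework also covers non-stationary choice
# functions and stochastic transitions.

def search_value(state, depth, choice_fn, transitions, leaf_value, gamma=1.0):
    """Value of `state` under depth-limited search restricted to the
    actions returned by choice_fn(state)."""
    if depth == 0:
        return leaf_value(state)  # perfect leaf evaluation, as assumed in the analysis
    best = float("-inf")
    for action in choice_fn(state):  # the choice function prunes the action set
        next_state, reward = transitions(state, action)
        best = max(best, reward + gamma * search_value(
            next_state, depth - 1, choice_fn, transitions, leaf_value, gamma))
    return best

def act(state, depth, choice_fn, transitions, leaf_value):
    """Greedy root decision among the actions the choice function allows."""
    def q(action):
        next_state, reward = transitions(state, action)
        return reward + search_value(next_state, depth - 1,
                                     choice_fn, transitions, leaf_value)
    return max(choice_fn(state), key=q)
```

A natural candidate is a choice function that always includes the original policy's action at each node, so the search can always fall back on the policy's own behavior; the paper's conditions make precise when such pruning schemes actually guarantee improvement.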

Author(s): Madhuparna Karmokar, Souvik Roy, Ton Storcken

In this paper, we consider choice functions that are unanimous, anonymous, symmetric, and group strategy-proof, and consider domains that are single-peaked on some tree. We prove the following three results in this setting. First, there exists a unanimous, anonymous, symmetric, and group strategy-proof choice function on a path-connected domain if and only if the domain is single-peaked on a tree and the number of agents is odd. Second, a choice function is unanimous, anonymous, symmetric, and group strategy-proof on a single-peaked domain on a tree if and only if it is the pairwise majority rule (also known as the tree-median rule) and the number of agents is odd. Third, there exists a unanimous, anonymous, symmetric, and strategy-proof choice function on a strongly path-connected domain if and only if the domain is single-peaked on a tree and the number of agents is odd. As a corollary of these results, we obtain that there exists no unanimous, anonymous, symmetric, and group strategy-proof choice function on a path-connected domain if the number of agents is even.
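
The tree-median rule in the second result lends itself to direct computation from the agents' peaks. Below is a brute-force sketch, not taken from the paper, that uses a standard characterization of the median on a tree: with an odd number of agents, the pairwise majority winner is the unique vertex m such that every component of the tree with m removed contains strictly fewer than half of the peaks. The edge-list representation and function names are illustrative.

```python
# Brute-force sketch of the tree-median (pairwise majority) rule for an odd
# number of agents whose preferences are single-peaked on a tree. The
# representation (edge list, list of peak vertices) is illustrative.
from collections import defaultdict

def tree_median(edges, peaks):
    """Return the vertex m such that every component of the tree minus m
    holds strictly fewer than half of the peaks; unique when len(peaks) is odd."""
    assert len(peaks) % 2 == 1, "the characterization is for an odd number of agents"
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    half = len(peaks) / 2
    for m in adj:
        if all(_branch_peak_count(adj, m, nbr, peaks) < half for nbr in adj[m]):
            return m
    return None  # unreachable on a genuine tree with an odd number of peaks

def _branch_peak_count(adj, m, start, peaks):
    """Number of peaks in the component containing `start` once m is deleted."""
    seen, stack = {m, start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    seen.discard(m)
    return sum(p in seen for p in peaks)

# Example: on the path 1-2-3-4-5 with peaks (1, 2, 5), the rule picks 2,
# the ordinary median of the peaks on a line.
print(tree_median([(1, 2), (2, 3), (3, 4), (4, 5)], [1, 2, 5]))  # -> 2
```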


2008 · Vol. 04 (03) · pp. 309–327
Author(s): John N. Mordeson, Kiran R. Bhutani, Terry D. Clark

If we assume that the preferences of a set of political actors are not cyclic, we would like to know whether their collective choices are rationalizable. Given a fuzzy choice rule, do they collectively choose an alternative from the set of undominated alternatives? We consider necessary and sufficient conditions for a partially acyclic fuzzy choice function to be rationalizable. We find that certain fuzzy choice functions that satisfy conditions α and β are rationalizable. Furthermore, any fuzzy choice function that satisfies these two conditions also satisfies the Arrow axiom and the weak axiom of revealed preference (WARP).
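
Conditions α and β here are fuzzy analogues of Sen's classical contraction and expansion consistency conditions. As a crisp point of reference only (the paper's versions are graded by membership degrees), the sketch below checks the classical conditions for a choice function represented as a map from feasible sets to chosen subsets.

```python
# Crisp versions of Sen's conditions α (contraction) and β (expansion),
# shown only as a reference point; the paper works with fuzzy choice
# functions, where these conditions become graded. C maps frozensets of
# alternatives to the set of chosen alternatives.

def satisfies_alpha(C):
    """α: if x is chosen from S and x lies in T ⊆ S, then x is chosen from T."""
    return all(x in C[T]
               for S in C for T in C if T <= S
               for x in C[S] if x in T)

def satisfies_beta(C):
    """β: if x and y are both chosen from T ⊆ S and y is chosen from S,
    then x is chosen from S as well."""
    return all(x in C[S]
               for T in C for S in C if T <= S
               for x in C[T] for y in C[T] if y in C[S])

# Example: choosing the maxima of the strict order z > x > y satisfies both.
C = {frozenset({"x", "y"}): {"x"},
     frozenset({"x", "z"}): {"z"},
     frozenset({"y", "z"}): {"z"},
     frozenset({"x", "y", "z"}): {"z"}}
print(satisfies_alpha(C), satisfies_beta(C))  # -> True True
```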


Author(s): Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying

Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies of qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
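
The abstract does not spell out the qMDP algorithms, but the classical procedure they extend is finite-horizon policy evaluation by backward induction. The sketch below shows that classical baseline; in the qMDP setting, the row-stochastic transition matrices would be replaced by quantum super-operators acting on density operators. All names (`P`, `R`, `policy`, `horizon`) are illustrative.

```python
# Classical finite-horizon policy evaluation by backward induction, shown as
# the baseline that qMDP generalizes (with quantum super-operators in place
# of stochastic matrices). Names (P, R, policy, horizon) are illustrative.
import numpy as np

def evaluate_policy(P, R, policy, horizon):
    """P[a] is an (S, S) row-stochastic transition matrix for action a,
    R is an (S, A) reward array, policy[t][s] gives the action at stage t.
    Returns V with V[t, s] = expected total reward from stage t, state s."""
    S = R.shape[0]
    V = np.zeros((horizon + 1, S))  # V[horizon] = 0: no reward past the horizon
    for t in range(horizon - 1, -1, -1):  # backward induction over stages
        for s in range(S):
            a = policy[t][s]
            V[t, s] = R[s, a] + P[a][s] @ V[t + 1]
    return V

# Example: a 2-state, 2-action MDP evaluated over a 3-step horizon.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.5, 0.5]])]   # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
policy = [[0, 1], [0, 1], [0, 1]]          # stage-dependent (here constant)
print(evaluate_policy(P, R, policy, horizon=3)[0])
```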

