Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability

Author(s):  
David Simchi-Levi ◽  
Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class [Formula: see text]. We design a fast and simple algorithm that achieves the statistically optimal regret with only [Formula: see text] calls to an offline regression oracle across all T rounds. The number of oracle calls can be further reduced to [Formula: see text] if T is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.

1995 ◽  
Vol 32 (1) ◽  
pp. 168-182 ◽  
Author(s):  
K. D. Glazebrook ◽  
S. Greatrix

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problem. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming methodologies. The transformed index can also be used in procedures for policy evaluation.
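To make the "value of a restart problem" idea concrete, here is a minimal sketch for the classic Gittins index of a single finite-state arm. It does not implement the transformed index for Nash's generalised bandits. The function name, the transition matrix `P`, the reward vector `r`, the discount factor `beta`, and the normalisation by `(1 - beta)` are all illustrative assumptions, not taken from the paper. In the restart problem for a state `s`, every state offers a choice between continuing from where you are and restarting from `s`. Value iteration solves that problem, and the value at `s` gives the index.

```python
from typing import List, Sequence


def gittins_index_restart(
    P: Sequence[Sequence[float]],
    r: Sequence[float],
    beta: float,
    state: int,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> float:
    """Classic Gittins index of `state` via the restart-in-`state` problem.

    Hypothetical helper for illustration only: P[j][k] is the transition
    probability j -> k, r[j] the reward in state j, 0 < beta < 1 the discount.
    """
    n = len(r)
    V: List[float] = [0.0] * n
    for _ in range(max_iter):
        # Value of continuing from each state j under the current estimate.
        cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n)) for j in range(n)]
        # In every state we may either continue or restart from `state`.
        new_V = [max(cont[j], cont[state]) for j in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Scale the restart value so the index is expressed as a reward rate.
    return (1.0 - beta) * V[state]


# Toy arm: state 0 pays 1 once, then the arm is stuck in state 1 paying 0.
P_toy = [[0.0, 1.0], [0.0, 1.0]]
r_toy = [1.0, 0.0]
index_0 = gittins_index_restart(P_toy, r_toy, beta=0.9, state=0)
index_1 = gittins_index_restart(P_toy, r_toy, beta=0.9, state=1)
```

In this toy arm the index of state 0 comes out at about 1 (take the single reward, then stop) and the index of state 1 at 0, which matches the intuition that a higher index marks the more attractive state.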


1997 ◽  
Vol 06 (02) ◽  
pp. 95-149 ◽  
Author(s):  
Parke Godfrey

When a query fails, it is more cooperative to identify the cause of failure, rather than just to report the empty answer set. When there is not a cause per se for the query's failure, it is then worthwhile to report the part of the query which failed. To identify a Minimal Failing Subquery (MFS) of the query is the best way to do this. (This MFS is not unique; there may be many of them.) Likewise, to identify a Maximal Succeeding Subquery (XSS) can help a user to recast a new query that leads to a non-empty answer set. Database systems do not provide the functionality of these types of cooperative responses. This may be, in part, because algorithmic approaches to finding the MFSs and the XSSs to a failing query are not obvious. The search space of subqueries is large. Despite work on MFSs in the past, the algorithmic complexity of these identification problems had remained uncharted. This paper shows the complexity profile of MFS and XSS identification. It is shown that there exists a simple algorithm for finding an MFS or an XSS by asking N subsequent queries, in which N is the length of the query. To find more MFSs (or XSSs) can be hard. It is shown that to find N MFSs (or XSSs) is NP-hard. To find k MFSs (or XSSs), for a fixed k, remains polynomial. An optimal algorithm for enumerating MFSs and XSSs, ISHMAEL, is developed and presented. The algorithm has ideal performance in enumeration, finding the first answers quickly, and only decaying toward intractability in a predictable manner as further answers are found. The complexity results and the algorithmic approaches given in this paper should allow for the construction of cooperative facilities which identify MFSs and XSSs for database systems. These results are relevant to a number of problems outside of databases too, and may find further application.
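The "N subsequent queries" idea can be sketched as follows. This is one plausible reading of the simple algorithm mentioned in the abstract, not the paper's ISHMAEL procedure. It assumes the query is a conjunction of N conjuncts and that failure is monotone, meaning any superquery of a failing subquery also fails. The names `find_mfs`, `fails`, `ROWS`, and `CONJUNCTS` are hypothetical and used only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]

# Toy "database" and a conjunctive query over it (illustrative data only).
ROWS: List[Row] = [
    {"city": "Paris", "price": 80, "stars": 3},
    {"city": "Rome", "price": 200, "stars": 5},
    {"city": "Paris", "price": 250, "stars": 5},
]
CONJUNCTS: Dict[str, Callable[[Row], bool]] = {
    "city=Paris": lambda row: row["city"] == "Paris",
    "price<100": lambda row: row["price"] < 100,
    "stars=5": lambda row: row["stars"] == 5,
}


def fails(subquery: Sequence[str]) -> bool:
    """A subquery fails when no row satisfies all of its conjuncts."""
    return not any(all(CONJUNCTS[c](row) for c in subquery) for row in ROWS)


def find_mfs(conjuncts: Sequence[str], fails: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return one minimal failing subquery using one test per conjunct."""
    current = list(conjuncts)
    if not fails(current):
        raise ValueError("the full query succeeds, so it has no failing subquery")
    for c in list(current):
        trial = [x for x in current if x != c]
        # If the query still fails without c, then c is not needed for failure.
        if fails(trial):
            current = trial
    return current


mfs = find_mfs(list(CONJUNCTS), fails)
print(mfs)  # ['price<100', 'stars=5']
```

Here the full query fails because no row is both cheap and five-star. Dropping `city=Paris` leaves a query that still fails, while dropping either remaining conjunct makes it succeed, so the result is minimal. A maximal succeeding subquery could be found by the mirror-image loop, starting from the empty query and adding conjuncts while the query still succeeds.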


2008 ◽  
Vol 40 (02) ◽  
pp. 377-400 ◽  
Author(s):  
Savas Dayanik ◽  
Warren Powell ◽  
Kazutoshi Yamazaki

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.


Author(s):  
Zehong Hu ◽  
Jie Zhang

Posted-price mechanisms are widely adopted for pricing tasks in popular microtask crowdsourcing. In this paper, we propose a novel posted-price mechanism that not only outperforms existing mechanisms but also avoids their need for a finite price range. These advantages are achieved by converting the pricing problem into a multi-armed bandit problem and designing an optimal algorithm that exploits the unique features of microtask crowdsourcing. We theoretically show the optimality of our algorithm and prove that the performance upper bound can be achieved without a prior price range. We also conduct extensive experiments using real price data to verify the advantages and practicability of our mechanism.


2011 ◽  
Vol 11 (11&12) ◽  
pp. 1019-1027
Author(s):  
Itai Arad

This is not a disproof of the quantum PCP conjecture! In this note we use perturbation on the commuting Hamiltonian problem on a graph, based on results by Bravyi and Vyalyi, to provide a very partial no-go theorem for quantum PCP. Specifically, we derive an upper bound on how large the promise gap can be for the quantum PCP still to hold, as a function of the non-commuteness of the system. As the system becomes more and more commuting, the maximal promise gap shrinks. We view these results as possibly a preliminary step towards disproving the quantum PCP conjecture posed in \cite{ref:Aha09}. A different way to view these results is actually as indications that a critical point exists, beyond which quantum PCP indeed holds; in any case, we hope that these results will lead to progress on this important open problem.


2011 ◽  
Vol 76 (2) ◽  
pp. 368-376 ◽  
Author(s):  
Mark Fulk

Results in recursion-theoretic inductive inference have been criticized as depending on unrealistic self-referential examples. J. M. Bārzdiņš proposed a way of ruling out such examples, and conjectured that one of the earliest results of inductive inference theory would fall if his method were used. In this paper we refute Bārzdiņš' conjecture. We propose a new line of research examining robust separations; these are defined using a strengthening of Bārzdiņš' original idea. The preliminary results of the new line of research are presented, and the most important open problem is stated as a conjecture. Finally, we discuss the extension of this work from function learning to formal language learning.


2017 ◽  
Vol 114 (43) ◽  
pp. 11380-11385 ◽  
Author(s):  
Noah E. Friedkin ◽  
Francesco Bullo

How truth wins in social groups is an important open problem. Classic experiments on social groups dealing with truth statement issues present mixed findings on the conditions of truth abandonment and reaching a consensus on the truth. No theory has been developed and evaluated that might integrate these findings with a mathematical model of the interpersonal influence system that alters some or all of its members’ positions on an issue. In this paper we provide evidence that a general model in the network science on opinion dynamics substantially clarifies how truth wins in groups.

