Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards

Multi-fidelity Gaussian Process Bandit Optimisation

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11288 ◽

2019 ◽

Vol 66 ◽

pp. 151-196 ◽

Cited By ~ 2

Author(s):

Kirthevasan Kandasamy ◽

Gautam Dasarathy ◽

Junier Oliva ◽

Jeff Schneider ◽

Barnabás Póczos

Keyword(s):

Computer Simulation ◽

Theoretical Analysis ◽

Gaussian Process ◽

Target Function ◽

Black Box ◽

Confidence Bound ◽

Bandit Problem ◽

Single Function ◽

Novel Method ◽

Upper Confidence Bound

In many scientific and engineering applications, we are tasked with the maximisation of an expensive to evaluate black box function f. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to f may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of f in a small but promising region and speedily identify the optimum. We formalise this task as a multi-fidelity bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour and achieves better bounds on the regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

Operations Research ◽

10.1287/opre.2020.1987 ◽

2020 ◽

Author(s):

Daniel Russo

Keyword(s):

Posterior Distribution ◽

Error Term ◽

Discount Factor ◽

Gittins Index ◽

Confidence Bound ◽

Bandit Problem ◽

Confidence Bounds ◽

Upper Confidence Bound ◽

Gittins Indices ◽

Multiarmed Bandit

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. I consider a Gaussian multiarmed bandit problem with discount factor [Formula: see text]. The Gittins index of an arm is shown to equal the [Formula: see text]-quantile of the posterior distribution of the arm's mean plus an error term that vanishes as [Formula: see text]. In this sense, for sufficiently patient agents, a Gittins index measures the highest plausible mean-reward of an arm in a manner equivalent to an upper confidence bound.
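
The quantile-based index described in this abstract can be illustrated with a minimal sketch. It assumes each arm's mean has a Gaussian posterior summarised by a mean and a standard deviation. The function names and the `level` parameter are hypothetical, since the exact quantile level is elided in the abstract above, and the vanishing error term is ignored.

```python
from statistics import NormalDist


def posterior_quantile_index(mean, std, level):
    """Return the `level`-quantile of a Gaussian posterior N(mean, std^2).

    `level` is a hypothetical parameter in (0, 1); the abstract's exact
    quantile level is not reproduced here.
    """
    return NormalDist(mu=mean, sigma=std).inv_cdf(level)


def select_arm(posteriors, level):
    """Pick the arm whose posterior quantile (its Bayesian UCB index) is largest.

    `posteriors` is a list of (mean, std) pairs, one per arm, with std > 0.
    """
    indices = [posterior_quantile_index(m, s, level) for m, s in posteriors]
    return max(range(len(indices)), key=indices.__getitem__)
```

With a high `level`, an uncertain arm (large `std`) can outrank an arm with a higher posterior mean, which is the optimism that upper confidence bound rules rely on.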

Federated Bandit

Proceedings of the ACM on Measurement and Analysis of Computing Systems ◽

10.1145/3447380 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-29

Author(s):

Zhaowei Zhu ◽

Jingxuan Zhu ◽

Ji Liu ◽

Yang Liu

Keyword(s):

Connected Graph ◽

Differential Privacy ◽

Distributed Data ◽

Confidence Bound ◽

Bandit Problem ◽

Local Data ◽

Largest Eigenvalue ◽

Upper Confidence Bound ◽

Second Largest Eigenvalue ◽

Private Version

In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, Gossip_UCB, which couples variants of the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\mathrm{poly}(N,M)\log T,\ \mathrm{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all N agents, where $\lambda_2 \in (0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve ε-differential privacy of their local data while achieving $O(\max\{\frac{\mathrm{poly}(N,M)}{\epsilon}\log^{2.5} T,\ \mathrm{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
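
As a rough illustration of the classical gossiping component only (not of the paper's algorithm), the sketch below repeatedly averages each agent's value with its neighbours' values using a fixed, doubly stochastic weight matrix. The matrix, the function names, and the use of a deterministic rather than random gossip matrix are assumptions made for this example.

```python
def gossip_step(weights, values):
    """One gossip round: each agent replaces its value with a weighted
    average of the values held by itself and its neighbours."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]


def gossip(weights, values, steps):
    """Run `steps` gossip rounds and return the agents' final values."""
    for _ in range(steps):
        values = gossip_step(weights, values)
    return values


# Hypothetical 3-agent path graph (0 - 1 - 2) with a symmetric,
# doubly stochastic weight matrix: rows and columns each sum to 1.
W = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
local_estimates = [0.0, 3.0, 6.0]
consensus = gossip(W, local_estimates, steps=60)
```

For a connected graph and a matrix of this kind, every agent's value approaches the network-wide average, at a rate governed by the second largest eigenvalue magnitude of the matrix. This matches the role that the second largest eigenvalue plays in the regret bounds quoted above.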

Computer Adaptive Testing Using Upper-Confidence Bound Algorithm for Formative Assessment

Applied Sciences ◽

10.3390/app9204303 ◽

2019 ◽

Vol 9 (20) ◽

pp. 4303 ◽

Cited By ~ 2

Author(s):

Jaroslav Melesko ◽

Vitalij Novickij

Keyword(s):

Formative Assessment ◽

Latent Trait ◽

Corrective Feedback ◽

Strong Support ◽

Adaptive Testing ◽

Computer Adaptive Testing ◽

Confidence Bound ◽

Question Item ◽

E Learning ◽

Upper Confidence Bound

There is strong support for including formative assessment in learning processes, with the main emphasis on corrective feedback for students. However, traditional testing and Computer Adaptive Testing can be problematic to implement in the classroom. Paper-based tests are logistically inconvenient and hard to personalize, and thus must be longer to accurately assess every student in the classroom. Computer Adaptive Testing can mitigate these problems by making use of Multi-Dimensional Item Response Theory, at the cost of introducing several new problems. The most problematic of these are the greater complexity of test creation, which results from the need to calibrate the question pool, and the debatable premise that different questions measure one common latent trait. In this paper, a new approach is proposed that models formative assessment as a Multi-Armed Bandit problem and solves it using the Upper-Confidence Bound algorithm. In combination with the e-learning paradigm, the method has the potential to mitigate problems such as question item calibration and lengthy tests, while providing accurate formative assessment feedback for students. A number of simulation and empirical data experiments (with 104 students) are carried out to explore and measure the potential of this application, with positive results.
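
To make the bandit formulation concrete, here is a minimal sketch of the standard UCB1 rule on simulated Bernoulli rewards. It is not the authors' implementation. Treating each arm as a question (or topic) and using a binary reward per round are assumptions for illustration, and all names are hypothetical.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Choose an arm by UCB1: try every arm once, then pick the arm that
    maximises empirical mean + sqrt(2 * ln(t) / pulls)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(reward_probs, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `run_ucb1([0.1, 0.9], rounds=2000)` returns per-arm pull counts. The rule concentrates pulls on the arm with the higher reward probability while still sampling the other arm occasionally.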

Identification of Top-K Influencers Based on Upper Confidence Bound and Local Structure

Big Data Research ◽

10.1016/j.bdr.2021.100208 ◽

2021 ◽

pp. 100208

Author(s):

Mohammed Alshahrani ◽

Fuxi Zhu ◽

Soufiana Mekouar ◽

Mohammed Yahya Alghamdi ◽

Shichao Liu

Keyword(s):

Local Structure ◽

Confidence Bound ◽

Upper Confidence Bound

Deep learning classification of bitcoin miners and exploration of upper confidence bound algorithm with less regret for the selection of honest mining

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-021-03527-9 ◽

2021 ◽

Author(s):

M. J. Jeyasheela Rakkini ◽

K. Geetha

Keyword(s):

Deep Learning ◽

Confidence Bound ◽

Upper Confidence Bound ◽

Selection Of

Fast Iterative model for Sequential-Selection-Based Applications

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v12i7.3092 ◽

2014 ◽

Vol 12 (7) ◽

pp. 3689-3696 ◽

Cited By ~ 1

Author(s):

Khosrow Amirizadeh ◽

Rajeswari Mandava

Keyword(s):

Confidence Bound ◽

Adaptive Model ◽

Step Size ◽

Simple Task ◽

Iterative Model ◽

Lack Of Information ◽

Sequential Selection ◽

Proposed Model ◽

On Line ◽

Upper Confidence Bound

An accelerated multi-armed bandit (MAB) model in reinforcement learning for on-line sequential selection problems is presented. This iterative model uses an automatic step size calculation that improves the performance of the MAB algorithm under different conditions, such as variable reward variance and a larger set of usable actions. As a result of these modifications, the number of optimal selections is maximized and the stability of the algorithm under the mentioned conditions may be improved. This adaptive model with automatic step size computation may be attractive for on-line applications in which the variance of observations varies with time and re-tuning the step size is unavoidable, although this re-tuning is not a simple task. The proposed model is governed by the upper confidence bound (UCB) approach in iterative form with automatic step size computation. It is called adaptive UCB (AUCB) and may be used in industrial robotics, autonomous control, and intelligent selection or prediction tasks in economical engineering applications under lack of information.
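
For readers unfamiliar with the "iterative form" of a value estimate, the sketch below shows the standard incremental update with an explicit step size. It does not reproduce the paper's automatic step size rule, which the abstract does not specify. The function name and the two step size choices mentioned afterwards are textbook conventions used here as assumptions.

```python
def incremental_update(estimate, reward, step_size):
    """Move the current estimate toward the new reward by a fraction `step_size`."""
    return estimate + step_size * (reward - estimate)


# With step_size = 1/n the iterate equals the running sample mean.
estimate = 0.0
for n, reward in enumerate([1.0, 0.0, 1.0, 1.0], start=1):
    estimate = incremental_update(estimate, reward, 1.0 / n)
```

A step size of 1/n weights all past rewards equally, whereas a constant step size favours recent rewards, which is why the choice matters when the reward variance changes over time.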

Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/316 ◽

2020 ◽

Author(s):

Julian Berk ◽

Sunil Gupta ◽

Santu Rana ◽

Svetha Venkatesh

Keyword(s):

Gaussian Process ◽

Real World ◽

Confidence Bound ◽

Trade Off ◽

Upper Confidence Bound ◽

Exploration Exploitation

In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
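
The idea of sampling the trade-off parameter can be sketched as follows, assuming the Gaussian process posterior mean and standard deviation at each candidate point are already available. The gamma distribution, its `shape` and `scale` defaults, and all function names are assumptions for illustration, not details taken from the abstract.

```python
import math
import random


def ucb_score(mean, std, beta):
    """GP-UCB style acquisition value: posterior mean plus sqrt(beta) * posterior std."""
    return mean + math.sqrt(beta) * std


def randomised_ucb_select(means, stds, rng, shape=2.0, scale=1.0):
    """Draw the trade-off parameter beta from a gamma distribution (an assumed
    choice), then return the index of the best candidate and the sampled beta."""
    beta = rng.gammavariate(shape, scale)
    scores = [ucb_score(m, s, beta) for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta
```

A larger sampled `beta` puts more weight on posterior uncertainty (exploration), while a smaller one favours the posterior mean (exploitation). Because `beta` is redrawn at every call, the balance varies from one iteration to the next.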

Hybridising Ant Colony Optimisation with a upper confidence bound algorithm for routing and wavelength assignment in an optical burst switching network

2016 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2016.7849900 ◽

2016 ◽

Cited By ~ 3

Author(s):

Andrew S. Gravett ◽

Mathys C. du Plessis ◽

Timothy B. Gibbon

Keyword(s):

Optical Burst Switching ◽

Routing And Wavelength Assignment ◽

Wavelength Assignment ◽

Ant Colony ◽

Switching Network ◽

Ant Colony Optimisation ◽

Confidence Bound ◽

Burst Switching ◽

Optical Burst Switching Network ◽

Upper Confidence Bound

Optimization based sampling for gray-box modeling using a modified upper confidence bound acquisition function

31st European Symposium on Computer Aided Process Engineering - Computer Aided Chemical Engineering ◽

10.1016/b978-0-323-88506-5.50147-9 ◽

2021 ◽

pp. 953-958

Author(s):

Joschka Winz ◽

Sebastian Engell

Keyword(s):

Confidence Bound ◽

Upper Confidence Bound
