Observe Before Play: Multi-Armed Bandit with Pre-Observations

2020 ◽  
Vol 34 (04) ◽  
pp. 7023-7030
Author(s):  
Jinhang Zuo ◽  
Xiaoxi Zhang ◽  
Carlee Joe-Wong

We consider the stochastic multi-armed bandit (MAB) problem in a setting where a player can pay to pre-observe arm rewards before playing an arm in each round. Apart from the usual trade-off between exploring new arms to find the best one and exploiting the arm believed to offer the highest reward, we encounter an additional dilemma: pre-observing more arms gives a higher chance to play the best one, but incurs a larger cost. For the single-player setting, we design an Observe-Before-Play Upper Confidence Bound (OBP-UCB) algorithm for K arms with Bernoulli rewards, and prove a T-round regret upper bound of O(K^2 log T). In the multi-player setting, collisions occur when players select the same arm to play in the same round. We design a centralized algorithm, C-MP-OBP, and prove that its T-round regret relative to an offline greedy strategy is upper bounded by O(K^4/M^2 log T) for K arms and M players. We also propose distributed versions of the C-MP-OBP policy, called D-MP-OBP and D-MP-Adapt-OBP, achieving logarithmic regret with respect to collision-free target policies. Experiments on synthetic data and wireless channel traces show that C-MP-OBP and D-MP-OBP outperform random heuristics and offline optimal policies that do not allow pre-observations.
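
The round structure described above can be sketched as follows. This is a simplified illustration rather than the paper's exact OBP-UCB policy: the fixed pre-observation budget `m`, the standard UCB confidence radius, and the Bernoulli simulation inside the function are all assumptions made for the sketch.

```python
import math
import random

def obp_ucb_round(counts, emp_means, true_means, t, cost, m):
    """One round of a simplified observe-before-play policy: rank arms by a
    UCB index, pay `cost` to pre-observe the realised rewards of the top `m`
    arms, then play the best observed arm. Returns (arm, net reward)."""
    K = len(emp_means)
    # Standard UCB index: empirical mean plus a confidence radius.
    ucb = [emp_means[i] + math.sqrt(2 * math.log(t) / counts[i]) for i in range(K)]
    ranked = sorted(range(K), key=lambda i: -ucb[i])
    observed = ranked[:m]
    # Pre-observe Bernoulli reward realisations for the chosen arms
    # (simulated here from the true means for illustration).
    samples = {i: float(random.random() < true_means[i]) for i in observed}
    # Play the arm with the best observed realisation; net reward pays the cost.
    best = max(observed, key=lambda i: samples[i])
    return best, samples[best] - cost * m
```

The key departure from plain UCB is that the index only chooses which arms to inspect; the actual play decision uses the freshly observed realisations, trading the observation cost against the chance of catching the best arm on a good round.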

Author(s):  
Xueying Guo ◽  
Xiaoxiao Wang ◽  
Xin Liu

In this paper, we propose and study opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves an O((log T)^2) problem-dependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, on both synthetic and real-world datasets, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms under large exploration cost fluctuations.
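
The core idea — widening the confidence term when exploration is cheap and shrinking it when it is expensive — can be sketched on top of a standard LinUCB arm-selection step. The `load` signal, the threshold, and the 2x/0.5x modulation factors below are hypothetical simplifications, not the paper's exact rule.

```python
import numpy as np

def adalinucb_choose(contexts, A, b, alpha0, load, load_thresh=0.5):
    """Pick an arm with a LinUCB-style score whose exploration width is
    modulated by the current exploration cost (`load`): when the load is
    low we widen the confidence term to explore more; when it is high we
    shrink it and exploit. `A` and `b` are the usual ridge-regression
    sufficient statistics; `contexts` holds one feature vector per arm."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # ridge-regression estimate
    alpha = alpha0 * (2.0 if load < load_thresh else 0.5)
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(scores))
```

Under high load the score collapses toward the point estimate `x @ theta`, so the algorithm pays its exploration budget mostly during low-cost periods.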


Author(s):  
Julian Berk ◽  
Sunil Gupta ◽  
Santu Rana ◽  
Svetha Venkatesh

In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
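
The modification described above amounts to randomising the exploration weight in the acquisition step. A minimal sketch, assuming the GP posterior mean and standard deviation have already been computed on a discrete candidate set; the exponential distribution for the trade-off parameter is an arbitrary illustrative choice, not the paper's prescribed distribution.

```python
import numpy as np

def sampled_gp_ucb(mu, sigma, rng):
    """GP-UCB acquisition with a sampled trade-off parameter: instead of a
    fixed beta_t schedule, draw beta from a distribution each iteration and
    maximise mu(x) + sqrt(beta) * sigma(x) over the candidates. `mu` and
    `sigma` are the GP posterior mean/std at each candidate point."""
    beta = rng.exponential(scale=2.0)      # sampled exploration weight
    return int(np.argmax(mu + np.sqrt(beta) * sigma))
```

Because beta is redrawn every iteration, the expected trade-off can be tuned through the sampling distribution while individual iterations still vary between exploratory and greedy behaviour.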


Author(s):  
Yi-Qi Hu ◽  
Yang Yu ◽  
Jun-Da Liao

An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms were joined into a single search space, which is not only huge but also redundant, because many dimensions of the hyper-parameters are irrelevant to the selected algorithm. In this paper, we propose a cascaded approach for algorithm selection and hyper-parameter optimization. While a search procedure is employed at the level of hyper-parameter optimization, a bandit strategy runs at the level of algorithm selection to allocate the budget based on the search feedback. Since the bandit is required to select the algorithm with the maximum performance, instead of the average performance, we propose the extreme-region upper confidence bound (ER-UCB) strategy, which focuses on the extreme region of the underlying feedback distribution. We show theoretically that ER-UCB has a regret upper bound of O(K ln n) with independent feedback, which is as efficient as the classical UCB bandit. We also conduct experiments on a synthetic problem as well as a set of AutoML tasks. The results verify the effectiveness of the proposed method.
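
The shift from average to extreme performance can be illustrated with a modified index. The mean-plus-k-standard-deviations surrogate below is an assumption made for the sketch — it captures the "upper region of the feedback distribution" intuition but is not the paper's exact ER-UCB formula.

```python
import math

def er_ucb_index(rewards, t, k_sigma=1.0):
    """Extreme-region style index: score an algorithm by the upper region
    of its observed feedback distribution (mean + k * std) plus a UCB
    confidence bonus, so that algorithms capable of occasional very good
    runs are favoured over ones with merely good average runs."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    return mean + k_sigma * math.sqrt(var) + math.sqrt(2 * math.log(t) / n)
```

This matters for AutoML because the final model comes from the single best hyper-parameter trial, not the average trial, so an algorithm with a heavier upper tail deserves more budget even if its mean feedback is lower.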


Author(s):  
Mark Burgess

Upper confidence bound (UCB) multi-armed bandit algorithms typically rely on concentration inequalities (such as Hoeffding's inequality) to construct the upper confidence bound. Intuitively, the tighter the bound is, the more likely the respective arm is judged appropriately for selection. Hence we derive and utilise an optimal inequality. Usually the sample mean (and sometimes the sample variance) of previous rewards is the information used in the bounds that drive the algorithm, but intuitively, the more information that is taken from the previous rewards, the tighter the bound could be. Hence our inequality explicitly considers the value of each and every past reward in the upper-bound expression that drives the method. We show how this UCB method fits into the broader scope of other information-theoretic UCB algorithms but, unlike them, is free from assumptions about the distribution of the data. We conclude by reporting some already-established regret information, and give some numerical simulations to demonstrate the method's effectiveness.
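
For reference, the classical Hoeffding-based bound the paragraph contrasts against uses only the sample mean and the pull count:

```python
import math

def hoeffding_ucb(mean, n, t, delta_exp=2.0):
    """Classical Hoeffding-based upper confidence bound: for i.i.d. rewards
    in [0, 1], with probability at least 1 - t**(-delta_exp) the true mean
    lies below mean + sqrt(delta_exp * ln(t) / (2 * n)). Only the sample
    mean and the count n enter the bound; the individual reward values are
    discarded, which is the looseness a sharper inequality that uses every
    past reward aims to remove."""
    return mean + math.sqrt(delta_exp * math.log(t) / (2 * n))
```

Any bound built from richer statistics of the reward history can only be tighter than this, since the mean-and-count pair is a lossy summary of the observations.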


2019 ◽  
Vol 9 (20) ◽  
pp. 4303 ◽  
Author(s):  
Jaroslav Melesko ◽  
Vitalij Novickij

There is strong support for including formative assessment in learning processes, with the main emphasis on corrective feedback for students. However, traditional testing and Computer Adaptive Testing can be problematic to implement in the classroom. Paper-based tests are logistically inconvenient and hard to personalize, and thus must be longer to accurately assess every student in the classroom. Computer Adaptive Testing can mitigate these problems by making use of Multi-Dimensional Item Response Theory, at the cost of introducing several new problems, the most problematic of which are the greater test-creation complexity, because of the necessity of question-pool calibration, and the debatable premise that different questions measure one common latent trait. In this paper, a new approach of modelling formative assessment as a multi-armed bandit problem is proposed and solved using the Upper Confidence Bound algorithm. The method, in combination with the e-learning paradigm, has the potential to mitigate problems such as question-item calibration and lengthy tests, while providing accurate formative assessment feedback for students. A number of simulation and empirical-data experiments (with 104 students) are carried out to explore and measure the potential of this application, with positive results.


Author(s):  
Pulak Sarkar ◽  
Solagna Modak ◽  
Santanu Ray ◽  
Vasista Adupa ◽  
K. Anki Reddy ◽  
...  

Liquid transport through the composite membrane is inversely proportional to the thickness of its separation layer. While the scalable fabrication of ultrathin polymer membranes is sought for their commercial exploitation,...


2021 ◽  
pp. 100208
Author(s):  
Mohammed Alshahrani ◽  
Fuxi Zhu ◽  
Soufiana Mekouar ◽  
Mohammed Yahya Alghamdi ◽  
Shichao Liu

2014 ◽  
Vol 12 (7) ◽  
pp. 3689-3696 ◽  
Author(s):  
Khosrow Amirizadeh ◽  
Rajeswari Mandava

Accelerated multi-armed bandit (MAB) model in Reinforcement-Learning for on-line sequential selection problems is presented. This iterative model utilizes an automatic step size calculation that improves the performance of MAB algorithm under different conditions such as, variable variance of reward and larger set of usable actions. As result of these modifications, number of optimal selections will be maximized and stability of the algorithm under mentioned conditions may be amplified. This adaptive model with automatic step size computation may attractive for on-line applications in which,  variance of observations vary with time and re-tuning their step size are unavoidable where, this re-tuning is not a simple task. The proposed model governed by upper confidence bound (UCB) approach in iterative form with automatic step size computation. It called adaptive UCB (AUCB) that may use in industrial robotics, autonomous control and intelligent selection or prediction tasks in the economical engineering applications under lack of information.


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Chien-Hung Chien ◽  
Alan Hepburn Welsh ◽  
John D Moore

Enhancing microdata access is one of the strategic priorities for the Australian Bureau of Statistics (ABS) in its transformation program. However, balancing the trade-off between enhancing data access and protecting confidentiality is a delicate act. The ABS could use synthetic data to make its business microdata more accessible for researchers to inform decision making while maintaining confidentiality. This study explores the synthetic data approach for the release and analysis of business data. Australian businesses in some industries are characterised by oligopoly or duopoly. This means the existing microdata protection techniques such as information reduction or perturbation may not be as effective as for household microdata. The research focuses on addressing the following questions: Can a synthetic data approach enhance microdata access for the longitudinal business data? What is the utility and protection trade-off using the synthetic data approach? The study compares confidentialised input and output approaches for protecting confidentiality and analysing Australian microdata from business survey or administrative data sources.


Sign in / Sign up

Export Citation Format

Share Document