upper confidence bound
Recently Published Documents


TOTAL DOCUMENTS: 101 (FIVE YEARS 59)

H-INDEX: 8 (FIVE YEARS 2)

2021 ◽  
Author(s):  
Wang Chi Cheung ◽  
David Simchi-Levi ◽  
Ruihao Zhu

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the “forgetting principle” in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar for the Management Science Special Issue on Data-Driven Prescriptive Analytics.
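
The "forgetting principle" mentioned above can be illustrated with a minimal sketch of a sliding-window UCB rule for the plain multi-armed case. This is not the authors' algorithm (their main analysis is for linear bandits with specific constants); the function names, the window length `w`, and the exploration constant `c` are all assumptions chosen for illustration.

```python
import math
import random
from collections import deque


def sw_ucb_choose(window, n_arms, c=1.0):
    """Choose an arm using only the (arm, reward) pairs still in the window."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm, reward in window:
        counts[arm] += 1
        sums[arm] += reward
    # An arm with no sample left in the window is played first.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    log_term = math.log(len(window))
    # Windowed mean plus an exploration bonus that shrinks with the windowed count.
    return max(
        range(n_arms),
        key=lambda a: sums[a] / counts[a] + c * math.sqrt(log_term / counts[a]),
    )


def run(T=2000, w=200, seed=0):
    """Two Bernoulli arms whose means swap halfway through (assumed toy setup)."""
    rng = random.Random(seed)
    window = deque(maxlen=w)  # old observations fall out automatically
    late_pulls = [0, 0]
    for t in range(T):
        means = (0.9, 0.1) if t < T // 2 else (0.1, 0.9)
        arm = sw_ucb_choose(window, 2)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        window.append((arm, reward))
        if t >= T - 500:
            late_pulls[arm] += 1
    return late_pulls


if __name__ == "__main__":
    print(run())  # pulls of each arm during the final 500 rounds
```

Because the window holds only the most recent `w` observations, evidence gathered before the change point is eventually discarded, which is what lets the rule re-identify the better arm after the swap.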


2021 ◽  
Vol 2052 (1) ◽  
pp. 012013
Author(s):  
S V Garbar

We consider two variations of the upper confidence bound strategy for Gaussian two-armed bandits. Rewards for the arms are assumed to have unknown expected values and unknown variances. It is demonstrated that the expected regret values for both strategies are continuous functions of the reward variance. A set of Monte Carlo simulations was performed to show the nature of the relation between variance estimation and losses. It is shown that the regret grows only slightly even when the estimation error is fairly large, which makes it possible to estimate the variance during the initial steps of the control and to stop this estimation later.


2021 ◽  
Author(s):  
Bo Shen ◽  
Raghav Gnanasambandam ◽  
Rongxuan Wang ◽  
Zhenyu Kong

In many scientific and engineering applications, Bayesian optimization (BO) is a powerful tool for tasks such as hyperparameter tuning of machine learning models and materials design and discovery. BO guides the choice of experiments sequentially to find a good combination of design points in as few experiments as possible. It can be formulated as a problem of optimizing a “black-box” function. Unlike single-task Bayesian optimization, multi-task Bayesian optimization is a general method for efficiently optimizing multiple different but correlated “black-box” functions. Previous multi-task Bayesian optimization algorithms query a point to be evaluated for all tasks in each round of search, which is inefficient. When different tasks are correlated, it is not necessary to evaluate all tasks for a given query point. Therefore, the objective of this work is to develop an algorithm for multi-task Bayesian optimization with automatic task selection so that only one task evaluation is needed per query round. Specifically, a new algorithm, the multi-task Gaussian process upper confidence bound (MT-GPUCB), is proposed to achieve this objective. MT-GPUCB is a two-step algorithm: the first step chooses which query point to evaluate, and the second step automatically selects the most informative task to evaluate. Under the bandit setting, a theoretical analysis shows that the proposed MT-GPUCB is no-regret under some mild conditions. The proposed algorithm is verified experimentally on a range of synthetic functions as well as real-world problems. The results clearly show the advantages of the query strategy for both design point and task.
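
As a rough illustration of the first step only, the generic single-task GP-UCB acquisition rule picks the candidate that maximizes posterior mean plus a scaled posterior standard deviation. The sketch below assumes the posterior means and standard deviations have already been computed by some Gaussian process model; the function name, the `beta` value, and the sample numbers are hypothetical, and the paper's task-selection step is not covered because the abstract does not specify it.

```python
import math


def gp_ucb_select(candidates, post_mean, post_std, beta=4.0):
    """Return the candidate with the largest mean + sqrt(beta) * std score."""
    scores = [m + math.sqrt(beta) * s for m, s in zip(post_mean, post_std)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0]       # hypothetical design points
    mean = [0.2, 0.5, 0.4]     # hypothetical posterior means
    std = [0.05, 0.1, 0.3]     # hypothetical posterior standard deviations
    print(gp_ucb_select(xs, mean, std, beta=4.0))  # favors the uncertain point
    print(gp_ucb_select(xs, mean, std, beta=0.0))  # pure exploitation
```

A larger `beta` weights uncertainty more heavily, so the rule explores; with `beta` set to zero it reduces to picking the highest posterior mean.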


2021 ◽  
Vol 17 (3) ◽  
pp. 85-100
Author(s):  
Jayaraman Parthasarathy ◽  
Ramesh Babu Kalivaradhan

Online collaborative movie recommendation systems attempt to help customers access their preferred movies by finding closely comparable neighbors among the movies based on their chronologically similar ratings. Collaborative filtering-based movie recommendation systems require viewer-specific data, and the need to collect such data diminishes the effectiveness of the recommendation. To solve this problem, the authors employ an effective multi-armed bandit algorithm called upper confidence bound, which is applied to automatically recommend movies to users. In addition, the concept of time decay is given a mathematical definition that redefines the dynamic item-to-item similarity. Two patterns of time decay, namely concave and convex functions, are then analyzed in simulation. The experiments use the MovieLens 100K dataset. The proposed method attains a maximum F-measure of 98.45, whereas the existing method reaches a minimum F-measure of only 95.60. The presented model responds adaptively to new users, can provide a better service, and can generate more user engagement.


Buildings ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 275
Author(s):  
Yachen Shen ◽  
Jianping Chen ◽  
Qiming Fu ◽  
Hongjie Wu ◽  
Yunzhe Wang ◽  
...  

District heating networks make up an important public energy service, in which leakage is the main problem affecting the safety of pipeline network operation. This paper proposes a Leakage Fault Detection (LFD) method based on the Linear Upper Confidence Bound (LinUCB), which is used for arm selection in the Contextual Bandit (CB) algorithm. With data collected from end-users’ pressure and flow information in the simulation model, the LinUCB method is adopted to locate the leakage faults. First, a hydraulic simulation model is used to simulate all failure conditions that can occur in the network, and the change rate vectors of the observed data form a dataset. Second, the LinUCB method is used to train an agent for arm selection, and the outcome of arm selection is the leaking pipe label. The experimental results show that this method can detect the leaking pipe accurately and effectively. Furthermore, it allows operators to evaluate the system performance, supports troubleshooting of decision mechanisms, and provides guidance in the arrangement of maintenance.
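
The arm-selection step described above can be sketched with the standard disjoint LinUCB rule, where each arm (here, a candidate pipe label) keeps its own ridge-regression estimate. This is a toy illustration and not the paper's implementation: the class and function names, the one-hot-plus-noise contexts standing in for change rate vectors, and the values of `alpha` and `noise` are all assumptions.

```python
import math
import random


class LinUCBArm:
    """One arm of disjoint LinUCB, storing the inverse of A = I + sum(x x^T)."""

    def __init__(self, dim):
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(0.0, sum(xi * v for xi, v in zip(x, a_inv_x))))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms, x, alpha=0.5):
    """Pick the arm with the highest upper confidence bound for context x."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x, alpha))


def simulate(rounds=600, n_arms=3, noise=0.05, alpha=0.5, seed=0):
    """Toy setup: the context is a noisy one-hot encoding of the true label."""
    rng = random.Random(seed)
    arms = [LinUCBArm(n_arms) for _ in range(n_arms)]
    hits = []
    for _ in range(rounds):
        label = rng.randrange(n_arms)
        x = [(1.0 if i == label else 0.0) + rng.gauss(0.0, noise) for i in range(n_arms)]
        arm = choose(arms, x, alpha)
        reward = 1.0 if arm == label else 0.0  # reward only for the correct label
        arms[arm].update(x, reward)
        hits.append(reward)
    return sum(hits[-100:]) / 100.0  # accuracy over the final 100 rounds


if __name__ == "__main__":
    print(simulate())
```

Maintaining the inverse directly with a rank-one update avoids a matrix inversion per round; a production implementation would more likely use a linear algebra library.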


2021 ◽  
Author(s):  
S. V. Sai Santosh ◽  
Sumit Darak

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian approach-based Thompson Sampling (TS) algorithm offers better performance than the frequentist approach-based Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable due to the Beta function. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian, etc.) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of appropriate MAB algorithms for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for parallel implementation of algorithms, resulting in huge savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software co-design approaches. We demonstrate the superiority of the RI-MAB over TS-only and UCB-only architectures.
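
For readers unfamiliar with the two policies being compared, the following software-only sketch shows UCB1 and Bernoulli Thompson Sampling side by side. It is not the authors' hardware design: the Beta draw here comes from Python's standard `random.betavariate`, whereas the paper approximates it with a pseudo-random number generator on the SoC, and the function names, arm means, and round count are assumptions.

```python
import math
import random


def ucb1_choose(successes, pulls, t):
    """UCB1: play each arm once, then maximize mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    return max(
        range(len(pulls)),
        key=lambda a: successes[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a]),
    )


def thompson_choose(successes, pulls, rng):
    """Thompson Sampling: draw from each arm's Beta posterior, play the largest."""
    samples = [rng.betavariate(1 + s, 1 + n - s) for s, n in zip(successes, pulls)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(policy, means, rounds=3000, seed=0):
    """Run one policy on Bernoulli arms and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(means)
    successes, pulls = [0] * k, [0] * k
    for t in range(1, rounds + 1):
        if policy == "ucb1":
            arm = ucb1_choose(successes, pulls, t)
        else:
            arm = thompson_choose(successes, pulls, rng)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    means = (0.2, 0.5, 0.8)  # hypothetical arm success probabilities
    print("UCB1:", simulate("ucb1", means))
    print("TS:  ", simulate("ts", means))
```

The only arithmetic UCB1 needs is a division, a logarithm, and a square root, while Thompson Sampling needs a Beta sample per arm per round, which is the operation the abstract identifies as the obstacle to synthesis.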


2021 ◽  
Author(s):  
Ki Myung Brian Lee ◽  
Felix Kong ◽  
Ricardo Cannizzaro ◽  
Jennifer L. Palmer ◽  
David Johnson ◽  
...  
