upper confidence bound Latest Research Papers

Hedging the Drift: Learning to Optimize Under Nonstationarity

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the “forgetting principle” in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar for the Management Science Special Issue on Data-Driven Prescriptive Analytics.
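As a rough illustration of the "forgetting principle" described in this abstract, the following Python sketch applies a sliding window to a standard UCB index for a K-armed Bernoulli bandit. It is not the authors' algorithm, which starts from the linear bandit setting and adds a bandit-over-bandit layer. The function names, the window length, the constant `c`, and the simulated reward change are all assumptions made for this example.

```python
import math
import random
from collections import deque


def sw_ucb_choose(history, n_arms, t, window, c=2.0):
    """Pick an arm using only the (arm, reward) pairs still in the window."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm, reward in history:
        counts[arm] += 1
        sums[arm] += reward
    # An arm with no sample left in the window is treated as unexplored.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    horizon = min(t, window)

    def index(arm):
        bonus = math.sqrt(c * math.log(horizon) / counts[arm])
        return sums[arm] / counts[arm] + bonus

    return max(range(n_arms), key=index)


def run_sw_ucb(means_before, means_after, change_at, steps, window, seed=0):
    """Simulate Bernoulli arms whose means switch once, at step `change_at`."""
    rng = random.Random(seed)
    history = deque(maxlen=window)  # old observations are forgotten automatically
    pulls = []
    for t in range(1, steps + 1):
        means = means_before if t <= change_at else means_after
        arm = sw_ucb_choose(history, len(means), t, window)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history.append((arm, reward))
        pulls.append(arm)
    return pulls


if __name__ == "__main__":
    pulls = run_sw_ucb([0.9, 0.1], [0.1, 0.9], change_at=1000, steps=2000, window=200)
    print("arm 1 share, last 500 steps:", pulls[-500:].count(1) / 500)
```

Because the window discards stale rewards, the index can recover after the arm means swap. A smaller window adapts faster but explores more, which is the trade-off the variation budget in the abstract is meant to balance.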

Dependency of regret on accuracy of variance estimation for different versions of UCB strategy for Gaussian multi-armed bandits

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2052/1/012013 ◽ 2021 ◽ Vol 2052 (1) ◽ pp. 012013

Author(s): S V Garbar

Keyword(s): Monte Carlo ◽ Monte Carlo Simulations ◽ Variance Estimation ◽ Estimation Error ◽ Continuous Functions ◽ Confidence Bound ◽ Expected Values ◽ Upper Confidence Bound

Abstract: We consider two variations of the upper confidence bound strategy for Gaussian two-armed bandits. Rewards for the arms are assumed to have unknown expected values and unknown variances. It is demonstrated that the expected regret values for both strategies are continuous functions of the reward variance. A set of Monte Carlo simulations was performed to show the nature of the relation between variance estimation and losses. It is shown that the regret grows only slightly even when the estimation error is fairly large, which makes it possible to estimate the variance during the initial steps of the control and then stop estimating it.
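The idea of estimating the variance only during the initial steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the strategies analyzed in the paper: the index form, the parameter `a`, the number of initial pulls, and all function names are hypothetical choices for this example.

```python
import math
import random
import statistics


def gaussian_ucb(means, sigma, steps, init_pulls=10, a=1.0, seed=0):
    """UCB for Gaussian arms with variance estimated once, then frozen.

    Returns (pull counts per arm, expected regret). Needs init_pulls >= 2.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    rewards = [[] for _ in range(n_arms)]
    # Initial phase: sample every arm equally and estimate its variance.
    for arm in range(n_arms):
        for _ in range(init_pulls):
            rewards[arm].append(rng.gauss(means[arm], sigma))
    var_hat = [statistics.variance(r) for r in rewards]  # not updated afterwards
    counts = [init_pulls] * n_arms
    sums = [sum(r) for r in rewards]
    best = max(means)
    regret = init_pulls * sum(best - m for m in means)

    for t in range(n_arms * init_pulls + 1, steps + 1):
        scores = [
            sums[k] / counts[k]
            + a * math.sqrt(2.0 * var_hat[k] * math.log(t) / counts[k])
            for k in range(n_arms)
        ]
        arm = max(range(n_arms), key=scores.__getitem__)
        sums[arm] += rng.gauss(means[arm], sigma)
        counts[arm] += 1
        regret += best - means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = gaussian_ucb([0.0, 1.0], sigma=1.0, steps=2000)
    print("pulls per arm:", counts, "expected regret:", regret)
```

Rerunning with different `init_pulls` values gives a quick, informal way to see how sensitive the regret is to the quality of the frozen variance estimate.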

Deep learning classification of bitcoin miners and exploration of upper confidence bound algorithm with less regret for the selection of honest mining

Journal of Ambient Intelligence and Humanized Computing ◽ 10.1007/s12652-021-03527-9 ◽ 2021

Author(s): M. J. Jeyasheela Rakkini ◽ K. Geetha

Keyword(s): Deep Learning ◽ Confidence Bound ◽ Upper Confidence Bound ◽ Selection Of

Multi-Task Gaussian Process Upper Confidence Bound for Hyperparameter Tuning

10.36227/techrxiv.16674400.v1 ◽ 2021

Author(s): Bo Shen ◽ Raghav Gnanasambandam ◽ Rongxuan Wang ◽ Zhenyu Kong

Keyword(s): Gaussian Process ◽ Black Box ◽ Bayesian Optimization ◽ Query Point ◽ Confidence Bound ◽ Bayesian Optimization Algorithm ◽ Machine Learning Model ◽ Step Algorithm ◽ Upper Confidence Bound ◽ General Method

In many scientific and engineering applications, Bayesian optimization (BO) is a powerful tool for tasks such as hyperparameter tuning of machine learning models and materials design and discovery. BO guides the choice of experiments sequentially to find a good combination of design points in as few experiments as possible. It can be formulated as the problem of optimizing a “black-box” function. Unlike single-task Bayesian optimization, multi-task Bayesian optimization is a general method for efficiently optimizing multiple different but correlated “black-box” functions. Previous multi-task Bayesian optimization algorithms query a point to be evaluated for all tasks in each round of search, which is inefficient. When different tasks are correlated, it is not necessary to evaluate all tasks for a given query point. Therefore, the objective of this work is to develop an algorithm for multi-task Bayesian optimization with automatic task selection so that only one task evaluation is needed per query round. Specifically, a new algorithm, the multi-task Gaussian process upper confidence bound (MT-GPUCB), is proposed to achieve this objective. MT-GPUCB is a two-step algorithm: the first step chooses which query point to evaluate, and the second step automatically selects the most informative task to evaluate. Under the bandit setting, a theoretical analysis shows that the proposed MT-GPUCB is no-regret under some mild conditions. The proposed algorithm is verified experimentally on a range of synthetic functions as well as real-world problems. The results clearly show the advantages of the query strategy for both design point and task.
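For orientation, the standard single-task GP-UCB selection rule picks the query point that maximizes the posterior mean plus a scaled posterior standard deviation. The abstract does not state the MT-GPUCB formulas, so the rule below is background only and should not be read as the authors' method. The symbols are assumptions introduced here: $D$ is the candidate set, $\mu_{t-1}$ and $\sigma_{t-1}$ are the Gaussian process posterior mean and standard deviation after $t-1$ evaluations, and $\beta_t > 0$ is an exploration weight.

```latex
x_t \;=\; \operatorname*{arg\,max}_{x \in D} \; \mu_{t-1}(x) + \sqrt{\beta_t}\,\sigma_{t-1}(x)
```

A larger $\beta_t$ favors points with high posterior uncertainty (exploration), while a smaller one favors points with a high predicted value (exploitation).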

Collaborative Filtering-Based Recommendation System Using Time Decay Model

International Journal of e-Collaboration ◽ 10.4018/ijec.2021070106 ◽ 2021 ◽ Vol 17 (3) ◽ pp. 85-100

Author(s): Jayaraman Parthasarathy ◽ Ramesh Babu Kalivaradhan

Keyword(s): Collaborative Filtering ◽ Recommendation System ◽ Recommendation Systems ◽ User Engagement ◽ Confidence Bound ◽ Time Decay ◽ Specific Data ◽ Movie Recommendation ◽ Upper Confidence Bound ◽ F Measure

Online collaborative movie recommendation systems attempt to help customers find movies they are likely to enjoy by identifying closely comparable neighbors among movies based on their historical ratings. Collaborative filtering-based movie recommendation systems require viewer-specific data, and the need to collect viewer-specific data diminishes the effectiveness of the recommendation. To solve this problem, the authors employ an effective multi-armed bandit algorithm, upper confidence bound, to automatically recommend movies to users. In addition, the concept of time decay is given a mathematical definition that redefines the dynamic item-to-item similarity. Two patterns of time decay, namely concave and convex functions, are then analyzed in simulation. The experiments use the MovieLens 100K dataset. The proposed method attains a maximum F-measure of 98.45, whereas the existing method reaches a minimum F-measure of only 95.60. The presented model responds adaptively to new users, can provide a better service, and generates more user engagement.

Detection of District Heating Pipe Network Leakage Fault Using UCB Arm Selection Method

Buildings ◽ 10.3390/buildings11070275 ◽ 2021 ◽ Vol 11 (7) ◽ pp. 275

Author(s): Yachen Shen ◽ Jianping Chen ◽ Qiming Fu ◽ Hongjie Wu ◽ Yunzhe Wang ◽ ...

Keyword(s): Simulation Model ◽ District Heating ◽ Hydraulic Simulation ◽ Change Rate ◽ Flow Information ◽ Network Operation ◽ Energy Service ◽ Failure Conditions ◽ Upper Confidence Bound ◽ Decision Mechanisms

District heating networks are an important public energy service, in which leakage is the main problem affecting the safety of pipeline network operation. This paper proposes a Leakage Fault Detection (LFD) method based on the Linear Upper Confidence Bound (LinUCB), which is used for arm selection in the Contextual Bandit (CB) algorithm. Using end-user pressure and flow information collected from the simulation model, the LinUCB method is adopted to locate the leakage faults. First, a hydraulic simulation model is used to simulate all failure conditions that can occur in the network, and the change-rate vectors of the observed data form a dataset. Second, the LinUCB method is used to train an agent for arm selection, and the outcome of arm selection is the leaking pipe label. Third, the experimental results show that this method can detect the leaking pipe accurately and effectively. Furthermore, it allows operators to evaluate system performance, supports troubleshooting of decision mechanisms, and provides guidance for arranging maintenance.
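A minimal sketch of LinUCB arm selection, written here in plain Python, may help make the mechanism concrete. It follows the common "disjoint" LinUCB formulation with one ridge-regression model per arm; whether the paper uses this exact variant is not stated in the abstract. Treating each arm as a candidate pipe label and the context as a change-rate vector is an assumption for illustration, as are the class and function names and the value of `alpha`.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB: keeps A^{-1} and b for a ridge model."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity matrix, so its inverse is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def score(self, x):
        """Estimated reward plus confidence width for context x."""
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Rank-one update A += x x^T via the Sherman-Morrison formula."""
        u = self._a_inv_times(x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x):
    """Return the index of the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda k: arms[k].score(x))


if __name__ == "__main__":
    # Hypothetical usage: two candidate pipe labels, 2-dimensional context.
    arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
    context = [1.0, 0.0]
    arms[1].update(context, reward=1.0)  # label 1 was correct for this context
    print("chosen label:", choose_arm(arms, context))
```

In a training loop of this kind, the reward would typically be 1 when the chosen label matches the simulated leaking pipe and 0 otherwise, though the paper's actual reward design is not given in the abstract.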

Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?

10.36227/techrxiv.14749494 ◽ 2021

Author(s): S. V. Sai Santosh ◽ Sumit Darak

Keyword(s): Parallel Implementation ◽ Beta Function ◽ Random Number Generator ◽ System On Chip ◽ Thompson Sampling ◽ Functional Correctness ◽ On Chip ◽ Upper Confidence Bound ◽ Exploration Exploitation ◽ And Robotics

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via the exploration-exploitation trade-off without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable because of the Beta function. We address this problem by approximating it with a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli or Gaussian) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of an appropriate MAB algorithm for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for parallel implementation of algorithms, resulting in large savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software co-design approaches. We demonstrate the superiority of RI-MAB over TS-only and UCB-only architectures.
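The two algorithm families compared in this abstract can be sketched in software as below. This is a plain Python illustration for Bernoulli arms, not the hardware architecture from the paper: it uses the standard library's `random.betavariate` for the Beta posterior draw, which is exactly the step the authors report as hard to synthesize. The function names, arm probabilities, and step count are assumptions for this example.

```python
import math
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta(s + 1, f + 1) posterior and play the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def ucb1_step(successes, failures, t):
    """Play each arm once, then maximize empirical mean plus confidence bonus."""
    for arm, (s, f) in enumerate(zip(successes, failures)):
        if s + f == 0:
            return arm

    def index(arm):
        n = successes[arm] + failures[arm]
        return successes[arm] / n + math.sqrt(2.0 * math.log(t) / n)

    return max(range(len(successes)), key=index)


def simulate(policy, probs, steps, seed=0):
    """Run one policy ("ts" or "ucb") on Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    successes = [0] * len(probs)
    failures = [0] * len(probs)
    for t in range(1, steps + 1):
        if policy == "ts":
            arm = thompson_step(successes, failures, rng)
        else:
            arm = ucb1_step(successes, failures, t)
        if rng.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    for policy in ("ts", "ucb"):
        print(policy, simulate(policy, [0.2, 0.8], steps=2000))
```

Comparing the pull counts of the two policies on the same arm probabilities gives an informal feel for the performance difference the abstract refers to, although a single seeded run is not a substitute for the paper's evaluation.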

An Upper Confidence Bound for Simultaneous Exploration and Exploitation in Heterogeneous Multi-Robot Systems

10.1109/icra48506.2021.9560822 ◽ 2021

Author(s): Ki Myung Brian Lee ◽ Felix Kong ◽ Ricardo Cannizzaro ◽ Jennifer L. Palmer ◽ David Johnson ◽ ...

Keyword(s): Confidence Bound ◽ Exploration And Exploitation ◽ Robot Systems ◽ Upper Confidence Bound ◽ Multi Robot
