Regret Bounds: Latest Research Papers

Bayesian optimization (BO) is an approach to optimizing an expensive-to-evaluate black-box function that sequentially determines the input values at which to evaluate the function. However, specifying values for all input variables can be expensive and in some cases difficult, for example in outsourcing scenarios where producing input queries with many input variables involves significant cost. In this paper, we propose a novel Gaussian process bandit problem, BO with partially specified queries (BOPSQ). In BOPSQ, unlike the standard BO setting, a learner specifies only the values of some input variables, and the values of the unspecified input variables are randomly determined according to a known or unknown distribution. We propose two algorithms based on posterior sampling for cases of known and unknown input distributions. We further derive their regret bounds that are sublinear for popular kernels. We demonstrate the effectiveness of the proposed algorithms using test functions and real-world datasets.


Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

Entropy ◽ 10.3390/e23111534 ◽ 2021 ◽ Vol 23 (11) ◽ pp. 1534

Author(s): Le Li ◽ Benjamin Guedj

Keyword(s): Data Streams ◽ Real Life ◽ Sequential Learning ◽ Massive Data ◽ Principal Curves ◽ Principal Curve ◽ Real Life Data ◽ Reduction Methods ◽ Regret Bounds ◽ And Performance

When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data.


Hedging the Drift: Learning to Optimize Under Nonstationarity

Management Science ◽ 10.1287/mnsc.2021.4024 ◽ 2021

Author(s): Wang Chi Cheung ◽ David Simchi-Levi ◽ Ruihao Zhu

Keyword(s): Dynamic Pricing ◽ A Priori ◽ Network Routing ◽ Data Driven ◽ Superior Performance ◽ Changing Environments ◽ Epidemic Period ◽ Optimal Dynamic ◽ Regret Bounds ◽ Upper Confidence Bound

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the “forgetting principle” in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar for the Management Science Special Issue on Data-Driven Prescriptive Analytics.
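The "forgetting principle" behind a sliding-window upper confidence bound rule can be illustrated with a minimal sketch. This is a simplified K-armed version, not the linear bandit algorithm analyzed in the paper: the function name `sliding_window_ucb`, the `reward_fn(arm, t)` callback, and the exploration bonus constant are all assumptions made for illustration. Only the most recent `window` rounds contribute to each arm's statistics, so stale observations stop influencing decisions after the environment changes.

```python
import math
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window):
    """Play `horizon` rounds, scoring arms only on the last `window` rounds.

    reward_fn(arm, t) is a hypothetical callback returning the reward of
    pulling `arm` at round `t`. Returns the list of chosen arms.
    """
    history = deque()        # (arm, reward) pairs still inside the window
    counts = [0] * n_arms    # pulls per arm inside the window
    sums = [0.0] * n_arms    # reward totals per arm inside the window
    choices = []

    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no data in the window is (re)explored first.
            arm = untried[0]
        else:
            span = min(t, window)  # number of rounds the window covers
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(span) / counts[a]),
            )

        reward = reward_fn(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward

        # Forget the observation that just slid out of the window.
        if len(history) > window:
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward

        choices.append(arm)

    return choices
```

In this sketch the window length is fixed by the caller. The abstract states that the optimal choice depends on the variation budget, and that the bandit-over-bandit framework removes the need to know it; neither of those pieces is modeled here.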


Multiplayer Bandits Without Observing Collision Information

Mathematics of Operations Research ◽ 10.1287/moor.2021.1168 ◽ 2021

Author(s): Gábor Lugosi ◽ Abbas Mehrabian

Keyword(s): Nash Equilibria ◽ Square Root ◽ Bandit Problems ◽ Approximate Nash Equilibria ◽ Regret Bounds ◽ Multiarmed Bandit

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
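The collision rule described above can be made concrete with a short simulation sketch. It only models the reward mechanism, not either of the paper's algorithms. The function names, the Bernoulli reward assumption, and the choice of benchmark (the sum of the largest arm means, one per player) are assumptions made for illustration.

```python
import random
from collections import Counter


def play_round(means, choices, rng):
    """Return one reward per player for a single round.

    means[a] is the (assumed Bernoulli) mean of arm a; choices[p] is the arm
    pulled by player p. Players who share an arm collide and receive zero.
    """
    pulls = Counter(choices)
    rewards = []
    for arm in choices:
        if pulls[arm] > 1:
            rewards.append(0.0)  # collision: all involved players get nothing
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards


def expected_round_regret(means, choices):
    """Expected shortfall against the best collision-free assignment."""
    pulls = Counter(choices)
    best = sum(sorted(means, reverse=True)[: len(choices)])
    earned = sum(means[arm] for arm in choices if pulls[arm] == 1)
    return best - earned


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.9, 0.5, 0.2]
    print(play_round(means, [0, 0], rng))          # both players collide
    print(expected_round_regret(means, [0, 2]))    # second player is off the top arms
```

In the harder feedback model, a player would see only its own entry of `rewards` and could not tell a collision apart from an unlucky zero draw.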


Generalized Empirical Regret Bounds for Control of Renewable Energy Systems in Spatiotemporally Varying Environments

Journal of Dynamic Systems Measurement and Control ◽ 10.1115/1.4052396 ◽ 2021

Author(s): Ben Haydon ◽ Jack Cole ◽ Laurel Dunn ◽ Patrick Keyantuo ◽ Fotini Chow ◽ ...

Keyword(s): Renewable Energy ◽ Control Strategies ◽ Energy Resource ◽ Mobile Systems ◽ Wind Data ◽ Maximum Probability ◽ Data Set ◽ Actual Performance ◽ Regret Bounds ◽ Varying Environments

This paper focuses on the empirical derivation of regret bounds for mobile systems that can optimize their locations in real time within a spatiotemporally varying renewable energy resource. The case studies in this paper focus specifically on an airborne wind energy system, where the replacement of towers with tethers and a lifting body allows the system to adjust its altitude continuously, with the goal of operating at the altitude that maximizes net power production. While prior publications have proposed control strategies for this problem, often with favorable results based on simulations that use real wind data, they lack any theoretical or statistical performance guarantees. In the present work, we make use of a very large synthetic data set, identified through parameters from real wind data, to derive probabilistic bounds on the difference between optimal and actual performance, termed regret. The results are presented for a variety of control strategies, including maximum probability of improvement, upper confidence bound, greedy, and constant altitude approaches. In addition, we use dimensional analysis to generalize the aforementioned results to other spatiotemporally varying environments, making the results applicable to a wider variety of renewably powered mobile systems. Finally, to deal with more general environmental mean models, we introduce a novel approach to modify calculable regret bounds to accommodate any mean model through what we term an "effective spatial domain."
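One simple way to turn many simulated runs into a probabilistic regret bound is to take an empirical quantile of the per-run regret. The sketch below illustrates that idea only. The abstract does not spell out the paper's statistical procedure, so the quantile rule and the function name `empirical_regret_bound` are assumptions.

```python
import math


def empirical_regret_bound(optimal, actual, confidence=0.95):
    """Smallest value that at least a `confidence` fraction of runs stay under.

    optimal[i] and actual[i] are the best achievable and the achieved
    performance in simulated run i; regret is their difference.
    """
    if len(optimal) != len(actual) or not optimal:
        raise ValueError("need two non-empty sequences of equal length")
    regrets = sorted(o - a for o, a in zip(optimal, actual))
    # Index of the empirical quantile, clamped to the valid range.
    index = min(len(regrets) - 1, max(0, math.ceil(confidence * len(regrets)) - 1))
    return regrets[index]
```

With a very large synthetic data set, as the abstract describes, such a quantile would be estimated from many runs per control strategy, so that strategies can be compared at the same confidence level.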


Jointly Learning Prices and Product Features

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2021/325 ◽ 2021

Author(s): Ehsan Emamjomeh-Zadeh ◽ Renato Paes Leme ◽ Jon Schneider ◽ Balasubramanian Sivan

Keyword(s): Online Learning ◽ Product Design ◽ Marketing Research ◽ Product Configuration ◽ Purchasing Decision ◽ Product Features ◽ Regret Bounds

Product design is an important problem in marketing research, where a firm tries to learn which features of a product are more valuable to consumers. We study this problem from the viewpoint of online learning: a firm repeatedly interacts with a buyer by choosing a product configuration as well as a price and observing the buyer's purchasing decision. The goal of the firm is to maximize revenue throughout the course of $T$ rounds by learning the buyer's preferences. We study both the case of a set of discrete products and the case of a continuous set of allowable product features. In both cases we provide nearly tight upper and lower regret bounds.


Stochastic Shortest Path with Adversarially Changing Costs

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2021/404 ◽ 2021

Author(s): Aviv Rosenberg ◽ Yishay Mansour

Keyword(s): Shortest Path ◽ High Probability ◽ Transition Function ◽ Natural Setting ◽ Goal State ◽ Stochastic Shortest Path ◽ Planning And Control ◽ Regret Bounds ◽ The Cost ◽ And Control

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state with minimum total expected cost. In this paper we present the adversarial SSP model, which also accounts for adversarial changes in the costs over time while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for K episodes, the cost function changes arbitrarily between episodes, and the transitions are unknown to the agent. We develop the first algorithms for adversarial SSPs and prove high-probability regret bounds of square-root K assuming all costs are strictly positive, and sub-linear regret in the general case. We are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.
