bandit problem
Recently Published Documents

TOTAL DOCUMENTS: 276 (five years: 70)
H-INDEX: 26 (five years: 3)

2022 ◽  
Vol 3 (1) ◽  
pp. 1-23
Author(s):  
Mao V. Ngo ◽  
Tie Luo ◽  
Tony Q. S. Quek

Advances in deep neural networks (DNNs) have significantly enhanced real-time detection of anomalous data in IoT applications. However, the complexity-accuracy-delay dilemma persists: complex DNN models offer higher accuracy, but typical IoT devices can barely afford the computational load, and offloading that load to the cloud incurs long delays. In this article, we address this challenge by proposing an adaptive anomaly detection scheme with hierarchical edge computing (HEC). Specifically, we first construct multiple anomaly-detection DNN models of increasing complexity and associate each with a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved using a reinforcement learning policy network. We also incorporate a parallel policy-training method that accelerates training by taking advantage of the distributed models. We build an HEC testbed using real IoT devices and implement and evaluate our contextual-bandit approach on both univariate and multivariate IoT datasets. Compared with baseline and state-of-the-art schemes, our adaptive approach strikes the best accuracy-delay tradeoff on the univariate dataset and achieves the best accuracy and F1-score on the multivariate dataset, with only negligibly longer delay than the best (but inflexible) scheme.
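The model-selection idea can be illustrated with a minimal sketch, assuming a simple epsilon-greedy bandit in place of the authors' policy network; the number of models, the rewards, and the epsilon value are all hypothetical:

```python
import random

class EpsilonGreedyModelSelector:
    """Hypothetical sketch: pick which HEC-layer model handles an input.

    Arms are detection models of increasing complexity; the reward would
    trade off detection accuracy against delay (invented here).
    """

    def __init__(self, n_models, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_models
        self.values = [0.0] * n_models

    def select(self):
        # explore with probability epsilon, otherwise exploit the best estimate
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # incremental mean of observed rewards for this arm
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a simulation where the most complex model yields the highest expected reward, the selector concentrates its pulls on that arm after an initial exploration phase.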


2022 ◽  
Author(s):  
Shogo Hayashi ◽  
Junya Honda ◽  
Hisashi Kashima

Bayesian optimization (BO) is an approach to optimizing an expensive-to-evaluate black-box function that sequentially determines the values of input variables at which to evaluate the function. However, specifying values for all input variables can be expensive and in some cases difficult, for example, in outsourcing scenarios where producing input queries with many input variables involves significant cost. In this paper, we propose a novel Gaussian process bandit problem, BO with partially specified queries (BOPSQ). In BOPSQ, unlike the standard BO setting, a learner specifies only the values of some input variables, and the values of the unspecified input variables are determined randomly according to a known or unknown distribution. We propose two algorithms based on posterior sampling for the cases of known and unknown input distributions, and we derive regret bounds for them that are sublinear for popular kernels. We demonstrate the effectiveness of the proposed algorithms using test functions and real-world datasets.
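A hedged sketch of the BOPSQ setting: the learner picks only one input variable, the environment draws the other from a known distribution, and a crude posterior-sampling surrogate (not the paper's GP-based algorithm) guides the choice. The objective, grid, and constants are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for the black-box: the learner specifies x1, while
# the environment draws x2 from a known distribution (standard normal).
def f(x1, x2):
    return -(x1 - 1.0) ** 2 + 0.1 * x2

candidates = np.linspace(-2.0, 2.0, 41)   # grid for the specified variable

# Crude Thompson-sampling surrogate: keep a per-candidate observation
# history and sample from a normal posterior over each candidate's mean.
obs = {c: [] for c in candidates}
for t in range(200):
    scores = []
    for c in candidates:
        n = len(obs[c])
        mu = np.mean(obs[c]) if n else 0.0
        sigma = 1.0 / np.sqrt(n + 1)       # shrinks as evidence accumulates
        scores.append(rng.normal(mu, sigma))
    x1 = candidates[int(np.argmax(scores))]
    x2 = rng.normal()                      # unspecified variable: random draw
    obs[x1].append(f(x1, x2))

best = max(obs, key=lambda c: np.mean(obs[c]) if obs[c] else -np.inf)
```

Because the unspecified variable is random, the learner optimizes the expected value over that draw; here the sampling concentrates near the true optimum x1 = 1.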


2022 ◽  
Vol 13 (1) ◽  
pp. 112-122
Author(s):  
Akihiro Oda ◽  
Takatomo Mihana ◽  
Kazutaka Kanno ◽  
Makoto Naruse ◽  
Atsushi Uchida

Author(s):  
David Simchi-Levi ◽  
Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class [Formula: see text]. We design a fast and simple algorithm that achieves the statistically optimal regret with only [Formula: see text] calls to an offline regression oracle across all T rounds. The number of oracle calls can be further reduced to [Formula: see text] if T is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
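One way to picture the oracle reduction is inverse-gap weighting: the offline regression oracle's reward predictions are converted into an action distribution that plays actions with small predicted-reward gaps more often. The sketch below is illustrative only (linear rewards, least squares as the oracle, made-up constants), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def igw_distribution(preds, gamma):
    """Inverse-gap weighting: an action's probability shrinks with its
    predicted-reward gap to the empirical best action."""
    K = len(preds)
    best = int(np.argmax(preds))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - p.sum()        # remaining mass goes to the best action
    return p

# Toy run: linear rewards, least squares as the offline regression oracle.
d, K, T = 3, 4, 500
theta = rng.normal(size=(K, d))    # unknown per-action parameters
X, A, R = [], [], []
w = np.zeros((K, d))               # the oracle's current fit
for t in range(T):
    x = rng.normal(size=d)
    p = igw_distribution(w @ x, gamma=10.0)
    a = rng.choice(K, p=p)
    r = theta[a] @ x + 0.1 * rng.normal()
    X.append(x); A.append(a); R.append(r)
    if (t + 1) % 100 == 0:         # infrequent, epoch-style oracle calls
        for k in range(K):
            idx = [i for i, ai in enumerate(A) if ai == k]
            if len(idx) >= d:
                Xa = np.array([X[i] for i in idx])
                Ra = np.array([R[i] for i in idx])
                w[k], *_ = np.linalg.lstsq(Xa, Ra, rcond=None)
```

The epoch schedule mirrors the abstract's point that only a small number of oracle calls are needed across all T rounds.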


2021 ◽  
Author(s):  
Peter Gibbard

This paper presents a model of choice with two stages of information acquisition. In this model, the choice problem can be interpreted as a variant of a more general multiarmed bandit problem. We assume that information acquisition takes a simple “additive form”—the value of an alternative is the sum of two components, which the decision maker can learn by undertaking two stages of information acquisition. This assumption yields a model that is tractable for the purposes of structural estimation. One possible application of the model is to online purchasing on e-commerce sites. For a consumer on an e-commerce website, there are potentially two stages of information acquisition: the consumer can obtain information about an alternative from (i) browsing the search results page and (ii) clicking on the alternative. By way of contrast, in much of the literature on structural econometric models of online purchasing, there is typically only one stage of information acquisition. Our paper may, therefore, provide a more realistic theory for modeling search, at least for those types of search—such as online purchasing—that involve two stages of information acquisition. This paper was accepted by Manel Baucells, behavioral economics and decision analysis.
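The additive-form assumption can be made concrete with a toy sketch (the names, click cost, and threshold rule are hypothetical, not the paper's estimator): each alternative's value is the sum of a component visible on the results page and a component revealed only by clicking:

```python
def choose(alternatives, click_cost=0.1, threshold=0.0):
    """Illustrative two-stage search under the additive-form assumption.

    alternatives maps a name to (first_component, second_component):
    the first is observed by browsing the results page, the second only
    by clicking, which costs click_cost.
    """
    clicked = {}
    for name, (first, second) in alternatives.items():
        # Stage 1: browse; click only alternatives that look promising.
        if first >= threshold:
            # Stage 2: clicking reveals the full value, net of click cost.
            clicked[name] = first + second - click_cost
    if not clicked:
        return None
    return max(clicked, key=clicked.get)
```

Note how the first-stage component acts as a screen: an alternative with a poor results-page component is never clicked, even if its hidden component would have made it the best overall.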


Machines ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 319
Author(s):  
Yi-Liang Yeh ◽  
Po-Kai Yang

This paper presents innovative reinforcement learning methods for automatically tuning the parameters of a proportional-integral-derivative (PID) controller. Conventionally, the high dimensionality of the Q-table is a primary drawback when implementing a reinforcement learning algorithm. To overcome this obstacle, the idea underlying the n-armed bandit problem is used in this paper. Moreover, gain-scheduled actions are introduced to tune the algorithm and improve overall system behavior, so that the proposed controllers fulfill multiple performance requirements. An experiment was conducted on a piezo-actuated stage to illustrate the effectiveness of the proposed control designs relative to competing algorithms.
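A minimal sketch of the n-armed-bandit tuning idea, assuming an epsilon-greedy policy, a toy first-order plant, and made-up gain candidates (the paper's plant model and algorithm details differ):

```python
import random

def simulate_pid(kp, ki, kd, steps=200, dt=0.01):
    """Toy first-order plant y' = -y + u tracking a unit step; returns
    the negative accumulated absolute error as the bandit reward."""
    y, integral, prev_err, cost = 0.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        y += dt * (-y + u)        # forward-Euler step of the plant
        prev_err = err
        cost += abs(err) * dt
    return -cost

def bandit_tune(gain_sets, pulls=50, epsilon=0.2):
    """Epsilon-greedy n-armed bandit over candidate PID gain sets."""
    values = [0.0] * len(gain_sets)
    counts = [0] * len(gain_sets)
    for _ in range(pulls):
        if random.random() < epsilon or not any(counts):
            arm = random.randrange(len(gain_sets))
        else:
            arm = max(range(len(gain_sets)), key=lambda i: values[i])
        r = simulate_pid(*gain_sets[arm])
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return gain_sets[max(range(len(gain_sets)), key=lambda i: values[i])]
```

Each arm is one candidate gain triple, so the table the learner maintains has one value per gain set rather than a high-dimensional Q-table over states and actions.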


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1475
Author(s):  
Marton Havasi ◽  
Jasper Snoek ◽  
Dustin Tran ◽  
Jonathan Gordon ◽  
José Miguel Hernández-Lobato

Variational inference is an optimization-based method for approximating the posterior distribution of the parameters in Bayesian probabilistic models. A key challenge of variational inference is to approximate the posterior with a distribution that is computationally tractable yet sufficiently expressive. We propose a novel method for generating samples from a highly flexible variational approximation. The method starts with a coarse initial approximation and generates samples by refining it in selected, local regions. This allows the samples to capture dependencies and multi-modality in the posterior, even when these are absent from the initial approximation. We demonstrate theoretically that our method always improves the quality of the approximation (as measured by the evidence lower bound). In experiments, our method consistently outperforms recent variational inference methods in terms of log-likelihood and ELBO across three example tasks: the Eight-Schools example (an inference task in a hierarchical model), training a ResNet-20 (Bayesian inference in a large neural network), and the Mushroom task (posterior sampling in a contextual bandit problem).
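The evidence lower bound (ELBO) that the method uses as its quality measure can be sketched by Monte Carlo, here for a 1-D conjugate Gaussian toy model rather than the paper's refinement procedure; when q equals the exact posterior, the estimate recovers the log evidence exactly:

```python
import math
import random

random.seed(0)

def elbo_mc(q_mu, q_sigma, log_joint, n_samples=5000):
    """Monte Carlo ELBO: E_q[log p(x, z) - log q(z)] for a 1-D Gaussian
    variational distribution q = N(q_mu, q_sigma^2)."""
    total = 0.0
    for _ in range(n_samples):
        z = random.gauss(q_mu, q_sigma)
        log_q = (-0.5 * math.log(2 * math.pi * q_sigma ** 2)
                 - (z - q_mu) ** 2 / (2 * q_sigma ** 2))
        total += log_joint(z) - log_q
    return total / n_samples

# Conjugate toy model: z ~ N(0, 1), x | z ~ N(z, 1), observed x = 1.0.
# The exact posterior is N(0.5, 0.5), and the log evidence is log N(x; 0, 2).
x = 1.0
def log_joint(z):
    return (-0.5 * math.log(2 * math.pi) - z ** 2 / 2
            - 0.5 * math.log(2 * math.pi) - (x - z) ** 2 / 2)
```

Evaluating `elbo_mc` at the exact posterior parameters gives the log evidence, while any mismatched q scores strictly lower, which is the sense in which the paper's refinement "always improves the quality of the approximation."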


2021 ◽  
Author(s):  
Xavier Alejandro Flores Cabezas ◽  
Diana Pamela Moya Osorio ◽  
Matti Latva-aho

Unmanned Aerial Vehicles (UAVs) are becoming increasingly attractive for the ambitious expectations of 5G and beyond networks due to their many benefits. Indeed, UAV-assisted communications introduce a new range of challenges and opportunities regarding the security of these networks. Thus, in this paper we explore the opportunities that UAVs can provide for physical-layer security solutions. In particular, we analyze the secrecy performance of a ground wireless communication network assisted by two friendly UAV jammers in the presence of an eavesdropper. To characterize the secrecy performance of this system, we introduce a new area-based metric, the weighted secrecy coverage (WSC), which measures the improvement in the secrecy performance of a system over a given physical area due to the introduction of friendly jamming. The optimal 3D positioning of the UAVs and the power allocation are then addressed in order to maximize the WSC. For that purpose, we provide a reinforcement-learning-based solution by modeling the positioning problem as a multi-armed bandit problem over three positioning variables for the UAVs: angle, height, and orbit radius. Our results show that there is a trade-off between moving the UAVs to positions with better secrecy outcomes and the associated energy expenditure, and that the proposed algorithm converges efficiently to a stable state.
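The multi-armed bandit formulation over discretized positioning variables can be sketched with UCB1; the WSC reward function below is a made-up stand-in, as are the grids and constants:

```python
import math
import random

random.seed(0)

def ucb_position_search(positions, reward_fn, rounds=500, c=1.0):
    """UCB1 over a discretized set of (angle, height, radius) placements.

    reward_fn stands in for a (noisy) weighted-secrecy-coverage measurement.
    """
    n = len(positions)
    counts = [0] * n
    values = [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                       # play every arm once first
        else:
            # exploit high estimates, but keep an exploration bonus
            arm = max(range(n), key=lambda i: values[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        r = reward_fn(positions[arm])
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return positions[max(range(n), key=lambda i: values[i])]

# Illustrative stand-in for the WSC: peaks at height 100 and radius 50,
# independent of angle, with small measurement noise.
def fake_wsc(pos):
    angle, height, radius = pos
    base = (math.exp(-((height - 100) / 50) ** 2)
            * math.exp(-((radius - 50) / 30) ** 2))
    return base + random.gauss(0, 0.05)
```

Discretizing each of the three positioning variables keeps the arm set finite, at the cost of the usual resolution-versus-arm-count trade-off.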

