bandit problem
Recently Published Documents

TOTAL DOCUMENTS: 276 (five years: 70)
H-INDEX: 26 (five years: 3)

2022 ◽  
Vol 3 (1) ◽  
pp. 1-23
Author(s):  
Mao V. Ngo ◽  
Tie Luo ◽  
Tony Q. S. Quek

Advances in deep neural networks (DNNs) have significantly enhanced real-time detection of anomalous data in IoT applications. However, the complexity-accuracy-delay dilemma persists: complex DNN models offer higher accuracy, but typical IoT devices can barely afford the computational load, and offloading that load to the cloud incurs long delays. In this article, we address this challenge by proposing an adaptive anomaly detection scheme with hierarchical edge computing (HEC). Specifically, we first construct multiple anomaly-detection DNN models of increasing complexity and associate each with a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved using a reinforcement learning policy network. We also incorporate a parallel policy-training method that accelerates training by taking advantage of the distributed models. We build an HEC testbed using real IoT devices and implement and evaluate our contextual-bandit approach on both univariate and multivariate IoT datasets. Compared with baseline and state-of-the-art schemes, our adaptive approach strikes the best accuracy-delay tradeoff on the univariate dataset and achieves the best accuracy and F1-score on the multivariate dataset, with only negligibly longer delay than the best (but inflexible) scheme.
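The model-selection idea can be illustrated with a minimal sketch, assuming a simple epsilon-greedy bandit in place of the authors' policy network; the number of models, the rewards, and the epsilon value are all hypothetical:

```python
import random

class EpsilonGreedyModelSelector:
    """Hypothetical sketch: pick which HEC-layer model handles an input.

    Arms are detection models of increasing complexity; the reward would
    trade off detection accuracy against delay (invented here).
    """

    def __init__(self, n_models, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_models
        self.values = [0.0] * n_models

    def select(self):
        # explore with probability epsilon, otherwise exploit the best estimate
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # incremental mean of observed rewards for this arm
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a simulation where the most complex model yields the highest expected reward, the selector concentrates its pulls on that arm after an initial exploration phase.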


2022 ◽  
Author(s):  
Shogo Hayashi ◽  
Junya Honda ◽  
Hisashi Kashima

Bayesian optimization (BO) is an approach to optimizing an expensive-to-evaluate black-box function that sequentially determines the values of input variables at which to evaluate the function. However, specifying values for all input variables can be expensive and in some cases difficult, for example, in outsourcing scenarios where producing input queries with many input variables involves significant cost. In this paper, we propose a novel Gaussian process bandit problem, BO with partially specified queries (BOPSQ). In BOPSQ, unlike the standard BO setting, a learner specifies only the values of some input variables, and the values of the unspecified input variables are determined randomly according to a known or unknown distribution. We propose two algorithms based on posterior sampling for the cases of known and unknown input distributions, and we derive regret bounds for them that are sublinear for popular kernels. We demonstrate the effectiveness of the proposed algorithms using test functions and real-world datasets.
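A hedged sketch of the BOPSQ setting: the learner picks only one input variable, the environment draws the other from a known distribution, and a crude posterior-sampling surrogate (not the paper's GP-based algorithm) guides the choice. The objective, grid, and constants are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for the black-box: the learner specifies x1, while
# the environment draws x2 from a known distribution (standard normal).
def f(x1, x2):
    return -(x1 - 1.0) ** 2 + 0.1 * x2

candidates = np.linspace(-2.0, 2.0, 41)   # grid for the specified variable

# Crude Thompson-sampling surrogate: keep a per-candidate observation
# history and sample from a normal posterior over each candidate's mean.
obs = {c: [] for c in candidates}
for t in range(200):
    scores = []
    for c in candidates:
        n = len(obs[c])
        mu = np.mean(obs[c]) if n else 0.0
        sigma = 1.0 / np.sqrt(n + 1)       # shrinks as evidence accumulates
        scores.append(rng.normal(mu, sigma))
    x1 = candidates[int(np.argmax(scores))]
    x2 = rng.normal()                      # unspecified variable: random draw
    obs[x1].append(f(x1, x2))

best = max(obs, key=lambda c: np.mean(obs[c]) if obs[c] else -np.inf)
```

Because the unspecified variable is random, the learner optimizes the expected value over that draw; here the sampling concentrates near the true optimum x1 = 1.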


2022 ◽  
Vol 13 (1) ◽  
pp. 112-122
Author(s):  
Akihiro Oda ◽  
Takatomo Mihana ◽  
Kazutaka Kanno ◽  
Makoto Naruse ◽  
Atsushi Uchida

Author(s):  
David Simchi-Levi ◽  
Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class [Formula: see text]. We design a fast and simple algorithm that achieves the statistically optimal regret with only [Formula: see text] calls to an offline regression oracle across all T rounds. The number of oracle calls can be further reduced to [Formula: see text] if T is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
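One way to picture the oracle reduction is inverse-gap weighting: the offline regression oracle's reward predictions are converted into an action distribution that plays actions with small predicted-reward gaps more often. The sketch below is illustrative only (linear rewards, least squares as the oracle, made-up constants), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def igw_distribution(preds, gamma):
    """Inverse-gap weighting: an action's probability shrinks with its
    predicted-reward gap to the empirical best action."""
    K = len(preds)
    best = int(np.argmax(preds))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - p.sum()        # remaining mass goes to the best action
    return p

# Toy run: linear rewards, least squares as the offline regression oracle.
d, K, T = 3, 4, 500
theta = rng.normal(size=(K, d))    # unknown per-action parameters
X, A, R = [], [], []
w = np.zeros((K, d))               # the oracle's current fit
for t in range(T):
    x = rng.normal(size=d)
    p = igw_distribution(w @ x, gamma=10.0)
    a = rng.choice(K, p=p)
    r = theta[a] @ x + 0.1 * rng.normal()
    X.append(x); A.append(a); R.append(r)
    if (t + 1) % 100 == 0:         # infrequent, epoch-style oracle calls
        for k in range(K):
            idx = [i for i, ai in enumerate(A) if ai == k]
            if len(idx) >= d:
                Xa = np.array([X[i] for i in idx])
                Ra = np.array([R[i] for i in idx])
                w[k], *_ = np.linalg.lstsq(Xa, Ra, rcond=None)
```

The epoch schedule mirrors the abstract's point that only a small number of oracle calls are needed across all T rounds.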


2021 ◽  
Author(s):  
Peter Gibbard

This paper presents a model of choice with two stages of information acquisition. In this model, the choice problem can be interpreted as a variant of a more general multiarmed bandit problem. We assume that information acquisition takes a simple “additive form”—the value of an alternative is the sum of two components, which the decision maker can learn by undertaking two stages of information acquisition. This assumption yields a model that is tractable for the purposes of structural estimation. One possible application of the model is to online purchasing on e-commerce sites. For a consumer on an e-commerce website, there are potentially two stages of information acquisition: the consumer can obtain information about an alternative from (i) browsing the search results page and (ii) clicking on the alternative. By way of contrast, in much of the literature on structural econometric models of online purchasing, there is typically only one stage of information acquisition. Our paper may, therefore, provide a more realistic theory for modeling search, at least for those types of search—such as online purchasing—that involve two stages of information acquisition. This paper was accepted by Manel Baucells, behavioral economics and decision analysis.
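The additive-form assumption can be made concrete with a toy sketch (the names, click cost, and threshold rule are hypothetical, not the paper's estimator): each alternative's value is the sum of a component visible on the results page and a component revealed only by clicking:

```python
def choose(alternatives, click_cost=0.1, threshold=0.0):
    """Illustrative two-stage search under the additive-form assumption.

    alternatives maps a name to (first_component, second_component):
    the first is observed by browsing the results page, the second only
    by clicking, which costs click_cost.
    """
    clicked = {}
    for name, (first, second) in alternatives.items():
        # Stage 1: browse; click only alternatives that look promising.
        if first >= threshold:
            # Stage 2: clicking reveals the full value, net of click cost.
            clicked[name] = first + second - click_cost
    if not clicked:
        return None
    return max(clicked, key=clicked.get)
```

Note how the first-stage component acts as a screen: an alternative with a poor results-page component is never clicked, even if its hidden component would have made it the best overall.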


Machines ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 319
Author(s):  
Yi-Liang Yeh ◽  
Po-Kai Yang

This paper presents innovative reinforcement learning methods for automatically tuning the parameters of a proportional-integral-derivative (PID) controller. Conventionally, the high dimensionality of the Q-table is a primary drawback when implementing a reinforcement learning algorithm. To overcome this obstacle, the idea underlying the n-armed bandit problem is used in this paper. Moreover, gain-scheduled actions are introduced to tune the algorithm and improve overall system behavior, so that the proposed controllers fulfill multiple performance requirements. An experiment was conducted on a piezo-actuated stage to illustrate the effectiveness of the proposed control designs relative to competing algorithms.
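A minimal sketch of the n-armed-bandit tuning idea, assuming an epsilon-greedy policy, a toy first-order plant, and made-up gain candidates (the paper's plant model and algorithm details differ):

```python
import random

def simulate_pid(kp, ki, kd, steps=200, dt=0.01):
    """Toy first-order plant y' = -y + u tracking a unit step; returns
    the negative accumulated absolute error as the bandit reward."""
    y, integral, prev_err, cost = 0.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        y += dt * (-y + u)        # forward-Euler step of the plant
        prev_err = err
        cost += abs(err) * dt
    return -cost

def bandit_tune(gain_sets, pulls=50, epsilon=0.2):
    """Epsilon-greedy n-armed bandit over candidate PID gain sets."""
    values = [0.0] * len(gain_sets)
    counts = [0] * len(gain_sets)
    for _ in range(pulls):
        if random.random() < epsilon or not any(counts):
            arm = random.randrange(len(gain_sets))
        else:
            arm = max(range(len(gain_sets)), key=lambda i: values[i])
        r = simulate_pid(*gain_sets[arm])
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return gain_sets[max(range(len(gain_sets)), key=lambda i: values[i])]
```

Each arm is one candidate gain triple, so the table the learner maintains has one value per gain set rather than a high-dimensional Q-table over states and actions.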


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1475
Author(s):  
Marton Havasi ◽  
Jasper Snoek ◽  
Dustin Tran ◽  
Jonathan Gordon ◽  
José Miguel Hernández-Lobato

Variational inference is an optimization-based method for approximating the posterior distribution of the parameters in Bayesian probabilistic models. A key challenge of variational inference is to approximate the posterior with a distribution that is computationally tractable yet sufficiently expressive. We propose a novel method for generating samples from a highly flexible variational approximation. The method starts with a coarse initial approximation and generates samples by refining it in selected, local regions. This allows the samples to capture dependencies and multi-modality in the posterior, even when these are absent from the initial approximation. We demonstrate theoretically that our method always improves the quality of the approximation (as measured by the evidence lower bound). In experiments, our method consistently outperforms recent variational inference methods in terms of log-likelihood and ELBO across three example tasks: the Eight-Schools example (an inference task in a hierarchical model), training a ResNet-20 (Bayesian inference in a large neural network), and the Mushroom task (posterior sampling in a contextual bandit problem).
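The evidence lower bound (ELBO) that the method uses as its quality measure can be sketched by Monte Carlo, here for a 1-D conjugate Gaussian toy model rather than the paper's refinement procedure; when q equals the exact posterior, the estimate recovers the log evidence exactly:

```python
import math
import random

random.seed(0)

def elbo_mc(q_mu, q_sigma, log_joint, n_samples=5000):
    """Monte Carlo ELBO: E_q[log p(x, z) - log q(z)] for a 1-D Gaussian
    variational distribution q = N(q_mu, q_sigma^2)."""
    total = 0.0
    for _ in range(n_samples):
        z = random.gauss(q_mu, q_sigma)
        log_q = (-0.5 * math.log(2 * math.pi * q_sigma ** 2)
                 - (z - q_mu) ** 2 / (2 * q_sigma ** 2))
        total += log_joint(z) - log_q
    return total / n_samples

# Conjugate toy model: z ~ N(0, 1), x | z ~ N(z, 1), observed x = 1.0.
# The exact posterior is N(0.5, 0.5), and the log evidence is log N(x; 0, 2).
x = 1.0
def log_joint(z):
    return (-0.5 * math.log(2 * math.pi) - z ** 2 / 2
            - 0.5 * math.log(2 * math.pi) - (x - z) ** 2 / 2)
```

Evaluating `elbo_mc` at the exact posterior parameters gives the log evidence, while any mismatched q scores strictly lower, which is the sense in which the paper's refinement "always improves the quality of the approximation."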


2021 ◽  
Author(s):  
Xavier Alejandro Flores Cabezas ◽  
Diana Pamela Moya Osorio ◽  
Matti Latva-aho

Unmanned Aerial Vehicles (UAVs) are becoming increasingly attractive for the ambitious expectations of 5G and beyond networks due to their many benefits. Indeed, UAV-assisted communications introduce a new range of challenges and opportunities regarding the security of these networks. Thus, in this paper we explore the opportunities that UAVs can provide for physical-layer security solutions. In particular, we analyze the secrecy performance of a ground wireless communication network assisted by two friendly UAV jammers in the presence of an eavesdropper. To characterize the secrecy performance of this system, we introduce a new area-based metric, the weighted secrecy coverage (WSC), which measures the improvement in the secrecy performance of a system over a given physical area due to the introduction of friendly jamming. The optimal 3D positioning of the UAVs and the power allocation are then addressed in order to maximize the WSC. For that purpose, we provide a reinforcement-learning-based solution by modeling the positioning problem as a multi-armed bandit problem over three positioning variables for the UAVs: angle, height, and orbit radius. Our results show that there is a trade-off between moving the UAVs to positions with better secrecy outcomes and the associated energy expenditure, and that the proposed algorithm converges efficiently to a stable state.
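The multi-armed bandit formulation over discretized positioning variables can be sketched with UCB1; the WSC reward function below is a made-up stand-in, as are the grids and constants:

```python
import math
import random

random.seed(0)

def ucb_position_search(positions, reward_fn, rounds=500, c=1.0):
    """UCB1 over a discretized set of (angle, height, radius) placements.

    reward_fn stands in for a (noisy) weighted-secrecy-coverage measurement.
    """
    n = len(positions)
    counts = [0] * n
    values = [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                       # play every arm once first
        else:
            # exploit high estimates, but keep an exploration bonus
            arm = max(range(n), key=lambda i: values[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        r = reward_fn(positions[arm])
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return positions[max(range(n), key=lambda i: values[i])]

# Illustrative stand-in for the WSC: peaks at height 100 and radius 50,
# independent of angle, with small measurement noise.
def fake_wsc(pos):
    angle, height, radius = pos
    base = (math.exp(-((height - 100) / 50) ** 2)
            * math.exp(-((radius - 50) / 30) ** 2))
    return base + random.gauss(0, 0.05)
```

Discretizing each of the three positioning variables keeps the arm set finite, at the cost of the usual resolution-versus-arm-count trade-off.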

