Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D

The Multi-Armed Bandit (MAB) problem has been extensively studied to address real-world challenges related to sequential decision making. In this setting, an agent selects the action to perform at time step t based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action remains stationary over time. In many real-world applications, however, this assumption does not hold and the agent faces a non-stationary environment, that is, one with a changing reward distribution. We therefore present a new MAB algorithm for non-stationary environments, that is, for data streams affected by concept drift, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS). The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments evaluates f-dsw TS against both stationary and non-stationary state-of-the-art TS baselines. We use synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis of microbial organisms in the tropical air ecosystem.
The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one version of f-dsw TS outperforms the baselines in the synthetic environments, demonstrating its robustness under different types of concept drift. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
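The idea of combining a discounted reward history with an arm-related sliding window through an aggregation function f can be illustrated with a minimal Python sketch. It assumes Bernoulli rewards with Beta(1, 1) priors; the decay rule (every arm's pseudo-counts multiplied by gamma each round), the per-arm window holding the last `window` rewards, and the names `FDSWThompsonSampling` and `simulate` are hypothetical choices made for illustration, not the authors' exact update rules.

```python
import random
from collections import deque
from statistics import mean


class FDSWThompsonSampling:
    """Illustrative f-dsw TS sketch for Bernoulli rewards (update rules are assumptions)."""

    def __init__(self, n_arms, gamma=0.95, window=20, agg=min, seed=None):
        self.n_arms = n_arms
        self.gamma = gamma  # assumed: discount applied to every arm's evidence each round
        self.agg = agg      # aggregation function f: min, max or statistics.mean
        self.rng = random.Random(seed)
        # Discounted success/failure pseudo-counts per arm.
        self.s_disc = [0.0] * n_arms
        self.f_disc = [0.0] * n_arms
        # Arm-related sliding window: the last `window` rewards observed for each arm.
        self.history = [deque(maxlen=window) for _ in range(n_arms)]

    def select(self):
        scores = []
        for a in range(self.n_arms):
            # Sample from the discounted posterior Beta(1 + S_d, 1 + F_d).
            theta_d = self.rng.betavariate(1 + self.s_disc[a], 1 + self.f_disc[a])
            # Sample from the windowed posterior built on this arm's recent rewards only.
            wins = sum(self.history[a])
            losses = len(self.history[a]) - wins
            theta_w = self.rng.betavariate(1 + wins, 1 + losses)
            # Combine the two samples with the aggregation function f.
            scores.append(self.agg([theta_d, theta_w]))
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        # Decay the evidence of all arms, then add the new 0/1 observation.
        for a in range(self.n_arms):
            self.s_disc[a] *= self.gamma
            self.f_disc[a] *= self.gamma
        self.s_disc[arm] += reward
        self.f_disc[arm] += 1 - reward
        self.history[arm].append(reward)


def simulate(agent, probs_before, probs_after, change_at, horizon, seed=0):
    """Bernoulli bandit with one abrupt change at `change_at`; returns the chosen arms."""
    env_rng = random.Random(seed)
    choices = []
    for t in range(horizon):
        probs = probs_before if t < change_at else probs_after
        arm = agent.select()
        reward = 1 if env_rng.random() < probs[arm] else 0
        agent.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    for f in (min, max, mean):
        agent = FDSWThompsonSampling(2, agg=f, seed=1)
        arms = simulate(agent, [0.8, 0.2], [0.2, 0.8], change_at=500, horizon=1000)
        print(f.__name__, "share of arm 1 after the drift:", arms[800:].count(1) / 200)
```

Swapping `agg=min` for `max` or `statistics.mean` yields the optimistic and averaged variants. Under the pessimistic `min`, an arm scores highly only when both its discounted and its windowed evidence agree that it is good.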

Safe Linear Thompson Sampling with Side Information

IEEE Transactions on Signal Processing ◽

10.1109/tsp.2021.3089822 ◽

2021 ◽

pp. 1-1

Author(s):

Ahmadreza Moradipari ◽

Sanae Amani ◽

Mahnoosh Alizadeh ◽

Christos Thrampoulidis

Keyword(s):

Side Information ◽

Thompson Sampling

TSOR: Thompson Sampling-based Opportunistic Routing

IEEE Transactions on Wireless Communications ◽

10.1109/twc.2021.3082080 ◽

2021 ◽

pp. 1-1

Author(s):

Zhiming Huang ◽

Yifan Xu ◽

Jianping Pan

Keyword(s):

Opportunistic Routing ◽

Thompson Sampling

MmWave Codebook Selection in Rapidly-Varying Channels via Multinomial Thompson Sampling

Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing ◽

10.1145/3466772.3467044 ◽

2021 ◽

Author(s):

Yi Zhang ◽

Soumya Basu ◽

Sanjay Shakkottai ◽

Robert W. Heath

Keyword(s):

Thompson Sampling

Empirical study of Thompson sampling: Tuning the posterior parameters

10.1063/1.4985354 ◽

2017 ◽

Author(s):

R. Devanand ◽

P. Kumar

Keyword(s):

Empirical Study ◽

Thompson Sampling

Adaptive Operator Selection with Reinforcement Learning

Information Sciences ◽

10.1016/j.ins.2021.10.025 ◽

2021 ◽

Author(s):

Rafet Durgut ◽

Mehmet Emin Aydin ◽

Ibrahim Atli

Keyword(s):

Reinforcement Learning ◽

Adaptive Operator Selection

A note on the advantage of context in Thompson sampling

Journal of Revenue and Pricing Management ◽

10.1057/s41272-021-00314-1 ◽

2021 ◽

Author(s):

Michael Byrd ◽

Ross Darrow

Keyword(s):

Thompson Sampling

Thompson Sampling for Online Personalized Assortment Optimization Problems with Multinomial Logit Choice Models

SSRN Electronic Journal ◽

10.2139/ssrn.3075658 ◽

2017 ◽

Cited By ~ 4

Author(s):

Wang Chi Cheung ◽

David Simchi-Levi

Keyword(s):

Optimization Problems ◽

Multinomial Logit ◽

Choice Models ◽

Thompson Sampling ◽

Assortment Optimization

On Thompson Sampling and Asymptotic Optimality

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/688 ◽

2017 ◽

Cited By ~ 3

Author(s):

Jan Leike ◽

Tor Lattimore ◽

Laurent Orseau ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

Asymptotic Optimality ◽

Thompson Sampling ◽

Stochastic Environments ◽

Optimal Value ◽

Partially Observable ◽

General Stochastic

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) its value asymptotically converges in mean to the optimal value and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.

Sliding-Window Thompson Sampling for Non-Stationary Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11407 ◽

2020 ◽

Vol 68 ◽

pp. 311-364

Author(s):

Francesco Trovo ◽

Stefano Paladino ◽

Marcello Restelli ◽

Nicola Gatti

Keyword(s):

Real World ◽

State Of The Art ◽

Sliding Window ◽

Upper Bounds ◽

Decision Problems ◽

Sequential Decision ◽

Thompson Sampling ◽

The Past ◽

Real World Applications ◽

Window Approach

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in recent decades. However, non-stationary settings, which are very common in real-world applications, have so far received little attention, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for non-stationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two forms of non-stationarity that have so far been studied separately: abruptly changing and smoothly changing. In the former, the reward distributions are constant during sequences of rounds, and their changes may be arbitrary and happen at unknown rounds; in the latter, the reward distributions evolve smoothly over rounds according to unknown dynamics. Under mild assumptions, we provide upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both forms of non-stationarity are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the two forms of non-stationarity are considered separately, as previously studied in the literature.
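The sliding-window idea behind SW-TS can be sketched in Python for Bernoulli rewards. This is a minimal sketch under our own assumptions (Beta(1, 1) priors, a single window covering the last `tau` rounds for all arms, and the hypothetical names `SlidingWindowTS` and `run_abrupt_change`); it is not the authors' reference implementation and does not reproduce the window-length choices analysed in the paper.

```python
import random
from collections import deque


class SlidingWindowTS:
    """Beta-Bernoulli Thompson Sampling that only uses observations from the last `tau` rounds."""

    def __init__(self, n_arms, tau, seed=None):
        self.n_arms = n_arms
        self.window = deque(maxlen=tau)  # (arm, reward) pairs for the last tau rounds
        self.succ = [0] * n_arms         # successes per arm inside the window
        self.fail = [0] * n_arms         # failures per arm inside the window
        self.rng = random.Random(seed)

    def select(self):
        # One posterior sample per arm from Beta(1 + S, 1 + F), counting only windowed rounds.
        samples = [self.rng.betavariate(1 + self.succ[a], 1 + self.fail[a])
                   for a in range(self.n_arms)]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        # If the window is full, the oldest observation is about to be evicted: forget it.
        if len(self.window) == self.window.maxlen:
            old_arm, old_reward = self.window[0]
            self.succ[old_arm] -= old_reward
            self.fail[old_arm] -= 1 - old_reward
        self.window.append((arm, reward))
        self.succ[arm] += reward
        self.fail[arm] += 1 - reward


def run_abrupt_change(agent, probs_before, probs_after, change_at, horizon, seed=0):
    """Bernoulli bandit whose arm means switch once at `change_at`; returns the chosen arms."""
    env_rng = random.Random(seed)
    choices = []
    for t in range(horizon):
        probs = probs_before if t < change_at else probs_after
        arm = agent.select()
        reward = 1 if env_rng.random() < probs[arm] else 0
        agent.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    agent = SlidingWindowTS(2, tau=100, seed=3)
    arms = run_abrupt_change(agent, [0.8, 0.2], [0.2, 0.8], change_at=500, horizon=1000)
    print("share of arm 1 in the last 200 rounds:", arms[800:].count(1) / 200)
```

Keeping the per-arm counts in sync with the deque makes each update O(1) instead of rescanning the window. An arm with no observations left in the window falls back to the uniform prior, which is what lets it be re-explored after an abrupt change.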