Thompson Sampling
Recently Published Documents


TOTAL DOCUMENTS: 134 (FIVE YEARS: 90)

H-INDEX: 9 (FIVE YEARS: 2)

2021 ◽  
Author(s):  
Carlos Daniel Pohlod ◽  
Sandra M. Venske ◽  
Carolina P. Almeida

This work proposes a selection Hyper-Heuristic (HH) based on the Thompson Sampling (TS) approach for solving the Quadratic Assignment Problem (QAP). The QAP aims to assign facilities to a set of known candidate locations so as to minimize the total cost of all flows between facilities. The proposed HH is applied to the automatic configuration of a memetic algorithm, selecting a combination of low-level heuristics. Each combination comprises one recombination heuristic, one local search strategy, and one mutation heuristic. The algorithm was evaluated on 15 instances of the Nug benchmark, and the HH outperforms every combination of heuristics applied in isolation, demonstrating its effectiveness for automatic algorithm configuration. The experiments show that TS performance is affected by the quality of the set of low-level heuristics. The best HH version finds the optimal solution on 9 instances, and the mean percentage deviation from the optimal solution (gap) over all 15 instances was 8.6%, with the largest gaps occurring on the three largest instances.
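
As a rough illustration of the selection mechanism described above, the sketch below applies Beta-Bernoulli Thompson Sampling to choose among low-level heuristic combinations. The combination names, the binary improved/not-improved reward, and the apply_heuristics hook are illustrative assumptions, not the paper's exact design.

```python
import random

# Minimal sketch of a Thompson Sampling selection hyper-heuristic.
# Each "arm" is one combination of (recombination, local search, mutation);
# the names below are placeholders, not the paper's heuristic set.
combinations = [
    ("uniform_crossover", "2opt", "swap_mutation"),
    ("order_crossover", "2opt", "insert_mutation"),
    ("uniform_crossover", "3opt", "swap_mutation"),
]

# Beta(1, 1) prior per combination: alpha counts successes, beta failures.
alpha = {c: 1.0 for c in combinations}
beta = {c: 1.0 for c in combinations}

def select_combination():
    """Sample a success probability per arm and play the argmax."""
    return max(combinations, key=lambda c: random.betavariate(alpha[c], beta[c]))

def update(combination, improved):
    """Binary reward: did applying the combination improve the incumbent?"""
    if improved:
        alpha[combination] += 1.0
    else:
        beta[combination] += 1.0

# Inside the memetic algorithm's generation loop one would call:
#   combo = select_combination()
#   improved = apply_heuristics(combo, population)  # problem-specific hook
#   update(combo, improved)
```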


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Suhansanu Kumar ◽  
Heting Gao ◽  
Changyu Wang ◽  
Kevin Chen-Chuan Chang ◽  
Hari Sundaram

Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2175
Author(s):  
Miguel Martín ◽  
Antonio Jiménez-Martín ◽  
Alfonso Mateos ◽  
Josefa Z. Hernández

A/B testing is used in digital contexts both to offer a more personalized service and to optimize the e-commerce purchasing process. A personalized service provides customers with the fastest possible access to the contents that they are most likely to use. An optimized e-commerce purchasing process reduces customer effort during online purchasing and ensures that as many customers as possible complete their orders. The most widespread A/B testing method is to implement the equivalent of randomized controlled trials (RCTs). Recently, however, some companies and solutions have addressed this experimentation process as a multi-armed bandit (MAB). This is known in the A/B testing market as dynamic traffic distribution. A complementary technique used to optimize the performance of A/B testing is to improve the experiment stopping criterion. In this paper, we propose an adaptation of A/B testing to account for possibilistic reward (PR) methods, together with the definition of a new stopping criterion, also based on PR methods, to be used for both classical A/B testing and A/B testing based on MAB algorithms. A comparative numerical analysis based on the simulation of real scenarios is used to analyze the performance of the proposed adaptations in both Bernoulli and non-Bernoulli environments. In this analysis, we show that the possibilistic reward method PR3 produced the lowest mean cumulative regret in non-Bernoulli environments, with a high confidence level and high stability, as demonstrated by low standard deviations. PR3 behaves exactly the same as Thompson sampling in Bernoulli environments. The conclusion is that PR3 can be used efficiently in both environments, in combination with the value remaining stopping criterion in Bernoulli environments and the PR3 bounds stopping criterion in non-Bernoulli environments.
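
The dynamic traffic distribution idea above maps directly onto Beta-Bernoulli Thompson Sampling. Here is a minimal sketch simulating a two-variant Bernoulli A/B test; the conversion rates and visitor loop are invented for illustration, and the code shows plain Thompson sampling rather than the paper's PR3 method (which the authors report behaves identically in Bernoulli environments).

```python
import numpy as np

# Minimal sketch of dynamic traffic distribution via Thompson Sampling for a
# Bernoulli A/B test (conversion = 1, no conversion = 0). Conversion rates
# below are invented for the simulation.
rng = np.random.default_rng(0)
true_rates = [0.04, 0.05]          # unknown conversion rates of variants A, B
successes = np.ones(2)             # Beta(1, 1) priors per variant
failures = np.ones(2)

for visitor in range(10_000):
    # Sample a plausible conversion rate per variant, route to the best draw.
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    reward = rng.random() < true_rates[arm]   # simulated visitor outcome
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", successes / (successes + failures))
```

Because each visitor is routed by a fresh posterior draw, traffic shifts toward the better variant as evidence accumulates, which is the behaviour the "dynamic traffic distribution" label describes.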


2021 ◽  
Vol 39 (4) ◽  
pp. 1-29
Author(s):  
Shijun Li ◽  
Wenqiang Lei ◽  
Qingyun Wu ◽  
Xiangnan He ◽  
Peng Jiang ◽  
...  

Static recommendation methods like collaborative filtering suffer from the inherent limitation of performing real-time personalization for cold-start users. Online recommendation, e.g., the multi-armed bandit approach, addresses this limitation by interactively exploring user preferences online and pursuing the exploration-exploitation (EE) trade-off. However, existing bandit-based methods model recommendation actions homogeneously. Specifically, they only consider the items as the arms and are incapable of handling item attributes, which naturally provide interpretable information about a user's current demands and can effectively filter out undesired items. In this work, we consider conversational recommendation for cold-start users, where a system can both ask a user about attributes and recommend items interactively. This important scenario was studied in a recent work [54], which, however, employs a hand-crafted function to decide when to ask about attributes or make recommendations. Such separate modeling of attributes and items makes the effectiveness of the system rely heavily on the choice of the hand-crafted function, introducing fragility into the system. To address this limitation, we seamlessly unify attributes and items in the same arm space and achieve their EE trade-offs automatically using the framework of Thompson Sampling. Our Conversational Thompson Sampling (ConTS) model holistically solves all questions in conversational recommendation by choosing the arm with the maximal reward to play. Extensive experiments on three benchmark datasets show that ConTS outperforms the state-of-the-art methods Conversational UCB (ConUCB) [54] and the Estimation-Action-Reflection model [27] in both success rate and the average number of conversation turns.
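
To make the unified-arm-space idea concrete, here is a minimal sketch in which attribute queries and item recommendations share one arm pool scored against a Gaussian posterior over the user's preference vector. The arm names, embeddings, and the Bayesian linear-regression update are simplified assumptions, not the exact ConTS model.

```python
import numpy as np

# Illustrative sketch of the core idea in ConTS: attribute queries and item
# recommendations live in one arm space, and Thompson Sampling picks whichever
# arm (ask or recommend) has the highest sampled reward.
rng = np.random.default_rng(1)
d = 8                                    # embedding dimension (assumed)
arms = {                                 # unified arm space: ask or recommend
    "ask:color": rng.normal(size=d),
    "ask:brand": rng.normal(size=d),
    "rec:item_42": rng.normal(size=d),
    "rec:item_97": rng.normal(size=d),
}

mu, cov = np.zeros(d), np.eye(d)         # Gaussian posterior on user vector

def choose_arm():
    """Sample a user vector from the posterior; play the best-scoring arm."""
    u = rng.multivariate_normal(mu, cov)
    return max(arms, key=lambda a: arms[a] @ u)

def update(arm, reward, noise_var=1.0):
    """Simplified Bayesian linear-regression update of (mu, cov) on feedback."""
    global mu, cov
    x = arms[arm]
    prec = np.linalg.inv(cov) + np.outer(x, x) / noise_var
    new_cov = np.linalg.inv(prec)
    mu, cov = new_cov @ (np.linalg.inv(cov) @ mu + x * reward / noise_var), new_cov

print("next action:", choose_arm())      # either asks an attribute or recommends
```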


2021 ◽  
Author(s):  
Hamsa Bastani ◽  
David Simchi-Levi ◽  
Ruihao Zhu

We study the problem of learning shared structure across a sequence of dynamic pricing experiments for related products. We consider a practical formulation in which the unknown demand parameters for each product come from an unknown distribution (prior) that is shared across products. We then propose a meta dynamic pricing algorithm that learns this prior online while solving a sequence of Thompson sampling pricing experiments (each with horizon T) for N different products. Our algorithm addresses two challenges: (i) balancing the need to learn the prior (meta-exploration) with the need to leverage the estimated prior to achieve good performance (meta-exploitation) and (ii) accounting for uncertainty in the estimated prior by appropriately “widening” the estimated prior as a function of its estimation error. We introduce a novel prior alignment technique to analyze the regret of Thompson sampling with a misspecified prior, which may be of independent interest. Unlike prior-independent approaches, our algorithm’s meta regret grows sublinearly in N, demonstrating that the price of an unknown prior in Thompson sampling can be negligible in experiment-rich environments (large N). Numerical experiments on synthetic and real auto loan data demonstrate that our algorithm significantly speeds up learning compared with prior-independent algorithms. This paper was accepted by George J. Shanthikumar for the Management Science Special Issue on Data-Driven Analytics.
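
A rough sketch of the prior-widening step follows: the shared prior is estimated from the products seen so far, and its covariance is inflated before seeding the next product's Thompson sampling run. The 1/sqrt(n) widening schedule and the function names are placeholders; the paper widens the prior as a function of its estimation error.

```python
import numpy as np

# Rough sketch of the prior-widening idea (not the paper's exact algorithm):
# estimate the shared prior from finished pricing experiments, then inflate
# its covariance before handing it to the next Thompson sampling instance.
def widened_prior(theta_hats):
    """theta_hats: (n_products, n_params) demand-parameter estimates."""
    n = len(theta_hats)
    mu_hat = np.mean(theta_hats, axis=0)          # estimated prior mean
    cov_hat = np.cov(theta_hats, rowvar=False)    # estimated prior covariance
    widening = 1.0 / np.sqrt(n)                   # placeholder schedule
    return mu_hat, cov_hat + widening * np.eye(len(mu_hat))

# Each new product's pricing experiment seeds Thompson sampling with this
# widened prior instead of an uninformative one.
theta_hats = np.random.default_rng(2).normal(size=(10, 3))  # 10 products, 3 params
mu0, cov0 = widened_prior(theta_hats)
print("widened prior mean:", mu0)
```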


Author(s):  
Emil Carlsson ◽  
Devdatt Dubhashi ◽  
Fredrik D. Johansson

We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret, showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
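
A minimal sketch of a two-level scheme in this spirit appears below: sample a cluster by Thompson Sampling, then an arm within it, and credit the observed reward to both levels. The Beta-Bernoulli priors, the cluster layout, and the reward-propagation rule are assumptions for illustration, not the authors' exact algorithm.

```python
import random

# Two-level Thompson sampling over clustered Bernoulli arms: level 1 picks a
# cluster, level 2 picks an arm inside it; the reward updates both levels.
clusters = {
    "c1": ["a1", "a2"],
    "c2": ["a3", "a4", "a5"],
}
all_keys = list(clusters) + [a for members in clusters.values() for a in members]
alpha = {k: 1.0 for k in all_keys}   # Beta(1, 1) priors for clusters and arms
beta = dict(alpha)

def ts_pick(candidates):
    """Sample a success probability per candidate and play the argmax."""
    return max(candidates, key=lambda k: random.betavariate(alpha[k], beta[k]))

def step(pull):
    """pull(arm) -> 0/1 reward from the environment."""
    cluster = ts_pick(list(clusters))        # level 1: choose a cluster
    arm = ts_pick(clusters[cluster])         # level 2: choose an arm in it
    r = pull(arm)
    for key in (cluster, arm):               # credit reward to both levels
        alpha[key] += r
        beta[key] += 1 - r

# Example environment: arm "a4" has the highest success rate.
rates = {"a1": 0.2, "a2": 0.3, "a3": 0.25, "a4": 0.7, "a5": 0.1}
for _ in range(2000):
    step(lambda a: int(random.random() < rates[a]))
best = max(rates, key=lambda a: alpha[a] / (alpha[a] + beta[a]))
print("empirically best arm:", best)
```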

