DUELING BANDIT PROBLEMS

Author(s):  
Erol Peköz ◽  
Sheldon M. Ross ◽  
Zhengyu Zhang

There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the “weak regret problem,” we suppose there is a “best” bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the “strong regret problem,” we suppose that bandit i has unknown value v_i, i = 1, …, n, and that i beats j with probability v_i/(v_i + v_j). One version of strong regret is interested in maximizing the number of times that the contest is between the players with the two largest values. Another version supposes that at any stage, rather than choosing two arms to play a game, the decision maker can declare that a particular arm is the best, with the objective of maximizing the number of stages in which the arm with the largest value is declared to be the best. In the weak regret problem, we propose a policy and obtain an analytic bound on the expected number of stages over an infinite time frame that the best arm is not one of the competitors when this policy is employed. In the strong regret problem, we propose a Thompson sampling type algorithm and empirically compare its performance with others in the literature.
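The game model in the strong regret problem can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function names `duel` and `winner_stays` are hypothetical, and the "winner stays on" policy is a simple stand-in baseline, not the policy or the Thompson sampling type algorithm proposed by the authors. It counts the stages in which the arm with the largest value is not one of the two competitors.

```python
import random


def duel(values, i, j, rng):
    """Return the winner of a game between arms i and j.

    Arm i wins with probability v_i / (v_i + v_j), as stated in the abstract.
    """
    p_i = values[i] / (values[i] + values[j])
    return i if rng.random() < p_i else j


def winner_stays(values, stages, seed=0):
    """Hypothetical baseline policy (an assumption, not the authors' policy).

    The winner of each game stays on and meets a uniformly random challenger.
    Returns the number of stages in which the largest-value arm did not play.
    """
    rng = random.Random(seed)
    n = len(values)
    best = max(range(n), key=lambda k: values[k])
    champion = rng.randrange(n)
    misses = 0
    for _ in range(stages):
        challenger = rng.choice([k for k in range(n) if k != champion])
        if best not in (champion, challenger):
            misses += 1
        champion = duel(values, champion, challenger, rng)
    return misses


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(winner_stays([1.0, 2.0, 3.0, 6.0], stages=1000))
```

With only two arms the count is always zero, since both arms play every game; with more arms the count grows with the number of stages under this naive baseline, which is the kind of quantity a good policy aims to keep bounded.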

2020 ◽  
Vol 45 (2) ◽  
pp. 175-183
Author(s):  
Deepjyoti Choudhury ◽  
Dibyojyoti Bhattacharjee

A wicketkeeper is indispensable to a cricket team. A perfect wicketkeeper keeps the morale of the team high and boosts the confidence of the bowler and the entire team. The performance of a wicketkeeper in a match can change the fate of the game. A wicketkeeper should be capable of several cricketing skills such as stumping, catching and appealing for a dismissal, and these days is also an important decision maker in calling for a decision review. However, the International Cricket Council (ICC), which regularly publishes rankings for batsmen, bowlers, all-rounders and cricket teams, does not produce any ranking for wicketkeepers. Thus, considering the importance of a wicketkeeper in a cricket team, the researchers feel that it is necessary to quantify the performance of wicketkeepers. The exercise can help in selecting the best keeper for a team from a host of available options. In this study an attempt has been made to rank the performance of wicketkeepers with the help of the Sharpe ratio. Here the scorecard wicketkeeping data of selected players are taken from the last five editions of the Indian Premier League (IPL) and accordingly the wicketkeeper with the highest expected number of dismissals is identified. Based on the descriptive statistics, the Sharpe ratio is calculated to identify and rank the wicketkeepers.
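A ranking by Sharpe ratio can be sketched as follows. The abstract does not give the exact formula used in the study, so this assumes the generic form (mean minus a benchmark, divided by the sample standard deviation) applied to per-match dismissal counts. The function names, the zero default benchmark, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(dismissals, benchmark=0.0):
    """Generic Sharpe-type ratio: (mean - benchmark) / sample std deviation.

    `dismissals` is a list of per-match dismissal counts (assumed input
    format); it needs at least two entries for the standard deviation.
    """
    spread = stdev(dismissals)
    if spread == 0:
        raise ValueError("standard deviation is zero; ratio is undefined")
    return (mean(dismissals) - benchmark) / spread


def rank_keepers(records, benchmark=0.0):
    """Return keeper names ordered from highest to lowest ratio."""
    scores = {name: sharpe_ratio(d, benchmark) for name, d in records.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up data for illustration only.
    sample = {"Keeper A": [1, 2, 3], "Keeper B": [2, 2, 3, 1]}
    print(rank_keepers(sample))
```

Under this form, a keeper with the same average number of dismissals but less match-to-match variation ranks higher, which is the usual reason for preferring a Sharpe-type measure over a plain average.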


Econometrica ◽  
2021 ◽  
Vol 89 (4) ◽  
pp. 1717-1751
Author(s):  
Olivier Gossner ◽  
Jakub Steiner ◽  
Colin Stewart

We study the impact of manipulating the attention of a decision‐maker who learns sequentially about a number of items before making a choice. Under natural assumptions on the decision‐maker's strategy, directing attention toward one item increases its likelihood of being chosen regardless of its value. This result applies when the decision‐maker can reject all items in favor of an outside option with known value; if no outside option is available, the direction of the effect of manipulation depends on the value of the item. A similar result applies to manipulation of choices in bandit problems.


2016 ◽  
Vol 32 (4) ◽  
pp. 265-275
Author(s):  
Shannon E. Kelly ◽  
David Moher ◽  
Tammy J. Clifford

Objectives: Rapid reviews are characterized as an accelerated evidence synthesis approach with no universally accepted methodology or definition. This modified Delphi consensus study aimed to develop a comprehensive set of defining characteristics for rapid reviews that may be used as a functional definition. Methods: Expert panelists with knowledge in rapid reviews and evidence synthesis were identified. In the first round, panelists were asked to answer a seventeen-item survey addressing a variety of rapid review topics. Results led to the development of statements describing the characteristics of rapid reviews that were circulated to experts for agreement in a second survey round and further revised in a third round. Consensus was reached if ≥70 percent of experts agreed and there was stability in free-text comments. Results: A panel of sixty-six experts participated. Consensus was reached on ten of eleven statements describing the characteristics of rapid reviews. According to the panel, rapid reviews aim to meet the requirements and timelines of a decision maker and should be conducted in less time than a systematic review. They use a variety of approaches to accelerate the evidence synthesis process, tailor the methods conventionally used to carry out systematic reviews, and use the most rigorous methods that the delivery time frame will allow. Conclusions: This study achieved consensus on ten statements describing the defining characteristics of rapid reviews based on the opinion of a panel of knowledgeable experts. Areas of disagreement were also highlighted. Findings emphasize the role of the decision maker and stress the importance of transparent reporting.


2020 ◽  
Vol 34 (04) ◽  
pp. 4932-4939
Author(s):  
Yong Liu ◽  
Yingtai Xiao ◽  
Qiong Wu ◽  
Chunyan Miao ◽  
Juyong Zhang ◽  
...  

Interactive recommender systems, which enable interactions between users and the recommender system, have attracted increasing research attention. Previous methods mainly focus on optimizing recommendation accuracy. However, they usually ignore the diversity of the recommendation results, which often results in unsatisfying user experiences. In this paper, we propose a novel diversified recommendation model, named Diversified Contextual Combinatorial Bandit (DC2B), for interactive recommendation with users' implicit feedback. Specifically, DC2B employs a determinantal point process in the recommendation procedure to promote diversity of the recommendation results. To learn the model parameters, a Thompson sampling-type algorithm based on variational Bayesian inference is proposed. In addition, a theoretical regret analysis is provided to guarantee the performance of DC2B. Extensive experiments on real datasets are performed to demonstrate the effectiveness of the proposed method in balancing recommendation accuracy and diversity.


2020 ◽  
Vol 59 (SI) ◽  
pp. SIIG01
Author(s):  
D. Etoh ◽  
T. Tsuchiya ◽  
Y. Kitagawa ◽  
M. Takayanagi ◽  
Y. Itoh ◽  
...  

2020 ◽  
Vol 71 (7) ◽  
pp. 453-458
Author(s):  
Takashi TSUCHIYA ◽  
Kazuya TERABE

2020 ◽  
Vol 41 (2) ◽  
pp. 61-67
Author(s):  
Marko Tončić ◽  
Petra Anić

Abstract. This study aims to examine the effect of affect on satisfaction, both at the between- and the within-person level for momentary assessments. Affect is regarded as an important source of information for life satisfaction judgments. This affective effect on satisfaction is well established at the dispositional level, while at the within-person level it is heavily under-researched. This is true especially for momentary assessments. In this experience sampling study both mood and satisfaction scales were administered five times a day for 7 days via hand-held devices (N = 74 with 2,122 assessments). Several hierarchical linear models were fitted to the data. Even though the amount of between-person variance was relatively low, both positive and negative affect had substantial effects on momentary satisfaction on the between- and the within-person level as well. The within-person effects of affect on satisfaction appear to be more pronounced than the between-person ones. At the momentary level, the amount of between-person variance is lower than in studies with longer time-frames. The affect-related effects on satisfaction possibly have a curvilinear relationship with the time-frame used, increasing in intensity up to a point and then decreasing again. Such a relationship suggests that, at the momentary level, satisfaction might behave in a more stochastic manner, allowing for transient events/data which are not necessarily affect-related to affect it.


2012 ◽  
Author(s):  
Michael Schulte-Mecklenbeck ◽  
Thorsten Pachur ◽  
Ryan O. Murphy ◽  
Ralph Hertwig
