index policies
Recently Published Documents


TOTAL DOCUMENTS: 65 (FIVE YEARS: 6)
H-INDEX: 12 (FIVE YEARS: 2)

2020 · Vol 66 (7) · pp. 3029-3050
Author(s): David B. Brown, James E. Smith

We consider dynamic selection problems, where a decision maker repeatedly selects a set of items from a larger collection of available items. A classic example is the dynamic assortment problem with demand learning, where a retailer chooses items to offer for sale subject to a display-space constraint. The retailer may adjust the assortment over time in response to the observed demand. These dynamic selection problems are naturally formulated as stochastic dynamic programs (DPs) but are difficult to solve because the optimal selection decisions depend on the states of all items. In this paper, we study heuristic policies for dynamic selection problems and provide upper bounds on the performance of an optimal policy that can be used to assess the performance of a heuristic policy. The policies and bounds that we consider are based on a Lagrangian relaxation of the DP that relaxes the constraint limiting the number of items that may be selected. We characterize the performance of the Lagrangian index policy and the corresponding bound and show that, under mild conditions, these policies and bounds are asymptotically optimal for problems with many items; mixed policies and tiebreaking play an essential role in the analysis of these index policies and can have a surprising impact on performance. We demonstrate these policies and bounds in two large-scale examples: a dynamic assortment problem with demand learning and an applicant screening problem. This paper was accepted by Yinyu Ye, optimization.
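To illustrate the structure of such a policy, below is a minimal Python sketch, with all names illustrative rather than taken from the paper. It assumes a per-item index function is available, for example precomputed offline by solving single-item DPs under a Lagrange multiplier on the selection constraint; each period, the policy selects the m items with the highest index, breaking ties at random as a simple stand-in for the mixing the abstract notes can matter for performance.

import random

def lagrangian_index_policy(item_states, index_fn, m, rng=None):
    """Select the m items with the highest Lagrangian index.

    item_states: per-item states; each item's index depends only on its
                 own state, which is what keeps the policy tractable
                 when there are many items.
    index_fn:    maps an item's state to a scalar index (assumed here to
                 be precomputed from single-item DPs under a Lagrange
                 multiplier on the selection constraint).
    m:           number of items that may be selected (e.g., display space).
    Ties are broken uniformly at random, a crude form of the
    tiebreaking/mixing discussed in the abstract.
    """
    rng = rng or random.Random(0)
    scored = [(index_fn(s), rng.random(), i) for i, s in enumerate(item_states)]
    scored.sort(reverse=True)  # highest index first; random key breaks ties
    return [i for _, _, i in scored[:m]]

In the assortment example, item_states might be Beta posterior parameters summarizing the demand observed for each product so far, updated between periods as sales are observed.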


2019 · Vol 47 (3) · pp. 213-218
Author(s): Franziska Eberle, Felix Fischer, Jannik Matuschke, Nicole Megow

2017 · Vol 32 (2) · pp. 229-245
Author(s): Sofía S. Villar

In a rare life-threatening disease setting, the number of patients in the trial is a high proportion of all patients with the condition (if not all of them). Further, this number is usually not enough to guarantee the statistical power required to detect a treatment effect of a meaningful size. In such a context, prioritizing patient benefit over hypothesis testing as the goal of the trial can lead to a trial design that produces useful information to guide treatment, even if it does not do so with the standard levels of statistical confidence. The idealized model for such an optimal design of a clinical trial is the classic multi-armed bandit problem with a finite patient horizon and a patient-benefit objective function. Such a design maximizes patient benefit by balancing the learning and earning goals as data accumulate, given the patient horizon. On the other hand, solving such a model optimally has a very high (often prohibitive) computational cost and, more importantly, a cumbersome implementation, even for populations as small as a hundred patients. Several computationally feasible heuristic rules to address this problem have been proposed in the literature over the last 40 years. In this paper, we study a novel heuristic approach based on the reformulation of the problem as a restless bandit problem and the derivation of its corresponding Whittle index (WI) rule. Such a rule was recently proposed in the context of a clinical trial in Villar, Bowden, and Wason [16]. We perform extensive computational studies, using both exact value calculations and simulated values, to compare the performance of this rule with that of other index rules and simpler heuristics previously proposed in the literature. Our results suggest that for the two- and three-armed case with a patient horizon of at most a hundred patients, all index rules are a priori practically identical in terms of the expected proportion of successes attained when all arms start with a uniform prior. However, we find that a posteriori, for specific values of the parameters of interest, the index policies outperform the simpler rules in every instance, and especially so in the case of many arms and a larger, though still relatively small, total number of patients with the disease. The very good performance of bandit rules in terms of patient benefit (i.e., expected number of successes and mean number of patients allocated to the best arm, if it exists) makes them very appealing in the context of the challenge posed by drug development and treatment for rare life-threatening diseases.
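To make the shared structure of these index rules concrete, here is a minimal Python sketch of the allocation loop for a Bernoulli bandit trial with Beta priors. Computing the exact finite-horizon Whittle index requires solving a calibration problem for each posterior state and remaining horizon, which is beyond a short sketch, so a simple posterior-mean-plus-exploration-bonus index stands in as a placeholder; all names are illustrative, not from the paper.

import math
import random

def placeholder_index(beta_params, remaining):
    """Posterior mean plus an exploration bonus; a stand-in, NOT the exact WI."""
    s, f = beta_params
    mean = s / (s + f)
    return mean + math.sqrt(math.log(1 + remaining) / (s + f))

def allocate_trial(n_patients, true_success_probs, index_fn, seed=1):
    """Sequentially allocate each patient to the arm with the highest index."""
    rng = random.Random(seed)
    k = len(true_success_probs)
    post = [[1, 1] for _ in range(k)]  # Beta(1, 1): the uniform priors above
    successes = 0
    for t in range(n_patients):
        remaining = n_patients - t  # finite patient horizon
        arm = max(range(k), key=lambda a: index_fn(post[a], remaining))
        hit = rng.random() < true_success_probs[arm]
        post[arm][0 if hit else 1] += 1  # update success/failure counts
        successes += int(hit)
    return successes, post

# Example: a three-armed trial with a horizon of 100 patients.
wins, posteriors = allocate_trial(100, [0.3, 0.5, 0.4], placeholder_index)

Swapping placeholder_index for an exact or approximate Whittle index, or for any other index rule in the paper's comparison, changes only the index_fn argument; this common allocation structure is what makes the different index rules directly comparable.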

