Cost-aware Cascading Bandits

Author(s):  
Ruida Zhou ◽  
Chao Gan ◽  
Jing Yang ◽  
Cong Shen

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback that accounts for the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially until a certain stopping condition is satisfied. Our objective is to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm and show that the cumulative regret scales as $O(\log T)$. We also provide a lower bound for all $\alpha$-consistent policies, which scales as $\Omega(\log T)$ and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.
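As a rough illustration of the online setting, the sketch below runs a UCB-style loop with per-arm examination costs and a threshold-style stopping rule. The confidence-width constant, cost model, and stopping rule are our assumptions for illustration, not the paper's exact CC-UCB specification.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 10_000
theta = rng.uniform(0.1, 0.9, K)       # unknown arm success probabilities
mean_cost = rng.uniform(0.0, 0.3, K)   # unknown mean examination costs

succ = np.zeros(K); cost_sum = np.zeros(K); pulls = np.zeros(K)

def examine(i):
    """Examine arm i: observe its Bernoulli state and a random cost."""
    x = rng.binomial(1, theta[i])
    c = float(np.clip(mean_cost[i] + rng.normal(0, 0.05), 0.0, 1.0))
    succ[i] += x; cost_sum[i] += c; pulls[i] += 1
    return x

for i in range(K):                     # initialization: examine each arm once
    examine(i)

for t in range(K + 1, T + 1):
    ucb = succ / pulls + np.sqrt(1.5 * np.log(t) / pulls)
    cost_hat = cost_sum / pulls
    order = np.argsort(-(ucb - cost_hat))   # rank by optimistic net value
    for i in order:
        if ucb[i] - cost_hat[i] <= 0:       # stop: optimistic net gain non-positive
            break
        if examine(i) == 1:                 # cascading feedback: a success
            break                           # ends the examination in this step
```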

Author(s):  
Julian Berk ◽  
Sunil Gupta ◽  
Santu Rana ◽  
Svetha Venkatesh

To improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function that samples the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising the bound on the Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB on a range of real-world and synthetic problems.
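The idea can be sketched in a few lines: at each round, draw the trade-off parameter from a distribution before forming the UCB acquisition. The Gamma distribution, kernel, toy objective, and use of scikit-learn below are our assumptions for illustration, not the authors' exact choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x   # toy objective to maximize
grid = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)

X = rng.uniform(-1.0, 2.0, (3, 1))              # initial design points
y = f(X).ravel() + rng.normal(0, 0.01, 3)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-4)
for t in range(30):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    beta_t = rng.gamma(shape=2.0, scale=1.0)    # sampled exploration weight
    acq = mu + np.sqrt(beta_t) * sigma          # randomized UCB acquisition
    x_next = grid[np.argmax(acq)].reshape(1, 1)
    y_next = f(x_next).ravel() + rng.normal(0, 0.01, 1)
    X = np.vstack([X, x_next]); y = np.concatenate([y, y_next])

print("best observed point:", grid[np.argmax(gp.predict(grid))].item())
```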


Author(s):  
Zejian Li ◽  
Yongchuan Tang ◽  
Wei Li ◽  
Yongxing He

Unsupervised disentangled representation learning is one of the foundational methods for learning interpretable factors in data. Existing methods assume that the disentangled factors are mutually independent and incorporate this assumption into the evidence lower bound. However, our experiments reveal that factors in real-world data tend to be only pairwise independent. Accordingly, we propose a new method that learns the disentangled representation under a pairwise independence assumption. Because the evidence lower bound implicitly encourages mutual independence of the latent codes, it is too strong for our assumption; we therefore introduce another lower bound in our method. Extensive experiments show that our proposed method achieves performance competitive with other state-of-the-art methods.
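As a loose illustration of relaxing mutual independence to a pairwise criterion, the snippet below penalizes pairwise correlation between latent dimensions. Note that uncorrelatedness is strictly weaker than pairwise independence, so this is only a stand-in for the idea, not the paper's lower bound; the regularizer name and training-loop usage are hypothetical.

```python
import torch

def pairwise_decorrelation_penalty(z: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal entries of the latent correlation matrix.
    z: (batch, dim) latent codes sampled from the encoder."""
    zc = z - z.mean(dim=0, keepdim=True)
    cov = zc.T @ zc / (z.shape[0] - 1)
    std = torch.sqrt(torch.diag(cov)).clamp_min(1e-8)
    corr = cov / (std[:, None] * std[None, :])
    off = corr - torch.diag(torch.diag(corr))    # zero out the diagonal
    return (off ** 2).sum() / (z.shape[1] * (z.shape[1] - 1))

# Hypothetical usage inside a VAE training step (recon_loss, kl_loss,
# and lam are assumed to be defined by the surrounding training code):
# loss = recon_loss + kl_loss + lam * pairwise_decorrelation_penalty(z)
z = torch.randn(128, 10)
print(pairwise_decorrelation_penalty(z))
```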


Author(s):  
Takanori Maehara ◽  
Atsuhiro Narita ◽  
Jun Baba ◽  
Takayuki Kawabata

Brand advertising is a type of advertising that aims at increasing the awareness of companies or products. This type of advertising is well studied in the economic, marketing, and psychological literature; however, there are no studies in the area of computational advertising because the effect of such advertising is difficult to observe. In this study, we consider a real-time bidding strategy for brand advertising. Here, our objective is to maximize the total number of users who remember the advertisement, averaged over time. For this objective, we first introduce a new objective function that captures the cognitive-psychological properties of memory retention and can be optimized efficiently in the online setting (i.e., it is a monotone submodular function). Then, we propose an algorithm for the bid optimization problem with the proposed objective function under the second-price mechanism by reducing the problem to the online knapsack-constrained monotone submodular maximization problem. We evaluated the proposed objective function and the algorithm on real-world data collected from our system and a questionnaire survey. We observed that our objective function is reasonable in a real-world setting and that the proposed algorithm outperformed the baseline online algorithms.
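A minimal sketch of the bidding side is given below: bid on an impression when its marginal retention gain per unit cost clears a threshold that rises as the budget is spent, a standard online-knapsack heuristic. The exponential forgetting-curve gain and all constants are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
B = 100.0                      # total budget
spent = 0.0
last_seen = {}                 # user id -> time of last ad exposure

def marginal_retention_gain(user, t, decay=0.05):
    """Toy memory-retention gain: re-exposing a user is worth more the
    longer ago they last saw the ad (assumed exponential forgetting)."""
    if user not in last_seen:
        return 1.0
    return 1.0 - np.exp(-decay * (t - last_seen[user]))

L, U = 0.1, 10.0               # assumed lower/upper bounds on value density
for t in range(2000):
    user = rng.integers(0, 300)
    price = rng.uniform(0.05, 1.0)           # second-price clearing price
    gain = marginal_retention_gain(user, t)
    frac = spent / B
    threshold = L * (U / L) ** frac          # threshold grows with spend
    if spent + price <= B and gain / price >= threshold:
        spent += price                       # win and pay the second price
        last_seen[user] = t

print(f"spent {spent:.1f} of {B}, reached {len(last_seen)} users")
```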


Author(s):  
Xueying Guo ◽  
Xiaoxiao Wang ◽  
Xin Liu

In this paper, we propose and study opportunistic contextual bandits, a special case of contextual bandits in which the exploration cost varies with environmental conditions such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Intuitively, therefore, we should explore more when the exploration cost is relatively low and exploit more when it is relatively high. Inspired by this intuition, for opportunistic contextual bandits with linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves an $O((\log T)^2)$ problem-dependent regret upper bound, with a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, on both synthetic and real-world datasets, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms under large exploration-cost fluctuations.
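The intuition can be sketched as a LinUCB variant whose confidence width shrinks when the exploration-cost signal (e.g., load) is high. The linear scaling of the width with load below is our simplification, not the paper's exact AdaLinUCB schedule.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, T = 5, 10, 5000
theta_star = rng.normal(size=d); theta_star /= np.linalg.norm(theta_star)

A = np.eye(d)                  # ridge-regularized Gram matrix
b = np.zeros(d)
alpha_max = 1.0                # maximum exploration width

for t in range(T):
    X = rng.normal(size=(K, d)) / np.sqrt(d)   # contexts for K arms
    load = rng.uniform()                       # exploration-cost signal in [0, 1]
    alpha_t = alpha_max * (1.0 - load)         # low cost -> wide confidence bonus
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    widths = np.sqrt(np.einsum('ki,ij,kj->k', X, A_inv, X))
    a = int(np.argmax(X @ theta_hat + alpha_t * widths))
    r = X[a] @ theta_star + rng.normal(0, 0.1)
    A += np.outer(X[a], X[a]); b += r * X[a]   # standard LinUCB update
```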


Author(s):  
Avinash Balakrishnan ◽  
Djallel Bouneffouf ◽  
Nicholas Mattei ◽  
Francesca Rossi

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose a novel extension to the classical contextual multi-armed bandit setting and provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavior constraints without significantly degrading its overall reward performance.
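A minimal sketch of the idea, assuming a Beta-Bernoulli bandit and a constraint set read directly off teacher demonstrations (a simplification of the learned constrained policy described above):

```python
import numpy as np

rng = np.random.default_rng(4)
K = 6
true_p = rng.uniform(0.1, 0.9, K)         # unknown reward probabilities
teacher_allowed = {0, 1, 3, 4}            # arms observed in teacher demonstrations

alpha = np.ones(K); beta = np.ones(K)     # Beta posterior per arm
for t in range(5000):
    samples = rng.beta(alpha, beta)       # Thompson sample per arm
    # mask out arms that violate the learned behavioral constraints
    for k in range(K):
        if k not in teacher_allowed:
            samples[k] = -np.inf
    a = int(np.argmax(samples))
    r = rng.binomial(1, true_p[a])
    alpha[a] += r; beta[a] += 1 - r       # posterior update from reward
```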


2016 ◽  
Vol 22 ◽  
pp. 219
Author(s):  
Roberto Salvatori ◽  
Olga Gambetti ◽  
Whitney Woodmansee ◽  
David Cox ◽  
Beloo Mirakhur ◽  
...  

VASA ◽  
2019 ◽  
Vol 48 (2) ◽  
pp. 134-147 ◽  
Author(s):  
Mirko Hirschl ◽  
Michael Kundi

Abstract. Background: In randomized controlled trials (RCTs), direct acting oral anticoagulants (DOACs) showed a superior risk-benefit profile in comparison to vitamin K antagonists (VKAs) for patients with nonvalvular atrial fibrillation. However, patients enrolled in such studies do not necessarily reflect the whole target population treated in real-world practice. Materials and methods: A systematic literature search identified 88 studies including 3,351,628 patients and over 2.9 million patient-years of follow-up. Hazard ratios and event rates for the main efficacy and safety outcomes were extracted, and the results for DOACs and VKAs were combined by network meta-analysis. In addition, meta-regression was performed to identify factors responsible for heterogeneity across studies. Results: For stroke and systemic embolism, as well as for major bleeding and intracranial bleeding, real-world studies gave virtually the same results as RCTs, with higher efficacy and lower major bleeding risk (for dabigatran and apixaban) and lower risk of intracranial bleeding (all DOACs) compared with VKAs. Results for gastrointestinal bleeding were consistently better for DOACs, and hazard ratios for myocardial infarction were significantly lower in real-world studies for dabigatran and apixaban than in RCTs. A ranking analysis found apixaban to be the safest anticoagulant, while rivaroxaban, closely followed by dabigatran, was the most efficacious. Risk of bias and heterogeneity were assessed and had little impact on the overall results. Analysis of effect modification can guide clinical decisions, as no single DOAC was superior or inferior to the others under all conditions. Conclusions: DOACs were at least as efficacious as VKAs. In terms of safety endpoints, DOACs performed better under real-world conditions than in RCTs. The current real-world data show that differences in efficacy and safety, despite generally low event rates, exist between DOACs. Knowledge of these differences in performance can contribute to more personalized medicine.
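For readers unfamiliar with the pooling step, the snippet below shows fixed-effect inverse-variance pooling of log hazard ratios, the basic building block behind such a meta-analysis. The study numbers are hypothetical, and the full network meta-analysis and meta-regression are beyond this sketch.

```python
import numpy as np

# Hypothetical per-study hazard ratios with upper 95% CI bounds
hr = np.array([0.79, 0.88, 0.66])
ci_upper = np.array([0.95, 1.03, 0.80])

log_hr = np.log(hr)
se = (np.log(ci_upper) - log_hr) / 1.96     # recover SE from the CI half-width
w = 1.0 / se**2                              # inverse-variance weights
pooled_log = np.sum(w * log_hr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
lo = np.exp(pooled_log - 1.96 * pooled_se)
hi = np.exp(pooled_log + 1.96 * pooled_se)
print(f"pooled HR {np.exp(pooled_log):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```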

