Enhancing Greedy Policy Techniques for Complex Cost-Sensitive Problems

10.5772/6357 ◽  
2008 ◽  
Author(s):  
Camelia Vidrighin ◽  
Rodica Potolea
2020 ◽  
Author(s):  
Alberto Vera ◽  
Siddhartha Banerjee

We develop a new framework for designing online policies given access to an oracle providing statistical information about an off-line benchmark. Having access to such prediction oracles enables simple and natural Bayesian selection policies and raises the question as to how these policies perform in different settings. Our work makes two important contributions toward this question: First, we develop a general technique we call compensated coupling, which can be used to derive bounds on the expected regret (i.e., additive loss with respect to a benchmark) for any online policy and off-line benchmark. Second, using this technique, we show that a natural greedy policy, which we call the Bayes selector, has constant expected regret (i.e., independent of the number of arrivals and resource levels) for a large class of problems we refer to as “online allocation with finite types,” which includes widely studied online packing and online matching problems. Our results generalize and simplify several existing results for online packing and online matching and suggest a promising pathway for obtaining oracle-driven policies for other online decision-making settings. This paper was accepted by George Shanthikumar, big data analytics.
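As a concrete illustration of the kind of policy the abstract describes, the following is a minimal sketch of a Bayes-selector-style greedy rule for single-resource online packing with finite types. It is not the paper's implementation; the unit-size types, the 0.5 rounding threshold, and the names fluid_accept_fraction and bayes_selector_decision are assumptions made for illustration.

```python
# A minimal sketch (not the paper's implementation) of a Bayes-selector-style
# greedy policy for single-resource online packing with finite types.
# Assumptions: each type j has reward r[j] and unit size; an "oracle" supplies
# the expected number of future arrivals of each type. The fluid relaxation of
# a unit-size knapsack is solved greedily by reward, and the current arrival is
# accepted iff the fluid solution would accept most of its type.

def fluid_accept_fraction(rewards, expected_arrivals, budget):
    """Solve the fluid (LP) relaxation greedily: fill the budget with the
    highest-reward types first; return the accepted fraction per type."""
    frac = {j: 0.0 for j in rewards}
    remaining = budget
    for j in sorted(rewards, key=rewards.get, reverse=True):
        take = min(expected_arrivals[j], remaining)
        frac[j] = take / expected_arrivals[j] if expected_arrivals[j] > 0 else 0.0
        remaining -= take
        if remaining <= 0:
            break
    return frac

def bayes_selector_decision(arrival_type, rewards, expected_arrivals, budget):
    """Accept the arrival iff the re-solved fluid relaxation accepts at least
    half of its type (one common rounding of the Bayes-selector decision)."""
    if budget <= 0:
        return False
    frac = fluid_accept_fraction(rewards, expected_arrivals, budget)
    return frac[arrival_type] >= 0.5

# Example: two types, limited budget, oracle forecasts of remaining arrivals.
rewards = {"A": 5.0, "B": 1.0}
forecast = {"A": 3.0, "B": 10.0}
print(bayes_selector_decision("B", rewards, forecast, budget=4))  # False: save room for type A
```

The point reflected here is that each decision is made by re-solving an offline (fluid) relaxation fed by the oracle's forecast, rather than by learning a value function.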


Author(s):  
Maury Bramson ◽  
Bernardo D’Auria ◽  
Neil Walton

Consider a switched queueing network with general routing among its queues. The MaxWeight policy assigns available service by maximizing the objective function Σ_i q_i σ_i among the different feasible service options, where q_i denotes the size of queue i and σ_i denotes the amount of service to be executed at queue i. MaxWeight is a greedy policy that does not depend on knowledge of arrival rates and is straightforward to implement. These properties and its simple formulation suggest MaxWeight as a serious candidate for implementation in the setting of switched queueing networks; MaxWeight has been extensively studied in the context of communication networks. However, a fluid model variant of MaxWeight was previously shown not to be maximally stable. Here, we prove that MaxWeight itself is not in general maximally stable. We also prove MaxWeight is maximally stable in a much more restrictive setting, and that a weighted version of MaxWeight, where the weighting depends on the traffic intensity, is always stable.
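The MaxWeight rule itself is simple to state in code. Below is a minimal sketch, not taken from the paper, that selects the feasible service vector maximizing Σ_i q_i·s_i; the particular schedule set in the example is a made-up illustration (think of matchings in an input-queued switch).

```python
# A minimal sketch (not from the paper) of MaxWeight scheduling for a switched
# queueing network: among the feasible service options, pick the one maximizing
# sum_i q_i * s_i, where q_i is the current length of queue i and s_i is the
# service it would receive under that option.

def maxweight_schedule(queue_lengths, feasible_schedules):
    """Return the feasible service vector with the largest weight sum_i q_i * s_i."""
    def weight(schedule):
        return sum(q * s for q, s in zip(queue_lengths, schedule))
    return max(feasible_schedules, key=weight)

# Example: three queues; each schedule says how much service each queue gets.
queues = [7, 2, 5]
schedules = [(1, 0, 0), (0, 1, 1), (1, 0, 1)]
print(maxweight_schedule(queues, schedules))  # (1, 0, 1): weight 12
```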


Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1352 ◽  
Author(s):  
Kim ◽  
Park

In deep reinforcement learning (RL), exploration is crucial for achieving good generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can exhibit efficient exploration behavior. A random ε-greedy policy can exploit additional replay buffers in environments with sparse, binary rewards, such as real-time online network-security detection that verifies whether the network is “normal or anomalous.” Prior studies have shown that prioritized replay memory based on the temporal-difference error provides superior theoretical results. However, other implementations have shown that in certain environments prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key challenge of hindsight experience replay, which uses additional buffers corresponding to different goals, motivates our objective. We therefore exploit multiple random ε-greedy buffers to enhance exploration toward better generalization with a single original goal in off-policy RL. We demonstrate the benefit of off-policy learning with our method through an experimental comparison of DQN and deep deterministic policy gradient on discrete-action as well as continuous-control tasks in fully symmetric environments.
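For readers unfamiliar with the mechanics, here is a minimal, hedged sketch of ε-greedy action selection combined with multiple replay buffers. It is illustrative only, not the authors' DQN/DDPG code; the names q_values, buffers, and store_transition are assumptions.

```python
import random

# A minimal sketch (not the paper's code) of epsilon-greedy action selection with
# multiple replay buffers: the agent explores with probability epsilon and stores
# each transition in a randomly chosen buffer, from which minibatches would later
# be drawn for off-policy updates.

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def store_transition(buffers, transition):
    """Spread experience across several replay buffers chosen uniformly at random."""
    random.choice(buffers).append(transition)

# Example usage with three buffers and a toy Q-value table for one state.
buffers = [[], [], []]
action = epsilon_greedy_action(q_values=[0.1, 0.7, 0.2], epsilon=0.1)
store_transition(buffers, ("state", action, 0.0, "next_state", False))
```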


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Xinglin Yu ◽  
Yuhu Wu ◽  
Xi-Ming Sun ◽  
Wenya Zhou

Abstract Balancing exploration and exploitation in reinforcement learning is a common dilemma and can be time-consuming to resolve. In this paper, a novel exploration policy for Q-learning, called the memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of selecting random actions is effectively reduced, which speeds up learning. The principle of this policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
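The abstract describes the memory-greedy policy only at a high level, so the sketch below is one plausible reading rather than the authors' algorithm: exploration draws on a memory of previously successful actions instead of acting uniformly at random. The names memory_greedy_action and remember and the positive-reward storage rule are assumptions.

```python
import random

# A heavily hedged sketch of a "memory-greedy" exploration rule as suggested by
# the abstract (an interpretation, not the authors' code): successful actions are
# stored per state, and exploration replays from that memory when available
# instead of acting uniformly at random, reducing wasted random moves.

def memory_greedy_action(state, q_table, memory, epsilon, n_actions):
    """Exploit the greedy action; when exploring, replay a remembered action
    for this state if one exists, otherwise explore uniformly at random."""
    if random.random() >= epsilon:                      # exploit
        return max(range(n_actions), key=lambda a: q_table[state][a])
    if memory.get(state):                               # guided exploration
        return random.choice(memory[state])
    return random.randrange(n_actions)                  # uniform exploration

def remember(memory, state, action, reward):
    """Store actions that produced positive reward for later playback."""
    if reward > 0:
        memory.setdefault(state, []).append(action)

# Example: one state "s0", two actions, a remembered success for action 1.
q_table, memory = {"s0": [0.0, 0.0]}, {}
remember(memory, "s0", action=1, reward=1.0)
print(memory_greedy_action("s0", q_table, memory, epsilon=0.5, n_actions=2))
```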


Author(s):  
Xinyi Li ◽  
Hui He ◽  
Pengfei Chen ◽  
Xiaohui Zhang ◽  
Li Su ◽  
...  

1994 ◽  
Vol 26 (04) ◽  
pp. 1095-1116 ◽  
Author(s):  
Eitan Altman ◽  
Hanoch Levy

We consider a problem in which a single server must serve a stream of customers whose arrivals are distributed over a finite-size convex space. Under the assumption that the server has full information on the customer locations, obvious service policies are the FCFS and the greedy (serve-the-closest-customer) approaches. These algorithms are, however, either inefficient (FCFS) or 'unfair' (greedy). We propose and study two alternative algorithms, the gated-greedy policy and the gated-scan policy, which are more 'fair' than the pure greedy method. We show that the stability condition of the gated-greedy policy is ρ < 1 (where ρ is the expected rate at which work arrives at the system), implying that the method is at least as efficient (in terms of system stability) as any other discipline, in particular the greedy one. For the gated-scan policy we show that for any ρ < 1 one can design a stable gated-scan policy; however, for any fixed gated-scan policy there exists ρ < 1 for which the policy is unstable. We evaluate the performance of the gated-scan policy and present bounds for the performance of the gated-greedy policy. These results are derived for systems in which arrivals occur on a two-dimensional space (a square), but they are not limited to this configuration; rather, they hold for more complex N-dimensional spaces, in particular for serving customers in a (three-dimensional) convex space and serving customers on a line.
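To make the gated-greedy discipline concrete, here is a minimal sketch (an illustration, not the paper's model or analysis): the server gates the customers currently present, serves that batch closest-first, and only then admits the arrivals that accumulated in the meantime. The planar coordinates and function name are assumptions.

```python
import math

# A minimal sketch (illustration only, not the paper's analysis) of the
# gated-greedy discipline: the server "gates" the set of customers present,
# serves only those customers in greedy (closest-first) order, and then opens
# a new gate for the arrivals that accumulated meanwhile.

def gated_greedy_order(server_pos, gated_customers):
    """Serve the gated batch closest-first, returning the visiting order."""
    order, pos, remaining = [], server_pos, list(gated_customers)
    while remaining:
        nxt = min(remaining, key=lambda c: math.dist(pos, c))
        order.append(nxt)
        remaining.remove(nxt)
        pos = nxt
    return order

# Example: customers on the unit square; later arrivals wait for the next gate.
print(gated_greedy_order((0.0, 0.0), [(0.9, 0.1), (0.2, 0.3), (0.5, 0.8)]))
```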


2009 ◽  
Vol 54 (12) ◽  
pp. 2787-2802 ◽  
Author(s):  
A.J. Mersereau ◽  
P. Rusmevichientong ◽  
J.N. Tsitsiklis

2021 ◽  
Vol 15 (5) ◽  
pp. 1-23
Author(s):  
Jianxiong Guo ◽  
Weili Wu

The influence maximization problem seeks a small subset of seed nodes that maximizes the expected influence spread and has been studied intensively. Prior work assumed that every user in the selected seed set is activated successfully and then spreads influence. In real scenarios, however, not every user in the seed set is willing to be an influencer. We therefore associate each user with a probability of accepting activation as a seed, and we may attempt to activate a user multiple times. In this article, we study the adaptive influence maximization with multiple activations (Adaptive-IMMA) problem: in each iteration we select a node and observe whether she accepts to be a seed; if yes, we wait and observe the influence diffusion process; if no, we may attempt to activate her again at a higher cost or select another node as a seed. We model multiple activations mathematically on the domain of the integer lattice. We propose a new concept, adaptive dr-submodularity, and show that Adaptive-IMMA maximizes an adaptive monotone, dr-submodular function under an expected knapsack constraint. Adaptive dr-submodular maximization is not covered by any existing study, so we summarize its properties and study its approximability comprehensively, a non-trivial generalization of existing analyses of adaptive submodularity. Moreover, to overcome the difficulty of estimating the expected influence spread, we combine our adaptive greedy policy with sampling techniques, reducing the time complexity without losing the approximation ratio. Finally, we conduct experiments on several real datasets to evaluate the effectiveness and efficiency of our proposed policies.
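As a rough illustration of an adaptive greedy policy with multiple activation attempts, the following sketch selects, in each round, the node with the best estimated marginal spread per unit cost, attempts to activate it, and on refusal either retries at a higher cost or abandons it. It is not the paper's algorithm; marginal_spread, retry_factor, max_attempts, and the cost-escalation rule are hypothetical stand-ins (the paper combines its greedy policy with sampling-based spread estimation).

```python
import random

# A minimal sketch (illustration under simplifying assumptions, not the paper's
# algorithm) of adaptive greedy seed selection with multiple activation attempts.
# marginal_spread(node, seeds) is a hypothetical estimator (e.g., sampling-based).

def adaptive_greedy_imma(nodes, accept_prob, base_cost, budget, marginal_spread,
                         retry_factor=1.5, max_attempts=2):
    seeds, spent = [], 0.0
    attempts = {v: 0 for v in nodes}
    candidates = set(nodes)
    while candidates and spent < budget:
        # Greedy choice: largest estimated marginal gain per unit of current cost.
        cost = lambda v: base_cost[v] * (retry_factor ** attempts[v])
        v = max(candidates, key=lambda u: marginal_spread(u, seeds) / cost(u))
        if spent + cost(v) > budget:
            break
        spent += cost(v)
        attempts[v] += 1
        if random.random() < accept_prob[v]:         # node accepts: becomes a seed
            seeds.append(v)
            candidates.discard(v)
        elif attempts[v] >= max_attempts:             # give up on this node
            candidates.discard(v)
    return seeds, spent

# Toy usage: marginal spread taken (hypothetically) as a fixed per-node value.
nodes = ["u", "v", "w"]
toy_spread = {"u": 5.0, "v": 3.0, "w": 1.0}
seeds, cost = adaptive_greedy_imma(
    nodes,
    accept_prob={"u": 0.9, "v": 0.5, "w": 0.8},
    base_cost={"u": 1.0, "v": 1.0, "w": 1.0},
    budget=3.0,
    marginal_spread=lambda v, seeds: toy_spread[v] if v not in seeds else 0.0,
)
print(seeds, cost)
```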

