Fleet Repositioning for Vehicle Sharing Systems: the Optimality of Balanced Myopic Policy

2021 ◽  
Author(s):  
Yihang Yang ◽  
Yimin Yu ◽  
Qian Wang ◽  
Junming Liu
Keyword(s):  
1999 ◽  
Vol 12 (2) ◽  
pp. 151-160 ◽  
Author(s):  
Doncho S. Donchev

We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we solve explicitly the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.


2021 ◽  
Author(s):  
Xiao-Yue Gong ◽  
Vineet Goyal ◽  
Garud N. Iyengar ◽  
David Simchi-Levi ◽  
Rajan Udwani ◽  
...  

We consider an online assortment optimization problem where we have n substitutable products with fixed reusable capacities [Formula: see text]. In each period t, a user with some preferences (potentially adversarially chosen) arrives at the seller’s platform, and the seller offers a subset of products, S_t, from the set of available products. The user selects product [Formula: see text] with probability given by the preference model and uses it for a random number of periods, [Formula: see text], that is distributed i.i.d. according to some distribution that depends only on j, generating a revenue [Formula: see text] for the seller. The goal of the seller is to find a policy that maximizes the expected cumulative revenue over a finite horizon T. Our main contribution is to show that a simple myopic policy (where we offer the myopically optimal assortment from the available products to each user) provides a good approximation for the problem. In particular, we show that the myopic policy is 1/2-competitive, that is, the expected cumulative revenue of the myopic policy is at least half the expected revenue of the optimal policy with full information about the sequence of user preference models and the distribution of random usage times of all the products. In contrast, the myopic policy does not require any information about future arrivals or the distribution of random usage times. The analysis is based on a coupling argument that allows us to bound the expected revenue of the optimal algorithm in terms of the expected revenue of the myopic policy. We also consider the setting where usage time distributions can depend on the type of each user and show that in this more general case there is no online algorithm with a nontrivial competitive ratio guarantee. Finally, we perform numerical experiments to compare the robustness and performance of the myopic policy with other natural policies. This paper was accepted by Gabriel Weintraub, revenue management and analytics.
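
The myopic step described in this abstract can be illustrated with a minimal sketch. It assumes a multinomial logit (MNL) preference model with a no-purchase weight of 1, which the abstract does not specify; the names `expected_revenue`, `myopic_assortment`, `inventory`, `revenue`, and `weight` are hypothetical. The sketch enumerates every subset of the products that currently have a free unit and offers the one with the highest expected one-period revenue.

```python
from itertools import combinations


def expected_revenue(assortment, revenue, weight):
    """Expected one-period revenue of an assortment under an assumed MNL model.

    The no-purchase option is given weight 1 (an assumption of this sketch).
    """
    denom = 1.0 + sum(weight[j] for j in assortment)
    return sum(revenue[j] * weight[j] for j in assortment) / denom


def myopic_assortment(inventory, revenue, weight):
    """Pick the revenue-maximizing subset of products with at least one free unit.

    Brute-force enumeration: only sensible for a handful of products.
    """
    available = [j for j, units in inventory.items() if units > 0]
    best, best_value = (), 0.0
    for k in range(1, len(available) + 1):
        for subset in combinations(available, k):
            value = expected_revenue(subset, revenue, weight)
            if value > best_value:
                best, best_value = subset, value
    return set(best), best_value


if __name__ == "__main__":
    revenue = {"a": 10.0, "b": 4.0}   # revenue per rental (illustrative numbers)
    weight = {"a": 1.0, "b": 1.0}     # MNL attraction weights (illustrative numbers)
    print(myopic_assortment({"a": 1, "b": 2}, revenue, weight))
    print(myopic_assortment({"a": 0, "b": 2}, revenue, weight))
```

The policy looks only at current availability, so it needs no forecast of future arrivals or usage times; when product "a" is rented out, the sketch simply falls back to the best assortment among what remains.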


2012 ◽  
Vol 27 (1) ◽  
pp. 53-55
Author(s):  
Sheldon M. Ross

Consider two independent Markov chains having states 0, 1, and identical transition probabilities. At each stage one of the chains is observed, and a reward equal to the observed state is earned. Assuming prior probabilities on the initial states of the chains, it is shown that the myopic policy that always chooses to observe the chain most likely to be in state 1 stochastically maximizes the sequence of rewards earned in each period.
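
The myopic rule in this abstract can be sketched as a belief-tracking loop. The function names and the parameters `p01` and `p11` (the probabilities of moving to state 1 from state 0 and from state 1) are hypothetical notation, not the paper's. The sketch assumes the observed chain's state is revealed exactly, while the unobserved chain's belief is only propagated through the transition probabilities.

```python
def step_belief(p, p01, p11):
    """Probability of being in state 1 at the next stage, given probability p now."""
    return p * p11 + (1.0 - p) * p01


def myopic_choice(beliefs):
    """Index of the chain currently most likely to be in state 1."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


def update(beliefs, chosen, observed_state, p01, p11):
    """Beliefs for the next stage after observing `observed_state` on chain `chosen`."""
    new_beliefs = []
    for i, p in enumerate(beliefs):
        if i == chosen:
            p = float(observed_state)  # the observed chain's state is known exactly
        new_beliefs.append(step_belief(p, p01, p11))
    return new_beliefs


if __name__ == "__main__":
    beliefs = [0.3, 0.6]   # illustrative priors P(state = 1) for the two chains
    p01, p11 = 0.2, 0.7    # illustrative transition probabilities into state 1
    chosen = myopic_choice(beliefs)
    print(chosen, update(beliefs, chosen, 1, p01, p11))
```

The reward at each stage equals the observed state, so choosing the chain with the larger belief maximizes the expected immediate reward; the abstract's result is the stronger statement that this rule is also optimal for the whole reward sequence.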


2016 ◽  
Vol 19 (03) ◽  
pp. 1650018 ◽  
Author(s):  
MICHELE LONGO ◽  
ALESSANDRA MAININI

We maximize the expected utility from terminal wealth for a Constant Relative Risk Aversion (CRRA) investor when the market price of risk is an unobservable random variable, and we explore the effects of learning by comparing the optimal portfolio under partial observation with the corresponding myopic policy. In particular, we show that, for a market price of risk constant in sign, the ratio between the portfolio under partial observation and its myopic counterpart increases with respect to risk tolerance. As a consequence, the absolute value of the portfolio under partial observation is larger (smaller) than that of the myopic one if the investor is more (less) risk tolerant than the logarithmic investor. Moreover, our explicit computations enable a detailed study of the so-called hedging demand induced by parameter uncertainty.
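
For orientation, the comparison in this abstract can be written down under standard Merton-type assumptions that the abstract itself does not spell out: a single risky asset with volatility \sigma, relative risk aversion \gamma (risk tolerance 1/\gamma), an observation filtration \mathcal{F}_t, and a myopic investor who plugs in the current estimate of the market price of risk \Theta. All symbols here are illustrative assumptions, not the authors' notation.

```latex
\pi^{\mathrm{myopic}}_t = \frac{\hat{\Theta}_t}{\gamma\,\sigma},
\qquad
\hat{\Theta}_t = \mathbb{E}\left[\Theta \mid \mathcal{F}_t\right],
\qquad
\pi^{\mathrm{partial}}_t = \pi^{\mathrm{myopic}}_t + \text{(hedging demand)}_t .
```

In this notation the logarithmic investor corresponds to \gamma = 1, the benchmark case against which the abstract compares more and less risk-tolerant investors.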

