Fleet Repositioning for Vehicle Sharing Systems: the Optimality of Balanced Myopic Policy

Exact solution of the Bellman equation for a β-discounted reward in a two-armed bandit with switching arms

Journal of Applied Mathematics and Stochastic Analysis ◽

10.1155/s1048953399000155 ◽

1999 ◽

Vol 12 (2) ◽

pp. 151-160 ◽

Cited By ~ 1

Author(s):

Doncho S. Donchev

Keyword(s):

Exact Solution ◽

Bellman Equation ◽

Bandit Problem ◽

Myopic Policy

We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we solve explicitly the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.

Myopic policy bounds for POMDPs and sensitivity to model parameters

Partially Observed Markov Decision Processes ◽

10.1017/cbo9781316471104.018 ◽

2016 ◽

pp. 312-340

Author(s):

Vikram Krishnamurthy

Keyword(s):

Model Parameters ◽

Myopic Policy

Online Assortment Optimization with Reusable Resources

Management Science ◽

10.1287/mnsc.2021.4134 ◽

2021 ◽

Author(s):

Xiao-Yue Gong ◽

Vineet Goyal ◽

Garud N. Iyengar ◽

David Simchi-Levi ◽

Rajan Udwani ◽

...

Keyword(s):

Optimization Problem ◽

Online Algorithm ◽

Random Number ◽

Optimal Algorithm ◽

User Preference ◽

Substitutable Products ◽

Assortment Optimization ◽

Myopic Policy ◽

Expected Revenue ◽

And Performance

We consider an online assortment optimization problem where we have n substitutable products with fixed reusable capacities [Formula: see text]. In each period t, a user with some preferences (potentially adversarially chosen) arrives at the seller’s platform, and the seller offers a subset of products, S_t, from the set of available products. The user selects product [Formula: see text] with probability given by the preference model and uses it for a random number of periods, [Formula: see text], that is distributed i.i.d. according to some distribution that depends only on j, generating a revenue [Formula: see text] for the seller. The goal of the seller is to find a policy that maximizes the expected cumulative revenue over a finite horizon T. Our main contribution is to show that a simple myopic policy (where we offer the myopically optimal assortment from the available products to each user) provides a good approximation for the problem. In particular, we show that the myopic policy is 1/2-competitive, that is, the expected cumulative revenue of the myopic policy is at least half the expected revenue of the optimal policy with full information about the sequence of user preference models and the distribution of random usage times of all the products. In contrast, the myopic policy does not require any information about future arrivals or the distribution of random usage times. The analysis is based on a coupling argument that allows us to bound the expected revenue of the optimal algorithm in terms of the expected revenue of the myopic policy. We also consider the setting where usage time distributions can depend on the type of each user and show that in this more general case there is no online algorithm with a nontrivial competitive ratio guarantee. Finally, we perform numerical experiments to compare the robustness and performance of the myopic policy with other natural policies. This paper was accepted by Gabriel Weintraub, revenue management and analytics.
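To make the phrase "myopically optimal assortment" concrete, the following is a minimal sketch of one myopic step. It is not the paper's implementation. It assumes a multinomial logit (MNL) preference model with a no-purchase weight of 1, which the abstract does not specify, and it finds the best assortment by brute-force enumeration over the currently available products. The names `mnl_expected_revenue`, `myopic_assortment`, `weights`, and `revenues` are hypothetical.

```python
from itertools import combinations


def mnl_expected_revenue(assortment, weights, revenues):
    """Expected one-period revenue of an assortment under an assumed MNL model.

    The no-purchase option has weight 1, so product j is chosen with
    probability weights[j] / (1 + sum of weights in the assortment).
    """
    denom = 1.0 + sum(weights[j] for j in assortment)
    return sum(revenues[j] * weights[j] for j in assortment) / denom


def myopic_assortment(available, weights, revenues):
    """Return the assortment of available products with the highest
    expected revenue for the current user, ignoring all future periods.

    Brute force over subsets: only sensible for a small number of products.
    """
    best, best_rev = (), 0.0
    items = sorted(available)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            rev = mnl_expected_revenue(subset, weights, revenues)
            if rev > best_rev:
                best, best_rev = subset, rev
    return set(best), best_rev


if __name__ == "__main__":
    weights = {0: 1.0, 1: 1.0}     # assumed MNL attraction weights
    revenues = {0: 10.0, 1: 1.0}   # assumed per-use revenues
    # Products with remaining capacity; a unit in use is simply absent here.
    print(myopic_assortment({0, 1}, weights, revenues))
```

In a full simulation, `available` would be recomputed each period from the remaining capacities as units are rented and returned; the myopic step itself needs no knowledge of future arrivals or usage-time distributions, which matches the property the abstract emphasizes.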

Capacity Constraints, Inflation and the Transmission Mechanism: Forward-Looking Versus Myopic Policy Rules

SSRN Electronic Journal ◽

10.2139/ssrn.883223 ◽

1995 ◽

Author(s):

Douglas Laxton

Keyword(s):

Capacity Constraints ◽

Transmission Mechanism ◽

Policy Rules ◽

Myopic Policy ◽

Forward Looking

Multi-Access Communications With Energy Harvesting: A Multi-Armed Bandit Model and the Optimality of the Myopic Policy

IEEE Journal on Selected Areas in Communications ◽

10.1109/jsac.2015.2391852 ◽

2015 ◽

Vol 33 (3) ◽

pp. 585-597 ◽

Cited By ~ 17

Author(s):

Pol Blasco ◽

Deniz Gunduz

Keyword(s):

Energy Harvesting ◽

Myopic Policy ◽

Multi Access

A MARKOV CHAIN CHOICE PROBLEM

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964812000290 ◽

2012 ◽

Vol 27 (1) ◽

pp. 53-55

Author(s):

Sheldon M. Ross

Keyword(s):

Markov Chain ◽

Markov Chains ◽

Transition Probabilities ◽

Choice Problem ◽

Prior Probabilities ◽

Myopic Policy ◽

Initial States ◽

State 1

Consider two independent Markov chains having states 0, 1, and identical transition probabilities. At each stage one of the chains is observed, and a reward equal to the observed state is earned. Assuming prior probabilities on the initial states of the chains, it is shown that the myopic policy that always chooses to observe the chain most likely to be in state 1 stochastically maximizes the sequence of rewards earned in each period.
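The myopic rule described in this abstract can be sketched as a short simulation. This is an illustrative sketch only, not code from the paper. It assumes the chains are parameterized by `p01` and `p11` (the probabilities of moving to state 1 from state 0 and from state 1), and it assumes ties are broken in favor of the first chain. The function names are hypothetical.

```python
import random


def propagate(p, p01, p11):
    """Belief that a chain is in state 1 after one transition,
    given belief p that it is in state 1 now."""
    return p * p11 + (1 - p) * p01


def simulate_myopic(prior, p01, p11, horizon, rng):
    """Run the myopic policy on two independent two-state chains.

    prior: pair of probabilities that each chain starts in state 1.
    Returns the total reward (sum of observed states) over the horizon.
    """
    beliefs = list(prior)
    states = [1 if rng.random() < b else 0 for b in beliefs]
    total = 0
    for _ in range(horizon):
        # Myopic choice: observe the chain most likely to be in state 1.
        k = 0 if beliefs[0] >= beliefs[1] else 1
        total += states[k]
        # Observation reveals the chosen chain's current state exactly.
        beliefs[k] = float(states[k])
        # Both chains then make one transition; beliefs are propagated.
        for i in range(2):
            p_one = p11 if states[i] == 1 else p01
            states[i] = 1 if rng.random() < p_one else 0
            beliefs[i] = propagate(beliefs[i], p01, p11)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_myopic((0.6, 0.4), p01=0.3, p11=0.7, horizon=20, rng=rng))
```

Averaging `simulate_myopic` over many seeds and comparing against another rule (for example, always observing the first chain) gives an empirical check of the reward comparison, though it does not demonstrate the stronger stochastic-ordering result stated in the abstract.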

LEARNING AND PORTFOLIO DECISIONS FOR CRRA INVESTORS

International Journal of Theoretical and Applied Finance ◽

10.1142/s0219024916500187 ◽

2016 ◽

Vol 19 (03) ◽

pp. 1650018 ◽

Cited By ~ 5

Author(s):

MICHELE LONGO ◽

ALESSANDRA MAININI

Keyword(s):

Market Price ◽

Random Variable ◽

Risk Tolerance ◽

Partial Observation ◽

Market Price Of Risk ◽

Absolute Value ◽

Portfolio Decisions ◽

Price Of Risk ◽

Myopic Policy ◽

Hedging Demand

We maximize the expected utility from terminal wealth for a Constant Relative Risk Aversion (CRRA) investor when the market price of risk is an unobservable random variable and explore the effects of learning by comparing the optimal portfolio under partial observation with the corresponding myopic policy. In particular, we show that, for a market price of risk constant in sign, the ratio between the portfolio under partial observation and its myopic counterpart increases with respect to risk tolerance. As a consequence, the absolute value of the partial observation case is larger (smaller) than the myopic one if the investor is more (less) risk tolerant than the logarithmic investor. Moreover, our explicit computations enable a detailed study of the so-called hedging demand induced by parameter uncertainty.
