OPTIMALITY OF FOUR-THRESHOLD POLICIES IN INVENTORY SYSTEMS WITH CUSTOMER RETURNS AND BORROWING/STORAGE OPTIONS

2005 · Vol. 19(1) · pp. 45-71
Author(s): Eugene A. Feinberg, Mark E. Lewis

Consider a single-commodity inventory system in which the demand is modeled by a sequence of independent and identically distributed random variables that can take negative values. Such problems have been studied in the literature under the name cash management and relate to the variations of the on-hand cash balances of financial institutions. The possibility of a negative demand also models product returns in inventory systems. This article studies a model in which, in addition to standard ordering and scrapping decisions seen in the cash management models, the decision-maker can borrow and store some inventory for one period of time. For problems with back orders, zero setup costs, and linear ordering, scrapping, borrowing, and storage costs, we show that an optimal policy has a simple four-threshold structure. These thresholds, in a nondecreasing order, are order-up-to, borrow-up-to, store-down-to, and scrap-down-to levels; that is, if the inventory position is too low, an optimal policy is to order up to a certain level and then borrow up to a higher level. Analogously, if the inventory position is too high, the optimal decision is to reduce the inventory to a certain point, after which one should store some of the inventory down to a lower threshold. This structure holds for the finite and infinite horizon discounted expected cost criteria and for the average cost per unit time criterion. We also provide sufficient conditions when the borrowing and storage options should not be used. In order to prove our results for average costs per unit time, we establish sufficient conditions when the optimality equations hold for a Markov decision process with an uncountable state space, noncompact action sets, and unbounded costs.
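
To make the four-threshold structure concrete, here is a minimal sketch of the decision rule (a reader's illustration, not the authors' code; the threshold names and the no-action middle band are our own labeling):

    # Sketch of the four-threshold rule described above. Thresholds satisfy
    # order_up_to <= borrow_up_to <= store_down_to <= scrap_down_to.
    def four_threshold_action(x, order_up_to, borrow_up_to,
                              store_down_to, scrap_down_to):
        """Return (order, borrow, store, scrap) amounts for inventory position x."""
        order = borrow = store = scrap = 0.0
        if x < order_up_to:
            # Too low: order up to the order-up-to level, then borrow
            # (for one period) up to the higher borrow-up-to level.
            order = order_up_to - x
            borrow = borrow_up_to - order_up_to
        elif x > scrap_down_to:
            # Too high: scrap down to the scrap-down-to level, then store
            # (for one period) down to the lower store-down-to level.
            scrap = x - scrap_down_to
            store = scrap_down_to - store_down_to
        # In between, this sketch takes no action; the abstract only
        # spells out the two extreme regions.
        return order, borrow, store, scrap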

Entropy · 2021 · Vol. 23(1) · pp. 91
Author(s): Yuchao Chen, Haoyue Tang, Jintao Wang, Jian Song

In this paper, we consider a scenario where the base station (BS) collects time-sensitive data from multiple sensors through time-varying and error-prone channels. We characterize the data freshness at the terminal end through a class of monotone increasing functions related to the Age of Information (AoI). Our goal is to design an optimal policy that minimizes the average age penalty of all sensors over an infinite horizon under bandwidth and power constraints. By formulating the scheduling problem as a constrained Markov decision process (CMDP), we reveal the threshold structure of the optimal policy and approximate the optimal decision by solving a truncated linear program (LP). Finally, a bandwidth-truncated policy is proposed to satisfy both the power and bandwidth constraints. Through theoretical analysis and numerical simulations, we show that the proposed policy is asymptotically optimal in the large-sensor regime.
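
A minimal sketch of such a threshold-plus-truncation scheduler (our illustration; the per-sensor thresholds and the age-penalty function would come from the truncated LP, which is not reproduced here):

    # Sketch of a bandwidth-truncated threshold scheduler. tau[i][s] is a
    # hypothetical AoI threshold for sensor i in channel state s, assumed
    # to be supplied by the (omitted) truncated-LP solution.
    def schedule(aoi, channel, tau, penalty, bandwidth):
        """Return the indices of at most `bandwidth` sensors to poll."""
        # Threshold stage: a sensor becomes a candidate once its AoI passes
        # the threshold attached to its current channel state.
        candidates = [i for i in range(len(aoi)) if aoi[i] >= tau[i][channel[i]]]
        # Truncation stage: if more candidates pass than the bandwidth
        # allows, keep those with the largest current age penalty.
        candidates.sort(key=lambda i: penalty(aoi[i]), reverse=True)
        return candidates[:bandwidth]

    # Example: 3 sensors, 2 channel states, bandwidth for a single update.
    print(schedule(aoi=[4, 1, 7], channel=[0, 1, 1],
                   tau=[[3, 5], [2, 4], [3, 6]],
                   penalty=lambda a: a * a, bandwidth=1))   # -> [2]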


Author(s): Alain Jean-Marie, Mabel Tidball, Víctor Bucarey López

We consider a discrete-time, infinite-horizon dynamic game of groundwater extraction. A Water Agency charges an extraction cost to water users and controls the marginal extraction cost so that it depends not only on the level of groundwater but also on total water extraction (through a parameter representing the degree of strategic interaction between water users) and on rainfall (through a second parameter). The water users are selfish and myopic, and the goal of the Agency is to give them incentives so as to improve their total discounted welfare. We look at this problem in several situations. In the first, the two parameters are fixed over time. The first result shows that when the Water Agency is patient (the discount factor tends to 1), the optimal marginal extraction cost calls for strategic interaction between agents; the contrary holds for a discount factor near 0. In the second situation, we look at the dynamic Stackelberg game in which the Agency decides at each time step which cost parameter to announce. We study the solution to this problem theoretically and numerically. Simulations illustrate that threshold policies are good candidates for optimal policies.
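
The kind of threshold policy the simulations point to can be sketched as follows (hypothetical names, levels, and switching direction; the abstract specifies none of these):

    # Hypothetical threshold announcement rule for the Agency: announce one
    # (interaction, rainfall) cost-parameter pair when the groundwater stock
    # is below a trigger level and another pair above it. The trigger and
    # which regime goes on which side are assumptions for illustration only.
    def announce(stock, trigger, low_stock_regime, high_stock_regime):
        """Return the (interaction, rainfall) cost parameters to announce."""
        return low_stock_regime if stock < trigger else high_stock_regime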


2002 · Vol. 39(1) · pp. 20-37
Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides if the reward offered is sufficient to accept or reject the arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.
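
Such a time-dependent threshold rule can be sketched as follows (our illustration; the thresholds themselves come out of the bias-optimality analysis and are not reproduced here):

    # Sketch of a time-dependent threshold admission rule: accept an
    # arriving customer iff the system has room and the offered reward
    # clears the threshold for the current time and queue length. Both
    # capacity(t) and threshold(t, n) are hypothetical callables standing
    # in for the time-varying data of the model.
    def admit(reward, n_in_system, t, capacity, threshold):
        """True iff the gatekeeper should accept the arrival at time t."""
        return n_in_system < capacity(t) and reward >= threshold(t, n_in_system)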


Sensors · 2019 · Vol. 19(14) · pp. 3231
Author(s): Jiuyun Xu, Zhuangyuan Hao, Xiaoting Sun

Mobile edge computing (MEC) has become popular in both academia and industry. With the help of edge and cloud servers, it is one of the key technologies for overcoming the latency between cloud servers and wireless devices and the limited computation capability and storage of wireless devices. In mobile edge computing, wireless devices supply the input data, while edge and cloud servers take charge of computation and storage. However, how to balance the power consumption of edge devices against time delay has not been well addressed in mobile edge computing. In this paper, we focus on strategies for the task-offloading decision and on how offloading decisions are influenced by different environments. First, we propose a system model that considers both energy consumption and time delay and formulate it as an optimization problem. Then, we employ two algorithms, enumeration and branch-and-bound, to obtain the optimal or a near-optimal decision minimizing the system cost, which combines time delay and energy consumption. Furthermore, we compare the performance of the two algorithms and conclude that the branch-and-bound algorithm offers the better overall performance. Finally, we analyze in detail how key parameters influence the optimal offloading decision and the minimum cost.
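
The enumeration baseline is easy to sketch (our illustration; `cost` is a placeholder for the paper's weighted delay-plus-energy objective, whose exact form is not reproduced here):

    from itertools import product

    # Exhaustive enumeration over binary offloading decisions: 0 = run the
    # task locally on the device, 1 = offload it to the edge/cloud.
    # cost(decisions) stands in for the weighted time-delay + energy
    # objective of the system model.
    def enumerate_offloading(n_tasks, cost):
        best_decisions, best_cost = None, float("inf")
        for decisions in product((0, 1), repeat=n_tasks):
            c = cost(decisions)
            if c < best_cost:
                best_decisions, best_cost = decisions, c
        return best_decisions, best_cost

Branch-and-bound explores the same 0/1 decision tree but prunes any subtree whose lower bound already exceeds the best cost found so far, which is why it scales better than plain enumeration of all 2^n assignments.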


1991 · Vol. 28(2) · pp. 384-396
Author(s): Wolfgang Stadje, Dror Zuckerman

In this study we examine repairable systems with random lifetime. Upon failure, a maintenance action, specifying the degree of repair, is taken by a controller. The objective is to determine an age-dependent maintenance strategy which minimizes the total expected discounted cost over an infinite planning horizon. Using several properties of the optimal policy which are derived in this study, we propose analytical and numerical methods for determining the optimal maintenance strategy. In order to obtain a better insight regarding the structure and nature of the optimal policy and to illustrate computational procedures, a numerical example is analysed. The proposed maintenance model opens a new research direction in the area of reliability with interesting theoretical issues and a wide range of potential applications in various fields such as product design, inventory systems for spare parts, and management of maintenance crews.
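
To make the structure concrete, here is a toy discretized stand-in (our assumptions throughout, not the paper's model): a virtual-age rule in which a repair of degree d rescales the failure age, solved by value iteration over an age grid.

    import numpy as np

    # Toy discrete-time stand-in: the unit fails in a period with
    # probability p(a) increasing in its virtual age a; on failure, a
    # repair of degree d in [0, 1] costs c(d) and resets the age to
    # (1 - d) * a (a Kijima-style rule); otherwise the age grows by 1.
    def optimal_repair(A=60, degrees=(0.0, 0.25, 0.5, 0.75, 1.0),
                       c=lambda d: 1.0 + 9.0 * d,           # deeper repairs cost more
                       p=lambda a: min(0.05 + 0.01 * a, 0.9),
                       beta=0.95, sweeps=2000):
        V = np.zeros(A + 1)
        for _ in range(sweeps):
            for a in range(A, -1, -1):
                repair = min(c(d) + beta * V[int((1 - d) * a)] for d in degrees)
                V[a] = p(a) * repair + (1 - p(a)) * beta * V[min(a + 1, A)]
        # policy[a]: repair degree to choose upon failure at virtual age a.
        policy = [min(degrees, key=lambda d: c(d) + beta * V[int((1 - d) * a)])
                  for a in range(A + 1)]
        return V, policy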


2009 · Vol. 2009 · pp. 1-16
Author(s): Huiling Wu, Fengde Chen

A single-species stage-structured model incorporating both toxicant and harvesting is proposed and studied. It is shown that the toxicant has no influence on the persistence of the system. The existence of a bionomic equilibrium is also studied. We then consider the system with variable harvesting effort; sufficient conditions for the global stability of the bionomic equilibrium are obtained by constructing a suitable Lyapunov function. The optimal harvesting policy is also investigated using Pontryagin's maximum principle. Numerical simulations are carried out to illustrate the feasibility of the main results. We end the paper with a brief discussion.


1983 · Vol. 15(2) · pp. 274-303
Author(s): Arie Hordijk, Frank A. Van Der Duyn Schouten

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions which guarantee that a 'limit point' of a sequence of discounted optimal policies, with the discount factor approaching 1, is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
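
The vanishing-discount idea behind this Abelian approach is easy to see numerically on any finite model (a generic sketch, not the paper's drift processes): compute discounted-optimal policies for discount factors approaching 1 and watch them stabilize.

    import numpy as np

    # Generic finite MDP with costs: P[a] is the transition matrix under
    # action a, r[a] the per-state cost vector under action a.
    def discounted_optimal_policy(P, r, beta, iters=5000):
        V = np.zeros(P[0].shape[0])
        for _ in range(iters):
            V = np.array([r[a] + beta * P[a] @ V for a in range(len(P))]).min(axis=0)
        Q = np.array([r[a] + beta * P[a] @ V for a in range(len(P))])
        return Q.argmin(axis=0)

    # If the policies coincide from some beta onward, that common policy is
    # the natural average-cost candidate; the paper gives conditions under
    # which such a limit point is indeed average optimal.
    # for beta in (0.9, 0.99, 0.999):
    #     print(beta, discounted_optimal_policy(P, r, beta))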


2017 · Vol. 26(3) · pp. 1760014
Author(s): Paul Weng, Olivier Spanjaard

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite-horizon case and the case where a maximum operator does not exist. To show the potential of our framework, we conclude the paper by presenting several illustrative examples.
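
A minimal sketch of the resulting dynamic programming scheme (ours, not the authors' code; transitions are deterministic for brevity, and the sufficient conditions, e.g. monotonicity of the reward functions, are assumed to hold):

    # Backward induction when each (state, action) pair carries a reward
    # *function* f composed with the value of the remaining history, rather
    # than an additive scalar. The classical case is f(v) = reward + gamma * v.
    def backward_induction(states, actions, step, horizon, terminal=0.0):
        V = {s: terminal for s in states}
        plan = []
        for _ in range(horizon):
            newV, decision = {}, {}
            for s in states:
                # step(s, a) -> (f, next_state); dynamic programming is
                # valid when every f is monotone, as the paper requires.
                best = max(((f(V[s2]), a) for a in actions(s)
                            for f, s2 in [step(s, a)]), key=lambda t: t[0])
                newV[s], decision[s] = best
            V = newV
            plan.append(decision)
        plan.reverse()   # plan[t][s]: action at stage t in state s
        return V, plan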

