Bounds for the approximation of dynamic programs

1986 ◽ Vol 30 (1) ◽ pp. A65-A77
Author(s): Harald Benzing

1985 ◽ Vol 10 (2) ◽ pp. 220-232
Author(s): K.-H. Waldmann

2021
Author(s): David B. Brown ◽ Jingwei Zhang

Allocating Resources Across Systems Coupled by Shared Information

Many sequential decision problems involve repeatedly allocating a limited resource across subsystems that are jointly affected by randomly evolving exogenous factors. For example, in adaptive clinical trials, a decision maker needs to allocate patients to treatments in an effort to learn about the efficacy of treatments, but the number of available patients may vary randomly over time. In capital budgeting problems, firms may allocate resources to conduct R&D on new products, but funding budgets may evolve randomly. In many inventory management problems, firms need to allocate limited production capacity to satisfy uncertain demands at multiple locations, and these demands may be correlated due to vagaries in shared market conditions. In this paper, we develop a model involving “shared resources and signals” that captures these and potentially many other applications. The framework is naturally described as a stochastic dynamic program, but this dynamic program is intractable to solve directly. We develop an approximation method based on a “dynamic fluid relaxation”: in this approximation, the subsystem state evolution is approximated by a deterministic fluid model, but the exogenous states (the signals) retain their stochastic evolution. We develop an algorithm for solving the dynamic fluid relaxation. We analyze the corresponding feasible policies and performance bounds from the dynamic fluid relaxation and show that these are asymptotically optimal as the number of subsystems grows large. We show that competing state-of-the-art approaches used in the literature on weakly coupled dynamic programs in general fail to provide asymptotic optimality. Finally, we illustrate the approach on the aforementioned dynamic capital budgeting and multilocation inventory management problems.
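
The fluid-relaxation idea can be made concrete with a small sketch. The Python snippet below is a minimal illustration under invented assumptions, not the authors' algorithm: each subsystem has two states (unfinished/finished), a two-state Markov signal drives success probabilities, rewards, and budgets, and the fluid state is simply the fraction x of unfinished subsystems. The subsystem dynamics are deterministic in x, while the expectation over the signal's stochastic transitions is kept exactly, matching the "deterministic fluid, stochastic signal" structure described in the abstract. All parameter names and values (P_sig, p_succ, reward, budget) are hypothetical.

```python
# Sketch of a dynamic fluid relaxation: deterministic subsystem "mass"
# dynamics, stochastic exogenous signal.  Toy model, invented parameters.
import numpy as np

T = 10                                   # horizon
P_sig = np.array([[0.9, 0.1],            # signal transition matrix (assumed)
                  [0.3, 0.7]])
p_succ = np.array([0.5, 0.8])            # success prob. per allocated unit, by signal
reward = np.array([1.0, 2.0])            # reward per completed subsystem, by signal
budget = np.array([0.2, 0.4])            # per-period budget, as a fraction of subsystems

X = np.linspace(0.0, 1.0, 101)           # grid for the fluid state x (fraction unfinished)
U = np.linspace(0.0, 1.0, 101)           # candidate allocations (fractions)
n_sig = P_sig.shape[0]

V = np.zeros((n_sig, X.size))            # terminal value V_T = 0
for t in reversed(range(T)):
    V_next = V.copy()                    # V_{t+1}
    for s in range(n_sig):
        for i, x in enumerate(X):
            best = -np.inf
            for u in U:
                if u > min(budget[s], x):        # resource and mass constraints
                    break
                x_next = x - u * p_succ[s]       # deterministic fluid dynamics
                j = int(round(x_next * (X.size - 1)))
                cont = P_sig[s] @ V_next[:, j]   # expectation over the stochastic signal
                best = max(best, u * p_succ[s] * reward[s] + cont)
            V[s, i] = best

print("fluid value, all subsystems unfinished, signal 0:", V[0, -1])
```

For a real instance one would replace this grid-based backward induction with the authors' algorithm for solving the dynamic fluid relaxation; the grid is used here only to keep the sketch self-contained.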


2001 ◽ Vol 15 (4) ◽ pp. 557-564
Author(s): Rolando Cavazos-Cadena ◽ Raúl Montes-de-Oca

This article concerns Markov decision chains with finite state and action spaces, in which a control policy is evaluated via the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result classically obtained via a limiting argument based on the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
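
The criterion itself is easy to state computationally. The Python sketch below runs monotone value iteration for the expected total-reward criterion on a small invented MDP with nonnegative rewards: starting from V_0 = 0, the iterates V_{n+1}(x) = max_a [r(x,a) + Σ_y P(y|x,a) V_n(y)] increase toward the optimal value V*, and a stationary policy attaining the maximum in the limiting equation is optimal when V* is finite. This illustrates the classical result the article revisits, not the article's alternative proof technique, and the model data are illustrative.

```python
# Monotone value iteration under the expected total-reward criterion
# on a tiny finite MDP with nonnegative rewards (invented data).
import numpy as np

# r[x, a] >= 0; state 2 is absorbing with zero reward, so V* is finite.
r = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [0.0, 0.0]])
# P[a, x, y]: probability of moving from x to y under action a (assumed data)
P = np.array([[[0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]],
              [[0.0, 0.0, 1.0],
               [0.0, 0.2, 0.8],
               [0.0, 0.0, 1.0]]])

V = np.zeros(3)                              # V_0 = 0 starts the monotone scheme
for _ in range(1000):
    Q = r + np.einsum('axy,y->xa', P, V)     # Q[x, a] = r(x, a) + E[V(next state)]
    V_new = Q.max(axis=1)
    converged = np.max(V_new - V) < 1e-10
    V = V_new
    if converged:
        break

policy = Q.argmax(axis=1)                    # a stationary optimal policy
print("V* =", V)                             # expected: [2.25, 2.5, 0.0]
print("stationary policy =", policy)
```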

