Bounds for the approximation of dynamic programs

1986 ◽ Vol 30 (1) ◽ pp. A65-A77
Author(s): Harald Benzing

1985 ◽ Vol 10 (2) ◽ pp. 220-232
Author(s): K.-H. Waldmann

2021
Author(s): David B. Brown ◽ Jingwei Zhang

Allocating Resources Across Systems Coupled by Shared Information

Many sequential decision problems involve repeatedly allocating a limited resource across subsystems that are jointly affected by randomly evolving exogenous factors. For example, in adaptive clinical trials, a decision maker needs to allocate patients to treatments in an effort to learn about the efficacy of treatments, but the number of available patients may vary randomly over time. In capital budgeting problems, firms may allocate resources to conduct R&D on new products, but funding budgets may evolve randomly. In many inventory management problems, firms need to allocate limited production capacity to satisfy uncertain demands at multiple locations, and these demands may be correlated due to vagaries in shared market conditions. In this paper, we develop a model involving “shared resources and signals” that captures these and potentially many other applications. The framework is naturally described as a stochastic dynamic program, but this dynamic program is intractable to solve directly. We develop an approximation method based on a “dynamic fluid relaxation”: in this approximation, the subsystem state evolution is approximated by a deterministic fluid model, but the exogenous states (the signals) retain their stochastic evolution. We develop an algorithm for solving the dynamic fluid relaxation. We analyze the corresponding feasible policies and performance bounds from the dynamic fluid relaxation and show that these are asymptotically optimal as the number of subsystems grows large. We show that competing state-of-the-art approaches used in the literature on weakly coupled dynamic programs in general fail to provide asymptotic optimality. Finally, we illustrate the approach on the aforementioned dynamic capital budgeting and multilocation inventory management problems.
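
The fluid-relaxation idea can be made concrete with a small sketch. The Python snippet below is a minimal illustration under invented assumptions, not the authors' algorithm: each subsystem has two states (unfinished/finished), a two-state Markov signal drives success probabilities, rewards, and budgets, and the fluid state is simply the fraction x of unfinished subsystems. The subsystem dynamics are deterministic in x, while the expectation over the signal's stochastic transitions is kept exactly, matching the "deterministic fluid, stochastic signal" structure described in the abstract. All parameter names and values (P_sig, p_succ, reward, budget) are hypothetical.

```python
# Sketch of a dynamic fluid relaxation: deterministic subsystem "mass"
# dynamics, stochastic exogenous signal.  Toy model, invented parameters.
import numpy as np

T = 10                                   # horizon
P_sig = np.array([[0.9, 0.1],            # signal transition matrix (assumed)
                  [0.3, 0.7]])
p_succ = np.array([0.5, 0.8])            # success prob. per allocated unit, by signal
reward = np.array([1.0, 2.0])            # reward per completed subsystem, by signal
budget = np.array([0.2, 0.4])            # per-period budget, as a fraction of subsystems

X = np.linspace(0.0, 1.0, 101)           # grid for the fluid state x (fraction unfinished)
U = np.linspace(0.0, 1.0, 101)           # candidate allocations (fractions)
n_sig = P_sig.shape[0]

V = np.zeros((n_sig, X.size))            # terminal value V_T = 0
for t in reversed(range(T)):
    V_next = V.copy()                    # V_{t+1}
    for s in range(n_sig):
        for i, x in enumerate(X):
            best = -np.inf
            for u in U:
                if u > min(budget[s], x):        # resource and mass constraints
                    break
                x_next = x - u * p_succ[s]       # deterministic fluid dynamics
                j = int(round(x_next * (X.size - 1)))
                cont = P_sig[s] @ V_next[:, j]   # expectation over the stochastic signal
                best = max(best, u * p_succ[s] * reward[s] + cont)
            V[s, i] = best

print("fluid value, all subsystems unfinished, signal 0:", V[0, -1])
```

For a real instance one would replace this grid-based backward induction with the authors' algorithm for solving the dynamic fluid relaxation; the grid is used here only to keep the sketch self-contained.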


2001 ◽ Vol 15 (4) ◽ pp. 557-564
Author(s): Rolando Cavazos-Cadena ◽ Raúl Montes-de-Oca

This article concerns Markov decision chains with finite state and action spaces, in which a control policy is evaluated via the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result classically obtained via a limiting argument based on the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
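
The criterion itself is easy to state computationally. The Python sketch below runs monotone value iteration for the expected total-reward criterion on a small invented MDP with nonnegative rewards: starting from V_0 = 0, the iterates V_{n+1}(x) = max_a [r(x,a) + Σ_y P(y|x,a) V_n(y)] increase toward the optimal value V*, and a stationary policy attaining the maximum in the limiting equation is optimal when V* is finite. This illustrates the classical result the article revisits, not the article's alternative proof technique, and the model data are illustrative.

```python
# Monotone value iteration under the expected total-reward criterion
# on a tiny finite MDP with nonnegative rewards (invented data).
import numpy as np

# r[x, a] >= 0; state 2 is absorbing with zero reward, so V* is finite.
r = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [0.0, 0.0]])
# P[a, x, y]: probability of moving from x to y under action a (assumed data)
P = np.array([[[0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]],
              [[0.0, 0.0, 1.0],
               [0.0, 0.2, 0.8],
               [0.0, 0.0, 1.0]]])

V = np.zeros(3)                              # V_0 = 0 starts the monotone scheme
for _ in range(1000):
    Q = r + np.einsum('axy,y->xa', P, V)     # Q[x, a] = r(x, a) + E[V(next state)]
    V_new = Q.max(axis=1)
    converged = np.max(V_new - V) < 1e-10
    V = V_new
    if converged:
        break

policy = Q.argmax(axis=1)                    # a stationary optimal policy
print("V* =", V)                             # expected: [2.25, 2.5, 0.0]
print("stationary policy =", policy)
```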

