Three different operations research models for the same (s,S) policy

2001 · Vol 5 (1) · pp. 47–59
Author(s): Omar Ben-Ayed

Operations research techniques are usually presented as distinct models. Establishing the linkage between these models, difficult as it often is, can reveal their interdependence and make them easier for the user to understand. In this article three different models, namely Markov Chains, Dynamic Programming, and Markov Sequential Decision Processes, are used to solve an inventory problem based on the periodic review system. We show how the three models converge to the same (s,S) policy, and we provide a numerical example to illustrate this convergence.
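The (s,S) structure itself is simple to state: at each periodic review, if the inventory position has dropped to the reorder point s or below, order enough to bring it back up to S. The sketch below simulates such a policy and crudely searches for the cheapest (s,S) pair; the demand distribution, cost parameters, and lost-sales assumption are illustrative stand-ins, not the article's model.

```python
# Minimal sketch (not the article's model): simulating a periodic-review
# (s,S) inventory policy. Demand, costs, and parameters are assumptions.
import random

def simulate_sS(s, S, periods=5_000, holding=1.0, shortage=5.0,
                order_cost=10.0, seed=0):
    """Average per-period cost of an (s,S) policy: at each review,
    if inventory has dropped to s or below, order up to S."""
    rng = random.Random(seed)
    inventory, total_cost = S, 0.0
    for _ in range(periods):
        if inventory <= s:                         # review: reorder point hit
            total_cost += order_cost               # fixed cost to order up to S
            inventory = S
        demand = rng.randint(0, 9)                 # assumed discrete demand
        inventory -= demand
        if inventory >= 0:
            total_cost += holding * inventory      # holding cost on leftovers
        else:
            total_cost += shortage * (-inventory)  # penalty on unmet demand
            inventory = 0                          # assume lost sales
    return total_cost / periods

# Crude search over (s, S) pairs for the cheapest policy
best = min(((simulate_sS(s, S), s, S)
            for s in range(0, 15) for S in range(s + 1, 30)),
           key=lambda t: t[0])
print(f"estimated cost {best[0]:.2f} at s={best[1]}, S={best[2]}")
```

In the article, the same policy instead emerges analytically from each of the three models; a simulation like this merely shows what the policy does once its parameters are in hand.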

2019 · Vol 11 (1) · pp. 833–858
Author(s): John Rust

Dynamic programming (DP) is a powerful tool for solving a wide class of sequential decision-making problems under uncertainty. In principle, it enables us to compute optimal decision rules that specify the best possible decision in any situation. This article reviews developments in DP and contrasts its revolutionary impact on economics, operations research, engineering, and artificial intelligence with the comparative paucity of its real-world applications to improve the decision making of individuals and firms. The fuzziness of many real-world decision problems and the difficulty in mathematically modeling them are key obstacles to a wider application of DP in real-world settings. Nevertheless, I discuss several success stories, and I conclude that DP offers substantial promise for improving decision making if we let go of the empirically untenable assumption of unbounded rationality and confront the challenging decision problems faced every day by individuals and firms.
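As a concrete reminder of what those optimal decision rules look like in the simplest case, the sketch below runs value iteration on a tiny, invented MDP; the states, actions, rewards, and discount factor are assumptions chosen for illustration and do not come from the review.

```python
# Minimal sketch of value iteration on a toy MDP (states, rewards, and
# transition probabilities are invented for illustration).
# Bellman update: V(x) = max_a [ r(x,a) + beta * sum_y p(y|x,a) V(y) ]

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    0: {"wait":   [(1.0, 0, 0.0)],
        "invest": [(0.5, 1, -1.0), (0.5, 0, -1.0)]},
    1: {"wait":   [(1.0, 1, 2.0)],
        "invest": [(1.0, 1, 1.5)]},
}
beta = 0.9                      # discount factor (assumed)

V = {s: 0.0 for s in transitions}
for _ in range(500):            # iterate the Bellman operator to a fixed point
    V = {s: max(sum(p * (r + beta * V[y]) for p, y, r in outcomes)
                for outcomes in transitions[s].values())
         for s in transitions}

# The optimal decision rule: the action achieving the max at each state
policy = {s: max(transitions[s],
                 key=lambda a: sum(p * (r + beta * V[y])
                                   for p, y, r in transitions[s][a]))
          for s in transitions}
print(V, policy)
```

The gap Rust describes is precisely the distance between toy exercises like this, where the state space and payoffs are given, and real decision problems whose fuzziness resists such clean mathematical modeling.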


2019 · Vol 36 (06) · pp. 1940009
Author(s): Michael C. Fu

AlphaGo and its successors AlphaGo Zero and AlphaZero made international headlines with their incredible successes in game playing, which have been touted as further evidence of the immense potential of artificial intelligence, and in particular, machine learning. AlphaGo defeated the reigning human world champion Go player Lee Sedol 4 games to 1 in March 2016 in Seoul, Korea, an achievement that surpassed previous computer game-playing milestones set by IBM’s Deep Blue in chess and by IBM’s Watson in the U.S. TV game show Jeopardy! AlphaGo then followed this up by defeating the world’s number one Go player Ke Jie 3-0 at the Future of Go Summit in Wuzhen, China in May 2017. Then, in December 2017, AlphaZero stunned the chess world by dominating the top computer chess program Stockfish (which has a far higher rating than any human) in a 100-game match, winning 28 games and losing none (72 draws), after training from scratch for just four hours. The deep neural networks of AlphaGo, AlphaZero, and all their incarnations are trained on self-play games guided by a technique called Monte Carlo tree search (MCTS), whose roots can be traced back to an adaptive multistage sampling (AMS) simulation-based algorithm for Markov decision processes (MDPs) published in Operations Research in 2005 [Chang, HS, MC Fu, J Hu and SI Marcus (2005). An adaptive sampling algorithm for solving Markov decision processes. Operations Research, 53, 126–139.] (and introduced even earlier, in 2002). After reviewing the history and background of AlphaGo through AlphaZero, this article traces the origins of MCTS back to simulation-based algorithms for MDPs and discusses its role in training the neural networks that essentially carry out the value/policy function approximation used in approximate dynamic programming, reinforcement learning, and neuro-dynamic programming, including some recently proposed enhancements that build on statistical ranking & selection research in the operations research simulation community.
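The adaptive-sampling idea that AMS contributed, and that survives in the selection step of MCTS, fits in a few lines: at each visit, simulate the action with the largest upper confidence bound, then fold the sampled return into that action's running mean. The toy one-state bandit below is an invented stand-in for a single tree node, not the Go setting.

```python
# Minimal sketch of the UCB-style adaptive sampling at the heart of
# AMS and the MCTS selection step. The reward means are invented.
import math
import random

rng = random.Random(1)
true_means = [0.3, 0.5, 0.7]        # assumed per-action expected rewards
counts = [0] * len(true_means)      # times each action was sampled
values = [0.0] * len(true_means)    # running mean of sampled returns

for t in range(1, 10_001):
    # UCB1: exploit high running means, explore rarely tried actions
    ucb = [values[a] + math.sqrt(2 * math.log(t) / counts[a])
           if counts[a] > 0 else float("inf")
           for a in range(len(true_means))]
    a = max(range(len(true_means)), key=lambda i: ucb[i])
    reward = 1.0 if rng.random() < true_means[a] else 0.0  # simulated rollout
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]          # incremental mean

print(counts)   # the best action (index 2) should dominate the samples
```

Applied recursively from the root of a game tree, with expansion and backup steps added around it, this per-node sampling rule is essentially what MCTS does, and what AMS formalized for finite-horizon MDPs.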

