Simulation-based optimization of Markov decision processes: An empirical process theory approach

10.1016/j.automatica.2010.05.021 ◽

2010 ◽

Vol 46 (8) ◽

pp. 1297-1304 ◽

Author(s):

Rahul Jain ◽

Pravin Varaiya

Keyword(s):

Markov Decision Processes ◽

Empirical Process ◽

Decision Processes ◽

Process Theory ◽

Theory Approach ◽

Empirical Process Theory ◽

Simulation Based ◽

Simulation Based Optimization ◽

Markov Decision

Download Full-text

PAC bounds for simulation-based optimization of Markov decision processes

2007 46th IEEE Conference on Decision and Control ◽

10.1109/cdc.2007.4435050 ◽

2007 ◽

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Simulation Based ◽

Simulation Based Optimization ◽

Markov Decision

Download Full-text

Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes

10.1177/0037549708098120 ◽

2008 ◽

Vol 84 (12) ◽

pp. 577-600 ◽

Author(s):

Shalabh Bhatnagar ◽

Mohammed Shahid Abdulla

Keyword(s):

Markov Decision Processes ◽

Optimization Algorithms ◽

Decision Processes ◽

Finite Horizon ◽

Simulation Based ◽

Simulation Based Optimization ◽

Markov Decision

Download Full-text

Simulation‐based Uniform Value Function Estimates of Markov Decision Processes

SIAM Journal on Control and Optimization ◽

10.1137/040619508 ◽

2006 ◽

Vol 45 (5) ◽

pp. 1633-1656 ◽

Author(s):

Rahul Jain ◽

Pravin P. Varaiya

Keyword(s):

Markov Decision Processes ◽

Value Function ◽

Decision Processes ◽

Uniform Value ◽

Simulation Based ◽

Markov Decision

Download Full-text

A survey of some simulation-based algorithms for Markov decision processes

Communications in Information and Systems ◽

10.4310/cis.2007.v7.n1.a4 ◽

2007 ◽

Vol 7 (1) ◽

pp. 59-92 ◽

Author(s):

Hyeong Soo Chang ◽

Michael C. Fu ◽

Jiaqiao Hu ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Simulation Based ◽

Markov Decision

Download Full-text

A Two-Timescale Simulation-Based Gradient Algorithm for Weighted Cost Markov Decision Processes

Proceedings of the 44th IEEE Conference on Decision and Control ◽

10.1109/cdc.2005.1583460 ◽

2006 ◽

Author(s):

Ying He ◽

M.C. Fu ◽

S.I. Marcus

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Gradient Algorithm ◽

Simulation Based ◽

Markov Decision

Download Full-text

Simulation-Based Algorithms for Markov Decision Processes

10.1007/978-1-4471-5022-0 ◽

2013 ◽

Author(s):

Hyeong Soo Chang ◽

Jiaqiao Hu ◽

Michael C. Fu ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Simulation Based ◽

Markov Decision

Download Full-text

Simulation-based policy generation using large-scale Markov decision processes

IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans ◽

10.1109/3468.983417 ◽

2001 ◽

Vol 31 (6) ◽

pp. 609-622 ◽

Author(s):

C.W. Zobel ◽

W.T. Scherer

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Decision Processes ◽

Simulation Based ◽

Markov Decision

Download Full-text

A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes

Operations Research/Computer Science Interfaces Series - Computing Tools for Modeling, Optimization and Simulation ◽

10.1007/978-1-4615-4567-5_9 ◽

2000 ◽

pp. 161-182 ◽

Author(s):

Ying He ◽

Michael C. Fu ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Simulation Based ◽

Markov Decision ◽

Policy Iteration Algorithm

Download Full-text

Simulation-Based Algorithms for Markov Decision Processes: Monte Carlo Tree Search from AlphaGo to AlphaZero

Asia Pacific Journal of Operational Research ◽

10.1142/s0217595919400098 ◽

2019 ◽

Vol 36 (06) ◽

pp. 1940009

Author(s):

Michael C. Fu

Keyword(s):

Neural Networks ◽

Monte Carlo ◽

Dynamic Programming ◽

Markov Decision Processes ◽

Operations Research ◽

Decision Processes ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Simulation Based ◽

Markov Decision

AlphaGo and its successors AlphaGo Zero and AlphaZero made international headlines with their incredible successes in game playing, which have been touted as further evidence of the immense potential of artificial intelligence, and in particular, machine learning. AlphaGo defeated the reigning human world champion Go player Lee Sedol 4 games to 1, in March 2016 in Seoul, Korea, an achievement that surpassed previous computer game-playing program milestones by IBM’s Deep Blue in chess and by IBM’s Watson in the U.S. TV game show Jeopardy. AlphaGo then followed this up by defeating the world’s number one Go player Ke Jie 3-0 at the Future of Go Summit in Wuzhen, China in May 2017. Then, in December 2017, AlphaZero stunned the chess world by dominating the top computer chess program Stockfish (which has a far higher rating than any human) in a 100-game match by winning 28 games and losing none (72 draws) after training from scratch for just four hours! The deep neural networks of AlphaGo, AlphaZero, and all their incarnations are trained using a technique called Monte Carlo tree search (MCTS), whose roots can be traced back to an adaptive multistage sampling (AMS) simulation-based algorithm for Markov decision processes (MDPs) published in Operations Research back in 2005 [Chang, HS, MC Fu, J Hu and SI Marcus (2005). An adaptive sampling algorithm for solving Markov decision processes. Operations Research, 53, 126–139.] (and introduced even earlier in 2002). After reviewing the history and background of AlphaGo through AlphaZero, the origins of MCTS are traced back to simulation-based algorithms for MDPs, and its role in training the neural networks that essentially carry out the value/policy function approximation used in approximate dynamic programming, reinforcement learning, and neuro-dynamic programming is discussed, including some recently proposed enhancements building on statistical ranking & selection research in the operations research simulation community.

Download Full-text

Simulation-based Algorithms for Markov Decision Processes

10.1007/978-1-84628-690-2 ◽

2007 ◽

Author(s):

Hyeong Soo Chang ◽

Jiaqiao Hu ◽

Michael C. Fu ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Simulation Based ◽

Markov Decision

Download Full-text