Evaluation of reinforcement learning techniques

Abstract. A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
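For orientation, the finite-horizon dynamic programming mentioned in the abstract can be illustrated on an ordinary (classical) MDP. The sketch below is not the paper's qMDP algorithm; it is plain backward induction, and the function name, the toy transition table `P`, and the reward table `R` are all hypothetical choices made for illustration.

```python
def backward_induction(P, R, horizon):
    """Backward induction for a classical finite-horizon MDP (illustrative only).

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[a][s]:    expected immediate reward for taking action a in state s.

    Returns (values, policy):
      values[k][s]   = optimal expected total reward with k steps remaining,
      policy[k-1][s] = an optimal action in state s with k steps remaining.
    """
    n_actions = len(P)
    n_states = len(P[0])
    values = [[0.0] * n_states]  # with 0 steps remaining, nothing more is earned
    policy = []
    for _ in range(horizon):
        prev = values[-1]
        step_values, step_policy = [], []
        for s in range(n_states):
            # Action values: immediate reward plus expected value of the next state.
            q = [
                R[a][s] + sum(P[a][s][t] * prev[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])  # ties: lowest index
            step_values.append(q[best])
            step_policy.append(best)
        values.append(step_values)
        policy.append(step_policy)
    return values, policy


# Hypothetical toy problem: two states, two actions.
# Action 0 stays put and pays 1 only in state 1; action 1 swaps states and pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [0.0, 1.0],  # action 0
    [0.0, 0.0],  # action 1
]
values, policy = backward_induction(P, R, horizon=3)
print(values[3])  # optimal values with 3 steps remaining
print(policy[2])  # optimal actions with 3 steps remaining
```

In this toy problem the best plan from state 0 is to switch once and then stay, which is why the optimal action differs between the two states when several steps remain.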


A Survey of Applying Reinforcement Learning Techniques to Multicast Routing

2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019

DOI: 10.1109/uemcon47517.2019.8993014

Cited By ~ 1

Author(s): Ola Ashour, Marc St-Hilaire, Thomas Kunz, Maoyu Wang

Keyword(s): Reinforcement Learning, Multicast Routing, Learning Techniques


Optimizing time warp simulation with reinforcement learning techniques

2007 Winter Simulation Conference, 2007

DOI: 10.1109/wsc.2007.4419650

Cited By ~ 9

Author(s): Jun Wang, Carl Tropper

Keyword(s): Reinforcement Learning, Time Warp, Learning Techniques


On collaborative reinforcement learning to optimize the redistribution of critical medical supplies throughout the COVID-19 pandemic

Journal of the American Medical Informatics Association, 2020

DOI: 10.1093/jamia/ocaa324

Author(s): Bryan P Bednarski, Akash Deep Singh, William M Jones

Keyword(s): Public Health, Reinforcement Learning, Medical Equipment, Census Bureau, Learning Models, Public Health Emergencies, Medical Supplies, Learning Techniques, Disease Impact, Random States

Abstract. Objective: This work investigates how reinforcement learning and deep learning models can facilitate the near-optimal redistribution of medical equipment in order to bolster public health responses to future crises similar to the COVID-19 pandemic. Materials and Methods: The system presented is simulated with disease impact statistics from the Institute of Health Metrics (IHME), Center for Disease Control, and Census Bureau [1, 2, 3]. We present a robust pipeline for data preprocessing, future demand inference, and a redistribution algorithm that can be adopted across broad scales and applications. Results: The reinforcement learning redistribution algorithm demonstrates performance optimality ranging from 93-95%. Performance improves consistently with the number of random states participating in exchange, demonstrating average shortage reductions of 78.74% (± 30.8) in simulations with 5 states to 93.50% (± 0.003) with 50 states. Conclusion: These findings bolster confidence that reinforcement learning techniques can reliably guide resource allocation for future public health emergencies.
