A Model-Based Factored Bayesian Reinforcement Learning Approach

Bayesian reinforcement learning has proven effective at balancing exploration and exploitation. In practical applications, however, the number of learning parameters grows exponentially, which is the main impediment to online planning and learning. To overcome this problem, we combine factored representations, model-based learning, and Bayesian reinforcement learning in a new approach. First, we use a factored representation of the state to reduce the number of learning parameters, and adopt Bayesian inference to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach improves learning efficiency in large-scale state spaces.
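To make the parameter-reduction argument concrete, here is a minimal sketch, not the authors' implementation. It assumes discrete state variables of equal arity, a known parent structure (the paper also learns the structure, which is omitted here), and a symmetric Dirichlet prior. The names `flat_parameter_count`, `factored_parameter_count`, and `DirichletFactor` are hypothetical.

```python
def flat_parameter_count(n_vars, n_actions, arity=2):
    """Free parameters of an unfactored transition model P(s' | s, a)."""
    n_states = arity ** n_vars
    return n_states * n_actions * (n_states - 1)


def factored_parameter_count(n_vars, n_actions, max_parents, arity=2):
    """Free parameters when each next-state variable depends on at most
    `max_parents` current-state variables (an assumed structure)."""
    return n_vars * n_actions * (arity ** max_parents) * (arity - 1)


class DirichletFactor:
    """Dirichlet posterior over one next-state variable, conditioned on the
    values of its parent variables and on the action."""

    def __init__(self, parents, arity=2, prior=1.0):
        self.parents = tuple(parents)  # indices of parent state variables
        self.arity = arity
        self.prior = prior             # symmetric Dirichlet pseudo-count
        self.counts = {}

    def _key(self, state, action):
        return (tuple(state[i] for i in self.parents), action)

    def update(self, state, action, next_value):
        """Bayesian update: add one observation to the matching count."""
        key = self._key(state, action)
        row = self.counts.setdefault(key, [self.prior] * self.arity)
        row[next_value] += 1.0

    def posterior_mean(self, state, action):
        """Posterior mean of P(next_value | parent values, action)."""
        row = self.counts.get(self._key(state, action),
                              [self.prior] * self.arity)
        total = sum(row)
        return [c / total for c in row]
```

With ten binary variables and two actions, the flat model has about two million free parameters, while the factored model with at most two parents per variable has 80, which is the kind of gap the abstract refers to.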

Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control.

International Journal of Emerging Electric Power Systems ◽

10.2202/1553-779x.1066 ◽

2005 ◽

Vol 3 (1) ◽

Cited By ~ 14

Author(s):

Damien Ernst ◽

Mevludin Glavic ◽

Pierre Geurts ◽

Louis Wehenkel

Keyword(s):

Reinforcement Learning ◽

Power System ◽

Control Problem ◽

Learning Algorithm ◽

Electrical Power ◽

Complex Case ◽

Iteration Algorithm ◽

Value Iteration ◽

Learning Context ◽

Power System Control

In this paper we explain how to design intelligent agents that process the information acquired from interaction with a system to learn a good control policy, and we show how the methodology can be applied to control devices designed to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem, and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken in this state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) designed to damp power system oscillations.
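The idea of approximating the Q-function from four-element samples by mimicking value iteration can be sketched as follows. This is a simplified tabular illustration under stated assumptions, not the algorithm from the paper: states and actions are assumed discrete and hashable, the Q-function is a lookup table rather than a function approximator, and state-action pairs that never appear in the samples are assumed to have value 0. The function names are hypothetical.

```python
def q_iteration_from_samples(samples, actions, gamma=0.9, n_iterations=50):
    """Approximate Q from (state, action, reward, next_state) samples.

    Each sweep applies the value iteration update
        Q(s, a) <- r + gamma * max_b Q(s', b)
    to every sample and averages the targets of samples that share (s, a).
    """
    q = {}
    for _ in range(n_iterations):
        totals = {}
        for s, a, r, s_next in samples:
            # Unseen (state, action) pairs default to 0.0 (an assumption).
            best_next = max(q.get((s_next, b), 0.0) for b in actions)
            target = r + gamma * best_next
            running, n = totals.get((s, a), (0.0, 0))
            totals[(s, a)] = (running + target, n + 1)
        q = {key: running / n for key, (running, n) in totals.items()}
    return q


def greedy_action(q, state, actions):
    """Control policy derived from the approximate Q-function."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

For example, with samples `[(0, "go", 0.0, 1), (0, "stay", 0.0, 0), (1, "stay", 1.0, 1), (1, "go", 0.0, 0)]` and `gamma=0.5`, the greedy policy moves from state 0 to state 1 and then stays there.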

Proximal policy optimization with model-based methods

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211935 ◽

2022 ◽

pp. 1-12

Author(s):

Shuailong Li ◽

Wei Zhang ◽

Huiwen Zhang ◽

Xin Zhang ◽

Yuquan Leng

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Transition Model ◽

Practical Applications ◽

Original Algorithm ◽

Policy Performance ◽

Model Based ◽

Model Free ◽

Future State ◽

Policy Optimization

Model-free reinforcement learning methods have been applied successfully to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only the information of past experience but also the predicted information of the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. For most games, this method outperforms the state-of-the-art PPO algorithm when we evaluate across 49 Atari games in the Arcade Learning Environment (ALE). The experimental results show that PPOMM performs as well as or better than the original algorithm in 33 games.
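A two-component objective of this kind can be sketched as the standard PPO clipped surrogate plus a prediction error for the next state. The abstract does not give the exact form used by PPOMM, so the squared-error model term, the `model_weight` coefficient, and the function names below are assumptions for illustration only.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate for one sample.

    `ratio` is pi_new(a | s) / pi_old(a | s).
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


def combined_loss(ratios, advantages, predicted_next, actual_next,
                  model_weight=1.0, epsilon=0.2):
    """PPO loss plus a next-state prediction loss (assumed squared error)."""
    n = len(ratios)
    # The surrogate is maximized, so the loss is its negative mean.
    ppo_loss = -sum(clipped_surrogate(r, a, epsilon)
                    for r, a in zip(ratios, advantages)) / n
    # Error of the (hypothetical) transition model on next-state features.
    model_loss = sum(
        sum((p - t) ** 2 for p, t in zip(pred, target))
        for pred, target in zip(predicted_next, actual_next)
    ) / n
    return ppo_loss + model_weight * model_loss
```

In a real implementation both terms would be differentiated with respect to shared network parameters; plain floats are used here only to show how the two errors combine.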

Smarter Sampling in Model-Based Bayesian Reinforcement Learning

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-642-15880-3_19 ◽

2010 ◽

pp. 200-214 ◽

Cited By ~ 4

Author(s):

Pablo Samuel Castro ◽

Doina Precup

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Bayesian Reinforcement Learning

Perseus: Randomized Point-based Value Iteration for POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1659 ◽

2005 ◽

Vol 24 ◽

pp. 195-220 ◽

Cited By ~ 209

Author(s):

M. T. J. Spaan ◽

N. Vlassis

Keyword(s):

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Finite Set ◽

Partially Observable ◽

Set Of Points ◽

Action Spaces ◽

Belief Set

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
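The backup stage described above can be sketched for a small discrete POMDP. This is an illustrative reading of the abstract, not the authors' code. It assumes the model is given as nested lists `T[a][s][s2]` (transition probabilities), `O[a][s2][o]` (observation probabilities), and `R[a][s]` (rewards), and that the value function is a list of alpha vectors; all names are hypothetical.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value(belief, alphas):
    """Value of a belief under a set of alpha vectors."""
    return max(dot(belief, alpha) for alpha in alphas)


def backup(belief, alphas, T, O, R, gamma):
    """Point-based Bellman backup: the best new alpha vector for one belief."""
    n_states = len(belief)
    best = None
    for a in range(len(T)):
        vec = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha vector back through action a, observation o.
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in alphas
            ]
            g = max(projected, key=lambda c: dot(belief, c))
            vec = [v + gamma * x for v, x in zip(vec, g)]
        if best is None or dot(belief, vec) > dot(belief, best):
            best = vec
    return best


def perseus_stage(beliefs, alphas, T, O, R, gamma, rng):
    """One backup stage: back up randomly chosen beliefs until no belief
    in the set has a lower value than before the stage."""
    old_values = [value(b, alphas) for b in beliefs]
    todo = list(range(len(beliefs)))
    new_alphas = []
    while todo:
        i = rng.choice(todo)
        alpha = backup(beliefs[i], alphas, T, O, R, gamma)
        if dot(beliefs[i], alpha) < old_values[i]:
            # The backup did not help: keep the best old vector for this point.
            alpha = max(alphas, key=lambda v: dot(beliefs[i], v))
        new_alphas.append(alpha)
        # One new vector may already cover many of the remaining beliefs.
        todo = [j for j in todo
                if value(beliefs[j], new_alphas) < old_values[j]]
    return new_alphas
```

Because a single new vector can remove several beliefs from `todo` at once, a stage typically performs fewer backups than there are belief points, which matches the key observation stated in the abstract.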

Robust and explorative behavior in model-based Bayesian reinforcement learning

2016 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2016.7849370 ◽

2016 ◽

Author(s):

Toru Hishinuma ◽

Kei Senda

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Bayesian Reinforcement Learning

Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process

Journal of Computers ◽

10.4304/jcp.9.4.845-850 ◽

2014 ◽

Vol 9 (4) ◽

Cited By ~ 2

Author(s):

Bo Wu ◽

Yanpeng Feng ◽

Hongyan Zheng

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Model Based ◽

Markov Decision ◽

Bayesian Reinforcement Learning

Value Iteration Networks

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/700 ◽

2017 ◽

Cited By ~ 31

Author(s):

Aviv Tamar ◽

Yi Wu ◽

Garrett Thomas ◽

Sergey Levine ◽

Pieter Abbeel

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Path Planning ◽

Natural Language ◽

Convolutional Neural Network ◽

Search Task ◽

Iteration Algorithm ◽

Continuous Path ◽

Value Iteration ◽

Value Iteration Algorithm

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN based policies on discrete and continuous path-planning domains, and on a natural-language based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged and IJCAI audience targeted version of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867
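The link between value iteration and convolution can be seen in a plain grid-world version of the algorithm, sketched below. This is ordinary value iteration, not a VIN: nothing is learned, and the deterministic moves, the reward-on-arrival convention, and the function name are assumptions. Each sweep computes one value per action from a small neighborhood of the grid (the role a convolution plays in a VIN) and then takes a maximum over actions (the role of max-pooling over channels).

```python
def value_iteration_grid(reward, gamma=0.9, n_iterations=20):
    """Value iteration on a grid with deterministic moves.

    `reward[r][c]` is the reward received on arriving at cell (r, c).
    A move that would leave the grid keeps the agent in place.
    """
    rows, cols = len(reward), len(reward[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_iterations):
        new_v = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                q_values = []
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c
                    # Local, neighborhood-only computation per action.
                    q_values.append(reward[nr][nc] + gamma * v[nr][nc])
                # Maximum over the action "channels".
                new_v[r][c] = max(q_values)
        v = new_v
    return v
```

Because every cell applies the same local rule, the inner loops have the same shape as a convolution with a small kernel, which is the structural observation the VIN builds on.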

Model-based Bayesian reinforcement learning for dialogue management

10.21437/interspeech.2013-138 ◽

2013 ◽

Author(s):

Pierre Lison

Keyword(s):

Reinforcement Learning ◽

Dialogue Management ◽

Model Based ◽

Bayesian Reinforcement Learning

Model-based machine learning

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2012.0222 ◽

2013 ◽

Vol 371 (1984) ◽

pp. 20120222 ◽

Cited By ~ 55

Author(s):

Christopher M. Bishop

Keyword(s):

Machine Learning ◽

Graphical Models ◽

Large Scale ◽

Commercial Application ◽

Software Environment ◽

Probabilistic Programming ◽

Practical Applications ◽

Model Based ◽

Inference Algorithms ◽

Modelling Environment

Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
