Evaluation of Reinforcement Learning for Optimal Control of Building Active and Passive Thermal Storage Inventory

2006 ◽  
Vol 129 (2) ◽  
pp. 215-225 ◽  
Author(s):  
Simeng Liu ◽  
Gregor P. Henze

This paper describes an investigation of machine learning for the supervisory control of active and passive thermal storage capacity in buildings. Previous studies show that utilizing active or passive thermal storage, or both, can yield significant peak cooling load reductions and associated electrical demand and operational cost savings. In this study, model-free learning control is investigated for the operation of electrically driven chilled water systems in heavy-mass commercial buildings. The reinforcement learning controller learns to operate the building and cooling plant from the reinforcement feedback (in this study, the monetary cost of each action) it receives for past control actions. The learning agent interacts with its environment by commanding the global zone temperature setpoints and the thermal energy storage charging/discharging rate. The controller extracts information about the environment based solely on the reinforcement signal; it contains no predictive or system model. Over time, by exploring the environment, the reinforcement learning controller builds a statistical summary of plant operation that is continuously updated as operation continues. The present analysis shows that learning control is a feasible methodology for finding a near-optimal strategy to exploit the active and passive building thermal storage capacity; it also shows that learning performance is affected by the dimensionality of the action and state spaces, the learning rate, and several other factors. Learning is found to be slow for tasks with large state and action spaces.
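
For illustration only, the sketch below shows the kind of tabular, cost-minimizing Q-learning update such a model-free controller could use. The state/action discretization, hyperparameters, and function names are assumptions made for this sketch, not the paper's exact formulation.

import random
from collections import defaultdict

# Illustrative discretization: each action pairs a zone-setpoint offset (K)
# with a TES charging (+) / discharging (-) rate fraction.
ACTIONS = [(dT, u) for dT in (-2, 0, 2) for u in (-1.0, 0.0, 1.0)]

ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor
EPSILON = 0.1  # exploration probability

# The Q-table plays the role of the "statistical summary of plant operation":
# the expected discounted cost of taking `action` in `state`.
Q = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy selection: mostly exploit, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return min(ACTIONS, key=lambda a: Q[(state, a)])  # minimize expected cost

def observe(state, action, cost, next_state):
    """One Q-learning update; the reinforcement signal is the monetary
    cost of the last action, so the agent minimizes rather than maximizes."""
    best_next = min(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (cost + GAMMA * best_next - Q[(state, action)])

Note how the summary grows with the sizes of the state and action spaces: every (state, action) pair must be visited repeatedly before its estimate is reliable, which is consistent with the slow learning the paper reports for large spaces.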


Author(s):  
Ernst Moritz Hahn ◽  
Mateo Perez ◽  
Sven Schewe ◽  
Fabio Somenzi ◽  
Ashutosh Trivedi ◽  
...  

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
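
As a loose sketch of how a model-free update could exploit the branching structure (payoffs in a branching process accumulate additively over spawned entities): the bootstrap target sums the greedy values of all children. The names and the update rule below are assumptions for this sketch; the paper's actual algorithm and convergence treatment differ.

from collections import defaultdict

ALPHA = 0.05  # learning rate

# Q[(entity_type, action)] estimates the total payoff generated by an entity
# of this type, together with all of its descendants, under this action.
Q = defaultdict(float)

def best_value(entity_type, actions_of):
    """Greedy value estimate for an entity type (maximizing controller)."""
    return max((Q[(entity_type, a)] for a in actions_of(entity_type)),
               default=0.0)

def observe_step(entity_type, action, payoff, children, actions_of):
    """Model-free update after watching one entity evolve: it produced
    `payoff` and spawned `children` (a list of entity types). Because
    payoffs add up over spawned entities, the target sums the greedy
    values of all children rather than taking a single successor value."""
    target = payoff + sum(best_value(c, actions_of) for c in children)
    Q[(entity_type, action)] += ALPHA * (target - Q[(entity_type, action)])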


2016 ◽  
Vol 04 (01) ◽  
pp. 51-60 ◽  
Author(s):  
Bahare Kiumarsi ◽  
Wei Kang ◽  
Frank L. Lewis

This paper presents a completely model-free H∞ optimal tracking solution to the control of a general class of nonlinear nonaffine systems in the presence of input constraints. The proposed method is motivated by a nonaffine unmanned aerial vehicle (UAV) system as a real application. First, a general class of nonlinear nonaffine system dynamics is expressed as an affine system in terms of a nonlinear function of the control input. It is shown that the optimal control of nonaffine systems may not have an admissible solution if the utility function is not defined properly. Moreover, the boundedness of the optimal control input cannot be guaranteed for standard performance functions. A new performance function is defined and used in the L2-gain condition for this class of nonaffine systems. This performance function guarantees the existence of an admissible solution (if one exists) and the boundedness of the control input solution. An off-policy reinforcement learning (RL) algorithm is employed to iteratively solve the H∞ optimal tracking control problem online using measured data along the system trajectories. The proposed off-policy RL does not require any knowledge of the system dynamics. Moreover, the disturbance input does not need to be adjustable in a specific manner.
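
For context, the L2-gain (disturbance attenuation) condition that H∞ designs of this kind enforce has the generic form below, where U(x, u) stands for the redefined performance integrand, d is the disturbance, and γ is the prescribed attenuation level. This is a generic statement for orientation; the paper's exact condition may differ.

\int_0^{\infty} U\bigl(x(t), u(t)\bigr)\, dt \;\le\; \gamma^2 \int_0^{\infty} \lVert d(t) \rVert^2 \, dt,
\qquad \forall\, d \in L_2[0, \infty)

Replacing a standard quadratic integrand with a redefined U is what lets the method keep the control input bounded for the nonaffine input channel while preserving the attenuation guarantee.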


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Brydon Eastman ◽  
Michelle Przedborski ◽  
Mohammad Kohandel

The in-silico development of a chemotherapeutic dosing schedule for treating cancer relies upon a parameterization of a particular tumour growth model to describe the dynamics of the cancer in response to the dose of the drug. In practice, it is often prohibitively difficult to ensure the validity of patient-specific parameterizations of these models for any particular patient. As a result, sensitivity to these parameters can cause dosing schedules that are optimal in principle to perform poorly on particular patients. In this study, we demonstrate that chemotherapeutic dosing strategies learned via reinforcement learning methods are more robust to perturbations in patient-specific parameter values than those learned via classical optimal control methods. By training a reinforcement learning agent on mean-value parameters and allowing the agent periodic access to a more easily measurable metric, relative bone marrow density, for the purpose of optimizing the dose schedule while reducing drug toxicity, we are able to develop drug dosing schedules that outperform schedules learned via classical optimal control methods, even when such methods are allowed to leverage the same bone marrow measurements.
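
A minimal sketch of the robustness comparison the abstract describes: train a policy on mean-value parameters, then evaluate it on a population of perturbed "patients". The helper names, the perturbation model, and the simulate_episode hook are hypothetical stand-ins for this sketch, not the paper's implementation.

import random

def perturb(mean_params, scale=0.1):
    """Draw a hypothetical patient by multiplicatively perturbing each
    mean-value parameter (a stand-in for inter-patient variability)."""
    return {k: v * (1.0 + random.gauss(0.0, scale))
            for k, v in mean_params.items()}

def evaluate_robustness(policy, simulate_episode, mean_params, n_patients=100):
    """Roll out a fixed dosing policy (trained on mean_params) across a
    population of perturbed patients. `simulate_episode(policy, params)`
    is an assumed hook that runs the tumour growth model under one
    parameterization and returns an outcome (e.g. final tumour burden)."""
    return [simulate_episode(policy, perturb(mean_params))
            for _ in range(n_patients)]

The spread of outcomes returned by evaluate_robustness is what distinguishes a robust RL policy from a classically optimal schedule that was tuned to a single parameterization.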

