Stochastic Actor-Executor-Critic for Image-to-Image Translation

Author(s):  
Ziwei Luo ◽  
Jing Hu ◽  
Xin Wang ◽  
Siwei Lyu ◽  
Bin Kong ◽  
...  

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework, designed for challenging continuous control problems, to develop stochastic policies over high-dimensional continuous spaces covering image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC), an off-policy actor-critic model with an additional executor that generates realistic images. Specifically, the actor learns a high-level representation and control policy through a stochastic latent action, and explicitly directs the executor to generate the low-level actions that manipulate the state. Experiments on several image-to-image translation tasks demonstrate the effectiveness and robustness of the proposed SAEC on high-dimensional continuous space problems.
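As a rough illustration of the actor-executor split described above, the sketch below separates a stochastic latent-action actor, a pixel-level executor, and a soft Q-critic in the style of maximum-entropy RL. All network shapes, dimensions, and layer choices are hypothetical placeholders, not the authors' architecture.

```python
# A minimal sketch of the SAEC decomposition; dimensions are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps an encoded state to a stochastic latent action (Gaussian)."""
    def __init__(self, state_dim=256, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.log_std = nn.Linear(256, latent_dim)

    def forward(self, s):
        h = self.net(s)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        z = dist.rsample()                      # reparameterized latent action
        return z, dist.log_prob(z).sum(-1)

class Executor(nn.Module):
    """Decodes the latent action into a low-level (pixel-space) action."""
    def __init__(self, latent_dim=64, action_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, action_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Soft Q-value over (state, latent action), as in maximum-entropy RL."""
    def __init__(self, state_dim=256, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))
```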

2018 ◽  
Author(s):  
Neythen J. Treloar ◽  
Alexander J.H. Fedorec ◽  
Brian P. Ingalls ◽  
Chris P. Barnes

Multi-species microbial communities are widespread in natural ecosystems. When employed for biomanufacturing, engineered synthetic communities have shown increased productivity (in comparison with pure cultures) and allow for the reduction of metabolic load by compartmentalising bioprocesses between multiple sub-populations. Despite these benefits, co-cultures are rarely used in practice because control over the constituent species of an assembled community has proven challenging. Here we demonstrate, in silico, the efficacy of an approach from artificial intelligence (reinforcement learning) in the control of co-cultures within continuous bioreactors. We confirm that feedback via reinforcement learning can be used to maintain populations at target levels, and that model-free performance with bang-bang control can outperform a traditional proportional-integral controller with continuous control when faced with infrequent sampling. Further, we demonstrate that a satisfactory control policy can be learned in a single twenty-four-hour experiment by running five bioreactors in parallel. Finally, we show that reinforcement learning can directly optimise the output of a co-culture bioprocess. Overall, reinforcement learning is a promising technique for the control of microbial communities.
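To make the bang-bang idea concrete, here is a toy sketch in which tabular Q-learning switches two nutrient feeds on or off to hold a two-species population near target levels. The growth model, reward, and all parameters are illustrative inventions, not the paper's simulated bioreactor.

```python
# Toy bang-bang control of a two-species co-culture via tabular Q-learning.
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_actions = 10, 4          # discretized (N1, N2) state; 2 feeds x on/off
Q = np.zeros((n_bins, n_bins, n_actions))
target = np.array([0.5, 0.5])

def step(pop, action):
    """Crude competitive-growth update; each feed boosts its matching species."""
    feed = np.array([action & 1, (action >> 1) & 1], dtype=float)
    growth = 0.1 * feed - 0.05 * pop.sum() * pop
    return np.clip(pop + growth + 0.01 * rng.standard_normal(2), 0, 1)

def discretize(pop):
    return tuple(np.minimum((pop * n_bins).astype(int), n_bins - 1))

alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(500):
    pop = rng.uniform(0.2, 0.8, 2)
    for t in range(50):                      # one decision per sparse sample
        s = discretize(pop)
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        pop = step(pop, a)
        r = -np.abs(pop - target).sum()      # reward: distance to targets
        s2 = discretize(pop)
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
```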


2020 ◽  
Vol 34 (04) ◽  
pp. 3316-3323
Author(s):  
Qingpeng Cai ◽  
Ling Pan ◽  
Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient (DDPG) algorithm have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper, we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients over a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which the number of rollout steps of analytical gradients through the learned model trades off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms the other baselines.
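The variance/bias trade-off over rollout length can be sketched as follows: backpropagate the return through a learned differentiable model for k steps, then bootstrap with a critic. The policy, model, and critic below are assumed callables, and this estimator is a simplification of the paper's DVG family, not its exact algorithm.

```python
# A minimal k-step deterministic value gradient sketch (assumed components).
import torch

def value_gradient_loss(policy, model, critic, s, k=3, gamma=0.99):
    """Longer rollouts (k) reduce critic bias but accumulate model bias."""
    total, discount = 0.0, 1.0
    for _ in range(k):
        a = policy(s)
        s, r = model(s, a)          # learned dynamics: next state and reward
        total = total + discount * r
        discount *= gamma
    total = total + discount * critic(s, policy(s))   # bootstrap the tail
    return -total.mean()            # minimize the negative value estimate
```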


Author(s):  
Richard Cheng ◽  
Gábor Orosz ◽  
Richard M. Murray ◽  
Joel W. Burdick

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
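The CBF safety layer can be illustrated with a minimal sketch: project the RL action onto the half-space where an affine CBF decrease condition holds. The dynamics terms f, g and the barrier functions h, grad_h below are assumed inputs; the paper's actual RL-CBF filter additionally accounts for GP model uncertainty, which is omitted here.

```python
# Minimal CBF safety filter: project u_rl onto {u : grad_h.(f + g u) + alpha h >= 0}.
import numpy as np

def safe_action(u_rl, x, f, g, h, grad_h, alpha=1.0):
    """Minimally modify u_rl so the CBF condition holds at state x."""
    a = grad_h(x) @ g(x)                     # constraint row acting on u
    b = grad_h(x) @ f(x) + alpha * h(x)      # constant term of the condition
    if a @ u_rl + b >= 0:                    # RL action is already safe
        return u_rl
    # closed-form projection onto the half-space (assumes a != 0,
    # i.e., the actuation actually affects the barrier)
    return u_rl - (a @ u_rl + b) / (a @ a) * a
```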


2021 ◽  
Author(s):  
Xinglong Zhang ◽  
Yaoqian Peng ◽  
Biao Luo ◽  
Wei Pan ◽  
Xin Xu ◽  
...  

Recently, barrier function-based safe reinforcement learning (RL) with the actor-critic structure for continuous control tasks has received increasing attention. It remains challenging to learn a near-optimal control policy with safety and convergence guarantees, and few works have addressed safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that can guarantee control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and to guide the policy to update safely. Theoretical results on stability and robustness are proven, and the convergence of the actor-critic learning algorithm is analyzed. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path-following and collision-avoidance problem for two real-world intelligent vehicles: a differential-drive vehicle is used to verify offline deployment performance, and an Ackermann-drive vehicle is used to verify online learning performance. Our approach shows an impressive sim-to-real transfer capability and satisfactory online control performance in the experiments.
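A rough sketch of the multi-step safety evaluation idea, under assumed model, policy, and constraint callables (the paper's components are more elaborate): roll the current policy forward through the model and report the worst predicted violation of a time-varying constraint before committing to a policy update.

```python
# Multi-step safety risk prediction under a time-varying constraint c_t(x) <= 0.
import numpy as np

def safety_risk(x0, policy, model, constraint, horizon=10):
    """Return the worst predicted constraint value over the horizon."""
    x, worst = x0, -np.inf
    for t in range(horizon):
        u = policy(x, t)
        x = model(x, u)                        # predicted next state
        worst = max(worst, constraint(x, t))   # c_t(x) > 0 means unsafe
    return worst                               # accept the update only if <= 0
```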


2021 ◽  
Vol 11 (18) ◽  
pp. 8419
Author(s):  
Jiang Zhao ◽  
Jiaming Sun ◽  
Zhihao Cai ◽  
Longhong Wang ◽  
Yingxun Wang

To achieve perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work; these often consist of several separate modules, each with its own complicated algorithm. Most methods depend on handcrafted designs and prior models, with little capacity for adaptation and generalization. Inspired by research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that collapses the separate modules of the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, built around the design of the network architecture and the reward function. Training is performed with model-free algorithms developed for the specific mission, and the resulting control policy network maps the input image directly to a continuous actuator control command. A simulation environment for UAV landing was built, and results across typical cases, including both small and large initial lateral or heading-angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
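A minimal sketch of the end-to-end mapping, assuming hypothetical image and action shapes (the paper's network architecture and command definition are not specified here): a single convolutional network takes the onboard image and outputs bounded continuous actuator commands.

```python
# One CNN from pixels to continuous actuator commands; shapes are illustrative.
import torch
import torch.nn as nn

class ImagePolicy(nn.Module):
    def __init__(self, action_dim=4):          # e.g., thrust + 3 attitude rates
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh())   # bounded commands

    def forward(self, image):                  # (B, 3, H, W), e.g. 84x84 in [0, 1]
        return self.head(self.encoder(image))
```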


Author(s):  
Benjamin Recht

This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It reviews the general formulation, terminology, and typical experimental implementations of reinforcement learning as well as competing solution paradigms. In order to compare the relative merits of various techniques, it presents a case study of the linear quadratic regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. It also describes how merging techniques from learning theory and control can provide nonasymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. The article concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and control might be combined to approach these challenges.
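The LQR-with-unknown-dynamics case study can be reproduced in miniature: estimate (A, B) by least squares from random-input rollouts, then apply certainty-equivalent LQR on the estimate. The toy system below is an arbitrary choice for illustration, not one taken from the article.

```python
# System identification + certainty-equivalent LQR on a toy linear system.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
A_true = np.array([[1.01, 0.1], [0.0, 1.01]])
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# collect (x, u, x') transitions with exploratory Gaussian inputs
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.standard_normal(1)
    xn = A_true @ x + B_true @ u + 0.01 * rng.standard_normal(2)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# least-squares fit: x' ~ [A B] [x; u]
Z = np.hstack([np.array(X), np.array(U)])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# solve the Riccati equation on the estimated model
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("estimated gain K:", K)                  # control law u = -K x
```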


2014 ◽  
Vol 369 (1655) ◽  
pp. 20130478 ◽  
Author(s):  
Nathaniel D. Daw ◽  
Peter Dayan

Despite many debates in the first half of the twentieth century, it is now largely a truism that humans and other animals build models of their environments and use them for prediction and control. However, model-based (MB) reasoning presents severe computational challenges. Alternative, computationally simpler, model-free (MF) schemes have been suggested in the reinforcement learning literature, and have afforded influential accounts of behavioural and neural data. Here, we study the realization of MB calculations, and the ways that this might be woven together with MF values and evaluation methods. There are as yet mostly only hints in the literature as to the resulting tapestry, so we offer more preview than review.
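The MB/MF distinction is easy to state computationally. In the toy contrast below, a model-based update sweeps full Bellman backups through known transition probabilities, while a model-free TD(0) update uses only sampled transitions; both approach the same values. The two-state world is purely illustrative and not drawn from the paper.

```python
# Model-based Bellman backups vs. model-free TD(0) on a toy two-state chain.
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition matrix (single action)
r = np.array([0.0, 1.0])                 # reward on entering each state
gamma = 0.9

# MB: full Bellman backups using the model P
V_mb = np.zeros(2)
for _ in range(100):
    V_mb = P @ (r + gamma * V_mb)

# MF: TD(0) from sampled transitions, with no access to P
V_mf, s, alpha = np.zeros(2), 0, 0.05
for _ in range(20000):
    s2 = rng.choice(2, p=P[s])
    V_mf[s] += alpha * (r[s2] + gamma * V_mf[s2] - V_mf[s])
    s = s2

print(V_mb, V_mf)                        # both approach the same fixed point
```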


Author(s):  
M. A. Bucci ◽  
O. Semeraro ◽  
A. Allauzen ◽  
G. Wisniewski ◽  
L. Cordier ◽  
...  

Deep reinforcement learning (DRL) is applied to control a nonlinear, chaotic system governed by the one-dimensional Kuramoto–Sivashinsky (KS) equation. DRL uses reinforcement learning principles to determine optimal control solutions and deep neural networks to approximate the value function and the control policy. Recent applications have shown that DRL can achieve superhuman performance in complex cognitive tasks. In this work, we show that using restricted localized actuation, partial knowledge of the state based on limited sensor measurements, and model-free DRL controllers, it is possible to stabilize the dynamics of the KS system around its unstable fixed solutions, here considered as target states. The robustness of the controllers is tested by considering several trajectories in phase space emanating from different initial conditions; we show that DRL is always capable of driving and stabilizing the dynamics around the target states. The possibility of controlling the KS system in the chaotic regime with a DRL strategy relying solely on local measurements suggests extending RL methods to the control of more complex systems, such as drag reduction in bluff-body wakes or the enhancement/diminution of turbulent mixing.
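For concreteness, here is a sketch of the control setting under assumed discretization choices: the KS equation on a periodic domain advanced with a semi-implicit spectral step, a few localized Gaussian actuators, and sparse sensor readings forming the DRL observation. None of the numerical values are taken from the paper.

```python
# KS equation environment sketch: localized actuation, partial observations.
import numpy as np

N, L, dt = 128, 22.0, 0.05
x = np.linspace(0, L, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)       # spectral wavenumbers
act_centers = np.array([0.25, 0.5, 0.75]) * L    # 3 localized actuators
sensors = np.arange(0, N, 16)                    # limited sensor locations

def ks_step(u, a):
    """One semi-implicit spectral step of u_t = -u u_x - u_xx - u_xxxx + f,
    where f is a sum of Gaussian bumps scaled by the agent's actions a."""
    d = (x[None, :] - act_centers[:, None] + L / 2) % L - L / 2
    f = (a[:, None] * np.exp(-d**2 / 2.0)).sum(axis=0)
    nonlin = -0.5j * k * np.fft.fft(u * u)       # -u u_x in Fourier space
    u_hat = np.fft.fft(u) + dt * (nonlin + np.fft.fft(f))
    u_hat /= 1.0 - dt * (k**2 - k**4)            # implicit linear terms
    return np.real(np.fft.ifft(u_hat))

def observe(u):
    return u[sensors]                            # partial state for the agent

u = 0.1 * np.cos(2 * np.pi * x / L)              # example initial condition
u = ks_step(u, np.zeros(3))                      # one uncontrolled step
```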

