Model-free Control Design Using Policy Gradient Reinforcement Learning in LPV Framework

Reinforcement learning algorithms such as the deep deterministic policy gradient (DDPG) algorithm have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients over a finite horizon, which is too myopic compared with an infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of infinite-horizon deterministic value gradient (DVG) algorithms, in which different numbers of rollout steps of the analytical gradients through the learned model trade off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.
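To make the rollout-length trade-off concrete, here is a minimal sketch under assumptions that are not taken from the paper: a scalar linear model s' = a*s + b*u standing in for the learned model, a linear policy u = theta*s, a quadratic reward, and a hypothetical quadratic critic V_hat(s) = -c*s^2 in place of the learned critic. The k-step value is differentiated analytically through the model with the forward-mode chain rule. With k = 0 the gradient comes only from the critic; larger k relies more on the model. The DVPG combination with the model-free policy gradient is not shown.

```python
def k_step_value_and_grad(theta, s0, k, a=0.9, b=0.5, c=2.0, gamma=0.95):
    """Return (V_k, dV_k/dtheta) for a k-step model-based value estimate.

    Hypothetical toy setup (not the paper's): model s' = a*s + b*u,
    policy u = theta*s, reward r = -(s^2 + u^2), critic V_hat(s) = -c*s^2.
    """
    s, ds = s0, 0.0            # state and its derivative w.r.t. theta
    value, grad = 0.0, 0.0
    discount = 1.0
    for _ in range(k):
        u = theta * s
        du = s + theta * ds    # product rule on u = theta * s
        value += discount * -(s * s + u * u)
        grad += discount * -(2 * s * ds + 2 * u * du)
        # Propagate the state and its sensitivity through the model.
        s, ds = a * s + b * u, a * ds + b * du
        discount *= gamma
    # Bootstrap with the (hypothetical) critic after k model steps.
    value += discount * -c * s * s
    grad += discount * -2 * c * s * ds
    return value, grad


def finite_difference(theta, s0, k, eps=1e-6):
    """Central-difference check of the analytical value gradient."""
    hi, _ = k_step_value_and_grad(theta + eps, s0, k)
    lo, _ = k_step_value_and_grad(theta - eps, s0, k)
    return (hi - lo) / (2 * eps)


for k in (0, 1, 5):
    v, g = k_step_value_and_grad(theta=-0.3, s0=1.0, k=k)
    print(k, round(v, 4), round(g, 4), round(finite_difference(-0.3, 1.0, k), 4))
```

The finite-difference column should agree with the analytical gradient for every k; with a learned model, the two ends of the k range would instead differ in how much critic error versus model error enters the gradient.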

Policy Gradient-based Integral Reinforcement Learning for Optimal Control Design of Nonaffine Morphing Aircraft Systems

2020 28th Mediterranean Conference on Control and Automation (MED)
10.1109/med48518.2020.9183024
2020

Author(s): Hanna Lee, Seong-Hun Kim, Youdan Kim

Keyword(s): Optimal Control, Reinforcement Learning, Control Design, Morphing Aircraft, Aircraft Systems, Policy Gradient, Gradient Based

Model-Free Control Design for Hybrid Magnetic Levitation System

The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ '05.
10.1109/fuzzy.2005.1452519
2005

Author(s): Rong-Jong Wai, Jeng-Dao Lee, Chiung-Chou Liao

Keyword(s): Magnetic Levitation, Control Design, Model Free, Levitation System, Magnetic Levitation System, Model Free Control

A model free control design approach for a semi-active suspension of a passenger car

Proceedings of the 2005, American Control Conference, 2005.
10.1109/acc.2005.1470296
2005
Cited By ~ 5

Author(s): C. Lauwerys, J. Swevers, P. Sas

Keyword(s): Control Design, Active Suspension, Design Approach, Passenger Car, Model Free, Model Free Control

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
10.24963/ijcai.2019/475
2019
Cited By ~ 1

Author(s): Wenjie Shi, Shiji Song, Cheng Wu

Keyword(s): Reinforcement Learning, Maximum Entropy, Bellman Equation, Value Functions, Policy Actor, Model Free, Policy Gradient, Gradient Based, Continuous Actions, Stable Learning

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. Moreover, the optimality of the desired Boltzmann policy set for a non-optimal soft value function is not sufficiently justified. In this paper, we first derive a soft policy gradient based on the entropy-regularized expected reward objective for RL with continuous actions. Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for soft value functions, we leverage a double-sampling approach to make the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms prior off-policy methods.
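DSPG is defined for continuous actions. The sketch below swaps in a small discrete action set, an assumption made only so the quantities can be computed exactly, to illustrate the entropy-regularized pieces the abstract refers to: the Boltzmann policy pi(a) proportional to exp(Q(a)/alpha), the soft value V = alpha * log sum_a exp(Q(a)/alpha), and a one-sample soft Bellman target r + gamma * V(s'). The temperature alpha and the Q-values are illustrative; this is not the DSPG update or its double-sampling scheme.

```python
import math


def boltzmann_policy(q_values, alpha):
    """pi(a) proportional to exp(Q(a)/alpha), computed with a max shift for stability."""
    m = max(q / alpha for q in q_values)
    weights = [math.exp(q / alpha - m) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_value(q_values, alpha):
    """V(s) = alpha * log sum_a exp(Q(s, a)/alpha), via log-sum-exp."""
    m = max(q / alpha for q in q_values)
    return alpha * (m + math.log(sum(math.exp(q / alpha - m) for q in q_values)))


def soft_bellman_target(reward, next_q_values, alpha, gamma=0.99):
    """One-sample soft Bellman target r + gamma * V_soft(s')."""
    return reward + gamma * soft_value(next_q_values, alpha)


q = [1.0, 2.0, 0.5]   # illustrative Q-values for three discrete actions
alpha = 0.5           # illustrative temperature
pi = boltzmann_policy(q, alpha)
expected_q = sum(p * qa for p, qa in zip(pi, q))
entropy = -sum(p * math.log(p) for p in pi)
# Under the Boltzmann policy, the soft value equals E_pi[Q] + alpha * H(pi).
print(soft_value(q, alpha), expected_q + alpha * entropy)
```

The two printed numbers coincide, which is the sense in which the Boltzmann policy is optimal for the entropy-regularized objective given a fixed Q; as alpha shrinks, the soft value approaches max_a Q(a).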

On the use of the policy gradient and Hessian in inverse reinforcement learning

Intelligenza Artificiale
10.3233/ia-180011
2020
Vol 14 (1), pp. 117-150

Author(s): Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

Keyword(s): Reinforcement Learning, Sequential Decision, Inverse Reinforcement Learning, Reward Function, Model Free, Learning Speed, Policy Gradient, Continuous Domains, Learning Policies, Finite Domains

Reinforcement Learning (RL) is an effective approach to solving sequential decision-making problems when the environment is equipped with a reward function to evaluate the agent’s actions. However, in several domains a reward function is not available and is difficult to estimate. When samples of expert agents are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most classic IRL methods, in addition to expert demonstrations, require sampling the environment to evaluate each candidate reward function, which, in turn, is built from a set of engineered features. This paper presents a novel model-free IRL approach that does not require specifying a function space in which to search for the expert’s reward function. Leveraging the fact that the policy gradient must be zero for an optimal policy, the algorithm generates an approximation space for the reward function, in which a reward is singled out using a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform, in terms of learning speed, those obtained with the true reward function.
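The core idea, that the expert's policy gradient must vanish under the true reward, can be illustrated on a toy problem. The sketch below is not the paper's algorithm; it assumes a one-dimensional action, a Gaussian expert policy with known mean theta and fixed sigma (in practice the policy would have to be estimated from demonstrations), and a reward linear in two hypothetical features, phi1(a) = -a^2 and phi2(a) = a. Per-feature policy gradients are estimated from expert samples with the likelihood-ratio (REINFORCE) estimator, and the reward weights are chosen so that their weighted sum is zero. With one policy parameter and two features this null space is one-dimensional, so the paper's second-order selection step is not needed here.

```python
import random


def feature_gradients(actions, theta, sigma, features):
    """REINFORCE estimate of dJ_i/dtheta for each reward feature phi_i."""
    grads = []
    for phi in features:
        total = 0.0
        for a in actions:
            score = (a - theta) / sigma**2   # d/dtheta log N(a; theta, sigma^2)
            total += phi(a) * score
        grads.append(total / len(actions))
    return grads


def recover_reward_weights(grads):
    """Find w with w . grads = 0 (zero policy gradient), normalized to |w|_1 = 1."""
    g1, g2 = grads
    w = [g2, -g1]                  # orthogonal to (g1, g2) in 2-D
    if w[0] < 0:                   # sign convention: keep the -a^2 term penalizing
        w = [-w[0], -w[1]]
    norm = abs(w[0]) + abs(w[1])
    return [wi / norm for wi in w]


random.seed(0)
theta_expert, sigma = 1.5, 0.5     # hypothetical expert policy parameters
demos = [random.gauss(theta_expert, sigma) for _ in range(100_000)]
features = [lambda a: -a * a, lambda a: a]   # hypothetical reward features
w = recover_reward_weights(feature_gradients(demos, theta_expert, sigma, features))
print(w)
```

For this toy the exact gradients are (-2*theta, 1) = (-3, 1), so the printed weights should be near [0.25, 0.75]; the resulting reward 0.75a - 0.25a^2 is indeed maximized at the expert's mean a = 1.5.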

Quadrotor Motion Control Using Deep Reinforcement Learning

Journal of Unmanned Vehicle Systems
10.1139/juvs-2021-0010
2021

Author(s): Zifei Jiang, Alan F. Lynch

Keyword(s): Reinforcement Learning, Neural Nets, Neural Net, Reward Function, Model Free, Policy Gradient, Aerial Vehicle, Stochastic Controller, Policy Optimization, Gradient Approach

We present a deep neural net-based controller trained by a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). With RL, two neural nets are trained. One neural net is used as a stochastic controller which gives the distribution of control inputs. The other maps the UAV state to a scalar which estimates the reward of the controller. A proximal policy optimization (PPO) method, which is an actor-critic policy gradient approach, is used to train the neural nets. Simulation results show that the trained controller achieves a comparable level of performance to a manually-tuned PID controller, despite not depending on any model information. The paper considers different choices of reward function and their influence on controller performance.
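The abstract does not spell out the PPO loss. As a hedged sketch, the standard clipped surrogate objective of PPO is shown below; the clip range epsilon = 0.2 and the sample numbers are assumptions, not values reported for this quadrotor controller, and in practice the advantages would be computed from the critic's value estimates.

```python
def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(x, high))


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Batch mean of min(r*A, clip(r, 1-eps, 1+eps)*A), to be maximized.

    ratios[i] = pi_new(u_i | s_i) / pi_old(u_i | s_i) for the sampled control u_i;
    advantages[i] is an advantage estimate (e.g. return minus the critic's value).
    """
    terms = []
    for r, adv in zip(ratios, advantages):
        unclipped = r * adv
        clipped = clip(r, 1 - epsilon, 1 + epsilon) * adv
        # Taking the minimum makes the objective a pessimistic bound.
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


print(ppo_clip_objective([1.5, 0.5, 1.0], [2.0, -1.0, 0.3]))
```

The clipping removes the incentive to push the probability ratio outside [1 - epsilon, 1 + epsilon] when that would increase the objective, which keeps each policy update close to the policy that collected the data.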

An adaptive model free control design and its applications

2nd IEEE International Conference on Industrial Informatics, 2004. INDIN '04. 2004
10.1109/indin.2004.1417338
2005
Cited By ~ 1

Author(s): Zhi-Gang Han, Xinghuo Yu

Keyword(s): Control Design, Adaptive Model, Model Free, Model Free Control