Dynamic Control Algorithm for Biped Walking Based on Policy Gradient Fuzzy Reinforcement Learning

2008 ◽  
Vol 41 (2) ◽  
pp. 1717-1722 ◽  
Author(s):  
Duško M. Katić ◽  
Aleksandar D. Rodić

2007 ◽  
Vol 51 (1) ◽  
pp. 3-30 ◽  
Author(s):  
Duško M. Katić ◽  
Aleksandar D. Rodić ◽  
Miomir K. Vukobratović

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8161
Author(s):  
Xibao Xu ◽  
Yushen Chen ◽  
Chengchao Bai

Planetary soft landing has been studied extensively due to its promising application prospects. In this paper, a soft landing control algorithm based on deep reinforcement learning (DRL) with good convergence properties is proposed. First, the soft landing problem of the powered descent phase is formulated and the theoretical basis of the reinforcement learning (RL) used in this paper is introduced. Second, to ease convergence, the reward function is designed to include process rewards, such as a velocity tracking reward, which mitigates the sparse-reward problem. By adding a fuel consumption penalty and a constraint violation penalty, the lander learns to track the reference velocity while saving fuel and keeping its attitude angles within safe ranges. Training simulations are then carried out under three classical RL frameworks, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor Critic (SAC), all of which converge. Finally, the trained policy is deployed in velocity tracking and soft landing experiments, whose results demonstrate the validity of the proposed algorithm.
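As a concrete illustration of this kind of reward shaping, the minimal sketch below combines a dense velocity-tracking term with fuel and attitude-constraint penalties. The function signature, the weights, and the attitude limit are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shaped_reward(v, v_ref, attitude, fuel_used,
                  att_limit=np.radians(15),
                  w_track=1.0, w_fuel=0.01, w_att=5.0):
    """Illustrative shaped reward for powered descent (all weights are
    made-up). Dense per-step terms give the agent feedback throughout
    the descent instead of only a sparse reward at touchdown."""
    # Process reward: penalize deviation from the reference velocity profile.
    r_track = -w_track * np.linalg.norm(np.asarray(v) - np.asarray(v_ref))
    # Fuel consumption penalty encourages propellant-efficient descents.
    r_fuel = -w_fuel * fuel_used
    # Constraint violation penalty, active only outside the safe attitude range.
    excess = np.maximum(np.abs(np.asarray(attitude)) - att_limit, 0.0)
    r_att = -w_att * np.sum(excess)
    return r_track + r_fuel + r_att
```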


Author(s):  
Yao Xiang ◽  
Jingling Yuan ◽  
Ruiqi Luo ◽  
Xian Zhong ◽  
Tao Li

In recent years, how to use renewable energy to reduce the energy cost of internet data centers (IDCs) has become an urgent problem. More and more solutions consider machine learning, but many existing methods need future information that is difficult to obtain in actual operation. In this paper, we focus on reducing the energy cost of an IDC by controlling the flow of renewable energy without any future information. We propose an efficient dynamic energy control algorithm based on reinforcement learning, which approximates the optimal solution by learning from the feedback of historical control decisions. To avoid overestimation and improve the convergence of the algorithm, we further optimize it with the double Q-learning method. Extensive experimental results show that, compared with other algorithms, ours saves 18.3% of the energy cost on average and reduces the rate of grid intervention by 26.2%, and thus has good application prospects.
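The "double Q" idea used here against overestimation is shown below in its simplest, tabular form: two value tables, where one selects the greedy action and the other evaluates it. This is a generic sketch of the technique, not the paper's exact implementation; states and actions are placeholders.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99        # illustrative learning rate and discount
Q_A = defaultdict(float)        # two independent value estimates
Q_B = defaultdict(float)

def double_q_update(s, a, r, s_next, actions):
    """One double Q-learning step: one table chooses the greedy next
    action, the other evaluates it, countering the overestimation bias
    introduced by plain Q-learning's max operator."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: Q_A[(s_next, x)])
        target = r + GAMMA * Q_B[(s_next, a_star)]
        Q_A[(s, a)] += ALPHA * (target - Q_A[(s, a)])
    else:
        a_star = max(actions, key=lambda x: Q_B[(s_next, x)])
        target = r + GAMMA * Q_A[(s_next, a_star)]
        Q_B[(s, a)] += ALPHA * (target - Q_B[(s, a)])
```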


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to follow a vehicle with human driving characteristics. Drawing on the idea of inverse reinforcement learning, we design the reward function of the RL model: the factors that need to be weighed in vehicle following are vectorized into a reward vector, and the reward function is defined as the inner product of this reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. Because the state and action spaces are continuous, the RL model was trained with the deterministic policy gradient algorithm. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to the human driver's and tested it in the PanoSim simulation environment. The results show the desired performance: the agent follows the preceding vehicle safely and smoothly.
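The inner-product reward design reads directly as code. In the sketch below, the four features and the weight values are hypothetical stand-ins for the factors the paper vectorizes; only the structure (reward = weights · feature vector) comes from the abstract.

```python
import numpy as np

def following_reward(gap_error, rel_speed, ego_accel, jerk, weights):
    """Reward as the inner product of a reward (feature) vector and a
    weight vector; the four features below are assumed examples of
    factors weighed in car following."""
    phi = np.array([
        -abs(gap_error),   # deviation from the desired headway
        -abs(rel_speed),   # speed difference to the lead vehicle
        -ego_accel ** 2,   # comfort: penalize hard acceleration/braking
        -jerk ** 2,        # smoothness: penalize rapid accel changes
    ])
    return float(weights @ phi)

# Hypothetical weights; in the paper's scheme these would be tuned until
# the learned policy's value vector approaches a human driver's.
w = np.array([0.5, 0.3, 0.1, 0.1])
r = following_reward(gap_error=2.0, rel_speed=-0.5,
                     ego_accel=0.3, jerk=0.1, weights=w)
```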


Author(s):  
Aleksandr A. Voevoda ◽  
Dmitry O. Romannikov

The synthesis of controllers for multichannel systems is a relevant and difficult task, and one possible approach is the use of neural networks. A neural controller is usually either trained on precomputed data or used to tune the parameters of a PID controller from an initially stable state of the closed-loop system; approaches based on reinforcement learning face the additional issue of supporting a range of set-point values. This paper proposes a method for synthesizing an automatic control system in which a neural network controller is trained by reinforcement learning from an unstable (arbitrary) initial state and learns to regulate over a predefined range of set points, the set point being supplied as an input of the controller network. The main steps of the method are: 1) form the neural network input from the state of the object and the system set point; 2) simulate the system with set points randomly generated from the desired range; 3) perform one step of learning using the Deterministic Policy Gradient method. The originality of the method is that, unlike existing approaches to controller synthesis with neural networks, it allows training the controller from an unstable initial state of the closed-loop system and over a range of set points. The method was applied to a two-channel object whose outputs must both be stabilized, with the first output tracking the input set point.
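The three steps above can be put into a loop. The toy below is runnable but heavily simplified: a linear two-channel plant stands in for the object, a linear map stands in for the actor network, and a zeroth-order hill-climbing step replaces the true Deterministic Policy Gradient update. The matrices, ranges, and seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.05, 0.02], [0.01, 1.03]])   # unstable open-loop plant (toy)
B = np.eye(2)
K = rng.normal(scale=0.1, size=(2, 4))       # linear "actor": u = K @ [x; r]

def neg_cost(K, r, steps=50):
    """Negative tracking cost of controller K for set point r."""
    x = np.ones(2)                            # unstable/arbitrary initial state
    cost = 0.0
    for _ in range(steps):
        u = K @ np.concatenate([x, r])        # step 1: input = state + set point
        x = A @ x + B @ u
        cost += np.sum((x - r) ** 2)
    return -cost

for episode in range(500):
    r = rng.uniform(-1.0, 1.0, size=2)        # step 2: random set point in range
    dK = rng.normal(scale=0.05, size=K.shape) # step 3 stand-in: random search,
    if neg_cost(K + dK, r) > neg_cost(K, r):  # not a true DPG update
        K = K + dK
```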

