Dynamic Control Algorithm for Biped Walking Based on Policy Gradient Fuzzy Reinforcement Learning

2008 ◽  
Vol 41 (2) ◽  
pp. 1717-1722 ◽  
Author(s):  
Duško M. Katić ◽  
Aleksandar D. Rodić

2007 ◽  
Vol 51 (1) ◽  
pp. 3-30 ◽  
Author(s):  
Duško M. Katić ◽  
Aleksandar D. Rodić ◽  
Miomir K. Vukobratović

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8161
Author(s):  
Xibao Xu ◽  
Yushen Chen ◽  
Chengchao Bai

Planetary soft landing has been studied extensively due to its promising application prospects. In this paper, a soft landing control algorithm based on deep reinforcement learning (DRL) with good convergence properties is proposed. First, the soft landing problem of the powered descent phase is formulated and the theoretical basis of the reinforcement learning (RL) used in this paper is introduced. Second, to ease convergence, the reward function is designed to include process rewards, such as a velocity tracking reward, which mitigates the sparse-reward problem. By adding a fuel consumption penalty and a constraint violation penalty, the lander learns to track the reference velocity while saving fuel and keeping its attitude angles within safe ranges. Training simulations are then carried out under three classical RL frameworks, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor Critic (SAC), all of which converge. Finally, the trained policy is deployed in velocity tracking and soft landing experiments, whose results demonstrate the validity of the proposed algorithm.
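As a concrete illustration of this kind of reward shaping, the minimal sketch below combines a dense velocity-tracking term with fuel and attitude-constraint penalties. The function signature, the weights, and the attitude limit are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shaped_reward(v, v_ref, attitude, fuel_used,
                  att_limit=np.radians(15),
                  w_track=1.0, w_fuel=0.01, w_att=5.0):
    """Illustrative shaped reward for powered descent (all weights are
    made-up). Dense per-step terms give the agent feedback throughout
    the descent instead of only a sparse reward at touchdown."""
    # Process reward: penalize deviation from the reference velocity profile.
    r_track = -w_track * np.linalg.norm(np.asarray(v) - np.asarray(v_ref))
    # Fuel consumption penalty encourages propellant-efficient descents.
    r_fuel = -w_fuel * fuel_used
    # Constraint violation penalty, active only outside the safe attitude range.
    excess = np.maximum(np.abs(np.asarray(attitude)) - att_limit, 0.0)
    r_att = -w_att * np.sum(excess)
    return r_track + r_fuel + r_att
```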


Author(s):  
Yao Xiang ◽  
Jingling Yuan ◽  
Ruiqi Luo ◽  
Xian Zhong ◽  
Tao Li

In recent years, how to use renewable energy to reduce the energy cost of internet data centers (IDCs) has become an urgent problem. More and more solutions consider machine learning, but many existing methods need future information that is difficult to obtain in actual operation. In this paper, we focus on reducing the energy cost of an IDC by controlling the flow of renewable energy without any future information. We propose an efficient dynamic energy control algorithm based on reinforcement learning, which approximates the optimal solution by learning from the feedback of historical control decisions. To avoid overestimation and improve the convergence of the algorithm, we further optimize it with the double Q-learning method. Extensive experimental results show that, compared with other algorithms, ours saves 18.3% of the energy cost on average and reduces the rate of grid intervention by 26.2%, and thus has good application prospects.
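The "double Q" idea used here against overestimation is shown below in its simplest, tabular form: two value tables, where one selects the greedy action and the other evaluates it. This is a generic sketch of the technique, not the paper's exact implementation; states and actions are placeholders.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99        # illustrative learning rate and discount
Q_A = defaultdict(float)        # two independent value estimates
Q_B = defaultdict(float)

def double_q_update(s, a, r, s_next, actions):
    """One double Q-learning step: one table chooses the greedy next
    action, the other evaluates it, countering the overestimation bias
    introduced by plain Q-learning's max operator."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: Q_A[(s_next, x)])
        target = r + GAMMA * Q_B[(s_next, a_star)]
        Q_A[(s, a)] += ALPHA * (target - Q_A[(s, a)])
    else:
        a_star = max(actions, key=lambda x: Q_B[(s_next, x)])
        target = r + GAMMA * Q_A[(s_next, a_star)]
        Q_B[(s, a)] += ALPHA * (target - Q_B[(s, a)])
```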


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to follow a vehicle with human driving characteristics. Drawing on the idea of inverse reinforcement learning, we design the reward function of the RL model: the factors that need to be weighed in vehicle following are vectorized into a reward vector, and the reward function is defined as the inner product of this reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. Because the state and action spaces are continuous, the RL model was trained with the deterministic policy gradient algorithm. We adjusted the weight vector of the reward function so that the value vector of the RL model continuously approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to the human driver's and tested it in the PanoSim simulation environment. The results show the desired performance: the agent follows the preceding vehicle safely and smoothly.
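The inner-product reward design reads directly as code. In the sketch below, the four features and the weight values are hypothetical stand-ins for the factors the paper vectorizes; only the structure (reward = weights · feature vector) comes from the abstract.

```python
import numpy as np

def following_reward(gap_error, rel_speed, ego_accel, jerk, weights):
    """Reward as the inner product of a reward (feature) vector and a
    weight vector; the four features below are assumed examples of
    factors weighed in car following."""
    phi = np.array([
        -abs(gap_error),   # deviation from the desired headway
        -abs(rel_speed),   # speed difference to the lead vehicle
        -ego_accel ** 2,   # comfort: penalize hard acceleration/braking
        -jerk ** 2,        # smoothness: penalize rapid accel changes
    ])
    return float(weights @ phi)

# Hypothetical weights; in the paper's scheme these would be tuned until
# the learned policy's value vector approaches a human driver's.
w = np.array([0.5, 0.3, 0.1, 0.1])
r = following_reward(gap_error=2.0, rel_speed=-0.5,
                     ego_accel=0.3, jerk=0.1, weights=w)
```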


Author(s):  
Aleksandr A. Voevoda ◽  
Dmitry O. Romannikov

The synthesis of controllers for multichannel systems is a relevant and difficult task, and one possible approach is the use of neural networks. A neural controller is usually either trained on precomputed data or used to tune the parameters of a PID controller from an initially stable state of the closed-loop system; approaches based on reinforcement learning face the additional issue of supporting a range of set-point values. This paper proposes a method for synthesizing an automatic control system in which a neural network controller is trained by reinforcement learning from an unstable (arbitrary) initial state and learns to regulate over a predefined range of set points, the set point being supplied as an input of the controller network. The main steps of the method are: 1) form the neural network input from the state of the object and the system set point; 2) simulate the system with set points randomly generated from the desired range; 3) perform one step of learning using the Deterministic Policy Gradient method. The originality of the method is that, unlike existing approaches to controller synthesis with neural networks, it allows training the controller from an unstable initial state of the closed-loop system and over a range of set points. The method was applied to a two-channel object whose outputs must both be stabilized, with the first output tracking the input set point.
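The three steps above can be put into a loop. The toy below is runnable but heavily simplified: a linear two-channel plant stands in for the object, a linear map stands in for the actor network, and a zeroth-order hill-climbing step replaces the true Deterministic Policy Gradient update. The matrices, ranges, and seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.05, 0.02], [0.01, 1.03]])   # unstable open-loop plant (toy)
B = np.eye(2)
K = rng.normal(scale=0.1, size=(2, 4))       # linear "actor": u = K @ [x; r]

def neg_cost(K, r, steps=50):
    """Negative tracking cost of controller K for set point r."""
    x = np.ones(2)                            # unstable/arbitrary initial state
    cost = 0.0
    for _ in range(steps):
        u = K @ np.concatenate([x, r])        # step 1: input = state + set point
        x = A @ x + B @ u
        cost += np.sum((x - r) ** 2)
    return -cost

for episode in range(500):
    r = rng.uniform(-1.0, 1.0, size=2)        # step 2: random set point in range
    dK = rng.normal(scale=0.05, size=K.shape) # step 3 stand-in: random search,
    if neg_cost(K + dK, r) > neg_cost(K, r):  # not a true DPG update
        K = K + dK
```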

