Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks

Author(s):  
Xiangjian Li
Huashan Liu
Xin Cheng
Menghua Dong
2020
Vol 17 (1)
pp. 172988141989834
Author(s):  
Guoyu Zuo
Qishen Zhao
Jiahao Lu
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping a reward function is a difficult undertaking. In this article, we propose a general, model-free reinforcement learning approach for learning robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize the vibration of the output action while maximizing the action's value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse reward problem and achieves a high learning speed.
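The abstract describes adding an action loss to the actor objective so the policy maximizes the critic's value while keeping the output action smooth. The paper's exact formulation is not given; the following is a minimal NumPy sketch under the assumption that "vibration" is penalized as the squared change between consecutive actions, with a hypothetical weight `action_coef`.

```python
import numpy as np

def actor_loss(q_value, action, prev_action, action_coef=0.1):
    """Combined actor loss: maximize Q(s, pi(s)) while penalizing
    action vibration between consecutive steps.

    q_value:     critic's estimate Q(s, pi(s))
    action:      current policy output pi(s_t)
    prev_action: policy output at the previous step pi(s_{t-1})
    action_coef: weight of the smoothness penalty (assumed name)
    """
    # Standard deterministic policy gradient term: minimize -Q
    value_term = -np.mean(q_value)
    # Action-smoothness ("vibration") penalty: squared step-to-step change
    smooth_term = np.mean((np.asarray(action) - np.asarray(prev_action)) ** 2)
    return value_term + action_coef * smooth_term
```

With `action_coef = 0`, this reduces to the ordinary TD3 actor objective; larger values trade some return for smoother trajectories.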


Electronics
2021
Vol 10 (7)
pp. 870
Author(s):  
Yangyang Hou
Huajie Hong
Zhaomei Sun
Dasheng Xu
Zhe Zeng

As a research hotspot in the field of artificial intelligence, deep reinforcement learning can help a manipulator learn motion skills without a kinematic model. To suppress the overestimation bias of value estimates in the Deep Deterministic Policy Gradient (DDPG) network, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was proposed. This paper further suppresses that overestimation bias for multi-degree-of-freedom (DOF) manipulator learning based on deep reinforcement learning, proposing the Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3). The experimental results show that RTD3 applied to multi-DOF manipulators improves learning ability by 29.15% over TD3. A step-by-step reward function is also proposed, designed specifically for learning the motion ability of a multi-DOF manipulator. The perspective of a continuous decision-making process is used to guide the manipulator's learning, and learning efficiency is improved by optimizing experience replay. To measure the point-to-point positioning ability of a manipulator, a new evaluation index based on the characteristics of continuous decision processes, energy efficiency distance, is presented, which evaluates the learning quality of the manipulator's motion ability with a more comprehensive and fair evaluation algorithm.


1997
Vol 2 (2)
pp. 186-191
Author(s):
William P. Dunlap
Leann Myers

Author(s):  
Nguyen Cao Thang
Luu Xuan Hung

The paper presents a performance analysis of a global-local mean square error criterion of stochastic linearization for some nonlinear oscillators. The criterion is based on a conception dual to the local mean square error criterion (LOMSEC). The algorithm is formulated for general multi-degree-of-freedom (MDOF) nonlinear oscillators. A performance analysis is then carried out for two applications: a rolling-ship oscillation and a two-degree-of-freedom oscillator. The improved accuracy of the proposed criterion is demonstrated in comparison with conventional Gaussian equivalent linearization (GEL).
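For context on the GEL baseline the abstract compares against: for a white-noise-driven Duffing oscillator, Gaussian equivalent linearization replaces the cubic stiffness term with an equivalent linear one and solves for the mean-square response by fixed-point iteration. This is a textbook sketch of standard GEL, not the paper's dual criterion; all symbol names are generic.

```python
import math

def gel_duffing_ms(omega0_sq, h, eps, S0, tol=1e-10, max_iter=200):
    """Gaussian equivalent linearization of the Duffing oscillator
    x'' + 2h x' + omega0^2 x + eps x^3 = w(t), w white noise of intensity S0.

    Under Gaussian closure, eps*x^3 is replaced by k_eq*x with
    k_eq = 3*eps*sigma^2, and the linear-system stationary response
    sigma^2 = pi*S0 / (2h*(omega0^2 + k_eq)) is iterated to a fixed point.
    Returns the converged mean-square displacement sigma^2.
    """
    sigma2 = math.pi * S0 / (2.0 * h * omega0_sq)  # linear (eps = 0) start
    for _ in range(max_iter):
        new = math.pi * S0 / (2.0 * h * (omega0_sq + 3.0 * eps * sigma2))
        if abs(new - sigma2) < tol:
            return new
        sigma2 = new
    return sigma2
```

The hardening nonlinearity (eps > 0) stiffens the equivalent system, so the predicted mean-square response is always below the purely linear value.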

