Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks

Author(s):  
Xiangjian Li
Huashan Liu
Xin Cheng
Menghua Dong
2020
Vol 17 (1)
pp. 172988141989834
Author(s):  
Guoyu Zuo
Qishen Zhao
Jiahao Lu
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping a reward function is a difficult undertaking. In this article, we propose a general, model-free reinforcement learning approach for learning robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize the vibration of the output action while maximizing the action's value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse reward problem and achieves a high learning speed.
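The abstract describes adding an action loss to the actor objective so the policy maximizes the critic's value while keeping the output action smooth. The paper's exact formulation is not given; the following is a minimal NumPy sketch under the assumption that "vibration" is penalized as the squared change between consecutive actions, with a hypothetical weight `action_coef`.

```python
import numpy as np

def actor_loss(q_value, action, prev_action, action_coef=0.1):
    """Combined actor loss: maximize Q(s, pi(s)) while penalizing
    action vibration between consecutive steps.

    q_value:     critic's estimate Q(s, pi(s))
    action:      current policy output pi(s_t)
    prev_action: policy output at the previous step pi(s_{t-1})
    action_coef: weight of the smoothness penalty (assumed name)
    """
    # Standard deterministic policy gradient term: minimize -Q
    value_term = -np.mean(q_value)
    # Action-smoothness ("vibration") penalty: squared step-to-step change
    smooth_term = np.mean((np.asarray(action) - np.asarray(prev_action)) ** 2)
    return value_term + action_coef * smooth_term
```

With `action_coef = 0`, this reduces to the ordinary TD3 actor objective; larger values trade some return for smoother trajectories.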


Electronics
2021
Vol 10 (7)
pp. 870
Author(s):  
Yangyang Hou
Huajie Hong
Zhaomei Sun
Dasheng Xu
Zhe Zeng

As a research hotspot in the field of artificial intelligence, deep reinforcement learning can help a manipulator learn motion skills without a kinematic model. To suppress the overestimation bias of value estimates in the Deep Deterministic Policy Gradient (DDPG) network, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was proposed. This paper further suppresses that overestimation bias for multi-degree-of-freedom (DOF) manipulator learning based on deep reinforcement learning, proposing the Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3). The experimental results show that RTD3 applied to multi-DOF manipulators improves learning ability by 29.15% over TD3. A step-by-step reward function is also proposed, designed specifically for learning the motion ability of a multi-DOF manipulator. The perspective of a continuous decision-making process is used to guide the manipulator's learning, and learning efficiency is improved by optimizing experience replay. To measure the point-to-point positioning ability of a manipulator, a new evaluation index based on the characteristics of continuous decision processes, energy efficiency distance, is presented, which evaluates the learning quality of the manipulator's motion ability with a more comprehensive and fair evaluation algorithm.


1997
Vol 2 (2)
pp. 186-191
Author(s):
William P. Dunlap
Leann Myers

Author(s):  
Nguyen Cao Thang
Luu Xuan Hung

The paper presents a performance analysis of a global-local mean square error criterion of stochastic linearization for some nonlinear oscillators. The criterion is based on a conception dual to the local mean square error criterion (LOMSEC). The algorithm is formulated for general multi-degree-of-freedom (MDOF) nonlinear oscillators. A performance analysis is then carried out for two applications: a rolling-ship oscillation and a two-degree-of-freedom oscillator. The improved accuracy of the proposed criterion is demonstrated in comparison with conventional Gaussian equivalent linearization (GEL).
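For context on the GEL baseline the abstract compares against: for a white-noise-driven Duffing oscillator, Gaussian equivalent linearization replaces the cubic stiffness term with an equivalent linear one and solves for the mean-square response by fixed-point iteration. This is a textbook sketch of standard GEL, not the paper's dual criterion; all symbol names are generic.

```python
import math

def gel_duffing_ms(omega0_sq, h, eps, S0, tol=1e-10, max_iter=200):
    """Gaussian equivalent linearization of the Duffing oscillator
    x'' + 2h x' + omega0^2 x + eps x^3 = w(t), w white noise of intensity S0.

    Under Gaussian closure, eps*x^3 is replaced by k_eq*x with
    k_eq = 3*eps*sigma^2, and the linear-system stationary response
    sigma^2 = pi*S0 / (2h*(omega0^2 + k_eq)) is iterated to a fixed point.
    Returns the converged mean-square displacement sigma^2.
    """
    sigma2 = math.pi * S0 / (2.0 * h * omega0_sq)  # linear (eps = 0) start
    for _ in range(max_iter):
        new = math.pi * S0 / (2.0 * h * (omega0_sq + 3.0 * eps * sigma2))
        if abs(new - sigma2) < tol:
            return new
        sigma2 = new
    return sigma2
```

The hardening nonlinearity (eps > 0) stiffens the equivalent system, so the predicted mean-square response is always below the purely linear value.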

