Autonomous Reinforcement Learning with Experience Replay for Humanoid Gait Optimization

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

A real-time HIL control system on rotary inverted pendulum hardware platform based on double deep Q-network

Measurement and Control ◽

10.1177/00202940211000380 ◽

2021 ◽

Vol 54 (3-4) ◽

pp. 417-428

Author(s):

Yanyan Dai ◽

KiDong Lee ◽

SukGyu Lee

Keyword(s):

Control System ◽

Reinforcement Learning ◽

Inverted Pendulum ◽

Learning Algorithm ◽

Deep Understanding ◽

Control Engineering ◽

Experience Replay ◽

Real Hardware ◽

Rotary Inverted Pendulum ◽

Reinforcement Learning Algorithm

For real applications, rotary inverted pendulum systems have been known as the basic model in nonlinear control systems. If researchers have no deep understanding of control, it is difficult to control a rotary inverted pendulum platform using classic control engineering models, as shown in section 2.1. Therefore, without classic control theory, this paper controls the platform by training and testing reinforcement learning algorithm. Many recent achievements in reinforcement learning (RL) have become possible, but there is a lack of research to quickly test high-frequency RL algorithms using real hardware environment. In this paper, we propose a real-time Hardware-in-the-loop (HIL) control system to train and test the deep reinforcement learning algorithm from simulation to real hardware implementation. The Double Deep Q-Network (DDQN) with prioritized experience replay reinforcement learning algorithm, without a deep understanding of classical control engineering, is used to implement the agent. For the real experiment, to swing up the rotary inverted pendulum and make the pendulum smoothly move, we define 21 actions to swing up and balance the pendulum. Comparing Deep Q-Network (DQN), the DDQN with prioritized experience replay algorithm removes the overestimate of Q value and decreases the training time. Finally, this paper shows the experiment results with comparisons of classic control theory and different reinforcement learning algorithms.

Download Full-text

Play games using Reinforcement Learning and Artificial Neural Networks with Experience Replay

2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS) ◽

10.1109/icis.2018.8466428 ◽

2018 ◽

Cited By ~ 1

Author(s):

Meng Xu ◽

Haobin Shi ◽

Yao Wang

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Reinforcement Learning ◽

Experience Replay ◽

Artificial Neural

Download Full-text

Composite Experience Replay-Based Deep Reinforcement Learning With Application in Wind Farm Control

IEEE Transactions on Control Systems Technology ◽

10.1109/tcst.2021.3102476 ◽

2021 ◽

pp. 1-15

Author(s):

Hongyang Dong ◽

Xiaowei Zhao

Keyword(s):

Reinforcement Learning ◽

Wind Farm ◽

Experience Replay

Download Full-text

Prioritized Experience Replay in Multi-Actor-Attention-Critic for Reinforcement Learning

Journal of Physics Conference Series ◽

10.1088/1742-6596/1631/1/012040 ◽

2020 ◽

Vol 1631 ◽

pp. 012040

Author(s):

Sheng Fan ◽

Guanghua Song ◽

Bowei Yang ◽

Xiaohong Jiang

Keyword(s):

Reinforcement Learning ◽

Experience Replay

Download Full-text

SLER: Self-generated long-term experience replay for continual reinforcement learning

Applied Intelligence ◽

10.1007/s10489-020-01786-1 ◽

2020 ◽

Vol 51 (1) ◽

pp. 185-201

Author(s):

Chunmao Li ◽

Yang Li ◽

Yinliang Zhao ◽

Peng Peng ◽

Xupeng Geng

Keyword(s):

Reinforcement Learning ◽

Experience Replay ◽

Long Term Experience

Download Full-text

Deep Reinforcement Learning Based Left-Turn Connected and Automated Vehicle Control at Signalized Intersection in Vehicle-to-Infrastructure Environment

Information ◽

10.3390/info11020077 ◽

2020 ◽

Vol 11 (2) ◽

pp. 77 ◽

Cited By ~ 1

Author(s):

Juan Chen ◽

Zhengxuan Xue ◽

Daiqian Fan

Keyword(s):

Reinforcement Learning ◽

Control Method ◽

Signalized Intersection ◽

Signal Control ◽

Left Turn ◽

Automated Vehicle ◽

Whole Process ◽

Policy Gradient ◽

Experience Replay ◽

Automated Vehicle Control

In order to solve the problem of vehicle delay caused by stops at signalized intersections, a micro-control method of a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) is designed in this paper. In this paper, the micro-control of the whole process of a left-turn vehicle approaching, entering, and leaving a signalized intersection is considered. In addition, in order to solve the problems of low sampling efficiency and overestimation of the critic network of the DDPG algorithm, a positive and negative reward experience replay buffer sampling mechanism and multi-critic network structure are adopted in the DDPG algorithm in this paper. Finally, the effectiveness of the signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) are verified under 0.2, 0.5, and 0.7 saturation degrees of left-turning vehicles at a signalized intersection within a VISSIM simulation environment. The results show that the proposed deep reinforcement learning method can get a number of stops benefits ranging from 5% to 94%, stop time benefits ranging from 1% to 99%, and delay benefits ranging from −17% to 93%, respectively compared with the traditional signal control method.

Download Full-text

A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation

International Journal of Intelligent Robotics and Applications ◽

10.1007/s41315-020-00135-2 ◽

2020 ◽

Vol 4 (2) ◽

pp. 217-228

Author(s):

Cheng Zhang ◽

Liang Ma ◽

Alexander Schmitz

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Robot Manipulation ◽

Model Based ◽

Experience Replay ◽

Reinforcement Learning Algorithm

Download Full-text

An Experience Replay Method based on Tree Structure for Reinforcement Learning

IEEE Transactions on Emerging Topics in Computing ◽

10.1109/tetc.2018.2890682 ◽

2019 ◽

pp. 1-1

Author(s):

Wei-Cheng Jiang ◽

Kao-Shing Hwang ◽

Jin-Ling Lin

Keyword(s):

Reinforcement Learning ◽

Tree Structure ◽

Experience Replay

Download Full-text

UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning

Remote Sensing ◽

10.3390/rs12223789 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3789

Author(s):

Bo Li ◽

Zhigang Gan ◽

Daqing Chen ◽

Dyachenko Sergey Aleksandrovich

Keyword(s):

Reinforcement Learning ◽

Target Tracking ◽

Uncertain Environments ◽

Target Movement ◽

Maneuvering Target Tracking ◽

Novel Approach ◽

Policy Gradient ◽

Meta Learning ◽

Experience Replay ◽

Task Experience

This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), to realize the control of unmanned aerial vehicle (UAV), allowing a UAV to quickly track a target in an environment where the motion of a target is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We consider a multi-task experience replay buffer to provide data for the multi-task learning of the DRL algorithm, and we combine meta-learning to develop a multi-task reinforcement learning update method to ensure the generalization capability of reinforcement learning. Compared with the state-of-the-art algorithms, namely the deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), experimental results show that the Meta-TD3 algorithm has achieved a great improvement in terms of both convergence value and convergence rate. In a UAV target tracking problem, Meta-TD3 only requires a few steps to train to enable a UAV to adapt quickly to a new target movement mode more and maintain a better tracking effectiveness.

Download Full-text