Composing functions to speed up reinforcement learning in a changing world

Author(s):  
Chris Drummond

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1890 ◽  
Author(s):  
Zijian Hu ◽  
Kaifang Wan ◽  
Xiaoguang Gao ◽  
Yiwei Zhai ◽  
Qianglong Wang

Autonomous motion planning (AMP) of unmanned aerial vehicles (UAVs) aims to enable a UAV to fly safely to a target without human intervention. Recently, several emerging deep reinforcement learning (DRL) methods have been employed to address the AMP problem in simplified environments, with good results. This paper proposes a multiple experience pools (MEPs) framework that leverages human expert experience to speed up DRL training. Based on the deep deterministic policy gradient (DDPG) algorithm, an MEP–DDPG algorithm was designed that uses model predictive control and simulated annealing to generate expert experiences. When this algorithm was applied to a complex, unknown simulation environment built from the parameters of a real UAV, the training experiments showed that the new DRL algorithm improved performance by more than 20% compared with the state-of-the-art DDPG. Further testing indicates that UAVs trained with MEP–DDPG can stably complete a variety of tasks in complex, unknown environments.
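To make the multiple-experience-pools idea concrete, here is a minimal sketch of how expert and agent transitions might be kept in separate pools and mixed at sampling time for DDPG-style training. The class name, the 0.3 expert-to-agent mixing ratio, and the pool capacity are illustrative assumptions, not details from the paper.

```python
import random
from collections import deque

class MultiExperiencePool:
    """Sketch of an MEP-style replay buffer: one pool for the agent's
    own rollouts and one for expert transitions (e.g. generated by
    model predictive control plus simulated annealing, as the paper
    describes). Capacity and expert_ratio are illustrative."""

    def __init__(self, capacity=100_000, expert_ratio=0.3):
        self.agent_pool = deque(maxlen=capacity)
        self.expert_pool = deque(maxlen=capacity)
        self.expert_ratio = expert_ratio

    def add_agent(self, transition):
        self.agent_pool.append(transition)

    def add_expert(self, transition):
        self.expert_pool.append(transition)

    def sample(self, batch_size):
        # Mix expert and agent experience in a fixed ratio so early
        # training is guided by demonstrations while the agent's own
        # pool fills up; top up from the agent pool when expert data
        # is scarce.
        n_expert = min(int(batch_size * self.expert_ratio),
                       len(self.expert_pool))
        n_agent = min(batch_size - n_expert, len(self.agent_pool))
        batch = random.sample(list(self.expert_pool), n_expert)
        batch += random.sample(list(self.agent_pool), n_agent)
        return batch
```

In a DDPG training loop, `sample(batch_size)` would simply replace the usual single-buffer sampling call, leaving the actor and critic updates unchanged.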



2021 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman Van Vlijmen ◽  
Michael T. M. Emmerich ◽  
Adriaan P. IJzerman ◽  
...  

In polypharmacology, ideal drugs are required to bind to multiple specific targets to enhance efficacy or to reduce resistance formation. Although deep learning has achieved breakthroughs in drug discovery, most of its applications focus on a single drug target when generating drug-like active molecules, despite the reality that drug molecules often interact with more than one target, with desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a method named DrugEx that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extend the DrugEx algorithm with multi-objective optimization to generate drug molecules towards more than one specific target (in this study, two adenosine receptors, A1AR and A2AAR, and the potassium ion channel hERG). In our model, an RNN serves as the agent and machine learning predictors as the environment; both are pre-trained and then interact under the reinforcement learning framework. Concepts from evolutionary algorithms are merged into the method, with crossover and mutation operations implemented by the same deep learning model as the agent. During the training loop, the agent generates a batch of SMILES-based molecules. Scores for all objectives, provided by the environment, are then used to construct Pareto ranks of the generated molecules with non-dominated sorting and a Tanimoto-based crowding distance algorithm; GPU acceleration is used to speed up the Pareto optimization. The final reward of each molecule is calculated from its Pareto rank with a ranking selection algorithm, and the agent is trained under the guidance of this reward so that, after convergence, it generates more of the desired molecules. All in all, we demonstrate the generation of compounds with a diverse predicted selectivity profile toward multiple targets, offering the potential of high efficacy and lower toxicity.
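The Pareto-ranking step is the heart of the multi-objective reward. As a rough illustration (a plain O(n²)-per-front version, not the authors' GPU-accelerated implementation, and without the Tanimoto crowding-distance tie-breaking), non-dominated sorting over per-objective score vectors might look like this; the three-objective example scores are invented:

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and
    strictly better on at least one (higher score = better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated_sort(scores):
    """Return a Pareto rank for each score vector (0 = best front).
    Repeatedly peel off the set of currently non-dominated points."""
    remaining = set(range(len(scores)))
    ranks = [None] * len(scores)
    front = 0
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(scores[j], scores[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

# Illustrative use: each molecule scored on, say,
# (A1AR activity, A2AAR activity, hERG safety).
mols = [(0.9, 0.2, 0.8), (0.7, 0.7, 0.6), (0.4, 0.9, 0.5), (0.3, 0.3, 0.3)]
print(non_dominated_sort(mols))  # -> [0, 0, 0, 1]
```

Ranks produced this way can then be mapped to per-molecule rewards via a ranking selection scheme, as the abstract describes.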



2001 ◽  
Vol 121 (9) ◽  
pp. 948-955 ◽  
Author(s):  
Toru Yamaguchi ◽  
Makoto Takahide ◽  
Yoshimoto Nakamura ◽  
Naoki Kohata


2020 ◽  
Vol 10 (16) ◽  
pp. 5574 ◽  
Author(s):  
Ithan Moreira ◽  
Javier Rivas ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Angel Ayala ◽  
...  

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform ever more complex tasks and will therefore need to acquire experience from different sources as quickly as possible. A plausible approach to this issue is interactive feedback, in which a trainer advises a learner on which actions to take from specific states, speeding up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn about the environment and acquire new skills autonomously. However, an open issue with deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human–robot scenario. We compare three learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
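A generic interactive-feedback loop can be sketched in a few lines: with some probability the advisor (a pre-trained agent or a human through some input channel) supplies the action, and otherwise the learner acts autonomously. The fixed `feedback_prob` and the policy interfaces below are illustrative assumptions, not the paper's exact advising protocol.

```python
import random

def select_action(learner_policy, advisor_policy, state,
                  feedback_prob=0.3):
    """Sketch of interactive action selection for IDeepRL-style
    training. Returns the chosen action and a flag marking whether
    it came from the advisor (useful for logging or for weighting
    advised transitions differently in the replay buffer)."""
    if random.random() < feedback_prob:
        return advisor_policy(state), True   # advised action
    return learner_policy(state), False      # autonomous action
```

In agent–IDeepRL the `advisor_policy` would be a previously trained network's greedy policy; in human–IDeepRL it would wrap a human input device.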



2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize oscillation of the output action while maximizing the value of the action. Experiments on simulated robotic tasks with different hyperparameters verify the effectiveness of our method. The results show that it effectively solves the sparse-reward problem and achieves a high learning speed.
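The action-loss idea can be illustrated as a small modification to the standard deterministic-policy-gradient actor objective: maximize the critic's value of the actor's action while penalizing large outputs. The L2 penalty form, the 1e-2 weight, and the toy network shapes below are assumptions for the sketch, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def actor_loss(actor, critic, states, action_penalty=1e-2):
    """Sketch of a TD3-style actor objective with an added action
    loss. The critic here is assumed to score concatenated
    (state, action) pairs; the penalty damps large, jittery
    control commands."""
    actions = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    # Standard term: maximize Q, i.e. minimize -Q.
    # Added action loss: discourage large-magnitude outputs.
    return -q.mean() + action_penalty * (actions ** 2).mean()

# Illustrative check with toy networks (4-D state, 2-D action).
actor = nn.Sequential(nn.Linear(4, 2), nn.Tanh())
critic = nn.Linear(4 + 2, 1)
states = torch.randn(8, 4)
loss = actor_loss(actor, critic, states)
loss.backward()  # gradients flow to the actor through the critic
```

The penalty weight trades smoothness of the control signal against value maximization, so in practice it would be tuned alongside the other hyperparameters the abstract mentions.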



Nature ◽  
2021 ◽  
Vol 591 (7849) ◽  
pp. 229-233
Author(s):  
V. Saggio ◽  
B. E. Asenbeck ◽  
A. Hamann ◽  
T. Strömberg ◽  
P. Schiansky ◽  
...  

