Composing functions to speed up reinforcement learning in a changing world

Author(s):  
Chris Drummond

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1890 ◽  
Author(s):  
Zijian Hu ◽  
Kaifang Wan ◽  
Xiaoguang Gao ◽  
Yiwei Zhai ◽  
Qianglong Wang

Autonomous motion planning (AMP) of unmanned aerial vehicles (UAVs) aims to enable a UAV to fly safely to a target without human intervention. Recently, several emerging deep reinforcement learning (DRL) methods have been employed to address the AMP problem in simplified environments, with good results. This paper proposes a multiple experience pools (MEPs) framework that leverages human expert experience to speed up DRL training. Based on the deep deterministic policy gradient (DDPG) algorithm, an MEP–DDPG algorithm was designed that uses model predictive control and simulated annealing to generate expert experiences. When this algorithm was applied to a complex, unknown simulation environment built from the parameters of a real UAV, the training experiments showed that the new DRL algorithm improved performance by more than 20% compared with the state-of-the-art DDPG. Further testing indicates that UAVs trained with MEP–DDPG can stably complete a variety of tasks in complex, unknown environments.
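To make the multiple-experience-pools idea concrete, here is a minimal sketch of how expert and agent transitions might be kept in separate pools and mixed at sampling time for DDPG-style training. The class name, the 0.3 expert-to-agent mixing ratio, and the pool capacity are illustrative assumptions, not details from the paper.

```python
import random
from collections import deque

class MultiExperiencePool:
    """Sketch of an MEP-style replay buffer: one pool for the agent's
    own rollouts and one for expert transitions (e.g. generated by
    model predictive control plus simulated annealing, as the paper
    describes). Capacity and expert_ratio are illustrative."""

    def __init__(self, capacity=100_000, expert_ratio=0.3):
        self.agent_pool = deque(maxlen=capacity)
        self.expert_pool = deque(maxlen=capacity)
        self.expert_ratio = expert_ratio

    def add_agent(self, transition):
        self.agent_pool.append(transition)

    def add_expert(self, transition):
        self.expert_pool.append(transition)

    def sample(self, batch_size):
        # Mix expert and agent experience in a fixed ratio so early
        # training is guided by demonstrations while the agent's own
        # pool fills up; top up from the agent pool when expert data
        # is scarce.
        n_expert = min(int(batch_size * self.expert_ratio),
                       len(self.expert_pool))
        n_agent = min(batch_size - n_expert, len(self.agent_pool))
        batch = random.sample(list(self.expert_pool), n_expert)
        batch += random.sample(list(self.agent_pool), n_agent)
        return batch
```

In a DDPG training loop, `sample(batch_size)` would simply replace the usual single-buffer sampling call, leaving the actor and critic updates unchanged.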



2021 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman Van Vlijmen ◽  
Michael T. M. Emmerich ◽  
Adriaan P. IJzerman ◽  
...  

In polypharmacology, ideal drugs are required to bind to multiple specific targets to enhance efficacy or to reduce resistance formation. Although deep learning has achieved breakthroughs in drug discovery, most of its applications focus on a single drug target when generating drug-like active molecules, despite the reality that drug molecules often interact with more than one target, with desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a method named DrugEx that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extend the DrugEx algorithm with multi-objective optimization to generate drug molecules towards more than one specific target (in this study, two adenosine receptors, A1AR and A2AAR, and the potassium ion channel hERG). In our model, an RNN serves as the agent and machine learning predictors as the environment; both are pre-trained and then interact under the reinforcement learning framework. Concepts from evolutionary algorithms are merged into the method, with crossover and mutation operations implemented by the same deep learning model as the agent. During the training loop, the agent generates a batch of SMILES-based molecules. Scores for all objectives, provided by the environment, are then used to construct Pareto ranks of the generated molecules with non-dominated sorting and a Tanimoto-based crowding distance algorithm; GPU acceleration is used to speed up the Pareto optimization. The final reward of each molecule is calculated from its Pareto rank with a ranking selection algorithm, and the agent is trained under the guidance of this reward so that, after convergence, it generates more of the desired molecules. All in all, we demonstrate the generation of compounds with a diverse predicted selectivity profile toward multiple targets, offering the potential of high efficacy and lower toxicity.
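The Pareto-ranking step is the heart of the multi-objective reward. As a rough illustration (a plain O(n²)-per-front version, not the authors' GPU-accelerated implementation, and without the Tanimoto crowding-distance tie-breaking), non-dominated sorting over per-objective score vectors might look like this; the three-objective example scores are invented:

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and
    strictly better on at least one (higher score = better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated_sort(scores):
    """Return a Pareto rank for each score vector (0 = best front).
    Repeatedly peel off the set of currently non-dominated points."""
    remaining = set(range(len(scores)))
    ranks = [None] * len(scores)
    front = 0
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(scores[j], scores[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

# Illustrative use: each molecule scored on, say,
# (A1AR activity, A2AAR activity, hERG safety).
mols = [(0.9, 0.2, 0.8), (0.7, 0.7, 0.6), (0.4, 0.9, 0.5), (0.3, 0.3, 0.3)]
print(non_dominated_sort(mols))  # -> [0, 0, 0, 1]
```

Ranks produced this way can then be mapped to per-molecule rewards via a ranking selection scheme, as the abstract describes.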



2001 ◽  
Vol 121 (9) ◽  
pp. 948-955 ◽  
Author(s):  
Toru Yamaguchi ◽  
Makoto Takahide ◽  
Yoshimoto Nakamura ◽  
Naoki Kohata


2020 ◽  
Vol 10 (16) ◽  
pp. 5574 ◽  
Author(s):  
Ithan Moreira ◽  
Javier Rivas ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Angel Ayala ◽  
...  

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform ever more complex tasks and will therefore need to acquire experience from different sources as quickly as possible. A plausible approach to this issue is interactive feedback, in which a trainer advises a learner on which actions to take from specific states, speeding up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn about the environment and acquire new skills autonomously. However, an open issue with deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human–robot scenario. We compare three learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
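A generic interactive-feedback loop can be sketched in a few lines: with some probability the advisor (a pre-trained agent or a human through some input channel) supplies the action, and otherwise the learner acts autonomously. The fixed `feedback_prob` and the policy interfaces below are illustrative assumptions, not the paper's exact advising protocol.

```python
import random

def select_action(learner_policy, advisor_policy, state,
                  feedback_prob=0.3):
    """Sketch of interactive action selection for IDeepRL-style
    training. Returns the chosen action and a flag marking whether
    it came from the advisor (useful for logging or for weighting
    advised transitions differently in the replay buffer)."""
    if random.random() < feedback_prob:
        return advisor_policy(state), True   # advised action
    return learner_policy(state), False      # autonomous action
```

In agent–IDeepRL the `advisor_policy` would be a previously trained network's greedy policy; in human–IDeepRL it would wrap a human input device.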



2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize oscillation of the output action while maximizing the value of the action. Experiments on simulated robotic tasks with different hyperparameters verify the effectiveness of our method. The results show that it effectively solves the sparse-reward problem and achieves a high learning speed.
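The action-loss idea can be illustrated as a small modification to the standard deterministic-policy-gradient actor objective: maximize the critic's value of the actor's action while penalizing large outputs. The L2 penalty form, the 1e-2 weight, and the toy network shapes below are assumptions for the sketch, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def actor_loss(actor, critic, states, action_penalty=1e-2):
    """Sketch of a TD3-style actor objective with an added action
    loss. The critic here is assumed to score concatenated
    (state, action) pairs; the penalty damps large, jittery
    control commands."""
    actions = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    # Standard term: maximize Q, i.e. minimize -Q.
    # Added action loss: discourage large-magnitude outputs.
    return -q.mean() + action_penalty * (actions ** 2).mean()

# Illustrative check with toy networks (4-D state, 2-D action).
actor = nn.Sequential(nn.Linear(4, 2), nn.Tanh())
critic = nn.Linear(4 + 2, 1)
states = torch.randn(8, 4)
loss = actor_loss(actor, critic, states)
loss.backward()  # gradients flow to the actor through the critic
```

The penalty weight trades smoothness of the control signal against value maximization, so in practice it would be tuned alongside the other hyperparameters the abstract mentions.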



Nature ◽  
2021 ◽  
Vol 591 (7849) ◽  
pp. 229-233
Author(s):  
V. Saggio ◽  
B. E. Asenbeck ◽  
A. Hamann ◽  
T. Strömberg ◽  
P. Schiansky ◽  
...  

