A multi-objective deep reinforcement learning framework

2020 ◽  
Vol 96 ◽  
pp. 103915 ◽  
Author(s):  
Thanh Thi Nguyen ◽  
Ngoc Duy Nguyen ◽  
Peter Vamplew ◽  
Saeid Nahavandi ◽  
Richard Dazeley ◽  
...  
2021 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman Van Vlijmen ◽  
Michael T. M. Emmerich ◽  
Adriaan P. IJzerman ◽  
...  

<p>In polypharmacology, ideal drugs are required to bind to multiple specific targets to enhance efficacy or to reduce resistance formation. Although deep learning has achieved a breakthrough in drug discovery, most of its applications focus on only a single drug target when generating drug-like active molecules, despite the reality that drug molecules often interact with more than one target, which can have desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a method named <i>DrugEx</i> that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extended our <i>DrugEx</i> algorithm with multi-objective optimization to generate drug molecules towards more than one specific target (two adenosine receptors, A<sub>1</sub>AR and A<sub>2A</sub>AR, and the potassium ion channel hERG in this study). In our model, we applied an RNN as the <i>agent</i> and machine learning predictors as the <i>environment</i>; both were pre-trained in advance and then interplayed under the reinforcement learning framework. The concept of evolutionary algorithms was merged into our method such that <i>crossover</i> and <i>mutation</i> operations were implemented by the same deep learning model as the <i>agent</i>. During the training loop, the agent generates a batch of SMILES-based molecules. Subsequently, scores for all objectives provided by the <i>environment</i> are used to construct Pareto ranks of the generated molecules with non-dominated sorting and Tanimoto-based crowding distance algorithms. Here, we adopted GPU acceleration to speed up the Pareto optimization. The final reward of each molecule is calculated from its Pareto rank with a ranking selection algorithm. The agent is trained under the guidance of the reward to ensure that it can generate more desired molecules after convergence of the training process. All in all, we demonstrate generation of compounds with a diverse predicted selectivity profile toward multiple targets, offering the potential of high efficacy and lower toxicity.</p>
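The non-dominated sorting that underlies the Pareto ranking described above can be illustrated with a minimal sketch. This is not the authors' implementation: the score values and the simple rank-based reward below are illustrative assumptions, and all objective scores are assumed to be maximized.

```python
import numpy as np

def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives maximized)."""
    return bool(np.all(a >= b) and np.any(a > b))

def non_dominated_sort(scores):
    """Group molecule indices into Pareto fronts, best front first.

    scores: (n_molecules, n_objectives) array of environment scores.
    """
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        # a molecule is in the current front if nothing remaining dominates it
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

def rank_reward(fronts, n):
    """Toy rank-based reward: molecules in better fronts get higher reward."""
    reward = np.empty(n)
    for rank, front in enumerate(fronts):
        reward[front] = 1.0 - rank / len(fronts)
    return reward
```

In a full pipeline, ties within a front would additionally be broken by the Tanimoto-based crowding distance mentioned in the abstract, so that diverse molecules are preferred.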


2020 ◽  
Vol 17 (2) ◽  
pp. 172988142091149
Author(s):  
Kai Zhang ◽  
Sterling McLeod ◽  
Minwoo Lee ◽  
Jing Xiao

This article introduces a continuous reinforcement learning framework that enables online adaptation of multi-objective optimization functions for guiding a mobile robot moving in changing dynamic environments. A robot with this framework can continuously learn from multiple or changing environments in which it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt its motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various, dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaptation to different, unknown, and dynamic environments.
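The online adaptation loop described above can be illustrated with a heavily simplified sketch. The paper uses a Q network connected to an LSTM over trajectory observations; here a linear Q approximation over hand-picked trajectory summary features stands in for that network, and the feature dimension, action set (candidate weight vectors for the multi-objective cost), and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3   # candidate objective-weight vectors the agent can switch between
N_FEATURES = 4  # summary features of recent planned/executed trajectories

W = np.zeros((N_ACTIONS, N_FEATURES))  # linear stand-in for the Q+LSTM network

def q_values(obs):
    """Q-value of each candidate action given the current observation features."""
    return W @ obs

def act(obs, eps=0.1):
    """Epsilon-greedy action selection for continual online learning."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(obs)))

def td_update(obs, action, reward, next_obs, alpha=0.1, gamma=0.9):
    """One-step temporal-difference update of the linear Q approximation."""
    target = reward + gamma * np.max(q_values(next_obs))
    td_err = target - q_values(obs)[action]
    W[action] += alpha * td_err * obs
    return td_err
```

Because the update runs after every executed trajectory segment, the agent can keep adapting as the number and motion of obstacles change, which is the essence of the continuous learning setting described in the article.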


2020 ◽  
Author(s):  
Luca Caviglione ◽  
Mauro Gaggero ◽  
Massimo Paolucci ◽  
Roberto Ronco

The ubiquitous diffusion of cloud computing requires suitable management policies to handle the workload while guaranteeing quality constraints and mitigating costs. The typical trade-off is between the power used and adherence to a service-level metric subscribed to by customers. To this end, a possible approach is an optimization-based placement mechanism that selects the servers on which to deploy virtual machines. Unfortunately, high packing factors can lead to performance and security issues; e.g., virtual machines can compete for hardware resources or collude to leak data. Therefore, we introduce a multi-objective approach to compute optimal placement strategies considering different goals, such as the impact of hardware outages, the power required by the datacenter, and the performance perceived by users. Placement strategies are found by using a deep reinforcement learning framework to select the best placement heuristic for each virtual machine composing the workload. Results indicate that our method outperforms bin packing heuristics widely used in the literature for both synthetic and real workloads.
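The per-virtual-machine heuristic selection described above can be illustrated with a minimal sketch of candidate bin-packing heuristics. The specific heuristic set (first-fit, best-fit, worst-fit) and the single-resource capacity model are illustrative assumptions, not the paper's exact formulation; the learning agent would choose one such heuristic for each arriving virtual machine.

```python
# Each heuristic maps (vm demand, list of free server capacities) -> server index.
# All raise if no server fits; a real scheduler would then open a new server.
HEURISTICS = {
    "first_fit": lambda vm, free: next(i for i, f in enumerate(free) if f >= vm),
    "best_fit":  lambda vm, free: min((i for i, f in enumerate(free) if f >= vm),
                                      key=lambda i: free[i] - vm),
    "worst_fit": lambda vm, free: max((i for i, f in enumerate(free) if f >= vm),
                                      key=lambda i: free[i] - vm),
}

def place(vm, free, name):
    """Place a VM with the named heuristic, updating free capacities in place."""
    i = HEURISTICS[name](vm, free)
    free[i] -= vm
    return i
```

In the multi-objective setting, the reward for a chosen heuristic would combine, e.g., the number of active servers (a power proxy) with penalties for outage risk and co-location, which is what makes learning the per-VM choice worthwhile over committing to one heuristic.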


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman W. T. van Vlijmen ◽  
Michael T. M. Emmerich ◽  
Adriaan P. IJzerman ◽  
...  

In polypharmacology, drugs are required to bind to multiple specific targets, for example to enhance efficacy or to reduce resistance formation. Although deep learning has achieved a breakthrough in de novo design in drug discovery, most of its applications focus on only a single drug target when generating drug-like active molecules. In reality, however, drug molecules often interact with more than one target, which can have desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a method named DrugEx that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extended our DrugEx algorithm with multi-objective optimization to generate drug-like molecules towards multiple targets or one specific target while avoiding off-targets (the two adenosine receptors, A1AR and A2AAR, and the potassium ion channel hERG in this study). In our model, we applied an RNN as the agent and machine learning predictors as the environment. Both the agent and the environment were pre-trained in advance and then interplayed under a reinforcement learning framework. The concept of evolutionary algorithms was merged into our method such that crossover and mutation operations were implemented by the same deep learning model as the agent. During the training loop, the agent generates a batch of SMILES-based molecules. Subsequently, scores for all objectives provided by the environment are used to construct Pareto ranks of the generated molecules. For this ranking, a non-dominated sorting algorithm and a Tanimoto-based crowding distance algorithm using chemical fingerprints are applied. Here, we adopted GPU acceleration to speed up the Pareto optimization. The final reward of each molecule is calculated from its Pareto rank with a ranking selection algorithm. The agent is trained under the guidance of the reward to ensure that it can generate desired molecules after convergence of the training process. All in all, we demonstrate generation of compounds with a diverse predicted selectivity profile towards multiple targets, offering the potential of high efficacy and low toxicity.
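The Tanimoto-based crowding distance over chemical fingerprints mentioned above can be illustrated with a minimal sketch on binary bit vectors. This is an illustrative stand-in, not the authors' implementation: here a molecule's diversity score within a front is simply one minus its Tanimoto similarity to its nearest neighbour, so isolated (structurally novel) molecules score highest.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprint vectors."""
    inter = np.sum(a & b)
    union = np.sum(a | b)
    return inter / union if union else 1.0  # two empty fingerprints: identical

def crowding(fps):
    """Diversity score per molecule within one Pareto front.

    fps: (n, n_bits) array of 0/1 fingerprints.
    Returns 1 - similarity to the nearest neighbour (higher = more isolated).
    """
    n = len(fps)
    scores = []
    for i in range(n):
        sims = [tanimoto(fps[i], fps[j]) for j in range(n) if j != i]
        scores.append(1.0 - max(sims) if sims else 1.0)
    return scores
```

Within a front, molecules with higher crowding scores would be preferred when assigning rewards, which is what pushes the generator towards a structurally diverse set of candidates rather than many near-duplicates of one good molecule.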

