matched molecular pairs
Recently Published Documents


TOTAL DOCUMENTS

46
(FIVE YEARS 17)

H-INDEX

14
(FIVE YEARS 0)

2021 ◽  
Author(s):  
Jiazhen He ◽  
Eva Nittinger ◽  
Christian Tyrchan ◽  
Werngard Czechtizky ◽  
Atanas Patronov ◽  
...  

Abstract Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although MMPs is a typical and widely used strategy by medicinal chemists, it offers limited capability in terms of exploring the space of solutions. There are more options to modify a starting molecule to achieve desirable properties, e.g. one can simultaneously modify the molecule at different places including changing the scaffold. This study trains the same Transformer architecture on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general transformations are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keep the scaffold constant) respectively. We investigate how the model behavior can be altered by tailoring the dataset while keeping the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that it reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.


2021 ◽  
Author(s):  
Jiazhen He ◽  
Eva Nittinger ◽  
Christian Tyrchan ◽  
Werngard Czechtizky ◽  
Atanas Patronov ◽  
...  

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although MMPs is a typical and widely used strategy by medicinal chemists, it offers limited capability in terms of exploring the space of solutions. There are more options to modify a starting molecule to achieve desirable properties, e.g. one can simultaneously modify the molecule at different places including changing the scaffold. This study trains the same Transformer architecture on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general transformations are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keep the scaffold constant) respectively. We investigate how the model behavior can be altered by tailoring the dataset while keeping the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that it reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.


2021 ◽  
Author(s):  
Jiazhen He ◽  
Felix Mattsson ◽  
Marcus Forsberg ◽  
Esben Jannik Bjerrum ◽  
Ola Engkvist ◽  
...  

Finding molecules with a desirable balance of multiple properties is a main challenge in drug discovery. Here, we focus on the task of molecular optimization, where a starting molecule with promising properties needs to be further optimized towards the desirable properties. Typically, chemists would apply chemical transformations to the starting molecule based on their intuition. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. In particular, a chemist would be interested in keeping one part of the starting molecule (core) constant, while substituting the other part (R-group), to optimize the starting molecule towards desirable properties. Motivated by this, we train a Transformer model, Transformer-R, to generate R-groups given the starting molecule (with its core and R-group specified) and the specified desirable properties. The generated R-groups will be attached to the core to form the final molecules, which are guaranteed to keep the core of interest and are expected to satisfy the desirable properties in the input. Our model could accelerate the process of optimizing antiviral drug candidates in terms of various properties of interest, e.g. pharmacokinetics.


2021 ◽  
Author(s):  
Jiazhen He ◽  
Felix Mattsson ◽  
Marcus Forsberg ◽  
Esben Jannik Bjerrum ◽  
Ola Engkvist ◽  
...  

Finding molecules with a desirable balance of multiple properties is a main challenge in drug discovery. Here, we focus on the task of molecular optimization, where a starting molecule with promising properties needs to be further optimized towards the desirable properties. Typically, chemists would apply chemical transformations to the starting molecule based on their intuition. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. In particular, a chemist would be interested in keeping one part of the starting molecule (core) constant, while substituting the other part (R-group), to optimize the starting molecule towards desirable properties. Motivated by this, we train a Transformer model, Transformer-R, to generate R-groups given the starting molecule (with its core and R-group specified) and the specified desirable properties. The generated R-groups will be attached to the core to form the final molecules, which are guaranteed to keep the core of interest and are expected to satisfy the desirable properties in the input. Our model could accelerate the process of optimizing antiviral drug candidates in terms of various properties of interest, e.g. pharmacokinetics.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jiazhen He ◽  
Huifang You ◽  
Emil Sandström ◽  
Eva Nittinger ◽  
Esben Jannik Bjerrum ◽  
...  

AbstractA main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist’s intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.


2020 ◽  
Author(s):  
Jiazhen He ◽  
Huifang You ◽  
Emil Sandström ◽  
Eva Nittinger ◽  
Esben Bjerrum ◽  
...  

Abstract A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.


2020 ◽  
Author(s):  
Jiazhen He ◽  
huifang you ◽  
Emil Sandström ◽  
eva nittinger ◽  
Esben Jannik Bjerrum ◽  
...  

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: <i>logD</i>, <i>solubility</i>, and <i>clearance</i>, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.


2020 ◽  
Author(s):  
Jiazhen He ◽  
huifang you ◽  
Emil Sandström ◽  
eva nittinger ◽  
Esben Jannik Bjerrum ◽  
...  

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: <i>logD</i>, <i>solubility</i>, and <i>clearance</i>, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.


2020 ◽  
Vol 60 (10) ◽  
pp. 4757-4771
Author(s):  
James A. Lumley ◽  
Prashant Desai ◽  
Jibo Wang ◽  
Suntara Cahya ◽  
Hongzhou Zhang

Sign in / Sign up

Export Citation Format

Share Document