Integrating Synthetic Accessibility with AI-based Generative Drug Design

Generative models are frequently used for de novo design in drug discovery projects to propose new molecules. However, whether the generated molecules can be synthesized is not systematically taken into account during generation, even though synthesizability is a fundamental requirement for such methods to be useful in practice. Methods have been developed to estimate molecule synthesizability, but, so far, there is no consensus on whether or not a molecule is synthesizable. In this paper we introduce the Retro-Score (RScore), which computes a synthetic feasibility score for molecules by performing a full retrosynthetic analysis through our data-driven synthetic planning software Spaya and its dedicated API, Spaya-API (https://spaya.ai). After comparing RScore with other synthetic scores from the literature, we describe a pipeline to generate molecules that validate a list of targets while still being easy to synthesize. We extend this idea with experiments comparing molecular generator outputs across a range of constraints and conditions. We show that the RScore can be learned by a neural network, which leads to a new score: RSPred. We demonstrate that using the RScore or RSPred as a constraint during molecular generation yields more synthesizable solutions with higher diversity. The open-source Python code containing all the scores and the experiments can be found at https://github.com/iktos/generation-under-synthetic-constraint.
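
One simple way to picture a synthesizability score acting as a constraint during generation is a reward that is set to zero whenever the score falls below a cutoff. The sketch below is illustrative only: `constrained_reward`, `rank_candidates`, the assumed 0-to-1 score ranges, and the 0.5 cutoff are hypothetical names and values chosen for this example. It does not call Spaya-API and is not the RScore or RSPred implementation.

```python
from typing import Iterable, List, Tuple


def constrained_reward(target_score: float, synth_score: float,
                       min_synth: float = 0.5) -> float:
    """Return the target score only if the synthesizability cutoff is met.

    Both scores are assumed to lie in [0, 1]; min_synth is a hypothetical cutoff.
    """
    if not (0.0 <= target_score <= 1.0 and 0.0 <= synth_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return target_score if synth_score >= min_synth else 0.0


def rank_candidates(candidates: Iterable[Tuple[str, float, float]],
                    min_synth: float = 0.5) -> List[Tuple[str, float]]:
    """Rank (name, target_score, synth_score) tuples by constrained reward."""
    scored = [(name, constrained_reward(t, s, min_synth))
              for name, t, s in candidates]
    # Drop candidates whose reward was zeroed out, then sort best first.
    kept = [item for item in scored if item[1] > 0.0]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for three hypothetical molecules.
    toy = [("mol_a", 0.9, 0.2), ("mol_b", 0.7, 0.8), ("mol_c", 0.8, 0.6)]
    print(rank_candidates(toy))
```

A hard cutoff is only one option; a generator could equally multiply the two scores or add a weighted penalty, and the abstract does not say which form the authors use.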

Fine-tuning of a generative neural network for designing multi-target compounds

Journal of Computer-Aided Molecular Design ◽ 10.1007/s10822-021-00392-8 ◽ 2021

Author(s): Thomas Blaschke ◽ Jürgen Bajorath

Keyword(s): Neural Network ◽ Small Molecules ◽ De Novo ◽ Pharmaceutical Research ◽ Generative Models ◽ Fine Tuning ◽ Data Sets ◽ Single Target ◽ Fine Tune ◽ Target Activity

Exploring the origin of multi-target activity of small molecules and designing new multi-target compounds are highly topical issues in pharmaceutical research. We have investigated the ability of a generative neural network to create multi-target compounds. Data sets of experimentally confirmed multi-target, single-target, and consistently inactive compounds were extracted from public screening data considering positive and negative assay results. These data sets were used to fine-tune the REINVENT generative model via transfer learning to systematically recognize multi-target compounds, distinguish them from single-target or inactive compounds, and construct new multi-target compounds. During fine-tuning, the model showed a clear tendency to increasingly generate multi-target compounds and structural analogs. Our findings indicate that generative models can be adopted for de novo multi-target compound design.
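
To make the data-set construction step more concrete, the sketch below groups compounds into multi-target, single-target, and inactive classes from a list of assay outcomes. Everything in it is an assumption for illustration: the `(compound, target, is_active)` record layout, the function name `label_compounds`, and the `min_targets` threshold are hypothetical, and the abstract does not state the selection criteria the authors actually applied.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def label_compounds(records: Iterable[Tuple[str, str, bool]],
                    min_targets: int = 2) -> Dict[str, str]:
    """Label each compound from (compound, target, is_active) assay records.

    min_targets is a hypothetical threshold for calling a compound multi-target.
    """
    tested: Dict[str, Set[str]] = defaultdict(set)
    active: Dict[str, Set[str]] = defaultdict(set)
    for compound, target, is_active in records:
        tested[compound].add(target)
        if is_active:
            active[compound].add(target)

    labels: Dict[str, str] = {}
    for compound in tested:
        n_active = len(active[compound])
        if n_active >= min_targets:
            labels[compound] = "multi-target"
        elif n_active == 1:
            labels[compound] = "single-target"
        elif n_active == 0:
            # Tested at least once and never active.
            labels[compound] = "inactive"
        else:
            # Active on several targets but below the chosen threshold.
            labels[compound] = "unassigned"
    return labels


if __name__ == "__main__":
    # Made-up assay outcomes for three hypothetical compounds.
    toy_records = [
        ("cpd_1", "T1", True), ("cpd_1", "T2", True),
        ("cpd_2", "T1", True), ("cpd_2", "T2", False),
        ("cpd_3", "T1", False), ("cpd_3", "T3", False),
    ]
    print(label_compounds(toy_records))
```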

Retrosynthetic Accessibility Score (RAscore) - Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning

10.26434/chemrxiv.13019993.v1 ◽ 2020

Author(s): Amol Thakkar ◽ Veronika Chadimova ◽ Esben Jannik Bjerrum ◽ Ola Engkvist ◽ Jean-Louis Reymond

Keyword(s): Artificial Intelligence ◽ Machine Learning ◽ Virtual Screening ◽ Generative Models ◽ Synthetic Route ◽ Synthetic Accessibility ◽ Wide Range ◽ Computer Aided ◽ Synthesis Planning ◽ Retrosynthetic Analysis

Computer aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI) based tools that are able to propose syntheses for a wide range of compounds. However, at present these tools are too slow to screen the synthetic feasibility of millions of generated or enumerated compounds before potential bioactivity is identified by virtual screening (VS) workflows. Herein we report a machine learning (ML) based method capable of classifying whether or not a synthetic route can be identified for a particular compound by the CASP tool AiZynthFinder. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute 4,500 times faster than the retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility, producing higher quality databases for virtual screening of biological activity.
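
The pre-screening idea can be sketched as a two-stage cascade in which a cheap score decides which molecules are worth passing to a slow route search. This is a minimal illustration under stated assumptions: `prescreen`, `toy_fast_score`, `toy_route_search`, the toy lookup tables, and the 0.5 threshold are all hypothetical stand-ins, not the RAscore model or the AiZynthFinder interface.

```python
from typing import Callable, Dict, List


def prescreen(molecules: List[str],
              fast_score: Callable[[str], float],
              slow_route_search: Callable[[str], bool],
              threshold: float = 0.5) -> Dict[str, List[str]]:
    """Run the slow route search only on molecules that pass the fast score."""
    result: Dict[str, List[str]] = {"rejected": [], "route_found": [], "no_route": []}
    for mol in molecules:
        if fast_score(mol) < threshold:
            # Cheap rejection: the expensive search is never called here.
            result["rejected"].append(mol)
        elif slow_route_search(mol):
            result["route_found"].append(mol)
        else:
            result["no_route"].append(mol)
    return result


# Toy stand-ins (assumptions): a lookup table plays the fast classifier,
# and a set of names plays the slow retrosynthetic planner.
TOY_SCORES = {"mol_a": 0.9, "mol_b": 0.1, "mol_c": 0.7}
TOY_ROUTES = {"mol_a"}


def toy_fast_score(mol: str) -> float:
    return TOY_SCORES.get(mol, 0.0)


def toy_route_search(mol: str) -> bool:
    return mol in TOY_ROUTES


if __name__ == "__main__":
    print(prescreen(["mol_a", "mol_b", "mol_c"], toy_fast_score, toy_route_search))
```

In this toy run the slow search is invoked for two of the three molecules; the saving grows with the fraction of candidates the fast score rejects.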

Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph

10.26434/chemrxiv.14608296.v1 ◽ 2021

Author(s): Baiqing Li ◽ Hongming Chen

Keyword(s): Deep Learning ◽ Prediction Models ◽ De Novo ◽ Quantitative Estimation ◽ Reaction Network ◽ Generative Models ◽ Machine Learning Algorithms ◽ Scoring Functions ◽ Synthetic Accessibility ◽ Learning Machine

With the increasing application of deep learning based generative models for de novo molecule design, quantitative estimation of molecular synthetic accessibility becomes a crucial factor for prioritizing the structures generated from generative models. It is also useful for prioritizing hit/lead compounds and guiding retrosynthesis analysis. In the current study, based on the USPTO and Pistachio reaction datasets, we created a chemical reaction network in which a depth-first search was performed to identify the reaction paths of product compounds. This reaction dataset was then used to build a predictive model for classifying organic compounds as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Three synthesis accessibility (SA) models were built using deep learning/machine learning algorithms. Our three SA scoring functions were also compared with existing synthesis accessibility scoring schemes such as SYBA, SCScore, and SAScore, and the graph-based deep learning model outperforms those existing SA scores. Our results show that prediction models based on historical reaction knowledge could be a useful tool for measuring molecule complexity and estimating molecule SA.
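
A depth-first search over a reaction network can be sketched on a toy graph as follows. The network layout (a dict mapping each product to its known reactions), the compound names, the building-block set, and the function `route_depth` are assumptions made up for this example; the abstract does not describe the authors' data structures or how path length maps to the ES/HS labels.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy network: product -> list of reactions, each a list of reactants.
REACTIONS: Dict[str, List[List[str]]] = {
    "P": [["I1", "B1"], ["X"]],
    "I1": [["B2", "B3"]],
    "X": [["Y"]],  # Y has no known synthesis and is not a building block
}
BUILDING_BLOCKS: Set[str] = {"B1", "B2", "B3"}


def route_depth(compound: str,
                reactions: Dict[str, List[List[str]]],
                building_blocks: Set[str],
                max_depth: int = 10,
                _visiting: Optional[Set[str]] = None) -> Optional[int]:
    """Depth-first search for the number of reaction steps back to building blocks.

    Returns the smallest depth found within max_depth, or None if no route exists.
    """
    if compound in building_blocks:
        return 0
    if max_depth == 0:
        return None
    visiting = _visiting if _visiting is not None else set()
    if compound in visiting:  # cycle guard: compound is already on the current path
        return None
    visiting.add(compound)
    best: Optional[int] = None
    for reactants in reactions.get(compound, []):
        depths = [route_depth(r, reactions, building_blocks, max_depth - 1, visiting)
                  for r in reactants]
        # A reaction is usable only if every reactant can itself be reached.
        if all(d is not None for d in depths):
            depth = 1 + max(depths, default=0)
            if best is None or depth < best:
                best = depth
    visiting.discard(compound)
    return best


if __name__ == "__main__":
    print(route_depth("P", REACTIONS, BUILDING_BLOCKS))
    print(route_depth("X", REACTIONS, BUILDING_BLOCKS))
```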

De Novo Protein Design for Novel Folds using Guided Conditional Wasserstein Generative Adversarial Networks (gcWGAN)

10.1101/769919 ◽ 2019 ◽ Cited By ~ 4

Author(s): Mostafa Karimi ◽ Shaowen Zhu ◽ Yue Cao ◽ Yang Shen

Keyword(s): Protein Design ◽ Sequence Space ◽ De Novo ◽ Sequence Data ◽ Generative Models ◽ Current Data ◽ Data Driven ◽ Supplementary Information ◽ Generative Adversarial Networks ◽ Sequence Structure

Motivation: With data on protein sequence and structure accumulating quickly, this study addresses the following question: to what extent could current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds? Results: We have developed novel deep generative models, constructed a low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and found to give higher yields and cover 3.6 times more target folds than a competing data-driven method (cVAE). Assessed with a structure predictor over representative novel folds (including one not even part of the basis folds), gcWGAN designs are found to have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data. The ultra-fast data-driven model can be a powerful addition to principle-driven design methods by generating seed designs or tailoring sequence space. Availability: Data and source codes will be available upon request ([email protected]). Supplementary information: Supplementary data are available at Bioinformatics online.

‘Ring Breaker’: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space

10.26434/chemrxiv.9938969.v3 ◽ 2020

Author(s): Amol Thakkar ◽ Nidhal Selmi ◽ Jean-Louis Reymond ◽ Ola Engkvist ◽ Esben Jannik Bjerrum

Keyword(s): Neural Network ◽ Chemical Space ◽ Ring System ◽ Data Driven ◽ Ring Systems ◽ The Neural Network ◽ Synthetic Accessibility ◽ Synthesis Planning ◽ Data Driven Approach ◽ Synthetic Routes

Ring systems in pharmaceuticals, agrochemicals and dyes are ubiquitous chemical motifs. Whilst the synthesis of common ring systems is well described, and novel ring systems can be readily computationally enumerated, the synthetic accessibility of unprecedented ring systems remains a challenge. ‘Ring Breaker’ uses a data-driven approach to predict ring-forming reactions, and we have demonstrated its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. We demonstrate the performance of the neural network on a range of ring fragments from the ZINC and DrugBank databases and highlight its potential for incorporation into computer aided synthesis planning tools. These approaches to ring formation and retrosynthetic disconnection offer opportunities for chemists to explore and select more efficient syntheses/synthetic routes.
