Trends in Deep Learning for Property-driven Drug Design

2021 ◽  
Vol 28 ◽  
Author(s):  
Jannis Born ◽  
Matteo Manica

It is more pressing than ever to reduce the time and costs for developing lead compounds in the pharmaceutical industry. The co-occurrence of advances in high-throughput screening and the rise of deep learning (DL) has enabled the development of large-scale multimodal predictive models for virtual drug screening. Recently, deep generative models have emerged as a powerful tool for exploring the chemical space, raising hopes of expediting the drug discovery process. Following this progress in chemocentric approaches for generative chemistry, the next challenge is to build multimodal conditional generative models that leverage disparate knowledge sources when mapping biochemical properties to target structures. Here, we call on the community to bridge drug discovery more closely with systems biology when designing deep generative models. Complementing the plethora of reviews on the role of DL in chemoinformatics, we herein specifically focus on the interface of predictive and generative modeling for drug discovery. Through a systematic publication keyword search on PubMed and a selection of preprint servers (arXiv, bioRxiv, ChemRxiv, and medRxiv), we quantify trends in the field and find that molecular graphs and VAEs have become the most widely adopted molecular representations and architectures in generative models, respectively. We discuss progress on DL for toxicity, drug-target affinity, and drug sensitivity prediction and specifically focus on conditional molecular generative models that encompass multimodal prediction models. Moreover, we outline prospects in the field and identify challenges such as the integration of deep learning systems into experimental workflows in a closed-loop manner or the adoption of federated machine learning techniques to overcome data sharing barriers. Other challenges include, but are not limited to, interpretability in generative models, more sophisticated metrics for the evaluation of molecular generative models, and, following up on that, community-accepted benchmarks for both multimodal drug property prediction and property-driven molecular design.

2021 ◽  
Author(s):  
Quentin Perron ◽  
Olivier Mirguet ◽  
Hamza Tajmouati ◽  
Adam Skiredj ◽  
Anne Rojas ◽  
...  

Multi-Parameter Optimization (MPO) is a major challenge in New Chemical Entity (NCE) drug discovery projects, and the inability to identify molecules meeting all the criteria of lead optimization (LO) is an important cause of NCE project failure. Several ligand- and structure-based de novo design methods have been published over the past decades, some of which have proved useful for multiobjective optimization. However, there is still a need for improvement to better address the chemical feasibility of generated compounds and to increase the explored chemical space while tackling the MPO challenge. Recently, promising results have been reported for deep learning generative models applied to de novo molecular design, but until now, to our knowledge, no report has been made of the value of this new technology for addressing MPO in an actual drug discovery project. Our objective in this study was to evaluate the potential of a ligand-based de novo design technology using deep learning generative models to accelerate the discovery of an optimized lead compound meeting all in vitro late-stage LO criteria.
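The idea of a molecule "meeting all the criteria" of lead optimization can be illustrated with a minimal Python sketch. The property names and thresholds below are hypothetical placeholders chosen for illustration; they are not taken from the study, and a real project would define its own criteria.

```python
from typing import Callable, Dict

# A criterion maps a measured or predicted property value to pass/fail.
Criterion = Callable[[float], bool]

# Hypothetical lead-optimization criteria (illustrative only, not from the study).
HYPOTHETICAL_CRITERIA: Dict[str, Criterion] = {
    "potency_pIC50": lambda v: v >= 7.0,
    "logD": lambda v: 1.0 <= v <= 3.0,
    "solubility_uM": lambda v: v >= 50.0,
}


def criteria_met(properties: Dict[str, float],
                 criteria: Dict[str, Criterion]) -> Dict[str, bool]:
    """Report pass/fail per criterion; a missing property counts as a failure."""
    return {
        name: name in properties and check(properties[name])
        for name, check in criteria.items()
    }


def meets_all(properties: Dict[str, float],
              criteria: Dict[str, Criterion]) -> bool:
    """True only if every criterion passes, which is the MPO success condition."""
    return all(criteria_met(properties, criteria).values())


if __name__ == "__main__":
    candidate = {"potency_pIC50": 7.4, "logD": 2.1, "solubility_uM": 30.0}
    print(criteria_met(candidate, HYPOTHETICAL_CRITERIA))
    print(meets_all(candidate, HYPOTHETICAL_CRITERIA))
```

Because every criterion must pass at once, a compound that excels on most properties but fails a single one is still rejected, which is one way to see why MPO is hard.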


Author(s):  
Benedict Irwin ◽  
Thomas Whitehead ◽  
Scott Rowland ◽  
Samar Mahmoud ◽  
Gareth Conduit ◽  
...  

More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest-to-date successful application of deep-learning imputation to datasets which are comparable in size to the corporate data repository of a pharmaceutical company (678,994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: (i) target activity data compiled from a range of drug discovery projects, (ii) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism and elimination properties, and (iii) high-throughput screening data, testing the algorithm’s limits on early-stage noisy and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36 and 0.43 respectively across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19 and 0.23 respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracies in prediction, enabling greater confidence in decision-making based on the imputed values.
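For readers unfamiliar with the statistic reported above, the following Python sketch shows the standard definition of the coefficient of determination and how a median over several endpoints could be taken. It is a generic illustration, not the authors' evaluation code; the function names and the toy data are assumptions.

```python
from statistics import median
from typing import Dict, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


def median_r_squared(
    endpoints: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Median R^2 over endpoints, each given as (observed, predicted)."""
    return median(r_squared(obs, pred) for obs, pred in endpoints.values())


if __name__ == "__main__":
    # Toy data with hypothetical endpoint names, for illustration only.
    toy = {
        "endpoint_a": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
        "endpoint_b": ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),
        "endpoint_c": ([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]),
    }
    print(median_r_squared(toy))
```

An R² of 1 means perfect prediction, 0 means the model does no better than predicting the mean of the observed values, and negative values are possible for models that do worse than that baseline.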


Molecules ◽  
2020 ◽  
Vol 25 (22) ◽  
pp. 5277
Author(s):  
Lauv Patel ◽  
Tripti Shukla ◽  
Xiuzhen Huang ◽  
David W. Ussery ◽  
Shanzhi Wang

Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. Methods for designing drug targets and discovering novel drugs now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and of the wealth of available online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications and methods that produce promising results will be reviewed.


2021 ◽  
Author(s):  
Jie Zhang ◽  
Rocío Mercado ◽  
Ola Engkvist ◽  
Hongming Chen

In recent years, deep molecular generative models have emerged as novel methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks, generative autoencoders, and adversarial networks, to give a few examples, have been employed for constructing generative models. However, so far the metrics used to evaluate these deep generative models are not discriminative enough to separate the performance of various state-of-the-art generative models. This work presents a novel metric for evaluating deep molecular generative models; this new metric is based on the chemical space coverage of a reference database, and compares not only the molecular structures, but also the ring systems and functional groups, reproduced from a reference dataset of 1M structures. In this study, the performance of 7 different molecular generative models was compared by calculating their structure and substructure coverage of the GDB-13 database while using a 1M subset of GDB-13 for training. Our study shows that the performance of various generative models varies significantly using the benchmarking metrics introduced herein, such that generalization capability of the generative model can be clearly differentiated. Additionally, the coverage of ring systems and functional groups existing in GDB-13 was also compared between the models. Our study provides a useful new metric that can be used for evaluating and comparing generative models.
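A coverage-style metric of the kind described above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' implementation: it assumes every structure (or ring system, or functional group) has already been reduced to a canonical string identifier, for example a canonical SMILES produced by a cheminformatics toolkit, so that set membership is a valid identity test. The function name and toy identifiers are assumptions.

```python
from typing import Iterable


def coverage(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Fraction of distinct reference items reproduced by the generated items.

    Both inputs are assumed to be pre-canonicalized identifiers, so that two
    strings are equal exactly when they denote the same structure.
    """
    reference_set = set(reference)
    if not reference_set:
        raise ValueError("reference set must not be empty")
    # Duplicates in the generated sample do not increase coverage.
    reproduced = set(generated) & reference_set
    return len(reproduced) / len(reference_set)


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    reference = {"CCO", "CCN", "c1ccccc1", "CC(=O)O"}
    generated = ["CCO", "c1ccccc1", "CCC", "CCO"]
    print(coverage(generated, reference))
```

The same function could be applied at the substructure level by passing sets of ring-system or functional-group identifiers instead of whole-molecule identifiers; how those identifiers are extracted is outside the scope of this sketch.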

