Deep Generative Models for 3D Compound Design

2019
Author(s):  
Fergus Imrie
Anthony R. Bradley
Mihaela van der Schaar
Charlotte M. Deane

Rational compound design remains a challenging problem for both computational methods and medicinal chemists. Computational generative methods have begun to show promising results for the design problem. However, they have not yet used the power of 3D structural information. We have developed a novel graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge. Our method (“DeLinker”) takes two fragments or partial structures and designs a molecule incorporating both. The generation process is protein context dependent, utilising the relative distance and orientation between the partial structures. This 3D information is vital to successful compound design, and we demonstrate its impact on the generation process and the limitations of omitting such information. In a large-scale evaluation, DeLinker designed 60% more molecules with high 3D similarity to the original molecule than a database baseline. When considering the more relevant problem of longer linkers with at least five atoms, the outperformance increased to 200%. We demonstrate the effectiveness and applicability of this approach on a diverse range of design problems: fragment linking, scaffold hopping, and proteolysis targeting chimera (PROTAC) design. As far as we are aware, this is the first molecular generative model to incorporate 3D structural information directly in the design process. Code is available at https://github.com/oxpig/DeLinker.
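
For illustration only (not code from the DeLinker repository): a minimal sketch of how the two pieces of 3D information named above, the relative distance and orientation between the partial structures, could be computed from a pair of docked fragments with RDKit and NumPy. The input file name and atom indices are hypothetical placeholders.

```python
import numpy as np
from rdkit import Chem

# Hypothetical input: an SDF record containing the two docked fragments.
mol = Chem.MolFromMolFile("fragments_in_pocket.sdf", removeHs=False)
pos = mol.GetConformer().GetPositions()   # (N, 3) array of atomic coordinates

def exit_vector(anchor_idx, neighbor_idx):
    """Anchor-atom position and unit vector pointing away from the fragment."""
    v = pos[anchor_idx] - pos[neighbor_idx]
    return pos[anchor_idx], v / np.linalg.norm(v)

# Hypothetical atom indices for the two attachment points.
p1, v1 = exit_vector(3, 2)
p2, v2 = exit_vector(17, 18)

distance = np.linalg.norm(p1 - p2)                                  # relative distance
angle = np.degrees(np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0)))   # relative orientation
print(f"distance = {distance:.2f} A, angle = {angle:.1f} deg")
```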


2020
Author(s):  
Mingyuan Xu
Ting Ran
Hongming Chen

De novo molecule design through molecular generative models has gained increasing attention in recent years. Here, a novel generative model is proposed that integrates the 3D structural information of the protein binding pocket into a conditional RNN (cRNN) model to control the generation of drug-like molecules. In this model, the composition of the protein binding pocket is characterized through a coarse-graining strategy, and the three-dimensional information of the pocket is represented by the sorted eigenvalues of the Coulomb matrix (EGCM) of the coarse-grained atoms composing the binding pocket. In the current work, we used our EGCM method and a previously reported binding pocket descriptor, DeeplyTough, to train cRNN models and compared their performance. The molecules generated under the control of protein environment information show a clear tendency towards higher similarity to the original X-ray bound ligand than those from a normal RNN model, and also achieve better docking scores. Our results demonstrate the potential of the EGCM-controlled generative model for targeted molecule generation and guided exploration of drug-like chemical space.
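
A minimal sketch of the EGCM descriptor as described above: build the Coulomb matrix over the pocket atoms and take its sorted eigenvalues. The 0.5·Z^2.4 diagonal follows the standard Coulomb-matrix convention; the coarse-graining step is omitted, and the charges and coordinates shown are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Standard Coulomb matrix: 0.5*Z_i^2.4 on the diagonal, Z_i*Z_j/|R_i-R_j| off it."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    d = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    C = np.zeros((len(Z), len(Z)))
    off = ~np.eye(len(Z), dtype=bool)
    C[off] = (Z[:, None] * Z[None, :])[off] / d[off]
    np.fill_diagonal(C, 0.5 * Z ** 2.4)
    return C

def egcm(charges, coords):
    """Sorted eigenvalues of the Coulomb matrix: a fixed-length descriptor per pocket."""
    eigvals = np.linalg.eigvalsh(coulomb_matrix(charges, coords))
    return np.sort(eigvals)[::-1]   # descending order

# Toy example with three coarse-grained pseudo-atoms (hypothetical charges/coordinates).
print(egcm([6.0, 7.0, 8.0], [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]))
```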



2021
Author(s):  
Fergus Imrie
Thomas E. Hadfield
Anthony R. Bradley
Charlotte M. Deane

Generative models have increasingly been proposed as a solution to the molecular design problem. However, it has proved challenging to control the design process or incorporate prior knowledge, limiting their practical use in drug discovery. In particular, generative methods have made limited use of three-dimensional (3D) structural information even though this is critical to binding. This work describes a method to incorporate such information and demonstrates the benefit of doing so. We combine an existing graph-based deep generative model, DeLinker, with a convolutional neural network to utilise physically meaningful 3D representations of molecules and target pharmacophores. We apply our model, DEVELOP, to both linker and R-group design, demonstrating its suitability for both hit-to-lead and lead optimisation. The 3D pharmacophoric information results in improved generation and allows greater control of the design process. In multiple large-scale evaluations, we show that including 3D pharmacophoric constraints results in substantial improvements in the quality of generated molecules. On a challenging test set derived from PDBbind, our model improves the proportion of generated molecules with high 3D similarity to the original molecule by over 300%. In addition, DEVELOP recovers 10× more of the original molecules compared to the baseline DeLinker method. Our approach is general-purpose, readily modifiable to alternate 3D representations, and can be incorporated into other generative frameworks. Code is available at https://github.com/oxpig/DEVELOP.
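
A minimal sketch, not the DEVELOP architecture itself, of the general pattern of conditioning a generator on a 3D pharmacophore representation: pharmacophore points are voxelised onto a small grid and encoded by a 3D CNN into a fixed-size conditioning vector (PyTorch). Grid size, channel semantics, and network shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative assumption: one channel per pharmacophore type (e.g. donor/acceptor/aromatic).
N_TYPES, GRID = 3, 16

def voxelise(points, types, resolution=1.0):
    """Place pharmacophore points into a (N_TYPES, GRID, GRID, GRID) occupancy grid."""
    grid = torch.zeros(N_TYPES, GRID, GRID, GRID)
    for (x, y, z), t in zip(points, types):
        i = int(x / resolution) + GRID // 2
        j = int(y / resolution) + GRID // 2
        k = int(z / resolution) + GRID // 2
        if 0 <= i < GRID and 0 <= j < GRID and 0 <= k < GRID:
            grid[t, i, j, k] = 1.0
    return grid

# A small 3D CNN encoder producing a fixed-size conditioning vector for a generator.
encoder = nn.Sequential(
    nn.Conv3d(N_TYPES, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(32, 64),
)

grid = voxelise([(0.0, 0.0, 0.0), (3.0, 1.0, -2.0)], [0, 2])   # hypothetical points
condition = encoder(grid.unsqueeze(0))                          # (1, 64) conditioning vector
print(condition.shape)
```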



2021
Vol 28
Author(s):  
Jannis Born
Matteo Manica

It is more pressing than ever to reduce the time and costs for developing lead compounds in the pharmaceutical industry. The co-occurrence of advances in high-throughput screening and the rise of deep learning (DL) have enabled the development of large-scale multimodal predictive models for virtual drug screening. Recently, deep generative models have emerged as a powerful tool for exploring the chemical space and have raised hopes of expediting the drug discovery process. Following this progress in chemocentric approaches for generative chemistry, the next challenge is to build multimodal conditional generative models that leverage disparate knowledge sources when mapping biochemical properties to target structures. Here, we call on the community to bridge drug discovery more closely with systems biology when designing deep generative models. Complementing the plethora of reviews on the role of DL in chemoinformatics, we herein specifically focus on the interface of predictive and generative modeling for drug discovery. Through a systematic publication keyword search on PubMed and a selection of preprint servers (arXiv, bioRxiv, ChemRxiv, and medRxiv), we quantify trends in the field and find that molecular graphs and VAEs have become the most widely adopted molecular representations and architectures in generative models, respectively. We discuss progress on DL for toxicity, drug-target affinity, and drug sensitivity prediction and specifically focus on conditional molecular generative models that encompass multimodal prediction models. Moreover, we outline prospects in the field and identify challenges such as the integration of deep learning systems into experimental workflows in a closed-loop manner or the adoption of federated machine learning techniques to overcome data-sharing barriers. Other challenges include, but are not limited to, interpretability in generative models, more sophisticated metrics for the evaluation of molecular generative models, and, following up on that, community-accepted benchmarks for both multimodal drug property prediction and property-driven molecular design.



Data Science
2021
pp. 1-21
Author(s):  
Kushal Veer Singh
Ajay Kumar Verma
Lovekesh Vig

Capturing data in the form of networks is becoming an increasingly popular approach for modeling, analyzing, and visualising complex phenomena, in order to understand the important properties of the underlying complex processes. Access to many large-scale network datasets is restricted due to privacy and security concerns. Also, for several applications (such as functional connectivity networks), generating large-scale real data is expensive. For these reasons, there is a growing need for advanced mathematical and statistical models (also called generative models) that can account for the structure of these large-scale networks without having to materialize them in the real world. The objective is to provide a comprehensible description of the network properties and to be able to infer previously unobserved properties. Various models have been developed by researchers that generate synthetic networks adhering to the structural properties of real networks. However, the selection of the appropriate generative model for a given real-world network remains an important challenge. In this paper, we investigate this problem and provide a novel technique (named TripletFit) for model selection (or network classification) and estimation of structural similarities of complex networks. The goal of network model selection is to select a generative model that is able to generate a structurally similar synthetic network for a given real-world (target) network. We consider six prominent generative models as the candidate models. Existing model selection methods mostly suffer from sensitivity to network perturbations, dependency on the size of the networks, and low accuracy. To overcome these limitations, we consider a broad array of network features, with the aim of representing different structural aspects of the network, and employ deep learning techniques such as a deep triplet network architecture and a simple feed-forward network for model selection and estimation of structural similarities of complex networks. Our proposed method outperforms existing methods with respect to accuracy, noise-tolerance, and size independence on a number of gold-standard datasets used in previous studies.
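
A minimal sketch of the triplet-embedding idea behind this kind of model selection, assuming fixed-length vectors of graph features (e.g. density, clustering coefficient, degree statistics) are computed upstream. Layer sizes and the training loop are illustrative, not the TripletFit implementation.

```python
import torch
import torch.nn as nn

N_FEATURES, EMBED_DIM = 20, 32

# Feed-forward network that embeds graph-feature vectors; trained with a triplet margin loss
# so that networks from the same generative model land close together in embedding space.
embedder = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, EMBED_DIM),
)
loss_fn = nn.TripletMarginLoss(margin=1.0)
optimiser = torch.optim.Adam(embedder.parameters(), lr=1e-3)

def training_step(anchor, positive, negative):
    """Anchor/positive come from the same generative model, negative from a different one."""
    optimiser.zero_grad()
    loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
    loss.backward()
    optimiser.step()
    return loss.item()

# One step on random stand-in feature vectors (batch of 8 triplets).
print(training_step(torch.randn(8, N_FEATURES),
                    torch.randn(8, N_FEATURES),
                    torch.randn(8, N_FEATURES)))
```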



Author(s):  
Andrew Reid
Julie Ballantyne

In an ideal world, assessment should be synonymous with effective learning and reflect the intricacies of the subject area. It should also be aligned with the ideals of education: to provide equitable opportunities for all students to achieve and to allow both appropriate differentiation for varied contexts and students and comparability across various contexts and students. This challenge is made more difficult in circumstances in which the contexts are highly heterogeneous, for example in the state of Queensland, Australia. Assessment in music challenges schooling systems in unique ways because teaching and learning in music are often naturally differentiated and diverse, yet assessment often calls for standardization. While each student and teacher has individual, evolving musical pathways in life, the syllabus and the system require consistency and uniformity. The challenge, then, is to provide diverse, equitable, and quality opportunities for all children to learn and achieve to the best of their abilities. This chapter discusses the design and implementation of a large-scale curriculum as experienced in secondary schools in Queensland, Australia. The experiences detailed explore the possibilities offered through externally moderated school-based assessment. Also discussed is the centrality of system-level clarity of purpose, principles and processes, and the provision of supportive networks and mechanisms to foster autonomy for a diverse range of music educators and contexts. Implications for education systems that desire diversity, equity, and quality are discussed, and the conclusion provokes further conceptualization and action on behalf of students, teachers, and the subject area of music.



Cancers
2021
Vol 13 (9)
pp. 2111
Author(s):  
Bo-Wei Zhao
Zhu-Hong You
Lun Hu
Zhen-Hao Guo
Lei Wang
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates almost instantly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
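
A minimal sketch of the final stage described above, assuming the local (GCN) and global (DeepWalk) embeddings have been computed upstream: the two feature sets are concatenated and scored with a random forest under 5-fold cross-validation. The data here are random stand-ins, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score

# Stand-in data: local (GCN) and global (DeepWalk) embeddings for each drug-target pair,
# assumed to be computed elsewhere; labels mark known interactions.
rng = np.random.default_rng(0)
gcn_emb, dw_emb = rng.normal(size=(1000, 64)), rng.normal(size=(1000, 64))
labels = rng.integers(0, 2, size=1000)

X = np.hstack([gcn_emb, dw_emb])          # concatenate the two kinds of features
aurocs, auprs = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, labels):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train_idx], labels[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    aurocs.append(roc_auc_score(labels[test_idx], scores))
    auprs.append(average_precision_score(labels[test_idx], scores))

print(f"AUROC = {np.mean(aurocs):.4f}, AUPR = {np.mean(auprs):.4f}")
```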



Author(s):  
Masoumeh Zareapoor
Jie Yang

Image-to-Image translation aims to learn a mapping from a source domain to a target domain. However, three main challenges are associated with this problem and need to be addressed: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to capture the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we are looking for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be augmented into the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module, our architecture incorporates a new loss function to facilitate effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
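
The trainable transformer described above resembles a spatial-transformer-style module. The following is a minimal sketch of such a differentiable layer in PyTorch: a small localisation network predicts an affine transform that is applied to the incoming feature maps. Shapes and layer sizes are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransform(nn.Module):
    """Differentiable spatial-manipulation module: a localisation network predicts
    a 2x3 affine transform that is applied to the input feature maps."""

    def __init__(self, channels):
        super().__init__()
        self.localisation = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(channels * 8 * 8, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialise to the identity transform so training starts stable.
        self.localisation[-1].weight.data.zero_()
        self.localisation[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.localisation(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# The module can be slotted between convolutional layers of a generator.
x = torch.randn(4, 16, 64, 64)
print(SpatialTransform(16)(x).shape)   # torch.Size([4, 16, 64, 64])
```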



Science
2021
Vol 372 (6541)
pp. 512-516
Author(s):  
Yan Zhou
Xuexia Xu
Yifeng Wei
Yu Cheng
Yu Guo
...  

DNA modifications vary in form and function but generally do not alter Watson-Crick base pairing. Diaminopurine (Z) is an exception because it completely replaces adenine and forms three hydrogen bonds with thymine in cyanophage S-2L genomic DNA. However, the biosynthesis, prevalence, and importance of Z genomes remain unexplored. Here, we report a multienzyme system that supports Z-genome synthesis. We identified dozens of globally widespread phages harboring such enzymes, and we further verified the Z genome in one of these phages, Acinetobacter phage SH-Ab 15497, by using liquid chromatography with ultraviolet and mass spectrometry. The Z genome endows phages with evolutionary advantages for evading the attack of host restriction enzymes, and the characterization of its biosynthetic pathway enables Z-DNA production on a large scale for a diverse range of applications.



2013
Vol 46 (01)
pp. 23-27
Author(s):  
Clare Heyward

Geoengineering, the “deliberate, large-scale manipulation of the planetary environment in order to counteract anthropogenic climate change” (Shepherd et al. 2009, 1), is attracting increasing interest. As well as the Royal Society, various scientific and government organizations have produced reports on the potential and challenges of geoengineering as a strategy, alongside mitigation and adaptation, to avoid the vast human and environmental costs that climate change is thought to bring (Blackstock et al. 2009; GAO 2010; Long et al. 2011; Rickels et al. 2011). “Geoengineering” covers a diverse range of proposals conventionally divided into carbon dioxide removal (CDR) proposals and solar radiation management (SRM) proposals. This article argues that “geoengineering” should not be regarded as a third category of response to climate change, but should be disaggregated. Technically, CDR and SRM are quite different, and discussing them together under the rubric of geoengineering can give the impression that all the technologies in the two categories always raise similar challenges and political issues, when this is not necessarily the case. However, CDR and SRM should not be completely subsumed into the preexisting categories of mitigation and adaptation. Instead, they can be regarded as two parts of a five-part continuum of responses to climate change. To make this case, the first section of this article discusses whether geoengineering is distinctive, and the second situates CDR and SRM in relation to other responses to climate change.


