Inverse molecular design using machine learning: Generative models for matter engineering

Science ◽  
2018 ◽  
Vol 361 (6400) ◽  
pp. 360-365 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Alán Aspuru-Guzik

The discovery of new materials can bring enormous societal and technological progress. In this context, exploring completely the large space of potential materials is computationally intractable. Here, we review methods for achieving inverse design, which aims to discover tailored materials from the starting point of a particular desired functionality. Recent advances from the rapidly growing field of artificial intelligence, mostly from the subfield of machine learning, have resulted in a fertile exchange of ideas, where approaches to inverse molecular design are being proposed and employed at a rapid pace. Among these, deep generative models have been applied to numerous classes of materials: rational design of prospective drugs, synthetic routes to organic compounds, and optimization of photovoltaics and redox flow batteries, as well as a variety of other solid-state materials.

2021 ◽  
Author(s):  
Kostas Papadopoulos ◽  
Kathryn A. Giblin ◽  
Jon Paul Janet ◽  
Atanas Patronov ◽  
Ola Engkvist

We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol (compound 1) as a starting point in a ligand-based design context, we have shown in a retrospective study that a 3D-similarity-enabled generative model can discover new leads in the absence of any other information. It can be used efficiently for scaffold hopping and for generating novel series. Models based on 3D similarity were compared against models based on 2D QSAR; the generated outputs showed a significant degree of orthogonality, with the 3D similarity models producing more diverse output. In addition, combining the two scoring components when training the generative model results in more efficient exploration of desirable chemical space than either component alone.
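As a rough illustration of how two scoring components might be folded into a single reward for reinforcement learning, the sketch below uses a weighted geometric mean of scores in [0, 1]. The aggregation rule, the function name `combine_scores`, and the example score values are assumptions made for illustration; the abstract does not state how the 3D similarity and 2D QSAR components were combined.

```python
import math


def combine_scores(scores, weights=None):
    """Weighted geometric mean of component scores, each in [0, 1].

    Hypothetical aggregation rule, shown for illustration only.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights must be positive")
    # A zero in any component zeroes the combined score.
    if any(s == 0.0 for s in scores):
        return 0.0
    total_weight = sum(weights)
    log_mean = sum(w * math.log(s) for s, w in zip(scores, weights)) / total_weight
    return math.exp(log_mean)


# Made-up example: 3D similarity score 0.9, 2D QSAR score 0.4.
reward = combine_scores([0.9, 0.4])
print(round(reward, 3))
```

A geometric mean only rewards molecules that score reasonably on every component, whereas a weighted sum would let a high score on one component compensate for a poor score on the other.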


Author(s):  
Yunan Luo ◽  
Lam Vo ◽  
Hantian Ding ◽  
Yufeng Su ◽  
Yang Liu ◽  
...  

Protein engineering seeks to design proteins with improved or novel functions. Compared to rational design and directed evolution approaches, machine learning-guided approaches traverse the fitness landscape more effectively and hold the promise of accelerating engineering and reducing experimental cost and effort. A critical challenge is whether the function or fitness of unseen protein variants can be predicted. By learning from the sequence and large-scale screening data of characterized variants, machine learning models predict the functional fitness of sequences and prioritize new variants that are very likely to demonstrate enhanced functional properties, thereby guiding and accelerating rational design and directed evolution. While existing generative models and language models have been developed to predict the effects of mutation and assist protein engineering, their accuracy is limited because they are unsupervised and capture general sequence contexts that are not specific to the protein being engineered. In this work, we propose ECNet, a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. Our method integrates the local evolutionary context from homologous sequences, which explicitly models residue-residue epistasis for the protein of interest, with the global evolutionary context, which encodes rich semantic and structural features from the enormous protein sequence universe. This biologically motivated sequence modeling approach enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. Through extensive benchmark experiments, we showed that our method outperforms existing methods on ∼50 deep mutagenesis scanning and random mutagenesis datasets, demonstrating its potential for guiding and expediting protein engineering.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system that uses Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2020 ◽  
Vol 27 ◽  
Author(s):  
Alessia Catalano ◽  
Carlo Franchini ◽  
Alessia Carocci

Mexiletine is a class IB antiarrhythmic drug that acts as a sodium channel blocker. Besides its well-known activity against arrhythmias, its usefulness in the treatment of myotonia, myotonic dystrophy and amyotrophic lateral sclerosis is now widely recognized. Nevertheless, it has been withdrawn from the market in several countries because of its undesired effects. Consequently, several papers in recent years have reported analogues and homologues of mexiletine with a wider therapeutic ratio and greater selectivity of action. Some of them showed higher sodium channel blocking activity than the parent compound. It is noteworthy that mexiletine is used in therapy as a racemate even though a difference in the activities of the two enantiomers has been widely demonstrated, with the (–)-(R)-enantiomer being more active: this finding led several research groups to study mexiletine and its analogues and homologues in their optically active forms. This review summarizes the different synthetic routes used to obtain these compounds. They could represent an interesting starting point for new mexiletine-like compounds without the common side effects related to the use of mexiletine.


2019 ◽  
Vol 20 (3) ◽  
pp. 203-208 ◽  
Author(s):  
Lin Ning ◽  
Bifang He ◽  
Peng Zhou ◽  
Ratmir Derda ◽  
Jian Huang

Background: Peptide-Fc fusion drugs, also known as peptibodies, are a category of biological therapeutics in which the Fc region of an antibody is genetically fused to a peptide of interest. However, developing such drugs is laborious and expensive. Rational design is urgently needed. Methods: We summarized the key steps in peptide-Fc fusion technology and stressed the main computational resources, tools, and methods that had been used in the rational design of peptide-Fc fusion drugs. We also raised open questions about the computer-aided molecular design of peptide-Fc. Results: The design of a peptibody consists of four steps. First, identify peptide leads from native ligands, biopanning, and computational design or prediction. Second, select the proper Fc region from different classes or subclasses of immunoglobulin. Third, fuse the peptide leads and Fc together properly. Finally, evaluate the immunogenicity of the constructs. At each step, there are quite a few useful resources and computational tools. Conclusion: Reviewing the molecular design of peptibodies will help make the transition from peptide leads to drugs on the market quicker and cheaper.


2020 ◽  
Vol 17 (3) ◽  
pp. 365-375
Author(s):  
Vasyl Kovalishyn ◽  
Diana Hodyna ◽  
Vitaliy O. Sinenko ◽  
Volodymyr Blagodatny ◽  
Ivan Semenyuta ◽  
...  

Background: Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) bacteria. One of the main causes of mortality from TB is the resistance of Mtb to known drugs. Objective: The goal of this work is to identify potent small molecule anti-TB agents by machine learning, synthesis and biological evaluation. Methods: The On-line Chemical Database and Modeling Environment (OCHEM) was used to build predictive machine learning models. Seven compounds were synthesized and tested in vitro for their antitubercular activity against H37Rv and resistant Mtb strains. Results: A set of predictive models was built with OCHEM based on a set of previously synthesized isoniazid (INH) derivatives containing a thiazole core and tested against Mtb. The predictive ability of the models was tested by 5-fold cross-validation, which gave balanced accuracies (BA) of 61–78% for the binary classifiers. Test set validation showed that the models could be instrumental in predicting anti-TB activity with reasonable accuracy (BA = 67–79%) within the applicability domain. Seven designed compounds were synthesized and demonstrated activity against both the H37Rv and multidrug-resistant (MDR) Mtb strains resistant to rifampicin and isoniazid. According to the acute toxicity evaluation in Daphnia magna neonates, six compounds were classified as moderately toxic (LD50 in the range of 10−100 mg/L) and one as practically harmless (LD50 in the range of 100−1000 mg/L). Conclusion: The newly identified compounds may represent a starting point for further development of therapies against Mtb. The developed models are available online at OCHEM http://ochem.eu/article/11 1066 and can be used to virtually screen for potential compounds with anti-TB activity.
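Balanced accuracy, the metric reported above for the binary classifiers, is the mean of sensitivity and specificity. The sketch below computes it in plain Python as a reminder of what the figure measures; the label vectors are made-up illustrative values, not data from the study, and the function is not part of OCHEM.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actives predicted active
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inactives predicted inactive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives flagged active
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Illustrative labels only: 4 actives and 6 inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(f"BA = {balanced_accuracy(y_true, y_pred):.1%}")
```

Because each class contributes equally, balanced accuracy is less flattering than plain accuracy when actives and inactives are unevenly represented, which is common in screening data.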


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Qiuling Tao ◽  
Pengcheng Xu ◽  
Minjie Li ◽  
Wencong Lu

The development of materials is one of the driving forces that accelerate modern scientific progress and technological innovation. Machine learning (ML) technology is developing rapidly in many fields and is opening new paths for the discovery and rational design of materials. In this review, we survey the latest applications of ML in assisting perovskite discovery. First, publication trends for ML in perovskite materials in recent years are organized and analyzed. Second, the workflow of ML in perovskite discovery is introduced. Then the applications of ML to various properties of inorganic perovskites, hybrid organic–inorganic perovskites and double perovskites are briefly reviewed. Finally, we put forward suggestions on the future development prospects of ML in the field of perovskite materials.


Author(s):  
Sam Ade Jacobs ◽  
Tim Moon ◽  
Kevin McLoughlin ◽  
Derek Jones ◽  
David Hysom ◽  
...  

We improved the quality of machine-learned models for small molecule antiviral design and reduced the time needed to produce them. Our globally asynchronous multi-level parallel training approach strong scales to all of Sierra with up to 97.7% efficiency. We trained a novel, character-based Wasserstein autoencoder that produces a higher quality model trained on 1.613 billion compounds in 23 minutes, while the previous state of the art takes a day on 1 million compounds. Reducing training time from a day to minutes shifts the model creation bottleneck from computer job turnaround time to human innovation time. Our implementation achieves 318 PFLOPs for 17.1% of half-precision peak. We will incorporate this model into our molecular design loop, enabling the generation of more diverse compounds; this improves the search for novel candidate antiviral drugs and reduces the time to synthesize compounds to be tested in the lab.
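The figures quoted above can be put side by side with a few lines of arithmetic. The sketch below assumes that "a day" means 24 hours and that "318 PFLOPs" denotes a sustained rate in PFLOP/s; both readings are assumptions, and the variable names are illustrative. The ratio compares compounds processed per minute only, since the two models differ.

```python
# Figures taken from the abstract.
new_compounds = 1.613e9      # compounds in the new training run
new_minutes = 23.0           # wall-clock time of the new run
old_compounds = 1.0e6        # compounds in the previous state of the art
old_minutes = 24.0 * 60.0    # ASSUMPTION: "a day" = 24 hours

# Compounds processed per minute of training in each case.
new_rate = new_compounds / new_minutes
old_rate = old_compounds / old_minutes
throughput_ratio = new_rate / old_rate

# ASSUMPTION: 318 PFLOPs is a sustained rate equal to 17.1% of peak.
achieved_pflops = 318.0
fraction_of_peak = 0.171
implied_peak_pflops = achieved_pflops / fraction_of_peak

print(f"throughput ratio: about {throughput_ratio:,.0f}x")
print(f"implied half-precision peak: about {implied_peak_pflops:,.0f} PFLOP/s")
```

Under these assumptions the new run processes roughly five orders of magnitude more compounds per minute, and the quoted efficiency implies a half-precision peak a little under 1,900 PFLOP/s.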


Author(s):  
Suryakanti Debata ◽  
Smruti R. Sahoo ◽  
Rudranarayan Khatua ◽  
Sridhar Sahu

In this study, we present an effective molecular design strategy for developing n-type charge transport characteristics in organic semiconductors, using ring-fused double perylene diimides (DPDIs) as the model compounds.

