Machine learning for multiscale modeling in computational molecular design

2022 ◽  
Vol 36 ◽  
pp. 100752
Author(s):  
Abdulelah S Alshehri ◽  
Fengqi You
2021 ◽  
Author(s):  
Alain Beaudelaire Tchagang ◽  
Ahmed H. Tewfik ◽  
Julio J. Valdés

Abstract Accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) make it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the high speed of ML. In this study, we show that by integrating well-known signal processing (SP) techniques (i.e. short time Fourier transform, continuous wavelet analysis and Wigner-Ville distribution) in the QM↔ML pipeline, we obtain a powerful machinery (QM↔SP↔ML) that can be used for representation, visualization and forward design of molecules. More precisely, in this study, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural networks trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model is able to predict the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e. MAE < 1 Kcal/mol for total energies and MAE < 0.1 ev for orbital energies). Furthermore, the new approach performs similarly or better compared to other ML state-of-the-art techniques described in the literature. In all, in this study, we show that the new QM↔SP↔ML model represents a powerful technique for molecular forward design. All the codes and data generated and used in this study are available as supporting materials. The QM↔SP↔ML is also housed at the following website: https://github.com/TABeau/QM-SP-ML.


2021 ◽  
Author(s):  
Jieun Choi ◽  
Juyong Lee

In this work, we propose a novel drug-like molecular design workflow by combining an efficient global molecular property optimization, protein-ligand molecular docking, and machine learning. Computational drug design algorithms aim to find novel molecules satisfying various drug-like properties and have a strong binding affinity between a protein and a ligand. To accomplish this goal, various computational molecular generation methods have been developed with recent advances in deep learning and the increase of biological data. However, most existing methods heavily depend on experimental activity data, which are not available for many targets. Thus, when the number of available activity data is limited, protein-ligand docking calculations should be used. However, performing a docking calculation during molecular generation on the fly requires considerable computational resources. To address this problem, we used machine-learning models predicting docking energy to accelerate the molecular generation process. We combined this ML-assisted docking score prediction model with the efficient global molecular property optimization approach, MolFinder. We call this design approach V-dock. Using the V-dock approach, we quickly generated many molecules with high docking scores for a target protein and desirable drug-like and bespoke properties, such as similarity to a reference molecule.


2021 ◽  
Author(s):  
Yuxiang Chen ◽  
Chuanlei Liu ◽  
Yang An ◽  
Yue Lou ◽  
Yang Zhao ◽  
...  

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields increasingly relying on data science for efficiency. The typical method used is supervised learning which needs huge datasets. Semi-supervised machine learning approaches are effective to train unlabeled data with improved modeling performance, whereas they are limited by the accumulation of prediction errors. Here, to screen solvents for removal of methyl mercaptan, a type of organosulfur impurities in natural gas, we constructed a computational framework by integrating molecular similarity search and active learning methods, namely, molecular active selection machine learning (MASML). This new model framework identifies the optimal molecules set by molecular similarity search and iterative addition to the training dataset. Among all 126,068 compounds in the initial dataset, 3 molecules were identified to be promising for methyl mercaptan (MeSH) capture, including benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Further experiments confirmed the effectiveness of our modeling framework in efficient molecular design and identification for capturing methyl mercaptan, in which DEAPA presents a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).


2018 ◽  
Vol 44 (11) ◽  
pp. 930-945 ◽  
Author(s):  
Bryce A. Thurston ◽  
Andrew L. Ferguson

Science ◽  
2018 ◽  
Vol 361 (6400) ◽  
pp. 360-365 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Alán Aspuru-Guzik

The discovery of new materials can bring enormous societal and technological progress. In this context, exploring completely the large space of potential materials is computationally intractable. Here, we review methods for achieving inverse design, which aims to discover tailored materials from the starting point of a particular desired functionality. Recent advances from the rapidly growing field of artificial intelligence, mostly from the subfield of machine learning, have resulted in a fertile exchange of ideas, where approaches to inverse molecular design are being proposed and employed at a rapid pace. Among these, deep generative models have been applied to numerous classes of materials: rational design of prospective drugs, synthetic routes to organic compounds, and optimization of photovoltaics and redox flow batteries, as well as a variety of other solid-state materials.


2021 ◽  
Author(s):  
Maxime Langevin ◽  
Rodolphe Vuilleumier ◽  
Marc Bianciotto

Despite growing interest and success in automated in-silico molecular design, doubts remain regarding the ability of goal-directed generation algorithms to perform unbiased exploration of novel chemical spaces. A specific phenomenon has recently been highlighted: goal-directed generation guided with machine learning models produce molecules with high scores according to the optimization model, but low scores according to control models, even when trained on the same data distribution and the same target. In this work, we show that this worrisome behavior is actually due to issues with the predictive models and not the goal-directed generation algorithms. We show that with appropriate predictive models, this issue can be resolved, and molecules generated have high scores according to both the optimization and the control models.


2021 ◽  
Author(s):  
Ke Chen ◽  
Christian Kunkel ◽  
Karsten Reuter ◽  
Johannes T. Margraf

The molecular reorganization energy $\lambda$ strongly influences the charge carrier mobility of organic semiconductors and is therefore an important target for molecular design. Machine learning (ML) models generally have the potential to strongly accelerate this design process (e.g. in virtual screening studies) by providing fast and accurate estimates of molecular properties. While such models are well established for simple properties (e.g. the atomization energy), $\lambda$ poses a significant challenge in this context. In this paper, we address the questions of how ML models for $\lambda$ can be improved and what their benefit is in high-throughput virtual screening (HTVS) studies. We find that, while improved predictive accuracy can be obtained relative to a semiempirical baseline model, the improvement in molecular discovery is somewhat marginal. In particular, the ML enhanced screenings are more effective in identifying promising candidates but lead to a less diverse sample. We further use substructure analysis to derive a general design rule for organic molecules with low $\lambda$ from the HTVS results.


2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Many molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like molecules currently renders large-scale applications of quantum chemistry challenging. Aiming to mitigate this problem, we developed DelFTa, an open-source toolbox for the prediction of electronic properties of drug-like molecules at the density functional (DFT) level of theory, using Δ-machine-learning. Δ-Learning corrects the prediction error (Δ) of a fast but inaccurate property calculation. DelFTa employs state-of-the-art three-dimensional message-passing neural networks trained on a large dataset of QM properties. It provides access to a wide array of quantum observables on the molecular, atomic and bond levels by predicting approximations to DFT values from a low-cost semiempirical baseline. Δ-Learning outperformed its direct-learning counterpart for most of the considered QM endpoints. The results suggest that predictions for non-covalent intra- and intermolecular interactions can be extrapolated to larger biomolecular systems. The software is fully open-sourced and features documented command-line and Python APIs.


2021 ◽  
pp. 2102807
Author(s):  
Bohayra Mortazavi ◽  
Mohammad Silani ◽  
Evgeny V. Podryabinkin ◽  
Timon Rabczuk ◽  
Xiaoying Zhuang ◽  
...  

2020 ◽  
Vol 49 (17) ◽  
pp. 6154-6168 ◽  
Author(s):  
Felix Strieth-Kalthoff ◽  
Frederik Sandfort ◽  
Marwin H. S. Segler ◽  
Frank Glorius

Chemists go ML! This tutorial review provides easy access to the fundamentals of machine learning from a synthetic chemist's perspective. Its diverse applications for molecular design, synthesis planning, or reactivity prediction are summarized.


Sign in / Sign up

Export Citation Format

Share Document