Δ-Quantum machine learning for medicinal chemistry

Many molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like molecules currently renders large-scale applications of quantum chemistry challenging. Aiming to mitigate this problem, we developed DelFTa, an open-source toolbox for the prediction of electronic properties of drug-like molecules at the density functional (DFT) level of theory, using Δ-machine-learning. Δ-Learning corrects the prediction error (Δ) of a fast but inaccurate property calculation. DelFTa employs state-of-the-art three-dimensional message-passing neural networks trained on a large dataset of QM properties. It provides access to a wide array of quantum observables on the molecular, atomic and bond levels by predicting approximations to DFT values from a low-cost semiempirical baseline. Δ-Learning outperformed its direct-learning counterpart for most of the considered QM endpoints. The results suggest that predictions for non-covalent intra- and intermolecular interactions can be extrapolated to larger biomolecular systems. The software is fully open-sourced and features documented command-line and Python APIs.
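
The Δ-learning idea in this abstract can be illustrated with a minimal sketch. It assumes synthetic data and an ordinary least-squares model in place of DelFTa's neural networks; the arrays, helper names and error model below are hypothetical and are not part of DelFTa. The model is fitted to the difference between the expensive reference and the cheap baseline, and that learned correction is added back to the baseline at prediction time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not DelFTa data): 200 "molecules", 5 features.
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_high = X @ w_true + 10.0              # plays the role of the DFT reference
y_low = y_high + 0.3 * X[:, 0] - 0.5    # cheap baseline with a systematic error


def fit_linear(features, targets):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.hstack([features, np.ones((len(features), 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_linear(coef, features):
    """Evaluate the fitted linear model on new features."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ coef


train, test = slice(0, 150), slice(150, 200)

# Delta-learning: learn the correction (reference - baseline) on the training set,
# then add the predicted correction to the baseline on the test set.
delta_coef = fit_linear(X[train], y_high[train] - y_low[train])
y_delta = y_low[test] + predict_linear(delta_coef, X[test])

mae_baseline = np.mean(np.abs(y_low[test] - y_high[test]))
mae_delta = np.mean(np.abs(y_delta - y_high[test]))
print(f"baseline MAE: {mae_baseline:.3f}, delta-corrected MAE: {mae_delta:.3e}")
```

In this toy setting the baseline error is exactly linear in the features, so the correction is recovered almost perfectly; real QM errors are not this simple, which is why a more expressive model is used in practice.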

Open-source Δ-quantum machine learning for medicinal chemistry

10.33774/chemrxiv-2021-fz6v7 ◽ 2021

Author(s): Kenneth Atz ◽ Clemens Isert ◽ Markus N. A. Böcker ◽ José Jiménez-Luna ◽ Gisbert Schneider

Keyword(s): Machine Learning ◽ Open Source ◽ Density Functional ◽ Large Scale ◽ Molecular Design ◽ State Of The Art ◽ Computational Cost ◽ Quantum Mechanical ◽ Quantum Observables ◽ Graph Neural Networks

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. In order to mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional (DFT) level of theory, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks that were trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully-documented command-line and Python APIs.

Multi-fidelity prediction of molecular optical peaks with deep learning

10.33774/chemrxiv-2021-6d2bp ◽ 2021

Author(s): Kevin Greenman ◽ William Green ◽ Rafael Gómez-Bombarelli

Keyword(s): Optical Properties ◽ Statistical Methods ◽ Message Passing ◽ Density Functional ◽ Molecular Design ◽ Chemical Space ◽ Epistemic Uncertainty ◽ Computational Cost ◽ Regression Tree ◽ TD-DFT

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28,000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly-available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
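
The use of ensemble variance as an uncertainty estimate, mentioned in this abstract, can be sketched as follows. This is only an illustration under stated assumptions: the three "models" are represented by hard-coded, hypothetical peak-wavelength predictions, and the function name is invented here rather than taken from the authors' code. The spread of the ensemble members around their mean is used as a proxy for epistemic uncertainty.

```python
import numpy as np


def ensemble_mean_and_variance(predictions):
    """Return per-sample mean and variance across ensemble members.

    predictions: array-like of shape (n_models, n_samples).
    """
    predictions = np.asarray(predictions, dtype=float)
    mean = predictions.mean(axis=0)
    variance = predictions.var(axis=0)  # population variance across members
    return mean, variance


# Hypothetical absorption peaks (nm) from three ensemble members for two dyes.
preds = [
    [500.0, 610.0],
    [502.0, 640.0],
    [498.0, 580.0],
]
mean, variance = ensemble_mean_and_variance(preds)
print(mean, variance)
```

Here the members agree closely on the first dye and disagree on the second, so the second prediction would be flagged as the less certain one.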

Demonstration of cluster computing for three-dimensional CFD simulations

The Aeronautical Journal ◽ 10.1017/s0001924000028037 ◽ 1999 ◽ Vol 103 (1027) ◽ pp. 443-447 ◽ Cited By ~ 5

Author(s): W. McMillan ◽ M. Woodgate ◽ B. E. Richards ◽ B. J. Gribben ◽ K. J. Badcock ◽ ...

Keyword(s): Message Passing ◽ Large Scale ◽ Cluster Computing ◽ Low Cost ◽ Three Dimensional ◽ Cost Effective ◽ Parallel Applications ◽ CFD Simulations ◽ Single Node ◽ Computing Unit

Motivated by a lack of sufficient local and national computing facilities for computational fluid dynamics simulations, the Affordable Systems Computing Unit (ASCU) was established to investigate low-cost alternatives. The options considered have all involved cluster computing, a term which refers to the grouping of a number of components into a managed system capable of running both serial and parallel applications. The present work aims to demonstrate the utility of commodity processors for dedicated batch processing. The cluster has proved to be extremely cost-effective, enabling large three-dimensional flow simulations on a computer costing less than £25k at current market prices. The experience gained on this system in terms of single-node performance, message passing and parallel performance will be discussed. In particular, comparisons with the performance of other systems will be made. Several medium- to large-scale CFD simulations performed using the new cluster will be presented to demonstrate the potential of commodity-processor-based parallel computers for aerodynamic simulation.

BAND NN: A Deep Learning Framework For Energy Prediction and Geometry Optimization of Organic Small Molecules

10.26434/chemrxiv.9763094 ◽ 2019

Author(s): Siddhartha Laghuvarapu ◽ Yashaswi Pathak ◽ U. Deva Priyakumar

Keyword(s): Machine Learning ◽ Density Functional ◽ Computational Cost ◽ Geometry Optimization ◽ DFT Methods ◽ Energy Prediction ◽ Machine Learning Model ◽ Equilibrium Structures ◽ High Level ◽ Non Equilibrium

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled the prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require atomic positions obtained from geometry optimizations using high-level QM/DFT methods as input in order to predict the energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN), based on a chemically intuitive representation inspired by molecular mechanics force fields, is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational and reaction space. The transferability of this model to systems larger than those in the dataset is demonstrated by performing calculations on select large molecules. Importantly, with the BAND NN model it is possible to perform geometry optimizations starting from non-equilibrium structures, in addition to predicting their energies.
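
The additive decomposition described in this abstract can be written schematically as below. The notation is an assumption introduced here for illustration (the abstract gives only the four term types, not the symbols or the functional form of each contribution):

```latex
E_{\mathrm{atomization}} \;\approx\;
    \sum_{b \,\in\, \mathrm{bonds}} E_{B}(b)
  + \sum_{a \,\in\, \mathrm{angles}} E_{A}(a)
  + \sum_{n \,\in\, \mathrm{nonbonds}} E_{N}(n)
  + \sum_{d \,\in\, \mathrm{dihedrals}} E_{D}(d)
```

Because the total is a sum over local terms, the number of terms grows with the molecule while each term's form stays fixed, which is consistent with the size independence claimed in the abstract.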

Bayesian active learning of interatomic force field for molecular dynamics simulation of Pt/Ag(111)

10.26434/chemrxiv-2021-sk6lf-v2 ◽ 2021

Author(s): Kai Xu ◽ Lei Yan ◽ Bingran You

Keyword(s): Molecular Dynamics ◽ Active Learning ◽ Force Field ◽ Density Functional ◽ Process Model ◽ Large Scale ◽ Computational Cost ◽ Dynamics Simulation ◽ Potential Energy Landscape ◽ Three Body

A force field is a central requirement in molecular dynamics (MD) simulation for an accurate description of the potential energy landscape and the time evolution of individual atomic motions. Most energy models are limited by a fundamental tradeoff between accuracy and speed. Although ab initio MD based on density functional theory (DFT) has high accuracy, its high computational cost prevents its use for large-scale and long-timescale simulations. Here, we use Bayesian active learning to construct a Gaussian process model of interatomic forces to describe Pt deposited on Ag(111). An accurate model is obtained within one day of wall time after selecting only 126 atomic environments based on two- and three-body interactions, providing mean absolute errors of 52 and 142 meV/Å for Ag and Pt, respectively. Our work highlights automated and minimalistic training of machine-learning force fields with high fidelity to DFT, which would enable large-scale and long-timescale simulations of alloy surfaces at first-principles accuracy.

Assessing Conformer Energies using Electronic Structure and Machine Learning Methods

10.26434/chemrxiv.11920914 ◽ 2020

Author(s): Dakota Folmsbee ◽ Geoffrey Hutchison

Keyword(s): Machine Learning ◽ Electronic Structure ◽ Density Functional ◽ Large Scale ◽ Single Point ◽ Semiempirical Method ◽ Coupled Cluster ◽ Scale Evaluation ◽ Machine Learning Methods ◽ Electronic Structure Methods

We have performed a large-scale evaluation of current computational methods, including conventional small-molecule force fields, semiempirical, density functional, ab initio electronic structure methods, and current machine learning (ML) techniques to evaluate relative single-point energies. Using up to 10 local minima geometries across ~700 molecules, each optimized by B3LYP-D3BJ with single-point DLPNO-CCSD(T) triple-zeta energies, we consider over 6,500 single points to compare the correlation between different methods for both relative energies and ordered rankings of minima. We find promise from current ML methods and recommend methods at each tier of the accuracy-time tradeoff, particularly the recent GFN2 semiempirical method, the B97-3c density functional approximation, and RI-MP2 for accurate conformer energies. The ANI family of ML methods shows promise, particularly the ANI-1ccx variant trained in part on coupled-cluster energies. Multiple methods suggest continued improvements should be expected in both performance and accuracy.
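
The two comparisons named in this abstract, relative energies and ordered rankings of minima, can be sketched for a single molecule as follows. The energies are hypothetical numbers chosen for illustration, and the helper functions are written here rather than taken from the authors' workflow; the rank correlation is a plain Spearman coefficient that assumes no tied values.

```python
import numpy as np


def relative_energies(energies):
    """Shift conformer energies so the lowest-energy conformer is zero."""
    energies = np.asarray(energies, dtype=float)
    return energies - energies.min()


def spearman_rho(a, b):
    """Spearman rank correlation for inputs without ties."""
    rank_a = np.argsort(np.argsort(a))  # rank of each element (0 = lowest)
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]


# Hypothetical conformer energies from a reference and a cheaper method.
reference = [-100.000, -99.998, -99.995, -99.990]
cheaper = [-50.010, -50.004, -50.006, -49.999]

rel_ref = relative_energies(reference)
rel_cheap = relative_energies(cheaper)
rho = spearman_rho(rel_ref, rel_cheap)
print(rel_ref, rel_cheap, rho)
```

Working with relative energies removes the large constant offset between methods, so only the ordering and spacing of the conformers is compared; in this example the cheaper method swaps the second and third conformers.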

Assessing Conformer Energies using Electronic Structure and Machine Learning Methods

10.26434/chemrxiv.11920914.v2 ◽ 2020

Author(s): Dakota Folmsbee ◽ Geoffrey Hutchison

Keyword(s): Machine Learning ◽ Electronic Structure ◽ Density Functional ◽ Large Scale ◽ Single Point ◽ Semiempirical Method ◽ Coupled Cluster ◽ Scale Evaluation ◽ Machine Learning Methods ◽ Electronic Structure Methods

SVFX: a machine-learning framework to quantify the pathogenicity of structural variants

10.1101/739474 ◽ 2019 ◽ Cited By ~ 2

Author(s): Sushant Kumar ◽ Arif Harmanci ◽ Jagath Vytheeswaran ◽ Mark B. Gerstein

Keyword(s): Machine Learning ◽ Large Scale ◽ Three Dimensional ◽ Point Mutations ◽ Dimensional Structure ◽ Rapid Decline ◽ Ras Signaling ◽ Cancer Genes ◽ Structural Variations ◽ Pathogenic Variants

A rapid decline in sequencing cost has made large-scale genome sequencing studies feasible. One of the fundamental goals of these studies is to catalog all pathogenic variants. Numerous methods and tools have been developed to interpret point mutations and small insertions and deletions. However, there is a lack of approaches for identifying pathogenic genomic structural variations (SVs). That said, SVs are known to play a crucial role in many diseases by altering the sequence and three-dimensional structure of the genome. Previous studies have suggested a complex interplay of genomic and epigenomic features in the emergence and distribution of SVs. However, the exact mechanism of pathogenesis for SVs in different diseases is not straightforward to decipher. Thus, we built an agnostic machine-learning-based workflow, called SVFX, to assign a “pathogenicity score” to somatic and germline SVs in various diseases. In particular, we generated somatic and germline training models, which included genomic, epigenomic, and conservation-based features for SV call sets in diseased and healthy individuals. We then applied SVFX to SVs in six different cancer cohorts and a cardiovascular disease (CVD) cohort. Overall, SVFX achieved high accuracy in identifying pathogenic SVs. Moreover, we found that predicted pathogenic SVs in cancer cohorts were enriched among known cancer genes and many cancer-related pathways (including Wnt signaling, Ras signaling, DNA repair, and ubiquitin-mediated proteolysis). Finally, we note that SVFX is flexible and can be easily extended to identify pathogenic SVs in additional disease cohorts.

Molecular Design Using Signal Processing and Machine Learning: Time-Frequency-like Representation and Forward Design

10.21203/rs.3.rs-229094/v1 ◽ 2021

Author(s): Alain Beaudelaire Tchagang ◽ Ahmed H. Tewfik ◽ Julio J. Valdés

Keyword(s): Machine Learning ◽ Signal Processing ◽ High Speed ◽ Density Functional ◽ Molecular Design ◽ Absolute Error ◽ Molecular Data ◽ Deep Convolutional Neural Networks ◽ Time Frequency ◽ Short Time

The accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the high speed of ML. In this study, we show that by integrating well-known signal processing (SP) techniques (i.e. short-time Fourier transform, continuous wavelet analysis and Wigner-Ville distribution) in the QM↔ML pipeline, we obtain a powerful machinery (QM↔SP↔ML) that can be used for representation, visualization and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model is able to predict the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e. MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, we show that the new QM↔SP↔ML model represents a powerful technique for molecular forward design. All code and data generated and used in this study are available as supporting materials. The QM↔SP↔ML code is also hosted at the following website: https://github.com/TABeau/QM-SP-ML.

Prediction of Residual Stresses in a Multipass Pipe Weld by a Novel 3D Finite Element Approach

Volume 6B: Materials and Fabrication ◽ 10.1115/pvp2018-85044 ◽ 2018 ◽ Cited By ~ 1

Author(s): Hui Huang ◽ Jian Chen ◽ Blair Carlson ◽ Hui-Ping Wang ◽ Paul Crooker ◽ ...

Keyword(s): Finite Element ◽ Residual Stresses ◽ High Performance ◽ Large Scale ◽ Graphics Processing Unit ◽ Computational Cost ◽ Three Dimensional ◽ Processing Unit ◽ Girth Welds ◽ Welding Processes

Owing to the enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high-performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by considering the unique physics associated with welding processes, which are characterized by steep temperature gradients and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with a commercial software package by simulating a 3D multi-pass girth weld model with over 1 million elements. Our code achieved solution accuracy comparable to that of the commercial package while reducing the computational cost by a factor of more than 100. Moreover, the three-dimensional analysis showed a more realistic stress distribution that is not axisymmetric in the hoop direction.
