BAND NN: A Deep Learning Framework For Energy Prediction and Geometry Optimization of Organic Small Molecules

2019 ◽  
Author(s):  
Siddhartha Laghuvarapu ◽  
Yashaswi Pathak ◽  
U. Deva Priyakumar

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require atomic positions obtained from geometry optimizations using high-level QM/DFT methods as input in order to predict energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN) based on a chemically intuitive representation inspired by molecular mechanics force fields is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational and reaction space. The transferability of this model to systems larger than those in the dataset is demonstrated by performing calculations on select large molecules. Importantly, the BAND NN model makes it possible to perform geometry optimizations starting from non-equilibrium structures, along with predicting their energies.
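
The additive decomposition described in this abstract can be sketched in a few lines of Python. This is only an illustration of the "sum over bonds, angles, nonbonds and dihedrals" idea, not the authors' implementation: in BAND NN each term would come from a trained neural network, whereas the four per-term functions below are simple force-field-style placeholders, and every function name and parameter value is an assumption made for this sketch.

```python
import math

# Placeholder per-term models (assumed for illustration). In BAND NN each of
# these would be a trained network; analytic forms are used so the sketch runs.
def bond_energy(r, r0=1.5, k=300.0):
    """Harmonic placeholder for a bond term; r is a bond length."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0=math.radians(109.5), k=50.0):
    """Harmonic placeholder for an angle term; theta is in radians."""
    return 0.5 * k * (theta - theta0) ** 2

def nonbond_energy(r, epsilon=0.1, sigma=3.4):
    """Lennard-Jones placeholder for a nonbonded pair at distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def dihedral_energy(phi, v=1.0, n=3):
    """Cosine placeholder for a dihedral term; phi is in radians."""
    return 0.5 * v * (1.0 + math.cos(n * phi))

def total_energy(bonds, angles, nonbonds, dihedrals):
    """Sum the per-term contributions over all internal coordinates."""
    return (sum(bond_energy(r) for r in bonds)
            + sum(angle_energy(t) for t in angles)
            + sum(nonbond_energy(r) for r in nonbonds)
            + sum(dihedral_energy(p) for p in dihedrals))
```

Because the total is a plain sum over terms, the same per-term models apply to a molecule of any size, which is the property the abstract refers to as molecule-size independence.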


2019 ◽  
Author(s):  
Peng Gao ◽  
Jun Zhang ◽  
Qian Peng ◽  
Vassiliki-Alexandra Glezakou

Accurate prediction of NMR chemical shifts at affordable computational cost is of great importance for rigorous structural assignments in experimental studies. However, the most popular computational schemes for NMR calculation, based on density functional theory (DFT) and gauge-including atomic orbital (GIAO) methods, still suffer from ambiguities in structural assignments. Using state-of-the-art machine learning (ML) techniques, we have developed a DFT+ML model that is capable of predicting 13C/1H NMR chemical shifts of organic molecules with high accuracy. The input for this generalizable DFT+ML model contains two critical parts: one is a vector providing insights into chemical environments, which can be evaluated without knowing the exact geometry of the molecule; the other is the DFT-calculated isotropic shielding constant. The DFT+ML model was trained with a dataset containing 476 13C and 270 1H experimental chemical shifts. For the DFT methods used here, the root-mean-square deviations (RMSDs) of the errors between predicted and experimental 13C/1H chemical shifts are as small as 2.10/0.18 ppm, which is much lower than for the typical DFT (5.54/0.25 ppm) or DFT+linear regression (4.77/0.23 ppm) approaches. It also has smaller RMSDs and maximum absolute errors than two previously reported NMR-predicting ML models. We test the robustness of the model on two classes of organic molecules (TIC10 and hyacinthacines), where we unambiguously assigned the correct isomers to the experimental ones. This DFT+ML model is a promising way of predicting NMR chemical shifts and can be easily adapted to calculate shifts for any chemical compound.
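
The two baseline quantities named in this abstract, the RMSD of the prediction errors and a linear regression from DFT shielding constants to chemical shifts, can be sketched as follows. This is a generic least-squares illustration under the assumption that the linear baseline has the form shift = slope × shielding + intercept; the function names are hypothetical and the sketch does not reproduce the authors' DFT+ML model.

```python
import math

def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental shifts."""
    errors = [p - e for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))

def fit_linear(shieldings, shifts):
    """Ordinary least-squares fit of shift = slope * shielding + intercept."""
    n = len(shieldings)
    mean_x = sum(shieldings) / n
    mean_y = sum(shifts) / n
    sxx = sum((x - mean_x) ** 2 for x in shieldings)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(shieldings, shifts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_shifts(shieldings, slope, intercept):
    """Apply the fitted linear scaling to a list of shielding constants."""
    return [slope * x + intercept for x in shieldings]
```

A typical use would fit `fit_linear` on a training set of computed shieldings and measured shifts, then report `rmsd(predict_shifts(...), measured)` on held-out data.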


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Victor Fung ◽  
Guoxiang Hu ◽  
P. Ganesh ◽  
Bobby G. Sumpter

Materials databases generated by high-throughput computational screening, typically using density functional theory (DFT), have become valuable resources for discovering new heterogeneous catalysts, though the computational cost associated with generating them presents a crucial roadblock. Hence there is a significant demand for developing descriptors or features, in lieu of DFT, to accurately predict catalytic properties, such as adsorption energies. Here, we demonstrate an approach to predict energies using a convolutional neural network-based machine learning model to automatically obtain key features from the electronic density of states (DOS). The model, DOSnet, is evaluated for a diverse set of adsorbates and surfaces, yielding a mean absolute error on the order of 0.1 eV. In addition, DOSnet can provide physically meaningful predictions and insights by predicting responses to external perturbations to the electronic structure without additional DFT calculations, paving the way for the accelerated discovery of materials and catalysts by exploration of the electronic space.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Carl E. Belle ◽  
Vural Aksakalli ◽  
Salvy P. Russo

For photovoltaic materials, properties such as the band gap $E_{g}$ are critical indicators of the material’s suitability to perform a desired function. Calculating $E_{g}$ is often performed using Density Functional Theory (DFT) methods, although more accurate calculations are performed using methods such as the GW approximation. DFT software often used to compute electronic properties includes applications such as VASP, CRYSTAL, CASTEP or Quantum Espresso. Depending on the unit cell size and symmetry of the material, these calculations can be computationally expensive. In this study, we present a new machine learning platform for the accurate prediction of properties such as $E_{g}$ of a wide range of materials.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yu Zhang ◽  
Yahui Long ◽  
Chee Keong Kwoh

Background: Long non-coding RNAs (lncRNAs) can exert functions by forming triplexes with DNA. The current methods for predicting triplex formation mainly rely on mathematical statistics based on the base-pairing rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs indicates that perhaps not all of them can form triplexes in practice, and (2) their predictions only consider the theoretical relationship and lack features from the experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on the experimentally verified data, where the high-level features are learned by convolutional neural networks. In fivefold cross validation, the average values of the area under the ROC and PRC curves for the redundancy-removed triplex-forming lncRNA dataset with threshold 0.8 are 0.9649 and 0.9996, and these two values for triplex DNA site prediction are 0.8705 and 0.9671, respectively. We also briefly summarize the cis and trans targeting of triplex lncRNAs. Conclusions: TriplexFPP is able to predict the most likely triplex-forming lncRNAs from all the lncRNAs with computationally defined triplex-forming capacities, and the potential of a DNA site to become a triplex. It may provide insights into the exploration of lncRNA functions.


Author(s):  
Quintin Hill ◽  
Chris-Kriton Skylaris

While density functional theory (DFT) allows accurate quantum mechanical simulations from first principles in molecules and solids, commonly used exchange-correlation density functionals provide a very incomplete description of dispersion interactions. One way to include such interactions is to augment the DFT energy expression by damped London energy expressions. Several variants of this have been developed for this task, which we discuss and compare in this paper. We have implemented these schemes in the ONETEP program, which is capable of DFT calculations with computational cost that increases linearly with the number of atoms. We have optimized all the parameters involved in our implementation of the dispersion correction, with the aim of simulating biomolecular systems. Our tests show that in cases where dispersion interactions are important this approach produces binding energies and molecular structures of a quality comparable with high-level wavefunction-based approaches.
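
A damped London correction of the kind described here has the general pairwise form E_disp = −Σ_{i<j} f_damp(r_ij) C6_ij / r_ij^6. The sketch below uses one commonly used damping function, a Fermi-type form as in Grimme's D2-style corrections, purely for illustration; the abstract compares several variants, and nothing here is taken from the ONETEP implementation. The function names, the steepness `d=20.0`, and the per-pair `c6` and `r0` tables are assumptions of the sketch.

```python
import math

def fermi_damping(r, r0, d=20.0):
    """Fermi-type damping: ~0 at short range, ~1 at long range, 0.5 at r = r0."""
    return 1.0 / (1.0 + math.exp(-d * (r / r0 - 1.0)))

def dispersion_energy(positions, c6, r0, d=20.0):
    """Pairwise damped London energy.

    positions: list of (x, y, z) tuples
    c6[i][j]:  C6 coefficient for the pair (i, j)   (assumed inputs)
    r0[i][j]:  damping radius for the pair (i, j)   (assumed inputs)
    """
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):  # count each pair once
            r = math.dist(positions[i], positions[j])
            energy -= fermi_damping(r, r0[i][j], d) * c6[i][j] / r ** 6
    return energy
```

The damping function switches the correction off at short range, where the exchange-correlation functional already describes the interaction, and leaves the −C6/r^6 tail intact at long range. The result is simply added to the DFT total energy.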


2020 ◽  
Vol 10 (6) ◽  
pp. 20200007 ◽  
Author(s):  
Shunzhou Wan ◽  
Agastya P. Bhati ◽  
Stefan J. Zasada ◽  
Peter V. Coveney

A central quantity of interest in molecular biology and medicine is the free energy of binding of a molecule to a target biomacromolecule. Until recently, the accurate prediction of binding affinity had been widely regarded as out of reach of theoretical methods owing to the lack of reproducibility of the available methods, not to mention their complexity, computational cost and time-consuming procedures. The lack of reproducibility stems primarily from the chaotic nature of classical molecular dynamics (MD) and the associated extreme sensitivity of trajectories to their initial conditions. Here, we review computational approaches for both relative and absolute binding free energy calculations, and illustrate their application to a diverse set of ligands bound to a range of proteins with immediate relevance in a number of medical domains. We focus on ensemble-based methods which are essential in order to compute statistically robust results, including two we have recently developed, namely thermodynamic integration with enhanced sampling and enhanced sampling of MD with an approximation of continuum solvent. Together, these form a set of rapid, accurate, precise and reproducible free energy methods. They can be used in real-world problems such as hit-to-lead and lead optimization stages in drug discovery, and in personalized medicine. These applications show that individual binding affinities equipped with uncertainty quantification may be computed in a few hours on a massive scale given access to suitable high-end computing resources and workflow automation. A high level of accuracy can be achieved using these approaches.
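
The ensemble-based idea can be illustrated with a minimal sketch: run several independent replicas, treat each replica's free energy estimate as one sample, and report the ensemble mean together with its standard error. This is a generic statistical illustration, not the protocol of the reviewed methods, and the function name is hypothetical.

```python
import math
import statistics

def ensemble_estimate(replica_values):
    """Return (mean, standard error) over independent replica estimates.

    replica_values: one binding free energy estimate per replica, all in
    the same units. At least two replicas are needed for an error bar.
    """
    mean = statistics.fmean(replica_values)
    std_err = statistics.stdev(replica_values) / math.sqrt(len(replica_values))
    return mean, std_err
```

Because single MD trajectories are extremely sensitive to their initial conditions, a value from one replica carries no usable error bar; the standard error over replicas is what makes the reported affinity reproducible.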


2021 ◽  
Author(s):  
Kazuumi Fujioka ◽  
Yuheng Luo ◽  
Rui Sun

Ab initio molecular dynamics (AIMD) simulation studies are a direct way to visualize chemical reactions and help elucidate non-statistical dynamics that do not follow the intrinsic reaction coordinate. However, due to the enormous number of ab initio energy gradient calculations needed for AIMD, it has been largely restricted to limited sampling and a low level of theory (i.e., density functional theory with small basis sets). To overcome this issue, a number of machine learning (ML) methods have been employed to predict the energy gradient of the system of interest. In this manuscript, we outline the theoretical foundations of a novel ML method, called interpolating moving ridge regression (IMRR), which trains on a varying set of atomic positions and their energy gradients and directly predicts the energy gradient of a new set of atomic positions. Several key theoretical findings are presented regarding the inputs used to train IMRR and the predicted energy gradient. A hyperparameter used to guide IMRR is rigorously examined as well. The method is then applied to three bimolecular reactions studied with AIMD, including HBr+ + CO2, H2S + CH, and C4H2 + CH, to demonstrate IMRR’s performance on different chemical systems of different sizes. This manuscript also compares the computational cost of the energy gradient calculation with IMRR vs. ab initio, and the results highlight IMRR as a viable option to greatly increase the efficiency of AIMD.

