Fast, accurate, and transferable many-body interatomic potentials by symbolic regression

Physics-inspired Artificial Intelligence (AI) is at the forefront of methods development in molecular modeling and computational chemistry. In particular, interatomic potentials derived with machine learning algorithms such as Deep Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and make massive simulations feasible. The applicability domain of DNN potentials is usually limited by the type of training data. Transferable models therefore aim to extend to the chemical and conformational diversity of organic molecules. However, most DNN potentials, such as the AIMNet model we proposed previously, were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we extend the machine learning framework toward open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which, when properly trained, can predict atomic and molecular properties for an arbitrary combination of molecular charge and spin multiplicity. This model explores a new dimension of transferability by adding the charge-spin space. The AIMNet-NSE model reproduces reference QM energies for cations, neutrals, and anions with errors of about 2-3 kcal/mol. The spin-charges have errors of ~0.01 electrons for small organic molecules containing nine chemical elements {H, C, N, O, F, Si, P, S and Cl}. The AIMNet-NSE model makes it possible to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions at a speed of up to 10^4 molecules per second on a single modern GPU. We show that these descriptors, along with learned atomic representations, can be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
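
The descriptors named in this abstract follow from the energies and atomic charges of the cation, neutral, and anion states through standard finite-difference formulas. The sketch below is not the AIMNet-NSE code. It only illustrates the textbook relations, assuming vertical quantities at a fixed geometry, the Parr-Pearson hardness convention (IP - EA)/2, and charge-based condensed Fukui functions. The names `ChargeStates` and `conceptual_dft` are hypothetical, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class ChargeStates:
    """Energies and atomic charges of one molecule in three charge states.

    All three states are assumed to share the same geometry (vertical
    quantities). Energies may be in any unit as long as it is consistent.
    """
    e_cation: float            # E(N-1)
    e_neutral: float           # E(N)
    e_anion: float             # E(N+1)
    q_cation: Sequence[float]  # atomic charges of the cation
    q_neutral: Sequence[float]
    q_anion: Sequence[float]


def conceptual_dft(s: ChargeStates) -> Dict[str, Union[float, List[float]]]:
    """Finite-difference conceptual DFT descriptors (illustrative only)."""
    ip = s.e_cation - s.e_neutral   # ionization potential
    ea = s.e_neutral - s.e_anion    # electron affinity
    chi = 0.5 * (ip + ea)           # Mulliken electronegativity
    eta = 0.5 * (ip - ea)           # hardness, Parr-Pearson convention (assumed)
    # Condensed Fukui functions from atomic charges:
    # f+ describes attack by a nucleophile, f- attack by an electrophile.
    f_plus = [qn - qa for qn, qa in zip(s.q_neutral, s.q_anion)]
    f_minus = [qc - qn for qc, qn in zip(s.q_cation, s.q_neutral)]
    return {
        "ionization_potential": ip,
        "electron_affinity": ea,
        "electronegativity": chi,
        "hardness": eta,
        "fukui_plus": f_plus,
        "fukui_minus": f_minus,
    }


if __name__ == "__main__":
    # Made-up numbers for a two-atom toy system, not reference data.
    toy = ChargeStates(
        e_cation=10.0, e_neutral=0.0, e_anion=-1.0,
        q_cation=[0.8, 0.2], q_neutral=[0.2, -0.2], q_anion=[-0.3, -0.7],
    )
    for name, value in conceptual_dft(toy).items():
        print(name, value)
```

In a regioselectivity study such as electrophilic aromatic substitution, the atom with the largest f- value would be the first candidate site for electrophilic attack. Note that some authors define hardness as IP - EA without the factor of one half.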

Machine Learning Accurate Exchange and Correlation Functionals of the Electronic Density

10.26434/chemrxiv.9947312.v2 ◽

2019 ◽

Author(s):

Sebastian Dick ◽

Marivi Fernandez-Serra

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Atomic Scale ◽

Training Data ◽

Supervised Machine Learning ◽

Electronic Density ◽

Standard Formalism ◽

Dynamics Simulations ◽

Exchange And Correlation

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. The balance between accuracy and computational cost that DFT-based simulations provide allows researchers to understand the structural and dynamical properties of increasingly large and complex systems at the quantum mechanical level. In Kohn-Sham DFT, this balance depends on the choice of exchange and correlation functional, which only exists in approximate form. Increasing the non-locality of this functional and climbing the figurative Jacob's ladder of DFT, one can systematically reduce the amount of approximation involved and thus approach the exact functional. Doing this, however, comes at the price of increased computational cost, and so, for extensive systems, the predominant methods of choice can still be found within the lower-rung approximations. Here we propose a framework to create highly accurate density functionals by using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of local and semilocal functionals to that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. We further demonstrate how a functional optimized on water can reproduce experimental results when used in molecular dynamics simulations. Finally, we discuss the effects that our method has on self-consistent electron densities by comparing these densities to benchmark coupled-cluster results.

Machine Learning a Highly Accurate Exchange and Correlation Functional of the Electronic Density

10.26434/chemrxiv.9947312.v1 ◽

2019 ◽

Author(s):

Sebastian Dick ◽

Marivi Fernandez-Serra

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Atomic Scale ◽

Training Data ◽

Supervised Machine Learning ◽

Electronic Density ◽

Standard Formalism ◽

Dynamics Simulations ◽

Exchange And Correlation

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. The balance between accuracy and computational cost that DFT-based simulations provide allows researchers to understand the structural and dynamical properties of increasingly large and complex systems at the quantum mechanical level. In Kohn-Sham DFT, this balance depends on the choice of exchange and correlation functional, which only exists in approximate form. Increasing the non-locality of this functional and climbing the figurative Jacob's ladder of DFT, one can systematically reduce the amount of approximation involved and thus approach the exact functional. Doing this, however, comes at the price of increased computational cost, and so, for extensive systems, the predominant methods of choice can still be found within the lower-rung approximations. Here we propose a framework to create highly accurate density functionals by using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of local and semilocal functionals to that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. We further demonstrate how a functional optimized on water can reproduce experimental results when used in molecular dynamics simulations. Finally, we discuss the effects that our method has on self-consistent electron densities by comparing these densities to benchmark coupled-cluster results.

Machine Learning Accurate Exchange and Correlation Functionals of the Electronic Density

10.26434/chemrxiv.9947312 ◽

2020 ◽

Author(s):

Sebastian Dick ◽

Marivi Fernandez-Serra

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Atomic Scale ◽

Training Data ◽

Supervised Machine Learning ◽

Electronic Density ◽

Approximate Form ◽

Standard Formalism ◽

Exchange And Correlation

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. In Kohn-Sham DFT simulations, the balance between accuracy and computational cost depends on the choice of exchange and correlation functional, which only exists in approximate form. Here we propose a framework to create density functionals using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of baseline functionals towards that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. A NeuralXC functional optimized for water outperforms other methods in characterizing bond breaking and excels when compared against experimental results. This work demonstrates that NeuralXC is a first step towards the design of a universal, highly accurate functional valid for both molecules and solids.

Machine Learning Accurate Exchange and Correlation Functionals of the Electronic Density

10.26434/chemrxiv.9947312.v3 ◽

2020 ◽

Author(s):

Sebastian Dick ◽

Marivi Fernandez-Serra

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Atomic Scale ◽

Training Data ◽

Supervised Machine Learning ◽

Electronic Density ◽

Approximate Form ◽

Standard Formalism ◽

Exchange And Correlation

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. In Kohn-Sham DFT simulations, the balance between accuracy and computational cost depends on the choice of exchange and correlation functional, which only exists in approximate form. Here we propose a framework to create density functionals using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of baseline functionals towards that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. A NeuralXC functional optimized for water outperforms other methods in characterizing bond breaking and excels when compared against experimental results. This work demonstrates that NeuralXC is a first step towards the design of a universal, highly accurate functional valid for both molecules and solids.

BAND NN: A Deep Learning Framework For Energy Prediction and Geometry Optimization of Organic Small Molecules

10.26434/chemrxiv.9763094 ◽

2019 ◽

Author(s):

Siddhartha Laghuvarapu ◽

Yashaswi Pathak ◽

U. Deva Priyakumar

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Geometry Optimization ◽

Dft Methods ◽

Energy Prediction ◽

Machine Learning Model ◽

Equilibrium Structures ◽

High Level ◽

Non Equilibrium

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require as input the atomic positions obtained from geometry optimizations using high-level QM/DFT methods in order to predict the energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size independent machine learning model (BAND NN) based on a chemically intuitive representation inspired by molecular mechanics force fields is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational and reaction space. The transferability of this model to systems larger than the ones in the dataset is demonstrated by performing calculations on select large molecules. Importantly, employing the BAND NN model, it is possible to perform geometry optimizations starting from non-equilibrium structures along with predicting their energies.
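
The additive decomposition described in this abstract, where the total energy is a sum of contributions from bonds, angles, nonbonds and dihedrals, can be sketched in a few lines. This is not the BAND NN implementation. The per-term models below are hypothetical placeholder functions standing in for whatever learned model scores each term, and the feature vectors and the names `total_energy`, `Features` and `TermModel` are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]              # descriptor of one bond/angle/nonbond/dihedral
TermModel = Callable[[Features], float]  # maps one descriptor to an energy contribution


def total_energy(
    terms: Dict[str, List[Features]],
    models: Dict[str, TermModel],
) -> float:
    """Sum per-term energy contributions over all term kinds.

    `terms` maps a term kind ("bond", "angle", "nonbond", "dihedral") to the
    list of feature vectors found in the molecule. `models` maps the same
    kind to the function that scores a single term.
    """
    return sum(
        models[kind](features)
        for kind, feature_list in terms.items()
        for features in feature_list
    )


if __name__ == "__main__":
    # Placeholder scoring functions (hypothetical, for illustration only).
    toy_models: Dict[str, TermModel] = {
        "bond": lambda f: (f[0] - 1.0) ** 2,  # penalty around a reference length
        "angle": lambda f: 0.5 * f[0],
        "nonbond": lambda f: 0.0,
        "dihedral": lambda f: 1.0,
    }
    toy_terms: Dict[str, List[Features]] = {
        "bond": [[1.5], [1.0]],
        "angle": [[2.0]],
        "nonbond": [],
        "dihedral": [[0.0]],
    }
    print(total_energy(toy_terms, toy_models))
```

Because the energy is a sum over local terms, the number of terms grows with the molecule while the per-term models stay fixed, which is one way to read the abstract's claim of molecule-size independence.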

Optimization of Diabetes Training DATA using Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.283286 ◽

2018 ◽

Vol 6 (2) ◽

pp. 283-286

Author(s):

M. Samba Siva Rao ◽

M. Yaswanth ◽

K. Raghavendra Swamy ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.

Results: We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Drill-Core Mineral Abundance Estimation Using Hyperspectral and High-Resolution Mineralogical Data

Remote Sensing ◽

10.3390/rs12071218 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1218

Author(s):

Laura Tuşa ◽

Mahdi Khodadadzadeh ◽

Cecilia Contreras ◽

Kasra Rafiezadeh Shahi ◽

Margret Fuchs ◽

...

Keyword(s):

Machine Learning ◽

High Resolution ◽

Ore Deposits ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Drill Core ◽

Data Types ◽

Mineralogical Characterization ◽

Core Samples

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time consuming and dependent on the observer and individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physical-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on 5 drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.

Pros and cons of time-dependent hybrid density functional approach to optical spectra of solids: a case study of CeO2

Physical Chemistry Chemical Physics ◽

10.1039/d1cp02049h ◽

2021 ◽

Author(s):

Huai-Yang Sun ◽

Shuo-Xue Li ◽

Hong Jiang

Keyword(s):

First Principles ◽

Density Functional ◽

Computational Cost ◽

Optical Spectra ◽

First Principles Calculation ◽

Many Body ◽

Many Body Perturbation Theory ◽

Pros And Cons ◽

Hybrid Density Functional

Prediction of optical spectra of complex solids remains a great challenge for first-principles calculation due to the huge computational cost of the state-of-the-art many-body perturbation theory based GW-Bethe Salpeter equation...
