Fast, accurate, and transferable many-body interatomic potentials by symbolic regression

2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Alberto Hernandez ◽  
Adarsh Balasubramanian ◽  
Fenglin Yuan ◽  
Simon A. M. Mason ◽  
Tim Mueller

Abstract: The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine-learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine-learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient many-body potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton-Chen embedded-atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded-atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy, and speed.
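For readers unfamiliar with the Lennard-Jones potential mentioned in the abstract above, the following is a minimal sketch of its standard textbook form, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], summed over all atom pairs. It is not the authors' code: the function names `lj_pair` and `total_energy`, the reduced units (ε = σ = 1), and the absence of a cutoff or periodic boundaries are all assumptions made for illustration.

```python
import itertools
import math


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(positions, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all unique atom pairs.

    Illustrative only: no distance cutoff and no periodic boundaries.
    """
    return sum(
        lj_pair(math.dist(a, b), epsilon, sigma)
        for a, b in itertools.combinations(positions, 2)
    )


# A dimer at the minimum-energy separation r = 2**(1/6) * sigma
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
print(total_energy(dimer))
```

Training data for a rediscovery test of the kind the abstract describes could in principle be generated by evaluating such a function on sampled configurations; how the authors actually generated their data is not stated in this listing.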

2021 ◽  
Author(s):  
Roman Zubatyuk ◽  
Justin Smith ◽  
Benjamin T. Nebgen ◽  
Sergei Tretiak ◽  
Olexandr Isayev

Physics-inspired Artificial Intelligence (AI) is at the forefront of methods development in molecular modeling and computational chemistry. In particular, interatomic potentials derived with Machine Learning algorithms such as Deep Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and enable massive simulations. The applicability domain of DNN potentials is usually limited by the type of training data. As such, transferable models aim to be extensible in the description of the chemical and conformational diversity of organic molecules. However, most DNN potentials, such as the AIMNet model we proposed previously, were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we extend the machine learning framework toward open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which, when properly trained, can predict atomic and molecular properties for an arbitrary combination of molecular charge and spin multiplicity. This model explores a new dimension of transferability by adding the charge-spin space. The AIMNet-NSE model is capable of reproducing reference QM energies for cations, neutrals, and anions with errors of about 2-3 kcal/mol, compared to the reference QM simulations. The spin-charges have errors of ~0.01 electrons for small organic molecules containing nine chemical elements {H, C, N, O, F, Si, P, S and Cl}. The AIMNet-NSE model makes it possible to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions at a speed of up to 10^4 molecules per second on a single modern GPU. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
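The conceptual DFT quantities named in the abstract above are commonly obtained from finite differences between the cation, neutral, and anion states. The sketch below shows those standard finite-difference definitions; it is not the AIMNet-NSE API. The function names, the input numbers, and the choice of the (IP − EA)/2 hardness convention (some authors use IP − EA) are assumptions for illustration.

```python
def conceptual_dft(e_cation, e_neutral, e_anion):
    """Finite-difference descriptors from total energies of the three charge states.

    Assumes all energies share the same units and the same geometry.
    """
    ip = e_cation - e_neutral  # vertical ionization potential
    ea = e_neutral - e_anion   # vertical electron affinity
    return {
        "ip": ip,
        "ea": ea,
        "electronegativity": 0.5 * (ip + ea),  # Mulliken definition
        "hardness": 0.5 * (ip - ea),           # (IP - EA)/2 convention, an assumption
    }


def condensed_fukui(q_cation, q_neutral, q_anion):
    """Condensed Fukui functions from per-atom partial charges.

    f_plus  (nucleophilic attack):  q(N)   - q(N+1)
    f_minus (electrophilic attack): q(N-1) - q(N)
    """
    f_plus = [qn - qa for qn, qa in zip(q_neutral, q_anion)]
    f_minus = [qc - qn for qc, qn in zip(q_cation, q_neutral)]
    return f_plus, f_minus


# Hypothetical numbers for a two-atom system, chosen only to exercise the code
descriptors = conceptual_dft(e_cation=10.0, e_neutral=0.0, e_anion=-1.0)
f_plus, f_minus = condensed_fukui(
    q_cation=[0.7, 0.3], q_neutral=[0.1, -0.1], q_anion=[-0.5, -0.5]
)
print(descriptors, f_plus, f_minus)
```

Because each charge state differs by one electron, the condensed Fukui values over all atoms sum to one, which is a convenient sanity check on predicted charges.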


2019 ◽  
Author(s):  
Sebastian Dick ◽  
Marivi Fernandez-Serra

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. The balance between accuracy and computational cost that DFT-based simulations provide allows researchers to understand the structural and dynamical properties of increasingly large and complex systems at the quantum mechanical level. In Kohn-Sham DFT, this balance depends on the choice of exchange and correlation functional, which only exists in approximate form. Increasing the non-locality of this functional and climbing the figurative Jacob's ladder of DFT, one can systematically reduce the amount of approximation involved and thus approach the exact functional. Doing this, however, comes at the price of increased computational cost, and so, for extensive systems, the predominant methods of choice can still be found within the lower-rung approximations. Here we propose a framework to create highly accurate density functionals by using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of local and semilocal functionals to that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. We further demonstrate how a functional optimized on water can reproduce experimental results when used in molecular dynamics simulations. Finally, we discuss the effects that our method has on self-consistent electron densities by comparing these densities to benchmark coupled-cluster results.


2020 ◽  
Author(s):  
Sebastian Dick ◽  
Marivi Fernandez-Serra

Density Functional Theory (DFT) is the standard formalism to study the electronic structure of matter at the atomic scale. In Kohn-Sham DFT simulations, the balance between accuracy and computational cost depends on the choice of exchange and correlation functional, which only exists in approximate form. Here we propose a framework to create density functionals using supervised machine learning, termed NeuralXC. These machine-learned functionals are designed to lift the accuracy of baseline functionals towards that provided by more accurate methods while maintaining their efficiency. We show that the functionals learn a meaningful representation of the physical information contained in the training data, making them transferable across systems. A NeuralXC functional optimized for water outperforms other methods in characterizing bond breaking and excels when compared against experimental results. This work demonstrates that NeuralXC is a first step towards the design of a universal, highly accurate functional valid for both molecules and solids.


2019 ◽  
Author(s):  
Siddhartha Laghuvarapu ◽  
Yashaswi Pathak ◽  
U. Deva Priyakumar

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require as input the atomic positions obtained from geometry optimizations using high-level QM/DFT methods in order to predict the energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN) based on a chemically intuitive representation inspired by molecular mechanics force fields is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational and reaction space. The transferability of this model to systems larger than the ones in the dataset is demonstrated by performing calculations on select large molecules. Importantly, employing the BAND NN model, it is possible to perform geometry optimizations starting from non-equilibrium structures along with predicting their energies.
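The additive decomposition described in the abstract above, with the total energy written as a sum of bond, angle, nonbond, and dihedral contributions, can be sketched as follows. This is only an illustration of the summation structure, not the BAND NN implementation: the `band_energy` function, the feature vectors, and the toy lambda models standing in for the trained networks are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical feature vector describing one bond, angle, nonbond pair, or dihedral
Term = Sequence[float]


def band_energy(
    terms: Dict[str, Iterable[Term]],
    models: Dict[str, Callable[[Term], float]],
) -> float:
    """Sum per-term energies, using one model per interaction type."""
    return sum(
        models[kind](features)
        for kind, feature_list in terms.items()
        for features in feature_list
    )


# Toy stand-ins for the four trained networks (assumptions, not real models)
toy_models = {
    "bond": lambda f: 2.0 * f[0],
    "angle": lambda f: 0.5 * f[0],
    "nonbond": lambda f: -0.1 * f[0],
    "dihedral": lambda f: 0.25 * f[0],
}

# Toy molecule: two bonds, one angle, one nonbonded pair, no dihedrals
toy_terms = {
    "bond": [[1.0], [1.5]],
    "angle": [[2.0]],
    "nonbond": [[3.0]],
    "dihedral": [],
}
print(band_energy(toy_terms, toy_models))
```

Because the sum runs over however many terms a molecule has, a decomposition of this shape does not depend on molecule size, which is consistent with the size independence the abstract claims.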


2018 ◽  
Vol 6 (2) ◽  
pp. 283-286
Author(s):  
M. Samba Siva Rao ◽  
M. Yaswanth ◽  
K. Raghavendra Swamy ◽  
...  

2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.
Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.
Results: We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.
Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.


2020 ◽  
Vol 12 (7) ◽  
pp. 1218
Author(s):  
Laura Tuşa ◽  
Mahdi Khodadadzadeh ◽  
Cecilia Contreras ◽  
Kasra Rafiezadeh Shahi ◽  
Margret Fuchs ◽  
...  

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time consuming and dependent on the observer and individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physical-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on 5 drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.


Author(s):  
Huai-Yang Sun ◽  
Shuo-Xue Li ◽  
Hong Jiang

Prediction of optical spectra of complex solids remains a great challenge for first-principles calculation due to the huge computational cost of the state-of-the-art many-body perturbation theory based GW-Bethe Salpeter equation...

