Quantum chemical accuracy from density functional approximations via machine learning

2020
Vol 11 (1)
Author(s): Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Müller, Kieron Burke

Abstract Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal⋅mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal⋅mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
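
The Δ-learning idea can be illustrated with a small sketch: a regression model is fitted only to the difference between a high-level reference energy and a cheap DFT energy, and a prediction adds that learned correction back onto the DFT result. This is a minimal illustration, not the authors' implementation: it assumes kernel ridge regression with a Gaussian kernel on generic descriptor vectors (the paper learns from DFT densities), and all names and toy energies below are hypothetical.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_delta(descriptors, e_dft, e_ref, sigma=1.0, lam=1e-8):
    """Fit kernel ridge regression weights to the correction E_ref - E_DFT (the Delta target)."""
    targets = [ref - dft for ref, dft in zip(e_ref, e_dft)]
    n = len(descriptors)
    kmat = [[gaussian_kernel(descriptors[i], descriptors[j], sigma) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve(kmat, targets)

def predict_delta_corrected(x, e_dft_x, descriptors, alpha, sigma=1.0):
    """Delta-corrected energy: the cheap DFT energy plus the learned correction."""
    correction = sum(w * gaussian_kernel(x, d, sigma) for w, d in zip(alpha, descriptors))
    return e_dft_x + correction

# Hypothetical toy data: 1-D descriptors, "DFT" energies, and "coupled-cluster" references.
train_x = [[0.0], [0.5], [1.0], [1.5]]
train_dft = [1.0, 2.0, 3.0, 4.0]
train_cc = [1.1, 2.2, 3.1, 4.3]
alpha = fit_delta(train_x, train_dft, train_cc)
estimate = predict_delta_corrected([0.75], 2.5, train_x, alpha)  # unseen geometry
```

Because only the difference is learned, the model targets a smaller quantity than the total energy; this is one common rationale for why Δ-learning tends to need fewer training samples.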

2019
Author(s): Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Mueller, Kieron Burke

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal/mol with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal/mol) on test data. Moreover, density-based ∆-learning (learning only the correction to a standard DFT calculation, termed ∆-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of ∆-DFT is highlighted by correcting "on the fly" DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that ∆-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.


Author(s): Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Mueller, Kieron Burke

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal/mol with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. We create density functionals from coupled-cluster energies, based only on DFT densities, via machine learning. These functionals attain quantum chemical accuracy (errors below 1 kcal/mol). Moreover, density-based ∆-learning (learning only the correction to a standard DFT calculation, ∆-DFT) significantly reduces the amount of training data required. We demonstrate these concepts for a single water molecule, and then illustrate how to include molecular symmetries with ethanol. Finally, we highlight the robustness of ∆-DFT by correcting DFT simulations of resorcinol on the fly to obtain molecular dynamics (MD) trajectories with coupled-cluster accuracy. Thus ∆-DFT opens the door to running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT is quantitatively incorrect.
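
One way to read "including molecular symmetries" is as data augmentation: operations that map a molecule onto an equivalent copy of itself, such as a mirror reflection or an exchange of two chemically identical atoms, leave the energy of an isolated molecule unchanged, so each computed geometry yields extra training samples at no extra quantum-chemical cost. The sketch below is illustrative only and is not the procedure used in the paper; the (element, x, y, z) tuple layout, the helper names, the approximate water geometry, and the energy value are all hypothetical.

```python
def reflect_yz(atoms):
    """Mirror every atom through the x = 0 (yz) plane, i.e. x -> -x."""
    return [(el, -x, y, z) for el, x, y, z in atoms]

def swap_identical(atoms, i, j):
    """Exchange atoms i and j; only allowed for atoms of the same element."""
    if atoms[i][0] != atoms[j][0]:
        raise ValueError("can only swap atoms of the same element")
    out = list(atoms)
    out[i], out[j] = out[j], out[i]
    return out

def augment(atoms, energy, identical_pairs):
    """Return (geometry, energy) pairs: the input plus symmetry-equivalent copies.

    The energy is copied unchanged because reflections and permutations of
    identical nuclei do not change the energy of an isolated molecule.
    """
    samples = [(atoms, energy), (reflect_yz(atoms), energy)]
    for i, j in identical_pairs:
        swapped = swap_identical(atoms, i, j)
        samples.append((swapped, energy))
        samples.append((reflect_yz(swapped), energy))
    return samples

# Hypothetical water geometry (angstrom) and a placeholder energy value.
water = [("O", 0.0, 0.0, 0.0), ("H", 0.757, 0.586, 0.0), ("H", -0.757, 0.586, 0.0)]
samples = augment(water, -76.4, identical_pairs=[(1, 2)])
```

Swapping identical atoms only adds information when the model's input depends on atom ordering; a permutation-invariant descriptor would make those copies redundant.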


2021
Author(s): Zheng Cheng, Jiahui Du, Lei Zhang, Jing Ma, Wei Li, ...

Molecular dynamics (MD) simulation plays an essential role in understanding protein function at the atomic level. At present, MD simulations of proteins are mainly based on classical force fields. However, classical force fields are still not accurate enough to describe protein structures and dynamical properties reliably. Here we present a novel protocol to construct a machine learning force field (MLFF) for a given protein with full quantum mechanics (QM) accuracy. In this protocol, the energy of the target system is obtained by fitting the energies of its various subsystems, which are constructed with the generalized energy-based fragmentation (GEBF) approach. To facilitate the construction of MLFFs for various proteins, a protein data library is created to store all subsystem data generated from previously trained proteins. With this data library, only the subsystems of a new protein that have new topological types are required to construct the corresponding MLFF. The protocol is illustrated with two polypeptides, 4ZNN and a 1XQ8 segment. The energies and forces predicted by this MLFF agree well with those from density functional theory calculations, and the dihedral angle distributions from GEBF-MLFF MD simulations also reproduce those from ab initio MD simulations well. Therefore, this GEBF-ML protocol is expected to be an efficient and systematic way to build force fields with QM accuracy for proteins and other biological systems.
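
The data-library reuse can be pictured as a cache keyed by subsystem type: subsystems whose topological type is already stored are reused, and only new types trigger expensive QM calculations. A minimal sketch, assuming hypothetical string keys such as "ALA-GLY" for topological types and a hypothetical `compute_qm` callback; the abstract does not specify how types are encoded or what data is stored.

```python
from typing import Callable, Dict, Hashable, Iterable, List

class SubsystemLibrary:
    """Hypothetical store of QM data for subsystems, keyed by topological type."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, dict] = {}

    def missing_types(self, subsystem_types: Iterable[Hashable]) -> List[Hashable]:
        """Types not yet in the library (first-seen order, duplicates dropped)."""
        seen = set()
        missing = []
        for t in subsystem_types:
            if t not in self._data and t not in seen:
                seen.add(t)
                missing.append(t)
        return missing

    def update(self, subsystem_types: Iterable[Hashable],
               compute_qm: Callable[[Hashable], dict]) -> int:
        """Run compute_qm only for new types and store the results; return how many ran."""
        new = self.missing_types(subsystem_types)
        for t in new:
            self._data[t] = compute_qm(t)
        return len(new)

computed = []  # records which types triggered a (fake) QM job

def fake_qm(subsystem_type):
    """Hypothetical placeholder for an expensive QM calculation."""
    computed.append(subsystem_type)
    return {"type": subsystem_type}

library = SubsystemLibrary()
n_first = library.update(["ALA-GLY", "GLY-SER", "ALA-GLY"], fake_qm)  # first protein
n_second = library.update(["GLY-SER", "SER-LYS"], fake_qm)            # reuses GLY-SER
```

The second protein triggers only one new calculation, because its other subsystem type is already in the library.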


2020
Author(s): Antti Pihlajamaki, Joonas Hamalainen, Joakim Linja, Paavo Nieminen, Sami Malola, ...

We present an implementation of distance-based machine learning (ML) methods to create a realistic atomistic interaction potential for Monte Carlo simulations of the thermal dynamics of thiolate (SR)-protected gold nanoclusters. The ML potential is trained for Au38(SR)24 using previously published density functional theory (DFT)-based molecular dynamics (MD) simulation data on two experimentally characterized structural isomers of the cluster, and is validated against independent DFT MD simulations. This method opens the door to efficient probing of the configuration space for further investigations of the temperature-dependent electronic and optical properties of Au38(SR)24. Our ML implementation strategy allows for generalization and accuracy control of distance-based ML models for complex nanostructures containing several chemical elements and interactions of varying strength.
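
A distance-based potential plugs into Monte Carlo sampling through the Metropolis acceptance rule: a random trial move is always accepted if it lowers the energy and otherwise accepted with probability exp(-ΔE/kT). In the sketch below, a Lennard-Jones pair term in reduced units is purely a hypothetical stand-in for the trained ML potential (the paper's model is fitted to DFT MD data), and the temperature, step size, and starting geometry are arbitrary.

```python
import math
import random

def pair_energy(r, eps=1.0, sigma=1.0):
    """Stand-in distance-based pair term (Lennard-Jones), NOT the trained ML model."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def total_energy(positions):
    """Sum the pair term over all unique atom pairs; depends only on interatomic distances."""
    e = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            e += pair_energy(math.dist(positions[i], positions[j]))
    return e

def metropolis(positions, n_steps, kT, max_disp, rng):
    """Single-atom-move Metropolis Monte Carlo.

    Returns the final positions, their energy, and the acceptance rate.
    """
    pos = [list(p) for p in positions]
    energy = total_energy(pos)
    accepted = 0
    for _ in range(n_steps):
        i = rng.randrange(len(pos))
        old = pos[i][:]
        pos[i] = [c + rng.uniform(-max_disp, max_disp) for c in old]
        new_energy = total_energy(pos)
        delta = new_energy - energy
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            accepted += 1
        else:
            pos[i] = old  # reject: restore the previous position
    return pos, energy, accepted / n_steps

rng = random.Random(0)
start = [(0.0, 0.0, 0.0), (1.12, 0.0, 0.0), (0.56, 0.97, 0.0)]  # hypothetical trimer
final, e_final, rate = metropolis(start, 500, kT=0.5, max_disp=0.1, rng=rng)
```

Swapping in a learned model only requires replacing `total_energy`; the acceptance rule itself does not depend on how the energy is computed.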


2021
Vol 7 (1)
Author(s): Marco Arrigoni, Georg K. H. Madsen

Abstract Density functional theory (DFT) has become a standard tool for the study of point defects in materials. However, finding the most stable defect structures remains very challenging, as it requires solving a multimodal optimization problem with a high-dimensional objective function. Until now, the most commonly used approaches to this problem have been largely empirical, heuristic, and/or based on domain knowledge. In this contribution, we describe an approach for exploring the potential energy surface (PES) based on the covariance matrix adaptation evolution strategy (CMA-ES) combined with supervised and unsupervised machine learning models. The resulting algorithm depends only on a limited set of physically interpretable hyperparameters, and the approach offers a systematic way of finding low-energy configurations of isolated point defects in solids. We demonstrate its applicability on different systems and show that it finds known low-energy structures and also discovers additional ones.
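
To illustrate the evolution-strategy ingredient, here is a deliberately simplified (μ, λ) evolution strategy with an isotropic, geometrically shrinking step size: each generation samples candidate structures around a mean, keeps the lowest-energy ones, and moves the mean to their average. Full CMA-ES additionally adapts a covariance matrix and the step size from the search history, and the paper couples it with machine learning models; both are omitted here, and the quadratic toy energy is a hypothetical stand-in for a DFT PES.

```python
import random

def toy_energy(x):
    """Hypothetical stand-in for a DFT energy: a quadratic bowl with minimum at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

def simple_es(energy, x0, sigma=0.5, lam=20, mu=5, generations=100, decay=0.97, seed=0):
    """(mu, lambda) evolution strategy with isotropic Gaussian steps (no covariance adaptation)."""
    rng = random.Random(seed)
    mean = list(x0)
    best_x, best_e = list(mean), energy(mean)
    for _ in range(generations):
        # Sample lam candidates around the current mean.
        offspring = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        offspring.sort(key=energy)
        parents = offspring[:mu]
        # New mean: equal-weight average of the mu lowest-energy candidates.
        mean = [sum(p[k] for p in parents) / mu for k in range(len(mean))]
        top_e = energy(parents[0])
        if top_e < best_e:
            best_x, best_e = list(parents[0]), top_e
        sigma *= decay  # fixed schedule instead of CMA-ES step-size adaptation
    return best_x, best_e

best_x, best_e = simple_es(toy_energy, [0.0, 0.0, 0.0])
```

On a real defect PES, each `energy` call would be an expensive DFT evaluation, which is where surrogate machine learning models become attractive.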


2009
Vol 08 (04)
pp. 677-690
Author(s): Jin Wen, Jing Ma

Packing structures and orientation of sexithiophene (6T) molecules on the Ag(111) surface are investigated by molecular dynamics (MD) simulations and quantum chemical calculations. Both cluster and slab models are employed. Density functional theory and molecular mechanics calculations demonstrate weak physisorption and little site preference in the thiophene/Ag(111) system. The MD simulations show that in the first layer close to the surface, the nearly coplanar 6T strips lie parallel, with their long axes deviating from the [Formula: see text] direction by about 20°–30° and 75°–90°. The average adsorption height of the monolayer is about 3.2 Å, with most of the sulfur atoms of the thienyl rings sitting on bridge sites of the Ag(111) surface. The 6T molecules tend to adopt tilted orientations farther away from the surface. The packing structures of the 6T layers deposited on the surface result from the competition between molecule–substrate and intermolecular interactions.


Author(s): Steven Kauwe, Jake Graser, Ryan Murdock, Taylor Sparks

One of the most common criticisms of machine learning is the assumed inability of models to extrapolate, i.e., to identify extraordinary materials with properties beyond those present in the training data set. To investigate whether this is indeed the case, this work uses properties calculated with density functional theory (bulk modulus, shear modulus, thermal conductivity, thermal expansion, band gap and Debye temperature) to test whether machine learning can truly predict materials with properties that extend beyond previously seen values. We refer to these materials as extraordinary, meaning they represent the top 1% of values in the available data set. Interestingly, we show that even when models are trained on a fraction of the bottom 99%, we can consistently identify 3/4 of the highest-performing compositions for all considered properties, with a precision that is typically above 0.5. Moreover, we investigate a few different modeling choices and demonstrate that a classification approach can identify an equivalent number of extraordinary compounds with significantly fewer false positives than a regression approach. Finally, we discuss cautions and potential limitations in implementing such an approach to discover new record-breaking materials.
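
The "extraordinary" label and the precision metric can be made concrete: a material is labeled extraordinary if its property value lies in the top 1% of the data set, and precision is the fraction of materials flagged as extraordinary that really are. A minimal sketch with made-up values; the nearest-rank percentile convention and the tie handling are assumptions, not taken from the paper.

```python
import math

def top_percent_threshold(values, percent=1.0):
    """Smallest value still inside the top `percent` of the data (nearest-rank convention)."""
    ordered = sorted(values, reverse=True)
    k = max(1, math.ceil(len(ordered) * percent / 100.0))
    return ordered[k - 1]

def label_extraordinary(values, percent=1.0):
    """1 if a value is in the top `percent`, else 0 (ties at the threshold count as 1)."""
    threshold = top_percent_threshold(values, percent)
    return [1 if v >= threshold else 0 for v in values]

def precision(true_labels, predicted_labels):
    """Fraction of predicted positives that are true positives (0.0 if nothing is flagged)."""
    tp = sum(1 for y, p in zip(true_labels, predicted_labels) if y == 1 and p == 1)
    n_flagged = sum(predicted_labels)
    return tp / n_flagged if n_flagged else 0.0

values = [float(v) for v in range(200)]              # hypothetical property values
true_labels = label_extraordinary(values)            # top 1% of 200 -> 2 materials
predicted = [1 if v >= 197 else 0 for v in values]   # a hypothetical classifier flags 3
score = precision(true_labels, predicted)            # 2 of the 3 flags are correct
```

The same precision function applies whether the flags come from a classifier directly or from thresholding a regression model's predicted values.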

