Performance comparison of r2SCAN and SCAN metaGGA density functionals for solid materials via an automated, high-throughput computational workflow

Author(s):  
Ryan Kingsbury ◽  
Ayush Gupta ◽  
Christopher Bartel ◽  
Jason Munro ◽  
Shyam Dwaraknath ◽  
...  

Computational materials discovery efforts utilize hundreds or thousands of density functional theory (DFT) calculations to predict material properties. Historically, such efforts have performed calculations at the generalized gradient approximation (GGA) level of theory due to its efficient compromise between accuracy and computational reliability. However, high-throughput calculations at the higher metaGGA level of theory are becoming feasible. The Strongly Constrained and Appropriately Normed (SCAN) metaGGA functional offers superior accuracy to GGA across much of chemical space, making it appealing as a general-purpose metaGGA functional, but it suffers from numerical instabilities that impede its use in high-throughput workflows. The recently developed r2SCAN metaGGA functional promises accuracy similar to SCAN in addition to more robust numerical performance. However, its performance compared to SCAN has yet to be evaluated over a large group of solid materials. In this work, we compare r2SCAN and SCAN predictions for key properties of approximately 6,000 solid materials using a newly developed high-throughput computational workflow. We find that r2SCAN predicts formation energies more accurately than SCAN and PBEsol for both strongly and weakly bound materials and that r2SCAN predicts systematically larger lattice constants than SCAN. We also find that r2SCAN requires modestly fewer computational resources than SCAN and offers much more reliable convergence. Thus, our large-scale benchmark confirms that r2SCAN has delivered on its promises of numerical efficiency and accuracy, making it an ideal choice for high-throughput metaGGA calculations.
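
For readers unfamiliar with the quantity being benchmarked, a formation energy per atom is conventionally the compound's total energy minus the summed energies of its constituent elements in their reference states, divided by the number of atoms. The sketch below illustrates only that standard definition; the function name, the dictionary layout, and all numerical values are hypothetical and are not taken from the authors' workflow.

```python
def formation_energy_per_atom(total_energy, composition, elemental_energies):
    """Return (E_total - sum_i n_i * mu_i) / N in the units of the inputs.

    total_energy       -- energy of the compound cell (e.g. eV)
    composition        -- dict mapping element symbol to atom count in the cell
    elemental_energies -- dict mapping element symbol to reference energy per atom
    """
    n_atoms = sum(composition.values())
    if n_atoms <= 0:
        raise ValueError("composition must contain at least one atom")
    reference = sum(n * elemental_energies[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms


# Hypothetical binary compound AB with made-up energies in eV.
example = formation_energy_per_atom(
    total_energy=-10.0,
    composition={"A": 1, "B": 1},
    elemental_energies={"A": -3.0, "B": -4.0},
)
```

A negative result indicates that the compound is lower in energy than its elemental references. Comparing functionals then amounts to comparing such values against experimental formation enthalpies.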


2018 ◽  
Author(s):  
Isabelle Heath-Apostolopoulos ◽  
Liam Wilbraham ◽  
Martijn Zwijnenburg

We discuss a low-cost computational workflow for the high-throughput screening of polymeric photocatalysts and demonstrate its utility by applying it to a number of challenging problems that would be difficult to tackle otherwise. Specifically, we show how having access to a low-cost method allows one to screen a vast chemical space, as well as to probe the effects of conformational degrees of freedom and sequence isomerism. Finally, we discuss both the opportunities and the biggest challenges of computational screening in the search for polymer photocatalysts.


2022 ◽  
Author(s):  
Shomik Verma ◽  
Miguel Rivera ◽  
David O. Scanlon ◽  
Aron Walsh

Understanding the excited-state properties of molecules provides insights into how they interact with light. These interactions can be exploited to design compounds for photochemical applications, including enhanced spectral conversion of light to increase the efficiency of photovoltaic cells. While chemical discovery is time- and resource-intensive experimentally, computational chemistry can be used to screen large-scale databases for molecules of interest in a procedure known as high-throughput virtual screening. The first step usually involves a high-speed but low-accuracy method to screen large numbers of molecules (potentially millions) so that only the best candidates are evaluated with expensive methods. However, use of a coarse first-pass screening method can result in high false positive or false negative rates. Therefore, this study uses machine learning to calibrate a high-throughput technique (xTB-sTDA) against a higher-accuracy one (TD-DFT). Testing the calibration model shows a ~5-fold decrease in error in-domain and a ~3-fold decrease out-of-domain. The resulting mean absolute error of ~0.14 eV is in line with previous work in machine learning calibrations and outperforms previous work in linear calibration of xTB-sTDA. We then apply the calibration model to screen a 250k-molecule database and map inaccuracies of xTB-sTDA in chemical space. We also show generalizability of the workflow by calibrating against a higher-level technique (CC2), yielding a similarly low error. Overall, this work demonstrates that machine learning can be used to develop a method that is both cheap and accurate for large-scale excited-state screening, enabling accelerated molecular discovery across a variety of disciplines.
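
The idea of calibrating a cheap method against an expensive one can be illustrated with a deliberately simplified sketch. Here an ordinary least-squares line stands in for the study's machine learning model, which is not specified in this abstract, and the energies are synthetic values invented for illustration rather than xTB-sTDA or TD-DFT results. The helper names are hypothetical.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Synthetic excitation energies in eV (assumed values, not data from the study).
cheap = [2.0, 2.5, 3.0, 3.5, 4.0]          # stand-in for the fast method
reference = [2.65, 2.95, 3.45, 3.75, 4.2]  # stand-in for the accurate method

slope, intercept = fit_linear(cheap, reference)
calibrated = [slope * c + intercept for c in cheap]

mae_raw = mean_absolute_error(cheap, reference)
mae_calibrated = mean_absolute_error(calibrated, reference)
```

In practice the calibration would be fitted on one set of molecules and evaluated on held-out ones, which is what the in-domain and out-of-domain error figures quoted above refer to. This toy example fits and evaluates on the same points purely to keep it short.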


2020 ◽  
Author(s):  
Sambit Kumar Das ◽  
Sabyasachi Chakraborty ◽  
Raghunathan Ramakrishnan

First-principles calculation of the standard formation enthalpy, $\Delta H_f^0$ (298 K), on the large scale required by chemical space explorations is feasible only with density functional approximations (DFAs) and some composite wave function theories (cWFTs). However, the accuracies of popular range-separated hybrid, 'rung-4' DFAs, and cWFTs that offer the best accuracy-vs.-cost trade-off have so far been established only for datasets predominantly comprising small molecules; hence, their transferability to larger datasets remains unclear. In this study, we present an extended benchmark dataset of over two thousand values of $\Delta H_f^0$ for structurally and electronically diverse molecules. We apply quartile-ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at Probabilistically Pruned Enthalpies of 1908 compounds (PPE1908). For this dataset, we rank the prediction accuracies of G4(MP2), ccCA and 23 popular DFAs using conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role an empirical higher-level correction (HLC) plays in the G4(MP2) model. Furthermore, we comment on uncertainties associated with the reference empirical data for atoms and the systematic errors they introduce, which grow with molecular size. We believe these findings will aid in identifying meaningful application domains for quantum thermochemical methods.


2020 ◽  
Author(s):  
Fang Liu ◽  
Chenru Duan ◽  
Heather Kulik

Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as those containing open-shell, 3d transition metals that can be expected to have strong multi-reference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate over 4,800 open-shell transition-metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and evaluate affordable, finite-temperature DFT evaluation of fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO–LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. Analysis of independently trained machine learning (ML) models to predict HOMO–LUMO gaps and FON-based diagnostics reveals differences in metal- and ligand-sensitivity of the two quantities. We use our trained ML models to rapidly evaluate MR character over a space of ca. 187,000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding small HOMO–LUMO gap complexes while ensuring low MR character.


2020 ◽  
Author(s):  
Phani Ghanakota ◽  
Pieter Bos ◽  
Kyle Konze ◽  
Joshua Staker ◽  
Gabriel Marques ◽  
...  

The hit identification process usually involves the profiling of millions and, more recently, billions of compounds, either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that by coupling reaction-based enumeration, active learning and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based FEP profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large-scale enumeration alone, while simultaneously staying within the bounds of a predefined drug-like property space. We are able to achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted-sum, QSAR-based multi-parameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can: (1) provide a 6.4-fold enrichment improvement in identifying < 10 nM compounds over random selection, and a 1.5-fold enrichment in identifying < 10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to “learn” the SAR from large-scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3,000,000 idea molecules and run 2153 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM.
The reported data suggest that combining reaction-based and generative machine learning for ideation results in a higher enrichment of potent compounds than previously described approaches, and can rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.
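
The fold-enrichment figures quoted above can be read as a ratio of hit rates. The sketch below assumes the common definition (hit rate of the method divided by hit rate of the baseline), which this abstract does not spell out. The function names and the counts are hypothetical and are not the study's data.

```python
def hit_rate(n_hits, n_tested):
    """Fraction of tested compounds that meet the potency threshold."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_hits / n_tested


def fold_enrichment(method_hits, method_tested, baseline_hits, baseline_tested):
    """Ratio of the method's hit rate to the baseline's hit rate."""
    baseline = hit_rate(baseline_hits, baseline_tested)
    if baseline == 0:
        raise ValueError("baseline hit rate is zero; enrichment is undefined")
    return hit_rate(method_hits, method_tested) / baseline


# Made-up counts: 30 hits in 200 selected compounds vs. 10 hits in 200 random ones.
example_enrichment = fold_enrichment(30, 200, 10, 200)
```

Under this definition, a value above 1 means the selection method finds potent compounds more often than the baseline does.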

