Data-Driven Approaches Can Overcome Limitations in Multireference Diagnostics

2020 ◽  
Author(s):  
Chenru Duan ◽  
Fang Liu ◽  
Aditya Nandy ◽  
Heather Kulik

High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %E_corr. None of the DFT-based diagnostics are nearly as predictive of %E_corr as the best WFT-based diagnostics. To overcome the limitation of this cost–accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening.
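
To make the kernel ridge regression step in the abstract above more concrete, here is a minimal pure-Python sketch of the general technique with a Gaussian kernel. It is not the authors' model: the feature vectors, target values, function names, and the hyperparameters `gamma` and `lam` are all placeholders invented for illustration. In the setting described above, each input row would stand for a set of DFT-based diagnostics plus geometric descriptors, and the target for a WFT-based diagnostic.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, gamma=1.0, lam=1e-6):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, y)

def krr_predict(X_train, alpha, x_new, gamma=1.0):
    """Predict the target for x_new as a kernel-weighted sum over training points."""
    return sum(a * rbf_kernel(xt, x_new, gamma) for a, xt in zip(alpha, X_train))

# Placeholder data: two made-up "cheap" descriptors per molecule and one
# made-up "expensive" diagnostic value. These are NOT values from the paper.
X_train = [[0.0, 0.1], [0.5, 0.2], [1.0, 0.4], [1.5, 0.9]]
y_train = [0.010, 0.020, 0.050, 0.090]

alpha = krr_fit(X_train, y_train)
print(krr_predict(X_train, alpha, [0.75, 0.3]))
```

In practice the kernel width and regularization strength would be chosen by cross-validation, and a library implementation would normally replace the hand-written linear solver.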


2018 ◽  
Author(s):  
Isabelle Heath-Apostolopoulos ◽  
Liam Wilbraham ◽  
Martijn Zwijnenburg

We discuss a low-cost computational workflow for the high-throughput screening of polymeric photocatalysts and demonstrate its utility by applying it to a number of challenging problems that would be difficult to tackle otherwise. Specifically we show how having access to a low-cost method allows one to screen a vast chemical space, as well as to probe the effects of conformational degrees of freedom and sequence isomerism. Finally, we discuss both the opportunities of computational screening in the search for polymer photocatalysts, as well as the biggest challenges.


Author(s):  
Haomin Chen ◽  
Lee Loong Wong ◽  
Stefan Adams

The identification of materials for advanced energy-storage systems is still mostly based on experimental trial and error. Increasingly, computational tools are sought to accelerate materials discovery through prediction. Here, a set of computationally inexpensive software tools is introduced that exploits the bond-valence-based empirical force field previously developed by the authors to enable high-throughput computational screening of experimental or simulated crystal-structure models of battery materials. The tools predict a variety of properties of technological relevance, including a structure plausibility check, surface energies, an inventory of equilibrium and interstitial sites, the topology of ion-migration paths between those sites, the respective migration barriers and the site-specific attempt frequencies. All of these can be predicted from CIF files of structure models at a minute fraction of the computational cost of density functional theory (DFT) simulations, with the added advantage that all the relevant pathway segments are analysed instead of arbitrarily predetermined paths. The capabilities and limitations of the approach are evaluated for a wide range of ion-conducting solids. An integrated simple kinetic Monte Carlo simulation provides rough (but less reliable) predictions of the absolute conductivity at a given temperature. The automated adaptation of the force field to the composition and charge distribution in the simulated material allows for a high transferability of the force field within a wide range of Lewis acid–Lewis base-type ionic inorganic compounds, as necessary for high-throughput screening.
While the transferability and precision will not reach the same levels as in DFT simulations, the fact that the computational cost is several orders of magnitude lower allows the application of the approach not only to pre-screen databases of simple structure prototypes but also to structure models of complex disordered or amorphous phases, and provides a path to expand the analysis to charge transfer across interfaces that would be difficult to cover by ab initio methods.
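
The abstract above mentions migration barriers, site-specific attempt frequencies, and a simple kinetic Monte Carlo simulation. The sketch below shows how these quantities are commonly combined: an Arrhenius hop rate per pathway and one rejection-free kinetic Monte Carlo step. It is a generic textbook construction, not the authors' software; the function names and the barrier and frequency values are hypothetical.

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def hop_rate(barrier_ev, attempt_freq_hz, temperature_k):
    """Arrhenius rate: attempt frequency times the Boltzmann factor of the barrier."""
    return attempt_freq_hz * math.exp(-barrier_ev / (K_B_EV_PER_K * temperature_k))

def kmc_step(rates, rng):
    """One rejection-free kinetic Monte Carlo step.

    Picks an event with probability proportional to its rate and draws the
    elapsed time from an exponential distribution with the total rate.
    """
    total = sum(rates)
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            chosen = index
            break
    dt = -math.log(1.0 - rng.random()) / total  # 1 - u lies in (0, 1]
    return chosen, dt

# Hypothetical barriers (eV) for three hops out of one site, with an assumed
# attempt frequency of 1e13 Hz at 300 K. These are illustrative values only.
rng = random.Random(0)
rates = [hop_rate(e, 1e13, 300.0) for e in (0.30, 0.45, 0.60)]
event, dt = kmc_step(rates, rng)
print(event, dt)
```

Because the rate depends exponentially on the barrier, the lowest-barrier hop dominates the event selection at room temperature, which is why the accuracy of predicted barriers matters so much for conductivity estimates.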


2018 ◽  
Author(s):  
Liam Wilbraham ◽  
Enrico Berardo ◽  
Lukas Turcani ◽  
Kim Jelfs ◽  
Martijn Zwijnenburg

We propose a general high-throughput computational screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (TD-)DFT data computed for a representative diverse set of (co-)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
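
The linear calibration described in the abstract above can be illustrated with an ordinary least-squares fit of reference values against low-cost values. The sketch below is a minimal illustration under assumptions: the function names are hypothetical, the numbers are made-up placeholders rather than xTB or DFT results, and the actual form of the authors' linear model is not specified beyond being linear.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def calibrate(value, slope, intercept):
    """Map a low-cost result onto the reference scale using the fitted line."""
    return slope * value + intercept

# Placeholder calibration set: low-cost values and matching reference values
# for the same (hypothetical) polymers, in arbitrary units.
low_cost = [1.0, 2.0, 3.0, 4.0]
reference = [1.4, 2.3, 3.5, 4.4]

slope, intercept = fit_linear(low_cost, reference)
print(calibrate(2.5, slope, intercept))
```

Once the slope and intercept are fitted on the calibration set, only the low-cost calculation is needed for each new polymer, which is where the cost saving comes from.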


2020 ◽  
Author(s):  
Chenru Duan ◽  
Fang Liu ◽  
Aditya Nandy ◽  
Heather Kulik

Multireference (MR) diagnostics are common tools for identifying strongly correlated electronic structure that makes single-reference (SR) methods (e.g., density functional theory or DFT) insufficient for accurate property prediction. However, MR diagnostics typically require computationally demanding correlated wavefunction theory (WFT) calculations, and diagnostics often disagree or fail to predict MR effects on properties. To overcome these challenges, we introduce a semi-supervised machine learning (ML) approach with virtual adversarial training (VAT) of an MR classifier using 15 WFT and DFT MR diagnostics as inputs. In semi-supervised learning, only the most extreme SR or MR points are labeled, and the remaining point labels are learned. The resulting VAT model outperforms the alternatives, as quantified by the distinct property distributions of SR- and MR-classified molecules. To reduce the cost of generating inputs to the VAT model, we leverage the VAT model’s robustness to noisy inputs by replacing WFT MR diagnostics with regression predictions in an MR decision engine workflow that preserves excellent performance. We demonstrate the transferability of our approach to larger molecules and those with distinct chemical composition from the training set. This MR decision engine demonstrates promise as a low-cost, high-accuracy approach to the automatic detection of strong correlation for predictive high-throughput screening.
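
The labeling idea in the abstract above, where only the most extreme SR or MR points receive labels, can be sketched as follows. This covers only the labeling step and is not virtual adversarial training. The single scalar score per molecule, the `fraction` cutoff, and the function name are assumptions made for illustration; the abstract does not state how the extreme points are selected.

```python
def label_extremes(scores, fraction=0.2):
    """Label the lowest-scoring points 0 (SR) and the highest 1 (MR).

    All other points stay None, i.e. unlabeled, and would be left for a
    semi-supervised model to learn.
    """
    n = len(scores)
    k = max(1, int(n * fraction))
    if 2 * k > n:
        raise ValueError("fraction too large: SR and MR label sets would overlap")
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = 0  # most SR-like
    for i in order[-k:]:
        labels[i] = 1  # most MR-like
    return labels

# Hypothetical MR-character scores for five molecules (higher = more MR-like).
scores = [0.1, 0.9, 0.5, 0.3, 0.7]
print(label_extremes(scores))
```

Keeping the ambiguous middle of the distribution unlabeled avoids forcing a hard SR or MR assignment where the diagnostics are least decisive.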

