Data-Driven Approaches Can Overcome Limitations in Multireference Diagnostics

Keyword(s): Density Functional, Correlation Energy, Low Cost, Small Data, Data Sets, Strongly Correlated, Heavy Atoms, Computational Screening

High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %Ecorr. None of the DFT-based diagnostics are nearly as predictive of %Ecorr as the best WFT-based diagnostics. To overcome the limitation of this cost–accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening.
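The regression step mentioned in this abstract can be illustrated with a minimal kernel ridge regression sketch. Everything concrete here is an assumption made for illustration: the Gaussian kernel, the hyperparameters `gamma` and `lam`, and the synthetic arrays standing in for DFT-based inputs and a WFT-based target. The abstract does not specify the kernel, the features, or the training protocol.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, gamma=1.0, lam=1e-4):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict targets for X_new from the fitted dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic stand-in data (NOT real diagnostics): three cheap input features
# and one smooth target playing the role of a costly diagnostic.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

alpha = krr_fit(X[:150], y[:150])
pred = krr_predict(X[:150], alpha, X[150:])
mae = float(np.mean(np.abs(pred - y[150:])))
print(f"held-out mean absolute error: {mae:.4f}")
```

In practice the kernel width and regularization strength would be chosen by cross-validation rather than fixed as above.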

Computational High-Throughput Screening of Polymeric Photocatalysts: Exploring the Effect of Composition, Sequence Isomerism and Conformational Degrees of Freedom

10.26434/chemrxiv.7314929.v3

2018

Author(s): Isabelle Heath-Apostolopoulos, Liam Wilbraham, Martijn Zwijnenburg

Keyword(s): High Throughput, Degrees Of Freedom, Chemical Space, Low Cost, Computational Screening, Computational Workflow

We discuss a low-cost computational workflow for the high-throughput screening of polymeric photocatalysts and demonstrate its utility by applying it to a number of challenging problems that would be difficult to tackle otherwise. Specifically, we show how having access to a low-cost method allows one to screen a vast chemical space, as well as to probe the effects of conformational degrees of freedom and sequence isomerism. Finally, we discuss both the opportunities of computational screening in the search for polymer photocatalysts and the biggest challenges.

Acta Crystallographica Section B Structural Science Crystal Engineering and Materials

SoftBV – a software tool for screening the materials genome of inorganic fast ion conductors

10.1107/s2052520618015718

2019, Vol 75 (1), pp. 18-33

Cited By ~ 35

Author(s): Haomin Chen, Lee Loong Wong, Stefan Adams

Keyword(s): Force Field, High Throughput, Density Functional, Inorganic Compounds, Computational Cost, Lewis Base, Computational Screening, Wide Range, DFT Simulations

The identification of materials for advanced energy-storage systems is still mostly based on experimental trial and error. Increasingly, computational tools are sought to accelerate materials discovery by computational predictions. Here, a set of computationally inexpensive software tools is introduced. The tools exploit the bond-valence-based empirical force field previously developed by the authors to enable high-throughput computational screening of experimental or simulated crystal-structure models of battery materials, predicting a variety of properties of technological relevance, including a structure plausibility check, surface energies, an inventory of equilibrium and interstitial sites, the topology of ion-migration paths in between those sites, the respective migration barriers and the site-specific attempt frequencies. All of these can be predicted from CIF files of structure models at a minute fraction of the computational cost of density functional theory (DFT) simulations, and with the added advantage that all the relevant pathway segments are analysed instead of arbitrarily predetermined paths. The capabilities and limitations of the approach are evaluated for a wide range of ion-conducting solids. An integrated simple kinetic Monte Carlo simulation provides rough (but less reliable) predictions of the absolute conductivity at a given temperature. The automated adaptation of the force field to the composition and charge distribution in the simulated material allows for a high transferability of the force field within a wide range of Lewis acid–Lewis base-type ionic inorganic compounds as necessary for high-throughput screening. While the transferability and precision will not reach the same levels as in DFT simulations, the fact that the computational cost is several orders of magnitude lower allows the application of the approach not only to pre-screen databases of simple structure prototypes but also to structure models of complex disordered or amorphous phases, and provides a path to expand the analysis to charge transfer across interfaces that would be difficult to cover by ab initio methods.
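As background for the bond-valence idea this abstract builds on, the sketch below evaluates the standard bond-valence expression s = exp((R0 - R) / b) and sums it over the bonds around one ion. It is not the softBV force field or its plausibility check, whose details the abstract does not give. The function names, the value of `r0`, and the default `b = 0.37` Å (a commonly used generic value) are assumptions for illustration only.

```python
import math

def bond_valence(distance, r0, b=0.37):
    """Valence of a single bond of the given length (distances in angstrom)."""
    return math.exp((r0 - distance) / b)

def bond_valence_sum(distances, r0, b=0.37):
    """Sum of bond valences over all neighbours of one ion."""
    return sum(bond_valence(d, r0, b) for d in distances)

# Hypothetical example: a cation with six equivalent neighbours.
r0 = 1.5                                  # hypothetical tabulated parameter
d_equal = r0 + 0.37 * math.log(6.0)       # length at which each bond carries 1/6
bvs = bond_valence_sum([d_equal] * 6, r0)
print(f"bond-valence sum: {bvs:.3f}")     # compare against the expected oxidation state
```

A bond-valence sum far from the ion's formal oxidation state is the usual sign that a site or structure model is implausible.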

A High-Throughput Screening Approach for the Optoelectronic Properties of Conjugated Polymers

10.26434/chemrxiv.6181841.v1

2018

Author(s): Liam Wilbraham, Enrico Berardo, Lukas Turcani, Kim Jelfs, Martijn Zwijnenburg

Keyword(s): Conjugated Polymers, Electronic Properties, High Throughput, Density Functional, Computational Cost, Quality Data, Minor Variation, Computational Screening, Screening Approach

We propose a general high-throughput computational screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (TD-)DFT data computed for a representative diverse set of (co-)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
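The calibration idea described above, fitting a linear model that maps low-cost results onto reference values and then reusing it for new systems, can be sketched as follows. The numbers are made up for illustration and are not xTB or DFT data. The function names are hypothetical, and the abstract does not state which properties are calibrated or how the fit is validated.

```python
import numpy as np

def fit_linear_calibration(x_cheap, y_ref):
    """Least-squares fit of y_ref = slope * x_cheap + intercept."""
    slope, intercept = np.polyfit(x_cheap, y_ref, deg=1)
    return float(slope), float(intercept)

def apply_calibration(x_cheap, slope, intercept):
    """Map new low-cost values onto the reference scale."""
    return slope * np.asarray(x_cheap, dtype=float) + intercept

# Made-up calibration set: low-cost values and matching reference values.
cheap = np.array([1.0, 2.0, 3.0, 4.0])
reference = 0.8 * cheap + 0.3

slope, intercept = fit_linear_calibration(cheap, reference)
calibrated = apply_calibration([5.0], slope, intercept)
print(slope, intercept, calibrated)
```

Once the slope and intercept are fitted, applying them to a new polymer costs only one low-cost calculation plus a multiplication and an addition.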

Semi-Supervised Machine Learning Enables the Robust Detection of Multireference Character at Low Cost

10.26434/chemrxiv.12592346

2020

Author(s): Chenru Duan, Fang Liu, Aditya Nandy, Heather Kulik

Keyword(s): Machine Learning, Density Functional, Low Cost, Supervised Machine Learning, Strongly Correlated, Robust Detection, Distinct Property, The Cost, Decision Engine

Multireference (MR) diagnostics are common tools for identifying strongly correlated electronic structure that makes single-reference (SR) methods (e.g., density functional theory or DFT) insufficient for accurate property prediction. However, MR diagnostics typically require computationally demanding correlated wavefunction theory (WFT) calculations, and diagnostics often disagree or fail to predict MR effects on properties. To overcome these challenges, we introduce a semi-supervised machine learning (ML) approach with virtual adversarial training (VAT) of an MR classifier using 15 WFT and DFT MR diagnostics as inputs. In semi-supervised learning, only the most extreme SR or MR points are labeled, and the remaining point labels are learned. The resulting VAT model outperforms the alternatives, as quantified by the distinct property distributions of SR- and MR-classified molecules. To reduce the cost of generating inputs to the VAT model, we leverage the VAT model’s robustness to noisy inputs by replacing WFT MR diagnostics with regression predictions in an MR decision engine workflow that preserves excellent performance. We demonstrate the transferability of our approach to larger molecules and those with distinct chemical composition from the training set. This MR decision engine demonstrates promise as a low-cost, high-accuracy approach to the automatic detection of strong correlation for predictive high-throughput screening.
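The labeling step of the semi-supervised setup, where only the most extreme points receive labels, might look like the sketch below. The abstract does not say how "extreme" is defined, so the single score, the quantile cutoffs, and the label encoding (0 for SR, 1 for MR, -1 for unlabeled) are all assumptions for illustration. The virtual adversarial training itself is not shown.

```python
import numpy as np

def label_extremes(score, low_q=0.15, high_q=0.85):
    """Label the lowest scores 0 (SR), the highest 1 (MR), and the rest -1 (unlabeled)."""
    score = np.asarray(score, dtype=float)
    lo, hi = np.quantile(score, [low_q, high_q])
    labels = np.full(score.shape, -1, dtype=int)
    labels[score <= lo] = 0
    labels[score >= hi] = 1
    return labels

# Hypothetical diagnostic scores for eleven molecules.
scores = np.arange(11.0)
labels = label_extremes(scores)
print(labels.tolist())
```

The points left at -1 are the ones whose labels a semi-supervised model would have to infer during training.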
