Sequential sampling method using Gaussian process regression for estimating extreme structural response

Efficiently capturing the dynamic characteristics of structural systems with uncertainties is an important and challenging problem. Such characterization is valuable for predicting structural response, but it can be impractical in the many applications where a sufficiently large sample is expensive or unavailable. In this paper, Gaussian process regression models are employed to capture structural dynamic responses, especially responses with uncertainties. Because a Gaussian process needs only a relatively small set of data points to make predictions for such responses, sampling costs can be reduced significantly. Without loss of generality, the Gaussian process regression models are introduced in conjunction with Monte Carlo sampling; the approach generalizes readily to data points obtained by other sampling techniques.
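
The idea of training a Gaussian process on a small sample and then running the Monte Carlo analysis on the cheap surrogate can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-degree-of-freedom oscillator with uncertain stiffness, the uniform input distribution, the kernel hyperparameters, and all function names are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale=0.5, variance=1.0, noise=1e-6):
    """GP posterior mean and standard deviation at x_test (constant prior mean)."""
    y_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def peak_amplitude(k, m=1.0, c=0.1, f0=1.0, omega=1.0):
    """Toy 'expensive' model (assumed): steady-state amplitude of a damped
    single-degree-of-freedom oscillator under harmonic loading."""
    return f0 / np.sqrt((k - m * omega**2) ** 2 + (c * omega) ** 2)

rng = np.random.default_rng(0)

# Small training sample: the only calls to the "expensive" model the method needs.
k_train = rng.uniform(2.0, 4.0, size=15)
y_train = peak_amplitude(k_train)

# Large Monte Carlo sample of the uncertain stiffness, evaluated on the surrogate.
k_mc = rng.uniform(2.0, 4.0, size=5000)
y_gp, y_std = gp_predict(k_train, y_train, k_mc)

# Reference values are available here only because the toy model is cheap.
y_true = peak_amplitude(k_mc)
mae = float(np.mean(np.abs(y_gp - y_true)))
print(f"mean |GP - model| over the Monte Carlo sample: {mae:.4f}")
print(f"largest predictive standard deviation: {y_std.max():.4f}")
```

The predictive standard deviation returned alongside the mean indicates where the surrogate is least certain, which is typically near the edges of the training sample.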

Using Gaussian Process Regression to Integrate the Transition Structure Factor Curve for the Many-Body Correlation Energy

10.26226/morressier.5fa409874d4e91fe5c54b97a ◽

2020 ◽

Author(s):

Laura Weiler

Keyword(s):

Gaussian Process ◽

Structure Factor ◽

Correlation Energy ◽

Gaussian Process Regression ◽

Many Body ◽

Transition Structure ◽

The Many ◽

Structure Factor Curve ◽

Body Correlation

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not yet been studied. We employ Gaussian process regression because it can potentially deal with small training sets (as are likely for the rather complex molecular structures required for exploring spin coupling) and because it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
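
The remark that a descriptor which linearizes the structure-property relationship lets linear ridge regression match a kernel model can be illustrated with a toy comparison. This is only a sketch under stated assumptions: the two-feature data are synthetic (not the dicopper data set), the target is linear in the features by construction, and the function names, kernel length scale, and regularization strengths are hypothetical choices. The kernel model is kernel ridge regression, whose prediction coincides with the posterior mean of a Gaussian process with the same kernel, a constant prior mean, and noise variance equal to the regularization strength.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Synthetic stand-in for descriptor vectors (NOT real chemical data):
    the target is linear in the two features, plus small Gaussian noise."""
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + rng.normal(0.0, 0.01, size=n)
    return X, y

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1e-3):
    """Linear ridge regression with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - y_mean))
    return (X_te - x_mean) @ w + y_mean

def kernel_fit_predict(X_tr, y_tr, X_te, length_scale=1.0, lam=1e-4):
    """Kernel ridge regression with a squared-exponential (RBF) kernel."""
    def rbf(A, B):
        sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * sq / length_scale**2)
    y_mean = y_tr.mean()
    alpha = np.linalg.solve(rbf(X_tr, X_tr) + lam * np.eye(len(X_tr)), y_tr - y_mean)
    return rbf(X_te, X_tr) @ alpha + y_mean

def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

X_train, y_train = make_data(60)
X_test, y_test = make_data(200)

rmse_ridge = rmse(ridge_fit_predict(X_train, y_train, X_test), y_test)
rmse_kernel = rmse(kernel_fit_predict(X_train, y_train, X_test), y_test)
print(f"ridge RMSE:  {rmse_ridge:.4f}")
print(f"kernel RMSE: {rmse_kernel:.4f}")
```

When the target is not linear in the chosen features, the linear model would be expected to lose this parity, which is why the choice of descriptor matters.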

SAMPL6 Challenge Results from pKa Predictions Based on a General Gaussian Process Model

10.26434/chemrxiv.6406505.v2 ◽

2018 ◽

Author(s):

Caitlin C. Bannan ◽

David Mobley ◽

A. Geoff Skillman

Keyword(s):

Gaussian Process ◽

Process Model ◽

Molecular Graph ◽

Gaussian Process Regression ◽

Ionization State ◽

Training Set ◽

Physiochemical Properties ◽

Quantile Plots ◽

Physical And Chemical ◽

Good Agreement

A variety of fields would benefit from accurate pKa predictions, especially drug design, due to the effect a change in ionization state can have on a molecule's physiochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pKas of 24 drug-like small molecules. We recently built a general model for predicting pKas using a Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKas, which are then used to analytically determine macroscopic pKas. Our Gaussian process is trained on a set of 2,700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate based on the chemistry of the molecule that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile-quantile plots, indicating it can predict its own accuracy. The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable.
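
One common way to check whether a model "can predict its own accuracy" is to compare nominal quantiles of the predictive distribution with the observed fraction of errors that fall below them, which is the information a quantile-quantile plot displays. The sketch below assumes Gaussian predictive distributions and uses synthetic values; it is not the authors' pipeline, and the function name `calibration_curve` and all numbers are hypothetical.

```python
import random
from statistics import NormalDist

def calibration_curve(y_true, y_pred, y_std, levels):
    """For each nominal probability p, return the observed fraction of
    standardized errors z = (y_true - y_pred) / y_std that lie at or below
    the standard-normal quantile for p. Well-calibrated uncertainties give
    observed fractions close to the nominal levels."""
    z = [(t - m) / s for t, m, s in zip(y_true, y_pred, y_std)]
    std_normal = NormalDist()
    return [sum(zi <= std_normal.inv_cdf(p) for zi in z) / len(z) for p in levels]

# Synthetic example (assumed data): predictions with varying error bars, and
# "experimental" values drawn so that the error bars are correct by construction.
rng = random.Random(0)
n = 20000
y_pred = [rng.uniform(2.0, 12.0) for _ in range(n)]
y_std = [rng.uniform(0.2, 1.5) for _ in range(n)]
y_true = [rng.gauss(m, s) for m, s in zip(y_pred, y_std)]

levels = [0.1, 0.25, 0.5, 0.75, 0.9]
observed = calibration_curve(y_true, y_pred, y_std, levels)
for p, o in zip(levels, observed):
    print(f"nominal {p:.2f} -> observed {o:.3f}")

# An overconfident model (error bars half as wide as they should be) puts too
# many errors in the tails, so the curve departs from the diagonal.
overconfident = calibration_curve(y_true, y_pred, [0.5 * s for s in y_std], levels)
```

Plotting the observed fractions against the nominal levels gives a curve that follows the diagonal for calibrated uncertainties and bends away from it when the reported error bars are too narrow or too wide.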
