Computationally efficient algorithm for Gaussian Process regression in case of structured samples

Abstract. Satellite remote sensing provides a global view of processes on Earth, which has unique benefits compared to measurements made on the ground. The global coverage and the enormous amounts of data produced come, however, at the price of spatial and temporal gaps and less than perfect data quality. Meaningful statistical inference from such data requires overcoming these problems, which calls for developing efficient computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is designed to handle problems of enormous size and is able to compute marginals and sample from a random process with more than one hundred million observations. The mean function of the Gaussian process is described by approximating marginals of a Markov random field (MRF). For covariance functions, Matern, exponential, and periodic kernels are utilized in a multi-scale kernel setting to describe the spatial heterogeneity present in data. We further demonstrate how winds can be used to inform the covariance kernel formulation. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and this is utilized to verify the validity of the multi-scale approach in synthetic experiments. For demonstrating the techniques above, data from the Orbiting Carbon Observatory 2 (OCO-2) satellite is used. The satGP program is released as open source software.
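The kernel families named in the abstract above can be illustrated with a minimal Python sketch. It is not satGP's formulation: the function names, the Matern smoothness of 3/2, the length scales, and the choice to combine scales by summing two components are all assumptions made for illustration.

```python
import math


def exponential_kernel(r, length_scale=1.0, variance=1.0):
    """Exponential kernel (Matern with smoothness 1/2) at distance r >= 0."""
    return variance * math.exp(-r / length_scale)


def matern32_kernel(r, length_scale=1.0, variance=1.0):
    """Matern kernel with smoothness 3/2 (an assumed choice) at distance r >= 0."""
    s = math.sqrt(3.0) * r / length_scale
    return variance * (1.0 + s) * math.exp(-s)


def periodic_kernel(r, length_scale=1.0, period=1.0, variance=1.0):
    """Standard periodic kernel; repeats whenever r is a multiple of `period`."""
    return variance * math.exp(
        -2.0 * math.sin(math.pi * r / period) ** 2 / length_scale ** 2
    )


def multi_scale_kernel(r):
    """Hypothetical two-scale kernel: a short-range Matern component plus a
    long-range exponential component. Sums of valid kernels are valid kernels."""
    short_range = matern32_kernel(r, length_scale=0.5, variance=1.0)
    long_range = exponential_kernel(r, length_scale=10.0, variance=0.5)
    return short_range + long_range
```

At distance zero each component returns its variance, so `multi_scale_kernel(0.0)` equals the sum of the two assumed variances; at larger distances the short-range term decays quickly and the long-range term dominates.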

Splitting Gaussian processes for computationally-efficient regression

PLoS ONE ◽

10.1371/journal.pone.0256470 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0256470

Author(s):

Nick Terry ◽

Youngjun Choe

Keyword(s):

Gaussian Process ◽

Gaussian Processes ◽

Time Complexity ◽

Kernel Method ◽

Gaussian Process Regression ◽

Process Models ◽

Limiting Factor ◽

Computationally Efficient ◽

Continuity Properties ◽

Memory Complexity

Gaussian processes offer a flexible kernel method for regression. While Gaussian processes have many useful theoretical properties and have proven practically useful, they suffer from poor scaling in the number of observations. In particular, the cubic time complexity of updating standard Gaussian process models can be a limiting factor in applications. We propose an algorithm for sequentially partitioning the input space and fitting a localized Gaussian process to each disjoint region. The algorithm is shown to have superior time and space complexity to existing methods, and its sequential nature allows the model to be updated efficiently. The algorithm constructs a model for which the time complexity of updating is tightly bounded above by a pre-specified parameter. To the best of our knowledge, the model is the first local Gaussian process regression model to achieve linear memory complexity. Theoretical continuity properties of the model are proven. We demonstrate the efficacy of the resulting model on several multi-dimensional regression tasks.
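The idea of partitioning the input space and fitting a local Gaussian process to each region can be sketched as follows. This is a simplified batch illustration, not the authors' sequential algorithm: the median split, the one-dimensional inputs, the RBF kernel, and the names `LocalGP`, `fit_split_gp`, and `predict_split_gp` are all assumptions. It does show why a cap on points per region bounds the cost of each local fit, since every Cholesky factorization involves at most `max_points` observations.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


class LocalGP:
    """Zero-mean GP fitted to the points of one region."""

    def __init__(self, x, y, noise=1e-2):
        self.x = x
        K = rbf(x, x) + noise * np.eye(len(x))
        L = np.linalg.cholesky(K)  # cubic only in the region size
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    def predict(self, xs):
        return rbf(xs, self.x) @ self.alpha


def fit_split_gp(x, y, max_points=50):
    """Split at the median until each region holds at most max_points.
    Assumes distinct inputs, so every split makes progress."""
    if len(x) <= max_points:
        return ("leaf", LocalGP(x, y))
    cut = float(np.median(x))
    left = x <= cut
    return (
        "node",
        cut,
        fit_split_gp(x[left], y[left], max_points),
        fit_split_gp(x[~left], y[~left], max_points),
    )


def predict_split_gp(model, xs):
    """Route each query point to its region and use that region's GP."""
    out = np.empty(len(xs), dtype=float)
    for i, v in enumerate(xs):
        node = model
        while node[0] == "node":
            node = node[2] if v <= node[1] else node[3]
        out[i] = node[1].predict(np.array([v]))[0]
    return out
```

A possible usage: call `fit_split_gp(x, y, max_points=50)` on a few hundred samples of a smooth function, then `predict_split_gp(model, x_new)`. Predictions from this sketch can jump at region boundaries; the continuity properties mentioned in the abstract belong to the authors' model, not to this illustration.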

Using Gaussian Process Regression to Integrate the Transition Structure Factor Curve for the Many-Body Correlation Energy

10.26226/morressier.5fa409874d4e91fe5c54b97a ◽

2020 ◽

Author(s):

Laura Weiler

Keyword(s):

Gaussian Process ◽

Structure Factor ◽

Correlation Energy ◽

Gaussian Process Regression ◽

Many Body ◽

Transition Structure ◽

The Many ◽

Structure Factor Curve ◽

Body Correlation

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.

SAMPL6 Challenge Results from pKa Predictions Based on a General Gaussian Process Model

10.26434/chemrxiv.6406505.v2 ◽

2018 ◽

Author(s):

Caitlin C. Bannan ◽

David Mobley ◽

A. Geoff Skillman

Keyword(s):

Gaussian Process ◽

Process Model ◽

Molecular Graph ◽

Gaussian Process Regression ◽

Ionization State ◽

Training Set ◽

Physiochemical Properties ◽

Quantile Plots ◽

Physical And Chemical ◽

Good Agreement

A variety of fields would benefit from accurate pKa predictions, especially drug design, due to the effect a change in ionization state can have on a molecule's physiochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pKas of 24 drug-like small molecules. We recently built a general model for predicting pKas using a Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKas, which are then used to analytically determine macroscopic pKas. Our Gaussian process is trained on a set of 2,700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering the challenge molecules are chemically diverse and often polyprotic while our training set is predominately monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate based on the chemistry of the molecule that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile-quantile plots, indicating it can predict its own accuracy. The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable.
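The behaviour described above, where a Gaussian process reports larger uncertainty for inputs far from its training data, can be illustrated with a small NumPy sketch. It does not reproduce the authors' pipeline: the one-dimensional feature, the RBF kernel with unit prior variance, the noise level, and the function name `gp_posterior` are assumptions chosen for illustration.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP
    with unit prior variance (assumed for this sketch)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    # Prior variance is 1.0 at every point; subtract what the data explains.
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))
```

Querying a point inside the range of the training inputs gives a standard deviation near zero, while a point many length scales away gives a standard deviation close to the prior value of one. That is the mechanism by which such a model can flag predictions outside its domain of applicability.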

Gaussian Process Regression for Estimating Wind Speed From X-band Marine Radar Images

OCEANS 2018 MTS/IEEE Charleston ◽

10.1109/oceans.2018.8604842 ◽

2018 ◽

Author(s):

Xinwei Chen ◽

Weimin Huang

Keyword(s):

Wind Speed ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Radar Images ◽

X Band ◽

Marine Radar

Direct quantum dynamics using variational Gaussian wavepackets and Gaussian process regression

The Journal of Chemical Physics ◽

10.1063/1.5086358 ◽

2019 ◽

Vol 150 (4) ◽

pp. 041101 ◽

Cited By ~ 13

Author(s):

Iakov Polyak ◽

Gareth W. Richings ◽

Scott Habershon ◽

Peter J. Knowles

Keyword(s):

Gaussian Process ◽

Quantum Dynamics ◽

Gaussian Process Regression

A spatio-temporal, Gaussian process regression, real-estate price predictor

Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '16 ◽

10.1145/2996913.2996960 ◽

2016 ◽

Cited By ~ 1

Author(s):

Henry Crosby ◽

Paul Davis ◽

Theo Damoulas ◽

Stephen A. Jarvis

Keyword(s):

Real Estate ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Real Estate Price ◽

Spatio Temporal

Predicting Bridge Elements Deterioration, using Collaborative Gaussian Process Regression

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2020.11.056 ◽

2020 ◽

Vol 53 (3) ◽

pp. 348-353

Author(s):

Maharshi Dhada ◽

Georgios M. Hadjidemetriou ◽

Ajith K. Parlikad

Keyword(s):

Gaussian Process ◽

Gaussian Process Regression

Modeling of Cutting Force in the Turning of AISI 4340 Using Gaussian Process Regression Algorithm

Applied Sciences ◽

10.3390/app11094055 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4055

Author(s):

Mahdi S. Alajmi ◽

Abdullah M. Almeshal

Keyword(s):

Gaussian Process ◽

Cutting Force ◽

Predictive Accuracy ◽

Gaussian Process Regression ◽

Machining Process ◽

Support Vector ◽

Process Data ◽

Cutting Force Prediction ◽

Artificial Neural Network Ann ◽

Aisi 4340

Machining process data can be utilized to predict cutting force and optimize process parameters. Cutting force is an essential parameter that has a significant impact on the metal turning process. In this study, a cutting force prediction model for turning AISI 4340 alloy steel was developed using Gaussian process regression (GPR), support vector machines (SVM), and artificial neural network (ANN) methods. The GPR simulations demonstrated a reliable prediction of surface roughness for the dry turning method with R² = 0.9843, MAPE = 5.12%, and RMSE = 1.86%. Performance comparisons between GPR, SVM, and ANN show that GPR is an effective method that can ensure high predictive accuracy of the cutting force in the turning of AISI 4340.
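The three accuracy measures quoted above can be computed as in the following sketch, which uses their standard textbook definitions. The abstract reports RMSE as a percentage without stating the normalization, so the `rmse` function here returns the value in the units of the target, which is an assumption; the sample values in the usage note are made up for illustration and are not data from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires nonzero targets."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (not normalized)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

For example, with hypothetical measured forces `[100.0, 200.0, 300.0]` and predictions `[110.0, 190.0, 300.0]`, the three functions can be called directly on the two lists to compare models on a common held-out set.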
