Two-dimensional Bayesian inversion of magnetotelluric data using trans-dimensional Gaussian processes

Author(s): Daniel Blatter, Anandaroop Ray, Kerry Key

Summary: Bayesian inversion of electromagnetic data produces crucial uncertainty information on inferred subsurface resistivity. Due to their high computational cost, however, Bayesian inverse methods have largely been restricted to computationally expedient 1D resistivity models. In this study, we successfully demonstrate, for the first time, a fully 2D, trans-dimensional Bayesian inversion of magnetotelluric data. We render this problem computationally tractable by using a stochastic interpolation algorithm known as a Gaussian process to achieve a parsimonious parametrization of the model vis-à-vis the dense parameter grids used in numerical forward modeling codes. The Gaussian process links a trans-dimensional, parallel tempered Markov chain Monte Carlo sampler, which explores the parsimonious model space, to MARE2DEM, an adaptive finite element forward solver. MARE2DEM computes the model response using a dense parameter mesh with resistivity assigned via the Gaussian process model. We demonstrate the new trans-dimensional Gaussian process sampler by inverting both synthetic and field magnetotelluric data for 2D models of electrical resistivity, with the field data example converging within 10 days on 148 cores, a non-negligible but tractable computational cost. For a field data inversion, our algorithm achieves a parameter reduction of over 32× compared to the fixed parameter grid used for the MARE2DEM regularized inversion. Resistivity probability distributions computed from the ensemble of models produced by the inversion yield credible intervals and interquartile plots that quantitatively show the non-linear 2D uncertainty in model structure. This uncertainty could then be propagated to other physical properties that impact resistivity, including bulk composition, porosity, and pore-fluid content.
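As a rough illustration of the parsimonious parametrization described above, the sketch below (not the authors' code; the kernel, length scale, and nucleus locations are illustrative assumptions) interpolates a handful of Gaussian process "nuclei" carrying log-resistivity values onto a dense 2D mesh of the kind a forward solver such as MARE2DEM would use.

```python
# Minimal sketch: GP interpolation of sparse "nuclei" onto a dense 2-D mesh.
# Kernel choice, length scale, and nucleus positions are illustrative assumptions.
import numpy as np

def sq_exp_kernel(A, B, length_scale=2.0, variance=1.0):
    """Squared-exponential covariance between two sets of (x, z) points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_interpolate(nuclei_xz, nuclei_logrho, mesh_xz, jitter=1e-8):
    """Noise-free GP interpolation: posterior mean on the dense mesh."""
    K = sq_exp_kernel(nuclei_xz, nuclei_xz) + jitter * np.eye(len(nuclei_xz))
    k_star = sq_exp_kernel(mesh_xz, nuclei_xz)
    return k_star @ np.linalg.solve(K, nuclei_logrho)

# A trans-dimensional sampler would add/remove nuclei and perturb their
# positions and values; the dense mesh below stands in for the forward grid.
nuclei_xz = np.array([[0.0, 1.0], [5.0, 3.0], [10.0, 0.5]])
nuclei_logrho = np.array([2.0, 0.5, 1.5])          # log10(ohm-m)
xg, zg = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 5, 25))
mesh_xz = np.column_stack([xg.ravel(), zg.ravel()])
log_rho_mesh = gp_interpolate(nuclei_xz, nuclei_logrho, mesh_xz)
```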

2015, Vol 25 (3), pp. 483-498
Author(s): Maciej Smołka, Robert Schaefer, Maciej Paszyński, David Pardo, Julen Álvarez-Aramberri

Abstract: The paper discusses the complex, agent-oriented hierarchic memetic strategy (HMS) dedicated to solving inverse parametric problems. The strategy goes beyond the idea of two-phase global optimization algorithms: the global search, performed by a tree of dependent demes, is dynamically alternated with local, steepest-descent searches. The strategy offers exceptionally low computational cost, mainly because the accuracy of the direct solver (the hp-adaptive finite element method) is dynamically adjusted at each inverse search step. The computational cost is further decreased by the strategies employed for solution inter-processing and fitness deterioration. The efficiency of HMS is compared with that of a standard evolutionary technique, as well as with a multi-start strategy, on benchmarks that exhibit typical difficulties of inverse problems. Finally, an HMS application to a real-life engineering problem, the identification of oil deposits by inverting magnetotelluric measurements, is presented. The applicability of HMS to the inversion of magnetotelluric data is also verified mathematically.
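The toy example below illustrates only the global/local alternation that HMS generalizes: a random-population global search is periodically refined by steepest-descent steps on a stand-in objective. The deme tree, the hp-adaptive solver coupling, and the fitness deterioration mechanism are not reproduced; everything here is an assumption for illustration.

```python
# Toy sketch of a global population search alternated with local steepest
# descent (not HMS itself); the objective and step sizes are placeholders.
import numpy as np

def misfit(m):                       # stand-in inverse-problem objective
    return np.sum((m - 1.0) ** 2)

def grad(m, h=1e-6):                 # finite-difference gradient of the misfit
    g = np.zeros_like(m)
    for i in range(m.size):
        e = np.zeros_like(m); e[i] = h
        g[i] = (misfit(m + e) - misfit(m - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(20, 3))                 # one global "deme"
for it in range(50):
    fit = np.array([misfit(m) for m in pop])
    parents = pop[np.argsort(fit)[:10]]                # keep the better half
    pop = np.repeat(parents, 2, axis=0) + 0.2 * rng.standard_normal((20, 3))
    if it % 10 == 0:                                   # alternate with local search
        best = pop[np.argmin([misfit(m) for m in pop])].copy()
        for _ in range(25):
            best -= 0.1 * grad(best)                   # steepest-descent refinement
        pop[0] = best
print(min(misfit(m) for m in pop))
```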


2019, Vol 35 (19), pp. 3684-3692
Author(s): Jarno Vanhatalo, Zitong Li, Mikko J Sillanpää

Abstract
Motivation: Recent advances in high-dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits use either parametric methods or semi-parametric approaches based on splines and wavelets. However, very few software tools are currently available for the practical implementation of functional QTL mapping and variable selection.
Results: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients that describe how the effects of molecular markers on the quantitative trait change over time. We use an efficient gradient-based algorithm to estimate the tuning parameters of the GPs. Notably, the GP approach is directly applicable to incomplete datasets with missing data rates even larger than 50% (among phenotypes). We further develop a stepwise algorithm to search the model space in terms of genetic variants, and use a minimal increase of the Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GPs and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories at low computational cost. The proposed model selection approach reliably finds the most likely QTL in the tested datasets.
Availability and implementation: Software and simulated data are available as the MATLAB package 'GPQTLmapping' and can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in the case studies are publicly available at the QTL Archive.
Supplementary information: Supplementary data are available at Bioinformatics online.
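A minimal sketch of the varying-coefficient idea follows, assuming a squared-exponential GP prior and synthetic genotypes; it is not the GPQTLmapping package, only an illustration of how time-varying marker effects drawn from GP priors generate phenotype trajectories.

```python
# Illustrative varying-coefficient model: each marker effect beta_j(t) is a
# smooth function drawn from a GP prior; phenotypes are observed over time.
# All sizes, length scales, and noise levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 30)                          # measurement times

def gp_draw(t, ls=0.2, var=1.0):
    """Draw one smooth function from a squared-exponential GP prior."""
    d2 = (t[:, None] - t[None, :]) ** 2
    K = var * np.exp(-0.5 * d2 / ls**2) + 1e-9 * np.eye(t.size)
    return np.linalg.cholesky(K) @ rng.standard_normal(t.size)

n_ind, n_mark = 100, 5
X = rng.integers(0, 3, size=(n_ind, n_mark)).astype(float)   # genotypes 0/1/2
beta = np.stack([gp_draw(t) for _ in range(n_mark)])          # (markers, times)
Y = X @ beta + 0.3 * rng.standard_normal((n_ind, t.size))     # trajectories
```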


Sensors, 2018, Vol 18 (12), pp. 4267
Author(s): Santosh Subedi, Jae-Young Pyun

The fingerprinting localization approach is widely used in indoor positioning applications owing to its high reliability. However, the learning procedure for radio signals in fingerprinting is time-consuming and labor-intensive. In this paper, an affinity propagation clustering (APC)-based fingerprinting localization system with Gaussian process regression (GPR) is presented for a practical positioning system with reduced offline workload and low online computation cost. The proposed system collects sparse received signal strength (RSS) data from the deployed Bluetooth Low Energy beacons and trains a Gaussian process model on them. As the signal estimation component, GPR predicts not only the mean RSS but also its variance, which indicates the uncertainty of the estimation. The predicted RSS and variance can be employed for probabilistic fingerprinting localization. As the clustering component, APC minimizes the search space of reference points on the testbed. Consequently, it also helps reduce the localization estimation error and the computational cost of the positioning system. The proposed method is evaluated through real field deployments. Experimental results show that the proposed method reduces the offline workload and increases localization accuracy with less computational cost, outperforming existing methods owing to RSS prediction using GPR and RSS clustering using APC.
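The sketch below illustrates the two components with scikit-learn, under an assumed floor-plan coordinate system, kernel, and a single synthetic beacon; it is not the authors' implementation.

```python
# Hedged sketch of the GPR + APC components; beacon layout, kernel, and grid
# spacing are illustrative assumptions, not the deployed system.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# Sparse survey: RSS (dBm) from one beacon at a few (x, y) positions.
xy_train = rng.uniform(0, 20, size=(40, 2))
rss_train = (-40 - 20 * np.log10(np.linalg.norm(xy_train - [10, 10], axis=1) + 1)
             + rng.normal(0, 2, 40))

# GPR predicts mean RSS and its uncertainty on a dense grid of reference points.
gpr = GaussianProcessRegressor(kernel=RBF(5.0) + WhiteKernel(1.0), normalize_y=True)
gpr.fit(xy_train, rss_train)
gx, gy = np.meshgrid(np.linspace(0, 20, 21), np.linspace(0, 20, 21))
ref_pts = np.column_stack([gx.ravel(), gy.ravel()])
rss_mean, rss_std = gpr.predict(ref_pts, return_std=True)

# APC groups reference points by predicted fingerprint, shrinking the online search.
labels = AffinityPropagation(random_state=0).fit_predict(
    np.column_stack([ref_pts, rss_mean]))
```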


2018
Author(s): Caitlin C. Bannan, David Mobley, A. Geoff Skillman

A variety of fields would benefit from accurate pKa predictions, especially drug design, owing to the effect a change in ionization state can have on a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pKas of 24 drug-like small molecules. We recently built a general model for predicting pKas using Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKas, which are then used to analytically determine macroscopic pKas. Our Gaussian process is trained on a set of 2,700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering that the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate, based on the chemistry of the molecule, that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for molecules whose chemistry appears to lie outside its domain of applicability, along with good agreement in quantile-quantile plots, indicating that it can predict its own accuracy. The challenge highlighted a variety of ways to improve our model, including adding more polyprotic molecules to our training set and more carefully considering which functional groups we do or do not identify as ionizable.
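A minimal sketch of the regression step follows, with placeholder features standing in for the OpenEye-derived descriptors; it only illustrates how a scikit-learn Gaussian process returns both a microscopic pKa estimate and a standard deviation that can serve as a chemistry-dependent uncertainty.

```python
# Minimal sketch (placeholder features, not the authors' descriptor pipeline):
# a scikit-learn GP regressor returns a pKa prediction plus a standard
# deviation usable as a chemistry-dependent uncertainty estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 6))            # stand-in ionizable-group features
y_train = 7 + 2 * X_train[:, 0] + rng.normal(0, 0.3, 200)   # stand-in pKa values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.1),
                              normalize_y=True).fit(X_train, y_train)

X_new = rng.normal(size=(5, 6))                # features for new ionizable groups
pka_pred, pka_std = gp.predict(X_new, return_std=True)
# Large pka_std flags groups whose chemistry lies outside the training domain.
```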


Energies, 2021, Vol 14 (15), pp. 4392
Author(s): Jia Zhou, Hany Abdel-Khalik, Paul Talbot, Cristian Rabiti

This manuscript develops a workflow, driven by data analytics algorithms, to support the optimization of the economic performance of an integrated energy system (IES). The goal is to determine the optimum mix of capacities from a set of different energy producers (e.g., nuclear, gas, wind, and solar). A stochastic optimizer based on Gaussian process modeling is employed, which requires numerous samples for its training. Each sample represents a time series describing the demand, load, or other operational and economic profiles for the various types of energy producers. These samples are synthetically generated using a reduced-order modeling algorithm that reads a limited set of historical data, such as demand and load data from past years. Numerous data analysis methods are employed to construct the reduced-order models, including, for example, the autoregressive moving average model, Fourier series decomposition, and a peak detection algorithm. All of these algorithms are designed to detrend the data and extract features that can be used to generate synthetic time histories preserving the statistical properties of the original, limited historical data. The optimization cost function is based on an economic model that assesses the effective cost of energy using two figures of merit: the specific cash flow stream for each energy producer and the total net present value. An initial guess for the optimal capacities is obtained using the screening curve method. The results of the Gaussian process model-based optimization are assessed against an exhaustive Monte Carlo search and are found to be reasonable. The workflow has been implemented inside Idaho National Laboratory's Risk Analysis and Virtual Environment (RAVEN) framework. The main contribution of this study addresses several challenges in current methods for optimizing energy portfolios in an IES: first, the feasibility of generating synthetic time series of the periodic peak data; second, the computational burden of conventional stochastic optimization of the energy portfolio, associated with the need for repeated executions of system models; and third, the inadequacy of previous studies in comparing the impact of the economic parameters. The proposed workflow can provide a scientifically defensible strategy to support decision-making in the electricity market and to help energy distributors develop a better understanding of the performance of integrated energy systems.
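The sketch below illustrates only the Gaussian process surrogate loop, with a stand-in objective in place of the RAVEN economic model and synthetic-history generation; the capacities, bounds, and acquisition rule are illustrative assumptions.

```python
# Conceptual sketch of a GP surrogate optimization loop; the economic model,
# ARMA-based synthetic histories, and RAVEN coupling are not reproduced.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def npv(capacities):
    """Stand-in for the full cash-flow / net-present-value calculation."""
    nuclear, wind, solar = capacities
    return -(nuclear - 2.0) ** 2 - (wind - 1.0) ** 2 - (solar - 0.5) ** 2

X = rng.uniform(0, 3, size=(8, 3))             # initial capacity samples (GW)
y = np.array([npv(x) for x in X])
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 3, size=(500, 3))    # candidate capacity mixes
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 1.0 * sd)]    # simple upper-confidence rule
    X, y = np.vstack([X, x_next]), np.append(y, npv(x_next))
best_mix = X[np.argmax(y)]
```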


2020, Vol 176 (2), pp. 183-203
Author(s): Santosh Chapaneri, Deepak Jayaswal

Modeling music mood has wide applications in music categorization, retrieval, and recommendation systems; however, it is challenging to computationally model the affective content of music due to its subjective nature. In this work, a structured regression framework is proposed to model the valence and arousal mood dimensions of music using a single regression model at linear computational cost. To tackle the subjectivity phenomenon, a confidence-interval-based estimated consensus is computed by modeling the behavior of various annotators (e.g., biased, adversarial) and is shown to perform better than using the average annotation values. For a compact feature representation of music clips, variational Bayesian inference is used to learn a Gaussian mixture model representation of the acoustic features, and chord-related features are used to improve valence estimation by probing the chord progressions between chroma frames. The dimensionality of the features is further reduced using an adaptive version of kernel PCA. Using an efficient implementation of the twin Gaussian process for structured regression, the proposed work achieves a significant improvement in R² for the arousal and valence dimensions relative to state-of-the-art techniques on two benchmark datasets for music mood estimation.
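The hedged sketch below shows two of the ingredients named above, a variational Bayesian GMM summary of frame-level features and kernel PCA reduction, using stand-in acoustic features; the twin Gaussian process regressor and the annotator-consensus step are not reproduced.

```python
# Sketch with stand-in acoustic features: a variational Bayesian GMM summarises
# frame-level features per clip, and kernel PCA reduces the clip representation.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 20))             # stand-in MFCC-like frame features

bgmm = BayesianGaussianMixture(n_components=8, covariance_type='diag',
                               max_iter=200, random_state=0).fit(frames)
clip_repr = np.concatenate([bgmm.weights_, bgmm.means_.ravel()])  # one clip vector

# With many clips stacked row-wise, kernel PCA yields a compact feature matrix.
clips = rng.normal(size=(100, clip_repr.size))  # placeholder for real clip vectors
reduced = KernelPCA(n_components=10, kernel='rbf').fit_transform(clips)
```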


Geophysics, 2014, Vol 79 (1), pp. IM1-IM9
Author(s): Nathan Leon Foks, Richard Krahenbuhl, Yaoguo Li

Compressive inversion uses computational algorithms that decrease the time and storage requirements of a traditional inverse problem. Most compression approaches focus on the model domain, and very few, other than traditional downsampling, focus on the data domain for potential-field applications. To further the compression in the data domain, a direct and practical approach to the adaptive downsampling of potential-field data for large inversion problems has been developed. The approach is formulated to significantly reduce the quantity of data in relatively smooth or quiet regions of the data set while preserving the signal anomalies that contain the relevant target information. Two major benefits arise from this form of compressive inversion. First, because the approach compresses the problem in the data domain, it can be applied immediately without the addition of, or modification to, existing inversion software. Second, as most industry software uses some form of model or sensitivity compression, the addition of this adaptive data sampling creates a complete compressive inversion methodology whereby the reduction of computational cost is achieved simultaneously in the model and data domains. We applied the method to a synthetic magnetic data set and two large field magnetic data sets; however, the method is also applicable to other data types. Our results showed that the relevant model information is maintained after inversion despite using only 1%–5% of the data.
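A schematic one-dimensional sketch of the adaptive idea appears below (not the published algorithm): samples are kept densely where the anomaly gradient is large and only coarsely in quiet regions; the threshold and decimation factor are illustrative assumptions.

```python
# Illustrative adaptive downsampling of a synthetic magnetic profile: dense
# sampling near anomalies, coarse sampling in quiet zones. Threshold values
# are placeholders, not the published criteria.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 100, 2000)                      # profile coordinates (m)
data = 50 * np.exp(-((x - 60) / 3.0) ** 2) + rng.normal(0, 0.5, x.size)  # nT

grad = np.abs(np.gradient(data, x))                # local signal activity
keep = grad > 1.0                                  # dense sampling near anomalies
keep[::25] = True                                  # coarse sampling elsewhere
x_ds, data_ds = x[keep], data[keep]
print(f"kept {keep.sum()} of {x.size} samples "
      f"({100 * keep.sum() / x.size:.1f}%)")
```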


Geophysics, 2016, Vol 81 (6), pp. A17-A21
Author(s): Juan I. Sabbione, Mauricio D. Sacchi

The coefficients that synthesize seismic data via the hyperbolic Radon transform (HRT) are estimated by solving a linear inverse problem. In the classical HRT, the computational cost of the inverse problem is proportional to the size of the data and the number of Radon coefficients. We have developed a strategy that significantly speeds up the implementation of time-domain HRTs. For this purpose, we define a restricted model space of coefficients by applying hard thresholding to an initial low-resolution Radon gather. Then, an iterative solver operating on the restricted model space is used to estimate the group of coefficients that synthesizes the data. The method is illustrated with synthetic data and tested with a marine data example.
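The sketch below conveys the restricted-model-space strategy with a generic dense operator standing in for the hyperbolic Radon transform; the threshold level and the operator itself are illustrative assumptions, not the published implementation.

```python
# Schematic restricted-model-space solve: threshold a cheap low-resolution
# solution, then run an iterative solver only on the retained coefficients.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n_data, n_coef = 400, 1000
L = rng.normal(size=(n_data, n_coef))              # stand-in forward operator
m_true = np.zeros(n_coef)
m_true[rng.choice(n_coef, 15, replace=False)] = 1.0
d = L @ m_true + 0.01 * rng.normal(size=n_data)    # observed data

m_low = L.T @ d                                    # cheap low-resolution gather
thresh = 0.9 * np.abs(m_low).max()                 # hard threshold (illustrative)
support = np.abs(m_low) >= thresh                  # restricted model space

m_rest = lsqr(L[:, support], d)[0]                 # iterative solve on the subset
m_est = np.zeros(n_coef)
m_est[support] = m_rest
```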

