scholarly journals A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data

2019 ◽  
Vol 35 (19) ◽  
pp. 3684-3692 ◽  
Author(s):  
Jarno Vanhatalo ◽  
Zitong Li ◽  
Mikko J Sillanpää

Abstract Motivation Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes the quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very limited choices of software tools are currently available for practical implementation of functional QTL mapping and variable selection. Results We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients which describe how the effects of molecular markers on the quantitative trait are changing over time. We use an efficient gradient based algorithm to estimate the tuning parameters of GPs. Notably, the GP approach is directly applicable to the incomplete datasets having even larger than 50% missing data rate (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GP and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in tested datasets. Availability and implementation Software and simulated data are available as a MATLAB package ‘GPQTLmapping’, and they can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in case studies are publicly available at QTL Archive. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Daniel Blatter ◽  
Anandaroop Ray ◽  
Kerry Key

Summary Bayesian inversion of electromagnetic data produces crucial uncertainty information on inferred subsurface resistivity. Due to their high computational cost, however, Bayesian inverse methods have largely been restricted to computationally expedient 1D resistivity models. In this study, we successfully demonstrate, for the first time, a fully 2D, trans-dimensional Bayesian inversion of magnetotelluric data. We render this problem tractable from a computational standpoint by using a stochastic interpolation algorithm known as a Gaussian process to achieve a parsimonious parametrization of the model vis-a-vis the dense parameter grids used in numerical forward modeling codes. The Gaussian process links a trans-dimensional, parallel tempered Markov chain Monte Carlo sampler, which explores the parsimonious model space, to MARE2DEM, an adaptive finite element forward solver. MARE2DEM computes the model response using a dense parameter mesh with resistivity assigned via the Gaussian process model. We demonstrate the new trans-dimensional Gaussian process sampler by inverting both synthetic and field magnetotelluric data for 2D models of electrical resistivity, with the field data example converging within 10 days on 148 cores, a non-negligible but tractable computational cost. For a field data inversion, our algorithm achieves a parameter reduction of over 32x compared to the fixed parameter grid used for the MARE2DEM regularized inversion. Resistivity probability distributions computed from the ensemble of models produced by the inversion yield credible intervals and interquartile plots that quantitatively show the non-linear 2D uncertainty in model structure. This uncertainty could then be propagated to other physical properties that impact resistivity including bulk composition, porosity and pore-fluid content.


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4267 ◽  
Author(s):  
Santosh Subedi ◽  
Jae-Young Pyun

Fingerprinting localization approach is widely used in indoor positioning applications owing to its high reliability. However, the learning procedure of radio signals in fingerprinting is time-consuming and labor-intensive. In this paper, an affinity propagation clustering (APC)-based fingerprinting localization system with Gaussian process regression (GPR) is presented for a practical positioning system with the reduced offline workload and low online computation cost. The proposed system collects sparse received signal strength (RSS) data from the deployed Bluetooth low energy beacons and trains them with the Gaussian process model. As the signal estimation component, GPR predicts not only the mean RSS but also the variance, which indicates the uncertainty of the estimation. The predicted RSS and variance can be employed for probabilistic-based fingerprinting localization. As the clustering component, the APC minimizes the searching space of reference points on the testbed. Consequently, it also helps to reduce the localization estimation error and the computational cost of the positioning system. The proposed method is evaluated through real field deployments. Experimental results show that the proposed method can reduce the offline workload and increase localization accuracy with less computational cost. This method outperforms the existing methods owing to RSS prediction using GPR and RSS clustering using APC.


2018 ◽  
Author(s):  
Caitlin C. Bannan ◽  
David Mobley ◽  
A. Geoff Skillman

<div>A variety of fields would benefit from accurate pK<sub>a</sub> predictions, especially drug design due to the affect a change in ionization state can have on a molecules physiochemical properties.</div><div>Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pK<sub>a</sub>s of 24 drug like small molecules.</div><div>We recently built a general model for predicting pK<sub>a</sub>s using a Gaussian process regression trained using physical and chemical features of each ionizable group.</div><div>Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton.</div><div>These features are fed into a Scikit-learn Gaussian process to predict microscopic pK<sub>a</sub>s which are then used to analytically determine macroscopic pK<sub>a</sub>s.</div><div>Our Gaussian process is trained on a set of 2,700 macroscopic pK<sub>a</sub>s from monoprotic and select diprotic molecules.</div><div>Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge.</div><div>Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering the challenge molecules are chemically diverse and often polyprotic while our training set is predominately monoprotic.</div><div>Of particular importance to us when building this model was to include an uncertainty estimate based on the chemistry of the molecule that would reflect the likely accuracy of our prediction. </div><div>Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile-quantile plots, indicating it can predict its own accuracy.</div><div>The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable. </div>


2019 ◽  
Author(s):  
Sierra Bainter ◽  
Thomas Granville McCauley ◽  
Tor D Wager ◽  
Elizabeth Reynolds Losin

In this paper we address the problem of selecting important predictors from some larger set of candidate predictors. Standard techniques are limited by lack of power and high false positive rates. A Bayesian variable selection approach used widely in biostatistics, stochastic search variable selection, can be used instead to combat these issues by accounting for uncertainty in the other predictors of the model. In this paper we present Bayesian variable selection to aid researchers facing this common scenario, along with an online application (https://ssvsforpsych.shinyapps.io/ssvsforpsych/) to perform the analysis and visualize the results. Using an application to predict pain ratings, we demonstrate how this approach quickly identifies reliable predictors, even when the set of possible predictors is larger than the sample size. This technique is widely applicable to research questions that may be relatively data-rich, but with limited information or theory to guide variable selection.


Sign in / Sign up

Export Citation Format

Share Document