Another look at the treatment of data uncertainty in Markov chain Monte Carlo inversion and other probabilistic methods

2020 ◽  
Vol 222 (1) ◽  
pp. 388-405
Author(s):  
F J Tilmann ◽  
H Sadeghisorkhani ◽  
A Mauerberger

SUMMARY In probabilistic Bayesian inversions, data uncertainty is a crucial parameter for quantifying the uncertainties and correlations of the resulting model parameters or, in transdimensional approaches, even the complexity of the model. However, in many geophysical inference problems it is poorly known. It is therefore common practice to treat the data uncertainty itself as a parameter to be determined. Although in principle any uncertainty distribution can be assumed, the usual choice is a Gaussian distribution whose standard deviation is then the unknown parameter to be estimated. In this special case, the paper demonstrates that a simple analytical integration is sufficient to marginalise out this uncertainty parameter, reducing the complexity of the model space without compromising the accuracy of the posterior model probability distribution. However, it is well known that the distribution of geophysical measurement errors, although superficially similar to a Gaussian distribution, typically contains more frequent samples along the tail of the distribution, so-called outliers. In linearized inversions these are often removed in subsequent iterations based on some threshold criterion, but in Markov chain Monte Carlo (McMC) inversions this approach is not possible because they rely on likelihood ratios, which cannot be formed if the number of data points varies between steps of the Markov chain. The flexibility to define the data error probability distribution in McMC can be exploited to account for this pattern of uncertainties in a natural way, without arbitrary choices of residual thresholds. In particular, we can regard the data uncertainty distribution as a mixture between a Gaussian distribution, which represents valid measurements with some measurement error, and a uniform distribution, which represents invalid measurements. The relative balance between them is an unknown parameter to be estimated alongside the standard deviation of the Gaussian distribution. For each data point, the algorithm can then assign a probability of being an outlier, and the influence of each data point is effectively downweighted according to that probability. Furthermore, this assignment can change as the McMC search explores different parts of the model space. The approach is demonstrated with both synthetic and real tomography examples. In a synthetic test, the proposed mixed measurement error distribution allows recovery of the underlying model even in the presence of 6 per cent outliers, which completely destroy the ability of a regular McMC or linear search to provide a meaningful image. Applied to an actual ambient noise tomography study based on automatically picked dispersion curves, the resulting model is shown to be much more consistent across data sets that differ in the applied quality criteria, while retaining the ability to recover strong anomalies in selected parts of the model.
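As a minimal sketch of the mixture likelihood described above (the exact parameterization in the paper may differ): the per-point density is a weighted sum of a Gaussian of width `sigma` for valid measurements and a uniform density of support `width` for invalid ones, with mixing weight `phi`, and the posterior outlier probability of each point follows from Bayes' rule. All names and numbers here are illustrative.

```python
import numpy as np

def mixture_loglike(residuals, sigma, phi, width):
    """Log-likelihood of residuals under a Gaussian/uniform mixture.

    phi   : prior probability that a data point is an outlier
    sigma : standard deviation of the Gaussian (valid measurements)
    width : support of the uniform distribution (invalid measurements)
    """
    gauss = (1.0 - phi) * np.exp(-0.5 * (residuals / sigma) ** 2) \
            / (np.sqrt(2.0 * np.pi) * sigma)
    unif = phi / width  # constant density for outliers
    return np.sum(np.log(gauss + unif))

def outlier_probability(residuals, sigma, phi, width):
    """Posterior probability of each point being an outlier (Bayes' rule)."""
    gauss = (1.0 - phi) * np.exp(-0.5 * (residuals / sigma) ** 2) \
            / (np.sqrt(2.0 * np.pi) * sigma)
    unif = phi / width
    return unif / (gauss + unif)

# Example: the point far in the tail gets outlier probability near 1,
# so its influence on the likelihood is effectively downweighted.
r = np.array([0.1, -0.3, 0.2, 5.0])
print(outlier_probability(r, sigma=0.5, phi=0.05, width=20.0))
```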

2013 ◽  
Vol 9 (S298) ◽  
pp. 441-441
Author(s):  
Yihan Song ◽  
Ali Luo ◽  
Yongheng Zhao

Abstract Stellar radial velocity (RV) is estimated using template fitting and Markov chain Monte Carlo (MCMC) methods. The method is applied to LAMOST stellar spectra. The MCMC simulation generates a probability distribution of the RV, from which the RV error can also be computed.
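A minimal sketch of the idea, with a synthetic Gaussian absorption line standing in for a LAMOST spectrum and a plain Metropolis sampler in place of whatever MCMC implementation the authors used; all names and numbers are illustrative.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def template(wave):
    """Toy template spectrum: a single Gaussian absorption line."""
    return 1.0 - 0.5 * np.exp(-0.5 * ((wave - 6563.0) / 2.0) ** 2)

def loglike(rv, wave, flux, noise):
    """Chi-square likelihood of the spectrum against a Doppler-shifted template."""
    shifted = template(wave / (1.0 + rv / C_KMS))
    return -0.5 * np.sum(((flux - shifted) / noise) ** 2)

rng = np.random.default_rng(1)
wave = np.linspace(6540.0, 6590.0, 400)
true_rv, noise = 30.0, 0.02
flux = template(wave / (1.0 + true_rv / C_KMS)) + rng.normal(0, noise, wave.size)

# Metropolis random walk over RV; the sample histogram approximates p(RV | data).
rv, samples = 0.0, []
cur = loglike(rv, wave, flux, noise)
for _ in range(20000):
    prop = rv + rng.normal(0.0, 2.0)
    lp = loglike(prop, wave, flux, noise)
    if np.log(rng.random()) < lp - cur:
        rv, cur = prop, lp
    samples.append(rv)
samples = np.array(samples[5000:])  # discard burn-in
print(f"RV = {samples.mean():.1f} +/- {samples.std():.1f} km/s")
```

The posterior standard deviation of the chain is what serves as the RV error estimate.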


2020 ◽  
Vol 35 (24) ◽  
pp. 1950142
Author(s):  
Allen Caldwell ◽  
Philipp Eller ◽  
Vasyl Hafych ◽  
Rafael Schick ◽  
Oliver Schulz ◽  
...  

Numerically estimating the integral of functions in high-dimensional spaces is a nontrivial task. An oft-encountered example is the calculation of the marginal likelihood in Bayesian inference, in a context where a sampling algorithm such as Markov chain Monte Carlo provides samples of the function. We present an Adaptive Harmonic Mean Integration (AHMI) algorithm. Given samples drawn according to a probability distribution proportional to the function, the algorithm estimates the integral of the function and the uncertainty of the estimate by applying a harmonic mean estimator to adaptively chosen regions of the parameter space. We describe the algorithm and its mathematical properties, and report results from multiple test cases.
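A sketch of the basic harmonic mean identity that AHMI builds on, not the adaptive region selection itself: for samples distributed proportionally to f with unknown normalization I, the mean of 1/f over samples falling in a region A of volume V satisfies (1/N) Σ 1_A(x_i)/f(x_i) ≈ V/I, so I ≈ N·V / Σ_{x_i∈A} 1/f(x_i). Restricting to a region around the mode keeps 1/f bounded, which is what makes the estimator stable. The test function below is illustrative.

```python
import numpy as np

def f(x):
    """Unnormalized integrand: a 2-D standard Gaussian without its normalizing
    constant. Its true integral is (2*pi)^(d/2) with d = 2, about 6.2832."""
    return np.exp(-0.5 * np.sum(x * x, axis=-1))

rng = np.random.default_rng(0)
d, n = 2, 200_000
samples = rng.standard_normal((n, d))  # exact samples from f / I

# Fixed hyper-rectangle around the mode (AHMI chooses such regions adaptively).
half_width = 1.0
in_region = np.all(np.abs(samples) < half_width, axis=1)
volume = (2.0 * half_width) ** d

# Harmonic mean identity: mean of 1/f over in-region samples ~= volume / I.
integral = n * volume / np.sum(1.0 / f(samples[in_region]))
print(f"estimate = {integral:.4f}, truth = {2.0 * np.pi:.4f}")
```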


Geophysics ◽  
2019 ◽  
Vol 84 (4) ◽  
pp. A37-A42 ◽  
Author(s):  
Erasmus Kofi Oware ◽  
James Irving ◽  
Thomas Hermans

Bayesian Markov chain Monte Carlo (McMC) techniques are increasingly being used in geophysical estimation of hydrogeologic processes because of their ability to produce multiple estimates that enable a comprehensive assessment of uncertainty. Standard McMC sampling methods can, however, become computationally intractable for spatially distributed, high-dimensional problems. We have developed a novel basis-constrained Bayesian McMC difference-inversion framework for time-lapse geophysical imaging. The strategy parameterizes the Bayesian inversion model space in terms of sparse, hydrologic-process-tuned bases, leading to dimensionality reduction while accounting for the physics of the target hydrologic process. We evaluate the algorithm on cross-borehole electrical resistivity tomography (ERT) field data acquired during a heat-tracer experiment, validating the ERT-estimated temperatures against direct temperature measurements at two locations on the ERT plane. We also perform the inversions using conventional smoothness-constrained inversion (SCI). In contrast with the SCI thermograms, our approach estimates the heat plumes without excessive smoothing, and it captures most of the validation temperatures within the 90% confidence interval of the mean. Accounting for the physics of the target process allows the detection of small temperature changes that are undetectable by the SCI. Performing the inversion in the reduced-dimensional model space yields substantial savings in computational cost.
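A schematic of the dimensionality-reduction idea, with all names hypothetical: the spatially distributed model m is expressed as a linear combination of a few basis vectors, m = Bc, so the Markov chain walks in the low-dimensional coefficient space while the forward solver still sees the full grid. In the paper the bases are tuned to the target hydrologic process (e.g. plume-like shapes); here they are just random orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_basis = 10_000, 20   # grid size vs. number of retained basis vectors

# Stand-in basis matrix: random orthonormal columns (the real bases would be
# process-tuned, e.g. derived from simulated plume snapshots).
B, _ = np.linalg.qr(rng.standard_normal((n_cells, n_basis)))

def model_from_coeffs(c):
    """Map low-dimensional coefficients to the full spatially distributed model."""
    return B @ c

# The Markov chain perturbs only the 20 coefficients, not the 10,000 cells.
c = np.zeros(n_basis)
c_proposal = c + rng.normal(0.0, 0.1, n_basis)
m_proposal = model_from_coeffs(c_proposal)   # would be fed to the forward solver
print(m_proposal.shape)  # (10000,)
```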


2011 ◽  
Vol 11 (7) ◽  
pp. 20051-20105 ◽  
Author(s):  
D. G. Partridge ◽  
J. A. Vrugt ◽  
P. Tunved ◽  
A. M. L. Ekman ◽  
H. Struthers ◽  
...  

Abstract. This paper presents a novel approach to investigating cloud-aerosol interactions by coupling a Markov chain Monte Carlo (MCMC) algorithm to a pseudo-adiabatic cloud parcel model. Despite the number of numerical cloud-aerosol sensitivity studies previously conducted, few have used statistical analysis tools to investigate the sensitivity of a cloud model to the input aerosol physicochemical parameters. Using synthetic data as observed values of the cloud droplet number concentration (CDNC) distribution, this inverse modelling framework is shown to converge to the correct calibration parameters. The analysis method provides a new, integrative framework for evaluating the sensitivity of the derived CDNC distribution to the input parameters describing the lognormal properties of the accumulation mode and the particle chemistry. To a large extent, results from prior studies are confirmed, but the present study also provides additional insightful findings. There is a clear transition from very clean marine Arctic conditions, where the aerosol parameters representing the mean radius and geometric standard deviation of the accumulation mode are most important for determining the CDNC distribution, to very polluted continental environments (aerosol concentration in the accumulation mode >1000 cm−3), where particle chemistry is more important than both the number concentration and the size of the accumulation mode. The competition and compensation between the cloud model input parameters illustrate that if the soluble mass fraction is reduced, the number of particles, the geometric standard deviation and the mean radius of the accumulation mode must all increase in order to achieve the same CDNC distribution. For more polluted aerosol conditions, the parameter correlation associated with a reduction in soluble mass fraction becomes weaker and more non-linear over the range of possible solutions (indicative of the sensitivity). This indicates that, for the cloud parcel model used herein, the relative importance of the soluble mass fraction decreases if the number concentration or geometric standard deviation of the accumulation mode is increased. This study demonstrates that inverse modelling provides a flexible, transparent and integrative method for efficiently exploring cloud-aerosol interactions with respect to parameter sensitivity and correlation.
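A schematic of the inverse-modelling setup, with a deliberately crude linear stand-in for the cloud parcel model (the real forward model integrates pseudo-adiabatic parcel equations, and the study used the DREAM variant of MCMC rather than plain Metropolis); every function and number here is illustrative. It shows how the MCMC chain exposes the compensation between calibration parameters.

```python
import numpy as np

def forward(log_n, log_r):
    """Stand-in for the parcel model: maps accumulation-mode number and mean
    radius to a scalar 'CDNC'. The real forward model is far more involved."""
    return 50.0 + 30.0 * log_n + 10.0 * log_r

rng = np.random.default_rng(2)
truth = (np.log(300.0), np.log(0.08))          # hypothetical calibration values
obs = forward(*truth) + rng.normal(0.0, 1.0)   # synthetic observed CDNC

def logpost(theta):
    prior = -0.5 * np.sum(theta ** 2) / 100.0  # weak prior keeps the ridge proper
    return prior - 0.5 * (obs - forward(*theta)) ** 2

theta, chain = np.array([np.log(100.0), np.log(0.05)]), []
for _ in range(10000):
    prop = theta + rng.normal(0.0, 0.05, 2)
    if np.log(rng.random()) < logpost(prop) - logpost(theta):
        theta = prop
    chain.append(theta.copy())
chain = np.array(chain[2000:])
print(np.corrcoef(chain.T))  # posterior parameter correlation
```

The strongly negative off-diagonal entry reflects the compensation effect discussed above: a smaller number concentration can be offset by a larger mean radius while reproducing the same CDNC.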


Author(s):  
Andreas Raue ◽  
Clemens Kreutz ◽  
Fabian Joachim Theis ◽  
Jens Timmer

Increasingly complex applications involve large datasets in combination with nonlinear and high-dimensional mathematical models. In this context, statistical inference is a challenging issue that calls for pragmatic approaches taking advantage of both Bayesian and frequentist methods. The elegance of the Bayesian methodology lies in propagating the information content of experimental data and prior assumptions to the posterior probability distribution of model predictions. However, for complex applications, experimental data and prior assumptions may constrain the posterior probability distribution insufficiently. In these situations, Bayesian Markov chain Monte Carlo sampling can be infeasible. From a frequentist point of view, insufficient experimental data and prior assumptions can be interpreted as non-identifiability. The profile-likelihood approach can detect non-identifiability and resolve it iteratively through experimental design, thereby constraining the posterior probability distribution until Markov chain Monte Carlo sampling can be used reliably. Using an application from cell biology, we compare both methods and show that applying the two methods in succession facilitates a realistic assessment of uncertainty in model predictions.
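A minimal sketch of the profile-likelihood computation on an illustrative two-parameter model (not the cell-biology application of the paper): one parameter is stepped along a grid and the negative log-likelihood is re-optimized over the remaining nuisance parameter; a flat profile signals structural non-identifiability.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative model y = a * b * t, in which only the product a*b is identifiable.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * t + rng.normal(0.0, 0.1, t.size)   # data generated with a*b = 2

def nll(a, b):
    """Negative log-likelihood for Gaussian noise of known standard deviation."""
    return 0.5 * np.sum((y - a * b * t) ** 2 / 0.1 ** 2)

# Profile likelihood of a: fix a on a grid, re-optimize the nuisance parameter b.
for a in [0.5, 1.0, 2.0, 4.0]:
    res = minimize_scalar(lambda b: nll(a, b))
    print(f"a = {a:4.1f}  profile NLL = {res.fun:8.3f}")
# The profile is flat in a: a is structurally non-identifiable, which is exactly
# what the profile-likelihood approach is designed to detect.
```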


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Georges Bresson ◽  
Anoop Chaturvedi ◽  
Mohammad Arshad Rahman ◽  
Shalabh

Abstract Linear regression with measurement error in the covariates is a heavily studied topic; the statistics/econometrics literature is, however, almost silent on estimating a multi-equation model with measurement error. This paper considers a seemingly unrelated regression model with measurement error in the covariates and introduces two novel estimation methods: a pure Bayesian algorithm (based on Markov chain Monte Carlo techniques) and its mean field variational Bayes (MFVB) approximation. The MFVB method has the added advantage of being computationally fast and able to handle big data. An issue pertinent to measurement error models is parameter identification, which is resolved here by employing a prior distribution on the measurement error variance. The methods are shown to perform well in multiple simulation studies, in which we analyse the impact on the posterior estimates of different values of the reliability ratio, or of the variance of the true unobserved quantity, used in the data-generating process. The paper further applies the proposed algorithms in an application drawn from the health literature and shows that modelling the measurement error in the data can improve model fit.
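A small sketch of the data-generating idea behind such simulations, assuming a single covariate and a single equation rather than the paper's full SUR setup: the reliability ratio is var(x_true)/var(x_observed), and naive least squares on the mismeasured covariate is attenuated by exactly that factor, which is why additional information (here, the prior on the measurement error variance) is needed for identification. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 1000, 1.5
reliability = 0.8                       # var(x_true) / var(x_observed)

var_true = 1.0
var_err = var_true * (1.0 - reliability) / reliability
x_true = rng.normal(0.0, np.sqrt(var_true), n)
x_obs = x_true + rng.normal(0.0, np.sqrt(var_err), n)   # mismeasured covariate
y = beta * x_true + rng.normal(0.0, 0.5, n)

# Naive OLS on the mismeasured covariate is attenuated by the reliability ratio.
beta_ols = (x_obs @ y) / (x_obs @ x_obs)
print(f"OLS estimate {beta_ols:.2f} vs truth {beta} "
      f"(expected attenuation ~{reliability * beta:.2f})")
```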


2012 ◽  
Author(s):  
Zairul Nor Deana Md. Desa ◽  
Ismail Mohamad ◽  
Zarina Mohd. Khalid ◽  
Hanafiah Md. Zin

The purpose of this study is to compare the results obtained from three methods of assigning letter grades to students' achievement. The conventional and most popular method of assigning grades is the Straight Scale method (SS). Statistical approaches using the Standard Deviation method (GC) and a conditional Bayesian method are considered for assigning the grades. In the conditional Bayesian model, the data are assumed to follow a Normal Mixture distribution in which the grades are distinctly separated by the parameters of the mixture: the means and proportions of the Normal Mixture distribution. The difficulty lies in estimating the posterior density of these parameters, which is analytically intractable. A solution to this problem is the Markov Chain Monte Carlo approach, namely the Gibbs sampler algorithm. The Straight Scale, Standard Deviation and conditional Bayesian methods are applied to the raw examination scores of two sets of students. The performance of these methods is measured using the Neutral Class Loss, the Lenient Class Loss and the Coefficient of Determination. The results show that the conditional Bayesian method outperforms the conventional methods of assigning grades. Key words: Grading methods, educational measurement, Straight Scale, Standard Deviation method, Normal Mixture, Markov Chain Monte Carlo, Gibbs sampling
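A compact sketch of a Gibbs sampler for a two-component normal mixture of exam scores, in the spirit of the conditional Bayesian model above; the paper would use one component per grade, whereas here the component variances are fixed for brevity and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
scores = np.concatenate([rng.normal(55, 8, 120), rng.normal(80, 5, 60)])
K, sigma = 2, 7.0                      # two grade components, fixed spread
mu, prop = np.array([40.0, 90.0]), np.array([0.5, 0.5])

for sweep in range(2000):
    # 1. Sample component assignments given the means and proportions.
    dens = prop * np.exp(-0.5 * ((scores[:, None] - mu) / sigma) ** 2)
    z = (rng.random(scores.size) < dens[:, 1] / dens.sum(axis=1)).astype(int)
    # 2. Sample means given assignments (flat prior -> normal full conditional).
    for k in range(K):
        members = scores[z == k]
        if members.size:
            mu[k] = rng.normal(members.mean(), sigma / np.sqrt(members.size))
    # 3. Sample proportions from the Beta full conditional (uniform prior).
    prop[1] = rng.beta(1 + (z == 1).sum(), 1 + (z == 0).sum())
    prop[0] = 1.0 - prop[1]

print(f"means ~ {mu.round(1)}, proportions ~ {prop.round(2)}")
# Grade boundaries follow from where adjacent component densities cross.
```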

