A Bayesian machine scientist to aid in the solution of challenging scientific problems

Roger Guimerà; Ignasi Reichardt; Antoni Aguilar-Mogas; Francesco A. Massucci; Manuel Miranda; Jordi Pallarès; Marta Sales-Pardo

doi:10.1126/sciadv.aav6971

A Bayesian machine scientist to aid in the solution of challenging scientific problems

Science Advances, 2020, Vol 6 (5), pp. eaav6971. doi:10.1126/sciadv.aav6971. Cited by ~5

Author(s): Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, ...

Keyword(s): Social Sciences, Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Mathematical Models, Real Data, Mathematical Expressions, The Social, The World, Out Of Sample

Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need “machine scientists” that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.


Experiences With Markov Chain Monte Carlo Convergence Assessment in Two Psychometric Examples

Journal of Educational and Behavioral Statistics, 2004, Vol 29 (4), pp. 461-488. doi:10.3102/10769986029004461. Cited by ~48

Author(s): Sandip Sinharay

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Diagnostic Tool, Statistical Models, Real Data, MCMC Algorithm, MCMC Algorithms, Convergence Diagnostics, Number Of Iterations

There is an increasing use of Markov chain Monte Carlo (MCMC) algorithms for fitting statistical models in psychometrics, especially in situations where the traditional estimation techniques are very difficult to apply. One of the disadvantages of using an MCMC algorithm is that it is not straightforward to determine the convergence of the algorithm. Using the output of an MCMC algorithm that has not converged may lead to incorrect inferences on the problem at hand. The convergence is not one to a point, but that of the distribution of a sequence of generated values to another distribution, and hence is not easy to assess; there is no guaranteed diagnostic tool to determine convergence of an MCMC algorithm in general. This article examines the convergence of MCMC algorithms using a number of convergence diagnostics for two real data examples from psychometrics. Findings from this research have the potential to be useful to researchers using the algorithms. For both the examples, the number of iterations required (suggested by the diagnostics) to be reasonably confident that the MCMC algorithm has converged may be larger than what many practitioners consider to be safe.
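The abstract does not name the diagnostics it uses. As an illustration only, the sketch below computes the Gelman-Rubin potential scale reduction factor (R-hat), one widely used convergence diagnostic, for a toy random-walk Metropolis sampler. The sampler, its standard normal target, the starting points, and the function names are all assumptions made for this example and are not taken from the article.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def random_walk_metropolis(start, steps, rng, step_size=1.0):
    """Toy sampler (hypothetical) whose target is a standard normal."""
    x, draws = start, []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Acceptance ratio for N(0, 1): exp((x^2 - proposal^2) / 2), capped at 1.
        if rng.random() < math.exp(min(0.0, 0.5 * (x * x - proposal * proposal))):
            x = proposal
        draws.append(x)
    return draws


def rhat_for_run(steps, seed=0, starts=(-10.0, -5.0, 5.0, 10.0)):
    """Run one chain per dispersed start, drop the first half, return R-hat."""
    rng = random.Random(seed)
    chains = [random_walk_metropolis(s, steps, rng)[steps // 2:] for s in starts]
    return gelman_rubin(chains)


# R-hat should shrink towards 1 as the chains are run for longer.
for steps in (20, 200, 5000):
    print(steps, round(rhat_for_run(steps), 3))
```

Values of R-hat close to 1 are consistent with convergence but do not prove it, which matches the abstract's caution that no diagnostic is guaranteed in general.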


A Study of Perks-II Distribution via Bayesian Paradigm

Pravaha, 2018, Vol 24 (1), pp. 1-17. doi:10.3126/pravaha.v24i1.20221

Author(s): A. K. Chaudhary

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Bayesian Analysis, Real Data, Simulation Method, MCMC Methods, Data Set, Bayesian Paradigm, MCMC Simulation

In this paper, the Markov chain Monte Carlo (MCMC) method is used to estimate the parameters of the Perks-II distribution based on a complete sample. Procedures are developed to perform a full Bayesian analysis of the Perks-II distribution using the MCMC simulation method in OpenBUGS, established software for Bayesian analysis using MCMC methods. We obtain the Bayes estimates of the parameters and of the hazard and reliability functions, and present their probability intervals. We also discuss the issue of model compatibility for the given data set. A real data set is considered for illustration under gamma priors.


A comparison of approximate versus exact techniques for Bayesian parameter inference in nonlinear ordinary differential equation models

Royal Society Open Science, 2020, Vol 7 (3), pp. 191315. doi:10.1098/rsos.191315

Author(s): Amani A. Alahmadi, Jennifer A. Flegg, Davis G. Cochrane, Christopher C. Drovandi, Jonathan M. Keith

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Bayesian Inference, Sequential Monte Carlo, Simulated Data, Real Data, Nonlinear Ordinary Differential Equation, Unknown Parameters, Acceptance Probability

The behaviour of many processes in science and engineering can be accurately described by dynamical system models consisting of a set of ordinary differential equations (ODEs). Often these models have several unknown parameters that are difficult to estimate from experimental data, in which case Bayesian inference can be a useful tool. In principle, exact Bayesian inference using Markov chain Monte Carlo (MCMC) techniques is possible; however, in practice, such methods may suffer from slow convergence and poor mixing. To address this problem, several approaches based on approximate Bayesian computation (ABC) have been introduced, including Markov chain Monte Carlo ABC (MCMC ABC) and sequential Monte Carlo ABC (SMC ABC). While the system of ODEs describes the underlying process that generates the data, the observed measurements invariably include errors. In this paper, we argue that several popular ABC approaches fail to adequately model these errors because the acceptance probability depends on the choice of the discrepancy function and the tolerance without any consideration of the error term. We observe that the so-called posterior distributions derived from such methods do not accurately reflect the epistemic uncertainties in parameter values. Moreover, we demonstrate that these methods provide minimal computational advantages over exact Bayesian methods when applied to two ODE epidemiological models with simulated data and one with real data concerning malaria transmission in Afghanistan.
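To make the point about the acceptance rule concrete, here is a minimal sketch of ABC rejection sampling on a one-parameter decay model, dy/dt = -theta * y. The model, the uniform prior, the RMSE discrepancy, the tolerances, and all names are assumptions chosen for illustration; they are not the epidemiological models or the ABC variants studied in the paper. The acceptance test uses only the discrepancy and the tolerance, so the spread of the accepted values is set by the tolerance rather than by an explicit measurement-error model.

```python
import math
import random


def decay_curve(theta, times):
    """Closed-form solution of dy/dt = -theta * y with y(0) = 1."""
    return [math.exp(-theta * t) for t in times]


def rmse(a, b):
    """Root-mean-square discrepancy between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def abc_rejection(observed, times, prior_draws, tolerance):
    """Keep the prior draws whose simulated curve lies within `tolerance`."""
    return [th for th in prior_draws
            if rmse(decay_curve(th, times), observed) < tolerance]


rng = random.Random(1)
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
true_theta, noise_sd = 0.5, 0.05  # hypothetical values for the demo
observed = [y + rng.gauss(0.0, noise_sd) for y in decay_curve(true_theta, times)]
prior_draws = [rng.uniform(0.0, 2.0) for _ in range(5000)]

# Same data and same prior draws; only the tolerance differs.
loose = abc_rejection(observed, times, prior_draws, tolerance=0.2)
tight = abc_rejection(observed, times, prior_draws, tolerance=0.1)
print(len(loose), max(loose) - min(loose))
print(len(tight), max(tight) - min(tight))
```

Because the simulator here is deterministic, every value accepted at the tighter tolerance is also accepted at the looser one, so the range of accepted values can only widen as the tolerance grows.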


A Bayesian Analysis of Perks Distribution via Markov Chain Monte Carlo Simulation

Nepal Journal of Science and Technology, 2013, Vol 14 (1), pp. 153-166. doi:10.3126/njst.v14i1.8936. Cited by ~1

Author(s): Arun Kumar Chaudhary, Vijay Kumar

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Bayesian Analysis, Real Data, Simulation Method, Complete Sample, Data Set, MCMC Simulation, The Given

In this paper, the Markov chain Monte Carlo (MCMC) method is used to estimate the parameters of the Perks distribution based on a complete sample. Procedures are developed to perform a full Bayesian analysis of the Perks distribution using the MCMC simulation method in OpenBUGS. We obtain the Bayes estimates of the parameters and of the hazard and reliability functions, and present their probability intervals. We also discuss the issue of model compatibility for the given data set. A real data set is considered for illustration under gamma priors.


Markov chain Monte Carlo for active module identification problem

BMC Bioinformatics, 2020, Vol 21 (S6). doi:10.1186/s12859-020-03572-9

Author(s): Nikita Alexeev, Javlon Isomurodov, Vladimir Sukhov, Gennady Korotkevich, Alexey Sergushichev

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Interaction Network, Real Data, Identification Problem, Classification Problem, Biological Data, Connected Subgraph, Computational Performance

Background: Integrative network methods are commonly used for interpretation of high-throughput experimental biological data: transcriptomics, proteomics, metabolomics and others. One common approach is to find a connected subnetwork of a global interaction network that best encompasses significant individual changes in the data and represents a so-called active module. Methods implementing this approach usually find a single subnetwork and thus solve a hard classification problem for vertices. This subnetwork inherently contains erroneous vertices, and no instrument is provided to estimate the confidence level of any particular vertex inclusion. To address this issue, in the current study we consider the active module problem as a soft classification problem.

Results: We propose a method to estimate the probability that each vertex belongs to the active module, based on Markov chain Monte Carlo (MCMC) subnetwork sampling. As an example of the performance of our method on real data, we run it on two gene expression datasets. For the first, many-replicate expression dataset, we show that the proposed approach is consistent with an existing resampling-based method. On the second dataset the jackknife resampling method is inapplicable because of the small number of biological replicates, but the MCMC method can be run and shows high classification performance.

Conclusions: The proposed method makes it possible to estimate the probability that an individual vertex belongs to the active module, as well as the false discovery rate (FDR) for a given set of vertices. Given the estimated probabilities, it becomes possible to provide a connected subgraph in a consistent manner for any given FDR level: no vertex can disappear when the FDR level is relaxed. We show, on both simulated and real datasets, that the proposed method has good computational performance and high classification accuracy.
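The idea of turning sampled subnetworks into per-vertex probabilities can be sketched with a deliberately simplified Metropolis sampler over connected vertex sets. Everything here is an assumption for illustration: the toy path graph, the vertex scores, the target distribution (proportional to the exponential of the summed scores), and the single-vertex toggle proposal. It is not the authors' sampler or scoring scheme.

```python
import math
import random


def is_connected(vertices, adjacency):
    """True if `vertices` is non-empty and induces a connected subgraph."""
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adjacency[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(vertices)


def inclusion_probabilities(adjacency, scores, steps, rng, burn_in=1000):
    """Estimate how often each vertex appears in the sampled connected sets."""
    nodes = sorted(adjacency)
    current = {max(nodes, key=lambda v: scores[v])}  # start from a top-scoring vertex
    counts = {v: 0 for v in nodes}
    kept = 0
    for step in range(steps):
        v = rng.choice(nodes)       # symmetric proposal: toggle one vertex
        proposal = current ^ {v}
        if is_connected(proposal, adjacency):
            # Log target ratio: +score if v was added, -score if it was removed.
            delta = scores[v] if v in proposal else -scores[v]
            if rng.random() < math.exp(min(0.0, delta)):
                current = proposal
        if step >= burn_in:
            kept += 1
            for u in current:
                counts[u] += 1
    return {v: counts[v] / kept for v in nodes}


# Hypothetical path graph 0-1-2-3-4-5 with three "active" vertices.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
scores = {0: 2.0, 1: 2.0, 2: 2.0, 3: -2.0, 4: -2.0, 5: -2.0}
probs = inclusion_probabilities(adjacency, scores, steps=20000, rng=random.Random(0))
print({v: round(p, 3) for v, p in probs.items()})
```

Each vertex receives a frequency between 0 and 1 instead of a hard in/out label, which is the soft classification the abstract describes.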


An Empirical Dynamic Model of Trade with Consumer Accumulation

American Economic Journal Microeconomics, 2021, Vol 13 (4), pp. 23-63. doi:10.1257/mic.20190051

Author(s): Paul Piveteau

Keyword(s): Monte Carlo, International Trade, Markov Chain, Markov Chain Monte Carlo, Structural Model, Survival Rates, Foreign Markets, Entry Costs, Dynamic Structural Model, Out Of Sample

This paper develops a dynamic structural model of trade in which firms slowly accumulate consumers in foreign markets. Estimated on export data from individual firms with a particle Markov chain Monte Carlo estimator, the model predicts lower survival rates for new exporters and yields low estimated entry costs of exporting, less than half of those estimated in the absence of consumer accumulation. Using simulations and out-of-sample predictions, I show that the introduction of such frictions and the reduction in estimated entry costs allow the model to match important facts regarding the aggregate response of international trade to shocks. (JEL D22, F12, F14, L66)


Simple, scalable and accurate posterior interval estimation

Biometrika, 2017, Vol 104 (3), pp. 665-680. doi:10.1093/biomet/asx033. Cited by ~7

Author(s): Cheng Li, Sanvesh Srivastava, David B. Dunson

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Interval Estimation, Real Data, Estimation Algorithm, Massive Datasets, Posterior Sampling, Credible Intervals, Sampling Algorithms

Standard posterior sampling algorithms, such as Markov chain Monte Carlo procedures, face major challenges in scaling up to massive datasets. We propose a simple and general posterior interval estimation algorithm to rapidly and accurately estimate quantiles of the posterior distributions for one-dimensional functionals. Our algorithm runs Markov chain Monte Carlo in parallel for subsets of the data, and then averages quantiles estimated from each subset. We provide strong theoretical guarantees and show that the credible intervals from our algorithm asymptotically approximate those from the full posterior in the leading parametric order. Our algorithm has a better balance of accuracy and efficiency than its competitors across a variety of simulations and a real-data example.
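The "split, sample, average quantiles" step can be sketched on a toy problem: the mean of normal data with known standard deviation and a flat prior. Two simplifications are assumptions of this sketch and are not stated in the abstract: each subset likelihood is raised to the power k (the number of subsets) so that a subset posterior has roughly the spread of the full posterior, and direct draws from the closed-form subset posterior stand in for the per-subset Markov chain Monte Carlo runs. All function names are hypothetical.

```python
import random
import statistics


def empirical_quantile(sorted_draws, q):
    """Simple order-statistic estimate of the q-th quantile."""
    index = min(len(sorted_draws) - 1, int(q * len(sorted_draws)))
    return sorted_draws[index]


def subset_posterior_draws(subset, k, sigma, n_draws, rng):
    """Draws from a subset posterior for a normal mean (flat prior, known sigma).

    Assumption: the subset likelihood is raised to the power k, so the subset
    posterior is N(subset mean, sigma^2 / (k * len(subset))).
    """
    mean = statistics.fmean(subset)
    sd = sigma / (k * len(subset)) ** 0.5
    return sorted(rng.gauss(mean, sd) for _ in range(n_draws))


def averaged_quantile_interval(data, k, sigma, rng, n_draws=4000, level=0.95):
    """Average the lower and upper subset quantiles across k subsets."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    lowers, uppers = [], []
    for j in range(k):
        draws = subset_posterior_draws(data[j::k], k, sigma, n_draws, rng)
        lowers.append(empirical_quantile(draws, lower_q))
        uppers.append(empirical_quantile(draws, upper_q))
    return statistics.fmean(lowers), statistics.fmean(uppers)


rng = random.Random(42)
sigma = 1.0
data = [rng.gauss(2.0, sigma) for _ in range(10000)]
combined = averaged_quantile_interval(data, k=10, sigma=sigma, rng=rng)

# Exact full-data posterior interval for comparison: N(mean, sigma^2 / n).
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * sigma / len(data) ** 0.5
exact = (statistics.fmean(data) - half_width, statistics.fmean(data) + half_width)
print(combined, exact)
```

In this conjugate toy case with equal-sized subsets, the averaged interval agrees with the exact full-data interval up to Monte Carlo error. In general the agreement is only approximate.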


Bayesian Analysis of Two Parameter Complementary Exponential Power Distribution

NCC Journal, 2018, Vol 3 (1), pp. 1-23. doi:10.3126/nccj.v3i1.20244

Author(s): A. K. Chaudhary

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Bayesian Analysis, Power Distribution, Real Data, MCMC Methods, Output Analysis, Data Set, Exponential Power Distribution

In this paper, the Markov chain Monte Carlo (MCMC) method is used to estimate the parameters of the complementary exponential power (CEP) distribution based on a complete sample. A procedure is developed to obtain Bayes estimates of the parameters of the CEP distribution using the MCMC simulation method in OpenBUGS, established software for Bayesian analysis using MCMC methods. MCMC methods have been shown to be easier to implement computationally; the estimates always exist and are statistically consistent, and their probability intervals are convenient to construct. R functions are developed to study the statistical properties of the distribution, to provide model validation and comparison tools, and to analyse the output of MCMC samples generated from OpenBUGS. A real data set is considered for illustration under uniform and gamma priors.


Procedural Reconstruction of 3D Indoor Models from Lidar Data Using Reversible Jump Markov Chain Monte Carlo

Remote Sensing, 2020, Vol 12 (5), pp. 838. doi:10.3390/rs12050838. Cited by ~6

Author(s): Ha Tran, Kourosh Khoshelham

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Real Data, Point Clouds, Shape Grammar, Data Driven, Reversible Jump, Indoor Environments, Architectural Styles

Automated reconstruction of Building Information Models (BIMs) from point clouds has been an intensive and challenging research topic for decades. Traditionally, 3D models of indoor environments are reconstructed purely by data-driven methods, which are susceptible to erroneous and incomplete data. Procedural-based methods such as the shape grammar are more robust to uncertainty and incompleteness of the data as they exploit the regularity and repetition of structural elements and architectural design principles in the reconstruction. Nevertheless, these methods are often limited to simple architectural styles: the so-called Manhattan design. In this paper, we propose a new method based on a combination of a shape grammar and a data-driven process for procedural modelling of indoor environments from a point cloud. The core idea behind the integration is to apply a stochastic process based on reversible jump Markov Chain Monte Carlo (rjMCMC) to guide the automated application of grammar rules in the derivation of a 3D indoor model. Experiments on synthetic and real data sets show the applicability of the method to efficiently generate 3D indoor models of both Manhattan and non-Manhattan environments with high accuracy, completeness, and correctness.


A Bayesian Estimation and Prediction of Gompertz Extension Distribution Using the MCMC Method

Nepal Journal of Science and Technology, 2020, Vol 19 (1), pp. 142-160. doi:10.3126/njst.v19i1.29795

Author(s): Arun Kumar Chaudhary, Vijay Kumar

Keyword(s): Monte Carlo, Markov Chain, Markov Chain Monte Carlo, Real Data, Simulation Method, MCMC Methods, MCMC Method, Data Set, Bayes Estimates, Check Method

In this paper, the Markov chain Monte Carlo (MCMC) method is used to estimate the parameters of the Gompertz extension distribution based on a complete sample. We develop a procedure to obtain Bayes estimates of the parameters of the Gompertz extension distribution using the MCMC simulation method in OpenBUGS, established software for Bayesian analysis using MCMC methods. We obtain the Bayes estimates of the parameters and of the hazard and reliability functions, and present their probability intervals. We apply the predictive check method to discuss the issue of model compatibility. A real data set is considered for illustration under uniform and gamma priors.
