Particle Gibbs sampling for Bayesian phylogenetic inference

Bioinformatics ◽

10.1093/bioinformatics/btaa867 ◽

2020 ◽

Author(s):

Shijia Wang ◽

Liangliang Wang

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Phylogenetic Trees ◽

Sequential Monte Carlo ◽

Evolutionary Model ◽

Supplementary Information ◽

Proposal Distribution ◽

Complementary Method ◽

Tree Inference ◽

Particle MCMC

Abstract

Motivation: The combinatorial sequential Monte Carlo (CSMC) has been demonstrated to be an efficient complementary method to the standard Markov chain Monte Carlo (MCMC) for Bayesian phylogenetic tree inference using biological sequences. It is appealing to combine the CSMC and MCMC in the framework of the particle Gibbs (PG) sampler to jointly estimate the phylogenetic trees and evolutionary parameters. However, the Markov chain of the PG may mix poorly for high-dimensional problems (e.g. phylogenetic trees). Some remedies, including the PG with ancestor sampling and the interacting particle MCMC, have been proposed to improve the PG, but they either cannot be applied to the combinatorial tree space or remain inefficient for it.

Results: We introduce a novel CSMC method by proposing a more efficient proposal distribution. It can also be combined into the PG sampler framework to infer parameters in the evolutionary model. The new algorithm can be easily parallelized by allocating samples over different computing cores. We validate via numerical experiments that the developed CSMC can sample trees more efficiently in various PG samplers.

Availability and implementation: The implementation of our method and the data underlying this article are available at https://github.com/liangliangwangsfu/phyloPMCMC.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Geometry of Ranked Nearest Neighbour Interchange Space of Phylogenetic Trees

10.1101/2019.12.19.883603 ◽

2019 ◽

Author(s):

Lena Collienne ◽

Kieran Elmes ◽

Mareike Fischer ◽

David Bryant ◽

Alex Gavryushkin

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Maximum Likelihood ◽

Phylogenetic Trees ◽

Search Space ◽

Nearest Neighbour ◽

Adjacency Relation ◽

Inference Algorithms ◽

Tree Inference

Abstract. In this paper we study the graph of ranked phylogenetic trees where the adjacency relation is given by a local rearrangement of the tree structure. Our work is motivated by tree inference algorithms, such as maximum likelihood and Markov Chain Monte Carlo methods, where the geometry of the search space plays a central role for efficiency and practicality of optimisation and sampling. We hence focus on understanding the geometry of the space (graph) of ranked trees, the so-called ranked nearest neighbour interchange (RNNI) graph. We find the radius and diameter of the space exactly, improving the best previously known estimates. Since the RNNI graph is a generalisation of the classical nearest neighbour interchange (NNI) graph to ranked phylogenetic trees, we compare geometric and algorithmic properties of the two graphs. Surprisingly, we discover that both geometric and algorithmic properties of RNNI and NNI are quite different. For example, we establish convexity of certain natural subspaces in RNNI which are not convex in NNI. Our results suggest that the complexity of computing distances in the two graphs is different.

A sequential Bayesian approach for the estimation of the age–depth relationship of Dome Fuji ice core

Nonlinear Processes in Geophysics Discussions ◽

10.5194/npgd-2-939-2015 ◽

2015 ◽

Vol 2 (3) ◽

pp. 939-968

Author(s):

S. Nakano ◽

K. Suzuki ◽

K. Kawamura ◽

F. Parrenin ◽

T. Higuchi

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Sequential Monte Carlo ◽

Ice Core ◽

Posterior Distributions ◽

Metropolis Method ◽

Depth Relationship ◽

Relationship Of ◽

Thinning Process

Abstract. A technique for estimating the age–depth relationship in an ice core and evaluating its uncertainty is presented. The age–depth relationship is mainly determined by the accumulation of snow at the site of the ice core and the thinning process due to the horizontal stretching and vertical compression of ice layers. However, since neither the accumulation process nor the thinning process is fully understood, it is essential to incorporate observational information into a model that describes the accumulation and thinning processes. In the proposed technique, the age as a function of depth is estimated from age markers and δ18O data. The estimation is achieved using the particle Markov chain Monte Carlo (PMCMC) method, in which the sequential Monte Carlo (SMC) method is combined with the Markov chain Monte Carlo method. In this hybrid method, the posterior distributions for the parameters in the models for the accumulation and thinning processes are computed using the Metropolis method, in which the likelihood is obtained with the SMC method. Meanwhile, the posterior distribution for the age as a function of depth is obtained by collecting the samples generated by the SMC method with Metropolis iterations. The use of this PMCMC method enables us to estimate the age–depth relationship without assuming either linearity or Gaussianity. The performance of the proposed technique is demonstrated by applying it to ice core data from Dome Fuji in Antarctica.
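
The combination described in this abstract (a Metropolis update for the parameters, with the likelihood supplied by an SMC run) can be illustrated with a minimal sketch. It is not the authors' ice core model: the toy autoregressive state-space model, the uniform prior on `phi`, and all function names below are assumptions chosen only to keep the example short.

```python
import math
import random


def simulate(phi, n_steps, rng, sd_state=1.0, sd_obs=1.0):
    """Simulate observations from a toy model: x_t = phi*x_{t-1} + noise, y_t = x_t + noise."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sd_state)
        ys.append(x + rng.gauss(0.0, sd_obs))
    return ys


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def smc_loglik(phi, ys, n_particles, rng, sd_state=1.0, sd_obs=1.0):
    """Bootstrap particle filter estimate of log p(y | phi)."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # Propagate every particle through the state equation.
        particles = [phi * x + rng.gauss(0.0, sd_state) for x in particles]
        # Weight by the observation density (log-sum-exp for stability).
        logw = [normal_logpdf(y, x, sd_obs) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


def pmmh(ys, n_iters, n_particles, rng, step=0.1):
    """Random-walk Metropolis on phi using the SMC likelihood estimate."""
    phi = 0.0
    ll = smc_loglik(phi, ys, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        prop = phi + rng.gauss(0.0, step)
        if -1.0 < prop < 1.0:  # assumed uniform prior on (-1, 1)
            ll_prop = smc_loglik(prop, ys, n_particles, rng)
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                phi, ll = prop, ll_prop
        chain.append(phi)
    return chain
```

A call such as `pmmh(simulate(0.8, 100, random.Random(0)), 500, 100, random.Random(1))` returns the sampled values of `phi`. The likelihood estimate for the current state is stored and reused rather than recomputed, which is what keeps this kind of sampler targeting the exact posterior despite the noisy likelihood.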

An Efficient Independence Sampler for Updating Branches in Bayesian Markov chain Monte Carlo Sampling of Phylogenetic Trees

Systematic Biology ◽

10.1093/sysbio/syv051 ◽

2015 ◽

Vol 65 (1) ◽

pp. 161-176 ◽

Cited By ~ 6

Author(s):

Andre J. Aberer ◽

Alexandros Stamatakis ◽

Fredrik Ronquist

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Phylogenetic Trees ◽

Monte Carlo Sampling ◽

Independence Sampler

A Gaussian Mixture Model as a Proposal Distribution for Efficient Markov-Chain Monte Carlo Characterization of Uncertainty in Reservoir Description and Forecasting

SPE Journal ◽

10.2118/182684-pa ◽

2019 ◽

Vol 25 (01) ◽

pp. 001-036 ◽

Cited By ~ 1

Author(s):

Xin Li ◽

Albert C. Reynolds

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Random Walk ◽

Markov Chain Monte Carlo ◽

Uncertainty Quantification ◽

Gaussian Mixture Model ◽

Gaussian Mixture ◽

MCMC Algorithm ◽

Proposal Distribution

Summary. Generating an estimate of uncertainty in production forecasts has become nearly standard in the oil industry, but is often performed with procedures that yield at best a highly approximate uncertainty quantification. Formally, the uncertainty quantification of a production forecast can be achieved by generating a correct characterization of the posterior probability-density function (PDF) of reservoir-model parameters conditional on dynamic data and then sampling this PDF correctly. Although Markov-chain Monte Carlo (MCMC) provides a theoretically rigorous method for sampling any target PDF that is known up to a normalizing constant, in reservoir-engineering applications, researchers have found that it might require extraordinarily long chains containing millions to hundreds of millions of states to obtain a correct characterization of the target PDF. When the target PDF has a single mode or has multiple modes concentrated in a small region, it might be possible to implement a proposal distribution dependent on a random walk so that the resulting MCMC algorithm derived from the Metropolis-Hastings acceptance probability can yield a good characterization of the posterior PDF with a computationally feasible chain length. However, for a high-dimensional multimodal PDF with modes separated by large regions of low or zero probability, characterizing the PDF with MCMC using a random walk is not computationally feasible. Although methods such as population MCMC exist for characterizing a multimodal PDF, their computational cost generally makes these algorithms impractical for field application. In this paper, we design a new proposal distribution using a Gaussian mixture PDF for use in MCMC where the posterior PDF can be multimodal with the modes spread far apart. Simply put, the method generates modes using a gradient-based optimization method and constructs a Gaussian mixture model (GMM) to use as the basic proposal distribution. Tests on three simple problems are presented to establish the validity of the method. The performance of the new MCMC algorithm is compared with that of random-walk MCMC and is also compared with that of population MCMC for a target PDF that is multimodal.
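
The idea of using a Gaussian mixture centred on known modes as an MCMC proposal can be sketched as a Metropolis-Hastings independence sampler. This is an illustration under simplifying assumptions, not the paper's algorithm: the one-dimensional bimodal target is a toy, the mode locations are supplied by hand rather than found by gradient-based optimization, and all function names are hypothetical.

```python
import math
import random


def log_target(x):
    """Unnormalised log density with two well-separated modes (toy example)."""
    a = -0.5 * (x + 5.0) ** 2
    b = -0.5 * (x - 5.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def gmm_logpdf(x, weights, means, sds):
    """Log density of a one-dimensional Gaussian mixture."""
    terms = [
        math.log(w) - math.log(sd) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - mu) / sd) ** 2
        for w, mu, sd in zip(weights, means, sds)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def gmm_sample(weights, means, sds, rng):
    """Pick a component by weight, then draw from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[k], sds[k])


def gmm_independence_mh(n_iters, weights, means, sds, rng, x0=0.0):
    """Independence sampler: proposals do not depend on the current state."""
    x = x0
    # log of target(x) / proposal(x); the acceptance ratio compares these.
    log_ratio_x = log_target(x) - gmm_logpdf(x, weights, means, sds)
    chain = []
    for _ in range(n_iters):
        prop = gmm_sample(weights, means, sds, rng)
        log_ratio_prop = log_target(prop) - gmm_logpdf(prop, weights, means, sds)
        if rng.random() < math.exp(min(0.0, log_ratio_prop - log_ratio_x)):
            x, log_ratio_x = prop, log_ratio_prop
        chain.append(x)
    return chain
```

For example, `gmm_independence_mh(2000, [0.5, 0.5], [-5.0, 5.0], [1.5, 1.5], random.Random(1))` jumps between the two modes directly, which a small-step random walk would struggle to do. The component standard deviations are set wider than the target modes here so that the target-to-proposal ratio stays bounded in the tails.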

An effective proposal distribution for sequential Monte Carlo methods-based wildfire data assimilation

2013 Winter Simulations Conference (WSC) ◽

10.1109/wsc.2013.6721573 ◽

2013 ◽

Cited By ~ 1

Author(s):

Haidong Xue ◽

Xiaolin Hu

Keyword(s):

Monte Carlo ◽

Data Assimilation ◽

Monte Carlo Methods ◽

Sequential Monte Carlo ◽

Proposal Distribution ◽

Sequential Monte Carlo Methods

Sequential Monte Carlo Samplers with Independent Markov Chain Monte Carlo Proposals

Bayesian Analysis ◽

10.1214/18-ba1129 ◽

2019 ◽

Vol 14 (3) ◽

pp. 753-776 ◽

Cited By ~ 5

Author(s):

L. F. South ◽

A. N. Pettitt ◽

C. C. Drovandi

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Sequential Monte Carlo ◽

Independent Markov Chain

Limit theorems for sequential MCMC methods

Advances in Applied Probability ◽

10.1017/apr.2020.9 ◽

2020 ◽

Vol 52 (2) ◽

pp. 377-403 ◽

Cited By ~ 2

Author(s):

Axel Finke ◽

Arnaud Doucet ◽

Adam M. Johansen

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Sequential Monte Carlo ◽

Probability Distributions ◽

Asymptotic Variance ◽

MCMC Methods ◽

Time Step ◽

Strong Law ◽

Large Numbers

Abstract. Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles conditionally independently at each time step, sequential MCMC methods sample particles according to a Markov chain Monte Carlo (MCMC) kernel. Introduced over twenty years ago in [6], sequential MCMC methods have attracted renewed interest recently as they empirically outperform SMC methods in some applications. We establish an $\mathbb{L}_r$-inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we also provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of asymptotic variance of the corresponding Monte Carlo estimators.

A comparison of approximate versus exact techniques for Bayesian parameter inference in nonlinear ordinary differential equation models

Royal Society Open Science ◽

10.1098/rsos.191315 ◽

2020 ◽

Vol 7 (3) ◽

pp. 191315

Author(s):

Amani A. Alahmadi ◽

Jennifer A. Flegg ◽

Davis G. Cochrane ◽

Christopher C. Drovandi ◽

Jonathan M. Keith

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Bayesian Inference ◽

Sequential Monte Carlo ◽

Simulated Data ◽

Real Data ◽

Nonlinear Ordinary Differential Equation ◽

Unknown Parameters ◽

Acceptance Probability

The behaviour of many processes in science and engineering can be accurately described by dynamical system models consisting of a set of ordinary differential equations (ODEs). Often these models have several unknown parameters that are difficult to estimate from experimental data, in which case Bayesian inference can be a useful tool. In principle, exact Bayesian inference using Markov chain Monte Carlo (MCMC) techniques is possible; however, in practice, such methods may suffer from slow convergence and poor mixing. To address this problem, several approaches based on approximate Bayesian computation (ABC) have been introduced, including Markov chain Monte Carlo ABC (MCMC ABC) and sequential Monte Carlo ABC (SMC ABC). While the system of ODEs describes the underlying process that generates the data, the observed measurements invariably include errors. In this paper, we argue that several popular ABC approaches fail to adequately model these errors because the acceptance probability depends on the choice of the discrepancy function and the tolerance without any consideration of the error term. We observe that the so-called posterior distributions derived from such methods do not accurately reflect the epistemic uncertainties in parameter values. Moreover, we demonstrate that these methods provide minimal computational advantages over exact Bayesian methods when applied to two ODE epidemiological models with simulated data and one with real data concerning malaria transmission in Afghanistan.
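
The point about the acceptance rule can be made concrete with a minimal ABC rejection sketch. It is an illustration only, not one of the models studied in the paper: the single-parameter decay ODE, the uniform prior, the root-mean-square discrepancy, and the function names are all assumptions. Note that the simulator is deterministic, so acceptance depends only on the discrepancy and the tolerance `tol`, with no model of the measurement error.

```python
import math
import random


def solve_decay(theta, times, y0=1.0, dt=0.01):
    """Forward-Euler solution of dy/dt = -theta * y, sampled at increasing `times`."""
    ys, y, t = [], y0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)
            y += h * (-theta * y)
            t += h
        ys.append(y)
    return ys


def discrepancy(sim, obs):
    """Root-mean-square difference between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def abc_rejection(obs, times, n_draws, tol, rng):
    """Keep prior draws whose simulated trajectory lies within `tol` of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 2.0)    # draw from the assumed uniform prior
        sim = solve_decay(theta, times)  # deterministic: no error term simulated
        if discrepancy(sim, obs) < tol:
            accepted.append(theta)
    return accepted
```

Running `abc_rejection` on the same data with two different tolerances gives accepted samples of different spread, even though the noise in the data has not changed. In this sketch the width of the resulting "posterior" is therefore set by the analyst's choice of `tol` rather than by the measurement error.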

An efficient sequential learning algorithm in regime-switching environments

Studies in Nonlinear Dynamics & Econometrics ◽

10.1515/snde-2018-0016 ◽

2018 ◽

Vol 23 (3) ◽

Author(s):

Jaeho Kim ◽

Sunhyung Lee

Keyword(s):

Monte Carlo ◽

Regime Switching ◽

Sequential Monte Carlo ◽

Learning Algorithm ◽

State Space Model ◽

Gaussian State ◽

Proposal Distribution ◽

Novel Approach ◽

Particle Learning ◽

Non Gaussian

Abstract. We provide a novel approach to estimating a regime-switching nonlinear and non-Gaussian state-space model based on a particle learning scheme. In particular, we extend the particle learning method of Liu and West (2001, “Combined Parameter and State Estimation in Simulation-Based Filtering,” in Sequential Monte Carlo Methods in Practice, 197–223, Springer) by constructing a new proposal distribution for the latent regime index variable that incorporates all available information contained in the current and past observations. The Monte Carlo simulation results indicate that our approach categorically outperforms a popular existing algorithm. For empirical illustration, the proposed algorithm is used to analyze the underlying dynamics of US excess stock returns.

Infectious Disease Dynamics Inferred from Genetic Data via Sequential Monte Carlo

10.1101/096396 ◽

2016 ◽

Author(s):

R. A. Smith ◽

E. L. Ionides ◽

A. A. King

Keyword(s):

Infectious Disease ◽

Monte Carlo ◽

Phylogenetic Trees ◽

Sequential Monte Carlo ◽

Genetic Data ◽

Monte Carlo Algorithm ◽

Transmission Model ◽

Disease Dynamics ◽

Genetic Sequences ◽

Infectious Disease Dynamics

Abstract. Genetic sequences from pathogens can provide information about infectious disease dynamics that may supplement or replace information from other epidemiological observations. Currently available methods first estimate phylogenetic trees from sequence data, then estimate a transmission model conditional on these phylogenies. Outside limited classes of models, existing methods are unable to enforce logical consistency between the model of transmission and that underlying the phylogenetic reconstruction. Such conflicts in assumptions can lead to bias in the resulting inferences. Here, we develop a general, statistically efficient, plug-and-play method to jointly estimate both disease transmission and phylogeny using genetic data and, if desired, other epidemiological observations. This method explicitly connects the model of transmission and the model of phylogeny so as to avoid the aforementioned inconsistency. We demonstrate the feasibility of our approach through simulation and apply it to estimate stage-specific infectiousness in a subepidemic of HIV in Detroit, Michigan. In a supplement, we prove that our approach is a valid sequential Monte Carlo algorithm. While we focus on how these methods may be applied to population-level models of infectious disease, their scope is more general. These methods may be applied in other biological systems where one seeks to infer population dynamics from genetic sequences, and they may also find application for evolutionary models with phenotypic rather than genotypic data.
