Particle Gibbs sampling for Bayesian phylogenetic inference

Author(s):  
Shijia Wang ◽  
Liangliang Wang

Abstract

Motivation: The combinatorial sequential Monte Carlo (CSMC) method has been demonstrated to be an efficient complement to standard Markov chain Monte Carlo (MCMC) for Bayesian phylogenetic tree inference from biological sequences. It is appealing to combine CSMC and MCMC in the framework of the particle Gibbs (PG) sampler to jointly estimate phylogenetic trees and evolutionary parameters. However, the Markov chain of the PG sampler may mix poorly for high-dimensional problems (e.g. phylogenetic trees). Some remedies, including PG with ancestor sampling and interacting particle MCMC, have been proposed to improve the PG sampler, but they either cannot be applied to the combinatorial tree space or remain inefficient for it.

Results: We introduce a novel CSMC method with a more efficient proposal distribution. It can also be incorporated into the PG sampler framework to infer parameters of the evolutionary model. The new algorithm is easily parallelized by allocating samples across different computing cores. Numerical experiments confirm that the developed CSMC samples trees more efficiently in various PG samplers.

Availability and implementation: The implementation of our method and the data underlying this article are available at https://github.com/liangliangwangsfu/phyloPMCMC.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Lena Collienne ◽  
Kieran Elmes ◽  
Mareike Fischer ◽  
David Bryant ◽  
Alex Gavryushkin

Abstract In this paper we study the graph of ranked phylogenetic trees where the adjacency relation is given by a local rearrangement of the tree structure. Our work is motivated by tree inference algorithms, such as maximum likelihood and Markov Chain Monte Carlo methods, where the geometry of the search space plays a central role in the efficiency and practicality of optimisation and sampling. We hence focus on understanding the geometry of the space (graph) of ranked trees, the so-called ranked nearest neighbour interchange (RNNI) graph. We find the radius and diameter of the space exactly, improving the best previously known estimates. Since the RNNI graph is a generalisation of the classical nearest neighbour interchange (NNI) graph to ranked phylogenetic trees, we compare geometric and algorithmic properties of the two graphs. Surprisingly, we discover that both geometric and algorithmic properties of RNNI and NNI are quite different. For example, we establish convexity of certain natural subspaces in RNNI which are not convex in NNI. Our results suggest that the complexity of computing distances in the two graphs is different.


2015 ◽  
Vol 2 (3) ◽  
pp. 939-968
Author(s):  
S. Nakano ◽  
K. Suzuki ◽  
K. Kawamura ◽  
F. Parrenin ◽  
T. Higuchi

Abstract. A technique for estimating the age–depth relationship in an ice core and evaluating its uncertainty is presented. The age–depth relationship is mainly determined by the accumulation of snow at the site of the ice core and the thinning process due to the horizontal stretching and vertical compression of ice layers. However, since neither the accumulation process nor the thinning process is fully understood, it is essential to incorporate observational information into a model that describes the accumulation and thinning processes. In the proposed technique, the age as a function of depth is estimated from age markers and δ18O data. The estimation is achieved using the particle Markov chain Monte Carlo (PMCMC) method, in which the sequential Monte Carlo (SMC) method is combined with the Markov chain Monte Carlo method. In this hybrid method, the posterior distributions for the parameters in the models for the accumulation and thinning processes are computed using the Metropolis method, in which the likelihood is obtained with the SMC method. Meanwhile, the posterior distribution for the age as a function of depth is obtained by collecting the samples generated by the SMC method with Metropolis iterations. The use of this PMCMC method enables us to estimate the age–depth relationship without assuming either linearity or Gaussianity. The performance of the proposed technique is demonstrated by applying it to ice core data from Dome Fuji in Antarctica.
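
The hybrid scheme described in this abstract (a Metropolis sampler whose likelihood is estimated by SMC) can be illustrated with a minimal sketch. It does not use the authors' accumulation and thinning models: the AR(1) state-space model, the flat prior on (-1, 1), and all function names below are assumptions chosen only to keep the example short. It also covers only the parameter update, not the collection of SMC samples for the latent path.

```python
import math
import random

def simulate(theta, T, rng):
    """Simulate a toy AR(1) state observed with unit Gaussian noise."""
    x, ys = 0.0, []
    for _ in range(T):
        x = theta * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys

def log_normal_pdf(y, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2

def smc_log_likelihood(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1:T | theta)."""
    particles = [0.0] * n_particles
    log_lik = 0.0
    for y in ys:
        # Propagate every particle through the state equation.
        particles = [theta * x + rng.gauss(0.0, 1.0) for x in particles]
        log_w = [log_normal_pdf(y, x, 1.0) for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # Average weight estimates p(y_t | y_1:t-1, theta).
        log_lik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return log_lik

def pmmh(ys, n_iter, n_particles, step, rng):
    """Random-walk Metropolis for theta with a flat prior on (-1, 1)."""
    theta = 0.0
    log_lik = smc_log_likelihood(theta, ys, n_particles, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        if -1.0 < prop < 1.0:  # proposals outside the prior support are rejected
            prop_ll = smc_log_likelihood(prop, ys, n_particles, rng)
            if rng.random() < math.exp(min(0.0, prop_ll - log_lik)):
                theta, log_lik = prop, prop_ll
        chain.append(theta)
    return chain
```

For example, `pmmh(simulate(0.7, 30, random.Random(1)), 200, 100, 0.2, random.Random(2))` returns a chain of 200 values of `theta`. The stored likelihood estimate of the current state is reused rather than recomputed, which is what keeps this kind of sampler targeting the exact posterior despite the noisy likelihood.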


SPE Journal ◽  
2019 ◽  
Vol 25 (01) ◽  
pp. 001-036 ◽  
Author(s):  
Xin Li ◽  
Albert C. Reynolds

Summary Generating an estimate of uncertainty in production forecasts has become nearly standard in the oil industry, but is often performed with procedures that yield at best a highly approximate uncertainty quantification. Formally, the uncertainty quantification of a production forecast can be achieved by generating a correct characterization of the posterior probability-density function (PDF) of reservoir-model parameters conditional on dynamic data and then sampling this PDF correctly. Although Markov-chain Monte Carlo (MCMC) provides a theoretically rigorous method for sampling any target PDF that is known up to a normalizing constant, in reservoir-engineering applications, researchers have found that it might require extraordinarily long chains containing millions to hundreds of millions of states to obtain a correct characterization of the target PDF. When the target PDF has a single mode or has multiple modes concentrated in a small region, it might be possible to implement a proposal distribution dependent on a random walk so that the resulting MCMC algorithm derived from the Metropolis-Hastings acceptance probability can yield a good characterization of the posterior PDF with a computationally feasible chain length. However, for a high-dimensional multimodal PDF with modes separated by large regions of low or zero probability, characterizing the PDF with MCMC using a random walk is not computationally feasible. Although methods such as population MCMC exist for characterizing a multimodal PDF, their computational cost generally makes them impractical for field application. In this paper, we design a new proposal distribution using a Gaussian mixture PDF for use in MCMC where the posterior PDF can be multimodal with the modes spread far apart. Simply put, the method generates modes using a gradient-based optimization method and constructs a Gaussian mixture model (GMM) to use as the basic proposal distribution. Tests on three simple problems are presented to establish the validity of the method. The performance of the new MCMC algorithm is compared with that of random-walk MCMC and is also compared with that of population MCMC for a target PDF that is multimodal.
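
The role of a Gaussian mixture proposal for a multimodal target can be illustrated with a small independence Metropolis-Hastings sketch. This is not the authors' algorithm: the one-dimensional bimodal target is a toy assumption, the mode locations are passed in directly rather than found by gradient-based optimization, and every function name is hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def target_unnorm(x):
    """Toy bimodal target, treated as known only up to a constant."""
    return normal_pdf(x, -5.0, 0.5) + normal_pdf(x, 5.0, 0.5)

def gmm_pdf(x, modes, sd):
    """Equal-weight Gaussian mixture density centred on the given modes."""
    return sum(normal_pdf(x, m, sd) for m in modes) / len(modes)

def gmm_sample(modes, sd, rng):
    """Pick a component uniformly, then draw from that Gaussian."""
    return rng.gauss(rng.choice(modes), sd)

def independence_mh(n_iter, modes, sd, rng):
    """Metropolis-Hastings with a proposal that ignores the current state."""
    x = modes[0]
    chain = []
    for _ in range(n_iter):
        prop = gmm_sample(modes, sd, rng)
        # Hastings ratio for an independence proposal q:
        # pi(prop) * q(x) / (pi(x) * q(prop)).
        ratio = (target_unnorm(prop) * gmm_pdf(x, modes, sd)) / (
            target_unnorm(x) * gmm_pdf(prop, modes, sd))
        if rng.random() < ratio:
            x = prop
        chain.append(x)
    return chain
```

Calling `independence_mh(2000, [-5.0, 5.0], 1.0, random.Random(0))` produces a chain that jumps between the two modes freely, because each proposal is drawn from the mixture regardless of where the chain currently sits. A random walk with a small step would instead have to cross the low-probability region between the modes.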


2019 ◽  
Vol 14 (3) ◽  
pp. 753-776 ◽  
Author(s):  
L. F. South ◽  
A. N. Pettitt ◽  
C. C. Drovandi

2020 ◽  
Vol 52 (2) ◽  
pp. 377-403 ◽  
Author(s):  
Axel Finke ◽  
Arnaud Doucet ◽  
Adam M. Johansen

Abstract Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms which can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles conditionally independently at each time step, sequential MCMC methods sample particles according to a Markov chain Monte Carlo (MCMC) kernel. Introduced over twenty years ago in [6], sequential MCMC methods have attracted renewed interest recently as they empirically outperform SMC methods in some applications. We establish an $\mathbb{L}_r$-inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we also provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of asymptotic variance of the corresponding Monte Carlo estimators.
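
The contrast drawn in this abstract between conditionally independent particles (SMC) and particles generated by an MCMC kernel (sequential MCMC) can be sketched on a toy model. Everything here is an assumption made for illustration: the linear Gaussian state-space model, the coefficient `A`, and the choice of an independence Metropolis-Hastings kernel that proposes from the predictive mixture. It is one simple instance of the idea, not the general class of kernels analysed in the paper.

```python
import math
import random

A = 0.9  # state transition coefficient (hypothetical toy value)

def log_lik(y, x):
    """Log of the N(y; x, 1) observation density, up to a constant."""
    return -0.5 * (y - x) ** 2

def propagate(particles, rng):
    """Draw one state from the mixture of transitions of the old particles."""
    return A * rng.choice(particles) + rng.gauss(0.0, 1.0)

def smc_step(particles, y, rng):
    """Bootstrap SMC: independent proposals, then weighted resampling."""
    n = len(particles)
    proposals = [A * x + rng.gauss(0.0, 1.0) for x in particles]
    log_w = [log_lik(y, x) for x in proposals]
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(proposals, weights=weights, k=n)

def sequential_mcmc_step(particles, y, rng):
    """Sequential MCMC: the new particles are the states of one MH chain."""
    n = len(particles)
    x = propagate(particles, rng)  # initial state of the chain
    chain = []
    for _ in range(n):
        cand = propagate(particles, rng)
        # The proposal is the predictive mixture itself, so the Hastings
        # ratio reduces to the likelihood ratio of candidate and current.
        if rng.random() < math.exp(min(0.0, log_lik(y, cand) - log_lik(y, x))):
            x = cand
        chain.append(x)
    return chain

def run_filter(step, ys, n, rng):
    """Apply a one-step update repeatedly and record the filtering means."""
    particles = [0.0] * n
    means = []
    for y in ys:
        particles = step(particles, y, rng)
        means.append(sum(particles) / n)
    return means
```

Passing either `smc_step` or `sequential_mcmc_step` to `run_filter` yields a sequence of filtering means for the same observations, which makes the two update rules directly comparable. In the SMC step the particles are drawn independently given the previous population, whereas in the sequential MCMC step consecutive particles are correlated through the chain.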


2020 ◽  
Vol 7 (3) ◽  
pp. 191315
Author(s):  
Amani A. Alahmadi ◽  
Jennifer A. Flegg ◽  
Davis G. Cochrane ◽  
Christopher C. Drovandi ◽  
Jonathan M. Keith

The behaviour of many processes in science and engineering can be accurately described by dynamical system models consisting of a set of ordinary differential equations (ODEs). Often these models have several unknown parameters that are difficult to estimate from experimental data, in which case Bayesian inference can be a useful tool. In principle, exact Bayesian inference using Markov chain Monte Carlo (MCMC) techniques is possible; however, in practice, such methods may suffer from slow convergence and poor mixing. To address this problem, several approaches based on approximate Bayesian computation (ABC) have been introduced, including Markov chain Monte Carlo ABC (MCMC ABC) and sequential Monte Carlo ABC (SMC ABC). While the system of ODEs describes the underlying process that generates the data, the observed measurements invariably include errors. In this paper, we argue that several popular ABC approaches fail to adequately model these errors because the acceptance probability depends on the choice of the discrepancy function and the tolerance without any consideration of the error term. We observe that the so-called posterior distributions derived from such methods do not accurately reflect the epistemic uncertainties in parameter values. Moreover, we demonstrate that these methods provide minimal computational advantages over exact Bayesian methods when applied to two ODE epidemiological models with simulated data and one with real data concerning malaria transmission in Afghanistan.
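
The point that ABC acceptance depends only on the discrepancy function and the tolerance can be seen in a minimal rejection-ABC sketch. The model here is an assumption chosen for brevity (exponential decay, dx/dt = -k x, solved with forward Euler), not one of the epidemiological models studied in the paper, and the function names, prior range, and initial value are hypothetical.

```python
import math
import random

def solve_decay(k, x0, times, dt=0.01):
    """Forward-Euler solution of dx/dt = -k x, reported at the given times."""
    out, x, t = [], x0, 0.0
    for target in times:
        while t < target - 1e-12:
            x += dt * (-k * x)
            t += dt
        out.append(x)
    return out

def distance(sim, obs):
    """Euclidean discrepancy between simulated and observed trajectories."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)))

def abc_rejection(obs, times, tol, n_draws, rng):
    """Keep prior draws whose noise-free simulation lies within tol of obs."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(0.0, 2.0)          # draw from a Uniform(0, 2) prior
        sim = solve_decay(k, 10.0, times)  # deterministic ODE output, no error term
        if distance(sim, obs) < tol:
            accepted.append(k)
    return accepted
```

Note that the measurement-error model never appears in `abc_rejection`: the spread of the accepted values of `k` is set by `tol` and `distance`. Running it with the same random seed and a larger `tol` accepts a superset of the draws, so the resulting "posterior" widens or narrows with the tolerance rather than with the noise level in the data.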


2018 ◽  
Vol 23 (3) ◽  
Author(s):  
Jaeho Kim ◽  
Sunhyung Lee

Abstract We provide a novel approach to estimating a regime-switching nonlinear and non-Gaussian state-space model based on a particle learning scheme. In particular, we extend the particle learning method of Liu and West (2001, “Combined Parameter and State Estimation in Simulation-Based Filtering,” in Sequential Monte Carlo Methods in Practice, 197–223, Springer) by constructing a new proposal distribution for the latent regime index variable that incorporates all available information contained in the current and past observations. The Monte Carlo simulation results imply that our approach categorically outperforms a popular existing algorithm. For empirical illustration, the proposed algorithm is used to analyze the underlying dynamics of US excess stock returns.


2016 ◽  
Author(s):  
R. A. Smith ◽  
E. L. Ionides ◽  
A. A. King

Abstract Genetic sequences from pathogens can provide information about infectious disease dynamics that may supplement or replace information from other epidemiological observations. Currently available methods first estimate phylogenetic trees from sequence data, then estimate a transmission model conditional on these phylogenies. Outside limited classes of models, existing methods are unable to enforce logical consistency between the model of transmission and that underlying the phylogenetic reconstruction. Such conflicts in assumptions can lead to bias in the resulting inferences. Here, we develop a general, statistically efficient, plug-and-play method to jointly estimate both disease transmission and phylogeny using genetic data and, if desired, other epidemiological observations. This method explicitly connects the model of transmission and the model of phylogeny so as to avoid the aforementioned inconsistency. We demonstrate the feasibility of our approach through simulation and apply it to estimate stage-specific infectiousness in a subepidemic of HIV in Detroit, Michigan. In a supplement, we prove that our approach is a valid sequential Monte Carlo algorithm. While we focus on how these methods may be applied to population-level models of infectious disease, their scope is more general. These methods may be applied in other biological systems where one seeks to infer population dynamics from genetic sequences, and they may also find application for evolutionary models with phenotypic rather than genotypic data.

