Selecting and averaging relaxed clock models in Bayesian tip dating of Mesozoic birds

Paleobiology ◽  
2021 ◽  
pp. 1-13
Author(s):  
Chi Zhang

Abstract
Relaxed clock models are fundamental in Bayesian clock dating, but typically a single distribution characterizing the clock variation is selected. Hence, I developed a new reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm for drawing posterior samples between the independent lognormal (ILN) and independent gamma rates (IGR) clock models. The ability of the rjMCMC algorithm to infer the true model was verified through simulations. I then applied the algorithm to the Mesozoic bird data previously analyzed under the white noise (WN) clock model. In comparison, averaging over the ILN and IGR models provided more reliable estimates of the divergence times and evolutionary rates. The ILN model showed a slightly better fit than the IGR model and a much better fit than the autocorrelated lognormal (ALN) clock model. When the data were partitioned, different partitions showed heterogeneous model fit between the ILN and IGR clocks. The implementation provides a general framework for selecting and averaging relaxed clock models in Bayesian dating analyses.
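
The key observation behind the model-averaging move is that both the ILN and IGR models assign one rate per branch, so a jump between them keeps the dimension fixed; if the branch rates are carried across unchanged, the Jacobian is 1 and acceptance reduces to the ratio of rate priors. The sketch below illustrates that logic on a fixed set of simulated rates; it is a toy, not Zhang's MrBayes implementation, and all parameter values are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rates = rng.gamma(shape=4.0, scale=0.25, size=20)  # simulated per-branch rates (mean 1)

def log_prior(r, model, var=0.25):
    """Log density of the branch rates under each clock model (mean fixed at 1)."""
    if model == "ILN":                     # lognormal with mean 1, variance var
        s2 = np.log(1.0 + var)
        return stats.lognorm.logpdf(r, s=np.sqrt(s2), scale=np.exp(-s2 / 2)).sum()
    return stats.gamma.logpdf(r, a=1.0 / var, scale=var).sum()  # "IGR"

model, trace = "ILN", []
for _ in range(20_000):
    other = "IGR" if model == "ILN" else "ILN"
    # Rates are carried across unchanged (Jacobian = 1) and the two models get
    # equal 0.5 prior probability, so acceptance is the ratio of rate priors.
    if np.log(rng.uniform()) < log_prior(rates, other) - log_prior(rates, model):
        model = other
    trace.append(model)

print(f"posterior probability of IGR: {trace.count('IGR') / len(trace):.2f}")
```

The fraction of iterations spent in each model estimates its posterior probability, which is exactly the weight that model averaging would apply to its divergence-time and rate estimates.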

2020 ◽  
Vol 38 (1) ◽  
pp. 307-317
Author(s):  
Xavier Didelot ◽  
Igor Siveroni ◽  
Erik M Volz

Abstract
Phylogenetic dating is one of the most powerful and commonly used methods of drawing epidemiological interpretations from pathogen genomic data. Building such trees requires a molecular clock model, which represents the rate at which substitutions accumulate on genomes. When the molecular clock rate is constant throughout the tree, the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models allow the clock rate to vary, often based on a distribution of rates for each branch. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the substitution number on a single branch of equivalent length. We call this expectation the additivity property. We further show how the assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model in which the additivity property is satisfied. We illustrate the use of our new additive relaxed clock model on a range of simulated and real data sets, and we show that using this new model leads to more accurate estimates of mean evolutionary rates and ancestral dates.
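
The additivity property holds, for example, for a gamma clock in which the substitution amount on a branch of duration t has shape proportional to t, since gamma distributions are closed under addition of shapes. The following minimal check is illustrative only; the hyperparameters a and theta are assumed, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a, theta = 2.0, 0.5        # assumed hyperparameters, for illustration only
t1, t2 = 0.3, 0.7          # durations of two neighboring branches

# Substitution amount on a branch of duration t ~ Gamma(shape=a*t, scale=theta):
two_branches = rng.gamma(a * t1, theta, 100_000) + rng.gamma(a * t2, theta, 100_000)
one_branch = rng.gamma(a * (t1 + t2), theta, 100_000)

for name, x in [("two branches ", two_branches), ("single branch", one_branch)]:
    print(f"{name}: mean={x.mean():.3f}  var={x.var():.3f}")
# Both rows agree up to Monte Carlo error; a lognormal rate model with fixed
# per-branch parameters would fail this check, which is the paper's point.
```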


2021 ◽  
Vol 17 (2) ◽  
pp. e1008322
Author(s):  
Jordan Douglas ◽  
Rong Zhang ◽  
Remco Bouckaert

Relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. Under the (uncorrelated) relaxed clock model, tree branches are associated with molecular substitution rates which are independently and identically distributed. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).
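
Of the two kernels named, the Bactrian is simple to sketch: proposals are drawn from a symmetric bimodal mixture of normals, which avoids wasting moves in the immediate neighbourhood of the current value. Below is a minimal random-walk sampler assuming the standard form with m = 0.95 and a stand-in normal target; it is not the ORC code.

```python
import numpy as np

rng = np.random.default_rng(7)

def bactrian_step(scale, m=0.95):
    """Perturbation from the Bactrian(m) kernel: mean 0, variance scale**2."""
    z = rng.normal(loc=m, scale=np.sqrt(1.0 - m * m))
    return scale * (z if rng.uniform() < 0.5 else -z)

def log_target(x):
    return -0.5 * x * x    # stand-in posterior: standard normal

x, accepted, samples = 0.0, 0, []
for _ in range(50_000):
    xp = x + bactrian_step(scale=2.4)   # symmetric proposal: Hastings ratio = 1
    if np.log(rng.uniform()) < log_target(xp) - log_target(x):
        x, accepted = xp, accepted + 1
    samples.append(x)

print(f"acceptance: {accepted / 50_000:.2f}, sample sd: {np.std(samples):.2f}")
```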


2020 ◽  
Author(s):  
Jordan Douglas ◽  
Rong Zhang ◽  
Remco Bouckaert

Abstract
Uncorrelated relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).

Author summary
Biological sequences, such as DNA, accumulate mutations over generations. By comparing such sequences in a phylogenetic framework, the evolutionary tree of lifeforms can be inferred. With the overwhelming availability of biological sequence data, and the increasing affordability of collecting new data, the development of fast and efficient phylogenetic algorithms is more important than ever. In this article we focus on the relaxed clock model, which is very popular in phylogenetics. We explored how a range of optimisations can improve the statistical inference of the relaxed clock. This work has produced a phylogenetic setup which can infer parameters related to the relaxed clock up to 65 times faster than previous setups, depending on the dataset. The methods introduced adapt to the dataset during computation and are highly efficient when processing long biological sequences.
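
The adaptive operator mentioned above can be caricatured in a few lines: run several competing kernels on the same parameter, score each by the mean squared jump distance it achieves, and re-weight accordingly. This is a hypothetical sketch of the idea, not the ORC implementation; the three step sizes and the re-weighting schedule are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
step_sizes = [0.1, 2.4, 50.0]        # three competing random-walk kernels
weights = np.ones(3)                 # operator weights, re-learned during the run
sq_jump, uses = np.zeros(3), np.zeros(3)

def log_target(x):
    return -0.5 * x * x              # stand-in posterior

x = 0.0
for it in range(50_000):
    k = rng.choice(3, p=weights / weights.sum())
    xp = x + rng.normal(0.0, step_sizes[k])
    if np.log(rng.uniform()) < log_target(xp) - log_target(x):
        sq_jump[k] += (xp - x) ** 2  # credit the kernel with the distance moved
        x = xp
    uses[k] += 1
    if it > 5_000 and it % 1_000 == 0:   # after burn-in, re-learn the weights
        weights = sq_jump / np.maximum(uses, 1.0) + 1e-6

print("learned weights:", np.round(weights / weights.sum(), 2))
```

After burn-in the moderate step size dominates the learned weights, since both the timid and the overly bold kernels travel less per accepted proposal.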


2016 ◽  
Vol 371 (1699) ◽  
pp. 20150132 ◽  
Author(s):  
Nicolas Lartillot ◽  
Matthew J. Phillips ◽  
Fredrik Ronquist

Over recent years, several alternative relaxed clock models have been proposed in the context of Bayesian dating. These models fall into two distinct categories: uncorrelated and autocorrelated across branches. The choice between these two classes of relaxed clocks is still an open question. More fundamentally, the true process of rate variation may have both long-term trends and short-term fluctuations, suggesting that more sophisticated clock models unfolding over multiple time scales should ultimately be developed. Here, a mixed relaxed clock model is introduced, which can be mechanistically interpreted as a rate variation process undergoing short-term fluctuations on top of Brownian long-term trends. Statistically, this mixed clock represents an alternative solution to the problem of choosing between autocorrelated and uncorrelated relaxed clocks, proposing instead to combine their respective merits. Fitting this model to a dataset of 105 placental mammals, using both node-dating and tip-dating approaches, suggests that the two pure clocks, Brownian and white noise, are rejected in favour of a mixed model with approximately equal contributions from its uncorrelated and autocorrelated components. The tip-dating analysis is particularly sensitive to the choice of the relaxed clock model. In this context, the classical pure Brownian relaxed clock appears to be overly rigid, leading to biases in divergence time estimation. By contrast, the use of a mixed clock leads to more recent and more reasonable estimates for the crown ages of placental orders and superorders. Altogether, the mixed clock introduced here represents a first step towards empirically more adequate models of the patterns of rate variation across phylogenetic trees. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.
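
A chain-of-branches simulation makes the mixed process concrete: the log rate follows a Brownian trend along the path, and each branch multiplies in an independent lognormal fluctuation. The sketch below is illustrative; the two variance parameters are assumed, not the values fitted to the mammal data.

```python
import numpy as np

rng = np.random.default_rng(5)
n, dt = 50, 1.0                   # a path of 50 consecutive unit-length branches
sigma_bm, sigma_wn = 0.2, 0.4     # trend vs fluctuation magnitudes (assumed)

log_trend = np.cumsum(rng.normal(0.0, sigma_bm * np.sqrt(dt), n))  # Brownian part
white = rng.normal(-0.5 * sigma_wn ** 2, sigma_wn, n)              # iid, mean 1 on the rate scale
branch_rates = np.exp(log_trend + white)

# Neighboring branches correlate through the trend but still jitter independently:
r = np.corrcoef(branch_rates[:-1], branch_rates[1:])[0, 1]
print(f"lag-1 rate correlation: {r:.2f} (near 0 for pure white noise, near 1 for pure Brownian)")
```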


Geophysics ◽  
2016 ◽  
Vol 81 (5) ◽  
pp. R293-R305 ◽  
Author(s):  
Sireesh Dadi ◽  
Richard Gibson ◽  
Kainan Wang

Upscaling log measurements acquired at high frequencies and correlating them with corresponding low-frequency values from surface seismic and vertical seismic profile data is a challenging task. We have applied a sampling technique called the reversible jump Markov chain Monte Carlo (RJMCMC) method to this problem. A key property of our approach is that it treats the number of unknowns itself as a parameter to be determined. Specifically, we have considered upscaling as an inverse problem in which the number of coarse layers, the layer boundary depths, and the material properties are the unknowns. The method applies Bayesian inversion with RJMCMC sampling and uses simulated annealing to guide the optimization. At each iteration, the algorithm randomly moves a boundary in the current model, adds a new boundary, or deletes an existing boundary. In each case, a random perturbation is applied to Backus-average values. We have developed examples showing that the mismatch between seismograms computed from the upscaled model and from the log velocities decreases by 89% compared with the case in which the algorithm is allowed only to move boundaries. The layer boundary distributions after running the RJMCMC algorithm can represent sharp and gradual changes in lithology. The maximum deviation of upscaled velocities from Backus-average values is less than 10%, with most of the values close to zero.
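
The move/birth/death scheme is easy to demonstrate on a synthetic log. The toy below replaces Backus averaging with plain segment means, omits the simulated-annealing schedule, and uses a simple layer-count penalty in place of the full rjMCMC proposal and Jacobian bookkeeping; all values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(11)
depth = np.arange(200)
log_v = np.where(depth < 80, 2.0, np.where(depth < 140, 3.0, 2.5))
log_v = log_v + rng.normal(0.0, 0.1, depth.size)     # noisy fine-scale velocity log

def misfit(bounds):
    """Sum of squared residuals of a blocky model with the given boundaries."""
    edges = [0, *sorted(bounds), depth.size]
    return sum(((log_v[a:b] - log_v[a:b].mean()) ** 2).sum()
               for a, b in zip(edges[:-1], edges[1:]))

bounds, sigma2, lam = [100], 0.1 ** 2, 5.0           # lam penalises extra layers
for _ in range(20_000):
    move = rng.choice(["move", "birth", "death"])
    prop = list(bounds)
    if move == "move":
        prop[rng.integers(len(prop))] = int(rng.integers(1, depth.size))
    elif move == "birth":
        prop.append(int(rng.integers(1, depth.size)))
    elif len(prop) > 1:                              # death needs >1 boundary
        prop.pop(rng.integers(len(prop)))
    if len(set(prop)) < len(prop):
        continue                                     # duplicate boundary: reject
    # Gaussian likelihood ratio plus the layer-count penalty; a full rjMCMC
    # acceptance would also carry proposal-probability and Jacobian terms.
    log_r = (misfit(bounds) - misfit(prop)) / (2 * sigma2) + lam * (len(bounds) - len(prop))
    if np.log(rng.uniform()) < log_r:
        bounds = prop

print("inferred layer boundaries:", sorted(bounds))  # expect values near 80 and 140
```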


2016 ◽  
Vol 9 (9) ◽  
pp. 3213-3229 ◽  
Author(s):  
Mark F. Lunt ◽  
Matt Rigby ◽  
Anita L. Ganesan ◽  
Alistair J. Manning

Abstract
Atmospheric trace gas inversions often attempt to attribute fluxes to a high-dimensional grid using observations. To make this problem computationally feasible, and to reduce the degree of under-determination, some form of dimension reduction is usually performed. Here, we present an objective method for reducing the spatial dimension of the parameter space in atmospheric trace gas inversions. In addition to solving for a set of unknowns that govern emissions of a trace gas, we set out a framework that considers the number of unknowns to itself be an unknown. We rely on the well-established reversible-jump Markov chain Monte Carlo algorithm to use the data to determine the dimension of the parameter space. This framework provides a single-step process that solves both for the resolution of the inversion grid and for the magnitude of fluxes from this grid. Therefore, the uncertainty that surrounds the choice of aggregation is accounted for in the posterior parameter distribution. The posterior distribution of this transdimensional Markov chain provides a naturally smoothed solution, formed from an ensemble of coarser partitions of the spatial domain. We describe the form of the reversible-jump algorithm and how it may be applied to trace gas inversions. We build the system into a hierarchical Bayesian framework in which other unknown factors, such as the magnitude of the model uncertainty, can also be explored. A pseudo-data example is used to show the usefulness of this approach when compared to a subjectively chosen partitioning of a spatial domain. An inversion using real data is also shown to illustrate the scales at which the data allow methane emissions over north-west Europe to be resolved.
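
A toy one-dimensional version captures both ingredients at once: the number of contiguous aggregation regions and the model-error scale sigma are sampled jointly. Everything below (grid size, sensitivity matrix, priors) is synthetic and assumed, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(2)
ncell, nobs = 12, 40
truth = np.where(np.arange(ncell) < 4, 5.0, 1.0)    # two-regime emissions field
H = rng.uniform(0.0, 1.0, (nobs, ncell))            # known sensitivity matrix
y = H @ truth + rng.normal(0.0, 0.5, nobs)          # pseudo-observations

def loglik(breaks, sigma):
    """Profile log-likelihood of a partition, aggregating cells between breaks."""
    edges = [0, *sorted(breaks), ncell]
    A = np.column_stack([H[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # per-region flux estimate
    return -0.5 * ((y - A @ beta) ** 2).sum() / sigma ** 2 - nobs * np.log(sigma)

breaks, sigma, lam = [6], 1.0, 3.0                  # lam: parsimony prior on regions
for _ in range(10_000):
    move = rng.choice(["shift", "split", "merge", "sigma"])
    prop_b, prop_s, log_hastings = list(breaks), sigma, 0.0
    if move == "shift":
        prop_b[rng.integers(len(prop_b))] = int(rng.integers(1, ncell))
    elif move == "split":
        prop_b.append(int(rng.integers(1, ncell)))
    elif move == "merge" and len(prop_b) > 1:
        prop_b.pop(rng.integers(len(prop_b)))
    elif move == "sigma":                           # multiplicative random walk
        prop_s = sigma * np.exp(rng.normal(0.0, 0.2))
        log_hastings = np.log(prop_s / sigma)       # asymmetric-proposal correction
    if len(set(prop_b)) < len(prop_b):
        continue                                    # duplicate break: reject
    log_r = (loglik(prop_b, prop_s) - loglik(breaks, sigma)
             + lam * (len(breaks) - len(prop_b)) + log_hastings)
    if np.log(rng.uniform()) < log_r:
        breaks, sigma = prop_b, prop_s

print("region boundaries:", sorted(breaks), " model-error sigma:", round(sigma, 2))
```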


2019 ◽  
Vol 7 (6) ◽  
pp. 896-912 ◽  
Author(s):  
Caitlin Gray ◽  
Lewis Mitchell ◽  
Matthew Roughan

Abstract
Sampling random graphs is essential in many applications, and algorithms often use Markov chain Monte Carlo methods to sample uniformly from the space of graphs. However, there is often a need to sample graphs with some property that standard approaches either cannot sample or can sample only inefficiently. In this article, we are interested in sampling graphs from a conditional ensemble of the underlying graph model. We present an algorithm to generate samples from an ensemble of connected random graphs using a Metropolis–Hastings framework. The algorithm extends to a general framework for sampling from a known distribution of graphs, conditioned on a desired property. We demonstrate the method by generating connected spatially embedded random graphs, specifically the well-known Waxman network, and illustrate the convergence and practicalities of the algorithm.
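
For an independent-edge model such as the Waxman graph, the conditioned sampler takes a particularly simple form: toggle one edge at a time, accept with the ratio of edge probabilities (the toggle proposal is symmetric), and reject any move that disconnects the graph. The sketch below assumes that toggle move and illustrative values of beta and the length scale; the paper's framework is more general.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(4)
n, beta, s = 30, 0.8, 0.3                       # illustrative Waxman parameters
pos = rng.uniform(0.0, 1.0, (n, 2))             # random node positions
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
p = beta * np.exp(-dist / s)                    # Waxman edge probabilities
np.fill_diagonal(p, 0.0)

def connected(adj):
    """BFS check that all n nodes are reachable from node 0."""
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

adj = np.ones((n, n), dtype=int) - np.eye(n, dtype=int)  # start connected (complete graph)
for _ in range(50_000):
    i, j = rng.integers(n), rng.integers(n)
    if i == j:
        continue
    # The toggle proposal is symmetric, so acceptance is the ratio of
    # independent-edge probabilities; disconnected states have density zero.
    ratio = (1 - p[i, j]) / p[i, j] if adj[i, j] else p[i, j] / (1 - p[i, j])
    if rng.uniform() < ratio:
        adj[i, j] = adj[j, i] = 1 - adj[i, j]
        if not connected(adj):                  # condition on connectivity: revert
            adj[i, j] = adj[j, i] = 1 - adj[i, j]

print("edges in the sampled connected Waxman graph:", adj.sum() // 2)
```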

