Additive Uncorrelated Relaxed Clock Models for the Dating of Genomic Epidemiology Phylogenies

2020 ◽  
Vol 38 (1) ◽  
pp. 307-317
Author(s):  
Xavier Didelot ◽  
Igor Siveroni ◽  
Erik M Volz

Abstract Phylogenetic dating is one of the most powerful and commonly used methods of drawing epidemiological interpretations from pathogen genomic data. Building such trees requires considering a molecular clock model which represents the rate at which substitutions accumulate on genomes. When the molecular clock rate is constant throughout the tree then the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models consider variations in the clock rate, often based on a distribution of rates for each branch. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the substitution number on a single branch of equivalent length. We call this expectation the additivity property. We further show how assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model where the additivity property is satisfied. We illustrate the use of our new additive relaxed clock model on a range of simulated and real data sets, and we show that using this new model leads to more accurate estimates of mean evolutionary rates and ancestral dates.
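The additivity property described in this abstract can be checked with a small numerical sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes substitution counts are Poisson given the branch rate, and it compares two hypothetical rate models using the law of total variance. In the first, every branch rate has the same variance `sigma2` regardless of branch length, which is the usual uncorrelated setup. In the second, the rate variance is assumed to scale as `omega * mu / length`, one parameterisation that makes the variance of the counts proportional to branch length. The names `mu`, `sigma2` and `omega` are chosen here for illustration.

```python
import math


def count_variance_fixed(mu, sigma2, length):
    """Variance of the substitution count on one branch when the branch
    rate has mean mu and variance sigma2, independent of branch length.
    Law of total variance with a Poisson count: E[r*l] + Var(r*l)."""
    return mu * length + sigma2 * length ** 2


def count_variance_additive(mu, omega, length):
    """Variance of the substitution count when the branch rate has mean mu
    and variance omega * mu / length (assumed, illustrative scaling)."""
    rate_variance = omega * mu / length
    return mu * length + rate_variance * length ** 2


def is_additive(variance_fn, mu, dispersion, l1, l2):
    """True if two neighbouring branches of lengths l1 and l2 give the same
    count variance as a single branch of length l1 + l2."""
    split = variance_fn(mu, dispersion, l1) + variance_fn(mu, dispersion, l2)
    whole = variance_fn(mu, dispersion, l1 + l2)
    return math.isclose(split, whole)


if __name__ == "__main__":
    mu, l1, l2 = 5.0, 2.0, 3.0
    print("fixed rate variance additive:",
          is_additive(count_variance_fixed, mu, 4.0, l1, l2))
    print("length-scaled rate variance additive:",
          is_additive(count_variance_additive, mu, 0.8, l1, l2))
```

Matching variances is a necessary condition for additivity rather than a full proof. In this sketch the fixed-variance model fails it because its variance contains a term in the squared branch length, whereas the length-scaled model passes.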

Paleobiology ◽  
2021 ◽  
pp. 1-13
Author(s):  
Chi Zhang

Abstract Relaxed clock models are fundamental in Bayesian clock dating, but a single distribution characterizing the clock variation is typically selected. Hence, I developed a new reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm for drawing posterior samples between the independent lognormal (ILN) and independent gamma rates (IGR) clock models. The ability of the rjMCMC algorithm to infer the true model was verified through simulations. I then applied the algorithm to the Mesozoic bird data previously analyzed under the white noise (WN) clock model. In comparison, averaging over the ILN and IGR models provided more reliable estimates of the divergence times and evolutionary rates. The ILN model showed slightly better fit than the IGR model and much better fit than the autocorrelated lognormal (ALN) clock model. When the data were partitioned, different partitions showed heterogeneous model fit for ILN and IGR clocks. The implementation provides a general framework for selecting and averaging relaxed clock models in Bayesian dating analyses.


2014 ◽  
Vol 281 (1793) ◽  
pp. 20141278 ◽  
Author(s):  
Robin M. D. Beck ◽  
Michael S. Y. Lee

Analyses of a comprehensive morphological character matrix of mammals using ‘relaxed’ clock models (which simultaneously estimate topology, divergence dates and evolutionary rates), either alone or in combination with an 8.5 kb nuclear sequence dataset, retrieve implausibly ancient, Late Jurassic–Early Cretaceous estimates for the initial diversification of Placentalia (crown-group Eutheria). These dates are much older than all recent molecular and palaeontological estimates. They are recovered using two very different clock models, and regardless of whether the tree topology is freely estimated or constrained using scaffolds to match the current consensus placental phylogeny. This raises the possibility that divergence dates have been overestimated in previous analyses that have applied such clock models to morphological and total evidence datasets. Enforcing additional age constraints on selected internal divergences results in only a slight reduction of the age of Placentalia. Constraining Placentalia to less than 93.8 Ma, congruent with recent molecular estimates, does not require major changes in morphological or molecular evolutionary rates. Even constraining Placentalia to less than 66 Ma to match the ‘explosive’ palaeontological model results in only a 10- to 20-fold increase in maximum evolutionary rate for morphology, and fivefold for molecules. The large discrepancies between clock- and fossil-based estimates for divergence dates might therefore be attributable to relatively small changes in evolutionary rates through time, although other explanations (such as overly simplistic models of morphological evolution) need to be investigated. Conversely, dates inferred using relaxed clock models (especially with discrete morphological data and MrBayes) should be treated cautiously, as relatively minor deviations in rate patterns can generate large effects on estimated divergence dates.


2019 ◽  
Vol 5 (2) ◽  
Author(s):  
Magda Bletsa ◽  
Marc A Suchard ◽  
Xiang Ji ◽  
Sophie Gryseels ◽  
Bram Vrancken ◽  
...  

Abstract The need to estimate divergence times in evolutionary histories in the presence of various sources of substitution rate variation has stimulated a rich development of relaxed molecular clock models. Viral evolutionary studies frequently adopt an uncorrelated clock model as a generic relaxed molecular clock process, but this may impose considerable estimation bias if discrete rate variation exists among clades or lineages. For HIV-1 group M, rate variation among subtypes has been shown to result in inconsistencies in time to the most recent common ancestor estimation. Although this calls into question the adequacy of available molecular dating methods, no solution to this problem has been offered so far. Here, we investigate the use of mixed effects molecular clock models, which combine both fixed and random effects in the evolutionary rate, to estimate divergence times. Using simulation, we demonstrate that this model outperforms existing molecular clock models in a Bayesian framework for estimating time-measured phylogenies in the presence of mixed sources of rate variation, while also maintaining good performance in simpler scenarios. By analysing a comprehensive HIV-1 group M complete genome data set we confirm considerable rate variation among subtypes that is not adequately modelled by uncorrelated relaxed clock models. The mixed effects clock model can accommodate this rate variation and results in a time to the most recent common ancestor of HIV-1 group M of 1920 (1915–25), which is only slightly earlier than the uncorrelated relaxed clock estimate for the same data set. The use of complete genome data appears to have a more profound impact than the molecular clock model because it reduces the credible intervals by 50 per cent relative to similar estimates based on short envelope gene sequences.


2017 ◽  
Vol 114 (35) ◽  
pp. E7282-E7290 ◽  
Author(s):  
Liang Liu ◽  
Jin Zhang ◽  
Frank E. Rheindt ◽  
Fumin Lei ◽  
Yanhua Qu ◽  
...  

The timing of the diversification of placental mammals relative to the Cretaceous–Paleogene (KPg) boundary mass extinction remains highly controversial. In particular, there have been seemingly irreconcilable differences in the dating of the early placental radiation not only between fossil-based and molecular datasets but also among molecular datasets. To help resolve this discrepancy, we performed genome-scale analyses using 4,388 loci from 90 taxa, including representatives of all extant placental orders and transcriptome data from flying lemurs (Dermoptera) and pangolins (Pholidota). Depending on the gene partitioning scheme, molecular clock model, and genic deviation from molecular clock assumptions, extensive sensitivity analyses recovered widely varying diversification scenarios for placental mammals from a given gene set, ranging from a deep Cretaceous origin and diversification to a scenario spanning the KPg boundary, suggesting that the use of suboptimal molecular clock markers and methodologies is a major cause of controversies regarding placental diversification timing. We demonstrate that reconciliation between molecular and paleontological estimates of placental divergence times can be achieved using the appropriate clock model and gene partitioning scheme while accounting for the degree to which individual genes violate molecular clock assumptions. A birth-death-shift analysis suggests that placental mammals underwent a continuous radiation across the KPg boundary without apparent interruption by the mass extinction, paralleling a genus-level radiation of multituberculates and ecomorphological diversification of both multituberculates and therians. These findings suggest that the KPg catastrophe evidently played a limited role in placental diversification, which, instead, was likely a delayed response to the slightly earlier radiation of angiosperms.


2021 ◽  
Vol 17 (2) ◽  
pp. e1008322
Author(s):  
Jordan Douglas ◽  
Rong Zhang ◽  
Remco Bouckaert

Relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. Under the (uncorrelated) relaxed clock model, tree branches are associated with molecular substitution rates which are independently and identically distributed. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and we explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).


2020 ◽  
Author(s):  
Jordan Douglas ◽  
Rong Zhang ◽  
Remco Bouckaert

Abstract Uncorrelated relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and we explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).

Author summary Biological sequences, such as DNA, accumulate mutations over generations. By comparing such sequences in a phylogenetic framework, the evolutionary tree of lifeforms can be inferred. With the overwhelming availability of biological sequence data, and the increasing affordability of collecting new data, the development of fast and efficient phylogenetic algorithms is more important than ever. In this article we focus on the relaxed clock model, which is very popular in phylogenetics. We explored how a range of optimisations can improve the statistical inference of the relaxed clock. This work has produced a phylogenetic setup which can infer parameters related to the relaxed clock up to 65 times faster than previous setups, depending on the dataset. The methods introduced adapt to the dataset during computation and are highly efficient when processing long biological sequences.


2012 ◽  
Vol 29 (9) ◽  
pp. 2157-2167 ◽  
Author(s):  
Guy Baele ◽  
Philippe Lemey ◽  
Trevor Bedford ◽  
Andrew Rambaut ◽  
Marc A. Suchard ◽  
...  

1998 ◽  
Vol 2 (2) ◽  
pp. 233-248 ◽  
Author(s):  
Gustav Peters ◽  
Barbara A. Tonkin-Leyhausen

Based on the molecular clock model of evolution, molecular phylogenies represent reconstructions of the evolutionary process with a time scale. From these, inferences can be drawn about the evolution of other characters, including behaviour patterns. Mapping particular vocalization types in the Felidae (cats) on a published molecular phylogeny of this mammal family reveals that the distribution of these behavioural characters is fully congruent with it. Thence a time frame for the evolution of these vocalizations can be inferred, indicating large differences in their evolutionary age. Phylogenetic stasis for several million years in particular vocalization types refutes the hypothesis that behavioural characters are generally more susceptible to evolutionary change than morphological ones.


2016 ◽  
Vol 371 (1699) ◽  
pp. 20150132 ◽  
Author(s):  
Nicolas Lartillot ◽  
Matthew J. Phillips ◽  
Fredrik Ronquist

Over recent years, several alternative relaxed clock models have been proposed in the context of Bayesian dating. These models fall in two distinct categories: uncorrelated and autocorrelated across branches. The choice between these two classes of relaxed clocks is still an open question. More fundamentally, the true process of rate variation may have both long-term trends and short-term fluctuations, suggesting that more sophisticated clock models unfolding over multiple time scales should ultimately be developed. Here, a mixed relaxed clock model is introduced, which can be mechanistically interpreted as a rate variation process undergoing short-term fluctuations on the top of Brownian long-term trends. Statistically, this mixed clock represents an alternative solution to the problem of choosing between autocorrelated and uncorrelated relaxed clocks, by proposing instead to combine their respective merits. Fitting this model on a dataset of 105 placental mammals, using both node-dating and tip-dating approaches, suggests that the two pure clocks, Brownian and white noise, are rejected in favour of a mixed model with approximately equal contributions for its uncorrelated and autocorrelated components. The tip-dating analysis is particularly sensitive to the choice of the relaxed clock model. In this context, the classical pure Brownian relaxed clock appears to be overly rigid, leading to biases in divergence time estimation. By contrast, the use of a mixed clock leads to more recent and more reasonable estimates for the crown ages of placental orders and superorders. Altogether, the mixed clock introduced here represents a first step towards empirically more adequate models of the patterns of rate variation across phylogenetic trees. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Abstract Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
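For readers unfamiliar with the two transformations named in this abstract, the following is a minimal sketch of their classical textbook definitions for a single value. It does not implement the modified transformations or the robust estimator that the abstract proposes. The function names and the parameter name `lam` are chosen here for illustration.

```python
import math


def box_cox(x, lam):
    """Classical Box-Cox transformation of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Classical Yeo-Johnson transformation of one real value.
    Unlike Box-Cox, it also accepts zero and negative inputs."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1.0)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


if __name__ == "__main__":
    skewed = [0.1, 0.5, 1.0, 2.0, 10.0, 50.0]
    # lam = 0 is the log transform, which pulls in the long right tail.
    print([round(box_cox(v, 0.0), 3) for v in skewed])
    print([round(yeo_johnson(v, 0.0), 3) for v in skewed])
```

With `lam = 1`, Yeo-Johnson leaves values unchanged and Box-Cox only shifts them by one, so the parameter can be read as how far the data are bent away from the identity.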

