Calibration uncertainty in molecular dating analyses: there is no substitute for the prior evaluation of time priors

2015 ◽  
Vol 282 (1798) ◽  
pp. 20141013 ◽  
Author(s):  
Rachel C. M. Warnock ◽  
James F. Parham ◽  
Walter G. Joyce ◽  
Tyler R. Lyson ◽  
Philip C. J. Donoghue

Calibration is the rate-determining step in every molecular clock analysis and, hence, considerable effort has been expended in the development of approaches to distinguish good from bad calibrations. These can be categorized into a priori evaluation of the intrinsic fossil evidence, and a posteriori evaluation of congruence through cross-validation. We contrasted these competing approaches and explored the impact of different interpretations of the fossil evidence upon Bayesian divergence time estimation. The results demonstrate that a posteriori approaches can lead to the selection of erroneous calibrations. Bayesian posterior estimates are also shown to be extremely sensitive to the probabilistic interpretation of temporal constraints. Furthermore, the effective time priors implemented within an analysis differ for individual calibrations when employed alone and in differing combination with others. This compromises the implicit assumption of all calibration consistency methods, that the impact of an individual calibration is the same when used alone or in unison with others. Thus, the most effective means of establishing the quality of fossil-based calibrations is through a priori evaluation of the intrinsic palaeontological, stratigraphic, geochronological and phylogenetic data. However, effort expended in establishing calibrations will not be rewarded unless they are implemented faithfully in divergence time analyses.

2020 ◽  
Vol 12 (7) ◽  
pp. 1087-1098
Author(s):  
Alan J S Beavan ◽  
Philip C J Donoghue ◽  
Mark A Beaumont ◽  
Davide Pisani

Abstract Relaxed molecular clock methods allow the use of genomic data to estimate divergence times across the tree of life. This is most commonly achieved in Bayesian analyses where the molecular clock is calibrated a priori through the integration of fossil information. Alternatively, fossil calibrations can be used a posteriori, to transform previously estimated relative divergence times that were inferred without considering fossil information into absolute divergence times. However, as branch length is the product of the rate of evolution and the duration in time of the considered branch, the extent to which a posteriori calibrated, relative divergence time methods can disambiguate time and rate is unclear. Here, we use forward evolutionary simulations and compare a priori and a posteriori calibration strategies using different molecular clock methods and models. Specifically, we compare three Bayesian methods, the strict clock, uncorrelated clock and autocorrelated clock, and the non-Bayesian algorithm implemented in RelTime. We simulate phylogenies with multiple, independent substitution rate changes and show that correct timescales cannot be inferred without the use of calibrations. Under our simulation conditions, a posteriori calibration strategies almost invariably inferred incorrect rate changes and divergence times. The a priori integration of fossil calibrations is fundamental in these cases to improve the accuracy of the estimated divergence times. Relative divergence times and absolute timescales derived by calibrating relative timescales to geological time a posteriori appear to be less reliable than a priori calibrated timescales.
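
The confounding of rate and time described in this abstract can be illustrated with a minimal sketch. The function names and all numbers below are illustrative assumptions, not values or code from the study.

```python
def branch_length(rate, duration):
    """Expected substitutions per site on a branch: rate x time."""
    return rate * duration


def rate_given_calibration(length, calibrated_duration):
    """Once a calibration fixes the duration, the rate follows."""
    return length / calibrated_duration


# Two hypothetical scenarios (invented values): rate in substitutions
# per site per Myr, duration in Myr.
fast_and_short = branch_length(rate=0.02, duration=50.0)
slow_and_long = branch_length(rate=0.01, duration=100.0)

# Both scenarios produce the same branch length, so sequence data alone
# cannot separate them; external time information is needed.
same_length = abs(fast_and_short - slow_and_long) < 1e-12
```

Because only the product is observed, any method that estimates relative times first has to rely on its rate model to split that product, which is the step the simulations in the abstract put under test.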


PLoS ONE ◽  
2011 ◽  
Vol 6 (11) ◽  
pp. e27138 ◽  
Author(s):  
Sebastián Duchêne ◽  
Frederick I. Archer ◽  
Julia Vilstrup ◽  
Susana Caballero ◽  
Phillip A. Morin

2019 ◽  
Vol 69 (4) ◽  
pp. 660-670 ◽  
Author(s):  
Tom Carruthers ◽  
Michael J Sanderson ◽  
Robert W Scotland

Abstract Rate variation adds considerable complexity to divergence time estimation in molecular phylogenies. Here, we evaluate the impact of lineage-specific rates—which we define as among-branch-rate-variation that acts consistently across the entire genome. We compare its impact to residual rates—defined as among-branch-rate-variation that shows a different pattern of rate variation at each sampled locus, and gene-specific rates—defined as variation in the average rate across all branches at each sampled locus. We show that lineage-specific rates lead to erroneous divergence time estimates, regardless of how many loci are sampled. Further, we show that stronger lineage-specific rates lead to increasing error. This contrasts to residual rates and gene-specific rates, where sampling more loci significantly reduces error. If divergence times are inferred in a Bayesian framework, we highlight that error caused by lineage-specific rates significantly reduces the probability that the 95% highest posterior density includes the correct value, and leads to sensitivity to the prior. Use of a more complex rate prior—which has recently been proposed to model rate variation more accurately—does not affect these conclusions. Finally, we show that the scale of lineage-specific rates used in our simulation experiments is comparable to that of an empirical data set for the angiosperm genus Ipomoea. Taken together, our findings demonstrate that lineage-specific rates cause error in divergence time estimates, and that this error is not overcome by analyzing genomic scale multilocus data sets. [Divergence time estimation; error; rate variation.]


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
De Chen ◽  
Peter A. Hosner ◽  
Donna L. Dittmann ◽  
John P. O’Neill ◽  
Sharon M. Birks ◽  
...  

Abstract Background Divergence time estimation is fundamental to understanding many aspects of the evolution of organisms, such as character evolution, diversification, and biogeography. With the development of sequence technology, improved analytical methods, and knowledge of fossils for calibration, it is possible to obtain robust molecular dating results. However, while phylogenomic datasets show great promise in phylogenetic estimation, the best ways to leverage the large amounts of data for divergence time estimation have not been well explored. A potential solution is to focus on a subset of data for divergence time estimation, which can significantly reduce the computational burden and avoid problems with data heterogeneity that may bias results. Results In this study, we obtained thousands of ultraconserved elements (UCEs) from 130 extant galliform taxa, including representatives of all genera, to determine the divergence times throughout galliform history. We tested the effects of different “gene shopping” schemes on divergence time estimation using a carefully and previously validated set of fossils. Our results found that commonly used clock-like schemes may not be suitable for UCE dating (or other data types) where some loci have little information. We suggest that the use of partitioning (e.g., PartitionFinder) and the selection of tree-like partitions may be good strategies to select a subset of data for divergence time estimation from UCEs. Our galliform time tree is largely consistent with other molecular clock studies of mitochondrial and nuclear loci. With our increased taxon sampling, a well-resolved topology, carefully vetted fossil calibrations, and suitable molecular dating methods, we obtained a high quality galliform time tree.
Conclusions We provide a robust galliform backbone time tree that can be combined with more fossil records to further facilitate our understanding of the evolution of Galliformes and can be used as a resource for comparative and biogeographic studies in this group.


2011 ◽  
Vol 8 (1) ◽  
pp. 156-159 ◽  
Author(s):  
Rachel C. M. Warnock ◽  
Ziheng Yang ◽  
Philip C. J. Donoghue

Calibration is a critical step in every molecular clock analysis but it has been the least considered. Bayesian approaches to divergence time estimation make it possible to incorporate the uncertainty in the degree to which fossil evidence approximates the true time of divergence. We explored the impact of different approaches in expressing this relationship, using arthropod phylogeny as an example for which we established novel calibrations. We demonstrate that the parameters distinguishing calibration densities have a major impact upon the prior and posterior of the divergence times, and it is critically important that users evaluate the joint prior distribution of divergence times used by their dating programmes. We illustrate a procedure for deriving calibration densities in Bayesian divergence dating through the use of soft maximum constraints.
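
A soft maximum constraint of the kind mentioned in this abstract can be sketched as a calibration density with a hard minimum, a flat core, and an exponentially decaying tail beyond the maximum. This is a simplified illustration under stated assumptions, not the procedure from the paper: the function name, the 2.5% default tail probability, and the example ages are all hypothetical.

```python
import math


def soft_max_density(t, t_min, t_max, tail_prob=0.025):
    """Density of a node age t with a hard minimum and a soft maximum.

    Assumed shape (illustrative only): uniform between t_min and t_max
    holding probability 1 - tail_prob, then an exponential tail beyond
    t_max holding probability tail_prob.
    """
    if t < t_min:
        return 0.0  # hard minimum: younger ages are excluded
    height = (1.0 - tail_prob) / (t_max - t_min)
    if t <= t_max:
        return height
    # Decay rate chosen so the tail integrates to tail_prob and the
    # density is continuous at t_max.
    decay = height / tail_prob
    return height * math.exp(-decay * (t - t_max))


# Hypothetical calibration: minimum 100 Myr, soft maximum 200 Myr.
inside = soft_max_density(150.0, 100.0, 200.0)
beyond = soft_max_density(210.0, 100.0, 200.0)
```

Raising `tail_prob` makes the maximum softer by placing more prior mass on ages older than `t_max`, which is one of the parameter choices whose effect on the joint prior the abstract asks users to check.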


2004 ◽  
Vol 359 (1450) ◽  
pp. 1477-1483 ◽  
Author(s):  
Thomas J. Near ◽  
Michael J. Sanderson

Estimates of species divergence times using DNA sequence data are playing an increasingly important role in studies of evolution, ecology and biogeography. Most work has centred on obtaining appropriate kinds of data and developing optimal estimation procedures, whereas somewhat less attention has focused on the calibration of divergences using fossils. Case studies with multiple fossil calibration points provide important opportunities to examine the divergence time estimation problem in new ways. We discuss two cross–validation procedures that address different aspects of inference in divergence time estimation. ‘Fossil cross–validation’ is a procedure used to identify the impact of different individual calibrations on overall estimation. This can identify fossils that have an exceptionally large error effect and may warrant further scrutiny. ‘Fossil–based model cross–validation’ is an entirely different procedure that uses fossils to identify the optimal model of molecular evolution in the context of rate smoothing or other inference methods. Both procedures were applied to two recent studies: an analysis of monocot angiosperms with eight fossil calibrations and an analysis of placental mammals with nine fossil calibrations. In each case, fossil calibrations could be ranked from most to least influential, and in one of the two studies, the fossils provided decisive evidence about the optimal molecular evolutionary model.
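
The idea behind fossil cross-validation can be sketched in a few lines. The version below is a deliberately simplified illustration that assumes a strict clock and relative node depths. It is not the authors' implementation, and the function name, node labels, and ages are hypothetical.

```python
def cross_validate(depths, fossil_ages):
    """Score each calibration by the misfit it causes at the others.

    depths: node -> relative depth (substitutions per site below the node)
    fossil_ages: node -> fossil-based age in Myr
    Returns node -> sum of squared (molecular age - fossil age) over the
    other calibrated nodes, when that node alone calibrates the tree.
    """
    scores = {}
    for x in fossil_ages:
        # Strict-clock rate implied by using calibration x alone.
        rate = depths[x] / fossil_ages[x]
        ss = 0.0
        for other, fossil in fossil_ages.items():
            if other == x:
                continue
            molecular = depths[other] / rate
            ss += (molecular - fossil) ** 2
        scores[x] = ss
    return scores


# Hypothetical data: node A's fossil age is twice what the others imply.
depths = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
fossil_ages = {"A": 20.0, "B": 20.0, "C": 30.0, "D": 40.0}
scores = cross_validate(depths, fossil_ages)
most_inconsistent = max(scores, key=scores.get)
```

In this toy data set the calibration at node A receives by far the largest score, so it is the one such a procedure would flag for further scrutiny. A high score shows inconsistency with the other calibrations, not that the fossil itself is wrong.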


2017 ◽  
Author(s):  
Joseph W. Brown ◽  
Stephen A. Smith

Abstract Divergence time estimation, the calibration of a phylogeny to geological time, is an integral first step in modelling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that, for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors.
These results indicate that there is inadequate information in the data to overrule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudo-data present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modelling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently. [marginal priors; information content; diptych; divergence time estimation; fossil record; BEAST; angiosperms.]
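
The prior/posterior comparison recommended in this abstract can be approximated numerically. The sketch below assumes that age samples for one node are available from a run that sampled from the joint prior and from a run with data. It summarises their similarity with a histogram overlap coefficient, which is an illustrative choice and not a statistic taken from the paper. The function name and sample values are hypothetical.

```python
def overlap_coefficient(prior_samples, posterior_samples, bins=20):
    """Shared area of two normalised histograms (0 = disjoint, 1 = identical)."""
    lo = min(min(prior_samples), min(posterior_samples))
    hi = max(max(prior_samples), max(posterior_samples))
    if hi == lo:
        return 1.0  # every sample is the same value
    width = (hi - lo) / bins

    def proportions(samples):
        counts = [0] * bins
        for s in samples:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p = proportions(prior_samples)
    q = proportions(posterior_samples)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical node-age samples in Myr.
prior_ages = [1.0, 2.0, 3.0]
shifted_posterior_ages = [10.0, 11.0, 12.0]
unchanged = overlap_coefficient(prior_ages, prior_ages)
shifted = overlap_coefficient(prior_ages, shifted_posterior_ages)
```

An overlap close to 1 for a calibrated node suggests that the data added little beyond the calibration density, which is the situation the abstract reports for most of the 24 calibrated nodes. The result depends on the number of bins and the sample size, so it should be read alongside plots of the two distributions.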


2021 ◽  
Author(s):  
Sebastian Hoehna ◽  
Sarah E Lower ◽  
Pablo Duchen ◽  
Ana Catalan

Fireflies (Coleoptera: Lampyridae) consist of over 2,000 described extant species. A well-resolved phylogeny of fireflies is important for the study of their bioluminescence, evolution, and conservation. We used a recently published anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) and state-of-the-art statistical methods (the fossilized birth-death-range process implemented in a Bayesian framework) to estimate a time-calibrated phylogeny of Lampyridae. Unfortunately, estimating calibrated phylogenies using AHE and the latest and most robust time-calibration strategies is not possible because of computational constraints. As a solution, we subset the full dataset and applied three different strategies: using the most complete loci, the most homogeneous loci, and the loci with the highest accuracy to infer the well established Photinus clade. The estimated topology using the three data subsets agreed on almost all major clades and only showed minor discordance with less supported nodes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust as long as the topology inference is robust and any well selected data subset suffices. Additionally, we observed an unexpected amount of gene tree discordance between the 436 AHE loci. Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci which is likely to bias phylogenetic inferences. We performed a simulation study to explore the impact of (a) incomplete lineage sorting, (b) uniformly distributed and systematic missing data, and (c) systematic bias in the position of highly variable and conserved sites. For our simulated data, we observed less gene tree variation and hence the empirically observed amount of gene tree discordance for the AHE dataset is unexpected.


AoB Plants ◽  
2021 ◽  
Author(s):  
Min-Jie Li ◽  
Huan-Xi Yu ◽  
Xian-Lin Guo ◽  
Xing-Jin He

Abstract The disjunct distribution (Europe-Caucasus-Asia) and species diversification across Eurasia of Allium sect. Daghestanica have attracted researchers aiming to understand the development and history of the modern Eurasian flora. However, no studies have been carried out to address the evolutionary history of this section. Based on nrITS and cpDNA fragments (trnL-trnF and rpl32-trnL), we reconstructed the evolutionary history of the third evolutionary line (EL3) of the genus Allium and, against this background, further elucidated the evolutionary history of sect. Daghestanica. Our molecular phylogeny recovered two highly supported clades in sect. Daghestanica: Clade I includes the Caucasian-European species and the Asian A. maowenense, A. xinlongense and A. carolinianum collected in Qinghai; Clade II comprises the Asian species with yellowish tepals, A. chrysanthum, A. chrysocephalum, A. herderianum, A. rude and A. xichuanense. The divergence time estimation and biogeographic inference indicated that an Asian ancestor located in the QTP and the adjacent region could have migrated to the Caucasus and Europe around the Late Miocene, resulting in further divergence and speciation; the Asian ancestor underwent rapid radiation in the QTP and the adjacent region, most likely due to the heterogeneous ecology of the QTP that resulted from the orogenies around 4–3 Mya. Our study provides a picture of the origin and species diversification across Eurasia of sect. Daghestanica.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i884-i894
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract Motivation As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability and implementation All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.

