Assessing the quality of molecular divergence time estimates by fossil calibrations and fossil–based model selection

2004 ◽  
Vol 359 (1450) ◽  
pp. 1477-1483 ◽  
Author(s):  
Thomas J. Near ◽  
Michael J. Sanderson

Estimates of species divergence times using DNA sequence data are playing an increasingly important role in studies of evolution, ecology and biogeography. Most work has centred on obtaining appropriate kinds of data and developing optimal estimation procedures, whereas somewhat less attention has focused on the calibration of divergences using fossils. Case studies with multiple fossil calibration points provide important opportunities to examine the divergence time estimation problem in new ways. We discuss two cross–validation procedures that address different aspects of inference in divergence time estimation. ‘Fossil cross–validation’ is a procedure used to identify the impact of different individual calibrations on overall estimation. This can identify fossils that have an exceptionally large error effect and may warrant further scrutiny. ‘Fossil–based model cross–validation’ is an entirely different procedure that uses fossils to identify the optimal model of molecular evolution in the context of rate smoothing or other inference methods. Both procedures were applied to two recent studies: an analysis of monocot angiosperms with eight fossil calibrations and an analysis of placental mammals with nine fossil calibrations. In each case, fossil calibrations could be ranked from most to least influential, and in one of the two studies, the fossils provided decisive evidence about the optimal molecular evolutionary model.
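
The fossil cross-validation idea in this abstract can be illustrated with a minimal Python sketch. It assumes a strict molecular clock and invented node depths and fossil ages; neither comes from the study, whose abstract refers to rate smoothing or other inference methods. The function name `cross_validation_scores` is hypothetical. Each calibration is used on its own to date the other calibrated nodes, and the sum of squared differences between the resulting molecular ages and the fossil ages summarizes how inconsistent that calibration is with the rest.

```python
def cross_validation_scores(depths, fossil_ages):
    """Score each calibration by how poorly it predicts the other fossils.

    depths: node name -> node depth in substitutions per site (ultrametric tree assumed)
    fossil_ages: node name -> fossil age in million years
    """
    scores = {}
    for x in fossil_ages:
        # Strict-clock assumption of this sketch: one calibration fixes the rate.
        rate = depths[x] / fossil_ages[x]
        ss = 0.0
        for i in fossil_ages:
            if i == x:
                continue
            molecular_age = depths[i] / rate
            ss += (molecular_age - fossil_ages[i]) ** 2
        scores[x] = ss
    return scores


# Hypothetical data: the fossil on node A is too old for the node's depth.
depths = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
fossil_ages = {"A": 30.0, "B": 20.0, "C": 30.0, "D": 40.0}

scores = cross_validation_scores(depths, fossil_ages)
ranked = sorted(scores, key=scores.get, reverse=True)  # most inconsistent first
```

In this toy data set the calibration on node A gets by far the largest score, so it would be the first candidate for further scrutiny. The score depends on the absolute ages involved, so comparing calibrations of very different age needs some care.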

2017 ◽  
Author(s):  
Mario dos Reis ◽  
Gregg F. Gunnell ◽  
José Barba-Montoya ◽  
Alex Wilkins ◽  
Ziheng Yang ◽  
...  

Abstract Primates have long been a test case for the development of phylogenetic methods for divergence time estimation. Despite a large number of studies, however, the timing of origination of crown Primates relative to the K-Pg boundary and the timing of diversification of the main crown groups remain controversial. Here we analysed a dataset of 372 taxa (367 Primates and 5 outgroups, 61 thousand base pairs) that includes nine complete primate genomes (3.4 million base pairs). We systematically explore the effect of different interpretations of fossil calibrations and molecular clock models on primate divergence time estimates. We find that even small differences in the construction of fossil calibrations can have a noticeable impact on estimated divergence times, especially for the oldest nodes in the tree. Notably, choice of molecular rate model (auto-correlated or independently distributed rates) has an especially strong effect on estimated times, with the independent rates model producing considerably more ancient estimates for the deeper nodes in the phylogeny. We implement thermodynamic integration, combined with Gaussian quadrature, in the program MCMCTree, and use it to calculate Bayes factors for clock models. Bayesian model selection indicates that the auto-correlated rates model fits the primate data substantially better, and we conclude that time estimates under this model should be preferred. We show that for eight core nodes in the phylogeny, uncertainty in time estimates is close to the theoretical limit imposed by fossil uncertainties. Thus, these estimates are unlikely to be improved by collecting additional molecular sequence data. All analyses place the origin of Primates close to the K-Pg boundary, either in the Cretaceous or straddling the boundary into the Palaeogene.
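
Thermodynamic integration with Gaussian quadrature, as mentioned in this abstract, can be sketched on a toy problem. The example below is not the MCMCTree implementation. It assumes a conjugate normal model (data y_i ~ N(θ, 1), prior θ ~ N(0, 1)) so that the expected log-likelihood under each power posterior is available in closed form, where a real analysis would estimate it by MCMC. All function names are hypothetical. The log marginal likelihood is the integral over β from 0 to 1 of E_β[log L], approximated here with Gauss-Legendre quadrature.

```python
import math


def gauss_legendre(n):
    """Gauss-Legendre nodes and weights on [-1, 1], by Newton iteration on P_n."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess for the i-th root
        for _ in range(100):
            p0, p1 = 1.0, x  # P_0(x), P_1(x)
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)  # derivative P_n'(x)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def expected_log_lik(beta, y):
    """E[log L] under the power posterior (prior times L**beta) of the toy model."""
    n, s = len(y), sum(y)
    var = 1.0 / (1.0 + beta * n)  # power-posterior variance of theta
    mean = beta * s * var         # power-posterior mean of theta
    sq = sum((yi - mean) ** 2 for yi in y) + n * var
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sq


def log_marginal_ti(y, n_points=16):
    """Thermodynamic integration: integrate E_beta[log L] over beta in [0, 1]."""
    nodes, weights = gauss_legendre(n_points)
    total = 0.0
    for x, w in zip(nodes, weights):
        beta = 0.5 * (x + 1.0)  # map the node from [-1, 1] to [0, 1]
        total += 0.5 * w * expected_log_lik(beta, y)
    return total


def log_marginal_exact(y):
    """Closed-form log marginal likelihood of the toy model, used as a check."""
    n, s = len(y), sum(y)
    quad = sum(yi * yi for yi in y) - s * s / (1.0 + n)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n) - 0.5 * quad


y = [0.5, 1.2, -0.3, 2.0, 0.8]  # made-up data
estimate = log_marginal_ti(y)
exact = log_marginal_exact(y)
```

A log Bayes factor between two models is the difference between their log marginal likelihoods, each of which could be estimated this way.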


2019 ◽  
Vol 69 (4) ◽  
pp. 660-670 ◽  
Author(s):  
Tom Carruthers ◽  
Michael J Sanderson ◽  
Robert W Scotland

Abstract Rate variation adds considerable complexity to divergence time estimation in molecular phylogenies. Here, we evaluate the impact of lineage-specific rates, which we define as among-branch-rate-variation that acts consistently across the entire genome. We compare its impact to residual rates, defined as among-branch-rate-variation that shows a different pattern of rate variation at each sampled locus, and gene-specific rates, defined as variation in the average rate across all branches at each sampled locus. We show that lineage-specific rates lead to erroneous divergence time estimates, regardless of how many loci are sampled. Further, we show that stronger lineage-specific rates lead to increasing error. This contrasts with residual rates and gene-specific rates, where sampling more loci significantly reduces error. If divergence times are inferred in a Bayesian framework, we highlight that error caused by lineage-specific rates significantly reduces the probability that the 95% highest posterior density includes the correct value, and leads to sensitivity to the prior. Use of a more complex rate prior, which has recently been proposed to model rate variation more accurately, does not affect these conclusions. Finally, we show that the scale of lineage-specific rates used in our simulation experiments is comparable to that of an empirical data set for the angiosperm genus Ipomoea. Taken together, our findings demonstrate that lineage-specific rates cause error in divergence time estimates, and that this error is not overcome by analyzing genomic scale multilocus data sets. [Divergence time estimation; error; rate variation.]


2017 ◽  
Author(s):  
Joseph W. Brown ◽  
Stephen A. Smith

Abstract Divergence time estimation, the calibration of a phylogeny to geological time, is an integral first step in modelling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous crown age of angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that, for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors.
These results indicate that there is inadequate information in the data to overrule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudo-data present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modelling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently. [marginal priors; information content; diptych; divergence time estimation; fossil record; BEAST; angiosperms.]
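
The prior/posterior comparison recommended in this abstract can be approximated with a simple overlap statistic. The sketch below is one possible check, not the authors' procedure. It assumes that node-age samples from a prior-only run and from the full analysis are already available as lists of numbers, and the function name `overlap_coefficient` and the sample values are hypothetical. An overlap near 1 suggests that the data added little information beyond the joint prior for that node.

```python
def overlap_coefficient(prior, posterior, bins=20):
    """Shared area of two histograms on a common grid (1 = identical, 0 = disjoint)."""
    lo = min(min(prior), min(posterior))
    hi = max(max(prior), max(posterior))
    if hi == lo:
        return 1.0  # every sample is the same value
    width = (hi - lo) / bins

    def hist(samples):
        counts = [0] * bins
        for s in samples:
            k = min(int((s - lo) / width), bins - 1)  # the maximum falls in the last bin
            counts[k] += 1
        return [c / len(samples) for c in counts]

    return sum(min(p, q) for p, q in zip(hist(prior), hist(posterior)))


# Hypothetical node-age samples in million years.
prior_ages = [140.0, 150.0, 155.0, 160.0, 170.0]
same = overlap_coefficient(prior_ages, list(prior_ages))
shifted = overlap_coefficient(prior_ages, [240.0, 250.0, 255.0, 260.0, 270.0])
```

The result depends on the number of bins and on the sample size, so the statistic is best read as a rough screen that flags nodes for closer visual comparison of the two distributions.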


2020 ◽  
Author(s):  
Tom Carruthers ◽  
Robert W Scotland

Abstract Understanding and representing uncertainty is crucial in academic research, because it enables studies to build on the conclusions of previous studies, leading to robust advances in a particular field. Here, we evaluate the nature of uncertainty and the manner by which it is represented in divergence time estimation, a field that is fundamental to many aspects of macroevolutionary research, and where there is evidence that uncertainty has been seriously underestimated. We address this issue in the context of methods used in divergence time estimation, and with respect to the manner by which time-calibrated phylogenies are interpreted. With respect to methods, we discuss how the assumptions underlying different methods may not adequately reflect uncertainty about molecular evolution, the fossil record, or diversification rates. Therefore, divergence time estimates may not adequately reflect uncertainty, and may be directly contradicted by subsequent findings. For the interpretation of time-calibrated phylogenies, we discuss how the use of time-calibrated phylogenies for reconstructing general evolutionary timescales leads to inferences about macroevolution that are highly sensitive to methodological limitations in how uncertainty is accounted for. By contrast, we discuss how the use of time-calibrated phylogenies to test specific hypotheses leads to inferences about macroevolution that are less sensitive to methodological limitations. Given that many biologists wish to use time-calibrated phylogenies to reconstruct general evolutionary timescales, we conclude that the development of methods of divergence time estimation that adequately account for uncertainty is necessary.


2020 ◽  
Vol 55 (4) ◽  
pp. 520-546
Author(s):  
Chengcai Si ◽  
Keke Chen ◽  
Ruisong Tao ◽  
Chengyong Su ◽  
Junye Ma ◽  
...  

Abstract Parnassius (Lepidoptera: Papilionidae) is a genus of attractive butterflies mainly distributed in the mountainous areas of Central Asia, the Himalayas, and western China. In this study, we used the internal transcribed spacer (ITS1 and ITS2) sequence data as DNA barcodes to characterize the genetic differentiation and conduct phylogenetic analysis and divergence time estimation of the 17 Parnassius species collected in China. Species identification and genetic differentiation analysis suggest that the ITS barcode is an effective marker for Parnassius species identification; additionally, a relatively high level of genetic diversity and low level of gene flow were detected in the five Parnassius species with diverse geographic populations. Phylogenetic analysis indicates that the 17 species studied were clustered in six clades (subgenera), with subgenus Parnassius at the basal position in the phylogenetic trees. Bayesian divergence time estimation shows that the genus originated about 18 million years ago during the early Miocene, correlated with orogenic events in the distribution region, probably southwestern China about 20–10 million years ago. Our estimated phylochronology also suggests that the Parnassius interspecific and intraspecific divergences were probably related to the rapid rise of the Qinghai-Tibet Plateau, the Tibet Movement, the Kunlun-Yellow River Tectonic Movement, and global cooling associated with intensified glaciation in the region during the Quaternary Period.


PLoS ONE ◽  
2011 ◽  
Vol 6 (11) ◽  
pp. e27138 ◽  
Author(s):  
Sebastián Duchêne ◽  
Frederick I. Archer ◽  
Julia Vilstrup ◽  
Susana Caballero ◽  
Phillip A. Morin

2020 ◽  
Author(s):  
Julian F. Quintero-Galvis ◽  
Pablo Saenz-Agudelo ◽  
Juan L. Celis-Diez ◽  
Guillermo C. Amico ◽  
Soledad Vazquez ◽  
...  

Abstract Aim Several geological events affecting Southern South America during the middle Miocene climatic optimum acted as important drivers of diversification of the biota. This is the case of Microbiotheria, for which Dromiciops is considered the sole surviving lineage, the sister group of Eomarsupialia (Australian marsupials). Three main Dromiciops genetic lineages are known, whose divergence was initially attributed to recent Pleistocene glaciations. Using fossil-calibrated dating on nuclear and mitochondrial genes, here we reevaluate this hypothesis and report an older (Miocene) biogeographic history for the genus. Location Southern South America. Methods Phylogenetic reconstruction using sequences from two mitochondrial DNA and four nuclear DNA genes in 159 specimens, from 31 sites across Chile and Argentina. Divergence time estimation using fossil calibration. Results Our phylogenetic analysis resolved four well supported clades with discrete geographic distributions. The oldest and most differentiated clade corresponds to that of the northern distribution (35.2°S to 39.3°S), which would be a different species (D. bozinovici, sensu D’elia et al. 2016). According to our estimations, this species shared a common ancestor with D. gliroides (southern clades) about 13 million years ago (95% CI: 6.4-25.3). The southern clades (39.6°S to 42.0°S) showed a divergence time ranging from 9.57 to 6.5 Mya. Strong genetic structure was detected from north to south but not across the Andes, or between Chiloé island and the mainland. Demographic equilibrium is inferred for the northern clade, and recent demographic expansions were detected in the central and southern clades. Main conclusions The whole diversification of Dromiciops occurred within the Miocene, with the Middle Miocene transgression (MMT), the massive marine flooding that covered several lowlands of the western face of the Andes between 38-48° S, being the most likely diversifying force.
This was the result of an increase in global sea levels due to the Miocene climatic optimum, which shaped the biogeographic origin of several species, including Nothofagus forests, the main habitat of Dromiciops.


2016 ◽  
Author(s):  
Michael Matschiner ◽  
Zuzana Musilová ◽  
Julia M I Barth ◽  
Zuzana Starostová ◽  
Walter Salzburger ◽  
...  

Divergence-time estimation based on molecular phylogenies and the fossil record has provided insights into fundamental questions of evolutionary biology. In Bayesian node dating, phylogenies are commonly time calibrated through the specification of calibration densities on nodes representing clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown and they are therefore often chosen arbitrarily, which directly impacts the reliability of the resulting age estimates. As possible solutions to this problem, two non-exclusive alternative approaches have recently been developed, the "fossilized birth-death" model and "total-evidence dating". While these approaches have been shown to perform well under certain conditions, they require including all (or a random subset) of the fossils of each clade in the analysis, rather than just relying on the oldest fossils of clades. In addition, both approaches assume that fossil records of different clades in the phylogeny are all the product of the same underlying fossil sampling rate, even though this rate has been shown to differ strongly between higher-level taxa. We here develop a flexible new approach to Bayesian node dating that combines advantages of traditional node dating and the fossilized birth-death model. In our new approach, calibration densities are defined on the basis of first fossil occurrences and sampling rate estimates that can be specified separately for all clades. We verify our approach with a large number of simulated datasets, and compare its performance to that of the fossilized birth death model. We find that our approach produces reliable age estimates that are robust to model violation, on par with the fossilized birth-death model. 
By applying our approach to a large dataset including sequence data from over 1000 species of teleost fishes as well as 147 carefully selected fossil constraints, we recover a timeline of teleost diversification that is incompatible with previously assumed vicariant divergences of freshwater fishes. Our results instead provide strong evidence for trans-oceanic dispersal of cichlids and other groups of teleost fishes.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Stephanie J. Spielman ◽  
Molly L. Miraglia

Abstract Background Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. Results We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. Conclusions We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.


2018 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Gabriel Aguirre-Fernández ◽  
Melanie Hopkins ◽  
Tanja Stadler ◽  
Rachel Warnock

Abstract Fossil information is essential for estimating species divergence times, and can be integrated into Bayesian phylogenetic inference using the fossilized birth-death (FBD) process. An important aspect of palaeontological data is the uncertainty surrounding specimen ages, which can be handled in different ways during inference. The most common approach is to fix fossil ages to a point estimate within the known age interval. Alternatively, age uncertainty can be incorporated by using priors, and fossil ages are then directly sampled as part of the inference. This study presents a comparison of alternative approaches for handling fossil age uncertainty in analysis using the FBD process. Based on simulations, we find that fixing fossil ages to the midpoint or a random point drawn from within the stratigraphic age range leads to biases in divergence time estimates, while sampling fossil ages leads to estimates that are similar to inferences that employ the correct ages of fossils. Second, we show a comparison using an empirical dataset of extant and fossil cetaceans, which confirms that different methods of handling fossil age uncertainty lead to large differences in estimated node ages. Stratigraphic age uncertainty should thus not be ignored in divergence time estimation and instead should be incorporated explicitly.

