Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics

2020 ◽  
Vol 37 (6) ◽  
pp. 1819-1831
Author(s):  
Qiqing Tao ◽  
Jose Barba-Montoya ◽  
Louise A Huuki ◽  
Mary Kathleen Durnan ◽  
Sudhir Kumar

Abstract The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
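The claim that simpler models underestimate sequence divergence can be illustrated with two textbook pairwise distance corrections. The sketch below is not the authors' analysis (they compare branch lengths and divergence times on full alignments); it only contrasts the Jukes-Cantor and Kimura two-parameter formulas, and the site-difference proportions are hypothetical values chosen for illustration.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def k2p_distance(P, Q):
    """Kimura two-parameter distance from the proportions of
    transitional (P) and transversional (Q) differences."""
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))

# Hypothetical (P, Q) pairs with a transition bias, ordered by increasing divergence.
pairs = [(0.04, 0.01), (0.08, 0.02), (0.16, 0.04)]

for P, Q in pairs:
    simple = jc69_distance(P + Q)   # the simple model ignores the transition bias
    richer = k2p_distance(P, Q)     # the richer model accounts for it
    print(f"P={P:.2f} Q={Q:.2f}  JC69={simple:.4f}  K2P={richer:.4f}  ratio={simple / richer:.3f}")
```

For these inputs the Jukes-Cantor distance is smaller than the Kimura distance in every row, and the ratio printed in the last column shows how close the two are to a proportional relationship over this range.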

2020 ◽  
Author(s):  
Qiqing Tao ◽  
Jose Barba-Montoya ◽  
Louise A. Huuki ◽  
Mary Kathleen Durnan ◽  
Sudhir Kumar

Abstract The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared to those from simple models is yet to be quantified for contemporary datasets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa using the same tree topologies and calibrations, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the datasets analyzed. We find three fundamental reasons for the observed robustness of time estimates to model complexity in many practical datasets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied, especially for datasets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i884-i894
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract Motivation As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability and implementation All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.


2020 ◽  
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract Motivation As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results We quantified the bias on time estimates that resulted from using the GTR+Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR+Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR+Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR+Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.


2017 ◽  
Author(s):  
Simon Gunkel ◽  
Jes Rust ◽  
Torsten Wappler ◽  
Christoph Mayer ◽  
Oliver Niehuis ◽  
...  

Abstract The application of molecular clock concepts in phylogenetics permits estimating the divergence times of clades with an incomplete fossil record. However, the reliability of this approach is disputed, because the resulting estimates are often inconsistent with different sets of fossils and other parameters (clock models and prior settings) in the analyses. Here, we present the λ statistic, a likelihood approach for a posteriori evaluating the reliability of estimated divergence times. The λ statistic is based on empirically derived fossilization rates and evaluates the fit of estimated divergence times to the fossil record. We tested the performance of this measure with simulated data sets. Furthermore, we applied it to the estimated divergence times of (i) Clavigeritae beetles of the family Staphylinidae and (ii) all extant insect orders. The reanalyzed beetle data supports the originally published results, but shows that several fossil calibrations used do not increase the reliability of the divergence time estimates. Analyses of estimated inter-ordinal insect divergences indicate that uniform priors with soft bounds marginally outperform log-normal priors on node ages. Furthermore, a posteriori evaluation of the original published analysis indicates that several inter-ordinal divergence estimates might be too young. The λ statistic allows the comparative evaluation of any clade divergence estimate derived from different calibration approaches. Consequently, the application of different algorithms, software tools, and calibration schemes can be empirically assessed.


2019 ◽  
Vol 99 (1) ◽  
pp. 105-367 ◽  
Author(s):  
Mao-Qiang He ◽  
Rui-Lin Zhao ◽  
Kevin D. Hyde ◽  
Dominik Begerow ◽  
Martin Kemler ◽  
...  

Abstract The Basidiomycota constitutes a major phylum of the kingdom Fungi and is second in species numbers to the Ascomycota. The present work provides an overview of all validly published, currently used basidiomycete genera to date in a single document. An outline of all genera of Basidiomycota is provided, which includes 1928 currently used genus names, with 1263 synonyms, which are distributed in 241 families, 68 orders, 18 classes and four subphyla. We provide brief notes for each accepted genus including information on classification, number of accepted species, type species, life mode, habitat, distribution, and sequence information. Furthermore, three phylogenetic analyses with combined LSU, SSU, 5.8s, rpb1, rpb2, and ef1 datasets for the subphyla Agaricomycotina, Pucciniomycotina and Ustilaginomycotina are conducted, respectively. Divergence time estimates are provided to the family level with 632 species from 62 orders, 168 families and 605 genera. Our study indicates that the divergence times of the subphyla in Basidiomycota are 406–430 Mya, classes are 211–383 Mya, and orders are 99–323 Mya, which are largely consistent with previous studies. In this study, all phylogenetically supported families were dated, with the families of Agaricomycotina diverging from 27–178 Mya, Pucciniomycotina from 85–222 Mya, and Ustilaginomycotina from 79–177 Mya. Divergence times as an additional criterion in ranking provide additional evidence to resolve taxonomic problems in the Basidiomycota taxonomic system, and also provide a better understanding of their phylogeny and evolution.


Life ◽  
2018 ◽  
Vol 8 (4) ◽  
pp. 49 ◽  
Author(s):  
Renata Capellão ◽  
Elisa Costa-Paiva ◽  
Carlos Schrago

Studies that measured mutation rates in human populations using pedigrees have reported values that differ significantly from rates estimated from the phylogenetic comparison of humans and chimpanzees. Consequently, exchanges between mutation rate values across different timescales lead to conflicting divergence time estimates. It has been argued that this variation of mutation rate estimates across hominoid evolution is in part caused by incorrect assignment of calibration information to the mean coalescent time among loci, instead of the true genetic isolation (speciation) time between humans and chimpanzees. In this study, we investigated the feasibility of estimating the human pedigree mutation rate using phylogenetic data from the genomes of great apes. We found that, when calibration information was correctly assigned to the human–chimpanzee speciation time (and not to the coalescent time), estimates of phylogenetic mutation rates were statistically equivalent to the estimates previously reported using studies of human pedigrees. We conclude that, within the range of biologically realistic ancestral generation times, part of the difference between whole-genome phylogenetic and pedigree mutation rates is due to inappropriate assignment of fossil calibration information to the mean coalescent time instead of the speciation time. Although our results focus on the human–chimpanzee divergence, our findings are general, and relevant to the inference of the timescale of the tree of life.
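The effect of assigning a calibration to the wrong node age can be sketched with simple arithmetic. The example below is an illustration under stated assumptions, not the authors' method: it assumes the standard expectation that two lineages sampled from sister species coalesce on average 2·Ne generations before the speciation time, and every numeric input (divergence, speciation time, ancestral effective population size, generation time) is a hypothetical placeholder rather than a value from the study.

```python
def mean_coalescent_time(t_speciation, ancestral_ne, generation_time):
    """Expected age (years) of the gene-tree node: speciation time plus
    the expected 2*Ne generations of coalescence in the ancestral population."""
    return t_speciation + 2.0 * ancestral_ne * generation_time

def rate_per_year(divergence, node_age):
    """Substitutions per site per year, given pairwise divergence
    accumulated along two lineages since the node."""
    return divergence / (2.0 * node_age)

# Hypothetical illustrative inputs (not taken from the paper).
divergence = 0.012        # substitutions per site between the two genomes
t_speciation = 6.5e6      # calibration age in years
ancestral_ne = 50_000     # ancestral effective population size
generation_time = 25.0    # years per generation

t_coalescent = mean_coalescent_time(t_speciation, ancestral_ne, generation_time)

# Calibration placed on the gene-tree node as if it were the speciation event.
misassigned = rate_per_year(divergence, t_speciation)
# Divergence is spread over the older coalescent time instead.
corrected = rate_per_year(divergence, t_coalescent)

print(f"misassigned rate: {misassigned:.3e} per site per year")
print(f"corrected rate:   {corrected:.3e} per site per year")
```

Because the coalescent time is older than the speciation time, dividing the same divergence by the younger calibration age yields the higher of the two rates in this sketch.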


2018 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Gabriel Aguirre-Fernández ◽  
Melanie Hopkins ◽  
Tanja Stadler ◽  
Rachel Warnock

Abstract Fossil information is essential for estimating species divergence times, and can be integrated into Bayesian phylogenetic inference using the fossilized birth-death (FBD) process. An important aspect of palaeontological data is the uncertainty surrounding specimen ages, which can be handled in different ways during inference. The most common approach is to fix fossil ages to a point estimate within the known age interval. Alternatively, age uncertainty can be incorporated by using priors, and fossil ages are then directly sampled as part of the inference. This study presents a comparison of alternative approaches for handling fossil age uncertainty in analysis using the FBD process. Based on simulations, we find that fixing fossil ages to the midpoint or a random point drawn from within the stratigraphic age range leads to biases in divergence time estimates, while sampling fossil ages leads to estimates that are similar to inferences that employ the correct ages of fossils. Second, we show a comparison using an empirical dataset of extant and fossil cetaceans, which confirms that different methods of handling fossil age uncertainty lead to large differences in estimated node ages. Stratigraphic age uncertainty should thus not be ignored in divergence time estimation and instead should be incorporated explicitly.


2017 ◽  
Author(s):  
Caroline Parins-Fukuchi ◽  
Joseph W. Brown

Abstract Recently, approaches that estimate species divergence times using fossil taxa and models of morphological evolution have exploded in popularity. These methods incorporate diverse biological and geological information to inform posterior reconstructions, and have been applied to several high-profile clades to positive effect. However, there are important examples where morphological data are misleading, resulting in unrealistic age estimates. While several studies have demonstrated that these approaches can be robust and internally consistent, the causes and limitations of these patterns remain unclear. In this study, we dissect signal in Bayesian dating analyses of three mammalian clades. For two of the three examples, we find that morphological characters provide little information regarding divergence times as compared to geological range information, with posterior estimates largely recapitulating those recovered under the prior. However, in the cetacean dataset, we find that morphological data do appreciably inform posterior divergence time estimates. We supplement these empirical analyses with a set of simulations designed to explore the efficiency and limitations of binary and 3-state character data in reconstructing node ages. Our results demonstrate areas of both strength and weakness for morphological clock analyses, and help to outline conditions under which they perform best and, conversely, when they should be eschewed in favour of purely geological approaches.


2004 ◽  
Vol 359 (1450) ◽  
pp. 1485-1494 ◽  
Author(s):  
Susanne S. Renner

Melastomataceae sensu stricto (excluding Memecylaceae) comprise some 3000 species in the neotropics, 1000 in Asia, 240 in Africa, and 230 in Madagascar. Previous family-wide morphological and DNA analyses have shown that the Madagascan species belong to at least three unrelated lineages, which were hypothesized to have arrived by trans-oceanic dispersal. An alternative hypothesis posits that the ancestors of Madagascan, as well as Indian, Melastomataceae arrived from Africa in the Late Cretaceous. This study tests these hypotheses in a Bayesian framework, using three combined sequence datasets analysed under a relaxed clock and simultaneously calibrated with fossils, some not previously used. The new fossil calibration comes from a re-dated possibly Middle or Upper Eocene Brazilian fossil of Melastomeae. Tectonic events were also tentatively used as constraints because of concerns that some of the family's fossils are difficult to assign to nodes in the phylogeny. Regardless of how the data were calibrated, the estimated divergence times of Madagascan and Indian lineages were too young for Cretaceous explanations to hold. This was true even of the oldest ages within the 95% credibility interval around each estimate. Madagascar's Melastomeae appear to have arrived from Africa during the Miocene. Medinilla, with some 70 species in Madagascar and two in Africa, too, arrived during the Miocene, but from Asia. Gravesia, with 100 species in Madagascar and four in east and west Africa, also appears to date to the Miocene, but its monophyly has not been tested. The study afforded an opportunity to compare divergence time estimates obtained earlier with strict clocks and single calibrations, with estimates based on relaxed clocks and different multiple calibrations and taxon sampling.


2019 ◽  
Vol 286 (1902) ◽  
pp. 20190685 ◽  
Author(s):  
Joëlle Barido-Sottani ◽  
Gabriel Aguirre-Fernández ◽  
Melanie J. Hopkins ◽  
Tanja Stadler ◽  
Rachel Warnock

Fossil information is essential for estimating species divergence times, and can be integrated into Bayesian phylogenetic inference using the fossilized birth–death (FBD) process. An important aspect of palaeontological data is the uncertainty surrounding specimen ages, which can be handled in different ways during inference. The most common approach is to fix fossil ages to a point estimate within the known age interval. Alternatively, age uncertainty can be incorporated by using priors, and fossil ages are then directly sampled as part of the inference. This study presents a comparison of alternative approaches for handling fossil age uncertainty in analysis using the FBD process. Based on simulations, we find that fixing fossil ages to the midpoint or a random point drawn from within the stratigraphic age range leads to biases in divergence time estimates, while sampling fossil ages leads to estimates that are similar to inferences that employ the correct ages of fossils. Second, we show a comparison using an empirical dataset of extant and fossil cetaceans, which confirms that different methods of handling fossil age uncertainty lead to large differences in estimated node ages. Stratigraphic age uncertainty should thus not be ignored in divergence time estimation and instead should be incorporated explicitly.
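The three ways of handling a fossil's stratigraphic age range described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the age interval is a made-up example, and the sampling loop targets only the uniform age prior, whereas a real analysis would also include the FBD tree likelihood in the acceptance step.

```python
import math
import random

def fixed_midpoint(min_age, max_age):
    """Fix the fossil age to the midpoint of its stratigraphic interval."""
    return 0.5 * (min_age + max_age)

def fixed_random(min_age, max_age, rng):
    """Fix the fossil age to a single random draw from the interval."""
    return rng.uniform(min_age, max_age)

def uniform_log_prior(age, min_age, max_age):
    """Log density of a uniform prior on the fossil age over the interval."""
    if min_age <= age <= max_age:
        return -math.log(max_age - min_age)
    return float("-inf")

def propose_age(current_age, min_age, max_age, window, rng):
    """One sliding-window update of a sampled fossil age. With a symmetric
    proposal and a uniform prior only, a candidate is accepted whenever it
    stays inside the interval; otherwise the current age is kept."""
    candidate = current_age + rng.uniform(-window, window)
    if uniform_log_prior(candidate, min_age, max_age) == float("-inf"):
        return current_age
    return candidate

rng = random.Random(1)
min_age, max_age = 23.0, 28.1   # hypothetical interval in Myr

print("midpoint:", fixed_midpoint(min_age, max_age))
print("single random draw:", fixed_random(min_age, max_age, rng))

# Sampled age: the value changes across iterations instead of being fixed.
age = fixed_midpoint(min_age, max_age)
trace = []
for _ in range(1000):
    age = propose_age(age, min_age, max_age, window=1.0, rng=rng)
    trace.append(age)
print("range of sampled ages:", min(trace), max(trace))
```

The first two functions correspond to the fixed-age strategies, while the update loop corresponds to treating the fossil age as a parameter that is sampled during inference.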

