The choice of tree prior and molecular clock does not substantially affect phylogenetic inferences of diversification rates

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6334 ◽  
Author(s):  
Brice A.J. Sarver ◽  
Matthew W. Pennell ◽  
Joseph W. Brown ◽  
Sara Keeble ◽  
Kayla M. Hardwick ◽  
...  

Comparative methods allow researchers to make inferences about evolutionary processes and patterns from phylogenetic trees. In Bayesian phylogenetics, estimating a phylogeny requires specifying priors on parameters characterizing the branching process and rates of substitution among lineages, in addition to others. Accordingly, characterizing the effect of prior selection on phylogenies is an active area of research. The choice of priors may systematically bias phylogenetic reconstruction and, subsequently, affect conclusions drawn from the resulting phylogeny. Here, we focus on the impact of priors in Bayesian phylogenetic inference and evaluate how they affect the estimation of parameters in macroevolutionary models of lineage diversification. Specifically, we simulate trees under combinations of tree priors and molecular clocks, simulate sequence data, estimate trees, and estimate diversification parameters (e.g., speciation and extinction rates) from these trees. When substitution rate heterogeneity is large and is not captured by an appropriate choice of relaxed molecular clock, diversification rate estimates deviate substantially from those estimated under the simulation conditions. However, in general, we find that the choice of tree prior and molecular clock has relatively little impact on the estimation of diversification rates insofar as the sequence data are sufficiently informative and substitution rate heterogeneity among lineages is low to moderate.
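
As a rough illustration of the last step in that workflow (estimating a diversification parameter from a tree), the sketch below simulates a pure-birth (Yule) process and recovers the speciation rate by maximum likelihood. It is not the authors' pipeline: it assumes no extinction and a tree known without error, so it skips the sequence simulation, tree prior, and molecular clock stages entirely. The function names and the chosen rate of 0.5 are hypothetical.

```python
import random

def simulate_yule_waiting_times(n_tips, birth_rate, rng):
    """Return (k, t_k) pairs: the time spent with k lineages, for k = 2 .. n_tips - 1.

    Assumption: pure-birth process started from the two lineages at the root
    and stopped at the speciation event that creates the n_tips-th lineage.
    """
    intervals = []
    for k in range(2, n_tips):
        # With k lineages, each splitting at rate birth_rate, the time to the
        # next split is exponential with rate k * birth_rate.
        intervals.append((k, rng.expovariate(k * birth_rate)))
    return intervals

def estimate_birth_rate(intervals):
    """Maximum-likelihood estimate: speciation events / total branch length."""
    events = len(intervals)
    total_branch_length = sum(k * t for k, t in intervals)
    return events / total_branch_length

rng = random.Random(1)
true_rate = 0.5  # hypothetical speciation rate used only for this illustration
estimates = [
    estimate_birth_rate(simulate_yule_waiting_times(200, true_rate, rng))
    for _ in range(200)
]
mean_estimate = sum(estimates) / len(estimates)
print(f"true rate = {true_rate}, mean estimate over 200 trees = {mean_estimate:.3f}")
```

In a study like the one summarized above, the tree passed to the estimator would itself be inferred from sequence data, which is where the choice of tree prior and clock model could introduce error.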

2018 ◽  
Author(s):  
Brice A. J. Sarver ◽  
Matthew W. Pennell ◽  
Joseph W. Brown ◽  
Sara Keeble ◽  
Kayla M. Hardwick ◽  
...  

Comparative methods allow researchers to make inferences about evolutionary processes and patterns from phylogenetic trees. In Bayesian phylogenetics, estimating a phylogeny requires specifying priors on parameters characterizing the branching process and rates of substitution among lineages, in addition to others. However, the effect that the selection of these priors has on the inference of comparative parameters has not been thoroughly investigated. Such uncertainty may systematically bias phylogenetic reconstruction and, subsequently, parameter estimation. Here, we focus on the impact of priors in Bayesian phylogenetic inference and evaluate how they affect the estimation of parameters in macroevolutionary models of lineage diversification. Specifically, we use BEAST to simulate trees under combinations of tree priors and molecular clocks, simulate sequence data, estimate trees, and estimate diversification parameters (e.g., speciation rates and extinction rates) from these trees. When substitution rate heterogeneity is large and is not captured by an appropriate choice of relaxed molecular clock, parameter estimates deviate substantially from those estimated under the simulation conditions. However, in general, we find that the choice of tree prior and molecular clock has relatively little impact on the estimation of diversification rates insofar as the sequence data are sufficiently informative and substitution rate heterogeneity among lineages is low to moderate.


2016 ◽  
Vol 371 (1699) ◽  
pp. 20160098 ◽  
Author(s):  
Kenneth De Baets ◽  
Alexandre Antonelli ◽  
Philip C. J. Donoghue

Evolutionary timescales have mainly used fossils for calibrating molecular clocks, though fossils only really provide minimum clade age constraints. In their place, phylogenetic trees can be calibrated by precisely dated geological events that have shaped biogeography. However, tectonic episodes are protracted, their role in vicariance is rarely justified, the biogeography of living clades and their antecedents may differ, and the impact of such events is contingent on ecology. Biogeographic calibrations are no panacea for the shortcomings of fossil calibrations, but their associated uncertainties can be accommodated. We provide examples of how biogeographic calibrations based on geological data can be established for the fragmentation of the Pangaean supercontinent: (i) for the uplift of the Isthmus of Panama, (ii) the separation of New Zealand from Gondwana, and (iii) for the opening of the Atlantic Ocean. Biogeographic and fossil calibrations are complementary, not competing, approaches to constraining molecular clock analyses, providing alternative constraints on the age of clades that are vital to avoiding circularity in investigating the role of biogeographic mechanisms in shaping modern biodiversity. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.


2021 ◽  
Vol 12 ◽  
Author(s):  
Na Su ◽  
Bin-bin Liu ◽  
Jun-ru Wang ◽  
Ru-chang Tong ◽  
Chen Ren ◽  
...  

The recognition, identification, and differentiation of closely related plant species present significant and notorious challenges to taxonomists. The Maddenia group of Prunus, which comprises four to seven species, is an example of a group in which species delimitation and phylogenetic reconstruction have been difficult, due to the lack of clear morphological distinctions, limited sampling, and low informativeness of molecular evidence. Thus, the precise number of species in the group and the relationships among them remain unclear. Here, we used genome skimming to generate the DNA sequence data for 22 samples, including 17 Maddenia individuals and five outgroups in Amygdaloideae of Rosaceae, from which we assembled the plastome and 446 single-copy nuclear (SCN) genes for each sample. The phylogenetic relationships of the Maddenia group were then reconstructed using both concatenated and coalescent-based methods. We also identified eight highly variable regions and detected simple sequence repeats (SSRs) and repeat sequences in the Maddenia species plastomes. The phylogenetic analysis based on the complete plastomes strongly supported three main subclades in the Maddenia group of Prunus, while five subclades were recognized based on the nuclear tree. The phylogenetic network analysis detected six hybridization events. Integrating the nuclear and morphological evidence, we propose recognizing five species within the Maddenia group, i.e., Prunus fujianensis, P. himalayana, P. gongshanensis, P. hypoleuca, and P. hypoxantha. Within this group, the first three species are well supported, while the gene flow occurring throughout the Maddenia group seems to be especially frequent between P. hypoleuca and P. hypoxantha, eroding the barrier between them. The phylogenetic trees based on eight concatenated hypervariable regions had a topology similar to that obtained from the complete plastomes, showing their potential as molecular markers and effective barcodes for further phylogeographic studies on Maddenia.


Author(s):  
Carolina A Martinez-Gutierrez ◽  
Frank O Aylward

Reconstruction of the Tree of Life is a central goal in biology. Although numerous novel phyla of bacteria and archaea have recently been discovered, inconsistent phylogenetic relationships are routinely reported, and many inter-phylum and inter-domain evolutionary relationships remain unclear. Here, we benchmark different marker genes often used in constructing multidomain phylogenetic trees of bacteria and archaea and present a set of marker genes that perform best for multidomain trees constructed from concatenated alignments. We use recently developed Tree Certainty metrics to assess the confidence of our results and to obviate the complications of traditional bootstrap-based metrics. Given the vastly disparate number of genomes available for different phyla of bacteria and archaea, we also assessed the impact of taxon sampling on multidomain tree construction. Our results demonstrate that biases in the representation of different taxonomic groups can dramatically impact the topology of resulting trees. Inspection of our highest-quality tree supports the division of most bacteria into Terrabacteria and Gracilicutes, with Thermotogota and Synergistota branching earlier from these superphyla. This tree also supports the inclusion of the Patescibacteria within the Terrabacteria as a sister group to the Chloroflexota instead of as a basal-branching lineage. For the Archaea, our tree supports three monophyletic lineages (DPANN, Euryarchaeota, and TACK/Asgard), although we note the basal placement of the DPANN may still represent an artifact caused by biased sequence composition. Our findings provide a robust and standardized framework for multidomain phylogenetic reconstruction that can be used to evaluate inter-phylum relationships and assess uncertainty in conflicting topologies of the Tree of Life.


2021 ◽  
Author(s):  
Jeremy M Beaulieu ◽  
Brian C O'Meara

There is a prevailing view that the inclusion of fossil data could remedy identifiability issues related to models of diversification, by drastically reducing the number of congruent models. The fossilized birth-death (FBD) model is an appealing way of directly incorporating fossil information when estimating diversification rates. Here we explore the benefits of including fossils by implementing and then testing two types of FBD models in more complex likelihood-based models that assume multiple rate classes across the tree. We also assess the impact of severely undersampling, or even excluding, fossils that represent samples of lineages that also had sampled descendants (i.e., k-type fossils), as well as converting a fossil set to represent stratigraphic ranges. Under various simulation scenarios, including a scenario that exists far outside the set of models we evaluated, including fossils rarely outperforms analyses that exclude them altogether. At best, the inclusion of fossils improves precision but does not influence bias. We also found that severely undercounting the number of k-type fossils produces highly inflated rates of turnover and extinction fraction. Similarly, we found that converting the fossil set to stratigraphic ranges results in turnover rates and extinction fraction estimates that are generally underestimated. While fossils remain essential for understanding diversification through time, in the specific case of understanding diversification given an existing, largely modern tree, they are not especially beneficial.


2015 ◽  
Author(s):  
Juan Alfredo Holley ◽  
Néstor Guillermo Basso ◽  
Juliana Sterli

Background. The clade Chelidae (Testudines, Pleurodira) is a group of freshwater turtles with representatives in Australasia and South America. Its diversity of extant and fossil species is characterized by two recognized morphotypes: the long-necked and the short-necked chelids. So far, the phylogenies constructed for Chelidae differ depending on the information source. While morphology recovers one monophyletic group of long-necked chelids (with South American and Australasian species), the molecular data split the group into South American and Australasian chelids, both as monophyletic sister groups and both containing long-necked species. This conflict implies that long-necked chelids emerged either (i) once, before the final breakup of Southern Gondwana (≅ 35 Mya), or (ii) independently after this event. Methods. Using BEAST, a set of molecular clock analyses was performed. Seven of these analyses correspond to the molecular hypothesis and thirteen to the morphological hypothesis. Ten fossils were used as calibration points in different combinations for each hypothesis. The results were compared statistically using ANOVA, and the global similarity was inspected by a hierarchical cluster analysis (HCA). Results. Molecular hypothesis: all the analyses produced an age for the origin of Chelidae, and for the emergence of the long neck, older than 35 My. Divergence times in the South American clade were generally older than those observed in the Australasian clade. In the HCA, analyses 2, 4 and 5 form one group and analyses 3, 6 and 7 form another; analysis 1 is closely related to the latter. Morphological hypothesis: the origin of the clade of long-necked chelids predated 35 Mya in all the analyses except one; however, the Chelodina group was younger than this age in all the analyses. The HCA yielded two main groups of molecular clock analyses (1, 3, 7, 8, 9, 13 and 2, 4, 6, 10, 11, 12) and one analysis (5) clearly separated from these two. The ANOVA showed significant differences for all estimated nodes in both phylogenetic hypotheses. Discussion. Our set of molecular clock analyses suggests an early diversification of the chelid turtles and the emergence of the long-necked chelids before the final breakup of Southern Gondwana. However, whether this trait appeared once or through evolutionary convergence still depends on which phylogenetic scenario is taken into account. Furthermore, our results indicate that increasing the number of calibration points does not necessarily improve the precision of estimated nodes. Instead, the “quality” of the fossils used as calibrations and their position in the phylogeny have an appreciable impact not only on this parameter but also on the global evolutionary rate along the tree.


2004 ◽  
Vol 10 (2) ◽  
pp. 157-166 ◽  
Author(s):  
George I. Hagstrom ◽  
Dehua H. Hang ◽  
Charles Ofria ◽  
Eric Torng

Phylogenetic trees group organisms by their ancestral relationships. There are a number of distinct algorithms used to reconstruct these trees from molecular sequence data, but different methods sometimes give conflicting results. Since there are few precisely known phylogenies, simulations are typically used to test the quality of reconstruction algorithms. These simulations randomly evolve strings of symbols to produce a tree, and then the algorithms are run with the tree leaves as inputs. Here we use Avida to test two widely used reconstruction methods, which gives us the chance to observe the effect of natural selection on tree reconstruction. We find that if the organisms undergo natural selection between branch points, the methods will be successful even on very large time scales. However, these algorithms often falter when selection is absent.
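
The conventional simulation setup mentioned above (randomly evolving strings along a known tree, then handing the leaves to a reconstruction method) can be sketched as follows. This is a neutral, selection-free toy and does not use Avida; the four-leaf topology ((A,B),(C,D)), the per-branch substitution probability of 0.05, and the use of raw Hamming distance as a stand-in for a reconstruction method are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def mutate(seq, p, rng):
    """Copy seq, replacing each site with a different base with probability p."""
    out = []
    for base in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def hamming(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)
root = "".join(rng.choice(BASES) for _ in range(2000))

# Known (true) tree: ((A,B),(C,D)). Each branch applies one round of mutation.
left = mutate(root, 0.05, rng)
right = mutate(root, 0.05, rng)
leaves = {
    "A": mutate(left, 0.05, rng),
    "B": mutate(left, 0.05, rng),
    "C": mutate(right, 0.05, rng),
    "D": mutate(right, 0.05, rng),
}

# Pairwise distances between leaves; a distance-based method would start here.
dist = {
    (x, y): hamming(leaves[x], leaves[y])
    for x in leaves for y in leaves if x < y
}
closest_pair = min(dist, key=dist.get)
print("closest pair:", closest_pair)
```

Because the true tree is known, the recovered grouping can be checked directly against it, which is the point of using simulations when few real phylogenies are known precisely.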


Author(s):  
Hesam Montazeri ◽  
Susan Little ◽  
Mozhgan Mozaffarilegha ◽  
Niko Beerenwinkel ◽  
Victor DeGruttola

Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show that the proposed method outperforms two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and average number of outbound edges or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.


2013 ◽  
Vol 368 (1614) ◽  
pp. 20120198 ◽  
Author(s):  
Tanja Stadler ◽  
Sebastian Bonhoeffer

Host population structure has a major influence on epidemiological dynamics. However, in particular for sexually transmitted diseases, quantitative data on population contact structure are hard to obtain. Here, we introduce a new method that quantifies host population structure based on phylogenetic trees, which are obtained from pathogen genetic sequence data. Our method is based on a maximum-likelihood framework and uses a multi-type branching process, under which each host is assigned to a type (subpopulation). In a simulation study, we show that our method produces accurate parameter estimates for phylogenetic trees in which each tip is assigned to a type, as well as for phylogenetic trees in which the type of the tip is unknown. We apply the method to a Latvian HIV-1 dataset, quantifying the impact of the intravenous drug user epidemic on the heterosexual epidemic (known tip states), and identifying superspreader dynamics within the men-having-sex-with-men epidemic (unknown tip states).
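
To make the idea of a multi-type branching process concrete, the sketch below simulates only the sequence of transmission events in a two-type population, where each host belongs to one type and infects new hosts of either type at type-specific rates. It is not the maximum-likelihood method described above: it ignores removal, sampling, and event times, and the rate matrix, function name, and population size are hypothetical values chosen for illustration.

```python
import random

def simulate_two_type_births(rates, n_total, rng):
    """Simulate who-infects-whom by type until n_total hosts exist.

    rates[i][j] is the (assumed) rate at which one type-i host infects a new
    type-j host. Only the order of events is simulated, not their timing.
    """
    counts = [1, 0]                    # start from a single type-0 host
    transmissions = [[0, 0], [0, 0]]   # transmissions[donor][recipient]
    while sum(counts) < n_total:
        # Total rate of each donor-type/recipient-type event, in the order
        # (0,0), (0,1), (1,0), (1,1).
        weights = [counts[i] * rates[i][j] for i in range(2) for j in range(2)]
        event = rng.choices(range(4), weights=weights)[0]
        donor, recipient = divmod(event, 2)
        counts[recipient] += 1
        transmissions[donor][recipient] += 1
    return counts, transmissions

rng = random.Random(7)
# Hypothetical rates: mostly within-type transmission, weak mixing between types.
rates = [[1.0, 0.1], [0.1, 1.0]]
counts, transmissions = simulate_two_type_births(rates, 500, rng)
print("hosts per type:", counts)
print("transmissions by donor and recipient type:", transmissions)
```

An inference method of the kind summarized above would work in the opposite direction, estimating rates such as these from a phylogeny whose tips may or may not carry known types.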

