Multispecies Coalescent Analysis of the Early Diversification of Neotropical Primates: Phylogenetic Inference under Strong Gene Trees/Species Tree Conflict

2014, Vol 6 (11), pp. 3105-3114
Author(s): C. G. Schrago, A. N. Menezes, C. Furtado, C. R. Bonvicino, H. N. Seuanez


2022
Author(s): XiaoXu Pang, Da-Yong Zhang

The species studied in any evolutionary investigation generally constitute a very small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has been rarely studied and is thus poorly understood. In this study, we use mathematical analysis and simulations to examine the robustness of species tree methods based on a multispecies coalescent model under gene flow sourcing from an extant or ghost lineage. We found that very low levels of extant or ghost introgression can result in anomalous gene trees (AGTs) on three-taxon rooted trees if accompanied by strong incomplete lineage sorting (ILS). In contrast, even massive introgression, with more than half of the recipient genome descending from the donor lineage, may not necessarily lead to AGTs. In cases involving an ingroup lineage (defined as one that diverged no earlier than the most basal species under investigation) acting as the donor of introgression, the time of root divergence among the investigated species was either underestimated or remained unaffected, but for the cases of outgroup ghost lineages acting as donors, the divergence time was generally overestimated. Under many conditions of ingroup introgression, the stronger the ILS was, the higher was the accuracy of estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
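The interplay between introgression and ILS described above can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a simple two-parent-tree mixture for an ingroup donor, in which a fraction `gamma` of loci in B trace back through C's lineage. The names `gamma`, `t_ab`, and `t_bc` are hypothetical; the two times are the intervals (in coalescent units) during which A and B, or B and C, can coalesce alone before the root. Only the standard three-taxon coalescent probabilities are used.

```python
import math


def three_taxon_probs(t_internal):
    """Rooted three-taxon MSC with internal branch length t_internal
    (coalescent units): return (P(matching tree), P(each other tree))."""
    p_other = math.exp(-t_internal) / 3.0
    return 1.0 - 2.0 * p_other, p_other


def mixture_probs(gamma, t_ab, t_bc):
    """Gene tree probabilities under an assumed two-parent-tree mixture.

    With probability 1 - gamma a locus follows the species tree ((A,B),C);
    with probability gamma it follows the introgression tree (A,(B,C)).
    Returns (P[((A,B),C)], P[(A,(B,C))], P[((A,C),B)]).
    """
    match_ab, other_ab = three_taxon_probs(t_ab)
    match_bc, other_bc = three_taxon_probs(t_bc)
    p_ab = (1.0 - gamma) * match_ab + gamma * other_bc
    p_bc = (1.0 - gamma) * other_ab + gamma * match_bc
    p_ac = (1.0 - gamma) * other_ab + gamma * other_bc
    return p_ab, p_bc, p_ac


def is_anomalous(gamma, t_ab, t_bc):
    """True if the introgression topology is more probable than the
    topology matching the species tree."""
    p_ab, p_bc, _ = mixture_probs(gamma, t_ab, t_bc)
    return p_bc > p_ab


# Strong ILS (very short internal branch) plus 2% introgression.
print(is_anomalous(0.02, 0.01, 1.0))
# Same tree without introgression.
print(is_anomalous(0.0, 0.01, 1.0))
```

Under these assumptions, a short internal branch leaves the three topologies nearly equiprobable, so even a small `gamma` can tip the balance toward the introgression topology. Ghost donors would need a different parameterization than this sketch provides.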


2017
Author(s): Timothy D. Swain

Abstract: The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea.

Resumen (Spanish-language translation by Lisbeth O. Swain, DePaul University, Chicago, Illinois, 60604, USA; rendered here in English): Although the recent, rapid proliferation of taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool for species discovery, there has been no corresponding growth in our knowledge of phylogeny. This disparity is caused by the trade-off between the capabilities of automated deoxyribonucleic acid (DNA) sequence alignment and the information content of the genes applied to phylogenetic inference in this group. Conserved gene regions or segments are easily aligned across the order, but they produce poorly resolved gene trees; hypervariable gene regions or segments contain the evolutionary signal needed to support robust and complete phylogenetic trees, but they produce daunting sequence alignments. Staggered sequence alignments are a form of phylogeny-informed alignment composed of a mosaic of local and universal regions, which allows phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved genes or segments. Comparisons between species tree phylogenies inferred from staggered-alignment data and from hypervariable-excluded data (standard alignment) show improved confidence and greater topological agreement with other data sources for the trees built from the more complete data. This new staggered-alignment phylogeny is among the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for testing evolutionary hypotheses in the Zoanthidea.


2019
Author(s): Xiaodong Jian, Scott V. Edwards, Liang Liu

Abstract: A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests is here developed and applied to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically concordant gene trees suggest that a poor fit of substitution models (44% of loci rejecting the substitution model) and concatenation models (38% of loci rejecting the hypothesis of topologically congruent gene trees) is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across 6 major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than the proportions rejecting the substitution and concatenation models, and Bayesian model comparison strongly favors the MSC over concatenation across all data sets. Species tree inference suggests that loci rejecting the MSC have little effect on species tree estimation. Due to computational constraints, the Bayesian model validation and comparison analyses were conducted on reduced data sets. A complete analysis of phylogenomic data requires the development of efficient algorithms for phylogenetic inference. Nevertheless, the concatenation assumption of congruent gene trees rarely holds for phylogenomic data with more than 10 loci. Thus, for large phylogenomic data sets, model comparison analyses are expected to consistently and more strongly favor the coalescent model over the concatenation model. Our analysis reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference.


2018
Author(s): Rafael F. Guerrero, Matthew W. Hahn

Abstract: Convergent evolution is often inferred when a trait is incongruent with the species tree. However, trait incongruence can also arise from changes that occur on discordant gene trees, a process referred to as hemiplasy. Hemiplasy is rarely taken into account in studies of convergent evolution, despite the fact that phylogenomic studies have revealed rampant discordance. Here, we study the relative probabilities of homoplasy (including convergence and reversal) and hemiplasy for an incongruent trait. We derive expressions for the probabilities of the two events, showing that they depend on many of the same parameters. We find that hemiplasy is as likely as, or more likely than, homoplasy for a wide range of conditions, even when levels of discordance are low. We also present a new method to calculate the ratio of these two probabilities (the “hemiplasy risk factor”) along the branches of a phylogeny of arbitrary length. Such calculations can be applied to any tree in order to identify when and where incongruent traits may be more likely to be due to hemiplasy than homoplasy.


2019
Author(s): Yaxuan Wang, Huw A. Ogilvie, Luay Nakhleh

Abstract: Species tree inference from multi-locus data has emerged as a powerful paradigm in the post-genomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this paper, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
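To give a sense of how much a constraint tree can shrink the gene tree space, the following sketch counts the rooted binary resolutions of an unresolved tree. It is an illustration only, not the authors' sampler. The nested-tuple tree encoding and the function names are assumptions made here. The count relies on the standard result that a node with k children can be resolved in (2k-3)!! ways, independently of the other nodes.

```python
def num_rooted_binary_trees(k):
    """(2k-3)!!: number of rooted binary topologies on k labeled leaves."""
    result = 1
    for m in range(3, 2 * k - 2, 2):
        result *= m
    return result


def count_leaves(tree):
    """Count the leaves of a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def count_resolutions(tree):
    """Number of fully resolved rooted trees compatible with `tree`.

    Each internal node with k children contributes a factor (2k-3)!!,
    and the factors multiply across nodes.
    """
    if isinstance(tree, str):
        return 1
    total = num_rooted_binary_trees(len(tree))
    for child in tree:
        total *= count_resolutions(child)
    return total


# Hypothetical constraint for one locus: two clades are known, but the
# order inside (A,B,C) and the branching at the root are unresolved.
constraint = (("A", "B", "C"), ("D", "E"), "F")
print(count_resolutions(constraint))                      # 9
print(num_rooted_binary_trees(count_leaves(constraint)))  # 945
```

For this six-leaf example the constraint leaves 9 candidate topologies out of 945 unconstrained ones, which is the kind of reduction a restricted sampler could exploit.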


2020
Author(s): Patrick F. McKenzie, Deren A. R. Eaton

Abstract: A key distinction between species tree inference under the multi-species coalescent model (MSC), and the inference of gene trees in sliding windows along a genome, is in the effect of genetic linkage. Whereas the MSC explicitly assumes genealogies to be unlinked, i.e., statistically independent, genealogies located close together on genomes are spatially auto-correlated. Here we use tree sequence simulations with recombination to explore the effects of species tree parameters on spatial patterns of linkage among genealogies. We decompose coalescent time units to demonstrate differential effects of generation time and effective population size on spatial coalescent patterns, and we define a new metric, “phylogenetic linkage,” for measuring the rate of decay of phylogenetic similarity by comparison to distances among unlinked genealogies. Finally, we provide a simple example where accounting for phylogenetic linkage in sliding window analyses improves local gene tree inference.
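The decomposition of coalescent time units mentioned above can be sketched as follows. This assumes the common diploid convention that one coalescent unit equals 2Ne generations; the function and parameter names are hypothetical and not taken from the paper.

```python
import math


def num_generations(years, generation_time):
    """Generations elapsed along a branch spanning `years`."""
    return years / generation_time


def coalescent_units(years, generation_time, ne):
    """Branch length in coalescent units, assuming a diploid population
    in which one coalescent unit is 2 * ne generations."""
    return num_generations(years, generation_time) / (2.0 * ne)


# Two hypothetical scenarios for the same 1-million-year branch.
short_gen = coalescent_units(1e6, generation_time=5, ne=1e5)
long_gen = coalescent_units(1e6, generation_time=10, ne=5e4)

print(math.isclose(short_gen, long_gen))  # True: both equal 1.0
print(num_generations(1e6, 5), num_generations(1e6, 10))
```

Both scenarios give the same branch length in coalescent units, yet they differ twofold in the number of generations. Anything that accrues per generation, such as recombination, therefore differs between them even though the expected amount of ILS does not.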


2020, Vol 36 (18), pp. 4819-4821
Author(s): Anastasiia Kim, James H Degnan

Abstract
Summary: PRANC computes the Probabilities of RANked gene tree topologies under the multispecies coalescent. A ranked gene tree is a gene tree accounting for the temporal ordering of internal nodes. PRANC can also estimate the maximum likelihood (ML) species tree from a sample of ranked or unranked gene tree topologies. It estimates the ML tree with estimated branch lengths in coalescent units.
Availability and implementation: PRANC is written in C++ and freely available at github.com/anastasiiakim/PRANC.
Supplementary information: Supplementary data are available at Bioinformatics online.


2020, Vol 37 (6), pp. 1809-1818
Author(s): Yaxuan Wang, Huw A Ogilvie, Luay Nakhleh

Abstract: Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.


2019, Vol 37 (5), pp. 1480-1494
Author(s): Anastasiia Kim, Noah A Rosenberg, James H Degnan

Abstract: A labeled gene tree topology that is more probable than the labeled gene tree topology matching a species tree is called “anomalous.” Species trees that can generate such anomalous gene trees are said to be in the “anomaly zone.” Here, probabilities of “unranked” and “ranked” gene tree topologies under the multispecies coalescent are considered. A ranked tree depicts not only the topological relationship among gene lineages, as an unranked tree does, but also the sequence in which the lineages coalesce. In this article, we study how the parameters of a species tree simulated under a constant-rate birth–death process can affect the probability that the species tree lies in the anomaly zone. We find that with more than five taxa, it is possible for species trees to have both anomalous unranked and ranked gene trees. The probability of being in either type of anomaly zone increases with more taxa. The probability of anomalous gene trees also increases with higher speciation rates. We observe that the probabilities of unranked anomaly zones are higher and grow much faster than those of ranked anomaly zones as the speciation rate increases. Our simulation shows that the most probable ranked gene tree is likely to have the same unranked topology as the species tree. We design the software PRANC, which computes probabilities of ranked gene tree topologies given a species tree under the coalescent model.
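The distinction between ranked and unranked topologies can be made concrete by counting them. The sketch below uses two standard combinatorial formulas and is independent of PRANC; the function names are chosen here for illustration. For n labeled taxa there are (2n-3)!! unranked rooted binary topologies and n!(n-1)!/2^(n-1) ranked ones.

```python
import math


def num_unranked_topologies(n):
    """(2n-3)!!: rooted binary labeled topologies on n taxa."""
    result = 1
    for m in range(3, 2 * n - 2, 2):
        result *= m
    return result


def num_ranked_topologies(n):
    """n!(n-1)!/2^(n-1): rooted binary labeled topologies that also
    record the order in which the internal nodes (coalescences) occur."""
    return math.factorial(n) * math.factorial(n - 1) // 2 ** (n - 1)


for n in range(3, 7):
    print(n, num_unranked_topologies(n), num_ranked_topologies(n))
```

With three taxa the two counts coincide (3 each), because a three-taxon tree admits only one ordering of its internal nodes. From four taxa onward the ranked count is larger (18 versus 15), since symmetric shapes admit more than one ordering.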

