Evaluation of the multispecies coalescent method to explore intra-Trypanosoma cruzi I relationships and genetic diversity

Parasitology ◽  
2019 ◽  
Vol 146 (8) ◽  
pp. 1063-1074 ◽  
Author(s):  
César Gómez-Hernández ◽  
Sergio D. Pérez ◽  
Karine Rezende-Oliveira ◽  
Cecilia G. Barbosa ◽  
Eliane Lages-Silva ◽  
...  

Abstract Chagas disease is a zoonosis caused by the parasite Trypanosoma cruzi. Several high-resolution markers have subdivided the T. cruzi taxon into at least seven lineages or Discrete Typing Units (DTUs) (TcI-TcVI and TcBat). Trypanosoma cruzi I is the most diverse and geographically widespread DTU. Recently a TcI genotype related to domestic cycles was proposed and named TcIDOM. Herein, we combined traditional markers and housekeeping genes and applied a multispecies coalescent method to explore intra-TcI relationships, lineage boundaries and genetic diversity in a random set of isolates and DNA sequences retrieved from GenBank from different countries in the Americas. We found further evidence supporting TcIDOM as an independent and emerging genotype of TcI, at least in Colombia and Venezuela. We also found evidence of high phylogenetic incongruence between the parasite's gene trees (including introgression) and embedded species trees, and a lack of genetic structure by geography and host, illustrating the complex dynamics and epidemiology of TcI across the Americas. These findings provide novel insights into T. cruzi systematics and epidemiology and support the need to assess parasite diversity and lineage boundaries through hypothesis testing using approaches different from those traditionally employed, including the Bayesian multispecies coalescent method.

2022 ◽  
Author(s):  
XiaoXu Pang ◽  
Da-Yong Zhang

The species studied in any evolutionary investigation generally constitute a very small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has been rarely studied and is thus poorly understood. In this study, we use mathematical analysis and simulations to examine the robustness of species tree methods based on a multispecies coalescent model under gene flow sourcing from an extant or ghost lineage. We found that very low levels of extant or ghost introgression can result in anomalous gene trees (AGTs) on three-taxon rooted trees if accompanied by strong incomplete lineage sorting (ILS). In contrast, even massive introgression, with more than half of the recipient genome descending from the donor lineage, may not necessarily lead to AGTs. In cases involving an ingroup lineage (defined as one that diverged no earlier than the most basal species under investigation) acting as the donor of introgression, the time of root divergence among the investigated species was either underestimated or remained unaffected, but for the cases of outgroup ghost lineages acting as donors, the divergence time was generally overestimated. Under many conditions of ingroup introgression, the stronger the ILS was, the higher was the accuracy of estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.


Author(s):  
Jordan Douglas

Abstract Summary Visualization is a vital task in phylogenetics and yet there is a deficit in programs which visualize the multispecies coalescent (MSC) model. UglyTrees (UT) is an easy-to-use program for visualizing multiple gene trees embedded within a single species tree. The mapping between gene and species nodes is automatically detected allowing for ready access to the program. UT can scrape the contents of a website for MSC analyses, enabling the sharing of interactive MSC figures through optional parameters in the URL. If a posterior distribution is uploaded, the transitions between MSC states are animated allowing the visual tracking of trees throughout the sequence. Availability and implementation UT runs in all major web browsers including mobile devices, and is hosted at www.uglytrees.nz. The MIT-licensed code is available at https://github.com/UglyTrees/uglytrees.github.io.


2019 ◽  
Vol 37 (5) ◽  
pp. 1480-1494 ◽  
Author(s):  
Anastasiia Kim ◽  
Noah A Rosenberg ◽  
James H Degnan

Abstract A labeled gene tree topology that is more probable than the labeled gene tree topology matching a species tree is called “anomalous.” Species trees that can generate such anomalous gene trees are said to be in the “anomaly zone.” Here, probabilities of “unranked” and “ranked” gene tree topologies under the multispecies coalescent are considered. A ranked tree depicts not only the topological relationship among gene lineages, as an unranked tree does, but also the sequence in which the lineages coalesce. In this article, we study how the parameters of a species tree simulated under a constant-rate birth–death process can affect the probability that the species tree lies in the anomaly zone. We find that with more than five taxa, it is possible for species trees to have both anomalous unranked and ranked gene trees. The probability of being in either type of anomaly zone increases with more taxa. The probability of anomalous gene trees also increases with higher speciation rates. We observe that the probabilities of unranked anomaly zones are higher and grow much faster than those of ranked anomaly zones as the speciation rate increases. Our simulation shows that the most probable ranked gene tree is likely to have the same unranked topology as the species tree. We design the software PRANC, which computes probabilities of ranked gene tree topologies given a species tree under the coalescent model.


2021 ◽  
Author(s):  
Mark Hershkovitz

Phylogenetic analysis of combined ribosomal DNA internal transcribed spacer (ITS) and chloroplast DNA rpl32-trnL intergenic spacer sequences greatly improves phylogenetic resolution of Chaetanthera Ruiz & Pav. and Oriastrum Poepp. & Endl. (Asteraceae; Mutisieae) over a previously published phylogeny based on ITS alone. The results support segregation of Chaetanthera subg. Liniphyllum Less. from C. subg. Chaetanthera. One sample with peculiar ITS and rpl32-trnL sequences may be of extraterrestrial origin. Fifteen of 16 nominal species sampled more than once for both loci were polymorphic for at least one of them, and only half of the polymorphic samples were demonstrably monophyletic in the combined data analysis. An additional five species sampled only for ITS all were polymorphic. These results underscore the ontological difference between gene trees and species trees and further discredit the notion of “species barcodes.” The gene trees for both loci manifest departures from all evolutionary models implemented for phylogenetic reconstruction. This result is explained as a consequence of evolutionary idiosyncraticity, in turn a function of the determinacy of biological organisms and processes consequent to autopoiesis. This determinacy implicates a chaotic evolutionary function that theoretically cannot be reconstructed or predicted by stochastic models. However, because phylogenetic history and clades are materially tangible entities, their reconstruction is within the realm of scientific inquiry. I discuss the phylogeny of Chaetanthera/Oriastrum in this epistemological framework.


2020 ◽  
Author(s):  
Qiuyi Li ◽  
Celine Scornavacca ◽  
Nicolas Galtier ◽  
Yao-Ban Chan

Abstract Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T) and loss (L). These processes are usually modelled independently, but in reality, ILS can affect gene copy number polymorphism, i.e., interfere with DTL. This has been previously recognised, but not treated in a satisfactory way, mainly because DTL events are naturally modelled forward-in-time, while ILS is naturally modelled backwards-in-time with the coalescent. Here we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realised rate of D, T and L becomes non-homogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent (MLMSC), which also accounts for any level of linkage between loci, generalises the multispecies coalescent model and offers a versatile, powerful framework for proper simulation and inference of gene family evolution.


2020 ◽  
Author(s):  
Laura Kubatko ◽  
Julia Chifman

Abstract The advent of rapid and inexpensive sequencing technologies has necessitated the development of computationally efficient methods for analyzing sequence data for many genes simultaneously in a phylogenetic framework. The coalescent process is the most commonly used model for linking the underlying genealogies of individual genes with the global species-level phylogeny, but inference under the coalescent model is computationally daunting in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Here we consider estimation of the branch lengths in a fixed species tree, and show that these branch lengths are identifiable. We also show that in the case of four taxa simple estimators for the branch lengths can be derived based on observed site pattern frequencies. Properties of these estimators, such as their asymptotic variances and large-sample distributions, are examined, and performance of the estimators is assessed using simulation. Finally, we use these estimators to develop a hypothesis test that can be used to delimit species under the coalescent model.


2020 ◽  
Author(s):  
Erin K. Molloy ◽  
John Gatesy ◽  
Mark S. Springer

Abstract A major shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting (ILS). Coalescence methods explicitly address this problem, but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescence methods, retroelement insertions have emerged as powerful phylogenomic markers for species tree estimation. We show that two recently proposed methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the species tree under the multispecies coalescent model, with retroelement insertions following a neutral infinite sites model of mutation. The accuracy of these and other methods for inferring species trees with retroelements has not been assessed in simulation studies. We simulate retroelements for four different species trees, including three with short branch lengths in the anomaly zone, and assess the performance of eight different methods for recovering the correct species tree. We also examine whether ASTRAL_BP recovers accurate internal branch lengths for internodes of various lengths (in coalescent units). Our results indicate that two recently proposed ILS-aware methods, ASTRAL_BP and SDPquartets, as well as the newly proposed ASTRID_BP, always recover the correct species tree on data sets with large numbers of retroelements even when there are extremely short species-tree branches in the anomaly zone. Dollo parsimony performed almost as well as these ILS-aware methods. By contrast, unordered parsimony, polymorphism parsimony, and MDC recovered the correct species tree in the case of a pectinate tree with four ingroup taxa in the anomaly zone, but failed to recover the correct tree in more complex anomaly-zone situations with additional lineages impacted by extensive incomplete lineage sorting. Camin-Sokal parsimony always reconstructed an incorrect tree in the anomaly zone. ASTRAL_BP accurately estimated branch lengths when internal branches were very short as in anomaly zone situations, but branch lengths were upwardly biased by more than 35% when species tree branches were longer. We derive a mathematical correction for these distortions, assuming the expected number of new retroelement insertions per generation is constant across the species tree. We also show that short branches do not need to be corrected even when this assumption does not hold; therefore, the branch lengths estimates produced by ASTRAL_BP may provide insight into whether an estimated species tree is in the anomaly zone.


2019 ◽  
Vol 69 (1) ◽  
pp. 194-207
Author(s):  
Richard H Adams ◽  
Todd A Castoe

Abstract Despite the ubiquitous use of statistical models for phylogenomic and population genomic inferences, this model-based rigor is rarely applied to post hoc comparison of trees. In a recent study, Garba et al. derived new methods for measuring the distance between two gene trees computed as the difference in their site pattern probability distributions. Unlike traditional metrics that compare trees solely in terms of geometry, these measures consider gene trees and associated parameters as probabilistic models that can be compared using standard information theoretic approaches. Consequently, probabilistic measures of phylogenetic tree distance can be far more informative than simply comparisons of topology and/or branch lengths alone. However, in their current form, these distance measures are not suitable for the comparison of species tree models in the presence of gene tree heterogeneity. Here, we demonstrate an approach for how the theory of Garba et al. (2018), which is based on gene tree distances, can be extended naturally to the comparison of species tree models. Multispecies coalescent (MSC) models parameterize the discrete probability distribution of gene trees conditioned upon a species tree with a particular topology and set of divergence times (in coalescent units), and thus provide a framework for measuring distances between species tree models in terms of their corresponding gene tree topology probabilities. We describe the computation of probabilistic species tree distances in the context of standard MSC models, which assume complete genetic isolation postspeciation, as well as recent theoretical extensions to the MSC in the form of network-based MSC models that relax this assumption and permit hybridization among taxa. We demonstrate these metrics using simulations and empirical species tree estimates and discuss both the benefits and limitations of these approaches. 
We make our species tree distance approach available as an R package called pSTDistanceR, for open use by the community.
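
The idea of comparing species trees through the gene tree topology distributions they induce can be illustrated with the smallest possible case. The sketch below is not the pSTDistanceR package or its API; it is a minimal Python illustration under stated assumptions: a rooted three-taxon species tree, one sampled lineage per species, no gene flow, and the Hellinger distance chosen here purely as an example of a distance between discrete distributions (the abstract does not say which measure the authors use). The function names `gene_tree_probs`, `hellinger` and `species_tree_distance` are hypothetical. Under the standard MSC, the gene tree matching a three-taxon species tree has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t), where t is the internal branch length in coalescent units.

```python
import math

# The three rooted topologies for taxa A, B, C, each keyed by its sister pair.
TOPOLOGIES = ("AB", "AC", "BC")


def gene_tree_probs(sister_pair, t):
    """Gene tree topology distribution under the standard MSC.

    Assumes a rooted three-taxon species tree whose sister pair is
    `sister_pair`, an internal branch of length `t` (coalescent units),
    one sampled lineage per species, and no gene flow.
    """
    if sister_pair not in TOPOLOGIES:
        raise ValueError("sister_pair must be one of 'AB', 'AC', 'BC'")
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-t) the sister lineages fail to coalesce on the
    # internal branch; all three topologies are then equally likely.
    discordant = math.exp(-t) / 3.0
    return {
        topo: (1.0 - 2.0 * discordant) if topo == sister_pair else discordant
        for topo in TOPOLOGIES
    }


def hellinger(p, q):
    """Hellinger distance between two distributions over TOPOLOGIES (0 to 1)."""
    total = sum((math.sqrt(p[k]) - math.sqrt(q[k])) ** 2 for k in TOPOLOGIES)
    return math.sqrt(total / 2.0)


def species_tree_distance(tree1, tree2):
    """Distance between two species trees given as (sister_pair, t) tuples."""
    return hellinger(gene_tree_probs(*tree1), gene_tree_probs(*tree2))


if __name__ == "__main__":
    # Same topology, different internal branch lengths.
    print(species_tree_distance(("AB", 0.1), ("AB", 2.0)))
    # Different topologies, both with long internal branches.
    print(species_tree_distance(("AB", 5.0), ("AC", 5.0)))
    # Different topologies, both with a zero-length internal branch.
    print(species_tree_distance(("AB", 0.0), ("AC", 0.0)))
```

In this toy setting, two species trees with different labeled topologies but zero-length internal branches induce the same gene tree distribution, so their distance is essentially zero, whereas the same two topologies with long internal branches are nearly maximally distant. That is the sense in which a probabilistic distance compares what the models predict rather than tree geometry alone.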


2021 ◽  
Vol 7 (8) ◽  
Author(s):  
Maëva Perez ◽  
Bernard Angers ◽  
C. Robert Young ◽  
S. Kim Juniper

Many foundation species in chemosynthesis-based ecosystems rely on environmentally acquired symbiotic bacteria for their survival. Hence, understanding the biogeographic distributions of these symbionts at regional scales is key to understanding patterns of connectivity and predicting resilience of their host populations (and thus whole communities). However, such assessments are challenging because they necessitate measuring bacterial genetic diversity at fine resolutions. For this purpose, the recently discovered clustered regularly interspaced short palindromic repeats (CRISPR) constitutes a promising new genetic marker. These DNA sequences harboured by about half of bacteria hold their viral immune memory, and as such, might allow discrimination of different lineages or strains of otherwise indistinguishable bacteria. In this study, we assessed the potential of CRISPR as a hypervariable phylogenetic marker in the context of a population genetic study of an uncultured bacterial species. We used high-throughput CRISPR-based typing along with multi-locus sequence analysis (MLSA) to characterize the regional population structure of the obligate but environmentally acquired symbiont species Candidatus Endoriftia persephone on the Juan de Fuca Ridge. Mixed symbiont populations of Ca. Endoriftia persephone were sampled across individual Ridgeia piscesae hosts from contrasting habitats in order to determine if environmental conditions rather than barriers to connectivity are more important drivers of symbiont diversity. We showed that CRISPR revealed a much higher symbiont genetic diversity than the other housekeeping genes. Several lines of evidence imply this diversity is indicative of environmental strains. Finally, we found with both CRISPR and gene markers that local symbiont populations are strongly differentiated across sites known to be isolated by deep-sea circulation patterns. 
This research showed the high power of CRISPR to resolve the genetic structure of uncultured bacterial populations and represents a step towards making keystone microbial species an integral part of conservation policies for upcoming mining operations on the seafloor.


