Enriching for orthologs increases support for Xenacoelomorpha and Ambulacraria sister relationship

2021 ◽  
Author(s):  
Peter O Mulhair ◽  
Charley GP McCarthy ◽  
Karen Siu-Ting ◽  
Christopher J Creevey ◽  
Mary J O'Connell

Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum, or within Deuterostomia, sister to Ambulacraria. While their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. More recently, further studies have brought into question whether Deuterostomia should be considered monophyletic at all. Hidden paralogy presents a major challenge for reconstructing species phylogenies. Here we assess whether hidden paralogy has contributed to the conflict over the placement of Xenacoelomorpha. Our approach assesses previously published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree of life. We find that the majority of constituent genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy is rife at this depth. We demonstrate that enrichment for genes with orthologous signal alters the final topology that is inferred, whilst simultaneously improving fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our orthology-enriched set of genes. At a time when we are steadily progressing towards sequencing all of life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better-quality data that can be modelled adequately.
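The filtering idea in this abstract, keeping only gene trees that do not violate uncontroversial clans, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the nested-tuple tree encoding, the function names `leaf_sets` and `supports_clan`, the toy taxa, and the choice to treat trees with fewer than two clan members as uninformative (and therefore kept) are all assumptions made for illustration. A clan here means a set of leaves that a single edge separates from all other leaves of an unrooted tree.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of every subtree below it).

    A tree is a leaf name (str) or a tuple of subtrees. Each subtree's leaf
    set is one side of the bipartition induced by the edge above it.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    all_leaves = frozenset()
    splits = []
    for child in tree:
        child_leaves, child_splits = leaf_sets(child)
        all_leaves |= child_leaves
        splits.extend(child_splits)
    splits.append(all_leaves)
    return all_leaves, splits


def supports_clan(tree, clan):
    """True if the clan members present in the tree sit on one side of an edge."""
    all_leaves, splits = leaf_sets(tree)
    present = frozenset(clan) & all_leaves
    # Assumption: fewer than two members (or all taxa) cannot violate the clan.
    if len(present) < 2 or present == all_leaves:
        return True
    return any(s == present or all_leaves - s == present for s in splits)


# Hypothetical toy data: two gene trees and two clans assumed to be uncontested.
gene_trees = [
    ((("human", "mouse"), "chicken"), ("fly", "beetle"), "sponge"),
    (("human", "fly"), ("mouse", "beetle"), ("chicken", "sponge")),
]
clans = [{"human", "mouse", "chicken"}, {"fly", "beetle"}]

# Keep only gene trees that violate none of the clans.
kept = [t for t in gene_trees if all(supports_clan(t, c) for c in clans)]
print(len(kept), "of", len(gene_trees), "gene trees retained")
```

In this toy case the second tree scatters vertebrates and arthropods across different subtrees, so it is discarded; in a real analysis such a pattern is one possible signature of hidden paralogy, although other sources of gene tree error can produce it too.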

Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure; we test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1,500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa were used, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology (ctenophores sister to other animals) regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference between the support for minority topologies for sites in different structural environments; buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. Given the plausible trees, equal support for minority topologies is consistent with discordance among gene trees, making it possible that the relatively slowly evolving buried (and sheet and coil) sites are giving an accurate picture of the true species tree as well as the amount of conflict among gene trees. Alternatively, the apparent support could reflect currently uncharacterized processes of molecular evolution. Regardless, it is clear that analyses of the deepest branches in the animal tree of life using sites in different structural environments are associated with a subtle data type effect that results in distinct phylogenetic signals.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Monique Aouad ◽  
Jean-Pierre Flandrois ◽  
Frédéric Jauffrit ◽  
Manolo Gouy ◽  
Simonetta Gribaldo ◽  
...  

Abstract Background: The recent rise in cultivation-independent genome sequencing has provided key material to explore uncharted branches of the Tree of Life. This has been particularly spectacular for the Archaea, placing them at center stage as highly relevant to understanding the early stages of evolution, the emergence of fundamental metabolisms, and the origin of eukaryotes. Yet, resolving deep divergences remains a challenging task due to well-known tree-reconstruction artefacts and biases in extracting robust ancient phylogenetic signal, notably when analyzing data sets including the three Domains of Life. Among the various strategies aimed at mitigating these problems, divide-and-conquer approaches remain poorly explored, and have been primarily based on reconciliation among single gene trees which, however, notoriously lack ancient phylogenetic signal. Results: We analyzed sub-sets of full supermatrices covering the whole Tree of Life with specific taxonomic sampling to robustly resolve different parts of the archaeal phylogeny in light of their current diversity. Our results strongly support the existence and early emergence of two main clades, Cluster I and Cluster II, which we name Ouranosarchaea and Gaiarchaea, and we clarify the placement of important novel archaeal lineages within these two clades. However, the monophyly and branching of the fast-evolving nanosized DPANN members remain unclear and worthy of further study. Conclusions: We inferred a well-resolved rooted phylogeny of the Archaea that includes all recently described phyla of high taxonomic rank. This phylogeny represents a valuable reference to study the evolutionary events associated with the early steps of the diversification of the archaeal domain. Beyond the specifics of archaeal phylogeny, our results demonstrate the power of divide-and-conquer approaches to resolve deep phylogenetic relationships, which should be applied to progressively resolve the entire Tree of Life.


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Yan Du ◽  
Shaoyuan Wu ◽  
Scott V. Edwards ◽  
Liang Liu

Abstract Background: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. Results: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAl and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. Conclusions: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.


2019 ◽  
Vol 36 (8) ◽  
pp. 1831-1842 ◽  
Author(s):  
Mario A Cerón-Romero ◽  
Xyrus X Maurer-Alcalá ◽  
Jean-David Grattepanche ◽  
Ying Yan ◽  
Miguel M Fonseca ◽  
...  

Abstract Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).


1997 ◽  
Vol 61 (4) ◽  
pp. 456-502
Author(s):  
J R Brown ◽  
W F Doolittle

Since the late 1970s, determining the phylogenetic relationships among the contemporary domains of life, the Archaea (archaebacteria), Bacteria (eubacteria), and Eucarya (eukaryotes), has been central to the study of early cellular evolution. The two salient issues surrounding the universal tree of life are whether all three domains are monophyletic (i.e., all equivalent in taxonomic rank) and where the root of the universal tree lies. Evaluation of the status of the Archaea has become key to answering these questions. This review considers our cumulative knowledge about the Archaea in relationship to the Bacteria and Eucarya. Particular attention is paid to the recent use of molecular phylogenetic approaches to reconstructing the tree of life. In this regard, the phylogenetic analyses of more than 60 proteins are reviewed and presented in the context of their participation in major biochemical pathways. Although many gene trees are incongruent, the majority do suggest a sisterhood between Archaea and Eucarya. Altering this general pattern of gene evolution are two kinds of potential interdomain gene transferrals. One horizontal gene exchange might have involved the gram-positive Bacteria and the Archaea, while the other might have occurred between proteobacteria and eukaryotes and might have been mediated by endosymbiosis.


2020 ◽  
Vol 66 (5) ◽  
pp. 565-574
Author(s):  
C Tristan Stayton

Abstract Contemporary methods for visualizing phenotypic evolution, such as phylomorphospaces, often reveal patterns which depart strongly from a naïve expectation of consistently divergent branching and expansion. Instead, branches regularly crisscross as convergence, reversals, or other forms of homoplasy occur, forming patterns described as “birds’ nests”, “flies in vials”, or less elegantly, “a mess”. In other words, the phenotypic tree of life often appears highly tangled. Various explanations are given for this, such as differential degrees of developmental constraint, adaptation, or lack of adaptation. However, null expectations for the magnitude of disorder or “tangling” have never been established, so it is unclear which or even whether various evolutionary factors are required to explain messy patterns of evolution. I simulated evolution along phylogenies under a number of varying parameters (number of taxa and number of traits) and models [Brownian motion, Ornstein–Uhlenbeck (OU)-based, early burst, and character displacement (CD)] and quantified disorder using 2 measures. All models produce substantial amounts of disorder. Disorder increases with tree size and the number of phenotypic traits. OU models produced the largest amounts of disorder: adaptive peaks influence lineages to evolve within restricted areas, with concomitant increases in crossing of branches and density of evolution. Large early changes in trait values can be important in minimizing disorder. CD consistently produced trees with low (but not absent) disorder. Overall, neither constraints nor a lack of adaptation is required to explain messy phylomorphospaces; both stochastic and deterministic processes can act to produce the tantalizingly tangled phenotypic tree of life.


2020 ◽  
Vol 70 (1) ◽  
pp. 49-66 ◽  
Author(s):  
Paul M Hime ◽  
Alan R Lemmon ◽  
Emily C Moriarty Lemmon ◽  
Elizabeth Prendini ◽  
Jeremy M Brown ◽  
...  

Abstract Molecular phylogenies have yielded strong support for many parts of the amphibian Tree of Life, but poor support for the resolution of deeper nodes, including relationships among families and orders. To clarify these relationships, we provide a phylogenomic perspective on amphibian relationships by developing a taxon-specific Anchored Hybrid Enrichment protocol targeting hundreds of conserved exons which are effective across the class. After obtaining data from 220 loci for 286 species (representing 94% of the families and 44% of the genera), we estimate a phylogeny for extant amphibians and identify gene tree–species tree conflict across the deepest branches of the amphibian phylogeny. We perform locus-by-locus genealogical interrogation of alternative topological hypotheses for amphibian monophyly, focusing on interordinal relationships. We find that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians. Our results overwhelmingly support amphibian monophyly and a sister relationship between frogs and salamanders, consistent with the Batrachia hypothesis. Species tree analyses converge on a small set of topological hypotheses for the relationships among extant amphibian families. These results clarify several contentious portions of the amphibian Tree of Life, which in conjunction with a set of vetted fossil calibrations, support a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported. More broadly, our study provides insight into the sources, magnitudes, and heterogeneity of support across loci in phylogenomic data sets. [AIC; Amphibia; Batrachia; Phylogeny; gene tree–species tree discordance; genomics; information theory.]


2017 ◽  
Vol 15 (05) ◽  
pp. 1750019
Author(s):  
Christophe Guyeux ◽  
Stéphane Chrétien ◽  
Nathalie M.-L. Côté ◽  
Jacques M. Bahi

In this paper, we propose a high performance computing toolbox implementing efficient statistical methods for the study of phylogenies. This toolbox, which implements logit models and LASSO-type penalties, gives a way to better understand, measure, and compare the impact of each gene on a global phylogeny. As an application, we study the Echinococcus phylogeny, which is often considered a particularly difficult example. Mitochondrial and nuclear genomes (19 coding sequences) of nine Echinococcus species are considered in order to investigate the molecular phylogeny of this genus. First, we check that the 19 gene trees lead to 19 totally different unsupported topologies (a topology is the sister relationship when both branch lengths and supports are ignored in a phylogenetic tree), while using the 19 genes as a whole is not sufficient for estimating the phylogeny. In order to circumvent this issue and understand the impact of the genes, we computed 43,796 trees using combinations ranging from 13 to 19 genes. By doing so, 15 topologies are obtained. Four particular topologies, appearing more robust and frequent, are then selected for more precise investigation. Further refining our statistical analysis, a particularly robust topology is extracted. We also carefully demonstrate the influence of nuclear genes on the likelihood of the phylogeny.
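The figure of 43,796 trees matches the number of ways to choose between 13 and 19 genes out of 19, which suggests one tree per gene subset; that reading is inferred from the arithmetic and is not spelled out in the abstract. A short Python check, using only the standard library and illustrative variable names, reproduces the count:

```python
from math import comb

n_genes = 19                        # coding sequences reported in the abstract
subset_sizes = range(13, n_genes + 1)

# Number of distinct gene subsets of each size k, i.e. C(19, k).
counts = {k: comb(n_genes, k) for k in subset_sizes}
total = sum(counts.values())

for k, c in counts.items():
    print(f"{k} genes: {c} subsets")
print("total:", total)              # 43796
```

Most of the total comes from the smallest subsets: subsets of 13 genes alone account for 27,132 of the 43,796 combinations.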


2010 ◽  
Vol 60 (2) ◽  
pp. 117-125 ◽  
Author(s):  
J. Gordon Burleigh ◽  
Mukul S. Bansal ◽  
Oliver Eulenstein ◽  
Stefanie Hartmann ◽  
André Wehe ◽  
...  