Integrative modeling of gene and genome evolution roots the archaeal tree of life

2017 ◽  
Vol 114 (23) ◽  
pp. E4602-E4611 ◽  
Author(s):  
Tom A. Williams ◽  
Gergely J. Szöllősi ◽  
Anja Spang ◽  
Peter G. Foster ◽  
Sarah E. Heaps ◽  
...  

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood–Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.

2021 ◽  
Author(s):  
Mario A Ceron Romero ◽  
Miguel M Fonseca ◽  
Leonardo de Oliveira Martins ◽  
David Posada ◽  
Laura A Katz

Advances in phylogenetics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single gene fusion. Subsequent highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes neither considers phylogenetically informative events (i.e. gene duplications and losses), nor provides an estimate of the root. A more recent study using gene tree / species tree reconciliation methods suggested the root lies between Opisthokonta and all other eukaryotes, but it included only 59 taxa and 20 genes. Here we apply a gene tree / species tree reconciliation approach to a gene-rich and taxon-rich dataset (i.e. 2,786 gene families from two sets of 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We estimate a root between Fungi and all other eukaryotes, or between Opisthokonta and all other eukaryotes, and reject alternative roots from the literature. Based on further analysis of genome size we propose Opisthokonta + others as the most likely root.


Nature ◽  
2019 ◽  
Vol 574 (7780) ◽  
pp. 679-685 ◽  
Author(s):  

Green plants (Viridiplantae) include around 450,000–500,000 species (refs. 1, 2) of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.


2017 ◽  
Author(s):  
Abigail J. Moore ◽  
Jurriaan M. de Vos ◽  
Lillian P. Hancock ◽  
Eric Goolsby ◽  
Erika J. Edwards

Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, single-copy loci are relatively uncommon elements of most genomes, and as such may provide a biased evolutionary history. Furthermore, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the “portullugo” (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75–218 loci across 74 taxa, with ~50% matrix completeness across datasets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve with strong support the sister lineage of the cacti: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees for the resolution of the sister group of the cacti. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.


2020 ◽  
Author(s):  
Guillaume Louvel ◽  
Hugues Roest Crollius

Molecular dating is a cornerstone of evolutionary biology, yet it is far from an exact science. The inference of precise dates using gene sequences is difficult, in part because of the stochastic process of DNA mutation, selective forces that alter substitution rates and many unknown parameters linked to population genetics in ancestral lineages. Dating species divergence is one important challenge in this field, which is usually performed by concatenating extant sequences sampled within a genome as representative of a lineage, and computing distances between these lineages. However, concatenation precludes the dating of events specific to a gene family, such as gene duplication. During evolutionary time, individual gene sequences record different signatures of base substitutions, at rates that may deviate substantially from the average rate. No formal study exists that quantifies which parameters influence this deviation. Here we designed a strategy to date events within a gene family, and we test the influence of more than 30 parameters on dating accuracy. We developed this approach on approximately 5,000 primate gene families comprising 12 genomes that display neither gene loss nor gene duplication. We then test its relevance in the complete set of primate gene families to date gene duplications. Our results are compared to previous fossil and molecular dating approaches, and provide a practical set of guidelines for accurate molecular dating at the single gene family level.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 152
Author(s):  
Nadia El-Mabrouk

Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation may vary from one definition to another, the more constrained requiring identical gene contents and gene orders, while more relaxed definitions just require a certain similarity in gene content, and not necessarily in the same order. Regardless of the way they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can enlighten important functional properties. In addition to identifying them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events modifying their gene content, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to the generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
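
As a rough illustration of the gene tree / species tree reconciliation mentioned above, the following minimal sketch applies the classical LCA mapping to a single gene family and counts the gene-tree nodes that must be duplications. It is not taken from the reviewed paper: the nested-tuple tree encoding, the `species_geneid` leaf naming and all function names are assumptions made for the example, and losses, transfers and gene order are ignored.

```python
# Minimal LCA-reconciliation sketch (duplications only).
# Assumptions for illustration: trees are nested tuples with string leaves;
# species-tree leaves are species names, gene-tree leaves are "species_geneid".

def collect_clades(tree, out):
    """Append the leaf set of every species-tree node to `out`; return this node's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves

def lca(clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in clades if species <= c), key=len)

def count_duplications(gene_tree, clades):
    """Return (species-tree clade this gene node maps to, number of duplication nodes below or at it)."""
    if isinstance(gene_tree, str):
        species = gene_tree.split("_")[0]
        return lca(clades, frozenset([species])), 0
    child_maps, dups = [], 0
    for child in gene_tree:
        mapped, d = count_duplications(child, clades)
        child_maps.append(mapped)
        dups += d
    here = lca(clades, frozenset().union(*child_maps))
    # Under LCA mapping, a node is a duplication if it maps to the same
    # species-tree node as one of its children.
    if here in child_maps:
        dups += 1
    return here, dups

species_tree = (("A", "B"), "C")
clades = []
collect_clades(species_tree, clades)

congruent = (("A_1", "B_1"), "C_1")               # matches the species tree
with_paralog = (("A_1", "B_1"), ("A_2", "C_1"))   # species A appears on both sides of the root

print(count_duplications(congruent, clades)[1])
print(count_duplications(with_paralog, clades)[1])
```

Methods of the kind the review covers extend this single-gene view to losses, transfers and whole segments of adjacent genes; the sketch only shows the basic mapping step.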


Genetics ◽  
2020 ◽  
Vol 215 (4) ◽  
pp. 1153-1169 ◽  
Author(s):  
Riddhiman K. Garge ◽  
Jon M. Laurent ◽  
Aashiq H. Kachroo ◽  
Edward M. Marcotte

Many gene families have been expanded by gene duplications along the human lineage, relative to ancestral opisthokonts, but the extent to which the duplicated genes function similarly is understudied. Here, we focused on structural cytoskeletal genes involved in critical cellular processes, including chromosome segregation, macromolecular transport, and cell shape maintenance. To determine functional redundancy and divergence of duplicated human genes, we systematically humanized the yeast actin, myosin, tubulin, and septin genes, testing ∼81% of human cytoskeletal genes across seven gene families for their ability to complement a growth defect induced by inactivation or deletion of the corresponding yeast ortholog. In five of seven families (all but α-tubulin and light myosin), we found at least one human gene capable of complementing loss of the yeast gene. Despite rescuing growth defects, we observed differential abilities of human genes to rescue cell morphology, meiosis, and mating defects. By comparing phenotypes of humanized strains with deletion phenotypes of their interaction partners, we identify instances of human genes in the actin and septin families capable of carrying out essential functions, but failing to fully complement the cytoskeletal roles of their yeast orthologs, thus leading to abnormal cell morphologies. Overall, we show that duplicated human cytoskeletal genes appear to have diverged such that only a few human genes within each family are capable of replacing the essential roles of their yeast orthologs. The resulting yeast strains with humanized cytoskeletal components now provide surrogate platforms to characterize human genes in simplified eukaryotic contexts.


Author(s):  
Lina Kloub ◽  
Sean Gosselin ◽  
Matthew Fullmer ◽  
Joerg Graf ◽  
J Peter Gogarten ◽  
...  

Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the observed relative frequency of HMGT increases as divergence between genomes increases, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
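
The idea of combining inferred single-gene transfers with gene order can be pictured with the toy sketch below. It is not HoMer's algorithm or interface: the function name, the `max_gap` tolerance and the input format are hypothetical, and the statistical analysis and uncertainty handling described in the abstract are left out entirely. It simply groups genes that were inferred as transferred along the same donor/recipient pair and that lie next to each other in the recipient genome.

```python
def candidate_multigene_transfers(gene_order, transferred, min_size=2, max_gap=0):
    """Group transferred genes that are neighbours in `gene_order`.

    gene_order  : list of gene identifiers in chromosomal order (hypothetical input format)
    transferred : set of genes inferred as transferred for one donor/recipient pair
    min_size    : smallest group reported as a candidate multi-gene transfer
    max_gap     : number of non-transferred genes tolerated between two grouped genes
    """
    groups, current, last_idx = [], [], None
    for idx, gene in enumerate(gene_order):
        if gene not in transferred:
            continue
        # Start a new group when too many untransferred genes separate this gene from the last one.
        if last_idx is not None and idx - last_idx - 1 > max_gap:
            if len(current) >= min_size:
                groups.append(current)
            current = []
        current.append(gene)
        last_idx = idx
    if len(current) >= min_size:
        groups.append(current)
    return groups

genome = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
hgt_genes = {"g2", "g3", "g6"}

print(candidate_multigene_transfers(genome, hgt_genes))             # strict adjacency
print(candidate_multigene_transfers(genome, hgt_genes, max_gap=2))  # tolerate two intervening genes
```

A real analysis would also need to ask whether such groupings occur more often than expected by chance, which is the role of the statistical step the abstract mentions.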


2019 ◽  
Author(s):  
Mosè Manni ◽  
Felipe A. Simao ◽  
Hugh M. Robertson ◽  
Marco A. Gabaglio ◽  
Robert M. Waterhouse ◽  
...  

The dipluran two-pronged bristletail Campodea augens is a blind ancestrally wingless hexapod with the remarkable capacity to regenerate lost body appendages such as its long antennae. As sister group to Insecta (sensu stricto), Diplura are key to understanding the early evolution of hexapods and the origin and evolution of insects. Here we report the 1.2-Gbp draft genome of C. augens and results from comparative genomic analyses with other arthropods. In C. augens we uncovered the largest chemosensory gene repertoire of ionotropic receptors in the animal kingdom, a massive expansion which might compensate for the loss of vision. We found a paucity of photoreceptor genes mirroring at the genomic level the secondary loss of an ancestral external photoreceptor organ. Expansions of detoxification and carbohydrate metabolism gene families might reflect adaptations for foraging behaviour, and duplicated apoptotic genes might underlie its high regenerative potential. The C. augens genome represents one of the key references for studying the emergence of genomic innovations in insects, the most diverse animal group, and opens up novel opportunities to study the under-explored biology of diplurans.


2021 ◽  
Author(s):  
Benoit Morel ◽  
Paul Schade ◽  
Sarah Lutteropp ◽  
Tom A. Williams ◽  
Gergely J. Szöllösi ◽  
...  

Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, no widely adopted inference method exists for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modelling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated datasets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large datasets. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in one hour using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 49 ◽  
Author(s):  
Fabian Schreiber

Summary: Phylogenetic trees are widely used to represent the evolution of gene families. As the history of gene families can be complex (including lots of gene duplications), its visualisation can become a difficult task. A good/accurate visualisation of phylogenetic trees - especially on the web - allows easier understanding and interpretation of trees to help to reveal the mechanisms that shape the evolution of a specific set of genes/species. Here, I present treeWidget, a modular BioJS component to visualise phylogenetic trees on the web. Through its modularity, treeWidget can be easily customized to allow the display of sequence information, e.g. protein domains and alignment conservation patterns. Availability: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7707

