STRIDE: Species Tree Root Inference from Gene Duplication Events

2017 ◽  
Author(s):  
David M. Emms ◽  
Steven Kelly

Abstract The correct interpretation of a phylogenetic tree is dependent on it being correctly rooted. A gene duplication event at the base of a clade of species is synapomorphic, and thus excludes the root of the species tree from that clade. We present STRIDE, a fast, effective, and outgroup-free method for species tree root inference from gene duplication events. STRIDE identifies sets of well-supported gene duplication events from cohorts of gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of the true root. We show that STRIDE infers the correct root of the species tree for a large range of simulated and real species sets. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for datasets where information is limited. Furthermore, application of STRIDE to inference of the origin of the eukaryotic tree resulted in a root probability distribution that was consistent with, but unable to distinguish between, leading hypotheses for the origin of the eukaryotes. In summary, STRIDE is a fast, scalable, and effective method for species tree root inference from genome scale data.
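The exclusion argument in the abstract (a duplication at the base of a clade rules the root out of that clade) can be illustrated with a toy sketch. This is not STRIDE's probability model, which the abstract does not specify; it applies hard exclusion only. The function name `surviving_root_branches`, the five-species tree, and the two duplicated clades are hypothetical and chosen purely for illustration.

```python
from typing import FrozenSet, Iterable, List

Species = FrozenSet[str]


def surviving_root_branches(
    branches: Iterable[Species],
    duplicated_clades: Iterable[Species],
    all_species: Species,
) -> List[Species]:
    """Return the branches on which the root can still lie.

    Each branch of the unrooted tree is given as the species set on one
    side of it. A duplication at the base of clade C rules out every
    branch strictly inside C, i.e. a branch with one side that is a
    proper subset of C. The branch subtending C itself stays allowed.
    """
    clades = list(duplicated_clades)
    kept = []
    for side in branches:
        other = all_species - side
        inside = any(side < c or other < c for c in clades)
        if not inside:
            kept.append(side)
    return kept


all_species = frozenset("ABCDE")
# Unrooted tree ((A,B),C,(D,E)): five terminal and two internal branches.
branches = [frozenset(s) for s in ("A", "B", "C", "D", "E", "AB", "DE")]
# Hypothetical well-supported duplications at the base of these clades.
duplications = [frozenset("AB"), frozenset("ABC")]

remaining = surviving_root_branches(branches, duplications, all_species)
print(sorted("".join(sorted(b)) for b in remaining))  # ['D', 'DE', 'E']
```

In this toy case the two duplications leave only the terminal branches to D and E and the internal branch separating (D,E) from the rest as candidate root positions. A probabilistic method would weight conflicting evidence rather than discard branches outright.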


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8813 ◽  
Author(s):  
Kyle T. David ◽  
Jamie R. Oaks ◽  
Kenneth M. Halanych

Background Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting. Methods To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa. Results Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.



2017 ◽  
Vol 34 (12) ◽  
pp. 3267-3278 ◽  
Author(s):  
David M Emms ◽  
Steven Kelly


2016 ◽  
Author(s):  
Anil S. Thanki ◽  
Nicola Soranzo ◽  
Wilfried Haerty ◽  
Robert P. Davey

Abstract Background Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL and HomoloGene, to identify gene families and visualise syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences and provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. Findings A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we have converted the command line Ensembl Compara GeneTrees pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. Conclusions GeneSeqToFamily represents the Ensembl Compara pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualise the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.



2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Tianyu Zhou ◽  
Xiping Yan ◽  
Guosong Wang ◽  
Hehe Liu ◽  
Xiang Gan ◽  
...  

Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct tissue distribution patterns and differ in function. The purpose of this study is to investigate the evolutionary impacts on the functional diversity of PPAR members and the regulatory differences in their gene expression patterns. Sixty-three homologous PPAR gene sequences from 31 species were collected and analyzed. The results showed that the three distinct types in the PPAR gene family may have emerged from two gene duplication events. The conserved HOLI (ligand binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variable LCRs may be responsible for their functional divergence. The positively selected sites in the HOLI domain help PPARs evolve toward diverse functions. Evolutionary variants in the promoter and 3′ UTR regions of PPARs lead to different transcription factors and miRNAs being involved in regulating PPAR members, which may eventually affect their expression and tissue distribution. These results indicate that gene duplication events, selection pressure on the HOLI domain, and variants in the promoter and 3′ UTR are essential for PPAR evolution and the acquisition of diverse functions.



2018 ◽  
Vol 115 (24) ◽  
pp. 6249-6254 ◽  
Author(s):  
Lily C. Hughes ◽  
Guillermo Ortí ◽  
Yu Huang ◽  
Ying Sun ◽  
Carole C. Baldwin ◽  
...  

Our understanding of phylogenetic relationships among bony fishes has been transformed by analysis of a small number of genes, but uncertainty remains around critical nodes. Genome-scale inferences so far have sampled a limited number of taxa and genes. Here we leveraged 144 genomes and 159 transcriptomes to investigate fish evolution with an unparalleled scale of data: >0.5 Mb from 1,105 orthologous exon sequences from 303 species, representing 66 out of 72 ray-finned fish orders. We apply phylogenetic tests designed to trace the effect of whole-genome duplication events on gene trees and find paralogy-free loci using a bioinformatics approach. Genome-wide data support the structure of the fish phylogeny, and hypothesis-testing procedures appropriate for phylogenomic datasets using explicit gene genealogy interrogation settle some long-standing uncertainties, such as the branching order at the base of the teleosts and among early euteleosts, and the sister lineage to the acanthomorph and percomorph radiations. Comprehensive fossil calibrations date the origin of all major fish lineages before the end of the Cretaceous.



mSphere ◽  
2018 ◽  
Vol 3 (6) ◽  
Author(s):  
An Ngoc Nguyen ◽  
Elena Disconzi ◽  
Guillaume M. Charrière ◽  
Delphine Destoumieux-Garzón ◽  
Philippe Bouloc ◽  
...  

ABSTRACT CsrBs are highly conserved, multiple-copy bacterial noncoding small RNAs (sRNAs) that play major roles in cell physiology and virulence. In the Vibrio genus, they are known to be regulated by the two-component system VarS/VarA. They modulate the well-characterized quorum sensing pathway controlling virulence and luminescence in Vibrio cholerae and Vibrio harveyi, respectively. Remarkably, Vibrio tasmaniensis LGP32, an oyster pathogen that belongs to the Splendidus clade, was found to have four copies of csrB, named csrB1-4, compared to two to three copies in other Vibrio species. Here, we show that the extra csrB4 copy results from a csrB3 gene duplication, a characteristic of the Splendidus clade. Interestingly, csrB genes are regulated in different ways in V. tasmaniensis, with csrB1 expression being independent of the VarS/VarA system. We found that a complex regulatory network involving CsrBs, quorum sensing, and the stationary-phase sigma factor σS redundantly but differentially controls the production of two secreted metalloproteases, Vsm and PrtV, the former being a major determinant of the V. tasmaniensis extracellular product toxicity. In particular, we identified a novel VarS/VarA-dependent but CsrB-independent pathway that positively controls both Vsm production and PrtV production as well as rpoS expression. Altogether, our data show that a csrB gene duplication event in V. tasmaniensis supported the evolution of the regulatory network controlling the expression of major toxic secreted metalloproteases, thereby increasing redundancy and enabling the integration of additional input signals. IMPORTANCE The conserved CsrB sRNAs are an example of sibling sRNAs, i.e., sRNAs which are present in multiple copies in genomes. This report illustrates how new copies arise through gene duplication events and highlights two evolutionary advantages of having such multiple copies: differential regulation of the multiple copies allows integration of different input signals into the regulatory network of which they are parts, and the high redundancy that they provide confers a strong robustness to the system.



2016 ◽  
Vol 14 (03) ◽  
pp. 1642005 ◽  
Author(s):  
Jucheol Moon ◽  
Harris T. Lin ◽  
Oliver Eulenstein

Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that the gene duplication problem satisfies the desirable Pareto property for clusters. That is, for every instance of the problem, all clusters that are commonly present in the input gene trees of this instance, called strict consensus, will also be found in every solution to this instance. We prove that this conjecture does not generally hold. Despite this negative result we show that the gene duplication problem satisfies a weaker version of the Pareto property where the strict consensus is found in at least one solution (rather than all solutions). This weaker property contributes to our design of an efficient scalable algorithm for the gene duplication problem. We demonstrate the performance of our algorithm in analyzing large-scale empirical datasets. Finally, we utilize the algorithm to evaluate the accuracy of standard heuristics for the gene duplication problem using simulated datasets.
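The quantity that the gene duplication problem minimises, the number of duplications a species tree implies for a gene tree, can be sketched with the standard least-common-ancestor (LCA) mapping. This is a minimal illustration and not the authors' algorithm: it scores one fixed species tree and does not search tree space. The nested-tuple tree encoding, the function names, and the `species_of` convention (gene leaves named `Species_copy`) are assumptions made for the example.

```python
from typing import Callable, FrozenSet, List, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set (cluster) of every node in a rooted tree."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca_cluster(species: FrozenSet[str],
                species_clusters: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree cluster containing all the given species."""
    return min((c for c in species_clusters if species <= c), key=len)


def count_duplications(
    gene_tree: Tree,
    species_tree: Tree,
    species_of: Callable[[str], str] = lambda leaf: leaf.split("_")[0],
) -> int:
    """Count gene-tree nodes that map to the same species-tree node as a child."""
    sp_clusters = clusters(species_tree)
    dups = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal dups
        if isinstance(node, str):
            return lca_cluster(frozenset([species_of(node)]), sp_clusters)
        child_maps = [walk(child) for child in node]
        here = lca_cluster(frozenset().union(*child_maps), sp_clusters)
        if here in child_maps:  # same mapping as a child => duplication
            dups += 1
        return here

    walk(gene_tree)
    return dups


species_tree = (("A", "B"), "C")
gene_tree = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
print(count_duplications(gene_tree, species_tree))  # 1
```

The cluster lookup here is quadratic for simplicity. Summing this count over a collection of gene trees gives the score of one candidate species tree under the duplication criterion.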



2017 ◽  
Author(s):  
Xiaofan Zhou ◽  
Sarah Lutteropp ◽  
Lucas Czech ◽  
Alexandros Stamatakis ◽  
Moritz von Looz ◽  
...  

Abstract Incongruence, or topological conflict, is prevalent in genome-scale data sets but relatively few measures have been developed to quantify it. Internode Certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internode (or internal branch) among a set of phylogenetic trees and complement regular branch support statistics in assessing the confidence of the inferred phylogenetic relationships. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, the calculation of IC scores requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing data is high, current approaches that adjust bipartition frequencies in partial gene trees tend to overestimate IC scores and alternative adjustment approaches differ substantially from each other in their scores. To overcome these issues, we developed three new measures for calculating internode certainty that are based on the frequencies of quartets, which naturally apply to both comprehensive and partial trees. Our comparison of these new quartet-based measures to previous bipartition-based measures on simulated data shows that: 1) on comprehensive trees, both types of measures yield highly similar IC scores; 2) on partial trees, quartet-based measures generate more accurate IC scores; and 3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in the phylogenetic relationships to be assessed. Additionally, analysis of 15 empirical phylogenomic data sets using our quartet-based measures suggests that numerous relationships remain unresolved despite the availability of genome-scale data. 
Finally, we provide an efficient open-source implementation of these quartet-based measures in the program QuartetScores, which is freely available at https://github.com/algomaus/QuartetScores.
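As a rough illustration of how a certainty score can be derived from frequencies, the sketch below computes an entropy-style score of the form 1 + Σ p·log_k(p). With two counts (an internode's bipartition and its most prevalent conflicting bipartition) this corresponds to the usual bipartition-based IC value. Applying the same form to the three possible topologies of a quartet is an assumption made here by analogy; the exact quartet-based definitions are those implemented in QuartetScores, not this function. The name `certainty` is hypothetical.

```python
import math
from typing import Sequence


def certainty(counts: Sequence[float]) -> float:
    """Entropy-style certainty: 1 + sum(p * log_k(p)) over k alternatives.

    Returns 1.0 when all observations support one alternative and 0.0
    when the alternatives are equally frequent. Terms with a zero count
    contribute nothing (0 * log 0 is treated as 0).
    """
    k = len(counts)
    total = sum(counts)
    if k < 2 or total <= 0:
        raise ValueError("need at least two alternatives and one observation")
    score = 1.0
    for c in counts:
        if c > 0:
            p = c / total
            score += p * math.log(p, k)
    return score


# Two conflicting bipartitions observed in 90 and 10 gene trees.
print(round(certainty([90, 10]), 3))
# Three quartet topologies (assumed analogy), observed equally often.
print(round(certainty([40, 40, 40]), 3))
```

Scores near 1 indicate little conflict around the internode and scores near 0 indicate that the alternatives are about equally supported.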



Author(s):  
Sonal Singhal ◽  
Timothy J Colston ◽  
Maggie R Grundler ◽  
Stephen A Smith ◽  
Gabriel C Costa ◽  
...  

Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).]



2018 ◽  
Author(s):  
John Gatesy ◽  
Daniel B. Sloan ◽  
Jessica M. Warren ◽  
Richard H. Baker ◽  
Mark P. Simmons ◽  
...  

Abstract Genomic datasets sometimes support unconventional or conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (= concatenation) approach, several simple methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: 1) compare conflicting results favored by alternative coalescence methods, 2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, 3) assess the effects of missing data in species-trees analysis, and 4) clarify biases in commonly-implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the extreme weight placed on gene trees with high taxon sampling. Attributing exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS. 
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.


