Orthology clusters from gene trees with Possvm

Mapping Intimacies · 10.1101/2021.05.03.442399 · 2021

Author(s): Xavier Grau-Bové, Arnau Sebé-Pedrós

Keyword(s): Gene Family · Phylogenetic Trees · Clustering Algorithm · Gene Tree · Gene Family Evolution · Gene Trees · Orthologous Genes · Gene Annotations · Species Overlap · Markov Clustering

Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL) is a tool that automates the process of classifying clusters of orthologous genes from precomputed phylogenetic trees. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the Markov Clustering Algorithm (MCL) to identify orthology clusters and provide annotated gene family classifications. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs that can be used to obtain phylogeny-informed gene annotations and inform comparative genomics and gene family evolution analyses.
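The two steps named above can be illustrated with a small sketch. It is not Possvm's code or API. It assumes a rooted, strictly binary gene tree written as nested tuples and a hypothetical "<species>_<gene>" leaf-naming convention. The species overlap rule treats a node whose two child subtrees share no species as a speciation, and the gene pairs it splits as orthologs. For the clustering step it substitutes plain connected components of the ortholog graph for MCL, which is a deliberate simplification rather than Markov clustering.

```python
from itertools import product


def species_of(leaf: str) -> str:
    # Assumption: hypothetical "<species>_<gene>" leaf-naming convention.
    return leaf.split("_", 1)[0]


def leaves(node):
    """Return all leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def ortholog_pairs(tree):
    """Species overlap rule on a rooted, strictly binary tree: if the two child
    subtrees share no species the node is a speciation and every gene pair it
    splits is orthologous; shared species mark a duplication (no pairs)."""
    pairs = set()

    def visit(node):
        if isinstance(node, str):
            return
        left, right = (leaves(child) for child in node)
        left_sp = {species_of(g) for g in left}
        right_sp = {species_of(g) for g in right}
        if not left_sp & right_sp:
            for a, b in product(left, right):
                pairs.add(tuple(sorted((a, b))))
        for child in node:
            visit(child)

    visit(tree)
    return pairs


def cluster(genes, pairs):
    """Orthogroups as connected components of the ortholog graph
    (a simplified stand-in for the MCL step, not MCL itself)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical gene tree: an old duplication (A1 vs A2) followed by speciations.
tree = ((("Hsap_A1", "Mmus_A1"), "Dmel_A1"), (("Hsap_A2", "Mmus_A2"), "Dmel_A2"))
pairs = ortholog_pairs(tree)
orthogroups = cluster(leaves(tree), pairs)
print(orthogroups)
# [['Dmel_A1', 'Hsap_A1', 'Mmus_A1'], ['Dmel_A2', 'Hsap_A2', 'Mmus_A2']]
```

The root node is classified as a duplication because both of its subtrees contain human, mouse and fly genes, so the A1 and A2 paralogs end up in separate orthogroups.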

The Multilocus Multispecies Coalescent: A Flexible New Model of Gene Family Evolution

Systematic Biology · 10.1093/sysbio/syaa084 · 2020

Author(s): Qiuyi Li, Celine Scornavacca, Nicolas Galtier, Yao-Ban Chan

Keyword(s): Gene Family · Incomplete Lineage Sorting · Gene Tree · Gene Family Evolution · Gene Copy · Gene Trees · Species Trees · Lineage Sorting · New Model · Multispecies Coalescent

Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T) and loss (L). These processes are usually modelled independently, but in reality, ILS can affect gene copy number polymorphism, i.e., interfere with DTL. This has been previously recognised, but not treated in a satisfactory way, mainly because DTL events are naturally modelled forward-in-time, while ILS is naturally modelled backwards-in-time with the coalescent. Here we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realised rate of D, T and L becomes non-homogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent (MLMSC), which also accounts for any level of linkage between loci, generalises the multispecies coalescent model and offers a versatile, powerful framework for proper simulation and inference of gene family evolution.
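To make "backwards-in-time with the coalescent" concrete, here is a minimal sketch of the plain multispecies coalescent that MLMSC generalises, not of MLMSC itself: three species with species tree ((A,B),C), one gene copy each, and no duplication, transfer or loss. T is the internal branch length in coalescent units. If the A and B lineages fail to coalesce within that branch (probability exp(-T)), all three lineages are exchangeable in the root population and two of the three possible first coalescences give a discordant gene tree, so the discordance probability is (2/3)·exp(-T). The simulation checks this formula.

```python
import math
import random

CONCORDANT = "((A,B),C)"


def p_discordant(T: float) -> float:
    """Probability that the gene tree disagrees with species tree ((A,B),C)
    under the standard multispecies coalescent (T in coalescent units)."""
    return (2.0 / 3.0) * math.exp(-T)


def simulate_gene_tree(T: float, rng: random.Random) -> str:
    # Backwards in time, lineages A and B coalesce at rate 1 in their shared branch.
    if rng.expovariate(1.0) < T:
        return CONCORDANT
    # No coalescence before the root population: the three lineages are
    # exchangeable, so the first pair to coalesce is uniform (ILS).
    first_pair = rng.choice(["AB", "AC", "BC"])
    return {"AB": CONCORDANT, "AC": "((A,C),B)", "BC": "((B,C),A)"}[first_pair]


def estimate_discordance(T: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of discordant gene trees."""
    rng = random.Random(seed)
    discordant = sum(simulate_gene_tree(T, rng) != CONCORDANT for _ in range(n))
    return discordant / n


T = 0.5
print(round(p_discordant(T), 4))  # 0.4044
print(estimate_discordance(T))    # Monte Carlo estimate, close to 0.4044
```

Adding duplications, transfers and losses on top of this backwards-in-time process is exactly where the forward/backward mismatch described in the abstract arises.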

The Multilocus Multispecies Coalescent: A Flexible New Model of Gene Family Evolution

10.1101/2020.05.07.081836 · 2020

Author(s): Qiuyi Li, Celine Scornavacca, Nicolas Galtier, Yao-Ban Chan

Keyword(s): Gene Family · Incomplete Lineage Sorting · Gene Tree · Gene Family Evolution · Gene Copy · Gene Trees · Species Trees · Lineage Sorting · New Model · Multispecies Coalescent

Expansion and accelerated evolution of 9-exon odorant receptors in Polistes paper wasps

Molecular Biology and Evolution · 10.1093/molbev/msab023 · 2021

Author(s): Andrew W Legan, Christopher M Jernigan, Sara E Miller, Matthieu F Fuchs, Michael J Sheehan

Keyword(s): Gene Family · Odorant Receptor · Receptor Gene · Gene Tree · Gene Family Evolution · Paper Wasp · Gene Copy · Evolutionary Divergence · Wasp Species · Or Gene

Independent origins of sociality in bees and ants are associated with independent expansions of particular odorant receptor (OR) gene subfamilies. In ants, one clade within the OR gene family, the 9-exon subfamily, has dramatically expanded. These receptors detect cuticular hydrocarbons (CHCs), key social signaling molecules in insects. It is unclear to what extent 9-exon OR subfamily expansion is associated with the independent evolution of sociality across Hymenoptera, warranting studies of taxa with independently derived social behavior. Here we describe odorant receptor gene family evolution in the northern paper wasp, Polistes fuscatus, and compare it to four additional paper wasp species spanning ∼40 million years of evolutionary divergence. We find 200 putatively functional OR genes in P. fuscatus, matching predictions from neuroanatomy, and more than half of these are in the 9-exon subfamily. Most OR gene expansions are tandemly arrayed at orthologous loci in Polistes genomes, and microsynteny analysis shows species-specific gain and loss of 9-exon ORs within tandem arrays. There is evidence of episodic positive diversifying selection shaping ORs in expanded subfamilies. Values of omega (dN/dS) are higher among 9-exon ORs than among other OR subfamilies. Within the Polistes OR gene tree, branches in the 9-exon OR clade experience relaxed negative (purifying) selection relative to other branches in the tree. Patterns of OR evolution within Polistes are consistent with 9-exon OR function in CHC perception by combinatorial coding, with both natural selection and neutral drift contributing to interspecies differences in gene copy number and sequence.

Integrating phylogenetic and network approaches to study gene family evolution: the case of the AGAMOUS family of floral genes

10.1101/195669 · 2017

Author(s): Daniel S. Carvalho, James C. Schnable, Ana Maria R. Almeida

Keyword(s): Gene Family · Functional Divergence · Gene Tree · Gene Families · Gene Family Evolution · Phylogenetic Methods · Combined Use · Approaches To Study · Additional Support · Network Approaches

The study of gene family evolution has benefited from the use of phylogenetic tools, which can greatly inform studies of both relationships within gene families and functional divergence. Here, we propose a network-based approach that, in combination with phylogenetic methods, can provide additional support for models of gene family evolution. We dissect the contributions of each method to an improved understanding of relationships and functions within the well-characterized AGAMOUS family of floral development genes. The results obtained with the two methods largely agreed with one another. In particular, we show how network approaches can improve the interpretation of branches with low support in a conventional gene tree. The network approach used here may also reflect known and suspected patterns of functional divergence better than phylogenetic methods do. Overall, we believe that the combined use of phylogenetic and network tools provides more robust assessments of gene family evolution.
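The abstract does not say how its network is built. As a generic illustration only, the sketch below assumes a sequence similarity network: genes are nodes, an edge joins two genes whose pairwise similarity reaches a chosen threshold, and connected components become candidate gene groups that can be compared with clades in the gene tree. The gene names, similarity scores and threshold are all hypothetical.

```python
def similarity_network(scores, threshold):
    """Undirected graph keeping only gene pairs whose similarity >= threshold."""
    graph = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Iterative depth-first search; each component is a candidate gene group."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical pairwise similarities (e.g. fraction identity) between five genes.
scores = {
    ("g1", "g2"): 0.82,
    ("g2", "g3"): 0.75,
    ("g3", "g4"): 0.31,
    ("g4", "g5"): 0.90,
    ("g1", "g4"): 0.20,
}
network = similarity_network(scores, threshold=0.5)
print(connected_components(network))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Because group membership here depends only on pairwise similarity, it does not rely on the support values of individual branches, which is one reason such a view can complement a poorly supported gene tree.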

Taxon sampling unequally affects individual nodes in a phylogenetic tree: consequences for model gene tree construction in SwissTree

10.1101/181966 · 2017 · Cited By ~ 3

Author(s): Brigitte Boeckmann, David Dylus, Sebastien Moretti, Adrian Altenhoff, Clément-Marie Train, ...

Keyword(s): Phylogenetic Trees · Phylogenetic Signal · Gene Tree · Evolutionary Relationship · Taxon Sampling · Gene Trees · Data Types · Large Gene · Tree Construction · Taxonomic Range

Medium to large phylogenetic gene trees constructed from datasets of different species density and taxonomic range are rarely topologically consistent because of missing phylogenetic signal, non-phylogenetic signal and error. In this study, we first use simulations to show that taxon sampling unequally affects nodes in a gene tree, which likely contributes to controversial conclusions from taxon sampling experiments and to contradictory species phylogenies, such as those for the boreoeutherians. Because it is unlikely that a large gene tree can be reconstructed correctly from a single optimized dataset, we take a two-step approach to the construction of model gene trees. First, stable and unstable clades are identified by comparing phylogenetic trees inferred from multiple datasets and data types (nucleotide, amino acid, codon) from the same gene family. Subsequently, data subsets are optimized for the analysis of individual uncertain clades. Results are summarized in the form of a model tree that illustrates the evolutionary relationships of gene loci. A case study shows how a seemingly complex gene phylogeny becomes increasingly consistent with the reference species tree through attentive taxon sampling and subtree analysis. The procedure is being progressively introduced into SwissTree (http://swisstree.vital-it.ch), a resource of high-confidence model gene (locus) trees. Finally, we demonstrate the usefulness of SwissTree for orthology benchmarking.
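The first step, separating stable from unstable clades across trees inferred from different data types, can be sketched as follows. This is not SwissTree's procedure in detail. It assumes rooted trees written as nested tuples and compares clades as leaf sets, whereas a comparison of unrooted trees would use bipartitions instead. A clade counts as stable only if it appears in every input tree. The three trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Leaf sets of all non-root internal nodes of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    root = walk(tree)
    found.discard(root)  # the full leaf set is trivially shared by every tree
    return found


def classify_clades(trees):
    """Stable clades occur in every input tree; unstable clades in only some."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    stable = {c for c, k in counts.items() if k == len(trees)}
    unstable = {c for c, k in counts.items() if k < len(trees)}
    return stable, unstable


# Hypothetical trees for one gene family inferred from three data types.
trees = {
    "nucleotide": ((("a", "b"), "c"), ("d", "e")),
    "amino acid": ((("a", "b"), "d"), ("c", "e")),
    "codon": ((("a", "b"), "c"), ("d", "e")),
}
stable, unstable = classify_clades(list(trees.values()))
print([sorted(c) for c in stable])  # [['a', 'b']]
```

In the two-step approach described above, the unstable clades (here the placement of c, d and e) would be the targets for the second step of dataset optimization and subtree analysis.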

PhyloToL: A Taxon/Gene-Rich Phylogenomic Pipeline to Explore Genome Evolution of Diverse Eukaryotes

Molecular Biology and Evolution · 10.1093/molbev/msz103 · 2019 · Vol 36 (8) · pp. 1831-1842 · Cited By ~ 6

Author(s): Mario A Cerón-Romero, Xyrus X Maurer-Alcalá, Jean-David Grattepanche, Ying Yan, Miguel M Fonseca, ...

Keyword(s): Gene Family · High Throughput Sequencing · Stop Codon · Tree Of Life · Gene Family Evolution · Third Party · Gene Trees · Sequence Alignments · Multiple Sequence · Membrane Pore

Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).

A Linear-Time Algorithm for the Isometric Reconciliation of Unrooted Trees

Algorithms · 10.3390/a13090225 · 2020 · Vol 13 (9) · pp. 225

Author(s): Broňa Brejová, Rastislav Královič

Keyword(s): Family History · Gene Family · Phylogenetic Trees · Evolutionary History · Linear Time · Gene Tree · Time Algorithm · Species Tree · Linear Time Algorithm · History Of

In the reconciliation problem, we are given two phylogenetic trees. A species tree represents the evolutionary history of a group of species, and a gene tree represents the history of a family of related genes within these species. A reconciliation maps nodes of the gene tree to the corresponding points of the species tree, and thus helps to interpret the gene family history. In this paper, we study the case when both trees are unrooted and their edge lengths are known exactly. The goal is to root them and to find a reconciliation that agrees with the edge lengths. We show a linear-time algorithm for finding the set of all possible root locations, which is a significant improvement compared to the previous O(N³ log N) algorithm.
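For readers new to reconciliation, the sketch below shows what "mapping nodes of the gene tree to the species tree" means in the classic rooted, topology-only setting (LCA mapping). It is not the paper's linear-time algorithm, which additionally roots two unrooted trees and requires the mapping to agree with known edge lengths. Trees are nested tuples, and gene names follow a hypothetical "<species>_<copy>" convention. Each gene node maps to the least common ancestor of its children's images, and it is labelled a duplication when it maps to the same species node as one of its children.

```python
def species_clades(species_tree):
    """Leaf sets of every node (leaves included) of a rooted species tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(walk(child) for child in node))
        found.append(members)
        return members

    walk(species_tree)
    return found


def lca(clades, species):
    # Clades of a tree are nested or disjoint, so the smallest superset is the LCA.
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree, species_tree, species_of):
    """Classic LCA mapping of a rooted gene tree onto a rooted species tree."""
    clades = species_clades(species_tree)
    events = []

    def walk(node):
        if isinstance(node, str):
            return lca(clades, frozenset([species_of(node)]))
        images = [walk(child) for child in node]
        here = lca(clades, frozenset().union(*images))
        events.append((sorted(here), "duplication" if here in images else "speciation"))
        return here

    walk(gene_tree)
    return events


species_tree = (("human", "mouse"), "fly")
gene_tree = ((("human_1", "mouse_1"), ("human_2", "mouse_2")), "fly_1")
for image, event in reconcile(gene_tree, species_tree, lambda g: g.split("_")[0]):
    print(image, event)
```

Here the node joining the two human/mouse subtrees maps to the human/mouse ancestor, as do both of its children, so it is inferred as a duplication; the other three internal nodes are speciations.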

MIPhy: Identify and quantify rapidly evolving members of large gene families

10.7287/peerj.preprints.26593 · 2018

Author(s): David M Curran, John S Gilleard, James D Wasmuth

Keyword(s): Gene Family · Phylogenetic Trees · Gene Tree · Gene Families · P450 Gene · C Elegans · Large Gene · Gene Duplication And Loss · Phylogenetic Signature · The Relationship

After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is duplication followed by specialization of existing genes, which leaves a phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing the gene family across several species and identifying patterns of gene duplication and loss that do not correlate with the known relationships between those species. This signature, termed phylogenetic instability, has been previously linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been previously linked to proteins with endogenous functions. Here, we present MIPhy, a method to identify and quantify phylogenetic instability by quantifying the incongruence of a gene’s evolutionary history. The motivation behind MIPhy was to produce a tool to aid in interpreting phylogenetic trees. It can predict which members of a gene family are under adaptive evolution, working only from a gene tree and the relationship between the species under consideration. We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds. Our predictions correlate very well with known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family and use MIPhy to predict genes that produce an observable phenotype when knocked down in C. elegans, and show that our predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy under a BSD 2-clause license. It is also available as an online web tool at http://miphy.wasmuthlab.org.

Population Structure of the Bacillus cereus Group as Determined by Sequence Analysis of Six Housekeeping Genes and the plcR Gene

Infection and Immunity · 10.1128/iai.72.9.5253-5261.2004 · 2004 · Vol 72 (9) · pp. 5253-5261 · Cited By ~ 73

Author(s): Kwan Soo Ko, Jong-Wan Kim, Jong-Man Kim, Wonyong Kim, Sang-in Chung, ...

Keyword(s): Population Structure · Bacillus Cereus · Phylogenetic Trees · Gene Tree · Housekeeping Genes · Housekeeping Gene · The Other · Gene Trees · History Of · Definition Of

The population structure of the Bacillus cereus group (52 strains of B. anthracis, B. cereus, and B. thuringiensis) was investigated by sequencing seven gene fragments (rpoB, gyrB, pycA, mdh, mbl, mutS, and plcR). Most of the strains were classifiable into two large subgroups in six housekeeping gene trees but not in the plcR tree. In addition, several consistent clusters were identified, which were unrelated to species distinction. Moreover, interrelationships among these clusters were incongruent in each gene tree. The incongruence length difference test and split decomposition analyses also showed incongruences between genes, suggesting horizontal gene transfer. The plcR gene was observed to have characteristics that differed from those of the other genes in terms of phylogenetic topology and pattern of sequence diversity. Thus, we suggest that the evolutionary history of the PlcR regulon differs from those of the other chromosomal genes and that recombination of the plcR gene may be frequent. The homogeneity of B. anthracis, which is depicted as an independent lineage in phylogenetic trees, is suggested to be of recent origin or to be due to the narrow taxonomic definition of species.

MIPhy: identify and quantify rapidly evolving members of large gene families

PeerJ · 10.7717/peerj.4873 · 2018 · Vol 6 · pp. e4873 · Cited By ~ 1

Author(s): David M. Curran, John S. Gilleard, James D. Wasmuth

Keyword(s): Gene Family · Adaptive Evolution · Phylogenetic Trees · Gene Tree · Gene Families · P450 Gene · C Elegans · Large Gene · Gene Duplication And Loss · Phylogenetic Signature

After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is duplication followed by specialization of existing genes. When this happens to a member of a gene family, it tends to leave a detectable phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing the gene family across several species and identifying patterns of gene duplication and loss that do not correlate with the known relationships between those species. This signature, termed phylogenetic instability, has been previously linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been previously linked to proteins with endogenous functions. With the increase in genome-level data, there is a need to identify and quantify phylogenetic instability. Here, we present Minimizing Instability in Phylogenetics (MIPhy), a tool that solves this problem by quantifying the incongruence of a gene’s evolutionary history. The motivation behind MIPhy was to produce a tool to aid in interpreting phylogenetic trees. It can predict which members of a gene family are under adaptive evolution, working only from a gene tree and the relationship between the species under consideration. While it does not estimate positive selection (the typical indication of adaptive evolution), its results tend to agree with such estimates. We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds. Our predictions correlate very well with known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family and use MIPhy to predict genes that produce an observable phenotype when knocked down in C. elegans, and show that our predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy and is also available as an online web tool at http://www.miphy.wasmuthlab.org.
