MIPhy: identify and quantify rapidly evolving members of large gene families

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4873 ◽  
Author(s):  
David M. Curran ◽  
John S. Gilleard ◽  
James D. Wasmuth

After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is duplication followed by specialization of existing genes. When this happens to a member of a gene family, it tends to leave a detectable phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing the gene family across several species and identifying patterns of gene duplication and loss that do not correlate with the known relationships between those species. This signature, termed phylogenetic instability, has been previously linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been previously linked to proteins with endogenous functions. With the increase in genome-level data, there is a need to identify and quantify phylogenetic instability. Here, we present Minimizing Instability in Phylogenetics (MIPhy), a tool that solves this problem by quantifying the incongruence of a gene’s evolutionary history. The motivation behind MIPhy was to produce a tool to aid in interpreting phylogenetic trees. It can predict which members of a gene family are under adaptive evolution, working only from a gene tree and the relationship between the species under consideration. While it does not conduct any estimation of positive selection, which is the typical indication of adaptive evolution, the results tend to agree. We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds. Our predictions correlate very well with known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family and use MIPhy to predict genes that produce an observable phenotype when knocked down in C. elegans, and show that our predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy and is also available as an online web tool at http://www.miphy.wasmuthlab.org.

2018 ◽  
Author(s):  
David M Curran ◽  
John S Gilleard ◽  
James D Wasmuth

After transitioning to a new environment, species often exhibit rapid phenotypic innovation. One of the fastest mechanisms for this is duplication followed by specialization of existing genes, which leaves a phylogenetic signature of lineage-specific expansions and contractions. These can be identified by analyzing the gene family across several species and identifying patterns of gene duplication and loss that do not correlate with the known relationships between those species. This signature, termed phylogenetic instability, has been previously linked to adaptations that change the way an organism samples and responds to its environment; conversely, low phylogenetic instability has been previously linked to proteins with endogenous functions. Here, we present MIPhy, a method to identify and quantify phylogenetic instability by quantifying the incongruence of a gene’s evolutionary history. The motivation behind MIPhy was to produce a tool to aid in interpreting phylogenetic trees. It can predict which members of a gene family are under adaptive evolution, working only from a gene tree and the relationship between the species under consideration. We demonstrate the usefulness of MIPhy by accurately predicting which members of the mammalian cytochrome P450 gene superfamily metabolize xenobiotics and which metabolize endogenous compounds. Our predictions correlate very well with known substrate specificities of the human enzymes. We also analyze the Caenorhabditis collagen gene family and use MIPhy to predict genes that produce an observable phenotype when knocked down in C. elegans, and show that our predictions correlate well with existing knowledge. The software can be downloaded and installed from https://github.com/dave-the-scientist/miphy under a BSD 2-clause license. It is also available as an online web tool at http://miphy.wasmuthlab.org.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1125
Author(s):  
Saminathan Subburaj ◽  
Luhua Tu ◽  
Kayoun Lee ◽  
Gwang-Soo Park ◽  
Hyunbae Lee ◽  
...  

Watermelon (Citrullus lanatus) is an economically important fruit crop grown for consumption of its large edible fruit flesh. Pentatricopeptide-repeat (PPR) encoding genes, one of the large gene families in plants, encode important RNA-binding proteins involved in the regulation of plant growth and development by influencing the expression of organellar mRNA transcripts. However, systematic information regarding the PPR gene family in watermelon remains scarce. In this comprehensive study, we identified and characterized a total of 422 C. lanatus PPR (ClaPPR) genes in the watermelon genome. Most ClaPPRs were intronless and were mapped across 12 chromosomes. Phylogenetic analysis showed that ClaPPR proteins could be divided into P and PLS subfamilies. Gene duplication analysis suggested that 11 pairs of segmentally duplicated genes existed. In-silico expression pattern analysis demonstrated that ClaPPRs may participate in the regulation of fruit development and ripening processes. Genotyping of 70 lines using 4 single nucleotide polymorphisms (SNPs) from 4 ClaPPRs resulted in match rates of over 0.87 for each validated SNP in correlation with the unique phenotypes of flesh color; these SNPs could be used to differentiate red, yellow, or orange watermelons in breeding programs. Our results provide significant insights for a comprehensive understanding of PPR genes and recommend further studies on their roles in watermelon fruit growth and ripening, which could be utilized for cultivar development of watermelon.


2017 ◽  
Author(s):  
Daniel S. Carvalho ◽  
James C. Schnable ◽  
Ana Maria R. Almeida

The study of gene family evolution has benefited from the use of phylogenetic tools, which can greatly inform studies of both relationships within gene families and functional divergence. Here, we propose the use of a network-based approach that, in combination with phylogenetic methods, can provide additional support for models of gene family evolution. We dissect the contributions of each method to the improved understanding of relationships and functions within the well-characterized family of AGAMOUS floral development genes. The results obtained with the two methods largely agreed with one another. In particular, we show how network approaches can provide improved interpretations of branches with low support in a conventional gene tree. The network approach used here may also better reflect known and suspected patterns of functional divergence relative to phylogenetic methods. Overall, we believe that the combined use of phylogenetic and network tools provides more robust assessments of gene family evolution.


2017 ◽  
Author(s):  
Brigitte Boeckmann ◽  
David Dylus ◽  
Sebastien Moretti ◽  
Adrian Altenhoff ◽  
Clément-Marie Train ◽  
...  

Medium to large phylogenetic gene trees constructed from datasets of different species density and taxonomic range are rarely topologically consistent because of missing phylogenetic signal, non-phylogenetic signal and error. In this study, we first use simulations to show that taxon sampling unequally affects nodes in a gene tree, which likely contributes to controversial conclusions from taxon sampling experiments and contradicting species phylogenies such as for the boreoeutherians. Hence, because it is unlikely that a large gene tree can be reconstructed correctly based on a single optimized dataset, we take a two-step approach for the construction of model gene trees. First, stable and unstable clades are identified by comparing phylogenetic trees inferred from multiple datasets and data types (nucleotide, amino acid, codon) from the same gene family. Subsequently, data subsets are optimized for the analysis of individual uncertain clades. Results are summarized in the form of a model tree that illustrates the evolutionary relationship of gene loci. A case study shows how a seemingly complex gene phylogeny becomes increasingly consistent with the reference species tree by attentive taxon sampling and subtree analysis. The procedure is progressively introduced to SwissTree (http://swisstree.vital-it.ch), a resource of high confidence model gene (locus) trees. Finally, we demonstrate the usefulness of SwissTree for orthology benchmarking.


2020 ◽  
Vol 12 (4) ◽  
pp. 381-395
Author(s):  
Nilson Da Rocha Coimbra ◽  
Aristoteles Goes-Neto ◽  
Vasco Azevedo ◽  
Aïda Ouangraoua

Horizontal gene transfer is a common mechanism in Bacteria that has contributed to the genomic content of existing organisms. Traditional methods for estimating bacterial phylogeny, however, assume only vertical inheritance in the evolution of homologous genes, which may result in errors in the estimated phylogenies. We present a new method for estimating bacterial phylogeny that accounts for the presence of genes acquired by horizontal gene transfer between genomes. The method identifies and corrects putative transferred genes in gene families, before applying a gene tree-based summary method to estimate bacterial species trees. The method was applied to estimate the phylogeny of the order Corynebacteriales, which is the largest clade in the phylum Actinobacteria. We report a collection of 14 phylogenetic trees on 360 Corynebacteriales genomes. All estimated trees display each genus as a monophyletic clade. The trees also display several relationships proposed by past studies, as well as new relevant relationships between and within the main genera of Corynebacteriales: Corynebacterium, Mycobacterium, Nocardia, Rhodococcus, and Gordonia. An implementation of the method in Python is available on GitHub at https://github.com/UdeS-CoBIUS/EXECT (last accessed April 2, 2020).


Algorithms ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 225
Author(s):  
Broňa Brejová ◽  
Rastislav Královič

In the reconciliation problem, we are given two phylogenetic trees. A species tree represents the evolutionary history of a group of species, and a gene tree represents the history of a family of related genes within these species. A reconciliation maps nodes of the gene tree to the corresponding points of the species tree, and thus helps to interpret the gene family history. In this paper, we study the case when both trees are unrooted and their edge lengths are known exactly. The goal is to root them and to find a reconciliation that agrees with the edge lengths. We show a linear-time algorithm for finding the set of all possible root locations, which is a significant improvement compared to the previous O(N^3 log N) algorithm.
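
To make the idea of a reconciliation concrete, the sketch below maps each gene-tree node to a species-tree node using the classic lowest-common-ancestor (LCA) mapping for rooted trees. This is not the linear-time algorithm for unrooted trees with edge lengths that the abstract describes; it is a simpler, well-known variant chosen for illustration. The tree encoding (nested 2-tuples with string leaves), the function names, and the toy gene and species labels are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

def leaf_set(tree) -> FrozenSet[str]:
    """Set of leaf labels under a rooted binary tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])

def all_clades(tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node in the tree; each leaf set identifies one node."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    return [leaf_set(tree)] + all_clades(tree[0]) + all_clades(tree[1])

def lca(species_clades, species: FrozenSet[str]) -> FrozenSet[str]:
    """Lowest species-tree node (smallest clade) that contains all given species."""
    return min((c for c in species_clades if species <= c), key=len)

def reconcile(gene_tree, gene_to_species: Dict[str, str], species_clades,
              events: List[Tuple[str, List[str]]]) -> FrozenSet[str]:
    """Map a gene-tree node to a species-tree clade and record one event per internal node."""
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]])
    left = reconcile(gene_tree[0], gene_to_species, species_clades, events)
    right = reconcile(gene_tree[1], gene_to_species, species_clades, events)
    here = lca(species_clades, left | right)
    # LCA-mapping rule: a node mapped to the same place as one of its children is a duplication.
    kind = "duplication" if here in (left, right) else "speciation"
    events.append((kind, sorted(here)))
    return here

# Hypothetical toy data: one duplication in the human/mouse ancestor.
species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("h1", "m1"), ("h2", "m2")), "c1")
gene_to_species = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "c1": "chicken"}

events: List[Tuple[str, List[str]]] = []
reconcile(gene_tree, gene_to_species, all_clades(species_tree), events)
duplications = [e for e in events if e[0] == "duplication"]
for kind, clade in events:
    print(kind, clade)
```

The quadratic clade lookup keeps the sketch short; a practical implementation would precompute parent pointers or use a constant-time LCA structure.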


2019 ◽  
Vol 20 (7) ◽  
pp. 1750 ◽  
Author(s):  
Ghulam Qanmber ◽  
Ji Liu ◽  
Daoqian Yu ◽  
Zhao Liu ◽  
Lili Lu ◽  
...  

Proline-rich extensin-like receptor kinases (PERKs) are an important class of receptor kinases in plants. Receptor kinases comprise large gene families in many plant species, including the 15 PERK genes in Arabidopsis. At present, there is no comprehensive published study of PERK genes in G. hirsutum. Our study identified 33 PERK genes in G. hirsutum. Phylogenetic analysis of conserved PERK protein sequences from 15 plant species grouped them into four well defined clades. The GhPERK gene family is an evolutionarily advanced gene family that lost its introns over time. Several cis-elements were identified in the promoter regions of the GhPERK genes that are important in regulating growth, development, light responses and the response to several stresses. In addition, we found evidence for gene loss or addition through segmental or whole genome duplication in cotton. Gene duplication and synteny analysis identified 149 orthologous/paralogous gene pairs. Ka/Ks values show that most GhPERK genes experienced strong purifying selection during the rapid evolution of the gene family. GhPERK genes showed high expression levels in leaves and during ovule development. Furthermore, the expression of GhPERK genes can be regulated by abiotic stresses and phytohormone treatments. Additionally, PERK genes could be involved in several molecular, biological and physiological processes that might be the result of functional divergence.


2021 ◽  
Author(s):  
Xavier Grau-Bové ◽  
Arnau Sebé-Pedrós

Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL) is a tool that automates the process of classifying clusters of orthologous genes from precomputed phylogenetic trees. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the Markov Clustering Algorithm (MCL) to identify orthology clusters and provide annotated gene family classifications. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs that can be used to obtain phylogeny-informed gene annotations and inform comparative genomics and gene family evolution analyses.
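
The species overlap step mentioned above can be sketched as follows: an internal node of the gene tree is treated as a duplication if its two child subtrees share at least one species, and as a speciation otherwise, and genes separated by a speciation node are called orthologs. This is a minimal illustration and not Possvm's code. The nested-tuple tree encoding, the `<species>_<gene id>` label convention, and the function names are assumptions made for this example, and the subsequent MCL clustering of the ortholog pairs is not shown.

```python
from itertools import product
from typing import FrozenSet, List, Set

def species_of(gene: str) -> str:
    # Assumption for this sketch: gene labels look like "<species>_<gene id>".
    return gene.split("_", 1)[0]

def ortholog_pairs(tree, pairs: Set[FrozenSet[str]]) -> List[str]:
    """Return the genes under `tree`, adding ortholog pairs found at speciation nodes to `pairs`."""
    if isinstance(tree, str):
        return [tree]
    left = ortholog_pairs(tree[0], pairs)
    right = ortholog_pairs(tree[1], pairs)
    left_species = {species_of(g) for g in left}
    right_species = {species_of(g) for g in right}
    if not (left_species & right_species):
        # No shared species between the two subtrees: speciation node,
        # so every left/right gene pair is orthologous.
        pairs.update(frozenset(p) for p in product(left, right))
    # Otherwise the node is a duplication and the left/right genes are paralogs.
    return left + right

# Hypothetical toy gene tree with one duplication before the human/mouse split.
gene_tree = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Ggal_A")

pairs: Set[FrozenSet[str]] = set()
ortholog_pairs(gene_tree, pairs)
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

In a full pipeline, the resulting pairs would form the edges of a graph that a clustering step such as MCL then partitions into orthology groups.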

