GeneSeqToFamily: the Ensembl Compara GeneTrees pipeline as a Galaxy workflow

2016 ◽  
Author(s):  
Anil S. Thanki ◽  
Nicola Soranzo ◽  
Wilfried Haerty ◽  
Robert P. Davey

Abstract
Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL and HomoloGene, to identify gene families and visualise syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences and provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families.
Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via the command line. Therefore, we have converted the command-line Ensembl Compara GeneTrees pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, and provides the additional wrappers and tools that are required to run the workflow.
Conclusions: GeneSeqToFamily represents the Ensembl Compara pipeline as a set of interconnected Galaxy tools, so that they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualise the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.

2016 ◽  
Author(s):  
Anil S. Thanki ◽  
Nicola Soranzo ◽  
Javier Herrero ◽  
Wilfried Haerty ◽  
Robert P. Davey

Abstract
Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes.
Findings: We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus, available within the Galaxy web platform as a visualisation plugin, which can be used to visualise gene trees generated by the GeneSeqToFamily workflow.
Availability: Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://[email protected] and [email protected]


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8813 ◽  
Author(s):  
Kyle T. David ◽  
Jamie R. Oaks ◽  
Kenneth M. Halanych

Background: Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting.
Methods: To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa.
Results: Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.
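As an illustration of the quantity compared in this abstract, the following minimal Python sketch computes ω = dN/dS for a pair of sister lineages and the absolute difference between them. The `Lineage` class, the function names, and all numeric values are hypothetical, and the absolute difference is only a simple stand-in for "similarity of ω"; it is not the statistic or the estimation procedure used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """A branch of a gene tree with pre-estimated substitution rates (hypothetical container)."""
    name: str
    dn: float  # nonsynonymous substitutions per nonsynonymous site
    ds: float  # synonymous substitutions per synonymous site


def omega(lineage: Lineage) -> float:
    """Return omega = dN/dS for one lineage; undefined when dS is not positive."""
    if lineage.ds <= 0:
        raise ValueError(f"dS must be positive for lineage {lineage.name!r}")
    return lineage.dn / lineage.ds


def omega_difference(a: Lineage, b: Lineage) -> float:
    """Absolute difference in omega between two sister lineages (illustrative measure only)."""
    return abs(omega(a) - omega(b))


if __name__ == "__main__":
    # Made-up values: sister lineages after a speciation node and after a duplication node.
    ortholog_a = Lineage("ortholog_a", dn=0.5, ds=2.0)
    ortholog_b = Lineage("ortholog_b", dn=0.6, ds=2.0)
    paralog_a = Lineage("paralog_a", dn=0.5, ds=2.0)
    paralog_b = Lineage("paralog_b", dn=1.5, ds=2.0)

    print("speciation pair:", omega_difference(ortholog_a, ortholog_b))
    print("duplication pair:", omega_difference(paralog_a, paralog_b))
```

In practice dN and dS would come from a codon-model fit to each gene tree rather than being supplied by hand; the sketch only shows how the per-lineage ratio and a pairwise comparison could be formed once those estimates exist.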


2020 ◽  
Author(s):  
N.M. Shaykhutdinov ◽  
G.V. Klink ◽  
S.K. Garushyants ◽  
O.S. Kozlova ◽  
A.V. Cherkasov ◽  
...  

Abstract: The sleeping chironomid Polypedilum vanderplanki is capable of anhydrobiosis, a striking example of adaptation to extreme desiccation. Tolerance to complete desiccation in this species is associated with the emergence of multiple paralogs of protective genes. One of the gene families highly expressed under anhydrobiosis and involved in this process is the protein-L-isoaspartate (D-aspartate) O-methyltransferases (PIMTs). Recently, a closely related anhydrobiotic midge from Malawi, P. pembai, was discovered; it tolerates complete desiccation much as P. vanderplanki does, but experiences more frequent desiccation-rehydration cycles due to differences in ecology. Here, we sequenced and assembled the genome of P. pembai and performed a population genomics analysis of several populations of P. vanderplanki and a population of P. pembai. We observe positive selection and radical changes in the genetic architecture of the PIMT locus between the two species, including multiple duplication events in the P. pembai lineage. In particular, PIMT-4, the most highly expressed of these PIMTs, is present in six copies in P. pembai; these copies differ in expression profiles, suggesting possible sub- or neofunctionalization. The nucleotide diversity (π) of the genomic region carrying these new genes is decreased in P. pembai, but not in the orthologous region carrying the ancestral gene in P. vanderplanki, providing evidence for a selective sweep associated with post-duplication adaptation in the former. Overall, our results suggest an extensive recent, and likely ongoing, adaptation of the mechanisms of anhydrobiosis.
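Nucleotide diversity (π), the statistic this abstract uses as evidence of a selective sweep, is conventionally the average number of pairwise differences per site among sampled sequences. The sketch below is a minimal Python version under simplifying assumptions: sequences are already aligned, of equal length, and free of gaps or missing data. The function name and example sequences are hypothetical and do not reflect the authors' pipeline.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average pairwise differences per site across aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_differences = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return total_differences / (len(pairs) * length)


if __name__ == "__main__":
    # Made-up alignments: a variable region and a region with no variation,
    # as might be seen after a sweep removes diversity.
    variable_region = ["AAAA", "AAAT", "AATT"]
    swept_region = ["AAAA", "AAAA", "AAAA"]
    print("pi (variable):", nucleotide_diversity(variable_region))
    print("pi (swept):", nucleotide_diversity(swept_region))
```

A reduced π in one genomic window relative to comparable regions is the kind of signal the abstract describes; real analyses would compute it in sliding windows from called variants and account for missing data.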


Development ◽  
1994 ◽  
Vol 1994 (Supplement) ◽  
pp. 125-133 ◽  
Author(s):  
Peter W. H. Holland ◽  
Jordi Garcia-Fernàndez ◽  
Nic A. Williams ◽  
Arend Sidow

All vertebrates possess anatomical features not seen in their closest living relatives, the protochordates (tunicates and amphioxus). Some of these features depend on developmental processes or cellular behaviours that are again unique to vertebrates. We are interested in the genetic changes that may have permitted the origin of these innovations. Gene duplication, followed by functional divergence of new genes, may be one class of mutation that permits major evolutionary change. Here we examine the hypothesis that gene duplication events occurred close to the origin and early radiation of the vertebrates. Genome size comparisons are compatible with the occurrence of duplications close to vertebrate origins; more precise insight comes from cloning and phylogenetic analysis of gene families from amphioxus, tunicates and vertebrates. Comparisons of Hox gene clusters, other homeobox gene families, Wnt genes and insulin-related genes all indicate that there was a major phase of gene duplication close to vertebrate origins, after divergence from the amphioxus lineage; we suggest there was probably a second phase of duplication close to jawed vertebrate origins. From amphioxus and vertebrate homeobox gene expression patterns, we suggest that there are multiple routes by which new genes arising from gene duplication acquire new functions and permit the evolution of developmental innovations.


2017 ◽  
Author(s):  
David M. Emms ◽  
Steven Kelly

Abstract: The correct interpretation of a phylogenetic tree is dependent on it being correctly rooted. A gene duplication event at the base of a clade of species is synapomorphic, and thus excludes the root of the species tree from that clade. We present STRIDE, a fast, effective, and outgroup-free method for species tree root inference from gene duplication events. STRIDE identifies sets of well-supported gene duplication events from cohorts of gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of the true root. We show that STRIDE infers the correct root of the species tree for a large range of simulated and real species sets. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for datasets where information is limited. Furthermore, application of STRIDE to inference of the origin of the eukaryotic tree resulted in a root probability distribution that was consistent with, but unable to distinguish between, leading hypotheses for the origin of the eukaryotes. In summary, STRIDE is a fast, scalable, and effective method for species tree root inference from genome-scale data.


2020 ◽  
Vol 117 (20) ◽  
pp. 10911-10920 ◽  
Author(s):  
Matt W. Giorgianni ◽  
Noah L. Dowell ◽  
Sam Griffin ◽  
Victoria A. Kassner ◽  
Jane E. Selegue ◽  
...  

The genetic origins of novelty are a central interest of evolutionary biology. Most new proteins evolve from preexisting proteins but the evolutionary path from ancestral gene to novel protein is challenging to trace, and therefore the requirements for and order of coding sequence changes, expression changes, or gene duplication are not clear. Snake venoms are important novel traits that are comprised of toxins derived from several distinct protein families, but the genomic and evolutionary origins of most venom components are not understood. Here, we have traced the origin and diversification of one prominent family, the snake venom metalloproteinases (SVMPs) that play key roles in subduing prey in many vipers. Genomic analyses of several rattlesnake (Crotalus) species revealed the SVMP family massively expanded from a single, deeply conserved adam28 disintegrin and metalloproteinase gene, to as many as 31 tandem genes in the Western Diamondback rattlesnake (Crotalus atrox) through a number of single gene and multigene duplication events. Furthermore, we identified a series of stepwise intragenic deletions that occurred at different times in the course of gene family expansion and gave rise to the three major classes of secreted SVMP toxins by sequential removal of a membrane-tethering domain, the cysteine-rich domain, and a disintegrin domain, respectively. Finally, we show that gene deletion has further shaped the SVMP complex within rattlesnakes, creating both fusion genes and substantially reduced gene complexes. These results indicate that gene duplication and intragenic deletion played essential roles in the origin and diversification of these novel biochemical weapons.


2005 ◽  
Vol 79 (22) ◽  
pp. 14095-14101 ◽  
Author(s):  
Karsten Suhre

Abstract: Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that a large part of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least in regard to large DNA viruses.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
David M. Emms ◽  
Steven Kelly

Abstract: Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder’s comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.

