PhyloToL: A Taxon/Gene-Rich Phylogenomic Pipeline to Explore Genome Evolution of Diverse Eukaryotes

2019 ◽  
Vol 36 (8) ◽  
pp. 1831-1842 ◽  
Author(s):  
Mario A Cerón-Romero ◽  
Xyrus X Maurer-Alcalá ◽  
Jean-David Grattepanche ◽  
Ying Yan ◽  
Miguel M Fonseca ◽  
...  

Abstract Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).

F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 2072
Author(s):  
Julien Pichon ◽  
Nicholas M. Luscombe ◽  
Charles Plessy

Background: Ascidians, a tunicate class, use a mitochondrial genetic code that is distinct from that of vertebrates and other invertebrates. Though it has been used to translate the coding sequences from other tunicate species on a case-by-case basis, it has not been investigated whether this can be done systematically. This is important because a) some tunicate mitochondrial sequences are currently translated with the invertebrate code by repositories such as NCBI GenBank, and b) uncertainties about the genetic code to use can complicate or introduce errors in phylogenetic studies based on translated mitochondrial protein sequences. Methods: We collected publicly available nucleotide sequences for non-ascidian tunicates including appendicularians such as Oikopleura dioica, translated them using the ascidian mitochondrial code, and built multiple sequence alignments covering all tunicate classes. Results: All tunicates studied here appear to translate AGR codons to glycine instead of serine (invertebrates) or as a stop codon (vertebrates), as initially described in ascidians. Among Oikopleuridae, we suggest further possible changes in the use of the ATA (Ile → Met) and TGA (Trp → Arg) codons. Conclusions: We recommend using the ascidian mitochondrial code in automatic translation pipelines of mitochondrial sequences for all tunicates. Further investigation is required for additional species-specific differences.
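The AGR difference described above can be illustrated with a minimal Python sketch. It assumes the codon assignments of NCBI translation tables 2 (vertebrate mitochondrial), 5 (invertebrate mitochondrial) and 13 (ascidian mitochondrial); the names `CODES` and `translate` are illustrative and do not come from the article, and the further codon changes suggested for Oikopleuridae are not modelled.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third position varies fastest).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Reassignments shared by the three mitochondrial codes compared here.
MITO_COMMON = {"ATA": "M", "TGA": "W"}

# The codes differ in how they read the AGR codons (AGA and AGG).
CODES = {
    "vertebrate_mt": {**STANDARD, **MITO_COMMON, "AGA": "*", "AGG": "*"},
    "invertebrate_mt": {**STANDARD, **MITO_COMMON, "AGA": "S", "AGG": "S"},
    "ascidian_mt": {**STANDARD, **MITO_COMMON, "AGA": "G", "AGG": "G"},
}


def translate(seq, code):
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    table = CODES[code]
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    cds = "ATGAGAAGGTGA"  # toy sequence: ATG AGA AGG TGA
    for name in CODES:
        print(name, translate(cds, name))
```

Translating the same toy sequence with each table shows why the choice of code matters: a pipeline using the vertebrate table would report premature stop codons where the ascidian table reads glycine.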


2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Jati Adiputra ◽  
Sridhar Jarugula ◽  
Rayapati A. Naidu

Abstract Background Grapevine leafroll disease is one of the most economically important viral diseases affecting grape production worldwide. Grapevine leafroll-associated virus 4 (GLRaV-4, genus Ampelovirus, family Closteroviridae) is one of the six GLRaV species documented in grapevines (Vitis spp.). GLRaV-4 is made up of several distinct strains that were previously considered as putative species. Currently known strains of GLRaV-4 stand apart from other GLRaV species in lacking the minor coat protein. Methods In this study, the complete genome sequences of three strains of GLRaV-4 from Washington State vineyards were determined using a combination of high-throughput sequencing, Sanger sequencing and RACE. The genome sequences of these three strains were compared with corresponding sequences of GLRaV-4 strains reported from other grapevine-growing regions. Phylogenetic analysis, SimPlot and the Recombination Detection Program (RDP) were used to identify putative recombination events among GLRaV-4 strains. Results The genome size of GLRaV-4 strain 4 (isolate WAMR-4), strain 5 (isolate WASB-5) and strain 9 (isolate WALA-9) from Washington State vineyards was determined to be 13,824 nucleotides (nt), 13,820 nt, and 13,850 nt, respectively. Multiple sequence alignments showed that an 11-nt sequence (5′-GTAATCTTTTG-3′) towards the 5′ terminus of the 5′ non-translated region (NTR) and a 10-nt sequence (5′-ATCCAGGACC-3′) towards the 3′ end of the 3′ NTR are conserved among the currently known GLRaV-4 strains. LR-106 isolate of strain 4 and Estellat isolate of strain 6 were identified as recombinants due to putative recombination events involving divergent sequences in the ORF1a from strain 5 and strain Pr. Conclusion Genome-wide analyses showed for the first time that recombination can occur between distinct strains of GLRaV-4 resulting in the emergence of genetically stable and biologically successful chimeric viruses. Although the origin of recombinant strains of GLRaV-4 remains elusive, intra-species recombination could be playing an important role in shaping genetic diversity and evolution of the virus and modulating the biology and epidemiology of GLRaV-4 strains.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 2072 ◽  
Author(s):  
Julien Pichon ◽  
Nicholas M. Luscombe ◽  
Charles Plessy

Background: Ascidians, a tunicate class, use a mitochondrial genetic code that is distinct from that of vertebrates and other invertebrates. Though it has been used to translate the coding sequences from other tunicate species on a case-by-case basis, it has not been investigated whether this can be done systematically. This is important because a) some tunicate mitochondrial sequences are currently translated with the invertebrate code by repositories such as NCBI GenBank, and b) uncertainties about the genetic code to use can complicate or introduce errors in phylogenetic studies based on translated mitochondrial protein sequences. Methods: We collected publicly available nucleotide sequences for non-ascidian tunicates including appendicularians such as Oikopleura dioica, translated them using the ascidian mitochondrial code, and built multiple sequence alignments covering all tunicate classes. Results: All tunicates studied here appear to translate AGR codons to glycine instead of serine (invertebrates) or as a stop codon (vertebrates), as initially described in ascidians. Among Oikopleuridae, we suggest further possible changes in the use of the ATA (Ile → Met) and TGA (Trp → Arg) codons. Conclusions: We recommend using the ascidian mitochondrial code in automatic translation pipelines of mitochondrial sequences for all tunicates. Further investigation is required for additional species-specific differences.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Kuan Hu ◽  
Yiming Tao ◽  
Juanni Li ◽  
Zhuang Liu ◽  
Xinyan Zhu ◽  
...  

CCN gene family members have recently been identified as multifunctional regulators involved in diverse biological functions, especially in vascular and skeletal development. In the present study, a comparative genomic and phylogenetic analysis was performed to show the similarities and differences in structure and function of CCNs from different organisms and to reveal their potential evolutionary relationship. First, CCN homologs from different metazoan species were identified. We then performed multiple sequence alignments, MEME analysis, and functional site prediction, which showed highly conserved structural features among metazoan CCNs. A phylogenetic tree was then constructed, which indicated that CCNs underwent extensive lineage-specific duplication events and lineage-specific expansion during evolution. In addition, comparative analysis of the genomic organization and chromosomal surroundings of CCN genes indicated a clear orthologous relationship among the counterparts in these species. Finally, based on these results, a potential evolutionary scenario was proposed to summarize the origin and evolution of the CCN gene family.


2021 ◽  
Author(s):  
Xavier Grau-Bové ◽  
Arnau Sebé-Pedrós

Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL) is a tool that automates the process of classifying clusters of orthologous genes from precomputed phylogenetic trees. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the Markov Clustering Algorithm (MCL) to identify orthology clusters and provide annotated gene family classifications. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs that can be used to obtain phylogeny-informed gene annotations and inform comparative genomics and gene family evolution analyses.
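The species overlap step mentioned above can be sketched in a few lines of Python. This is a simplified illustration and not Possvm's implementation: it assumes a rooted binary gene tree written as nested tuples, leaf names of the hypothetical form `Species_gene`, and it stops at listing orthologous pairs (the MCL clustering step is not shown). An internal node whose two child subtrees share no species is treated as a speciation, and genes on opposite sides of it are called orthologs; a node whose children share at least one species is treated as a duplication.

```python
from itertools import product


def species_of(leaf):
    """Extract the species prefix from a leaf named 'Species_gene' (assumed format)."""
    return leaf.split("_", 1)[0]


def ortholog_pairs(tree):
    """Return the set of orthologous gene pairs inferred by species overlap.

    `tree` is a rooted binary tree: a leaf is a string, an internal node a 2-tuple.
    """
    pairs = set()

    def visit(node):
        if isinstance(node, str):  # leaf: a single gene
            return [node]
        left, right = (visit(child) for child in node)
        left_species = {species_of(g) for g in left}
        right_species = {species_of(g) for g in right}
        # No shared species between the two sides: speciation node,
        # so every cross-side gene pair is orthologous.
        if not (left_species & right_species):
            pairs.update(frozenset(p) for p in product(left, right))
        # Otherwise the node is a duplication and contributes no pairs.
        return left + right

    visit(tree)
    return pairs


if __name__ == "__main__":
    # Toy tree: a duplication in the Hsap/Mmus ancestor, with Dmel as outgroup.
    toy = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Dmel_A")
    for pair in sorted(sorted(p) for p in ortholog_pairs(toy)):
        print(pair)
```

In the toy tree, the A1 and A2 copies are paralogs of each other, while the single outgroup gene is orthologous to all four ingroup genes. An orthology graph built from such pairs is the kind of input a clustering step like MCL would then partition.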


2020 ◽  
Author(s):  
Qiuyi Li ◽  
Celine Scornavacca ◽  
Nicolas Galtier ◽  
Yao-Ban Chan

Abstract Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T) and loss (L). These processes are usually modelled independently, but in reality, ILS can affect gene copy number polymorphism, i.e., interfere with DTL. This has been previously recognised, but not treated in a satisfactory way, mainly because DTL events are naturally modelled forward-in-time, while ILS is naturally modelled backwards-in-time with the coalescent. Here we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realised rate of D, T and L becomes non-homogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent (MLMSC), which also accounts for any level of linkage between loci, generalises the multispecies coalescent model and offers a versatile, powerful framework for proper simulation and inference of gene family evolution.


2020 ◽  
Author(s):  
Esaie Kuitche Kamela ◽  
Marie Degen ◽  
Shengrui Wang ◽  
Aïda Ouangraoua

Abstract Constructing accurate gene trees is important, as gene trees play a key role in several biological studies, such as species tree reconstruction, gene functional analysis and gene family evolution studies. The accuracy of these studies is dependent on the accuracy of the input gene trees. Although several methods have been developed for improving the construction and the correction of gene trees by making use of the relationship with a species tree in addition to multiple sequence alignment, there is still considerable room for improvement in the accuracy of gene trees and in computing time. In particular, accounting for alternative splicing, which allows eukaryotic genes to produce multiple transcripts/proteins per gene, is a way to improve the quality of multiple sequence alignments used by gene tree reconstruction methods. Current methods for gene tree reconstruction usually make use of a set of transcripts composed of one representative transcript per gene, to generate multiple sequence alignments which are then used to estimate gene trees. Thus, the accuracy of the estimated gene tree depends on the choice of the representative transcripts. In this work, we present an alternative-splicing-aware method called Splicing Homology Transcript (SHT) method to estimate gene trees based on wisely selecting an accurate set of homologous transcripts to represent the genes of a gene family. We introduce a new similarity measure between transcripts for quantifying the level of homology between transcripts by combining a splicing structure-based similarity score with a sequence-based similarity score. We present a new method to cluster transcripts into a set of splicing homology groups based on the new similarity measure. The method is applied to reconstruct gene trees of the Ensembl database gene families, and a comparison with current EnsemblCompara gene trees is performed. The results show that the new approach improves gene tree accuracy thanks to the use of the new similarity measure between transcripts. An implementation of the method as well as the data used and generated in this work are available at https://github.com/UdeS-CoBIUS/SplicingHomologGeneTree/.


2016 ◽  
Author(s):  
Sereina Rutschmann ◽  
Harald Detering ◽  
Sabrina Simon ◽  
Jakob Fredslund ◽  
Michael T. Monaghan

Abstract High-throughput sequencing has laid the foundation for fast and cost-effective development of phylogenetic markers. Here we present the program DISCOMARK, which streamlines the development of nuclear DNA (nDNA) markers from whole-genome (or whole-transcriptome) sequencing data, combining local alignment, alignment trimming, reference mapping and primer design based on multiple sequence alignments in order to design primer pairs from input orthologous sequences. In order to demonstrate the suitability of DISCOMARK we designed markers for two groups of species, one consisting of closely related species and one group of distantly related species. For the closely related members of the species complex of Cloeon dipterum s.l. (Insecta, Ephemeroptera), the program discovered a total of 78 markers. Among these, we selected eight markers for amplification and Sanger sequencing. The exon sequence alignments (2,526 base pairs (bp)) were used to reconstruct a well supported phylogeny and to infer clearly structured haplotype networks. For the distantly related species we designed primers for several families in the insect order Ephemeroptera, using available genomic data from four sequenced species. We developed primer pairs for 23 markers that are designed to amplify across several families. The DISCOMARK program will enhance the development of new nDNA markers by providing a streamlined, automated approach to perform genome-scale scans for phylogenetic markers. The program is written in Python, released under a public license (GNU GPL v2), and together with a manual and example data set available at: https://github.com/hdetering/discomark.


2019 ◽  
Author(s):  
Julien Pichon ◽  
Nicholas M. Luscombe ◽  
Charles Plessy

Abstract Background: Ascidians, a tunicate class, use a mitochondrial genetic code that is distinct from that of vertebrates and other invertebrates. Though it has been used to translate the coding sequences from other tunicate species on a case-by-case basis, it has not been investigated whether this can be done systematically. This is important because a) some tunicate mitochondrial sequences are currently translated with the invertebrate code by repositories such as NCBI’s GenBank, and b) uncertainties about the genetic code to use can complicate or introduce errors in phylogenetic studies based on translated mitochondrial protein sequences. Methods: We collected publicly available nucleotide sequences for non-ascidian tunicates including appendicularians such as Oikopleura dioica, translated them using the ascidian mitochondrial code, and built multiple sequence alignments covering all tunicate classes. Results: All tunicates studied here appear to translate AGR codons to glycine instead of serine (invertebrates) or as a stop codon (vertebrates), as initially described in ascidians. Among Oikopleuridae, we suggest further possible changes in the use of the ATA (Ile → Met) and TGA (Trp → Arg) codons. Conclusions: We recommend using the ascidian mitochondrial code in automatic translation pipelines of mitochondrial sequences for all tunicates. Further investigation is required for additional species-specific differences.


2020 ◽  
Vol 36 (18) ◽  
pp. 4822-4824 ◽  
Author(s):  
Nicolas Comte ◽  
Benoit Morel ◽  
Damir Hasić ◽  
Laurent Guéguen ◽  
Bastien Boussau ◽  
...  

Abstract Motivation Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. Results We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. Availability and implementation Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.

