Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data

2016
Author(s): Xing-Xing Shen, Xiaofan Zhou, Jacek Kominek, Cletus P. Kurtzman, Chris Todd Hittinger, ...

Abstract Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multi-locus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited consideration for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing 9 of the 11 major lineages and 10 non-yeast fungal outgroups to generate a 1,233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships among the 9 major lineages. Aside from the lineage comprising the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, 8 of the 93 internodes conflicted between analyses or data sets, including the placements of the clade defined by species that have reassigned the CUG codon to encode serine instead of leucine, the clade defined by a whole genome duplication, and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, available at http://y1000plus.org/blast.

2016, Vol 6 (12), pp. 3927-3939
Author(s): Xing-Xing Shen, Xiaofan Zhou, Jacek Kominek, Cletus P Kurtzman, Chris Todd Hittinger, ...

Abstract Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited consideration for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships among the nine major lineages. Aside from the lineage comprising the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of the clade defined by species that have reassigned the CUG codon to encode serine instead of leucine, the clade defined by a whole genome duplication, and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, available at http://y1000plus.org/blast.


Author(s): Robert S de Moya, Kazunori Yoshizawa, Kimberly K O Walden, Andrew D Sweet, Christopher H Dietrich, ...

Abstract The insect order Psocodea is a diverse lineage comprising both parasitic (Phthiraptera) and non-parasitic (Psocoptera) members. The extreme age and ecological diversity of the group may be associated with major genomic changes, such as base compositional biases that are expected to affect phylogenetic inference. Divergent morphology between parasitic and non-parasitic members has also obscured the origins of parasitism within the order. We conducted a phylogenomic analysis of the order Psocodea using both transcriptome and genome sequencing to obtain a data set of 2,370 orthologous genes. All phylogenomic analyses, including both concatenated and coalescent methods, suggest a single origin of parasitism within Psocodea, resolving conflicting results from previous studies. This phylogeny allows us to propose a stable ordinal-level classification scheme that retains significant taxonomic names from the historical scientific literature and reflects the evolution of the group as a whole. A dating analysis, with internal nodes calibrated by fossil evidence, suggests that parasitism originated before the K-Pg boundary. Nucleotide compositional biases are detected in third and first codon positions and result in the anomalous placement of the Amphientometae as sister to Psocomorpha when all nucleotide sites are analyzed. Likelihood mapping and quartet sampling methods demonstrate that base compositional biases can also affect quartet-based methods.


Author(s): Yuanning Li, Jacob L. Steenwyk, Ying Chang, Yan Wang, Timothy Y. James, ...

Abstract Phylogenomic studies based on genome-scale amounts of data have greatly improved understanding of the tree of life. Despite the diversity, ecological significance, and biomedical and industrial importance of Fungi, large-scale phylogenomic studies of the kingdom are lacking. Furthermore, several evolutionary relationships among major fungal lineages remain controversial, especially those at the base of the fungal phylogeny. To begin filling these gaps and assess progress toward a genome-scale phylogeny of the entire fungal kingdom, we compiled a phylogenomic data matrix of 290 genes from the genomes of 1,644 fungal species that includes representatives from most major fungal lineages; we also compiled 11 additional data matrices by subsampling genes or taxa based on filtering criteria previously shown to improve phylogenomic inference. Analyses of these 12 data matrices using concatenation- and coalescent-based approaches yielded a robust phylogeny of the kingdom in which ∼85% of internal branches were congruent across data matrices and approaches used. We found support for several relationships that have been historically contentious (e.g., the placement of Wallemiomycotina (Basidiomycota) as sister to Agaricomycotina), as well as evidence for polytomies likely stemming from episodes of ancient diversification (e.g., at the base of Basidiomycota). By examining the relative evolutionary divergence of taxonomic groups of equivalent rank, we found that fungal taxonomy is broadly aligned with genome sequence divergence, but we also identified lineages, such as the subphylum Saccharomycotina, where current taxonomic circumscription does not fully account for their high levels of evolutionary divergence. Our results provide a robust phylogenomic framework to explore the tempo and mode of fungal evolution and directions for future fungal phylogenetic and taxonomic studies.


2012, Vol 279 (1741), pp. 3282-3290
Author(s): Harald O. Letsch, Karen Meusemann, Benjamin Wipfler, Kai Schütte, Rolf Beutel, ...

In this study, we investigated the relationships among insect orders with a main focus on Polyneoptera (lower Neoptera: roaches, mantids, earwigs, grasshoppers, etc.) and Paraneoptera (thrips, lice, bugs in the wide sense). The relationships between and within these groups of insects are difficult to resolve because few informative molecular and morphological characters are available. Here, we provide the first phylogenomic expressed sequence tag (EST) data (ESTs are short subsequences of copy DNA (cDNA) sequences encoding proteins) for stick insects (Phasmatodea) and webspinners (Embioptera) to complement published EST data. As recent EST data sets are characterized by a heterogeneous distribution of available genes across taxa, we use different rationales to optimize the data matrix composition. Our results suggest a monophyletic origin of Polyneoptera and Eumetabola (Paraneoptera + Holometabola). However, we identified tree reconstruction artefacts (the human louse Pediculus humanus assigned to Odonata (damselflies and dragonflies) or to Holometabola (insects with complete metamorphosis), and the mayfly genus Baetis nested within Neoptera), which were most probably rooted in a data matrix composition bias due to the inclusion of sequence data from entire proteomes. Until entire proteomes are available for each species in phylogenomic analyses, this potential pitfall should be carefully considered.


2017
Author(s): Xiaofan Zhou, Xingxing Shen, Chris Todd Hittinger, Antonis Rokas

Abstract Phylogenetics has witnessed dramatic increases in the sizes of data matrices assembled to resolve branches of the tree of life, motivating the development of programs for fast, yet accurate, inference. For example, several fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these four programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this gap, we evaluated these four programs on 19 empirical phylogenomic data sets from diverse animal, plant, and fungal lineages with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimates. For concatenation-based species tree inference, IQ-TREE consistently achieved the best observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the relative performance of the programs. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.


2016
Author(s): K. Jun Tong, Nathan Lo, Simon Y W Ho

Reconstructing the timescale of the Tree of Life is one of the principal aims of evolutionary biology. This has been greatly aided by the development of the molecular clock, which enables evolutionary timescales to be estimated from genetic data. In recent years, high-throughput sequencing technology has led to an increase in the feasibility and availability of genome-scale data sets. These represent a rich source of biological information, but they also bring a set of analytical challenges. In this review, we provide an overview of phylogenomic dating and describe the challenges associated with analysing genome-scale data. We also report on recent phylogenomic estimates of the evolutionary timescales of mammals, birds, and insects.


2018
Author(s): Jerome Kelleher, Yan Wong, Patrick K. Albers, Anthony W. Wohns, Gil McVean

Abstract A central problem in evolutionary biology is to infer the full genealogical history of a set of DNA sequences. This history contains rich information about the forces that have influenced a sexually reproducing species. However, existing methods are limited: the most accurate cannot cope with more than a few dozen samples. With modern genetic data sets rapidly approaching millions of genomes, there is an urgent need for efficient inference methods to exploit such rich resources. We introduce an algorithm to infer whole-genome history that has accuracy comparable to the state of the art but can process around four orders of magnitude more sequences. Additionally, our method produces an “evolutionary encoding” of the original sequence data, enabling efficient access to genealogies and calculation of genetic statistics over the data. We apply this technique to human data from the 1000 Genomes Project, the Simons Genome Diversity Project, and UK Biobank, showing that the genealogies we estimate are both rich in biological signal and efficient to process.


2013, Vol 35 (6), pp. 685-694
Author(s): Ting-Zhang WANG, Gao SHAN, Jian-Hong XU, Qing-Zhong XUE

2021, Vol 22 (1)
Author(s): Eleanor F. Miller, Andrea Manica

Abstract
Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, a large volume of mtDNA data is available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help answer is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling.
Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions.
Conclusions: There is now more genetic information available than ever before, and large meta-datasets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains technically challenging. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.

