A robust phylogenomic timetree for biotechnologically and medically important fungi in the genera Aspergillus and Penicillium

2018 ◽  
Author(s):  
Jacob L. Steenwyk ◽  
Xing-Xing Shen ◽  
Abigail L. Lind ◽  
Gustavo H. Goldman ◽  
Antonis Rokas

Abstract Abbreviations: NT, nucleotide; AA, amino acid; CI, credible interval; RCV, relative composition variability; IC, internode certainty; GSF, gene support frequencies; GLS, gene-wise log-likelihood scores; DVMC, degree of violation of a molecular clock. The filamentous fungal family Aspergillaceae contains >1,000 known species, mostly in the genera Aspergillus and Penicillium. Several species are used in the food, biotechnology, and drug industries (e.g., Aspergillus oryzae, Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus, Penicillium digitatum). To infer a robust phylogeny and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. Phylogenies of the nucleotide and amino acid versions of this full data matrix as well as of five additional 834-gene data matrices constructed by subsampling the top 50% of genes according to different criteria associated with strong phylogenetic signal were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence). Examination of the topological agreement among these 36 phylogenies and measures of internode certainty identified 12/78 (15.4%) bipartitions that were incongruent and pinpointed the likely underlying contributing factors (incomplete lineage sorting, hybridization or introgression, and reconstruction artifacts associated with poor taxon sampling). Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the lower Cretaceous and the Aspergillus and Penicillium genera in the upper Cretaceous. Our results shed light on the ongoing debate on Aspergillus systematics and taxonomy and provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae. More broadly, our approach provides a general template for phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
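
The subsampling step and the count of 36 phylogenies described in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical per-gene score for which larger values mean stronger phylogenetic signal; the actual criteria and their orientation are not given in the abstract, and the function name and toy scores are illustrative only.

```python
def top_half(scores):
    """Return the names of the top 50% of genes, ranked by score.

    Assumption: `scores` maps gene name -> a hypothetical signal score
    where larger is better. Criteria where smaller is better would need
    to be negated before calling this function.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: len(ranked) // 2]


# Toy scores for a 1,668-gene matrix (values are made up for illustration).
scores = {f"gene{i}": float(i) for i in range(1668)}
subset = top_half(scores)

# One full matrix plus five subsampled matrices, each in nucleotide and
# amino acid form, each analyzed under three maximum likelihood schemes.
n_matrices = 1 + 5
n_phylogenies = n_matrices * 2 * 3
```

Under these assumptions, each subsampled matrix keeps 834 of the 1,668 genes, and the total of 36 phylogenies follows from 6 matrices, 2 data types, and 3 schemes.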

mBio ◽  
2019 ◽  
Vol 10 (4) ◽  
Author(s):  
Jacob L. Steenwyk ◽  
Xing-Xing Shen ◽  
Abigail L. Lind ◽  
Gustavo H. Goldman ◽  
Antonis Rokas

ABSTRACT The filamentous fungal family Aspergillaceae contains >1,000 known species, mostly in the genera Aspergillus and Penicillium. Several species are used in the food, biotechnology, and drug industries (e.g., Aspergillus oryzae and Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus and Penicillium digitatum). To infer a robust phylogeny and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. Phylogenies of the nucleotide and amino acid versions of this full data matrix as well as of several additional data matrices were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence) and using both site-homogenous and site-heterogeneous models (total of 64 species-level phylogenies). Examination of the topological agreement among these phylogenies and measures of internode certainty identified 11/78 (14.1%) bipartitions that were incongruent and pinpointed the likely underlying contributing factors, which included incomplete lineage sorting, hidden paralogy, hybridization or introgression, and reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the lower Cretaceous and that the Aspergillus and Penicillium genera originated in the upper Cretaceous. Our results shed light on the ongoing debate on Aspergillus systematics and taxonomy and provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae. More broadly, our approach provides a general template for phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life. 
IMPORTANCE Understanding the evolution of traits across technologically and medically significant fungi requires a robust phylogeny. Even though species in the Aspergillus and Penicillium genera (family Aspergillaceae, class Eurotiomycetes) are some of the most significant technologically and medically relevant fungi, we still lack a genome-scale phylogeny of the lineage or knowledge of the parts of the phylogeny that exhibit conflict among analyses. Here, we used a phylogenomic approach to infer evolutionary relationships among 81 genomes that span the diversity of Aspergillus and Penicillium species, to identify conflicts in the phylogeny, and to determine the likely underlying factors of the observed conflicts. Using a data matrix comprised of 1,668 genes, we found that while most branches of the phylogeny of the Aspergillaceae are robustly supported and recovered irrespective of method of analysis, a few exhibit various degrees of conflict among our analyses. Further examination of the observed conflict revealed that it largely stems from incomplete lineage sorting and hybridization or introgression. Our analyses provide a robust and comprehensive evolutionary genomic roadmap for this important lineage, which will facilitate the examination of the diverse technologically and medically relevant traits of these fungi in an evolutionary context.


2019 ◽  
Author(s):  
Thomas Flouris ◽  
Xiyun Jiao ◽  
Bruce Rannala ◽  
Ziheng Yang

Abstract Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here we implement the multispecies-coalescent-with-introgression (MSci) model, an extension of the multispecies-coalescent (MSC) model to incorporate introgression, in our Bayesian Markov chain Monte Carlo (MCMC) program BPP. The MSci model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Re-analysis of datasets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. Using genomic sequence data from six mosquito species in the Anopheles gambiae species complex, we estimated the introgression probability, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.


2017 ◽  
Author(s):  
Jacob Daniel Washburn

Most plants convert sunlight into chemical energy using a process known as C3 photosynthesis. However, some of the world's most successful plants instead use the C4 photosynthetic pathway which allows them to more efficiently use water, nitrogen, and solar energy. In the past 30 million years, C4 photosynthesis has convergently evolved from C3 over 60 times and new lineages are in the process of evolving even today. Because of this complex evolutionary history, C4 is not "one" uniform photosynthetic type, but a diverse collection of photosynthetic sub-types that are classically grouped according to their use of three different biochemical pathways. The grass tribe Paniceae is especially interesting in this aspect because it contains all three of these biochemical subtypes as well as important food and bioenergy crops. To better understand the evolution of C4 photosynthesis, DNA and RNA sequencing were undertaken for various species from within the Paniceae and used for phylogenetic and comparative genomic studies. Cell type specific RNA expression profiling for the two major C4 cell types was also completed for representative species of each C4 sub-type. Streamlined bioinformatics pipelines for both chloroplast and nuclear phylogenetics were developed for processing the data. These analyses resulted in: 1) The first "genome scale" phylogenetic tree of the grass tribe Paniceae, 2) The clearest evidence to date of the evolutionary relationships between the three classically defined C4 sub-types, 3) The most convincing results to date that the chloroplast and nuclear phylogenies of the Paniceae are incongruent, 4) Evidence that this chloroplast nuclear incongruence is likely due to introgression and/or incomplete lineage sorting, and 5) Strong support for sub-type mixing as well as the existence of a PCK sub-type.


2021 ◽  
Author(s):  
Alexander Knyshov ◽  
Yana Hrytsenko ◽  
Robert Literman ◽  
Rachel S Schwartz

The position of some taxa on the Tree of Life remains controversial despite the increase in genomic data used to infer phylogenies. While analyzing large datasets alleviates stochastic errors, it does not prevent systematic errors in inference, caused by both biological (e.g., incomplete lineage sorting, hybridization) and methodological (e.g., incorrect modeling, erroneous orthology assessments) factors. In our study, we systematically investigated factors that could result in these controversies, using the treeshrew (Scandentia, Mammalia) as a study case. Recent studies have narrowed the phylogenetic position of treeshrews to three competing hypotheses: sister to primates and flying lemurs (Primatomorpha), sister to rodents and lagomorphs (Glires), or sister to a clade comprising all of these. We sampled 50 mammal species including three treeshrews, a selection of taxa from the potential sister groups, and outgroups. Using a large diverse set of loci, we assessed support for the alternative phylogenetic positions of treeshrews. A plurality of loci support treeshrews as sister to rodents and lagomorphs; however, only a few loci exhibit strong support for any hypothesis. Surprisingly, we found that a subset of loci that strongly support the monophyly of Primates support treeshrews as sister to primates and flying lemurs. The overall small magnitude of differences in phylogenetic signal among the alternative hypotheses suggests that these three groups diversified nearly simultaneously. However, with our large dataset and approach to examining support, we provide evidence for the hypothesis of treeshrews as sister to rodents and lagomorphs, while demonstrating why support for alternate hypotheses has been seen in prior work. We also suggest that locus selection can unwittingly bias results.


2021 ◽  
Author(s):  
Xing-Xing Shen ◽  
Jacob L Steenwyk ◽  
Antonis Rokas

Abstract Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
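
The definition of an inconsistent gene given in the abstract above (ΔGLS and ΔGQS of opposite sign) can be sketched as follows. The function name and the toy values are hypothetical, and treating a difference of exactly zero as "not inconsistent" is an assumption, since the abstract does not say how ties are handled.

```python
def is_inconsistent(delta_gls, delta_gqs):
    """Return True when the two signals favor different topologies.

    delta_gls: gene-wise log-likelihood score for T1 minus that for T2.
    delta_gqs: gene-wise quartet score for T1 minus that for T2.
    Assumption: a difference of exactly zero counts as not inconsistent.
    """
    return (delta_gls > 0 and delta_gqs < 0) or (delta_gls < 0 and delta_gqs > 0)


# Toy (ΔGLS, ΔGQS) pairs; the numbers are made up for illustration.
genes = {
    "geneA": (12.4, 3.0),   # both signals favor T1
    "geneB": (5.1, -2.0),   # likelihood favors T1, quartets favor T2
    "geneC": (-0.8, 4.0),   # likelihood favors T2, quartets favor T1
    "geneD": (-3.3, -1.0),  # both signals favor T2
}

inconsistent = sorted(
    name for name, (gls, gqs) in genes.items() if is_inconsistent(gls, gqs)
)
fraction = len(inconsistent) / len(genes)
```

In this toy set, geneB and geneC are flagged, giving an inconsistent fraction of 0.5; the 30% to 36% reported in the abstract refers to the real data matrices, not to this example.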


Mycologia ◽  
2018 ◽  
Vol 110 (1) ◽  
pp. 104-117 ◽  
Author(s):  
Quandt ◽  
Patterson ◽  
Spatafora

Host specialization is common among parasitic fungi; however, there are examples when transitions in host specificity between disparately related hosts have occurred. Here, we examine the interkingdom host jump from insect pathogenicity and mycoparasitism in Tolypocladium. Previous phylogenetic inferences made using only a few genes and with poor support reconstructed an ancestral character state of insect pathogenesis, a transition to mycoparasitism, and reversions to insect pathogenesis. To further explore the directionality and genes underlying the transitions in host, we sequenced two additional species of Tolypocladium (T. capitatum and T. paradoxum) and used phylogenomics to compare two insect pathogens and two mycoparasites. Our whole-genome-scale analysis suggests that the diversification of Tolypocladium species happened relatively quickly and that the truffle parasites form a monophyletic, derived lineage within the genus that is the result of a single ecological transition or host jump from insects to fungi. A significant amount of gene tree/species tree discordance occurs within the data set, and we infer this to be the product of both an historical hybridization event and incomplete lineage sorting that was likely because of the rapid diversification of the clade. Furthermore, comparative genomic analyses revealed a set of genes that are exclusive to the mycoparasitic species. These potentially mycoparasitic gene clusters were characterized by a reduced proportion of secreted proteins when compared with entomopathogen-enriched genes and involved the reshaping of the fungal secretome in the ecological context of mycoparasitism.


2014 ◽  
Author(s):  
Yi-Chieh Wu ◽  
Mukul S Bansal ◽  
Matthew D Rasmussen ◽  
Javier Herrero ◽  
Manolis Kellis

Model organisms can serve the biological and medical community by enabling the study of conserved gene families and pathways in experimentally-tractable systems. Their use, however, hinges on the ability to reliably identify evolutionary orthologs and paralogs with high accuracy, which can be a great challenge at both small and large evolutionary distances. Here, we present a phylogenomics-based approach for the identification of orthologous and paralogous genes in human, mouse, fly, and worm, which forms the foundation of the comparative analyses of the modENCODE and mouse ENCODE projects. We study a median of 16,101 genes across 2 mammalian genomes (human, mouse), 12 Drosophila genomes, 5 Caenorhabditis genomes, and an outgroup yeast genome, and demonstrate that accurate inference of evolutionary relationships and events across these species must account for frequent gene-tree topology errors due to both incomplete lineage sorting and insufficient phylogenetic signal. Furthermore, we show that integration of two separate phylogenomic pipelines yields increased accuracy, suggesting that their sources of error are independent, and finally, we leverage the resulting annotation of homologous genes to study the functional impact of gene duplication and loss in the context of rich gene expression and functional genomic datasets of the modENCODE, mouse ENCODE, and human ENCODE projects.


Author(s):  
Amanda Patsis ◽  
Rick P. Overson ◽  
Krissa A. Skogen ◽  
Norman J. Wickett ◽  
Matthew G. Johnson ◽  
...  

Oenothera sect. Pachylophus has proven to be a valuable system in which to study plant-insect coevolution and the drivers of variation in floral morphology and scent. Current species circumscriptions based on morphological characteristics suggest that the section consists of five species, one of which is subdivided into five subspecies. Previous attempts to understand species (and subspecies) relationships at a molecular level have been largely unsuccessful due to high levels of incomplete lineage sorting and limited phylogenetic signal from slowly evolving gene regions. In the present study, target enrichment was used to sequence 322 conserved protein-coding nuclear genes from 50 individuals spanning the geographic range of Oenothera sect. Pachylophus, with species trees inferred using concatenation and coalescent-based methods. Our findings concur with previous research in suggesting that O. psammophila and O. harringtonii are nested within a paraphyletic Oenothera cespitosa. By contrast, our results show clearly that the two annual species (O. cavernae and O. brandegeei) did not arise from the O. cespitosa lineage, but rather from a common ancestor of Oenothera sect. Pachylophus. Budding speciation as a result of edaphic specialization appears to best explain the evolution of the narrow endemic species O. harringtonii and O. psammophila. Complete understanding of possible introgression among subspecies of O. cespitosa will require broader sampling across the full geographical and ecological ranges of these taxa.


2020 ◽  
Vol 70 (1) ◽  
pp. 49-66 ◽  
Author(s):  
Paul M Hime ◽  
Alan R Lemmon ◽  
Emily C Moriarty Lemmon ◽  
Elizabeth Prendini ◽  
Jeremy M Brown ◽  
...  

Abstract Molecular phylogenies have yielded strong support for many parts of the amphibian Tree of Life, but poor support for the resolution of deeper nodes, including relationships among families and orders. To clarify these relationships, we provide a phylogenomic perspective on amphibian relationships by developing a taxon-specific Anchored Hybrid Enrichment protocol targeting hundreds of conserved exons which are effective across the class. After obtaining data from 220 loci for 286 species (representing 94% of the families and 44% of the genera), we estimate a phylogeny for extant amphibians and identify gene tree–species tree conflict across the deepest branches of the amphibian phylogeny. We perform locus-by-locus genealogical interrogation of alternative topological hypotheses for amphibian monophyly, focusing on interordinal relationships. We find that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians. Our results overwhelmingly support amphibian monophyly and a sister relationship between frogs and salamanders, consistent with the Batrachia hypothesis. Species tree analyses converge on a small set of topological hypotheses for the relationships among extant amphibian families. These results clarify several contentious portions of the amphibian Tree of Life, which in conjunction with a set of vetted fossil calibrations, support a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported. More broadly, our study provides insight into the sources, magnitudes, and heterogeneity of support across loci in phylogenomic data sets. [AIC; Amphibia; Batrachia; Phylogeny; gene tree–species tree discordance; genomics; information theory.]


2019 ◽  
Vol 69 (3) ◽  
pp. 431-444 ◽  
Author(s):  
Emily J Roycroft ◽  
Adnan Moussalli ◽  
Kevin C Rowe

Abstract The estimation of robust and accurate measures of branch support has proven challenging in the era of phylogenomics. In data sets of potentially millions of sites, bootstrap support for bifurcating relationships around very short internal branches can be inappropriately inflated. Such overestimation of branch support may be particularly problematic in rapid radiations, where phylogenetic signal is low and incomplete lineage sorting severe. Here, we explore this issue by comparing various branch support estimates under both concatenated and coalescent frameworks, in the recent radiation of Australo-Papuan murine rodents (Muridae: Hydromyini). Using nucleotide sequence data from 1245 independent loci and several phylogenomic inference methods, we unequivocally resolve the majority of genus-level relationships within Hydromyini. However, at four nodes we recover inconsistency in branch support estimates both within and among concatenated and coalescent approaches. In most cases, concatenated likelihood approaches using standard fast bootstrap algorithms did not detect any uncertainty at these four nodes, regardless of partitioning strategy. However, we found this could be overcome with two-stage resampling, that is, across genes and sites within genes (using -bsam GENESITE in IQ-TREE). In addition, low confidence at recalcitrant nodes was recovered using UFBoot2, a recent revision to the bootstrap protocol in IQ-TREE, but this depended on partitioning strategy. Summary coalescent approaches also failed to detect uncertainty under some circumstances. For each of four recalcitrant nodes, an equivalent (or close to equivalent) number of genes were in strong support (>75% bootstrap) of both the primary and at least one alternative topological hypothesis, suggesting notable phylogenetic conflict among loci not detected using some standard branch support metrics. Recent debate has focused on the appropriateness of concatenated versus multigenealogical approaches to resolving species relationships, but less so on accurately estimating uncertainty in large data sets. Our results demonstrate the importance of employing multiple approaches when assessing confidence and highlight the need for greater attention to the development of robust measures of uncertainty in the era of phylogenomics.
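
The two-stage resampling mentioned in the abstract above (across genes, then across sites within genes) can be illustrated with a minimal sketch. This shows the general idea only and is not IQ-TREE's implementation; the function name, the data layout (a dict mapping each gene to a list of alignment columns), and the toy alignment are all assumptions.

```python
import random


def genesite_resample(alignment_by_gene, rng):
    """Draw one two-stage bootstrap replicate.

    Stage 1: sample genes with replacement, as many as in the input.
    Stage 2: within each sampled gene, sample alignment columns (sites)
    with replacement, keeping that gene's original length.
    """
    genes = list(alignment_by_gene)
    replicate = []
    for _ in genes:
        gene = rng.choice(genes)
        columns = alignment_by_gene[gene]
        resampled = [rng.choice(columns) for _ in columns]
        replicate.append((gene, resampled))
    return replicate


# Toy alignment: each column holds one character per taxon (two taxa here).
toy = {"g1": ["AA", "AC", "AG"], "g2": ["TT", "TC"]}
rep = genesite_resample(toy, random.Random(0))
```

Each replicate contains as many genes as the input, and a gene may appear more than once or not at all, which is what lets conflict among loci show up as reduced support.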

