phyloFlash – Rapid SSU rRNA profiling and targeted assembly from metagenomes

2019 ◽  
Author(s):  
Harald R. Gruber-Vodicka ◽  
Brandon K. B. Seah ◽  
Elmar Pruesse

ABSTRACT The SSU rRNA gene is the key marker in molecular ecology for all domains of life, but is largely absent from metagenome-assembled genomes that often are the only resource available for environmental microbes. Here we present phyloFlash, a pipeline to overcome this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based binning of full metagenomic assemblies. We show that a cleanup of artifacts is pivotal even with a curated reference database. With such a filtered database, the general-purpose mapper BBmap extracts SSU rRNA reads five times faster than the rRNA-specialized tool SortMeRNA with similar sensitivity and higher selectivity on simulated metagenomes. Reference-based targeted assemblers yielded either highly fragmented assemblies or high levels of chimerism, so we employ the general-purpose genomic assembler SPAdes. Our optimized implementation is independent of reference database composition and has satisfactory levels of chimera formation. Using the phyloFlash workflow we could recover the first complete genomes of several enigmatic taxa, including Marinamargulisbacteria from surface ocean seawater. phyloFlash quickly processes Illumina (meta)genomic data, is straightforward to use, even as part of high-throughput quality control, and has user-friendly output reports. The software is available at https://github.com/HRGV/phyloFlash (GPL3 license) and is documented with an online manual.

mSystems ◽  
2020 ◽  
Vol 5 (5) ◽  
Author(s):  
Harald R. Gruber-Vodicka ◽  
Brandon K. B. Seah ◽  
Elmar Pruesse

ABSTRACT The small-subunit rRNA (SSU rRNA) gene is the key marker in molecular ecology for all domains of life, but it is largely absent from metagenome-assembled genomes that often are the only resource available for environmental microbes. Here, we present phyloFlash, a pipeline to overcome this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based binning of full metagenomic assemblies. We show that a cleanup of artifacts is pivotal even with a curated reference database. With such a filtered database, the general-purpose mapper BBmap extracts SSU rRNA reads five times faster than the rRNA-specialized tool SortMeRNA with similar sensitivity and higher selectivity on simulated metagenomes. Reference-based targeted assemblers yielded either highly fragmented assemblies or high levels of chimerism, so we employ the general-purpose genomic assembler SPAdes. Our optimized implementation is independent of reference database composition and has satisfactory levels of chimera formation. phyloFlash quickly processes Illumina (meta)genomic data, is straightforward to use, even as part of high-throughput quality control, and has user-friendly output reports. The software is available at https://github.com/HRGV/phyloFlash (GPL3 license) and is documented with an online manual. IMPORTANCE To track organisms across all domains of life, the SSU rRNA gene is the gold standard. Many environmental microbes are known only from high-throughput sequence data, but the SSU rRNA gene, the key to visualization by molecular probes and link to existing literature, is often missing from metagenome-assembled genomes (MAGs). The easy-to-use phyloFlash software suite tackles this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based linking to MAGs. Starting from a cleaned reference database, phyloFlash profiles the taxonomic diversity and assembles the sorted SSU rRNA reads. 
The phyloFlash design is domain agnostic and covers eukaryotes, archaea, and bacteria alike. phyloFlash also provides utilities to visualize multisample comparisons and to integrate the recovered SSU rRNAs in a metagenomics workflow by linking them to MAGs using assembly graph parsing.


2018 ◽  
Author(s):  
Jürgen F. H. Strassert ◽  
Elisabeth Hehenberger ◽  
Javier del Campo ◽  
Noriko Okamoto ◽  
Martin Kolisko ◽  
...  

ABSTRACT Spores of the dinoflagellate Chytriodinium are known to infest copepod eggs causing their lethality. Despite the potential to control the population of such an ecologically important host, knowledge about Chytriodinium parasites is limited: we know little about phylogeny, parasitism, abundance, or geographical distribution. We carried out genome sequence surveys on four manually isolated sporocytes from the same sporangium to analyse the phylogenetic position of Chytriodinium based on SSU and concatenated SSU/LSU rRNA gene sequences, and also characterize two genes related to the plastidial heme pathway, hemL and hemY. The results suggest the presence of a cryptic plastid in Chytriodinium and a photosynthetic ancestral state of the parasitic Chytriodinium/Dissodinium clade. Finally, by mapping Tara Oceans V9 SSU amplicon data to the recovered SSU rRNA gene sequences from the sporocytes, we show that globally, Chytriodinium parasites are most abundant within the pico/nano- and mesoplankton of the surface ocean and almost absent within microplankton, a distribution indicating that they generally exist either as free-living spores or host-associated sporangia.


2018 ◽  
Author(s):  
Javier del Campo ◽  
Thierry Heger ◽  
Raquel Rodríguez-Martínez ◽  
Alexandra Z. Worden ◽  
Thomas A. Richards ◽  
...  

Apicomplexans are a group of microbial eukaryotes that contain some of the most well-studied parasites, including widespread intracellular pathogens of mammals such as Toxoplasma and Plasmodium (the agent of malaria), and emergent pathogens like Cryptosporidium and Babesia. Decades of research have illuminated the pathogenic mechanisms, molecular biology, and genomics of model apicomplexans, but we know surprisingly little about their diversity and distribution in natural environments. In this study we analyze the distribution of apicomplexans across a range of both host-associated and free-living environments, covering animal hosts from cnidarians to mammals, and ecosystems from soils to fresh and marine waters. Using publicly available small subunit (SSU) rRNA gene databases, high-throughput environmental sequencing (HTES) surveys such as Tara Oceans and VAMPS, as well as our own generated HTES data, we developed an apicomplexan reference database, which includes the largest apicomplexan SSU rRNA tree available to date and encompasses comprehensive sampling of this group and their closest relatives. This tree allowed us to identify and correct incongruences in the molecular identification of sequences, particularly within the hematozoans and the gregarines. Analyzing the diversity and distribution of apicomplexans in HTES studies with this curated reference database also showed a widespread, and quantitatively important, presence of apicomplexans across a variety of free-living environments. These data allow us to describe a remarkable molecular diversity of this group compared with our current knowledge, especially when compared with that identified from described apicomplexan species. This revision is most striking in marine environments, where potentially the most diverse apicomplexans apparently exist, but have not yet been formally recognized. 
The new database will be useful for both microbial ecology and epidemiological studies, and provide valuable reference for medical and veterinary diagnosis especially in cases of emerging, zoonotic, and cryptic infections.


mBio ◽  
2018 ◽  
Vol 9 (6) ◽  
Author(s):  
Margarita Lopez-Fernandez ◽  
Domenico Simone ◽  
Xiaofen Wu ◽  
Lucile Soler ◽  
Emelie Nilsson ◽  
...  

ABSTRACT The continental subsurface is suggested to contain a significant part of the earth’s total biomass. However, due to the difficulty of sampling, the deep subsurface is still one of the least understood ecosystems. Therefore, microorganisms inhabiting this environment might profoundly influence the global nutrient and energy cycles. In this study, in situ fixed RNA transcripts from two deep continental groundwaters from the Äspö Hard Rock Laboratory (a Baltic Sea-influenced water with a residence time of <20 years, defined as “modern marine,” and an “old saline” groundwater with a residence time of thousands of years) were subjected to metatranscriptome sequencing. Although small subunit (SSU) rRNA gene and mRNA transcripts aligned to all three domains of life, supporting activity within these community subsets, the data also suggested that the groundwaters were dominated by bacteria. Many of the SSU rRNA transcripts grouped within newly described candidate phyla or could not be mapped to known branches on the tree of life, suggesting that a large portion of the active biota in the deep biosphere remains unexplored. Despite the extremely oligotrophic conditions, mRNA transcripts revealed a diverse range of metabolic strategies that were carried out by multiple taxa in the modern marine water that is fed by organic carbon from the surface. In contrast, the carbon dioxide- and hydrogen-fed old saline water with a residence time of thousands of years predominantly showed the potential to carry out translation. This suggested these cells were active, but waiting until an energy source episodically becomes available. IMPORTANCE A newly designed sampling apparatus was used to fix RNA under in situ conditions in the deep continental biosphere and benchmarks a strategy for deep biosphere metatranscriptomic sequencing. This apparatus enabled the identification of active community members and the processes they carry out in this extremely oligotrophic environment. 
This work presents for the first time evidence of eukaryotic, archaeal, and bacterial activity in two deep subsurface crystalline rock groundwaters from the Äspö Hard Rock Laboratory with different depths and geochemical characteristics. The findings highlight differences between organic carbon-fed shallow communities and carbon dioxide- and hydrogen-fed old saline waters. In addition, the data reveal a large portion of uncharacterized microorganisms, as well as the important role of candidate phyla in the deep biosphere, but also the disparity in microbial diversity when using standard microbial 16S rRNA gene amplification versus the large unknown portion of the community identified with unbiased metatranscriptomes.


2021 ◽  
Author(s):  
Arkadiy I Garber ◽  
Catherine R Armbruster ◽  
Stella E Lee ◽  
Vaughn S Cooper ◽  
Jennifer M Bomberger ◽  
...  

Shotgun sequencing of cultured microbial isolates/individual eukaryotes (whole-genome sequencing) and microbial communities (metagenomics) has become commonplace in biology. Very often, sequenced samples encompass organisms spanning multiple domains of life, necessitating increasingly elaborate software for accurate taxonomic classification of assembled sequences. While many software tools for taxonomic classification exist, SprayNPray offers a quick and user-friendly, semi-automated approach, allowing users to separate contigs by taxonomy (and other metrics) of interest. Easy installation, usage, and intuitive output, which is amenable to visual inspection and/or further computational parsing, will reduce barriers for biologists beginning to analyze genomes and metagenomes. This approach can be used for broad-level overviews, preliminary analyses, or as a supplement to other taxonomic classification or binning software. SprayNPray profiles contigs using multiple metrics, including closest homologs from a user-specified reference database, gene density, read coverage, GC content, tetranucleotide frequency, and codon-usage bias. The output from this software is designed to allow users to spot-check metagenome-assembled genomes, identify and remove contigs from putative contaminants in isolate assemblies, identify bacteria in eukaryotic assemblies (and vice versa), and identify possible horizontal gene transfer events.
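Two of the per-contig metrics named in this abstract, GC content and tetranucleotide frequency, can be illustrated with a short sketch. This is not SprayNPray's code: the function names `gc_content` and `tetranucleotide_frequencies` are hypothetical, and the choice to ignore non-ACGT characters is an assumption made for the example.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a contig."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # assumption: a contig with no unambiguous bases scores 0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers in a contig.

    Overlapping windows are counted on the given strand only; windows that
    contain a non-ACGT character are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    contig = "ACGTACGTGGCCNNACGT"  # toy sequence, not real data
    print(f"GC content: {gc_content(contig):.3f}")
    freqs = tetranucleotide_frequencies(contig)
    print("Most frequent 4-mer:", max(freqs, key=freqs.get))
```

Contigs from the same genome tend to have similar values for such composition metrics, which is why they are useful for spotting putative contaminants alongside homology and coverage information.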


2016 ◽  
Author(s):  
Søren M. Karst ◽  
Morten S. Dueholm ◽  
Simon J. McIlroy ◽  
Rasmus H. Kirkegaard ◽  
Per H. Nielsen ◽  
...  

Abstract Ribosomal RNA (rRNA) genes are the consensus marker for determining microbial diversity on the planet and are invaluable in studies of evolution; for the past decade, high-throughput sequencing of variable regions of rRNA genes has been the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed [1], and subject to primer bias [2], which hampers our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units (OTUs) were novel compared to the SILVA database (less than 97% similarity). For the eukaryotes, the novelty was even larger, with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled ecosystem-specific estimation of primer bias and, especially for eukaryotes, showed a dramatic discrepancy between the in silico evaluation and the primer-free data generated in this study. The large number of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all [1, 3].
With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.
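The novelty categories quoted in this abstract (less than 97% similarity to the database for "novel", less than 80% for "highly novel") amount to a simple thresholding rule, sketched below. The function name `classify_novelty`, the label "known", and the input format (percent identity of the best database hit) are assumptions for illustration and are not taken from the study's pipeline.

```python
def classify_novelty(best_hit_identity: float,
                     novel_below: float = 97.0,
                     highly_novel_below: float = 80.0) -> str:
    """Label an OTU by the percent identity of its best reference-database hit.

    The default thresholds follow the values quoted in the abstract; how the
    identity itself is computed is outside the scope of this sketch.
    """
    if not 0.0 <= best_hit_identity <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if best_hit_identity < highly_novel_below:
        return "highly novel"
    if best_hit_identity < novel_below:
        return "novel"
    return "known"


if __name__ == "__main__":
    # Toy identities, not results from the study.
    for identity in (99.2, 93.5, 76.0):
        print(identity, classify_novelty(identity))
```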


2019 ◽  
Vol 14 (7) ◽  
pp. 621-627 ◽  
Author(s):  
Youhuang Bai ◽  
Xiaozhuan Dai ◽  
Tiantian Ye ◽  
Peijing Zhang ◽  
Xu Yan ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily longer than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs exist in different genomes ranging from animals to plants. Objective: PlncRNADB is a searchable database of lncRNA sequences and annotation in plants. Methods: We built a pipeline for lncRNA prediction in plants, providing a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. Results: More than five thousand lncRNAs are collected from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays) in PlncRNADB. Moreover, our database provides the relationship between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface. Conclusion: PlncRNADB can serve as a reference database to investigate the lncRNAs and their interaction with RNA-binding proteins in plants. The PlncRNADB is freely available at http://bis.zju.edu.cn/PlncRNADB/.


2011 ◽  
Vol 104 (2) ◽  
pp. 173-185 ◽  
Author(s):  
Amit Halder ◽  
Ashish Dhall ◽  
Ashim K. Datta ◽  
D. Glenn Black ◽  
P.M. Davidson ◽  
...  

Author(s):  
Y. C. Pao

Abstract A software package, MenuCAD, has been developed for the general need of designing menu-driven, user-friendly CAD computer programs. The main menu is formatted similarly to the major contents of the final report of the design project, including Contents, Analysis, Sample Design Cases, Illustrations and Tables, References, and Program Listings. Sub-menus are further divided into items delineating the steps involved in the design. Screen help messages are provided for designing the main menu and sub-menus interactively and for using the arrow keys on the keyboard to select a sub-menu and a particular item in that sub-menu in order to execute a desired design step. MenuCAD builds the framework; its user has to supplement it with a subroutine, ExecItem, that describes the special features and directs how each design step should be executed in the project. A CAD design project for a four-bar linkage is presented as a sample application of this package.
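The division of labour described in this abstract, a generic menu framework plus a user-supplied ExecItem routine, can be sketched as a callback pattern. The sketch below is in Python and is not MenuCAD's code: `run_selection`, `exec_item`, and the item names under each sub-menu are hypothetical; only the sub-menu titles come from the abstract.

```python
from typing import Callable

# Sub-menu titles follow the abstract; the items under them are invented.
MAIN_MENU: dict[str, list[str]] = {
    "Contents": ["Overview"],
    "Analysis": ["Position analysis", "Velocity analysis"],
    "Sample Design Cases": ["Four-bar linkage"],
    "Illustrations and Tables": ["Figure list"],
    "References": ["Bibliography"],
    "Program Listings": ["Source code"],
}


def run_selection(menu: dict[str, list[str]], submenu: str, item: str,
                  exec_item: Callable[[str, str], str]) -> str:
    """Framework side: validate the selection, then delegate to the user's routine."""
    if submenu not in menu:
        raise KeyError(f"unknown sub-menu: {submenu}")
    if item not in menu[submenu]:
        raise KeyError(f"unknown item in {submenu}: {item}")
    return exec_item(submenu, item)


def exec_item(submenu: str, item: str) -> str:
    """User side: project-specific behaviour for each design step (placeholder)."""
    return f"executing '{item}' from '{submenu}'"


if __name__ == "__main__":
    print(run_selection(MAIN_MENU, "Analysis", "Position analysis", exec_item))
```

Keeping the menu logic separate from the project-specific routine is what lets the same framework be reused across design projects.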


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 441-442
Author(s):  
Adrian Maynez-Perez ◽  
Francisco Jahuey-Martinez ◽  
Jose A Martinez-Quintana ◽  
Michael E Hume ◽  
Robin C Anderson ◽  
...  

Abstract Raramuri Criollo cattle from the Chihuahuan desert in northern Mexico have been described as an ecological ecotype due to their enormous advantage in land grass utilization and their capacity to diversify their diet with cacti, forbs and woody plants. This diversification in diet utilization could be reflected in their microbiome composition. The aim of this study was to characterize the rumen microbiome of Raramuri Criollo cattle and to compare it to other lineages that graze in the same area. A total of 28 cows representing three lineages [Criollo (n = 13), European (n = 9) and Criollo x European Crossbred (n = 6)] were grazed without supplementation for 45 days. DNA was extracted from ruminal samples and the V4 region of the 16S rRNA gene was sequenced on an Illumina platform. Data were analyzed with the QIIME2 software package and DADA2 plugin, and the amplicon sequence variants were taxonomically classified with a naïve Bayesian classifier using the SILVA 16S rRNA gene reference database (version 132). Statistical analysis was performed by ANOVA and PERMANOVA for alpha and beta diversity indexes, respectively, and the non-strict version of linear discriminant analysis effect size (LEfSe) was used to determine significantly different taxa among lineages. Differences in beta diversity indexes (P < 0.05) were found in ruminal microbiome composition between Criollo and European groups, whereas the Crossbred group showed intermediate values when compared to the pure breeds (Table 1). LEfSe analysis identified a total of 20 bacterial groups that explained differences between lineages, including one for Crossbred, ten for European and nine for Criollo. These results show ruminal microbiome differences between Raramuri Criollo cattle and the mainstream European breeds used in the northern Mexico Chihuahuan desert and suggest that those differences could be a consequence of dissimilar grazing behavior.

