Ribovore: ribosomal RNA sequence analysis for GenBank submissions and database curation

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alejandro A. Schäffer ◽  
Richard McVeigh ◽  
Barbara Robbertse ◽  
Conrad L. Schoch ◽  
Anjanette Johnston ◽  
...  

Abstract. Background: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene and the 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondrial, plastid), and organism source and can represent any region of the ribosomal cistron. Results: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used for checking incoming GenBank submissions and by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8,350 taxa. Conclusion: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning to submit SSU rRNA sequences to GenBank are encouraged to download Ribovore and analyze their sequences before submission to determine which sequences are likely to be automatically accepted into GenBank.

2021 ◽  
Author(s):  
Alejandro A. Schäffer ◽  
Richard McVeigh ◽  
Barbara Robbertse ◽  
Conrad L. Schoch ◽  
Anjanette Johnston ◽  
...  

Abstract. Background: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene and the 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondrial, plastid), and organism source and can represent any region of the ribosomal cistron. Results: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. At least nine freely available blastn rRNA databases created and maintained with Ribovore are used either for checking incoming GenBank submissions or by the blastn browser interface at NCBI or both. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8,350 taxa. Conclusion: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning to submit SSU rRNA sequences to GenBank are encouraged to download Ribovore and analyze their sequences before submission to determine which sequences are likely to be automatically accepted into GenBank.


2020 ◽  
Vol 139 ◽  
pp. 15-23
Author(s):  
SRM Jones ◽  
H Ahonen ◽  
J Taskinen

Infections with microsporidian parasites are described in skeletal muscle of burbot Lota lota from Lake Haukivesi, Finland. Infected myocytes contained spores within sporophorous vesicles (SPVs) in contact with host cell cytoplasm, similar to Pleistophora ladogensis in L. lota and smelt Osmerus eperlanus in western Russia and northern Germany. Analysis of small subunit ribosomal RNA (SSU rRNA) gene sequences indicated identity with Myosporidium spraguei in burbot and pike-perch from this lake. The latter is considered a junior synonym of P. ladogensis. Phylogenetic analysis of SSU rRNA sequences resolved the burbot parasite apart from a clade containing the type species P. typicalis, but together with M. merluccius. The parasite is renamed Myosporidium ladogensis (Voronin, 1978) n. comb. Networks of tubular appendages arising from developing meronts and SPVs were associated with degradation of host cell cytoplasm.


2012 ◽  
Vol 57 (4) ◽  
Author(s):  
B. Nath ◽  
S. Gupta ◽  
A. Bajpai

Abstract. The life cycle, spore morphology, pathogenicity, tissue specificity, mode of transmission and small subunit rRNA (SSU-rRNA) gene sequence analysis of five new microsporidian isolates, viz. NIWB-11bp, NIWB-12n, NIWB-13md, NIWB-14b and NIWB-15mb, identified from the silkworm Bombyx mori have been studied along with the type species, NIK-1s_mys. The life cycles of the identified microsporidians exhibited sequential developmental cycles similar to the general developmental cycle of the genus Nosema. The spores showed considerable variation in shape, length and width. The pathogenicity observed was dose-dependent and differed among the microsporidian isolates; NIWB-15mb was found to be more virulent than the other isolates. All of the microsporidians were found to infect most of the tissues examined and showed gonadal infection and transovarial transmission in the infected silkworms. The SSU-rRNA sequence-based phylogenetic tree placed NIWB-14b, NIWB-12n and NIWB-11bp in a separate branch along with other Nosema species and Nosema bombycis, while NIWB-15mb and NIWB-13md together formed another cluster along with other Nosema species. NIK-1s_mys revealed a signature sequence similar to that of the standard type species, N. bombycis, indicating that NIK-1s_mys is similar to N. bombycis. Based on phylogenetic relationships, branch length information based on genetic distance, and nucleotide differences, we conclude that the microsporidian isolates identified are distinctly different from the other known species and belong to the genus Nosema. This SSU-rRNA gene sequence analysis method is found to be a useful approach for detecting different and closely related microsporidians of this economically important domestic insect.


2016 ◽  
Author(s):  
Søren M. Karst ◽  
Morten S. Dueholm ◽  
Simon J. McIlroy ◽  
Rasmus H. Kirkegaard ◽  
Per H. Nielsen ◽  
...  

Abstract. Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet and are invaluable in studies of evolution; for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has been the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed [1], and subject to primer bias [2], which hampers our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units (OTUs) were novel compared to the SILVA database (less than 97% similarity). For the eukaryotes, the novelty was even larger, with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled ecosystem-specific estimation of primer bias and, especially for eukaryotes, showed a dramatic discrepancy between the in silico evaluation and the primer-free data generated in this study. The large number of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all [1, 3]. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.


1999 ◽  
Vol 35 (3) ◽  
pp. 458-465 ◽  
Author(s):  
Joon-seok Chae ◽  
Suryakant D. Waghela ◽  
Thomas M. Craig ◽  
Alan A. Kocan ◽  
Gerald G. Wagner ◽  
...  

2013 ◽  
Vol 60 (3) ◽  
pp. 135-148 ◽  
Author(s):  
Ioannis A. Papaioannou ◽  
Chrysoula D. Dimopoulou ◽  
Milton A. Typas

2018 ◽  
Author(s):  
Jeffrey S. McLean ◽  
Batbileg Bor ◽  
Thao T. To ◽  
Quanhui Liu ◽  
Kristopher A. Kerns ◽  
...  

Abstract. Recently, we discovered that a member of the Saccharibacteria/TM7 phylum (strain TM7x) isolated from the human oral cavity has an ultra-small cell size (200-300 nm), a highly reduced genome (705 Kbp) with limited de novo biosynthetic capabilities, and a novel lifestyle as an obligate epibiont on the surface of another bacterium [1]. There has been considerable interest in uncultivated phyla, particularly those now classified as the proposed candidate phyla radiation (CPR), which is reported to include 35 or more phyla and is estimated to make up nearly 15% of the domain Bacteria. Most members of the larger CPR group share genomic properties with Saccharibacteria, including reduced genomes (<1 Mbp) and a lack of biosynthetic capabilities, yet to date strain TM7x represents the only member of the CPR that has been cultivated and is one of only three CPR routinely detected in the human body. Through small subunit ribosomal RNA (SSU rRNA) gene surveys, members of the Saccharibacteria phylum are reported in many environments as well as within a diversity of host species and have been shown to increase dramatically in human oral and gut diseases. With a single copy of the 16S rRNA gene resolved on a few limited genomes, their absolute abundance is most often underestimated and their potential role in disease pathogenesis is therefore underappreciated. Although these bacteria are obligate parasites dependent on other bacteria, six groups (G1-G6) are recognized using SSU rRNA gene phylogeny in the oral cavity alone. At present, only genomes from the G1 group, which includes related and remarkably syntenic environmental and human oral associated representatives [1], have been uncovered. In this study we systematically captured the spectrum of known diversity in this phylum by reconstructing completely novel class-level genomes belonging to groups G3, G6 and G5 through cultivation enrichment and/or metagenomic binning from humans and mammalian rumen. Additional genomes for representatives of G1 were also obtained from modern oral plaque and ancient dental calculus. Comparative analysis revealed remarkable divergence in the host-associated members across this phylum. Within the human oral cavity alone, variation in as much as 70% of the genes from the nearest oral clade (AAI 50%) as well as wide GC content variation is evident in these newly captured divergent members (G3, G5 and G6) with no environmental relatives. Comparative analyses suggest independent episodes of transmission of these TM7 groups into humans and convergent evolution of several key functions during adaptation within hosts. In addition, we provide evidence from in vivo collected samples that each of these major groups is ultra-small in size and is found attached to larger cells.

