HashSeq: a Simple, Scalable, and Conservative De Novo Variant Caller for 16S rRNA Gene Data Sets

mSystems ◽  
2021 ◽  
Author(s):  
Farnaz Fouladi ◽  
Jacqueline B. Young ◽  
Anthony A. Fodor

Recent bioinformatics development has enabled the detection of sequence variants with a high resolution of only one single-nucleotide difference in 16S rRNA gene sequence data. Despite this progress, there are several limitations that can be associated with variant calling pipelines, such as producing a large number of low-abundance sequence variants which need to be filtered out with arbitrary thresholds in downstream analyses or having a slow runtime.

2021 ◽  
Author(s):  
Farnaz Fouladi ◽  
Jacqueline B Young ◽  
Anthony A Fodor

16S rRNA gene sequencing is a common and cost-effective technique for characterization of microbial communities. Recent bioinformatics methods enable high-resolution detection of sequence variants of only one nucleotide difference. In this manuscript, we utilize a very fast HashMap-based approach to detect sequence variants in six publicly available 16S rRNA gene datasets. We then use the normal distribution combined with LOESS regression to estimate background error rates as a function of sequencing depth for individual clusters of sequences. This method is computationally efficient and produces inference that yields sets of variants that are conservative and well supported by reference databases. We argue that this approach to inference is fast, simple, scalable to large datasets, and provides a high-resolution set of sequence variants which are less likely to be the result of sequencing error.
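The hash-map idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy reads, and the rule "a parent is the most abundant one-mismatch neighbor with a higher count" are assumptions made for illustration, and the normal-distribution and LOESS error model is not reproduced here.

```python
from collections import Counter

BASES = "ACGT"


def count_sequences(reads):
    """Tally identical reads with a hash map (sequence -> abundance)."""
    return Counter(reads)


def one_mismatch_neighbors(seq):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def assign_parents(counts):
    """Map each sequence to its putative parent, or None.

    Assumption for this sketch: the parent is the most abundant
    one-mismatch neighbor that is strictly more abundant than the
    sequence itself. Each neighbor is found by a hash lookup, so no
    pairwise alignment is needed.
    """
    parents = {}
    for seq, n in counts.items():
        best = None
        for neighbor in one_mismatch_neighbors(seq):
            m = counts.get(neighbor, 0)
            if m > n and (best is None or m > counts[best]):
                best = neighbor
        parents[seq] = best
    return parents


# Hypothetical toy reads: one abundant sequence, one rare one-mismatch
# variant of it, and one unrelated sequence.
reads = ["ACGT"] * 50 + ["ACGA"] * 2 + ["TTTT"] * 10
counts = count_sequences(reads)
parents = assign_parents(counts)
```

In this toy data the rare read `ACGA` is linked to `ACGT`, while `ACGT` and `TTTT` have no more abundant neighbor. Whether a rare child is kept as a real variant or treated as error would then depend on a background error model such as the one the abstract describes.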


2016 ◽  
Vol 54 (11) ◽  
pp. 2749-2756 ◽  
Author(s):  
Janetta R. Hakovirta ◽  
Samantha Prezioso ◽  
David Hodge ◽  
Segaran P. Pillai ◽  
Linda M. Weigel

Analysis of 16S rRNA genes is important for phylogenetic classification of known and novel bacterial genera and species and for detection of uncultivable bacteria. PCR amplification of 16S rRNA genes with universal primers produces a mixture of amplicons from all rRNA operons in the genome, and the sequence data generally yield a consensus sequence. Here we describe valuable data that are missing from consensus sequences, variable effects on sequence data generated from nonidentical 16S rRNA amplicons, and the appearance of data displayed by different software programs. These effects are illustrated by analysis of 16S rRNA genes from 50 strains of the Bacillus cereus group, i.e., Bacillus anthracis, Bacillus cereus, Bacillus mycoides, and Bacillus thuringiensis. These species have 11 to 14 rRNA operons, and sequence variability occurs among the multiple 16S rRNA genes. A single nucleotide polymorphism (SNP) previously reported to be specific to B. anthracis was detected in some B. cereus strains. However, a different SNP, at position 1139, was identified as being specific to B. anthracis, which is a biothreat agent with high mortality rates. Compared with visual analysis of the electropherograms, basecaller software frequently missed gene sequence variations or could not identify variant bases due to overlapping basecalls. Accurate detection of 16S rRNA gene sequences that include intragenomic variations can improve discrimination among closely related species, improve the utility of 16S rRNA databases, and facilitate rapid bacterial identification by targeted DNA sequence analysis or by whole-genome sequencing performed by clinical or reference laboratories.


Algologia ◽  
2021 ◽  
Vol 31 (1) ◽  
pp. 93-113
Author(s):  
A.R. Nur Fadzliana ◽  
W.O. Wan Maznah ◽  
S.A.M. Nor ◽  
Choon Pin Foong ◽  
...  

Cyanobacteria are the most widespread group of photosynthetic prokaryotes. They are primary producers in a wide variety of habitats and are able to thrive in harsh environments, including polluted waters; therefore, this study was conducted to explore the cyanobacterial populations inhabiting river tributaries with different levels of pollution. Sediment samples (epipelon) were collected from selected tributaries of the Pinang River basin. The Air Terjun (T1) and Air Itam (T2) rivers represent the upper streams of the Pinang River basin, while the Dondang (T3) and Jelutong (T4) rivers are located in the middle of the river basin. The Pinang River (T5) is located near the estuary and is subjected to saline water intrusion during high tides. The cyanobacterial community was determined by identifying the taxa via 16S rRNA gene amplicon sequence data. 16S rRNA gene amplicons generated from the collected samples were sequenced using Illumina MiSeq, with the targeted V3 and V4 regions yielding approximately 1 million reads per sample. Synechococcus, Phormidium, Arthronema and Leptolyngbya were found in all samples. The Shannon-Wiener diversity index was highest (H’ = 1.867) at the clean upstream station (T1), while the moderately polluted stream (T3) recorded the lowest diversity (H’ = 0.399), and the relatively polluted stations (T4 and T5) recorded fairly high values of H’. This study provides insights into the cyanobacterial community structure in the Pinang River basin via cultivation-independent techniques using 16S rRNA gene amplicon sequencing. The occurrence of some morphospecies at specific locations showed that the cyanobacterial communities are quite distinct and have specific ecological demands. Some species that were ubiquitous might be able to tolerate varied environmental conditions.
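For readers unfamiliar with the diversity index reported above, H’ is commonly computed as the negative sum of p_i ln(p_i) over taxa, where p_i is the relative abundance of taxon i. The sketch below assumes the natural logarithm (the abstract does not state the base) and uses made-up counts, not data from the study.

```python
import math


def shannon_index(abundances):
    """Return H' = -sum(p_i * ln(p_i)) for a list of taxon counts.

    Zero counts are skipped, since p * ln(p) tends to 0 as p tends to 0.
    The natural logarithm is an assumption of this sketch.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain at least one positive count")
    return -sum(
        (n / total) * math.log(n / total) for n in abundances if n > 0
    )


# Hypothetical counts: four equally abundant taxa give H' = ln(4).
even_community = [10, 10, 10, 10]
h_even = shannon_index(even_community)

# A community dominated by one taxon gives a lower H'.
skewed_community = [97, 1, 1, 1]
h_skewed = shannon_index(skewed_community)
```

An even community maximizes H’ for a given number of taxa, which is why a low value such as the one reported for T3 usually signals dominance by a few taxa.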


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Daniel Roush ◽  
Ana Giraldo-Silva ◽  
Ferran Garcia-Pichel

Abstract Cyanobacteria are a widespread and important bacterial phylum, responsible for a significant portion of global carbon and nitrogen fixation. Unfortunately, reliable and accurate automated classification of cyanobacterial 16S rRNA gene sequences is muddled by conflicting systematic frameworks, inconsistent taxonomic definitions (including the phylum itself), and database errors. To address this, we introduce Cydrasil 3 (https://www.cydrasil.org), a curated 16S rRNA gene reference package, database, and web application designed to provide a full phylogenetic perspective for cyanobacterial systematics and routine identification. Cydrasil 3 contains over 1300 manually curated sequences longer than 1100 base pairs and can be used for phylogenetic placement or as a reference sequence set for de novo phylogenetic reconstructions. The web application (utilizing PaPaRA and EPA-ng) can place thousands of sequences into the reference tree and has detailed instructions on how to analyze results. Although the Cydrasil web application offers no taxonomic assignments, it provides phylogenetic placement, a searchable database with curation notes and metadata, and a mechanism for community feedback.


2015 ◽  
Vol 65 (Pt_2) ◽  
pp. 723-731 ◽  
Author(s):  
Ronel Roberts ◽  
Emma T. Steenkamp ◽  
Gerhard Pietersen

Greening disease of citrus in South Africa is associated with ‘Candidatus Liberibacter africanus’ (Laf), a phloem-limited bacterium vectored by the sap-sucking insect Trioza erytreae (Triozidae). Despite the implementation of control strategies, this disease remains problematic, suggesting the existence of reservoir hosts to Laf. The current study aimed to identify such hosts. Samples from 234 trees of Clausena anisata, 289 trees of Vepris lanceolata and 231 trees of Zanthoxylum capense were collected throughout the natural distribution of these trees in South Africa. Total DNA was extracted from samples and tested for the presence of liberibacters by a generic Liberibacter TaqMan real-time PCR assay. Liberibacters present in positive samples were characterized by amplifying and sequencing rplJ, omp and 16S rRNA gene regions. The identity of tree host species from which liberibacter sequences were obtained was verified by sequencing host rbcL genes. Of the trees tested, 33 specimens of Clausena, 17 specimens of Vepris and 10 specimens of Zanthoxylum tested positive for liberibacter. None of the samples contained typical citrus-infecting Laf sequences. Phylogenetic analysis of 16S rRNA gene sequences indicated that the liberibacters obtained from Vepris and Clausena had 16S rRNA gene sequences identical to that of ‘Candidatus Liberibacter africanus subsp. capensis’ (LafC), whereas those from Zanthoxylum species grouped separately. Phylogenetic analysis of the rplJ and omp gene regions revealed unique clusters for liberibacters associated with each tree species. We propose the following names for these novel liberibacters: ‘Candidatus Liberibacter africanus subsp. clausenae’ (LafCl), ‘Candidatus Liberibacter africanus subsp. vepridis’ (LafV) and ‘Candidatus Liberibacter africanus subsp. zanthoxyli’ (LafZ). This study did not find any natural hosts of Laf associated with greening of citrus. While native citrus relatives were shown to be infected with Laf-related liberibacters, nucleotide sequence data suggest that these are not alternative sources of Laf to citrus orchards, per se.


mBio ◽  
2019 ◽  
Vol 10 (4) ◽  
Author(s):  
Marc A. Sze ◽  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Patrick D. Schloss

ABSTRACT Colonic bacterial populations are thought to have a role in the development of colorectal cancer, with some protecting against inflammation and others exacerbating it. Short-chain fatty acids (SCFAs) have been shown to have anti-inflammatory properties and are produced in large quantities by colonic bacteria that ferment fiber. We assessed whether there was an association between fecal SCFA concentrations and the presence of colonic adenomas or carcinomas in a cohort of individuals using 16S rRNA gene and metagenomic shotgun sequence data. We measured the fecal concentrations of acetate, propionate, and butyrate within the cohort and found that there were no significant associations between SCFA concentration and tumor status. When we incorporated these concentrations into random forest classification models trained to differentiate between people with healthy colons and those with adenomas or carcinomas, we found that they did not significantly improve the ability of 16S rRNA gene or metagenomic gene sequence-based models to classify individuals. Finally, we generated random forest regression models trained to predict the concentration of each SCFA based on 16S rRNA gene or metagenomic gene sequence data from the same samples. These models performed poorly and were able to explain at most 14% of the observed variation in the SCFA concentrations. These results support the broader epidemiological data that question the value of fiber consumption for reducing the risks of colorectal cancer. Although other bacterial metabolites may serve as biomarkers to detect adenomas or carcinomas, fecal SCFA concentrations have limited predictive power.

IMPORTANCE Considering that colorectal cancer is the third leading cancer-related cause of death within the United States, it is important to detect colorectal tumors early and to prevent the formation of tumors. Short-chain fatty acids (SCFAs) are often used as a surrogate for measuring gut health and for being anticarcinogenic because of their anti-inflammatory properties. We evaluated the fecal SCFA concentrations of a cohort of individuals with different colonic tumor burdens who were previously analyzed to identify microbiome-based biomarkers of tumors. We were unable to find an association between SCFA concentration and tumor burden or use SCFAs to improve our microbiome-based models of classifying people based on their tumor status. Furthermore, we were unable to find an association between the fecal community structure and SCFA concentrations. Our results indicate that the association between fecal SCFAs, the gut microbiome, and tumor burden is weak.


2011 ◽  
Vol 32 (2) ◽  
pp. 66 ◽  
Author(s):  
Peter Kampfer ◽  
Stefanie P Glaeser

The initial step in prokaryote species and genera descriptions is now largely based on the 16S rRNA gene sequencing approach, often followed by a very restricted additional phenotypic characterisation of the representatives of the potential novel taxa. Despite the advantages of the sequence-based approaches, there appears to be a tendency to classify new species on the basis of comparative sequence analyses of 16S rRNA gene sequences and other gene sequence data (multilocus sequence analyses, MLSA), contrary to the indications of other data. However, the biological meaning behind these sequence data is not always clear, and one should be careful with comprehensive taxonomic rearrangements until there is better insight into these data.


2020 ◽  
Vol 9 (24) ◽  
Author(s):  
Sangam Kandel ◽  
Supaphen Sripiboon ◽  
Piroon Jenjaroenpun ◽  
David W. Ussery ◽  
Intawat Nookaew ◽  
...  

ABSTRACT Here, we present a 16S rRNA gene amplicon sequence data set and profiles demonstrating the bacterial diversity of baby and adult elephants from four different geographical locations in Thailand. The dominant phyla among baby and adult elephants were Bacteroidetes, Firmicutes, Proteobacteria, Kiritimatiellaeota, Euryarchaeota, and Tenericutes.

