AmpliCI: A High-resolution Model-Based Approach for Denoising Illumina Amplicon Data

2020 ◽  
Author(s):  
Xiyu Peng ◽  
Karin Dorman

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. One main challenge is to distinguish true biological variants from errors caused by PCR and sequencing. In the traditional analysis pipeline, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false positive rates. Recently developed “denoising” methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low frequency sequences, especially those near abundant variants, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. We show that AmpliCI is superior to three popular denoising methods, with acceptable computation time and memory usage. Availability: Source code available at https://github.com/DormanLab/AmpliCI
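To make the traditional 97% OTU step concrete, here is a minimal sketch of greedy, abundance-ordered centroid clustering. It assumes pre-aligned, equal-length reads and uses simple positional identity; real OTU pickers use pairwise alignment and differ in detail, and the function names `identity` and `greedy_otu_clustering` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(counts: dict[str, int], threshold: float = 0.97) -> dict[str, list[str]]:
    """Visit unique sequences from most to least abundant; each joins the first
    existing centroid it matches at >= threshold identity, otherwise it founds
    a new OTU with itself as the centroid."""
    otus: dict[str, list[str]] = {}
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        for centroid, members in otus.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


if __name__ == "__main__":
    base = "ACGT" * 25          # hypothetical 100-bp amplicon
    near = "TT" + base[2:]      # 2 mismatches -> 98% identity to base
    far = "TTTTT" + base[5:]    # 4 mismatches -> 96% identity to base
    otus = greedy_otu_clustering({base: 500, near: 20, far: 15})
    print(len(otus))            # 2: `near` is absorbed, `far` founds its own OTU
```

The example illustrates the resolution problem the abstract describes: any true variant within 3% of an abundant sequence is merged into it, regardless of read quality.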

Author(s):  
Xiyu Peng ◽  
Karin S Dorman

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI outperforms three popular denoising methods, with acceptable computation time and memory usage. Availability and implementation: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary material is available at Bioinformatics online.
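The idea of a quality-aware finite mixture fitted by a greedy search can be illustrated with a toy sketch. This is not AmpliCI's implementation: it assumes a substitution-only error model in which a base with Phred score Q is wrong with probability 10^(-Q/10), spread evenly over the three other bases; it fits mixing proportions by EM; and it adds candidate sequences in order of abundance while the log-likelihood gain exceeds a hypothetical `penalty`. AmpliCI's actual error model, candidate screening and stopping rule are described in the paper and repository.

```python
import math
from collections import Counter


def read_loglik(read: str, quals: list[int], hap: str) -> float:
    """log P(read | haplotype) under an assumed substitution-only Phred error model."""
    ll = 0.0
    for base, q, h in zip(read, quals, hap):
        err = min(10 ** (-q / 10), 0.75)  # Phred -> error probability, capped at the random-base level
        ll += math.log(1 - err) if base == h else math.log(err / 3)
    return ll


def weighted_logs(pis: list[float], row: list[float]) -> list[float]:
    """log(pi_k) + log P(read | hap_k) for each haplotype k."""
    return [math.log(p) + l if p > 0 else -math.inf for p, l in zip(pis, row)]


def mixture_loglik(reads, haps, n_iter=50):
    """Fit mixing proportions by EM; return (mixture log-likelihood, proportions)."""
    table = [[read_loglik(seq, quals, h) for h in haps] for seq, quals in reads]
    pis = [1.0 / len(haps)] * len(haps)
    for _ in range(n_iter):
        totals = [0.0] * len(haps)
        for row in table:
            logs = weighted_logs(pis, row)
            m = max(logs)
            ws = [math.exp(x - m) for x in logs]
            s = sum(ws)
            for k, w in enumerate(ws):
                totals[k] += w / s              # E-step: responsibility of haplotype k
        pis = [t / len(reads) for t in totals]  # M-step: update mixing proportions
    ll = 0.0
    for row in table:
        logs = weighted_logs(pis, row)
        m = max(logs)
        ll += m + math.log(sum(math.exp(x - m) for x in logs))  # log-sum-exp
    return ll, pis


def greedy_select(reads, penalty=10.0):
    """Add unique sequences, most abundant first, while the log-likelihood gain exceeds `penalty`."""
    candidates = [seq for seq, _ in Counter(seq for seq, _ in reads).most_common()]
    haps = [candidates[0]]
    best_ll, _ = mixture_loglik(reads, haps)
    for cand in candidates[1:]:
        ll, _ = mixture_loglik(reads, haps + [cand])
        if ll - best_ll > penalty:
            haps.append(cand)
            best_ll = ll
    return haps


if __name__ == "__main__":
    A, B, E = "ACGTACGTAC", "ACGTTCGTAC", "TCGTACGTAC"
    reads = ([(A, [30] * 10)] * 30 + [(B, [30] * 10)] * 10
             + [(E, [10] + [30] * 9)] * 2)  # E: A with a low-quality error at position 0
    print(greedy_select(reads))  # ['ACGTACGTAC', 'ACGTTCGTAC']
```

In the toy data, B differs from A at a high-quality position and is kept, while E differs from A only at a low-quality position and is explained as sequencing error. This is the kind of distinction a quality-blind threshold cannot make.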


Biology ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 569
Author(s):  
Chakriya Sansupa ◽  
Sara Fareed Mohamed Wahdan ◽  
Terd Disayathanoowat ◽  
Witoon Purahong

This study aims to estimate the proportion and diversity of soil bacteria detected by eDNA-based and culture-based methods. Specifically, we used Illumina MiSeq to sequence and characterize the bacterial communities from (i) DNA extracted directly from forest soil and (ii) DNA extracted from a mixture of bacterial colonies obtained by enrichment cultures on agar plates of the same forest soil samples. Amplicon sequencing of enrichment cultures allowed us to rapidly screen the culturable community in an environmental sample. Compared with the eDNA community (based on a 97% sequence similarity threshold), the enrichment cultures were shown to capture both rare and abundant bacterial taxa in the forest soil samples. The enrichment culture and eDNA communities shared 2% of the OTUs detected in the total community, whereas 88% of the enrichment culture community (15% of the total community) could not be detected by eDNA. The enrichment culture-based method detected 17% of the bacteria in the total community. FAPROTAX functional prediction showed that the rare and unique taxa detected with the enrichment cultures have the potential to perform important functions in soil systems. We suggest that enrichment culture-based amplicon sequencing could be a beneficial approach for evaluating a cultured bacterial community. Combining this approach with the eDNA method could provide more comprehensive information about a bacterial community. We expect that more unique cultured taxa could be detected if further studies used both selective and non-selective culture media to enrich bacteria in the first step.
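The percentages above use different denominators. A small sketch with made-up OTU identifiers (hypothetical counts chosen only to reproduce the reported proportions, not the study's data) shows how they fit together: 2% shared plus 15% culture-only gives the 17% of the total community detected by culture, and 15/17 ≈ 88% of the culture community goes undetected by eDNA. The function name `overlap_summary` is hypothetical.

```python
def overlap_summary(edna: set[str], culture: set[str]) -> dict[str, float]:
    """Express OTU overlap between two detection methods as fractions of different denominators."""
    total = edna | culture           # total community: OTUs detected by either method
    shared = edna & culture
    culture_only = culture - edna    # OTUs that eDNA did not detect
    return {
        "shared_of_total": len(shared) / len(total),
        "culture_of_total": len(culture) / len(total),
        "culture_only_of_total": len(culture_only) / len(total),
        "culture_only_of_culture": len(culture_only) / len(culture),
    }


if __name__ == "__main__":
    edna = {f"OTU{i}" for i in range(85)}           # hypothetical: OTU0..OTU84
    culture = {f"OTU{i}" for i in range(83, 100)}   # hypothetical: OTU83..OTU99
    for name, value in overlap_summary(edna, culture).items():
        print(f"{name}: {value:.0%}")
```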


Fuels ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 241-252
Author(s):  
Dyah Asri Handayani Taroepratjeka ◽  
Tsuyoshi Imai ◽  
Prapaipid Chairattanamanokorn ◽  
Alissara Reungsang

Because they can withstand extreme salt concentrations, extreme halophiles offer the advantage of reducing the sterilization and water costs of biohydrogen production from lignocellulosic waste after pretreatment. This study identifies the dominant hydrogen-producing genera and species among acclimatized, extremely halotolerant microbial communities taken from two salt-damaged soil locations in Khon Kaen and one salt evaporation pond in Samut Sakhon, Thailand. The V3–V4 regions of the 16S rRNA gene of the microbial communities were analyzed using high-throughput amplicon sequencing. A total of 345 operational taxonomic units were obtained, and the high-throughput sequencing confirmed that Firmicutes was the dominant phylum in all three communities. Halanaerobium fermentans and Halanaerobacter lacunarum were the dominant hydrogen-producing species of the communities. Spatial proximity was not found to be a determining factor for similarities between these extremely halophilic microbial communities. Studying these microbial communities can help develop strategies to increase the biohydrogen molar yield.


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Sandra Reitmeier ◽  
Thomas C. A. Hitch ◽  
Nicole Treichel ◽  
Nikolaos Fikas ◽  
Bela Hausmann ◽  
...  

16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities, with data either taken from the literature or generated in three sequencing facilities, and analyzed them via both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering with singleton removal, a commonly used approach, yielded approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower with ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found to be an effective threshold: excluding taxa below it largely prevents the analysis of spurious taxa in both OTU- and ASV-based approaches. Using this cutoff improved the reproducibility of the analysis; variation in richness estimates was reduced by 38% compared with singleton filtering (six human fecal samples across seven sequencing runs). Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions about microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
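The two filtering strategies compared above can be sketched for a single sample as follows. This assumes that "effective richness" means the number of taxa at or above 0.25% relative abundance within a sample; the paper gives the precise definition, and the function names and counts here are hypothetical.

```python
def singleton_filter(counts: dict[str, int]) -> dict[str, int]:
    """Drop taxa observed only once (the common singleton-removal step)."""
    return {taxon: n for taxon, n in counts.items() if n > 1}


def relative_abundance_filter(counts: dict[str, int], threshold: float = 0.0025) -> dict[str, int]:
    """Keep taxa whose share of the sample's reads is at least `threshold` (0.25% by default)."""
    total = sum(counts.values())
    return {taxon: n for taxon, n in counts.items() if n / total >= threshold}


def effective_richness(counts: dict[str, int], threshold: float = 0.0025) -> int:
    """Assumed definition: number of taxa passing the relative-abundance cutoff."""
    return len(relative_abundance_filter(counts, threshold))


if __name__ == "__main__":
    # Hypothetical sample with 10,000 reads; 0.25% corresponds to 25 reads.
    sample = {"ASV1": 5000, "ASV2": 3000, "ASV3": 1970, "ASV4": 20, "ASV5": 9, "ASV6": 1}
    print(len(singleton_filter(sample)))  # 5
    print(effective_richness(sample))     # 3
```

Because the cutoff scales with sequencing depth while singleton removal does not, deeper runs accumulate more low-count artifacts that survive singleton filtering. This is one plausible reason the relative cutoff gives more reproducible richness estimates across runs.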


GigaScience ◽  
2017 ◽  
Vol 6 (2) ◽  
Author(s):  
Mohamed Mysara ◽  
Mercy Njima ◽  
Natalie Leys ◽  
Jeroen Raes ◽  
Pieter Monsieurs

MycoKeys ◽  
2018 ◽  
Vol 39 ◽  
pp. 29-40 ◽  
Author(s):  
Sten Anslan ◽  
R. Henrik Nilsson ◽  
Christian Wurzbacher ◽  
Petr Baldrian ◽  
Leho Tedersoo ◽  
...  

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.


Author(s):  
Damien Jacot ◽  
Trestan Pillonel ◽  
Gilbert Greub ◽  
Claire Bertelli

Although many laboratories worldwide have developed their sequencing capacities in response to the need for genome-based surveillance of SARS-CoV-2 variants, only a few have reported quality criteria to ensure sequence quality before lineage assignment and submission to public databases. Hence, we aimed here to provide simple quality control criteria for SARS-CoV-2 sequencing to prevent erroneous interpretation of low-quality or contaminated data. We retrospectively investigated 647 SARS-CoV-2 genomes obtained over ten tiled-amplicon sequencing runs. We extracted 26 potentially relevant metrics covering the entire workflow from sample selection to bioinformatics analysis. Based on the data distribution, critical values were established for eleven selected metrics to prompt further quality investigations for problematic samples, in particular those with a low viral RNA quantity. Low-frequency variants (supported by <70% of reads) can result from PCR amplification errors, sample cross-contamination or the presence of distinct SARS-CoV-2 genomes in the sequenced sample. The number and prevalence of low-frequency variants can be used as a robust quality criterion to identify possible sequencing errors or contamination. Overall, we propose eleven metrics with fixed cutoff values as a simple tool to evaluate the quality of SARS-CoV-2 genomes, including cycle threshold, mean depth, proportion of the genome covered at least 10x, and the number of low-frequency variants combined with mutation prevalence data.
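Three of the metrics named above can be computed from per-position depths and called-variant allele frequencies, as in the sketch below. It assumes those inputs come from an upstream alignment and variant caller. Only the 10x coverage depth and the 70% low-frequency definition are taken from the abstract. The study's critical values for flagging a sample are not reproduced, and `SampleQC` and `compute_qc` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    mean_depth: float
    frac_covered: float         # fraction of genome positions with depth >= min_depth
    n_low_freq_variants: int    # called variants supported by < low_freq_cutoff of reads


def compute_qc(depths: list[int], variant_freqs: list[float],
               min_depth: int = 10, low_freq_cutoff: float = 0.70) -> SampleQC:
    """Compute mean depth, genome fraction covered at >= min_depth, and low-frequency variant count."""
    if not depths:
        raise ValueError("depths must cover at least one genome position")
    mean_depth = sum(depths) / len(depths)
    frac_covered = sum(d >= min_depth for d in depths) / len(depths)
    n_low = sum(f < low_freq_cutoff for f in variant_freqs)
    return SampleQC(mean_depth, frac_covered, n_low)


if __name__ == "__main__":
    # Hypothetical toy "genome" of six positions and four called variants.
    qc = compute_qc([0, 5, 10, 20, 50, 35], [0.99, 0.65, 0.30, 1.0])
    print(qc)
```

Keeping the cutoffs as parameters makes it easy to apply a laboratory's own critical values when deciding which samples need further investigation.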


2018 ◽  
Author(s):  
Estelle Couradeau ◽  
Joelle Sasse ◽  
Danielle Goudeau ◽  
Nandita Nath ◽  
Terry C. Hazen ◽  
...  

The ability to link soil microbial diversity to soil processes requires technologies that differentiate active subpopulations of microbes from so-called relic DNA and dormant cells. Measures of microbial activity based on various techniques, including DNA labelling, have suggested that most cells in soils are inactive, a finding that has been difficult to reconcile with the observed high levels of bulk soil activity. We hypothesized that measures of in situ DNA synthesis may miss soil microbes that are metabolically active but not replicating, and we therefore applied BONCAT (bioorthogonal non-canonical amino acid tagging), a proxy for activity that does not rely on cell division, to measure translationally active cells in soils. We compared the active populations of two soil depths from Oak Ridge (TN), incubated under the same conditions for up to seven days. Depending on the soil, a maximum of 25–70% of the cells were active, accounting for 3–4 million cells per gram of soil, which is an order of magnitude higher than previous estimates. The BONCAT-positive cell fraction was recovered by fluorescence-activated cell sorting (FACS) and identified by 16S rDNA amplicon sequencing. The active fraction comprised a selected subset of the bulk soil community. Excitingly, some of the same members of the community were recruited at both depths independently of their abundance rank. On average, 86% of sequence reads recovered from the active community shared >97% sequence similarity with cultured isolates from the field site. Our observations are in line with a recent report that, of the few taxa that are both abundant and ubiquitous in soil, 45% are also cultured; indeed, some of these ubiquitous microorganisms were found to be translationally active. The use of BONCAT on soil microbiomes provides evidence that a large portion of soil microbes can be active simultaneously.
We conclude that BONCAT coupled to FACS and sequencing is effective for interrogating the active fraction of soil microbiomes in situ and provides new perspectives to link metabolic capacity to overall soil ecological traits and processes.

