AmpliCI: A High-resolution Model-Based Approach for Denoising Illumina Amplicon Data

Mapping Intimacies, 2020. DOI: 10.1101/2020.02.23.961227

Author(s): Xiyu Peng, Karin Dorman

Keyword(s): Sequence Similarity; Low Frequency; Computation Time; Amplicon Sequencing; Quality Information; Operational Taxonomic Units; Model Based; Main Challenge; Sequencing Quality; Resolution Model

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. One main challenge is to distinguish true biological variants from errors caused by PCR and sequencing. In the traditional analysis pipeline, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed “denoising” methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near abundant variants, because they ignore the sequencing quality information.

Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes quality information into account and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. We show that AmpliCI is superior to three popular denoising methods, with acceptable computation time and memory usage.

Availability: Source code is available at https://github.com/DormanLab/AmpliCI
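
The abstract's key point is that per-base quality scores indicate how much each base call can be trusted. The Python below is a minimal sketch of that idea only. It is not AmpliCI's actual error model, which the abstract does not describe. The sketch converts Phred scores to error probabilities with the standard relation e = 10^(-Q/10) and scores a read against a candidate error-free sequence. It assumes equal lengths, no indels, and substitution errors spread uniformly over the three other bases. The function names are hypothetical.

```python
import math

def phred_to_error(q: int) -> float:
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)

def read_log_likelihood(read: str, quals: list[int], template: str) -> float:
    """Log-likelihood of an observed read given a candidate error-free sequence.

    Hypothetical simplified model (not AmpliCI's):
    - read, quals and template have equal length (no indels);
    - each base is read correctly with probability 1 - e;
    - an error yields each of the other three bases with probability e / 3;
    - quality scores are >= 1, so that 1 - e > 0.
    """
    if not (len(read) == len(quals) == len(template)):
        raise ValueError("read, quals and template must have equal length")
    ll = 0.0
    for observed, q, expected in zip(read, quals, template):
        e = phred_to_error(q)
        ll += math.log(1 - e) if observed == expected else math.log(e / 3)
    return ll

if __name__ == "__main__":
    template = "ACGTACGT"
    read = "ACGTACGA"  # differs from the template at the last base
    low_q = read_log_likelihood(read, [30] * 7 + [10], template)  # mismatch at Q10
    high_q = read_log_likelihood(read, [30] * 8, template)        # mismatch at Q30
    print(f"mismatch at Q10: {low_q:.3f}, mismatch at Q30: {high_q:.3f}")
```

Under this toy model, a mismatch at a Q10 base costs much less log-likelihood than the same mismatch at a Q30 base. This illustrates how quality information can help separate a genuine rare variant, which differs at high-quality positions, from sequencing noise.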

AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa648

Author(s): Xiyu Peng, Karin S Dorman

Keyword(s): Sequence Similarity; Low Frequency; Computation Time; Amplicon Sequencing; Supplementary Information; Quality Information; Model Based; Main Challenge; Sequencing Quality; Resolution Model

Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information.

Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI outperforms three popular denoising methods, with acceptable computation time and memory usage.

Availability and implementation: Source code is available at https://github.com/DormanLab/AmpliCI.

Supplementary information: Supplementary material is available at Bioinformatics online.
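
For contrast, here is a sketch of the traditional 97% OTU clustering that the abstract compares AmpliCI against. Unique sequences are visited in decreasing order of abundance. Each one joins the first existing OTU centroid within 97% identity, or otherwise founds a new OTU. This is a hypothetical, simplified illustration. Identity is computed position by position on sequences assumed to be pre-aligned to equal length, whereas production clustering tools compute identity from pairwise alignments. The function names are invented for this example.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_clustering(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Hypothetical abundance-ordered greedy clustering.

    Unique sequences are visited from most to least abundant; each joins the
    first centroid it matches at >= `threshold` identity, otherwise it becomes
    the centroid of a new OTU. Returns {centroid: [member sequences]}.
    """
    otus: dict[str, list[str]] = {}
    for seq, _count in Counter(reads).most_common():
        for centroid, members in otus.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:  # no existing centroid within the threshold
            otus[seq] = [seq]
    return otus

if __name__ == "__main__":
    base = "ACGT" * 25                                          # 100 bp toy amplicon
    one_off = base[:-1] + "A"                                   # 1 mismatch: 99% identity
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    five_off = "".join(swap[b] for b in base[:5]) + base[5:]    # 5 mismatches: 95% identity
    reads = [base] * 10 + [five_off] * 3 + [one_off] * 2
    for centroid, members in greedy_otu_clustering(reads).items():
        print(len(members), "unique sequence(s) in OTU with centroid", centroid[:10] + "...")
```

In this toy run, `one_off` is merged into the abundant OTU purely because of the 97% cutoff. A genuine single-nucleotide variant would be absorbed in the same way, which is the loss of resolution the abstract describes.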

Identifying Hidden Viable Bacterial Taxa in Tropical Forest Soils Using Amplicon Sequencing of Enrichment Cultures

Biology, 2021, Vol 10 (7), pp. 569. DOI: 10.3390/biology10070569

Author(s): Chakriya Sansupa, Sara Fareed Mohamed Wahdan, Terd Disayathanoowat, Witoon Purahong

Keyword(s): Bacterial Community; Forest Soil; Culture Media; Enrichment Culture; Sequence Similarity; Illumina MiSeq; Amplicon Sequencing; Soil Samples; Enrichment Cultures; Total Community

This study aims to estimate the proportion and diversity of soil bacteria detected by eDNA-based and culture-based methods. Specifically, we used Illumina MiSeq to sequence and characterize the bacterial communities from (i) DNA extracted directly from forest soil and (ii) DNA extracted from a mixture of bacterial colonies obtained by enrichment cultures on agar plates of the same forest soil samples. Amplicon sequencing of enrichment cultures allowed us to rapidly screen the culturable community in an environmental sample. Compared with the eDNA community (based on a 97% sequence similarity threshold), enrichment cultures were shown to capture both rare and abundant bacterial taxa in forest soil samples. Enrichment culture and eDNA communities shared 2% of the OTUs detected in the total community, whereas 88% of the enrichment culture community (15% of the total community) could not be detected by eDNA. The enrichment culture-based methods observed 17% of the bacteria in the total community. FAPROTAX functional prediction showed that the rare and unique taxa detected with the enrichment cultures have the potential to perform important functions in soil systems. We suggest that enrichment culture-based amplicon sequencing could be a beneficial approach for evaluating a cultured bacterial community, and that combining it with the eDNA method could provide more comprehensive information about a bacterial community. We expect that more unique cultured taxa could be detected if further studies used both selective and non-selective culture media to enrich bacteria at the first step.
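
The overlap percentages above are simple set arithmetic on OTU identifiers. The sketch below shows one way to compute them. It assumes that the total community is the union of OTUs detected by either method, which the abstract implies but does not state. The function name and toy OTU sets are invented for illustration and are not taken from the study.

```python
def overlap_summary(edna: set[str], culture: set[str]) -> dict[str, float]:
    """Summarize how two OTU sets overlap, as fractions.

    Assumes (hypothetically) that the total community is edna | culture.
    """
    total = edna | culture
    shared = edna & culture
    culture_only = culture - edna  # detected by enrichment culture but not by eDNA
    return {
        "shared_of_total": len(shared) / len(total),
        "culture_only_of_culture": len(culture_only) / len(culture),
        "culture_only_of_total": len(culture_only) / len(total),
        "culture_of_total": len(culture) / len(total),
    }

if __name__ == "__main__":
    edna = {"otu1", "otu2", "otu3", "otu4"}   # toy data
    culture = {"otu4", "otu5"}                # toy data
    for name, value in overlap_summary(edna, culture).items():
        print(f"{name}: {value:.0%}")
```

Under these definitions, the reported figures are mutually consistent. Culture-only OTUs make up 15% of the total, which is about 88% of the 17% observed by enrichment culture, and the 2% of shared OTUs is the remainder, 17% minus 15%.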

Extremely Halophilic Biohydrogen Producing Microbial Communities from High-Salinity Soil and Salt Evaporation Pond

Fuels, 2021, Vol 2 (2), pp. 241-252. DOI: 10.3390/fuels2020014

Author(s): Dyah Asri Handayani Taroepratjeka, Tsuyoshi Imai, Prapaipid Chairattanamanokorn, Alissara Reungsang

Keyword(s): Microbial Communities; High Throughput; High Throughput Sequencing; High Salinity; Amplicon Sequencing; Spatial Proximity; Lignocellulosic Waste; Evaporation Pond; Operational Taxonomic Units; Determining Factor

Because they can withstand extreme salt concentrations, extreme halophiles offer the advantage of saving on sterilization and water costs in biohydrogen production from pretreated lignocellulosic waste. This study identifies the dominant hydrogen-producing genera and species among acclimatized, extremely halotolerant microbial communities taken from two salt-damaged soil locations in Khon Kaen and one salt evaporation pond in Samut Sakhon, Thailand. The V3–V4 regions of the 16S rRNA gene of these microbial communities were analyzed by high-throughput amplicon sequencing. A total of 345 operational taxonomic units were obtained, and high-throughput sequencing confirmed that Firmicutes was the dominant phylum in all three communities. Halanaerobium fermentans and Halanaerobacter lacunarum were the dominant hydrogen-producing species in the communities. Spatial proximity was not found to be a determining factor for similarity between these extremely halophilic microbial communities. Studying these microbial communities can help develop strategies to increase the biohydrogen molar yield.

Development of a multiplex amplicon‐sequencing assay to detect low‐frequency mutations in poinsettia (Euphorbia pulcherrima) breeding programmes

Plant Breeding, 2021. DOI: 10.1111/pbr.12925

Author(s): Vinicius Vilperte, Robert Boehm, Thomas Debener

Keyword(s): Low Frequency; Amplicon Sequencing; Euphorbia pulcherrima; Breeding Programmes

Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling

ISME Communications, 2021, Vol 1 (1). DOI: 10.1038/s43705-021-00033-z

Author(s): Sandra Reitmeier, Thomas C. A. Hitch, Nicole Treichel, Nikolaos Fikas, Bela Hausmann, ...

Keyword(s): 16S rRNA; 16S rRNA Gene; Amplicon Sequencing; rRNA Gene; Diversity Analysis; Careful Attention; Operational Taxonomic Units; Basic Concepts; Gnotobiotic Mice; Mock Communities

16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences in defined microbial communities. The data were either taken from the literature or generated in three sequencing facilities, and were analyzed with both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering combined with singleton removal, a commonly used approach, yielded approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower with ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found to be an effective threshold: excluding taxa below it largely prevented the analysis of spurious taxa in both OTU- and ASV-based approaches. Using this cutoff improved the reproducibility of analysis, i.e., variation in richness estimates across six human fecal samples sequenced in seven runs was reduced by 38% compared with singleton filtering. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distance used for comparison. This highlights the importance of careful data analysis before drawing conclusions about microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
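
The two filtering strategies compared above can be contrasted on a single sample's count table. This minimal sketch assumes that the 0.25% cutoff is applied to each taxon's relative abundance within one sample. The abstract does not say how the cutoff is applied across samples, and the function names and toy counts are hypothetical.

```python
def singleton_filter(counts: dict[str, int]) -> dict[str, int]:
    """Drop taxa observed exactly once in the sample."""
    return {taxon: n for taxon, n in counts.items() if n > 1}

def relative_abundance_filter(counts: dict[str, int], cutoff: float = 0.0025) -> dict[str, int]:
    """Keep taxa whose relative abundance in the sample is at least `cutoff` (0.25%).

    Assumption: the cutoff is applied per sample, not pooled across samples.
    """
    total = sum(counts.values())
    return {taxon: n for taxon, n in counts.items() if n / total >= cutoff}

if __name__ == "__main__":
    sample = {"taxonA": 9000, "taxonB": 975, "taxonC": 20, "taxonD": 4, "taxonE": 1}  # toy counts
    print("after singleton removal:", sorted(singleton_filter(sample)))
    print("after 0.25% cutoff:", sorted(relative_abundance_filter(sample)))
```

In this 10,000-read toy sample, the 0.25% cutoff corresponds to 25 reads. Low-count taxa that survive singleton removal (taxonC and taxonD here) are therefore also dropped.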

From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data

GigaScience, 2017, Vol 6 (2). DOI: 10.1093/gigascience/giw017

Cited by ~16

Author(s): Mohamed Mysara, Mercy Njima, Natalie Leys, Jeroen Raes, Pieter Monsieurs

Keyword(s): Amplicon Sequencing; Sequencing Data; Operational Taxonomic Units; Processing Pipeline

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys, 2018, Vol 39, pp. 29-40. DOI: 10.3897/mycokeys.39.28109

Cited by ~21

Author(s): Sten Anslan, R. Henrik Nilsson, Christian Wurzbacher, Petr Baldrian, Leho Tedersoo, ...

Keyword(s): High Throughput; High Throughput Sequencing; Computation Time; Potential Effect; Data Sets; Sequencing Data; Operational Taxonomic Units; High Throughput Sequencing Data; Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate how the choice of bioinformatics workflow affects the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows perfectly filters out the accumulated errors when generating Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS performed better than QIIME2 and Galaxy on the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Assessment of SARS-CoV-2 genome sequencing: quality criteria and low frequency variants

Journal of Clinical Microbiology, 2021. DOI: 10.1128/jcm.00944-21

Author(s): Damien Jacot, Trestan Pillonel, Gilbert Greub, Claire Bertelli

Keyword(s): Sample Selection; Low Frequency; PCR Amplification; Quality Criterion; Quality Criteria; Sequencing Errors; Sequencing Quality; Quality Control Criteria; Control Criteria; Sequence Quality

Although many laboratories worldwide have developed their sequencing capacities in response to the need for genome-based surveillance of SARS-CoV-2 variants, only a few have reported quality criteria to ensure sequence quality before lineage assignment and submission to public databases. Hence, we aimed to provide simple quality control criteria for SARS-CoV-2 sequencing to prevent erroneous interpretation of low-quality or contaminated data. We retrospectively investigated 647 SARS-CoV-2 genomes obtained over ten tiled-amplicon sequencing runs. We extracted 26 potentially relevant metrics covering the entire workflow, from sample selection to bioinformatics analysis. Based on the data distribution, critical values were established for eleven selected metrics to prompt further quality investigations of problematic samples, in particular those with a low viral RNA quantity. Low-frequency variants (<70% of supporting reads) can result from PCR amplification errors, sample cross-contamination or the presence of distinct SARS-CoV-2 genomes in the sequenced sample. The number and prevalence of low-frequency variants can be used as a robust quality criterion to identify possible sequencing errors or contamination. Overall, we propose eleven metrics with fixed cutoff values as a simple tool to evaluate the quality of SARS-CoV-2 genomes. These include cycle thresholds, mean depth, the proportion of the genome covered at least 10x, and the number of low-frequency variants combined with mutation prevalence data.
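
Three of the proposed metrics can be computed directly from per-position depths and variant calls: mean depth, the proportion of the genome covered at least 10x, and the number of low-frequency variants (<70% of supporting reads). The sketch below assumes that depths and allele counts have already been extracted by an upstream tool. The `VariantCall` type and the function names are hypothetical. The fixed cutoff values that the authors propose for flagging a genome are not reproduced here, because the abstract does not state them.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    position: int   # genome coordinate (assumed 1-based)
    alt_reads: int  # reads supporting the alternative allele
    depth: int      # total reads covering the position

def mean_depth(depths: list[int]) -> float:
    """Average read depth over all genome positions."""
    return sum(depths) / len(depths)

def fraction_covered(depths: list[int], min_depth: int = 10) -> float:
    """Proportion of genome positions covered by at least `min_depth` reads."""
    return sum(d >= min_depth for d in depths) / len(depths)

def count_low_frequency_variants(calls: list[VariantCall], max_fraction: float = 0.70) -> int:
    """Count variants supported by fewer than `max_fraction` of the reads at their position."""
    return sum(1 for c in calls if c.depth > 0 and c.alt_reads / c.depth < max_fraction)

if __name__ == "__main__":
    depths = [0, 5, 10, 20, 100]  # toy per-position depths
    calls = [VariantCall(100, 95, 100), VariantCall(200, 30, 50), VariantCall(300, 80, 100)]
    print(mean_depth(depths), fraction_covered(depths), count_low_frequency_variants(calls))
```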

Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies

2021. DOI: 10.22541/au.163398436.69595421/v1

Author(s): Aurelie Bonin, Alessia Guerrieri, Gentile Francesco Ficetola

Keyword(s): Sequence Similarity; Optimal Sequence; Operational Taxonomic Units; DNA Metabarcoding; Similarity Thresholds

Study of Oak Ridge soils using BONCAT-FACS-Seq reveals that a large fraction of the soil microbiome is active

2018. DOI: 10.1101/404087

Cited by ~5

Author(s): Estelle Couradeau, Joelle Sasse, Danielle Goudeau, Nandita Nath, Terry C. Hazen, ...

Keyword(s): Soil Microbes; Sequence Similarity; Large Fraction; Amplicon Sequencing; Bulk Soil; Soil Microbiome; Active Fraction; Soil Microbial Diversity; Oak Ridge

The ability to link soil microbial diversity to soil processes requires technologies that differentiate active subpopulations of microbes from so-called relic DNA and dormant cells. Measures of microbial activity based on various techniques, including DNA labelling, have suggested that most cells in soils are inactive, a finding that has been difficult to reconcile with the high levels of bulk soil activity observed. We hypothesized that measures of in situ DNA synthesis may miss soil microbes that are metabolically active but not replicating. We therefore applied BONCAT (Bioorthogonal Non-Canonical Amino Acid Tagging), a proxy for activity that does not rely on cell division, to measure translationally active cells in soils. We compared the active populations at two soil depths from Oak Ridge (TN), incubated under the same conditions for up to seven days. Depending on the soil, a maximum of 25–70% of the cells were active, accounting for 3–4 million cells per gram of soil, which is an order of magnitude higher than previous estimates. The BONCAT-positive cell fraction was recovered by fluorescence-activated cell sorting (FACS) and identified by 16S rDNA amplicon sequencing. The diversity of the active fraction was a selected subset of the bulk soil community. Excitingly, some of the same community members were recruited at both depths independently of their abundance rank. On average, 86% of sequence reads recovered from the active community shared >97% sequence similarity with cultured isolates from the field site. Our observations are in line with a recent report that, of the few taxa that are both abundant and ubiquitous in soil, 45% have also been cultured, and indeed some of these ubiquitous microorganisms were found to be translationally active. The use of BONCAT on soil microbiomes provides evidence that a large portion of soil microbes can be active simultaneously.
We conclude that BONCAT coupled to FACS and sequencing is effective for interrogating the active fraction of soil microbiomes in situ and provides new perspectives to link metabolic capacity to overall soil ecological traits and processes.
