Ultra-accurate microbial amplicon sequencing with synthetic long reads

Abstract Background Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.
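As a rough illustration of what a per-base error rate of 0.005% means at the read level, the sketch below estimates the expected number of errors per read and the fraction of reads expected to be entirely error-free. It assumes that errors are independent and uniformly distributed along the read, and it uses a read length of about 1,500 bp for a full-length 16S rRNA gene; both are simplifying assumptions for illustration, not figures from the abstract. The 6 kb case corresponds to the longest reads mentioned above.

```python
# Back-of-the-envelope read-level accuracy from a per-base error rate.
# Assumption (not from the abstract): errors are independent across bases.

def expected_errors(per_base_error: float, read_length: int) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return per_base_error * read_length


def prob_error_free(per_base_error: float, read_length: int) -> float:
    """Probability that every base in the read is correct."""
    return (1.0 - per_base_error) ** read_length


# 0.005% per-base error rate, as reported in the abstract.
PER_BASE_ERROR = 0.005 / 100

# 1500 bp is an assumed full-length 16S read; 6000 bp is the 6 kb upper bound.
for length in (1500, 6000):
    errors = expected_errors(PER_BASE_ERROR, length)
    clean = prob_error_free(PER_BASE_ERROR, length)
    print(f"{length} bp: {errors:.3f} expected errors, "
          f"{clean:.1%} of reads expected error-free")
```

Under these assumptions most full-length 16S reads would contain no errors at all, which is the property that exact sequence variant analysis depends on.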

Ultra-accurate Microbial Amplicon Sequencing Directly from Complex Samples with Synthetic Long Reads

10.1101/2020.07.07.192286 ◽

2020 ◽

Cited By ~ 4

Author(s):

Benjamin J Callahan ◽

Dmitry Grinevich ◽

Siddhartha Thakur ◽

Michael A Balamotis ◽

Tuval Ben Yehezkel

Keyword(s):

Microbial Community ◽

16S Rrna ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

Rrna Genes ◽

Sequencing Technology ◽

Complex Samples ◽

Long Reads ◽

Long Read

Abstract Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing technology. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. Applied to full-length 16S rRNA genes from known strains in a microbial community, LoopSeq perfectly recovered the full diversity of full-length exact sequence variants. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kilobases in length. Analysis of rinsate from retail meat samples demonstrated that LoopSeq full-length 16S rRNA synthetic long reads could accurately classify organisms down to the species level, and could differentiate between strains within species identified by the CDC as potential foodborne pathogens. The order-of-magnitude improvement in both length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex and low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.

Microdiversity and phylogeographic diversification of bacterioplankton in pelagic freshwater systems revealed through long-read amplicon sequencing

Microbiome ◽

10.1186/s40168-020-00974-y ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Yusuke Okazaki ◽

Shohei Fujinaga ◽

Michaela M. Salcher ◽

Cristiana Callieri ◽

Atsushi Tanaka ◽

...

Keyword(s):

16S Rrna ◽

Regional Scale ◽

Scale Up ◽

Amplicon Sequencing ◽

Freshwater Ecosystems ◽

16S Rrna Genes ◽

Rrna Genes ◽

Rrna Gene ◽

Metagenomic Sequencing ◽

Long Read

Abstract Background Freshwater ecosystems are inhabited by members of cosmopolitan bacterioplankton lineages despite the disconnected nature of these habitats. The lineages are delineated based on > 97% 16S rRNA gene sequence similarity, but their intra-lineage microdiversity and phylogeography, which are key to understanding the eco-evolutional processes behind their ubiquity, remain unresolved. Here, we applied long-read amplicon sequencing targeting nearly full-length 16S rRNA genes and the adjacent ribosomal internal transcribed spacer sequences to reveal the intra-lineage diversities of pelagic bacterioplankton assemblages in 11 deep freshwater lakes in Japan and Europe. Results Our single nucleotide-resolved analysis, which was validated using shotgun metagenomic sequencing, uncovered 7–101 amplicon sequence variants for each of the 11 predominant bacterial lineages and demonstrated sympatric, allopatric, and temporal microdiversities that could not be resolved through conventional approaches. Clusters of samples with similar intra-lineage population compositions were identified, which consistently supported genetic isolation between Japan and Europe. At a regional scale (up to hundreds of kilometers), dispersal between lakes was unlikely to be a limiting factor, and environmental factors or genetic drift were potential determinants of population composition. The extent of microdiversification varied among lineages, suggesting that highly diversified lineages (e.g., Iluma-A2 and acI-A1) achieve their ubiquity by containing a consortium of genotypes specific to each habitat, while less diversified lineages (e.g., CL500-11) may be ubiquitous due to a small number of widespread genotypes. The lowest extent of intra-lineage diversification was observed among the dominant hypolimnion-specific lineage (CL500-11), suggesting that their dispersal among lakes is not limited despite the hypolimnion being a more isolated habitat than the epilimnion. 
Conclusions Our novel approach complemented the limited resolution of short-read amplicon sequencing and limited sensitivity of the metagenome assembly-based approach, and highlighted the complex ecological processes underlying the ubiquity of freshwater bacterioplankton lineages. To fully exploit the performance of the method, its relatively low read throughput is the major bottleneck to be overcome in the future.
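To make the gap between a > 97% similarity threshold and single-nucleotide resolution concrete, the sketch below computes percent identity between two pre-aligned sequences and the largest number of mismatches that still keeps two sequences above a given threshold. The 1,500 bp length is an assumed approximate size for a near full-length 16S rRNA gene, and the function names are hypothetical; this is not the authors' analysis method.

```python
# Minimal sketch: percent identity versus single-nucleotide differences.
# Inputs are assumed to be already aligned (equal length, '-' for gaps).

def percent_identity(a: str, b: str) -> float:
    """Percent of compared alignment columns where the two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def max_mismatches_above(length: int, threshold_pct: int) -> int:
    """Largest mismatch count that keeps identity strictly above the threshold."""
    # Requires 100 * m < (100 - threshold) * length, solved in integer arithmetic.
    return ((100 - threshold_pct) * length - 1) // 100


print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
print(max_mismatches_above(1500, 97))
```

Under the assumed length, two sequences grouped into the same lineage can still differ at dozens of positions, which is the variation that amplicon sequence variants are meant to resolve.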

Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution

BMC Microbiology ◽

10.1186/s12866-021-02094-5 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yoshiyuki Matsuo ◽

Shinnosuke Komiya ◽

Yoshiaki Yasumizu ◽

Yuki Yasuoka ◽

Katsura Mizushima ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

Rrna Gene ◽

Short Read ◽

Short Read Sequencing ◽

Long Read

Abstract Background Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.

Seasonal Variation of Microbial Diversity of Coastal Sediment in Tongyeong, South Korea, Using 16S rRNA Gene Amplicon Sequencing

Microbiology Resource Announcements ◽

10.1128/mra.00446-21 ◽

2021 ◽

Vol 10 (27) ◽

Author(s):

Nur Indradewi Oktavitri ◽

Jong-Oh Kim ◽

Kyunghoi Kim

Keyword(s):

Seasonal Variation ◽

Microbial Community ◽

16S Rrna ◽

South Korea ◽

Microbial Diversity ◽

Amplicon Sequencing ◽

16S Rrna Genes ◽

Rrna Genes ◽

Rrna Gene ◽

Generation Sequencing

Benthic microbial diversity in Tongyeong, South Korea, was analyzed using next-generation sequencing of the 16S rRNA genes, to reveal the effects of seasonal variations on the microbial community in sediment. Proteobacteria was the dominant phylum, with a relative abundance of 61.5 to 68.1%.

Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution

10.1101/2020.05.06.078147 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yoshiyuki Matsuo ◽

Shinnosuke Komiya ◽

Yoshiaki Yasumizu ◽

Yuki Yasuoka ◽

Katsura Mizushima ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

Rrna Gene ◽

Short Read ◽

Short Read Sequencing ◽

16S Amplicon Sequencing ◽

Long Read

Abstract Background Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results We modified our existing protocol for full-length 16S amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene, which provided the requisite species-level resolution and accuracy in clinical settings.

Microdiversity and phylogeographic diversification of bacterioplankton in pelagic freshwater systems revealed through long-read amplicon sequencing

10.1101/2020.06.03.133140 ◽

2020 ◽

Cited By ~ 1

Author(s):

Yusuke Okazaki ◽

Shohei Fujinaga ◽

Michaela M. Salcher ◽

Cristiana Callieri ◽

Atsushi Tanaka ◽

...

Keyword(s):

16S Rrna ◽

Regional Scale ◽

Scale Up ◽

Amplicon Sequencing ◽

Freshwater Ecosystems ◽

16S Rrna Genes ◽

Rrna Genes ◽

Rrna Gene ◽

Metagenomic Sequencing ◽

Long Read

Abstract Freshwater ecosystems are inhabited by members of cosmopolitan bacterioplankton lineages despite the disconnected nature of these habitats. The lineages are delineated based on >97% 16S rRNA gene sequence similarity, but their intra-lineage microdiversity and phylogeography, which are key to understanding the eco-evolutional processes behind their ubiquity, remain unresolved. Here, we applied long-read amplicon sequencing targeting nearly full-length 16S rRNA genes and the adjacent ribosomal internal transcribed spacer sequences to reveal the intra-lineage diversities of pelagic bacterioplankton assemblages in 11 deep freshwater lakes in Japan and Europe. Our single nucleotide-resolved analysis, which was validated using shotgun metagenomic sequencing, uncovered 7–101 amplicon sequence variants for each of the 11 predominant bacterial lineages and demonstrated sympatric, allopatric, and temporal microdiversities that could not be resolved through conventional approaches. Clusters of samples with similar intra-lineage population compositions were identified, which consistently supported genetic isolation between Japan and Europe. At a regional scale (up to hundreds of kilometers), dispersal between lakes was unlikely to be a limiting factor, and environmental factors were potential determinants of population composition. The extent of microdiversification varied among lineages, suggesting that highly diversified lineages (e.g., Iluma-A2 and acI-A1) achieve their ubiquity by containing a consortium of genotypes specific to each habitat, while less diversified lineages (e.g., CL500-11) may be ubiquitous due to a small number of widespread genotypes. The lowest extent of intra-lineage diversification was observed among the dominant hypolimnion-specific lineage (CL500-11), suggesting that their dispersal among lakes is not limited despite the hypolimnion being a more isolated habitat than the epilimnion. Our novel approach complemented the limited resolution of short-read amplicon sequencing and limited sensitivity of the metagenome assembly-based approach, and highlighted the complex ecological processes underlying the ubiquity of freshwater bacterioplankton lineages.

Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes

10.1101/338731 ◽

2018 ◽

Cited By ~ 2

Author(s):

Joshua P. Earl ◽

Nithin D. Adappa ◽

Jaroslaw Krol ◽

Archana S. Bhat ◽

Sergey Balashov ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Species Level ◽

Full Length ◽

16S Rrna Genes ◽

Rrna Genes ◽

Rrna Gene ◽

Mock Community ◽

Analysis Pipeline ◽

Pacific Biosciences

Abstract Background Pan-bacterial 16S rRNA microbiome surveys performed with massively parallel DNA sequencing technologies have transformed community microbiological studies. Current 16S profiling methods, however, fail to provide sufficient taxonomic resolution and accuracy to adequately perform species-level associative studies for specific conditions. This is due to the amplification and sequencing of only short 16S rRNA gene regions, typically providing for only family- or genus-level taxonomy. Moreover, sequencing errors often inflate the number of taxa present. Pacific Biosciences’ (PacBio’s) long-read technology in particular suffers from high error rates per base. Herein we present a microbiome analysis pipeline that takes advantage of PacBio circular consensus sequencing (CCS) technology to sequence and error correct full-length bacterial 16S rRNA genes, which provides high-fidelity species-level microbiome data. Results Analysis of a mock community with 20 bacterial species demonstrated 100% specificity and sensitivity. Examination of a 250-plus species mock community demonstrated correct species-level classification of >90% of taxa, and relative abundances were accurately captured. The majority of the remaining taxa were demonstrated to be multiply, incorrectly, or incompletely classified. Using this methodology, we examined the microgeographic variation present among the microbiomes of six sinonasal sites, by both swab and biopsy, from the anterior nasal cavity to the sphenoid sinus from 12 subjects undergoing trans-sphenoidal hypophysectomy. We found greater variation among subjects than among sites within a subject, although significant within-individual differences were also observed. Propionibacterium acnes (recently renamed Cutibacterium acnes [1]) was the predominant species throughout, but was found at distinct relative abundances by site. Conclusions Our microbial composition analysis pipeline for single-molecule real-time 16S rRNA gene sequencing (MCSMRT, https://github.com/jpearl01/mcsmrt) overcomes deficits of standard marker gene based microbiome analyses by using CCS of entire 16S rRNA genes to provide increased taxonomic and phylogenetic resolution. Extensions of this approach to other marker genes could help refine taxonomic assignments of microbial species and improve reference databases, as well as strengthen the specificity of associations between microbial communities and dysbiotic states.

Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes

Microbiome ◽

10.1186/s40168-018-0569-2 ◽

2018 ◽

Vol 6 (1) ◽

Cited By ~ 25

Author(s):

Joshua P. Earl ◽

Nithin D. Adappa ◽

Jaroslaw Krol ◽

Archana S. Bhat ◽

Sergey Balashov ◽

...

Keyword(s):

Bacterial Community ◽

16S Rrna ◽

Species Level ◽

Full Length ◽

16S Rrna Genes ◽

Rrna Genes ◽

Pacific Biosciences ◽

Community Profiling

An inter-laboratory study to investigate the impact of the bioinformatics component on microbiome analysis using mock communities

Scientific Reports ◽

10.1038/s41598-021-89881-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Denise M. O’Sullivan ◽

Ronan M. Doyle ◽

Sasithon Temisak ◽

Nicholas Redshaw ◽

Alexandra S. Whale ◽

...

Keyword(s):

Microbial Community ◽

16S Rrna ◽

16S Rrna Gene ◽

Relative Abundance ◽

Amplicon Sequencing ◽

16S Rrna Genes ◽

Rrna Genes ◽

Rrna Gene ◽

Microbiome Analysis ◽

The Impact

Abstract Despite the advent of whole genome metagenomics, targeted approaches (such as 16S rRNA gene amplicon sequencing) continue to be valuable for determining the microbial composition of samples. Amplicon microbiome sequencing can be performed on clinical samples from a normally sterile site to determine the aetiology of an infection (usually single pathogen identification) or samples from more complex niches such as human mucosa or environmental samples where multiple microorganisms need to be identified. The methodologies are frequently applied to determine both presence of micro-organisms and their quantity or relative abundance. There are a number of technical steps required to perform microbial community profiling, many of which may have appreciable precision and bias that impacts final results. In order for these methods to be applied with the greatest accuracy, comparative studies across different laboratories are warranted. In this study we explored the impact of the bioinformatic approaches taken in different laboratories on microbiome assessment using 16S rRNA gene amplicon sequencing results. Data were generated from two mock microbial community samples which were amplified using primer sets spanning five different variable regions of 16S rRNA genes. The PCR-sequencing analysis included three technical repeats of the process to determine the repeatability of their methods. Thirteen laboratories participated in the study, and each analysed the same FASTQ files using their choice of pipeline. This study captured the methods used and the resulting sequence annotation and relative abundance output from bioinformatic analyses. Results were compared to digital PCR assessment of the absolute abundance of each target representing each organism in the mock microbial community samples and also to analyses of shotgun metagenome sequence data. This ring trial demonstrates that the choice of bioinformatic analysis pipeline alone can result in different estimations of the composition of the microbiome when using 16S rRNA gene amplicon sequencing data. The study observed differences in terms of both presence and abundance of organisms and provides a resource for ensuring reproducible pipeline development and application. The observed differences were especially prevalent when using custom databases and applying high stringency operational taxonomic unit (OTU) cut-off limits. In order to apply sequencing approaches with greater accuracy, the impact of different analytical steps needs to be clearly delineated and solutions devised to harmonise microbiome analysis results.
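One simple way to quantify how far two pipelines disagree on the same sample is to convert each pipeline's taxon counts to relative abundances and compute a dissimilarity between the two profiles. The sketch below uses Bray-Curtis dissimilarity, which is a common choice but is not named in the abstract; the taxa, counts, and function names are hypothetical and serve only as an illustration.

```python
# Minimal sketch: comparing two pipelines' outputs for the same sample.
# Bray-Curtis is an assumed metric here; the abstract does not name one.
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two relative-abundance profiles.

    For profiles that each sum to 1 this equals half the summed absolute
    difference: 0 means identical, 1 means no shared taxa.
    """
    taxa = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)


# Hypothetical counts from two pipelines run on the same FASTQ files.
pipeline_a = {"Escherichia": 500, "Bacillus": 300, "Listeria": 200}
pipeline_b = {"Escherichia": 450, "Bacillus": 350, "Staphylococcus": 200}

profile_a = relative_abundance(pipeline_a)
profile_b = relative_abundance(pipeline_b)
print(f"Bray-Curtis dissimilarity: {bray_curtis(profile_a, profile_b):.2f}")
```

A measure like this captures both kinds of disagreement the study reports: taxa present in one output but absent from the other, and taxa found by both at different abundances.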

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Nature Communications ◽

10.1038/s41467-021-22203-2 ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 2

Author(s):

Caitlin M. Singleton ◽

Francesca Petriglieri ◽

Jannie M. Kristensen ◽

Rasmus H. Kirkegaard ◽

Thomas Y. Michaelsen ◽

...

Keyword(s):

16S Rrna ◽

Wastewater Treatment Plants ◽

In Situ Hybridisation ◽

Amplicon Sequencing ◽

Rrna Genes ◽

Fluorescence In Situ Hybridisation ◽

Sequencing Data ◽

High Quality ◽

16S Rrna Amplicon Sequencing ◽

Long Read

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.
