Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

Introduction. Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences, but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets.

Methods. Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences.

Results. The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similarly high levels of both precision and recall.

Conclusions. Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as it is bounded by a pair of highly conserved primer sequences.
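The in-silico PCR analogy in the abstract above can be illustrated with a minimal Python sketch. This is not Kelpie's algorithm: Kelpie works on unassembled short reads and relies on kMer filtering, error correction and assembly, none of which is shown here. The sketch only assumes a single already-assembled sequence and a primer pair written 5'→3', and extracts the inter-primer region. The function names (`in_silico_amplicons`, `primer_regex`, `reverse_complement`) and the toy sequence are hypothetical and chosen for illustration only.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes,
# so that degenerate primers (e.g. containing R or N) can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a primer with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_amplicons(sequence, forward, reverse):
    """Return the regions lying between a forward primer site and the
    next downstream site of the reverse primer (exact matches only)."""
    sequence = sequence.upper()
    fwd = re.compile(primer_regex(forward))
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement on the given strand.
    rev = re.compile(primer_regex(reverse_complement(reverse)))
    amplicons = []
    for f in fwd.finditer(sequence):
        r = rev.search(sequence, f.end())
        if r:
            amplicons.append(sequence[f.end():r.start()])
    return amplicons


# Toy example (made-up sequence and primers, not real 16S primers).
toy = "TTTACGTACGGGCCCAAAGGTCAATT"
print(in_silico_amplicons(toy, "ACGTRC", "TTGACC"))  # ['GGGCCCAAA']
```

Exact matching keeps the sketch short; a practical tool would also need to tolerate primer mismatches and, as the abstract describes, reconstruct the region from many overlapping reads rather than from one contiguous sequence.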

Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

PeerJ ◽

10.7717/peerj.6174 ◽

2019 ◽

Vol 6 ◽

pp. e6174 ◽

Cited By ~ 1

Author(s):

Paul Greenfield ◽

Nai Tran-Dinh ◽

David Midgley

Keyword(s):

Marker Gene ◽

Amplicon Sequencing ◽

Genomic Region ◽

Full Length ◽

rRNA Gene ◽

Multiple Regions ◽

Metagenome Sequencing ◽

And Function ◽

Genomic Regions ◽

Metagenomic Assembly

Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

10.7287/peerj.preprints.27376v1 ◽

2018 ◽

Author(s):

Paul Greenfield ◽

Nai Tran-Dinh ◽

David Midgley

Keyword(s):

Marker Gene ◽

Amplicon Sequencing ◽

Genomic Region ◽

Full Length ◽

rRNA Gene ◽

Multiple Regions ◽

Metagenome Sequencing ◽

And Function ◽

Genomic Regions ◽

Metagenomic Assembly

A comparison of approaches to scaffolding multiple regions along the 16S rRNA gene for improved resolution

10.1101/2021.03.23.436606 ◽

2021 ◽

Author(s):

Justine W Debelius ◽

Michael Robeson ◽

Luisa W. Hugerth ◽

Fredrik Boulund ◽

Weimin Ye ◽

...

Keyword(s):

16S rRNA ◽

Marker Gene ◽

Alpha Diversity ◽

Real Data ◽

Full Length ◽

rRNA Genes ◽

Taxonomic Resolution ◽

rRNA Gene ◽

Multiple Regions ◽

Tree Building

Abstract Motivation: Full-length, high-resolution 16S rRNA marker gene sequencing has historically been challenging. Short amplicons provide high-accuracy reads with widely available equipment, at the cost of taxonomic resolution. One recent proposal has been to reconstruct multiple amplicons along the full-length marker gene; however, no barcode-free, computationally tractable approach for this is available. To address this gap, we present Sidle (SMURF Implementation Done to acceLerate Efficiency), an implementation of the Short MUltiple Reads Framework algorithm with a novel tree building approach to reconstruct rRNA genes from individually amplified regions. Results: Using simulated and real data, we compared Sidle to two other approaches of leveraging multiple gene region data. We found that Sidle had the least bias in non-phylogenetic alpha diversity, feature-based measures of beta diversity, and the reconstruction of individual clades. With a curated database, Sidle also provided the most precise species-level resolution. Availability and Implementation: Sidle is available under a BSD 3 license from https://github.com/jwdebelius/q2-sidle

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution

Nucleic Acids Research ◽

10.1093/nar/gkz569 ◽

2019 ◽

Vol 47 (18) ◽

pp. e103-e103 ◽

Cited By ~ 58

Author(s):

Benjamin J Callahan ◽

Joan Wong ◽

Cheryl Heiner ◽

Steve Oh ◽

Casey M Theriot ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

Amplicon Sequencing ◽

Full Length ◽

rRNA Gene ◽

Single Nucleotide ◽

Full Complement ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use.

Superior resolution characterisation of microbial diversity in anaerobic digesters using full-length 16S rRNA gene amplicon sequencing

Water Research ◽

10.1016/j.watres.2020.115815 ◽

2020 ◽

Vol 178 ◽

pp. 115815 ◽

Cited By ~ 2

Author(s):

Theo Y.C. Lam ◽

Ran Mei ◽

Zhuoying Wu ◽

Patrick K.H. Lee ◽

Wen-Tso Liu ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Diversity ◽

Amplicon Sequencing ◽

Full Length ◽

rRNA Gene ◽

Anaerobic Digesters

Press xenobiotic disturbance favors deterministic assembly with a shift in function and structure of bacterial communities in sludge bioreactors

10.1101/2020.10.15.341966 ◽

2020 ◽

Author(s):

Ezequiel Santillan ◽

Hari Seshan ◽

Stefan Wuertz

Keyword(s):

Community Structure ◽

Bacterial Communities ◽

Temporal Dynamics ◽

Treatment Plant ◽

Amplicon Sequencing ◽

rRNA Gene ◽

Batch Reactors ◽

Function Structure ◽

Community Function ◽

And Function

Abstract Disturbance is thought to affect community assembly mechanisms, which in turn shape community structure and the overall function of the ecosystem. Here, we tested the effect of a continuous (press) xenobiotic disturbance on the function, structure, and assembly of bacterial communities within a wastewater treatment system. Two sets of four-liter sequencing batch reactors were operated in triplicate with and without the addition of 3-chloroaniline for a period of 132 days, following 58 days of acclimation after inoculation with sludge from a full-scale treatment plant. Temporal dynamics of bacterial community structure were derived from 16S rRNA gene amplicon sequencing. Community function, structure and assembly differed between press disturbed and undisturbed reactors. Temporal partitioning of assembly mechanisms via phylogenetic and non-phylogenetic null modelling analysis revealed that deterministic assembly prevailed for disturbed bioreactors, while the role of stochastic assembly was stronger for undisturbed reactors. Our findings are relevant because research spanning various disturbance types, environments and spatiotemporal scales is needed for a comprehensive understanding of the effects of press disturbances on assembly mechanisms, structure, and function of microbial communities.

Establishment and Assessment of An Amplicon Sequencing Method Targeting The 16S-ITS-23S rRNA Operon For Analysis of The Equine Gut Microbiome

10.21203/rs.3.rs-156589/v1 ◽

2021 ◽

Author(s):

Yuta Kinoshita ◽

Hidekazu NIWA ◽

Eri UCHIDA-FUJII ◽

Toshio NUKADA

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Amplicon Sequencing ◽

Full Length ◽

rRNA Operon ◽

rRNA Genes ◽

Taxonomic Resolution ◽

23S rRNA ◽

rRNA Gene ◽

Fecal Samples

Abstract Microbial communities are commonly studied by using amplicon sequencing of part of the 16S rRNA gene. Sequencing of the full-length 16S rRNA gene can provide higher taxonomic resolution and accuracy. To obtain even higher taxonomic resolution, with as few false positives as possible, we assessed a method using long amplicon sequencing targeting the rRNA operon combined with a CCMetagen pipeline. Taxonomic assignment had >90% accuracy at the species level in a mock sample and at the family level in equine fecal samples, generating a taxonomic composition similar to that of shotgun sequencing. The rRNA operon amplicon sequencing of equine fecal samples underestimated compositional percentages of bacterial strains containing unlinked rRNA genes by a third to almost a half, but unlinked rRNA genes had a limited effect on the overall results. The rRNA operon amplicon sequencing with the A519F + U2428R primer set was able to reflect archaeal genomes, whereas full-length 16S rRNA with 27F + 1492R could not. Therefore, we conclude that amplicon sequencing targeting the rRNA operon captures more detailed variations of bacterial and archaeal microbiota.

Identification of Multiple Blastocystis Subtypes in Domestic Animals From Colombia Using Amplicon-Based Next Generation Sequencing

Frontiers in Veterinary Science ◽

10.3389/fvets.2021.732129 ◽

2021 ◽

Vol 8 ◽

Cited By ~ 1

Author(s):

Adriana Higuera ◽

Giovanny Herrera ◽

Paula Jimenez ◽

Diego García-Corredor ◽

Martin Pulido-Medellín ◽

...

Keyword(s):

Amplicon Sequencing ◽

Domestic Animals ◽

Small Subunit ◽

Full Length ◽

Farm Animals ◽

rRNA Gene ◽

Next Generation ◽

Fecal Samples ◽

Two Samples ◽

Sequencing Strategy

Blastocystis is frequently reported in fecal samples from animals and humans worldwide, and a variety of subtypes (STs) have been observed in wild and domestic animals. In Colombia, few studies have focused on the transmission dynamics and epidemiological importance of Blastocystis in animals. In this study, we characterized the frequency and subtypes of Blastocystis in fecal samples of domestic animals including pigs, minipigs, cows, dogs, horses, goats, sheep, and llama from three departments of Colombia. Of the 118 fecal samples included in this study, 81.4% (n = 96) were positive for Blastocystis using a PCR that amplifies a fragment of the small subunit ribosomal RNA (SSU rRNA) gene. PCR-positive samples were sequenced by next generation amplicon sequencing (NGS) to determine subtypes. Eleven subtypes were detected: ten previously reported, ST5 (50.7%), ST10 (47.8%), ST25 (34.3%), ST26 (29.8%), ST21 (22.4%), ST23 (22.4%), ST1 (17.9%), ST14 (16.4%), ST24 (14.9%), ST3 (7.5%), and a novel subtype, named ST32 (3.0%). Mixed infection and/or intra-subtype variations were identified in most of the samples. Novel ST32 was observed in two samples from a goat and a cow. To support novel subtype designation, a MinION-based sequencing strategy was used to generate the full-length sequence of the SSU rRNA gene. Comparison of full-length nucleotide sequences with those from current valid subtypes supported the designation of ST32. This is the first study in Colombia using NGS to molecularly characterize subtypes of Blastocystis in farm animals. A great diversity of subtypes was observed in domestic animals, including subtypes previously identified in humans. Additionally, subtype overlap between the different hosts examined in this study was observed. These findings highlight the presence of Blastocystis subtypes with zoonotic potential in farm animals, indicating that farm animals could play a role in transmission to humans.

Buffalo Gut Microbes May Affect the Host Th2 Responses During Fasciola Gigantica Infection via SCFAs Metabolism and TLR Signaling Pathway

10.21203/rs.3.rs-135564/v1 ◽

2020 ◽

Author(s):

Jun Li ◽

Zhao’an Sheng ◽

Dongying Wang ◽

Yaoyao Zhang ◽

Wei Shi ◽

...

Keyword(s):

Signaling Pathway ◽

Time Course ◽

rRNA Gene ◽

Th2 Response ◽

TLR Signaling ◽

Gut Microbes ◽

Th2 Responses ◽

Metagenome Sequencing ◽

And Function ◽

TLR Signaling Pathway

Abstract Background: Helminth-induced Th2 responses are essential to modify the structure and diversity of gut microbes. However, observations have come mainly from studies of helminth-infected humans or rodent models. Very little research has been conducted in veterinary animals. Methods: In this study, we searched for links between microbiota and Th2-biased responses during the time course of Fasciola gigantica infection in buffaloes. 16S rRNA gene amplicon and metagenome sequencing were applied to analyze the structure and function of the gut microbiota. Results: Both alpha and beta diversities decreased during infection, and gut microbes differed considerably across different sections of the gut at different stages. Immune responses changed when the microbiota traversed the gut wall into the peritoneal cavity, in line with the changes in Th2 response induced by F. gigantica infection. We found that the order Coriobacteriales was greatly decreased at the early stages, in which the Peptostreptococcaceae and Family_XIII families are closely linked to the upregulation of IgG1 and IL4, respectively. The F. gigantica infection significantly reduced short-chain fatty acid (SCFA)-producing microbes, reduced the concentrations of gut SCFAs and downregulated the SCFA-producing metabolic pathways. In addition, the microbes associated with TLR2 increased and showed a similar trend to the TLR2 and Th2 cytokine production during infection, suggesting that bacterial ligands might recognize TLR2 and subsequently induce a Th2-biased response. Conclusions: Our data show that buffalo gut microbes may affect the host Th2 response during F. gigantica infection via the SCFA metabolism and TLR signaling pathway. These findings provide new insights into the relationship between F. gigantica, the microbiota and the host, which may provide new potential therapeutic targets for the prevention and control of fasciolosis.

Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution

BMC Microbiology ◽

10.1186/s12866-021-02094-5 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yoshiyuki Matsuo ◽

Shinnosuke Komiya ◽

Yoshiaki Yasumizu ◽

Yuki Yasuoka ◽

Katsura Mizushima ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

rRNA Gene ◽

Short Read ◽

Short Read Sequencing ◽

Long Read

Abstract Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results: We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.
