PICRUSt2: An improved and customizable approach for metagenome inference

One major limitation of microbial community marker gene sequencing is that it does not provide direct information on the functional composition of sampled communities. Here, we present PICRUSt2 (https://github.com/picrust/picrust2), which expands the capabilities of the original PICRUSt method [1] to predict the functional potential of a community based on marker gene sequencing profiles. This updated method and implementation includes several improvements over the previous algorithm: an expanded database of gene families and reference genomes, a new approach now compatible with any OTU-picking or denoising algorithm, and novel phenotype predictions. Upon evaluation, PICRUSt2 was more accurate than PICRUSt1 and other current approaches overall. PICRUSt2 is also now more flexible and allows the addition of custom reference databases. We highlight these improvements and also important caveats regarding the use of predicted metagenomes, which are related to the inherent challenges of analyzing metagenome data in general.
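The general idea of predicting functional potential from marker gene profiles can be illustrated with a toy calculation. The sketch below is not PICRUSt2's implementation: it assumes that marker gene copy numbers and gene family copy numbers are already known for each sequence variant (in practice they have to be inferred from reference genomes), and every name and value in it (`predict_gene_families`, `ASV1`, `K00001`, the counts) is a hypothetical placeholder.

```python
def predict_gene_families(asv_counts, marker_copies, gene_copies):
    """Toy metagenome inference for one sample.

    asv_counts:    read counts per sequence variant (hypothetical values)
    marker_copies: assumed marker gene copies per genome for each variant
    gene_copies:   assumed gene family copies per genome for each variant
    """
    predicted = {}
    for asv, count in asv_counts.items():
        # Correct read counts for marker gene copy number to estimate
        # how many genomes the reads represent.
        organisms = count / marker_copies[asv]
        for family, copies in gene_copies[asv].items():
            # Each genome contributes its gene family copies to the total.
            predicted[family] = predicted.get(family, 0.0) + organisms * copies
    return predicted


# Hypothetical input: two sequence variants and two gene families.
asv_counts = {"ASV1": 120, "ASV2": 30}
marker_copies = {"ASV1": 4, "ASV2": 1}
gene_copies = {
    "ASV1": {"K00001": 1, "K00002": 2},
    "ASV2": {"K00001": 3},
}

print(predict_gene_families(asv_counts, marker_copies, gene_copies))
# {'K00001': 120.0, 'K00002': 60.0}
```

Although ASV1 has four times as many reads as ASV2, both correspond to 30 estimated genomes once marker copy number is taken into account, which is why such a correction matters for the predicted totals.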

Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results

Frontiers in Genetics ◽ 10.3389/fgene.2020.489357 ◽ 2020 ◽ Vol 11

Author(s): Alejandro Abdala Asbun ◽ Marc A. Besseling ◽ Sergio Balzano ◽ Judith D. L. van Bleijswijk ◽ Harry J. Witte ◽ ...

Keyword(s): Data Analysis ◽ Sequence Data ◽ Single Gene ◽ Marker Gene ◽ Gene Sequencing ◽ Data Generation ◽ Clustering Methods ◽ Analysis Pipeline ◽ Data Analysis Pipeline ◽ Marker Gene Sequencing

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format and representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available at GitHub: https://github.com/AlejandroAb/CASCABEL.

Cascabel: a flexible, scalable and easy-to-use amplicon sequence data analysis pipeline

10.1101/809384 ◽ 2019 ◽ Cited By ~ 3

Author(s): Alejandro Abdala Asbun ◽ Marc A Besseling ◽ Sergio Balzano ◽ Judith van Bleijswijk ◽ Harry Witte ◽ ...

Keyword(s): Data Analysis ◽ Sequence Data ◽ Single Gene ◽ Marker Gene ◽ Gene Sequencing ◽ Data Generation ◽ Analysis Pipeline ◽ Entire Genome ◽ Data Analysis Pipeline ◽ Marker Gene Sequencing

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene rather than the entire genome, the number of reads needed per sample is lower than that required for metagenome sequencing, making marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline allows customizing the analyses by offering several choices for most of the steps, for example different OTU generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available at GitHub: https://github.com/AlejandroAb/CASCABEL and licensed under GNU GPLv3.

Genome-based targeted sequencing as a reproducible microbial community profiling assay

10.1101/2020.08.07.241950 ◽ 2020

Author(s): Jacquelynn Benjamino ◽ Benjamin Leopold ◽ Daniel Phillips ◽ Mark D. Adams

Keyword(s): 16S rRNA ◽ Relative Abundance ◽ Marker Gene ◽ Cost Effective ◽ Reference Database ◽ New Approach ◽ Community Profiling ◽ Curtis Dissimilarity ◽ Stool Specimens ◽ Reference Genomes

Current sequencing-based methods for profiling microbial communities rely on marker gene (e.g., 16S rRNA) or metagenome shotgun sequencing (mWGS) analysis. We present a new approach based on highly multiplexed oligonucleotide probes designed from reference genomes in a pooled primer-extension reaction during library construction to derive relative abundance data. This approach, termed MA-GenTA: Microbial Abundances from Genome Tagged Analysis, enables quantitative, straightforward, cost-effective microbiome profiling that combines desirable features of both 16S rRNA and mWGS strategies. To test the utility of the MA-GenTA assay, probes were designed for 830 genome sequences representing bacteria present in mouse stool specimens. Comparison of the MA-GenTA data with mWGS data demonstrated excellent correlation down to 0.01% relative abundance and a similar number of organisms detected per sample. Despite the incompleteness of the reference database, NMDS clustering based on the Bray-Curtis dissimilarity metric of sample groups was consistent between MA-GenTA, mWGS and 16S rRNA datasets. MA-GenTA represents a potentially useful new method for microbiome community profiling based on reference genomes.
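For readers unfamiliar with the Bray-Curtis dissimilarity mentioned above, a minimal sketch of the standard formula follows: the sum of absolute abundance differences divided by the sum of all abundances. The sample profiles are invented for illustration and are not data from the study; the function name `bray_curtis` is likewise a placeholder.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Both profiles must list the same taxa in the same order.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa in the same order")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one profile must be non-empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical relative abundances of four taxa in two samples.
sample_a = [0.50, 0.30, 0.20, 0.00]
sample_b = [0.40, 0.30, 0.10, 0.20]

# Absolute differences sum to about 0.4 and total abundance to about 2.0,
# so the dissimilarity is approximately 0.2.
print(round(bray_curtis(sample_a, sample_b), 3))
```

A matrix of such pairwise values between all samples is the usual input to an ordination method such as NMDS.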

BugBase predicts organism-level microbiome phenotypes

10.1101/133462 ◽ 2017 ◽ Cited By ~ 46

Author(s): Tonya Ward ◽ Jake Larson ◽ Jeremy Meulemans ◽ Ben Hillmann ◽ Joshua Lynch ◽ ...

Keyword(s): Marker Gene ◽ Gene Sequencing ◽ Amplicon Sequencing ◽ Sequencing Data ◽ Functional Capability ◽ Pathogenic Potential ◽ Shotgun Metagenomics ◽ Functional Changes ◽ Gram Staining ◽ Marker Gene Sequencing

Shotgun metagenomics and marker gene amplicon sequencing can be used to directly measure or predict the functional repertoire of the microbiota en masse, but current methods do not readily estimate the functional capability of individual microorganisms. Here we present BugBase, an algorithm that predicts organism-level coverage of functional pathways as well as biologically interpretable phenotypes such as oxygen tolerance, Gram staining and pathogenic potential, within complex microbiomes using either whole-genome shotgun or marker gene sequencing data. We find BugBase’s organism-level pathway coverage predictions to be statistically higher powered than current ‘bag-of-genes’ approaches for discerning functional changes in both host-associated and environmental microbiomes.

A New Approach to the Extraction of ANN Rules and to Their Generalization Capacity Through GP

Neural Computation ◽ 10.1162/089976604323057461 ◽ 2004 ◽ Vol 16 (7) ◽ pp. 1483-1523 ◽ Cited By ~ 27

Author(s): Juan R. Rabuñal ◽ Julián Dorado ◽ Alejandro Pazos ◽ Javier Pereira ◽ Daniel Rivero

Keyword(s): Genetic Programming ◽ Rule Extraction ◽ Human Beings ◽ Activation Functions ◽ New Approach ◽ Previous Algorithm ◽ Internal Distribution

Various techniques for the extraction of ANN rules have been used, but most of them have focused on certain types of networks and their training. There are very few methods that deal with ANN rule extraction as systems that are independent of their architecture, training, and internal distribution of weights, connections, and activation functions. This article proposes a methodology for the extraction of ANN rules, regardless of their architecture, and based on genetic programming. The strategy is based on the previous algorithm and aims at achieving the generalization capacity that is characteristic of ANNs by means of symbolic rules that are understandable to human beings.

Accurate Reconstruction of Microbial Strains from Metagenomic Sequencing Using Representative Reference Genomes

10.1101/215707 ◽ 2017 ◽ Cited By ~ 2

Author(s): Zhemin Zhou ◽ Nina Luhmann ◽ Nabil-Fareed Alikhan ◽ Christopher Quince ◽ Mark Achtman

Keyword(s): Evaluation Studies ◽ Species Level ◽ Metagenomic Sequencing ◽ Sequencing Data ◽ Reference Databases ◽ Microbial Strains ◽ Taxonomic Assignments ◽ Taxonomic Groups ◽ Reference Genomes ◽ Recent Evaluation

Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.

Definitive Hematopoietic Stem Cells Minimally Contribute to Embryonic Hematopoiesis

10.1101/2021.05.02.442359 ◽ 2021

Author(s): Bianca A Ulloa ◽ Samima S Habbsa ◽ Kathryn S Potts ◽ Alana Lewis ◽ Mia McKinstry ◽ ...

Keyword(s): Stem Cells ◽ Hematopoietic Stem Cells ◽ De Novo ◽ Marker Gene ◽ Lineage Tracing ◽ Hematopoietic Stem ◽ Functional Potential ◽ Injury Model ◽ Rare Cells

Hematopoietic stem cells (HSCs) are rare cells that arise in the embryo and sustain adult hematopoiesis. Although the functional potential of nascent HSCs is detectable by transplantation, their native contribution during development is unknown, in part due to the overlapping genesis and marker gene expression with other embryonic blood progenitors. Using single cell transcriptomics, we defined gene signatures that distinguish nascent HSCs from embryonic blood progenitors. Applying a new lineage tracing approach, we selectively tracked HSC output in situ and discovered significantly delayed lymphomyeloid contribution. Using a novel inducible HSC injury model, we demonstrated a negligible impact on larval lymphomyelopoiesis following HSC depletion. HSCs are not merely dormant at this developmental stage as they showed robust regeneration after injury. Combined, our findings illuminate that nascent HSCs self-renew but display differentiation latency, while HSC-independent embryonic progenitors sustain developmental hematopoiesis. Understanding the differences among embryonic HSC and progenitor populations will guide improved de novo generation and expansion of functional HSCs.

Jackson Heights Neighborhood Transportation Study, New York City: New Approach in Community-Based Planning

Transportation Research Record: Journal of the Transportation Research Board ◽ 10.3141/2307-02 ◽ 2012 ◽ Vol 2307 (1) ◽ pp. 9-20 ◽ Cited By ~ 1

Author(s): Oliver Ernhofer ◽ Willa Ng ◽ Gill Mosseri ◽ David Stein ◽ Don Varley ◽ ...

Keyword(s): New York ◽ New York City ◽ York City ◽ Community Based ◽ New Approach

Sequences of the variable regions of three monoclonal antibodies specific for histidine-containing protein of the bacterial phosphoenolpyruvate:sugar phosphotransferase system

Biochemistry and Cell Biology ◽ 10.1139/o91-045 ◽ 1991 ◽ Vol 69 (4) ◽ pp. 297-302 ◽ Cited By ~ 4

Author(s): Teresa Steeves ◽ M. Michele Barry ◽ Harry W. Duckworth ◽ E. Bruce Waygood ◽ Jeremy S. Lee

Keyword(s): Monoclonal Antibodies ◽ Gene Sequencing ◽ Gene Families ◽ Gene Sequences ◽ Phosphotransferase System ◽ Variable Regions ◽ Gene Usage ◽ Vh Gene

The variable regions of three monoclonal antibodies, Jel 42, Jel 44, and Jel 324, specific for the histidine-containing protein of the bacterial phosphoenolpyruvate:sugar phosphotransferase system have been sequenced from their respective mRNAs. The Vh gene families were deduced from the percent homology to the consensus gene sequences and the J gene and D gene usage was also analysed. Key words: monoclonal antibodies, gene sequencing.
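Assigning a sequence to a gene family by its similarity to family consensus sequences can be sketched as follows. This is an illustration only and not the authors' procedure: it assumes the sequences are already aligned and of equal length, it measures similarity as simple percent identity, and the family names and sequences are invented.

```python
def percent_identity(seq, consensus):
    """Percentage of matching positions between two pre-aligned sequences."""
    if not seq or len(seq) != len(consensus):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq, consensus))
    return 100.0 * matches / len(seq)


def best_family(seq, consensus_by_family):
    """Return the family whose consensus is most similar to seq."""
    return max(
        consensus_by_family,
        key=lambda family: percent_identity(seq, consensus_by_family[family]),
    )


# Hypothetical consensus sequences for two gene families.
consensus_by_family = {
    "family_A": "GAGGTGCAGCTG",
    "family_B": "CAGGTCCAACTG",
}
query = "GAGGTGCAACTG"

# The query matches family_A at 11 of 12 positions and family_B at 10 of 12.
print(best_family(query, consensus_by_family))
# family_A
```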
