Using AnnoTree to get more assignments, faster, in DIAMOND+MEGAN microbiome analysis

2021
Author(s): Anupam Gautam, Hendrik Felderhoff, Caner Bagci, Daniel H Huson

In microbiome analysis, one main approach is to align metagenomic sequencing reads against a protein-reference database such as NCBI-nr, and then to perform taxonomic and functional binning based on the alignments. This approach is embodied, for example, in the standard DIAMOND+MEGAN analysis pipeline, which first aligns reads against NCBI-nr using DIAMOND and then performs taxonomic and functional binning using MEGAN. Here we propose the use of the AnnoTree protein database, rather than NCBI-nr, in such alignment-based analyses to determine the prokaryotic content of metagenomic samples. We demonstrate a 2-fold speedup compared with using the prokaryotic part of NCBI-nr, together with increased assignment rates; in particular, twice as many reads are assigned to KEGG. In addition to binning to the NCBI taxonomy, MEGAN now also bins to the GTDB taxonomy.
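The abstract does not spell out how alignments become taxonomic bins. MEGAN's taxonomic binning is generally described as a lowest-common-ancestor (LCA) assignment: a read goes to the deepest taxon shared by all of its sufficiently good hits. The following is a minimal sketch of that idea only, assuming a toy taxonomy stored as a child-to-parent map and a `top_percent` score filter loosely modeled on MEGAN's top-percent setting. All names and data here are hypothetical; MEGAN's actual implementation (other LCA variants, functional binning, GTDB support) is not reproduced.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def lca(taxa: List[str]) -> str:
    """Lowest common ancestor: the deepest node shared by all lineages."""
    result = "root"
    for nodes in zip(*(lineage(t) for t in taxa)):
        if all(n == nodes[0] for n in nodes):
            result = nodes[0]
        else:
            break
    return result


def bin_read(hits: List[Tuple[str, float]], top_percent: float = 10.0) -> Optional[str]:
    """Assign a read to the LCA of all hits scoring within `top_percent` of its best hit.

    `hits` holds (taxon, bit_score) pairs; an unaligned read returns None.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    threshold = best * (1.0 - top_percent / 100.0)
    kept = [taxon for taxon, score in hits if score >= threshold]
    return lca(kept)


print(bin_read([("Escherichia", 100.0), ("Salmonella", 95.0)]))  # Proteobacteria
print(bin_read([("Escherichia", 100.0), ("Bacillus", 50.0)]))    # Escherichia
print(bin_read([("Escherichia", 100.0), ("Bacillus", 98.0)]))    # Bacteria
```

Raising `top_percent` admits more hits and pushes reads toward higher ranks; lowering it makes assignments more specific but more sensitive to a single spurious best hit.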

2018
Author(s): MS Zinter, CC Dvorak, MY Mayday, K Iwanaga, NP Ly, ...

ABSTRACT RATIONALE: Despite improved diagnostics, pulmonary pathogens in immunocompromised children frequently evade detection, leading to significant morbidity and mortality. OBJECTIVES: To develop a highly sensitive metagenomic next-generation sequencing (mNGS) assay capable of evaluating the pulmonary microbiome and identifying diverse pathogens in the lungs of immunocompromised children. METHODS: We collected 41 lower respiratory specimens from 34 immunocompromised children undergoing evaluation for pulmonary disease at 3 children’s hospitals from 2014-2016. Samples underwent mechanical homogenization, paired RNA/DNA extraction, and metagenomic sequencing. Sequencing reads were aligned to the NCBI nucleotide reference database to determine taxonomic identities. Statistical outliers were determined based on abundance within each sample and relative to other samples in the cohort. MEASUREMENTS & MAIN RESULTS: We identified a rich cross-domain pulmonary microbiome containing bacteria, fungi, RNA viruses, and DNA viruses in each patient. Potentially pathogenic bacteria were ubiquitous among samples but could be distinguished as possible causes of disease by parsing for outlier organisms. Samples with bacterial outliers had significantly depressed alpha-diversity (median 0.58, IQR 0.33-0.62 vs. median 0.94, IQR 0.93-0.95, p<0.001). Potential pathogens were detected in half of samples previously negative by clinical diagnostics, demonstrating increased sensitivity for missed pulmonary pathogens (p<0.001). CONCLUSIONS: An optimized mNGS assay for pulmonary microbes demonstrates significant inoculation of the lower airways of immunocompromised children with diverse bacteria, fungi, and viruses. Potential pathogens can be identified based on absolute and relative abundance. Ongoing investigation is needed to determine the pathogenic significance of outlier microbes in the lungs of immunocompromised children with pulmonary disease.
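The abstract reports alpha-diversity values between 0 and 1 but does not name the index. Values in that range are consistent with, for example, the Simpson diversity index 1 - Σ p_i², where p_i is the relative abundance of taxon i. As an assumption, the sketch below computes that index to show why a sample dominated by one outlier organism scores low; the sample counts are hypothetical.

```python
from typing import Sequence


def simpson_diversity(counts: Sequence[int]) -> float:
    """Simpson diversity 1 - sum(p_i^2) from per-taxon read counts.

    Values near 0 mean a single taxon dominates; values near 1 mean reads
    are spread evenly across many taxa.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical samples: one dominated by a single organism, one evenly spread.
dominated = [900, 50, 30, 20]
even = [100] * 20
print(round(simpson_diversity(dominated), 3))  # 0.186
print(round(simpson_diversity(even), 3))       # 0.95
```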


2020
Vol 21 (1)
Author(s): Nathan LaPierre, Mohammed Alser, Eleazar Eskin, David Koslicki, Serghei Mangul

Abstract Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment MinHash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
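The pre-filtering step can be illustrated with a simplified containment estimate: sketch each reference genome by its smallest k-mer hashes (a bottom-s MinHash sketch), then measure what fraction of that sketch occurs among the sample's k-mers. This approximates how much of the reference is contained in the sample, and references below a cutoff are dropped before alignment. This is a minimal sketch under stated assumptions: it keeps the sample's full k-mer hash set in memory rather than using a compact structure, and `k`, `sketch_size`, and `min_containment` are hypothetical values, not Metalign's defaults or code.

```python
import hashlib
import random
from typing import Dict, Iterable, List, Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer of `seq` to a 64-bit integer (BLAKE2b, 8-byte digest)."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def bottom_sketch(hashes: Set[int], size: int) -> List[int]:
    """Bottom-s MinHash sketch: the `size` smallest hash values."""
    return sorted(hashes)[:size]


def containment(ref_sketch: List[int], sample_hashes: Set[int]) -> float:
    """Estimate |ref & sample| / |ref| as the fraction of sketch hashes seen in the sample."""
    if not ref_sketch:
        return 0.0
    return sum(h in sample_hashes for h in ref_sketch) / len(ref_sketch)


def prefilter(refs: Dict[str, str], reads: Iterable[str], k: int = 21,
              sketch_size: int = 100, min_containment: float = 0.1) -> List[str]:
    """Keep only references whose estimated containment in the sample reaches the cutoff."""
    sample: Set[int] = set()
    for read in reads:
        sample |= kmer_hashes(read, k)
    kept = []
    for name, genome in refs.items():
        sketch = bottom_sketch(kmer_hashes(genome, k), sketch_size)
        if containment(sketch, sample) >= min_containment:
            kept.append(name)
    return kept


rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
genome_b = "".join(rng.choice("ACGT") for _ in range(2000))
# Simulated reads drawn from genome_a only.
reads = [genome_a[i:i + 100] for i in range(0, 1900, 50)]
print(prefilter({"A": genome_a, "B": genome_b}, reads))  # ['A']
```

Only the surviving references would then be passed to the aligner, which is where the running-time savings come from.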


2019
Vol 3 (Supplement_1)
Author(s): Wayne Young, Caterina Carco, Jane Mullaney, Paul Maclean, Paul Cotter, ...

Abstract Objectives Irritable Bowel Syndrome (IBS) is a functional gastrointestinal (GI) disorder featuring chronic or recurrent abdominal discomfort, usually with changes in GI habit. To improve our understanding of links between the microbiome and IBS, and how these links can be manipulated through diet, we undertook shotgun metagenomic sequencing of fecal samples from a case-control study. Methods Fecal samples from 172 individuals were analyzed by shotgun sequencing using the Illumina NextSeq platform. Of these, 77 were classified as controls, 16 had constipation-predominant IBS (IBS-C), 39 had diarrhea-predominant IBS (IBS-D), 29 were diagnosed with functional constipation (FC), and 11 had functional diarrhea (FD). Taxonomic classifications were determined using Metaxa2 and the SILVA 128 database. Gene functions were assigned by aligning sequences against a protein reference database using DIAMOND. Mean relative abundances of bacterial taxa and functional genes were compared using permutation ANOVA. Ethical approval was obtained from the University of Otago Human Ethics Committee (Health) (Reference H16/094). Results Bacterial genera that discriminated controls from those with constipation (IBS-C + FC) and diarrhea (IBS-D + FD) (P < 0.05) included Megasphaera (increased in those with constipation), Blautia (increased in those with diarrhea), and Bilophila (increased in both constipation and diarrhea groups). Megasphaera and Blautia include bacteria that are bile-resistant and produce butyrate, possessing a wide range of carbohydrate-active enzymes. Bilophila are sulfite-reducing bacteria that are able to utilize bile acids. Associated with these taxonomic differences, a wide range of genes involved in carbohydrate, energy, and amino acid metabolism differed significantly (P < 0.05), including some involved in taurine and glycine metabolism. 
Bile acids are conjugated with taurine or glycine in the liver, and these amino acids are removed by the action of members of the GI microbiota. Conclusions Results from our study suggest carbohydrate and bile acid metabolism by the GI microbiome may be important distinguishing characteristics in functional GI disorders. Funding Sources Funded by the New Zealand National Science Challenge High-Value Nutrition program.
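A permutation ANOVA replaces the F distribution with a null distribution built by shuffling group labels. The sketch below shows one common way to do this for a single taxon's relative abundance: compute the one-way F statistic, reshuffle the pooled values into groups of the original sizes many times, and report the share of shuffles whose F is at least as large as the observed one. The three groups and their values are hypothetical (the study compared five groups), and this is not necessarily the exact procedure or software the authors used.

```python
import random
from typing import List, Sequence


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_anova(groups: Sequence[Sequence[float]], n_perm: int = 9999,
                      seed: int = 1) -> float:
    """P-value: share of label shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled: List[float] = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction counts the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)


# Hypothetical relative abundances of one genus per group.
control = [0.010, 0.012, 0.011, 0.009, 0.013]
constipation = [0.030, 0.028, 0.035, 0.032, 0.029]
diarrhea = [0.011, 0.010, 0.014, 0.012, 0.009]
print(permutation_anova([control, constipation, diarrhea]))
```

In a full analysis this test would be repeated per taxon or gene, which is also why multiple-testing correction usually follows.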


Author(s): Alex L Mitchell, Alexandre Almeida, Martin Beracochea, Miguel Boland, Josephine Burgin, ...

Abstract MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.
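"Non-redundant" here means that repeated protein sequences are stored once. The abstract does not describe how MGnify builds its protein database, so the following is only a minimal sketch of the simplest form of redundancy removal, exact-duplicate collapsing by sequence hash, with a representative ID kept for each unique sequence. The normalisation rules (upper-casing, dropping a trailing `*` stop symbol) and the contig-style IDs are hypothetical; production databases may also cluster near-identical sequences, which this sketch does not attempt.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def build_nonredundant(proteins: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collapse identical protein sequences.

    Returns a mapping from a representative ID (the first one seen) to every
    ID that shares exactly the same normalised sequence.
    """
    first_id_by_digest: Dict[str, str] = {}
    members: Dict[str, List[str]] = {}
    for seq_id, seq in proteins:
        # Normalise case and drop a trailing stop symbol before hashing.
        normalised = seq.upper().rstrip("*")
        digest = hashlib.sha256(normalised.encode()).hexdigest()
        rep = first_id_by_digest.setdefault(digest, seq_id)
        members.setdefault(rep, []).append(seq_id)
    return members


proteins = [
    ("contig1_1", "MKTAYIAKQR*"),
    ("contig7_3", "mktayiakqr"),
    ("contig9_2", "MSLNNEQ"),
]
print(build_nonredundant(proteins))
# {'contig1_1': ['contig1_1', 'contig7_3'], 'contig9_2': ['contig9_2']}
```

Hashing keeps memory proportional to the number of unique sequences, and the member lists preserve the link back to every assembly that encoded each protein.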


Author(s): Xiaolong Cao, Jinchuan Xing

Abstract Summary As next-generation sequencing technology becomes broadly applied, genomics and transcriptomics are becoming more commonly used in both research and clinical settings. However, proteomics is still an obstacle to be conquered. For most peptide search programs in proteomics, a standard reference protein database is used. Because each individual carries thousands of coding DNA variants, a standard reference database does not provide a perfect match for many of an individual's proteins/peptides. A personalized reference database can improve the detection power and accuracy for individual proteomics data. To connect genomics and proteomics, we designed a Python package, PrecisionProDB, that is specialized for generating a personalized protein database for proteomics applications. PrecisionProDB supports multiple popular file formats and reference databases, and can generate a personalized database in minutes. To demonstrate the application of PrecisionProDB, we generated human population-specific reference protein databases with PrecisionProDB, which improved the number of identified peptides by 0.34% on average. In addition, by incorporating cell line-specific variants into the protein database, we demonstrated a 0.71% improvement in peptide identification in the Jurkat cell line. With PrecisionProDB and these datasets, researchers and clinicians can improve their peptide search performance by adopting the more representative protein database or adding population- and individual-specific proteins to the search database with minimal additional effort. Availability and implementation PrecisionProDB and pre-calculated protein databases are freely available at https://github.com/ATPs/PrecisionProDB and https://github.com/ATPs/PrecisionProDB_references. Supplementary information Supplementary data are available at Bioinformatics online.
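The core idea of a personalized protein database is to rewrite reference protein sequences with an individual's coding variants so that peptide searches can match the altered residues. The sketch below illustrates that idea for single amino-acid substitutions only. It is not PrecisionProDB's API: `apply_missense`, `personalize`, the variant tuple format, and the protein IDs are hypothetical, and real variant handling (indels, frameshifts, stop gains, mapping genomic coordinates to protein positions) is deliberately left out.

```python
from typing import Dict, List, Tuple

# (1-based protein position, reference residue, alternate residue)
Variant = Tuple[int, str, str]


def apply_missense(protein: str, variants: List[Variant]) -> str:
    """Apply single amino-acid substitutions to a reference protein.

    The reference residue is checked so that a mismatched annotation fails loudly.
    """
    residues = list(protein)
    for pos, ref, alt in variants:
        if residues[pos - 1] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)


def personalize(reference_db: Dict[str, str],
                variants_by_protein: Dict[str, List[Variant]]) -> Dict[str, str]:
    """Return a copy of the reference database with an individual's variants applied."""
    return {
        pid: apply_missense(seq, variants_by_protein.get(pid, []))
        for pid, seq in reference_db.items()
    }


reference = {"PROT1": "MKTAYIAKQR", "PROT2": "MSLNNEQ"}
variants = {"PROT1": [(3, "T", "A"), (10, "R", "K")]}
print(personalize(reference, variants))
# {'PROT1': 'MKAAYIAKQK', 'PROT2': 'MSLNNEQ'}
```

This version replaces reference sequences; the abstract also mentions the alternative of adding variant proteins alongside the originals, which would keep reference peptides searchable at the cost of a larger database.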


2021
Vol 12
Author(s): Liangwei Mao, Yu Zhang, Jing Tian, Ming Sang, Guimin Zhang, ...

Gastrointestinal dysfunction plays an important role in the occurrence and development of Parkinson’s disease (PD). This study investigates the composition of the gut microbiome using shotgun metagenomic sequencing in PD patients in central China. Fecal samples from 39 PD patients (PD group) and the corresponding 39 healthy spouses of the patients (SP) were collected for shotgun metagenomic sequencing. Results showed a significantly altered microbial composition in the PD patients. Bilophila wadsworthia enrichment was found in the gut microbiome of PD patients, which has not been reported in previous studies. The random forest (RF) model, which identifies differences in microbiomes, reliably discriminated patients with PD from controls; the area under the receiver operating characteristic curve was 0.803. Further analysis of the microbiome and clinical symptoms showed that Klebsiella and Parasutterella were positively correlated with the duration and severity of PD, whereas hydrogen-generating Prevotella was negatively correlated with disease severity. Gene-category analyses using the Clusters of Orthologous Groups (COG) database, the KEGG Orthology database, and the carbohydrate-active enzymes (CAZy) database showed that branched-chain amino acid–related proteins were significantly increased, and GH43 was significantly reduced in the PD group. Functional analysis of the metagenome confirmed differences in microbiome metabolism in the PD group related to short-chain fatty acid precursor metabolism.
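The reported value of 0.803 is an area under the ROC curve (AUC). A useful way to read it: AUC equals the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes AUC that way from scores and labels; the random forest itself is not shown, and the scores and labels are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Counts, over all (positive, negative) pairs, how often the positive
    (label 1) is scored higher than the negative (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (e.g. predicted probability of PD) and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly, about 0.889
```

The pairwise loop is quadratic; for large cohorts the same value is usually obtained from ranks, but for a few dozen samples the direct form makes the interpretation explicit.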


2018
Vol 68 (11)
pp. 1847-1855
Author(s): Matt S Zinter, Christopher C Dvorak, Madeline Y Mayday, Kensho Iwanaga, Ngoc P Ly, ...

Abstract Background Despite improved diagnostics, pulmonary pathogens in immunocompromised children frequently evade detection, leading to significant mortality. Therefore, we aimed to develop a highly sensitive metagenomic next-generation sequencing (mNGS) assay capable of evaluating the pulmonary microbiome and identifying diverse pathogens in the lungs of immunocompromised children. Methods We collected 41 lower respiratory specimens from 34 immunocompromised children undergoing evaluation for pulmonary disease at 3 children’s hospitals from 2014–2016. Samples underwent mechanical homogenization, parallel RNA/DNA extraction, and metagenomic sequencing. Sequencing reads were aligned to the National Center for Biotechnology Information nucleotide reference database to determine taxonomic identities. Statistical outliers were determined based on abundance within each sample and relative to other samples in the cohort. Results We identified a rich cross-domain pulmonary microbiome that contained bacteria, fungi, RNA viruses, and DNA viruses in each patient. Potentially pathogenic bacteria were ubiquitous among samples but could be distinguished as possible causes of disease by parsing for outlier organisms. Samples with bacterial outliers had significantly depressed alpha-diversity (median, 0.61; interquartile range [IQR], 0.33–0.72 vs median, 0.96; IQR, 0.94–0.96; P < .001). Potential pathogens were detected in half of samples previously negative by clinical diagnostics, demonstrating increased sensitivity for missed pulmonary pathogens (P < .001). Conclusions An optimized mNGS assay for pulmonary microbes demonstrates significant inoculation of the lower airways of immunocompromised children with diverse bacteria, fungi, and viruses. Potential pathogens can be identified based on absolute and relative abundance. 
Ongoing investigation is needed to determine the pathogenic significance of outlier microbes in the lungs of immunocompromised children with pulmonary disease.
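The abstract states that outliers were determined from abundance within each sample and relative to other samples in the cohort, without giving the exact rule. The sketch below shows one way such a two-part check could look. It assumes Tukey's fence (above Q3 + 1.5 × IQR of the sample's own abundances) for the within-sample test and a z-score cutoff `z_min` against the other samples for the cohort test; both rules, the threshold, and the data are hypothetical illustrations, not the authors' criteria.

```python
import statistics
from typing import Dict, List


def within_sample_outliers(abundance: Dict[str, float]) -> List[str]:
    """Taxa above Q3 + 1.5 * IQR of the sample's own abundance distribution."""
    q1, _, q3 = statistics.quantiles(abundance.values(), n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return sorted(t for t, a in abundance.items() if a > cutoff)


def cohort_outliers(sample: Dict[str, float], cohort: List[Dict[str, float]],
                    z_min: float = 2.0) -> List[str]:
    """Taxa at least `z_min` standard deviations above their mean in the other samples."""
    flagged = []
    for taxon, value in sample.items():
        others = [s.get(taxon, 0.0) for s in cohort]
        if len(others) < 2:
            continue
        mu, sd = statistics.mean(others), statistics.stdev(others)
        # Taxa with no variation across the cohort are skipped (a simplification).
        if sd > 0 and (value - mu) / sd >= z_min:
            flagged.append(taxon)
    return sorted(flagged)


def outliers(sample: Dict[str, float], cohort: List[Dict[str, float]],
             z_min: float = 2.0) -> List[str]:
    """Taxa flagged by both the within-sample and the cohort-level test."""
    both = set(within_sample_outliers(sample)) & set(cohort_outliers(sample, cohort, z_min))
    return sorted(both)


taxa = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]  # hypothetical taxa
sample = dict(zip(taxa, [0.02, 0.03, 0.01, 0.02, 0.04, 0.03, 0.02, 0.01, 0.03, 0.79]))
cohort = [
    dict(zip(taxa, [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.12, 0.09, 0.10, 0.09])),
    dict(zip(taxa, [0.11, 0.09, 0.10, 0.10, 0.12, 0.08, 0.10, 0.11, 0.09, 0.10])),
    dict(zip(taxa, [0.09, 0.10, 0.12, 0.09, 0.10, 0.11, 0.09, 0.10, 0.11, 0.09])),
]
print(outliers(sample, cohort))  # ['J']
```

Requiring both tests guards against two different failure modes: a taxon that is always abundant (high within a sample but normal for the cohort) and a taxon that is unusually high for the cohort but still a minor part of the sample.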


Author(s): Martin Steinegger, Steven L Salzberg

Metagenomic sequencing allows researchers to investigate organisms sampled from their native environments by sequencing their DNA directly, and then quantifying the abundance and taxonomic composition of the organisms thus captured. However, these types of analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination in 114,035 sequences and 2767 species in the NCBI Reference Sequence Database (RefSeq), 2,161,746 sequences and 6795 species in the GenBank database, and 14,132 protein sequences in the NR non-redundant protein database. Conterminator uncovers contamination in sequences spanning the whole range from draft genomes to “complete” model organism genomes. Our method, which scales linearly with input size, was able to process 3.3 terabytes of genomic sequence data in 12 days on a single 32-core compute node. We believe that Conterminator can become an important tool to ensure the quality of reference databases with particular importance for downstream metagenomic analyses. Source code (GPLv3): https://github.com/martin-steinegger/conterminator
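The underlying signal is a stretch of sequence shared between entries labeled with different kingdoms. As a toy stand-in for the all-against-all comparison, the sketch below indexes exact k-mers, finds k-mers present in sequences from more than one kingdom, and flags the sequences whose label is the minority for that k-mer. This swaps in exact k-mer matching and a majority-vote heuristic for illustration; it is not Conterminator's algorithm, which works with sequence alignments. The labels, sequence IDs, `k`, and the simulated sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def flag_suspects(seqs: Dict[str, Tuple[str, str]], k: int = 31) -> List[str]:
    """Flag sequences sharing an exact k-mer with a majority from another kingdom.

    `seqs` maps sequence ID -> (kingdom label, nucleotide sequence).
    """
    # Index every k-mer by the set of sequence IDs that contain it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, (_, seq) in seqs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)

    suspects: Set[str] = set()
    for ids in index.values():
        labels = Counter(seqs[i][0] for i in ids)
        if len(labels) < 2:
            continue  # k-mer seen in one kingdom only: nothing to report
        ranked = labels.most_common()
        if ranked[0][1] == ranked[1][1]:
            continue  # tie: no majority kingdom, leave undecided
        majority = ranked[0][0]
        suspects.update(i for i in ids if seqs[i][0] != majority)
    return sorted(suspects)


rng = random.Random(42)


def rand_dna(n: int) -> str:
    """Random nucleotide string used to simulate unrelated sequence."""
    return "".join(rng.choice("ACGT") for _ in range(n))


shared = rand_dna(60)  # a region that is genuinely eukaryotic in this toy example
seqs = {
    "euk1": ("Eukaryota", shared + rand_dna(200)),
    "euk2": ("Eukaryota", rand_dna(200) + shared),
    "bac1": ("Bacteria", rand_dna(100) + shared + rand_dna(100)),  # carries the foreign region
    "bac2": ("Bacteria", rand_dna(260)),
}
print(flag_suspects(seqs))  # ['bac1']
```

The majority vote is only a heuristic for deciding which side of a cross-kingdom match is mislabeled; it would misfire if the contaminating region happened to be over-represented among mislabeled entries.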


2022
pp. gr.275533.121
Author(s): Tyler A Joseph, Philippe Chlenski, Aviya Litman, Tal Korem, Itsik Pe'er

Patterns of sequencing coverage along a bacterial genome, summarized by a peak-to-trough ratio (PTR), have been shown to accurately reflect microbial growth rates, revealing a new facet of microbial dynamics and host-microbe interactions. Here, we introduce CoPTR (Compute PTR): a tool for computing PTRs from complete reference genomes and assemblies. Using simulations and data from growth experiments in simple and complex communities, we show that CoPTR is more accurate than the current state-of-the-art, while also providing more PTR estimates overall. We further develop theory formalizing a biological interpretation for PTRs. Using a reference database of 2935 species, we applied CoPTR to a case-control study of 1304 metagenomic samples from 106 individuals with inflammatory bowel disease. We show that growth rates are personalized, are only loosely correlated with relative abundances, and are associated with disease status. We conclude by demonstrating how PTRs can be combined with relative abundances and metabolomics to investigate their effect on the microbiome.
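In a growing population, replication forks are in progress, so regions near the replication origin are present in more copies than regions near the terminus; the peak-to-trough ratio summarizes that coverage gradient. The sketch below is a deliberately naive illustration: it smooths binned coverage with a circular moving average and divides the highest smoothed bin by the lowest. CoPTR's own estimator is more involved and is not reproduced here; the bin count, window size, and simulated coverage are assumptions.

```python
from typing import List


def smooth_circular(coverage: List[float], window: int) -> List[float]:
    """Moving average over a circular genome; bins wrap around the origin.

    The effective window is 2 * (window // 2) + 1 bins.
    """
    n = len(coverage)
    half = window // 2
    return [
        sum(coverage[(i + j) % n] for j in range(-half, half + 1)) / (2 * half + 1)
        for i in range(n)
    ]


def naive_ptr(coverage: List[float], window: int = 5) -> float:
    """Ratio of the highest to the lowest smoothed bin coverage."""
    smoothed = smooth_circular(coverage, window)
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return max(smoothed) / trough


n_bins = 100
# Simulated coverage: log2 coverage falls linearly from the origin (bin 0)
# to the terminus (bin 50), so the true peak-to-trough ratio is 2.
coverage = [10 * 2 ** (1 - min(i, n_bins - i) / 50) for i in range(n_bins)]
print(naive_ptr(coverage, window=1))  # 2.0
print(round(naive_ptr(coverage, window=5), 2))
```

Smoothing makes the estimate robust to noisy bins but pulls the peak down and the trough up, so the smoothed ratio slightly underestimates the true value; model-based estimators avoid that trade-off.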

