Multiple comparative metagenomics using multiset k-mer counting

2016 ◽  
Vol 2 ◽  
pp. e94 ◽  
Author(s):  
Gaëtan Benoit ◽  
Pierre Peterlongo ◽  
Mahendra Mariadassou ◽  
Erwan Drezen ◽  
Sophie Schbath ◽  
...  

Background Large scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomical or functional assignment rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results. Methods These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today’s metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques which rely on an all-versus-all sequence alignment strategy or which are based on taxonomic profiling.
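
The idea of replacing species counts with k-mer counts can be illustrated with a minimal Python sketch. This is not Simka's implementation: the function names, the toy reads, and the choice of Bray-Curtis (quantitative) and Jaccard (qualitative) as the two example distances are assumptions made for illustration, and the sketch counts k-mers in memory for a single pair of samples rather than using a parallel multi-dataset strategy.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative dissimilarity: uses k-mer abundances in both samples."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard_distance(a, b):
    """Qualitative distance: uses only presence or absence of each k-mer."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


# Hypothetical toy samples; real metagenomes contain millions of reads.
sample_a = kmer_counts(["ACGTACGT", "TTGACA"], k=4)
sample_b = kmer_counts(["ACGTACGA", "GGGACA"], k=4)
print(bray_curtis(sample_a, sample_b), jaccard_distance(sample_a, sample_b))
```

Both functions treat each distinct k-mer the way an ecological distance treats a species, so other abundance-based or presence-based indices could be substituted without changing the counting step.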

2017 ◽  
Author(s):  
Victoria Cepeda ◽  
Bo Liu ◽  
Mathieu Almeida ◽  
Christopher M. Hill ◽  
Sergey Koren ◽  
...  

Abstract Metagenomic studies have primarily relied on de novo approaches for reconstructing genes and genomes from microbial mixtures. While database driven approaches have been employed in certain analyses, they have not been used in the assembly of metagenomes. Here we describe the first effective approach for reference-guided metagenomic assembly of low-abundance bacterial genomes that can complement and improve upon de novo metagenomic assembly methods. When combined with de novo assembly approaches, we show that MetaCompass can generate more complete assemblies than can be obtained by de novo assembly alone, and improve on assemblies from the Human Microbiome Project (over 2,000 samples).


2020 ◽  
Vol 8 (2) ◽  
pp. 197
Author(s):  
Shomeek Chowdhury ◽  
Stephen S. Fong

The impact of microorganisms on human health has long been acknowledged and studied, but recent advances in research methodologies have enabled a new systems-level perspective on the collections of microorganisms associated with humans, the human microbiome. Large-scale collaborative efforts such as the NIH Human Microbiome Project have sought to kick-start research on the human microbiome by providing foundational information on microbial composition based upon specific sites across the human body. Here, we focus on the four main anatomical sites of the human microbiome: gut, oral, skin, and vaginal, and provide information on site-specific background, experimental data, and computational modeling. Each of the site-specific microbiomes has unique organisms and phenomena associated with them; there are also high-level commonalities. By providing an overview of different human microbiome sites, we hope to provide a perspective where detailed, site-specific research is needed to understand causal phenomena that impact human health, but there is equally a need for more generalized methodology improvements that would benefit all human microbiome research.


2018 ◽  
Author(s):  
Raphael R. Eguchi ◽  
Po-Ssu Huang

Abstract Recent advancements in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds, and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structural quality assessment. We represent protein structures as 2D α-carbon distance matrices (“contact maps”), and train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model performs exceptionally well, achieving a per-residue accuracy of 90.8% on the test set (95.0% average accuracy over all classes; 87.8% average within-structure accuracy). The unique aspect of our classifier is that it encodes sequence agnostic residue environments from the PDB and can assess structural quality as quantitative probabilities. We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design. Significance Recent computational advances have allowed researchers to predict the structure of many proteins from their amino acid sequences, as well as to design new sequences that fold into predefined structures. 
However, these tasks are often challenging because they require selection of a small subset of promising structural models from a large pool of stochastically generated ones. Here, we describe a novel approach to protein model selection that uses 2D image classification techniques to evaluate 3D protein models. Our method can be used to select structures based on the fold that they adopt, and can also be used to identify regions of low structural quality. These capabilities yield a powerful tool for both protein design and structure prediction.
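
The 2D α-carbon distance matrix used as the network input in the abstract above can be sketched in a few lines of Python. This is only an illustration of the representation, not the authors' code: the function name and the three-residue coordinates are hypothetical, and the sketch assumes the α-carbon coordinates have already been extracted from a structure file.

```python
import math


def ca_distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise Euclidean distances
    between alpha-carbon coordinates, one row and column per residue."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]


# Hypothetical alpha-carbon coordinates (in angstroms) for three residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
matrix = ca_distance_matrix(coords)
for row in matrix:
    print(row)
```

Because entry (i, j) depends only on residues i and j, the matrix can be treated as a single-channel image, which is what makes a per-pixel labeling approach such as semantic segmentation applicable.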


2019 ◽  
Author(s):  
Devika Ganesamoorthy ◽  
Mengjia Yan ◽  
Valentine Murigneux ◽  
Chenxi Zhou ◽  
Minh Duc Cao ◽  
...  

Abstract Tandem repeats (TRs) are highly prone to variation in copy numbers due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing tools, which are either low-throughput or restricted to a small subset of TRs. Here, we used a SureSelect targeted sequencing approach combined with Nanopore sequencing to overcome these limitations. We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X sequence coverage on 7 samples utilizing 2 MinION flow-cells with 200 ng of input DNA per sample. We identified a subset of 110 TR loci with length less than 2 kb and GC content greater than 25%, for which we achieved an average genotyping rate of 75%, increasing to 91% for the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and, moreover, highly correlated with alleles estimated from whole genome long-read sequencing. We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs, with accuracy comparable to PCR sizing analysis. Our approach can feasibly scale to more targets and more samples, facilitating large-scale analysis of TRs.


2016 ◽  
Author(s):  
Shea N Gardner ◽  
Sasha K Ames ◽  
Maya B Gokhale ◽  
Tom R Slezak ◽  
Jonathan Allen

Software for rapid, accurate, and comprehensive microbial profiling of metagenomic sequence data on a desktop will play an important role in large scale clinical use of metagenomic data. Here we describe LMAT-ML (Livermore Metagenomics Analysis Toolkit-Marker Library) which can be run with 24 GB of DRAM memory, an amount available on many clusters, or with 16 GB DRAM plus a 24 GB low cost commodity flash drive (NVRAM), a cost effective alternative for desktop or laptop users. We compared results from LMAT with five other rapid, low-memory tools for metagenome analysis for 131 Human Microbiome Project samples, and assessed discordant calls with BLAST. All the tools except LMAT-ML reported overly specific or incorrect species and strain resolution of reads that were in fact much more widely conserved across species, genera, and even families. Several of the tools misclassified reads from synthetic or vector sequence as microbial or human reads as viral. We attribute the high numbers of false positive and false negative calls to a limited reference database with inadequate representation of known diversity. Our comparisons with real world samples show that LMAT-ML is the only tool tested that classifies the majority of reads, and does so with high accuracy.


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Chang Liu ◽  
Meng-Xuan Du ◽  
Rexiding Abuduaini ◽  
Hai-Ying Yu ◽  
Dan-Hua Li ◽  
...  

Abstract Background In gut microbiome studies, the cultured gut microbial resource plays essential roles, such as helping to unravel gut microbial functions and host-microbe interactions. Although several major studies have been performed to elucidate the cultured human gut microbiota, up to 70% of the Unified Human Gastrointestinal Genome species have not been cultured to date. Large-scale gut microbial isolation and identification as well as availability to the public are imperative for gut microbial studies and further characterizing human gut microbial functions. Results In this study, we constructed a human Gut Microbial Biobank (hGMB; homepage: hgmb.nmdc.cn) through the cultivation of 10,558 isolates from 31 sample mixtures of 239 fresh fecal samples from healthy Chinese volunteers, and deposited 1170 strains representing 400 different species in culture collections of the International Depository Authority for long-term preservation and public access worldwide. Following the rules of the International Code of Nomenclature of Prokaryotes, 102 new species were characterized and denominated, while 28 new genera and 3 new families were proposed. hGMB represented over 80% of the common and dominant human gut microbial genera and species characterized from global human gut 16S rRNA gene amplicon data (n = 11,647) and cultured 24 “most-wanted” and “medium priority” taxa proposed by the Human Microbiome Project. We in total sequenced 115 genomes representing 102 novel taxa and 13 previously known species. Further in silico analysis revealed that the newly sequenced hGMB genomes represented 22 previously uncultured species in the Unified Human Gastrointestinal Genome (UHGG) and contributed 24 representatives of potentially “dark taxa” that had not been discovered by UHGG. 
The nonredundant gene catalogs generated from the hGMB genomes covered over 50% of the functionally known genes (KEGG orthologs) in the largest global human gut gene catalogs and approximately 10% of the “most wanted” functionally unknown proteins in the FUnkFams database. Conclusions A publicly accessible human Gut Microbial Biobank (hGMB) was established that contained 1170 strains and represents 400 human gut microbial species. hGMB expands the gut microbial resources and genomic repository by adding 102 novel species, 28 new genera, 3 new families, and 115 new genomes of human gut microbes.


2018 ◽  
Author(s):  
E. Whittle ◽  
M.O. Leonard ◽  
R. Harrison ◽  
T.W. Gant ◽  
D.P Tonge

Abstract The term microbiome describes the genetic material encoding the various microbial populations that inhabit our body. Whilst colonisation of various body niches (e.g. the gut) by dynamic communities of microorganisms is now universally accepted, the existence of microbial populations in other “classically sterile” locations, including the blood, is a relatively new concept. The presence of bacteria-specific DNA in the blood has been reported in the literature for some time, yet the true origin of this is still the subject of much deliberation. The aim of this study was to investigate the phenomenon of a “blood microbiome” by providing a comprehensive description of bacterially-derived nucleic acids using a range of complementary molecular and classical microbiological techniques. For this purpose we utilised a set of plasma samples from healthy subjects (n = 5) and asthmatic subjects (n = 5). DNA-level analyses involved the amplification and sequencing of the 16S rRNA gene. RNA-level analyses were based upon the de novo assembly of unmapped mRNA reads and subsequent taxonomic identification. Molecular studies were complemented by viability data from classical aerobic and anaerobic microbial culture experiments. At the phylum level, the blood microbiome was predominated by Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. The key phyla detected were consistent irrespective of molecular method (DNA vs RNA), and consistent with the results of other published studies. In silico comparison of our data with that of the Human Microbiome Project revealed that members of the blood microbiome were most likely to have originated from the oral or skin communities. To our surprise, aerobic and anaerobic cultures were positive in eight out of the ten donor samples investigated, and we reflect upon their source. 
Our data provide further evidence of a core blood microbiome, and provide insight into the potential source of the bacterial DNA / RNA detected in the blood. Further, data reveal the importance of robust experimental procedures, and identify areas for future consideration.


Author(s):  
Chang Liu ◽  
Meng-Xuan Du ◽  
Rexiding Abuduaini ◽  
Hai-Ying Yu ◽  
Dan-Hua Li ◽  
...  

Abstract Background The cultivated gut microbial resource plays an essential role in gut microbiome studies, for example in characterizing gut microbial functions and their interactions with the host. Although several major studies have been performed to understand the cultured human gut microbiota, up to 70% of the Unified Human Gastrointestinal Genome species remain uncultivated and their taxonomy is not clear. Large-scale gut microbial isolation and identification, as well as their availability to the public, are imperative for gut microbial studies and for understanding human gut microbial functions. Results Here, we report the construction of a human Gut Microbial Biobank (hGMB) (homepage: hgmb.nmdc.cn) by large-scale cultivation of 10,558 isolates from 239 fecal samples of healthy Chinese volunteers, and we deposited 1,170 strains representing 404 different species in the International Depository Authority for long-term preservation and public access worldwide. We discovered and denominated 107 new species, and proposed 28 new genera and 3 new families. The new species and their newly sequenced genomes uncovered 16 “most-wanted” or “medium priority” taxa proposed by the Human Microbiome Project and 42 previously uncultured MAGs in IGGdb, respectively. The hGMB represented over 80% of the common and dominant human gut microbial genera or species of global human gut 16S rRNA gene amplicon data (n = 11,647), and covered 70% of the known genes (KEGG Orthologs) and 10% of the functionally unknown genes in the global human gut gene catalogs. Conclusions A publicly accessible human Gut Microbial Biobank (hGMB) that contains 1,170 strains and represents 404 human gut microbial species has been established. The hGMB expands the currently known, taxonomically characterized gut microbial resources and genomic repository by adding 107 new species and 115 new genomes of human gut microbes. Based on the newly discovered species in this study, 28 new genera and 3 new families of human gut microbes were identified and proposed.


2021 ◽  
Author(s):  
Utpal Bakshi ◽  
Vinod K Gupta ◽  
Aileen R Lee ◽  
John M Davis ◽  
Sriram Chandrasekaran ◽  
...  

Biosynthetic gene clusters (BGCs) in microbial genomes encode for the production of bioactive secondary metabolites (SMs). Given the well-recognized importance of SMs in microbe-microbe and microbe-host interactions, the large-scale identification of BGCs from microbial metagenomes could offer novel functional insights into complex chemical ecology. Despite recent progress, currently available tools for predicting BGCs from shotgun metagenomes have several limitations, including the need for computationally demanding read-assembly and prediction of a narrow breadth of BGC classes. To overcome these limitations, we developed TaxiBGC (Taxonomy-guided Identification of Biosynthetic Gene Clusters), a computational pipeline for identifying experimentally verified BGCs in shotgun metagenomes by first pinpointing the microbial species likely to produce them. We show that our species-centric approach was able to identify BGCs in simulated metagenomes more accurately than by solely detecting BGC genes. By applying TaxiBGC on 5,423 metagenomes from the Human Microbiome Project and various case-control studies, we identified distinct BGC signatures of major human body sites and candidate stool-borne biomarkers for multiple diseases, including inflammatory bowel disease, colorectal cancer, and psychiatric disorders. In all, TaxiBGC demonstrates a significant advantage over existing techniques for systematically characterizing BGCs and inferring their SMs from microbiome data.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Andrea Nuzzo ◽  
Somdutta Saha ◽  
Ellen Berg ◽  
Channa Jayawickreme ◽  
Joel Tocker ◽  
...  

Abstract Metabolites produced in the human gut are known modulators of host immunity. However, large-scale identification of metabolite–host receptor interactions remains a daunting challenge. Here, we employed computational approaches to identify 983 potential metabolite–target interactions using the Inflammatory Bowel Disease (IBD) cohort dataset of the Human Microbiome Project 2 (HMP2). Using a consensus of multiple machine learning methods, we ranked metabolites based on importance to IBD, followed by virtual ligand-based screening to identify possible human targets and adding evidence from compound assay, differential gene expression, pathway enrichment, and genome-wide association studies. We confirmed known metabolite–target pairs such as nicotinic acid–GPR109a or linoleoyl ethanolamide–GPR119 and inferred interactions of interest including oleanolic acid–GABRG2 and alpha-CEHC–THRB. Eleven metabolites were tested for bioactivity in vitro using human primary cell-types. By expanding the universe of possible microbial metabolite–host protein interactions, we provide multiple drug targets for potential immune-therapies.

