Improving characterization of understudied human microbiomes using targeted phylogenetics
Abstract
Whole-genome bacterial sequences are required to better understand microbial functions, niche-specific bacterial metabolism, and disease states. Although genomic sequences are available for many of the human-associated bacteria from commonly tested body habitats (e.g., stool), as few as 13% of bacterial-derived reads from other sites such as the skin map to known bacterial genomes. To facilitate a better characterization of metagenomic shotgun reads from under-represented body sites, we collected over 10,000 bacterial isolates originating from 14 human body habitats, identified novel taxonomic groups based on full-length 16S rRNA sequences, clustered the sequences to ensure that no individual taxonomic group was over-selected for sequencing, prioritized bacteria from under-represented body sites (such as skin, respiratory tract, and urinary tract), and sequenced and assembled genomes for 665 new bacterial strains. Here we show that addition of these genomes improved read mapping rates of HMP metagenomic samples by nearly 30% for the previously under-represented phylum Fusobacteria. In addition, 27.5% of the novel genomes generated here had high representation in at least one of the tested HMP samples, compared to 12.5% of the sequences in the public databases, indicating that the prioritization procedure enriched for useful novel genomic sequences. As our understanding of the human microbiome continues to improve and to enter the realm of therapy development, targeted approaches such as this one to improve genomic databases will increase in importance from both an academic and a clinical perspective.
Importance
The human microbiome plays a critically important role in health and disease, but current understanding of the mechanisms underlying the interactions between the varying microbiome and the different host environments is lacking.
Access to a database of fully sequenced bacterial genomes provides invaluable insights into microbial functions, but currently sequenced genomes for the human microbiome have largely come from a limited number of body sites (primarily stool). Other sites, such as the skin, respiratory tract, and urinary tract, are under-represented, resulting in as little as 13% of bacterial-derived reads mapping to known bacterial genomes. Here, we sequenced and assembled 665 new bacterial genomes, prioritized from a larger database to target body sites and bacterial taxa that are under-represented in the existing databases. As a result, we substantially improve mapping rates for samples from the Human Microbiome Project and provide an important contribution to human bacterial genomic databases for future studies.
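The prioritization idea summarized above (cap the number of isolates chosen per 16S rRNA cluster, and favor isolates from under-represented body sites) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `prioritize_isolates`, the record keys (`id`, `cluster`, `site`), the `max_per_cluster` parameter, and the toy isolates are all hypothetical, and the set of priority sites simply reuses the examples named in the text.

```python
from collections import Counter

# Assumption: sites treated as under-represented, taken from the examples
# named in the text (skin, respiratory tract, urinary tract).
PRIORITY_SITES = {"skin", "respiratory tract", "urinary tract"}


def prioritize_isolates(isolates, max_per_cluster=1):
    """Select isolate IDs for sequencing, capping picks per 16S cluster.

    `isolates` is a list of dicts with hypothetical keys:
    "id", "cluster" (16S cluster label), and "site" (body habitat).
    """
    # Stable sort: isolates from priority sites come first,
    # and ties keep their original input order.
    ranked = sorted(isolates, key=lambda iso: iso["site"] not in PRIORITY_SITES)

    picked_per_cluster = Counter()
    selected = []
    for iso in ranked:
        # Skip isolates whose cluster has already reached its cap,
        # so no single taxonomic group is over-selected.
        if picked_per_cluster[iso["cluster"]] < max_per_cluster:
            picked_per_cluster[iso["cluster"]] += 1
            selected.append(iso["id"])
    return selected


# Toy input (fabricated for illustration only, not data from the study).
isolates = [
    {"id": "iso1", "cluster": "A", "site": "stool"},
    {"id": "iso2", "cluster": "A", "site": "skin"},
    {"id": "iso3", "cluster": "B", "site": "stool"},
    {"id": "iso4", "cluster": "C", "site": "urinary tract"},
]
print(prioritize_isolates(isolates))  # ['iso2', 'iso4', 'iso3']
```

In this toy case the skin isolate is chosen over the stool isolate from the same cluster, while cluster B is still represented even though only a stool isolate is available for it.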