VIRGO, a comprehensive non-redundant gene catalog, reveals extensive within community intraspecies diversity in the human vagina

Abstract Background: Analysis of metagenomic and metatranscriptomic data is complicated and typically requires extensive computational resources. Leveraging a curated reference database of genes encoded by members of the target microbiome can make these analyses more tractable. Unfortunately, there is no such reference database available for the vaginal microbiome. Results: In this study, we assembled a comprehensive human vaginal non-redundant gene catalog (VIRGO) from 264 vaginal metagenomes and 416 genomes of urogenital bacterial isolates. VIRGO includes 0.95 million non-redundant genes compiled from a total of 5.5 million genes belonging to 318 unique bacterial species. We show that VIRGO covers more than 95% of the vaginal bacterial gene content in metagenomes from North American, African, and Chinese women. The gene catalog was extensively functionally annotated from 17 diverse protein databases, and importantly taxonomy was assigned through in silico binning of genes derived from metagenomic assemblies. To further enable focused analyses of individual genes and proteins, we also clustered the non-redundant genes into vaginal orthologous groups (VOG). The gene-centric design of VIRGO and VOG provides an easily accessible tool to comprehensively characterize the structure and function of vaginal metagenome and metatranscriptome datasets. To highlight the utility of VIRGO, we analyzed 1,507 additional vaginal metagenomes, uncovering an as-yet undetected high degree of intraspecies diversity within and across vaginal microbiota. Conclusions: VIRGO offers a convenient reference database and toolkit that will facilitate a more in-depth understanding of the role of vaginal microorganisms in women’s health and reproductive outcomes.

A comprehensive non-redundant gene catalog reveals extensive within-community intraspecies diversity in the human vagina

Nature Communications ◽

10.1038/s41467-020-14677-3 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 6

Author(s):

Bing Ma ◽

Michael T. France ◽

Jonathan Crabtree ◽

Johanna B. Holm ◽

Michael S. Humphrys ◽

...

Keyword(s):

Gene Catalog ◽

Human Vagina ◽

Redundant Gene ◽

Intraspecies Diversity

Integration of culture-dependent and independent methods provides a more coherent picture of the pig gut microbiome

FEMS Microbiology Ecology ◽

10.1093/femsec/fiaa022 ◽

2020 ◽

Vol 96 (3) ◽

Cited By ~ 2

Author(s):

Gavin J Fenske ◽

Sudeep Ghimire ◽

Linto Antony ◽

Jane Christopher-Hennings ◽

Joy Scaria

Keyword(s):

Bacterial Communities ◽

Bacterial Species ◽

Shotgun Metagenomics ◽

Culture Techniques ◽

Microbiome Composition ◽

Culture Independent ◽

Health And Disease ◽

The Media ◽

And Function ◽

Culture Dependent

ABSTRACT Bacterial communities resident in the hindgut of pigs have profound impacts on health and disease. Investigations into the pig microbiome have utilized either culture-dependent or, far more commonly, culture-independent techniques using next-generation sequencing. We contend that a combination of both approaches generates a more coherent view of microbiome composition. In this study, we surveyed the microbiome of Tamworth breed and feral pigs through the integration of high-throughput culturing and shotgun metagenomics. A single culture medium was used for culturing. Selective screens were added to the media to increase culture diversity. In total, 46 distinct bacterial species were isolated from the Tamworth and feral samples. Selective screens successfully shifted the diversity of bacteria on agar plates. Tamworth pigs were highly dominated by Bacteroidetes, primarily composed of the genus Prevotella, whereas feral samples were more diverse, with almost equal proportions of Firmicutes and Bacteroidetes. The combination of metagenomics and culture techniques facilitated a greater retrieval of annotated genes than either method alone. The single-medium-based pig microbiota library we report is a resource to better understand pig gut microbial ecology and function. It allows for the assembly of defined bacterial communities for studies in bioreactors or germ-free animal models.

A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera

Journal of Bacteriology ◽

10.1128/jb.01202-08 ◽

2008 ◽

Vol 191 (1) ◽

pp. 91-99 ◽

Cited By ~ 115

Author(s):

Marc Deloger ◽

Meriem El Karoui ◽

Marie-Agnès Petit

Keyword(s):

Dna Sequences ◽

Dna Content ◽

Core Genome ◽

Biological Diversity ◽

Bacterial Species ◽

Genomic Distance ◽

The Core ◽

Intraspecies Diversity ◽

Genome Level ◽

Definition Of

ABSTRACT The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes.
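The index described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly cited definition MUMi = 1 - Lmum/Lav, where Lmum is the summed length of non-overlapping MUMs and Lav is the mean length of the two genomes; that formula is not stated in the abstract itself. The function name and input format are hypothetical, and in practice the MUM lengths would come from a dedicated MUM-finding tool rather than being typed in by hand.

```python
def mumi(mum_lengths, genome_len_a, genome_len_b):
    """Return a MUMi-style distance between two genomes.

    Assumption (not from the abstract): MUMi = 1 - Lmum / Lav, where
    Lmum is the total length of non-overlapping maximal unique matches
    and Lav is the average length of the two genomes.
    mum_lengths is a hypothetical input: an iterable of MUM lengths in bp
    that have already been trimmed so they do not overlap.
    """
    if genome_len_a <= 0 or genome_len_b <= 0:
        raise ValueError("genome lengths must be positive")
    l_mum = sum(mum_lengths)                  # bases covered by MUMs
    l_av = (genome_len_a + genome_len_b) / 2  # mean genome length
    return 1.0 - l_mum / l_av


# Toy numbers for illustration only; they are not real genome data.
print(mumi([400, 500], 1000, 1000))
```

Under this definition, values near 0 indicate nearly identical genomes and values near 1 indicate genomes that share almost no unique matches.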

Linear plasmids in Klebsiella and other Enterobacteriaceae

10.1101/2021.12.14.472703 ◽

2021 ◽

Author(s):

Jane Hawkey ◽

Hugh Cottingham ◽

Alex Tokolyi ◽

Ryan R Wick ◽

Louise M Judd ◽

...

Keyword(s):

Plasmid Stability ◽

Bacterial Species ◽

Salmonella Typhi ◽

Multidrug Resistant ◽

Linear Plasmid ◽

Extrachromosomal Dna ◽

Linear Plasmids ◽

Future Studies ◽

And Function

Linear plasmids are extrachromosomal DNA that have been found in a small number of bacterial species. To date, the only linear plasmids described in the Enterobacteriaceae family belong to Salmonella, first found in Salmonella Typhi. Here, we describe a collection of 12 isolates of the Klebsiella pneumoniae species complex in which we identified linear plasmids. We used this collection to search public sequence databases and discovered an additional 74 linear plasmid sequences in a variety of Enterobacteriaceae species. Gene content analysis divided these plasmids into five distinct phylogroups, with very few genes shared across more than two phylogroups. The majority of linear plasmid-encoded genes are of unknown function, however each phylogroup carried its own unique toxin-antitoxin system and genes with homology to those encoding the ParAB plasmid stability system. Passage in vitro of the 12 linear plasmid-carrying Klebsiella isolates in our collection (which include representatives of all five phylogroups) indicated that these linear plasmids can be stably maintained, and our data suggest they can transmit between K. pneumoniae strains (including members of globally disseminated multidrug resistant clones) and also between diverse Enterobacteriaceae species. The linear plasmid sequences, and representative isolates harbouring them, are made available as a resource to facilitate future studies on the evolution and function of these novel plasmids.

Metagenomic profiling of host-associated bacteria from 8 datasets of the red alga Porphyra purpurea, with MetaPhlAn 3.0

10.1101/2020.11.17.386862 ◽

2020 ◽

Author(s):

Orestis Nousias ◽

Federica Montesanto

Keyword(s):

16S Rrna ◽

Bacterial Species ◽

Reference Database ◽

Marker Genes ◽

Specific Marker ◽

Bacterial Genomes ◽

Associated Bacteria ◽

Pan Genome ◽

Nucleotide Database ◽

Porphyra Purpurea

Abstract Microbial communities play a fundamental role in association with marine algae; they are recognized to be actively involved in growth and morphogenesis. Porphyra purpurea is a red alga commonly found in the intertidal zone with a high economic value; several species belonging to the genus Porphyra are intensively cultivated in East Asian countries. Moreover, P. purpurea is widely used as a model species in different fields, mainly due to its peculiar life cycle. Despite that, little is known about the microbial community associated with this species. Here we report the microbial-associated diversity of P. purpurea in four different localities (Ireland, Italy, United Kingdom and USA) through the analysis of eight metagenomic datasets obtained from the publicly available metagenomic nucleotide database (https://www.ebi.ac.uk/ena/). The metagenomic datasets were quality controlled with FastQC version 0.11.8, pre-processed with Trimmomatic version 0.39 and analysed with MetaPhlAn 3.0, with a reference database containing clade-specific marker genes from ~99,500 bacterial genomes, following the pan-genome approach, in order to identify the putative bacterial taxonomies and their relative abundances. Furthermore, we compared the results to the 16S rRNA metagenomic analysis pipeline of the MGnify database to evaluate the effectiveness of the two methods. Out of the 43 bacterial species identified with MetaPhlAn 3.0, only 5 were in common with the MGnify results, and of the 21 genera, only 9 were in common. This approach highlighted the different taxonomic resolution of a 16S rRNA OTU-based method in contrast to the pan-genome approach deployed by MetaPhlAn 3.0.
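The comparison of taxa found by the two pipelines amounts to a set intersection. The following is a minimal Python sketch of that step, not the authors' code: the function name is hypothetical, and the taxon labels are placeholders rather than species from the study.

```python
def shared_taxa(primary, other):
    """Return the taxa found by both methods and the fraction of the
    primary method's taxa that the other method also reports."""
    primary_set, other_set = set(primary), set(other)
    common = primary_set & other_set
    fraction = len(common) / len(primary_set) if primary_set else 0.0
    return sorted(common), fraction


# Placeholder labels for illustration only (not taxa from the study).
marker_gene_calls = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
amplicon_calls = ["taxon_B", "taxon_D", "taxon_E"]

common, fraction = shared_taxa(marker_gene_calls, amplicon_calls)
print(common, fraction)
```

Applied to the counts reported in the abstract, 5 shared species out of 43 is about 12%, and 9 shared genera out of 21 is about 43%.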

A community-supported metaproteomic pipeline for improving peptide identifications in hydrothermal vent microbiota

Briefings in Bioinformatics ◽

10.1093/bib/bbab052 ◽

2021 ◽

Author(s):

Yafei Chang ◽

Qilian Fan ◽

Jialin Hou ◽

Yu Zhang ◽

Jing Li

Keyword(s):

Deep Sea ◽

Hydrothermal Vent ◽

Hydrothermal Vents ◽

Extreme Environments ◽

Reference Database ◽

False Discovery Rate Method ◽

Analysis Strategy ◽

Gene Database ◽

And Function ◽

Functional Profiles

Abstract Microorganisms in deep-sea hydrothermal vents provide valuable insights into life under extreme conditions. Mass spectrometry-based proteomics has been widely used to identify protein expression and function. However, metaproteomic studies of deep-sea microbiota have been constrained largely by low identification rates for proteins and peptides. To improve the efficiency of metaproteomics for hydrothermal vent microbiota, we first constructed a microbial gene database (HVentDB) based on 117 public metagenomic samples from hydrothermal vents and proposed a metaproteomic analysis strategy that takes advantage of not only the sample-matched metagenome but also the metagenomic information released publicly by the hydrothermal vent research community. A two-stage false discovery rate method was then applied to control the risk of false positives. When our community-supported strategy was applied to a hydrothermal vent sediment sample, about twice as many peptides were identified as in searches against either the sample-matched metagenome or the public reference database alone. In addition, more enriched and explainable taxonomic and functional profiles were detected exclusively by the HVentDB-based approach, as well as many important proteins involved in methane, amino acid, sugar, and glycan metabolism and in DNA repair. The new metaproteomic analysis strategy will enhance our understanding of microbiota, including their lifestyles and metabolic capabilities in extreme environments. The database HVentDB is freely accessible from http://lilab.life.sjtu.edu.cn:8080/HventDB/main.html.

Molecular Mechanism of N,N-Dimethylformamide Degradation in Methylobacterium sp. Strain DM1

Applied and Environmental Microbiology ◽

10.1128/aem.00275-19 ◽

2019 ◽

Vol 85 (12) ◽

Cited By ~ 8

Author(s):

Xinyu Lu ◽

Weiwei Wang ◽

Lige Zhang ◽

Haiyang Hu ◽

Ping Xu ◽

...

Keyword(s):

Molecular Mechanisms ◽

Gene Clusters ◽

Sole Source ◽

Human Beings ◽

Sequencing Data ◽

Carbon And Nitrogen ◽

Redundant Genes ◽

Evolutionary Advantage ◽

Redundant Gene ◽

Phenotype Identification

ABSTRACT N,N-Dimethylformamide (DMF) is one of the most common xenobiotic chemicals, and it can be easily emitted into the environment, where it causes harm to human beings. Herein, an efficient DMF-degrading strain, DM1, was isolated and identified as Methylobacterium sp. This strain can use DMF as the sole source of carbon and nitrogen. Whole-genome sequencing of strain DM1 revealed that it has a 5.66-Mbp chromosome and a 200-kbp megaplasmid. The plasmid pLVM1 specifically harbors the genes essential for the initial steps of DMF degradation, and the chromosome carries the genes facilitating subsequent methylotrophic metabolism. Through analysis of the transcriptome sequencing data, the complete mineralization pathway and redundant gene clusters of DMF degradation were elucidated. The dimethylformamidase (DMFase) gene was heterologously expressed, and DMFase was purified and characterized. Plasmid pLVM1 is catabolically crucial for DMF utilization, as evidenced by the phenotype identification of the plasmid-free strain. This study systematically elucidates the molecular mechanisms of DMF degradation by Methylobacterium. IMPORTANCE DMF is a hazardous pollutant that has been used in the chemical industry, pharmaceutical manufacturing, and agriculture. Biodegradation as a method for removing DMF has received increasing attention. Here, we identified an efficient DMF degrader, Methylobacterium sp. strain DM1, and characterized the complete DMF mineralization pathway and enzymatic properties of DMFase in this strain. This study provides insights into the molecular mechanisms and evolutionary advantage of DMF degradation facilitated by plasmid pLVM1 and redundant genes in strain DM1, suggesting the emergence of new ecotypes of Methylobacterium.

Regulation of OmpA Translation and Shigella dysenteriae Virulence by an RNA Thermometer

Infection and Immunity ◽

10.1128/iai.00871-19 ◽

2019 ◽

Vol 88 (3) ◽

Cited By ~ 1

Author(s):

Erin R. Murphy ◽

Johanna Roßmanith ◽

Jacob Sieg ◽

Megan E. Fris ◽

Hebaallaha Hussein ◽

...

Keyword(s):

In Silico ◽

Bacterial Species ◽

Regulation Of Gene Expression ◽

Structure And Function ◽

Shigella Dysenteriae ◽

Content Type ◽

Bacterial Physiology ◽

Wide Range ◽

Rna Thermometer ◽

And Function

ABSTRACT RNA thermometers are cis-acting riboregulators that mediate the posttranscriptional regulation of gene expression in response to environmental temperature. Such regulation is conferred by temperature-responsive structural changes within the RNA thermometer that directly result in differential ribosomal binding to the regulated transcript. The significance of RNA thermometers in controlling bacterial physiology and pathogenesis is becoming increasingly clear. This study combines in silico, molecular genetics, and biochemical analyses to characterize both the structure and function of a newly identified RNA thermometer within the ompA transcript of Shigella dysenteriae. First identified by in silico structural predictions, genetic analyses have demonstrated that the ompA RNA thermometer is a functional riboregulator sufficient to confer posttranscriptional temperature-dependent regulation, with optimal expression observed at the host-associated temperature of 37°C. Structural studies and ribosomal binding analyses have revealed both increased exposure of the ribosomal binding site and increased ribosomal binding to the ompA transcript at permissive temperatures. The introduction of site-specific mutations predicted to alter the temperature responsiveness of the ompA RNA thermometer has predictable consequences for both the structure and function of the regulatory element. Finally, in vitro tissue culture-based analyses implicate the ompA RNA thermometer as a bona fide S. dysenteriae virulence factor in this bacterial pathogen. Given that ompA is highly conserved among Gram-negative pathogens, these studies not only provide insight into the significance of riboregulation in controlling Shigella virulence, but they also have the potential to facilitate further understanding of the physiology and/or pathogenesis of a wide range of bacterial species.

PelX is a UDP-N-acetylglucosamine C4-epimerase involved in Pel polysaccharide–dependent biofilm formation

Journal of Biological Chemistry ◽

10.1074/jbc.ra120.014555 ◽

2020 ◽

Vol 295 (34) ◽

pp. 11949-11962 ◽

Cited By ~ 1

Author(s):

Lindsey S. Marmont ◽

Gregory B. Whitfield ◽

Roland Pfoh ◽

Rohan J. Williams ◽

Trevor E. Randall ◽

...

Keyword(s):

Pseudomonas Aeruginosa ◽

Biofilm Formation ◽

Potential Role ◽

Bacterial Species ◽

Gene Clusters ◽

Structure And Function ◽

Bacterial Polysaccharide ◽

H Nmr ◽

Short Chain Dehydrogenase ◽

And Function

Pel is a GalNAc-rich bacterial polysaccharide that contributes to the structure and function of Pseudomonas aeruginosa biofilms. The pelABCDEFG operon is highly conserved among diverse bacterial species, and Pel may therefore be a widespread biofilm determinant. Previous annotation of pel gene clusters has helped us identify an additional gene, pelX, that is present adjacent to pelABCDEFG in >100 different bacterial species. The pelX gene is predicted to encode a member of the short-chain dehydrogenase/reductase (SDR) superfamily, but its potential role in Pel-dependent biofilm formation is unknown. Herein, we have used Pseudomonas protegens Pf-5 as a model to elucidate PelX function as Pseudomonas aeruginosa lacks a pelX homologue in its pel gene cluster. We found that P. protegens forms Pel-dependent biofilms; however, despite expression of pelX under these conditions, biofilm formation was unaffected in a ΔpelX strain. This observation led us to identify a pelX paralogue, PFL_5533, which we designate here PgnE, that appears to be functionally redundant to pelX. In line with this, a ΔpelX ΔpgnE double mutant was substantially impaired in its ability to form Pel-dependent biofilms. To understand the molecular basis for this observation, we determined the structure of PelX to 2.1 Å resolution. The structure revealed that PelX resembles UDP-GlcNAc C4-epimerases. Using 1H NMR analysis, we show that PelX catalyzes the epimerization between UDP-GlcNAc and UDP-GalNAc. Our results indicate that Pel-dependent biofilm formation requires a UDP-GlcNAc C4-epimerase that generates the UDP-GalNAc precursors required by the Pel synthase machinery for polymer production.

Toxin-Antitoxin Systems: A Tool for Taxonomic Analysis of Human Intestinal Microbiota

Toxins ◽

10.3390/toxins12060388 ◽

2020 ◽

Vol 12 (6) ◽

pp. 388

Author(s):

Ksenia M. Klimina ◽

Viktoriya N. Voroshilova ◽

Elena U. Poluektova ◽

Vladimir A. Veselovsky ◽

Roman A. Yunes ◽

...

Keyword(s):

Bacterial Species ◽

Computer Assisted ◽

Functional Markers ◽

Type Ii ◽

Gastrointestinal Microbiota ◽

Strain Characterization ◽

Rich Diversity ◽

Human Intestinal Microbiota ◽

Gene Catalog ◽

Genes Encoding

The human gastrointestinal microbiota (HGM) is known for its rich diversity of bacterial species and strains. Yet many studies stop at characterizing the HGM at the family level. This is mainly due to lack of adequate methods for a high-resolution profiling of the HGM. One way to characterize the strain diversity of the HGM is to look for strain-specific functional markers. Here, we propose using type II toxin-antitoxin systems (TAS). To identify TAS systems in the HGM, we previously developed the software TAGMA. This software was designed to detect the TAS systems, MazEF and RelBE, in lactobacilli and bifidobacteria. In this study, we updated the gene catalog created previously and used it to test our software anew on 1346 strains of bacteria, which belonged to 489 species and 49 genera. We also sequenced the genomes of 20 fecal samples and analyzed the results with TAGMA. Although some differences were detected at the strain level, the results showed no particular difference in the bacterial species between our method and other classic analysis software. These results support the use of the updated catalog of genes encoding type II TAS as a useful tool for computer-assisted species and strain characterization of the HGM.
