Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments.

Author(s):  
Hugo R Barajas de la Torre ◽  
Miguel Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D Alcaraz

Background. Comparative genomics of closely related bacterial strains can reveal important features that determine pathogenesis, antibiotic resistance, and phylogenetic structure. The Streptococcus genus is relevant to public health and food safety, and it is well represented (>100 genomes) in publicly available databases. Streptococci are cosmopolitan, with multiple sources of isolation, from humans to dairy products. The genus has been classified by morphology, serotypes, the 16S rRNA gene, and Multi-Locus Sequence Types (MLST). The Genomic Similarity Score (GSS) is proposed as a tool to quantify genome-level relatedness between Streptococcus species. The Streptococcus core genome can be used to assess strain-specific abundances in metagenomic sequences. Methods. A 16S rRNA gene phylogeny was reconstructed for 108 strains belonging to 16 Streptococcus species and compared with a dendrogram built from pairwise GSS distances for the same genomes. The core and pan-genome of these 108 genomes were calculated. The core genome sequences were analyzed and used as a resource to discriminate homologous fragment reads from closely related strains in metagenomic samples. Results. A total of 404 proteins are shared by all 108 Streptococcus genomes; these constitute the core genome. Pairwise amino acid identity values of the core proteins are reported for all compared strains and outgroups. Lower sequence identity variation (90-100%) is found predominantly in core clusters containing ribosomal and translation-related proteins. No functional assignment could be made for 48 core proteins (11.8%), and these proteins show larger sequence identity variation than other core proteins. The sequence identity of the core genome decreases as the GSS between species decreases. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while also distinguishing between 16S polytomies (unresolved nodes).
Finally, the core genome was used to distinguish between closely related species within human oral metagenomes. Discussion. The Streptococcus genus provides a benchmark dataset for comparative genomic studies because of the breadth and depth of its genomic coverage. Comparing metagenomic shotgun fragment reads with the core genome using rapid alignment tools allows species-specific abundance estimates in metagenomic samples. Understanding genomic variability and strain relatedness is the goal of tools such as GSS, whose calculation uses pairwise shared homologous sequences from both the core and the pan-genome.
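As a minimal sketch of the core/pan-genome bookkeeping described above, assume each genome has already been reduced to a set of ortholog-cluster identifiers (the clustering step itself, for example an all-vs-all protein comparison, is not shown and is not specified here). The core genome is then the intersection of these sets and the pan-genome their union. The function name and the cluster IDs are hypothetical toy data, not values from the study.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) for genomes given as sets of ortholog-cluster IDs.

    core: clusters present in every genome; pan: clusters present in any genome.
    """
    if not genomes:
        return set(), set()
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    return core, pan


# Hypothetical toy input: three genomes, five ortholog clusters in total.
toy_genomes = {
    "genome_A": {"clu1", "clu2", "clu3"},
    "genome_B": {"clu1", "clu2", "clu4"},
    "genome_C": {"clu1", "clu2", "clu5"},
}
core, pan = core_and_pan(toy_genomes)
accessory = pan - core  # clusters missing from at least one genome
print(sorted(core), len(pan), sorted(accessory))
```

The size of the core obtained this way depends heavily on how orthologs were clustered upstream, so this step is only as reliable as that clustering.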


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6233 ◽  
Author(s):  
Hugo R. Barajas ◽  
Miguel F. Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D. Alcaraz

Background The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types; here it is subjected to in-depth comparative genomic analysis. Methods Core and pan-genomes were used to describe the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared with phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny was used as a reference against which to compare dendrograms based on average nucleotide identity (ANI) and the genome similarity score (GSS). Results In this work, the core genome consists of 404 proteins shared by all 108 Streptococcus genomes. Across species, the average identity of pairwise-compared core proteins decreases in proportion to lower GSS values. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history by comparing orthologous proteins. Additionally, GSS proved the most useful metric for genus- and species-level comparisons, whereas ANI failed because of false positives when comparing different species. Discussion Understanding genomic variability and species relatedness is the goal of tools such as GSS, which uses the maximum number of pairwise shared orthologous sequences in its calculation.
It can accommodate long evolutionary distances (above the species level) because it uses amino acid alignment scores rather than nucleotide identities and normalizes by positive matches. Newly sequenced species and strains could easily be placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitously conserved genes; thus, it reflects the mosaic structure and the dynamic gene acquisition and loss of bacterial genomes.
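The abstract says only that GSS uses amino acid alignment scores of shared orthologs and normalizes by positive matches; it does not give the formula. The sketch below is therefore an illustrative, hypothetical normalization and not the authors' definition: it sums BLAST-style bit scores of ortholog pairs shared by genomes A and B and divides by the geometric mean of each genome's total self-score, so a genome compared with itself scores 1.0. A distance suitable for building a dendrogram is then taken as 1 - score. The function names are hypothetical.

```python
import math
from typing import Sequence


def genome_similarity_score(
    shared_bits: Sequence[float],
    self_bits_a: Sequence[float],
    self_bits_b: Sequence[float],
) -> float:
    """Hypothetical GSS-like similarity, typically between 0 and 1.

    shared_bits: bit scores of ortholog pairs shared by genomes A and B.
    self_bits_a / self_bits_b: bit scores of each protein aligned to itself.
    Normalizing by the geometric mean of the self-score totals is an
    assumption made for illustration only.
    """
    norm = math.sqrt(sum(self_bits_a) * sum(self_bits_b))
    if norm == 0:
        return 0.0
    return sum(shared_bits) / norm


def gss_distance(score: float) -> float:
    """Convert a similarity into a distance for hierarchical clustering."""
    return 1.0 - score


# Toy numbers: two genomes with two proteins each and one shared ortholog pair.
s = genome_similarity_score([80.0], [100.0, 100.0], [100.0, 100.0])
print(s, gss_distance(s))
```

Because such a score is built from protein-level alignments of shared orthologs rather than nucleotide identity, it is consistent with the abstract's point that amino acid scores let comparisons extend above the species level.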


2018 ◽  
Author(s):  
Hugo R Barajas de la Torre ◽  
Miguel Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D Alcaraz

Background. Comparative genomics of closely related bacterial strains helps to distinguish important features such as pathogenesis, antibiotic resistance, and phylogenetic structure. Streptococcus is relevant to public health and food safety, and it is well represented (>100 genomes) in publicly available databases. Streptococci are cosmopolitan and have been isolated from multiple sources, from humans to dairy products. Streptococci have been classified by morphology, serum types, the 16S rRNA gene, and Multi-Locus Sequence Types (MLST). The Genomic Similarity Score (GSS) is proposed as a tool to quantify genome-level relatedness between Streptococcus strains, and their core genome as a simplified tool to assess strain-specific abundances in metagenomic sequences. Methods. A 16S rRNA gene phylogeny was calculated for 108 strains belonging to 16 Streptococcus species, and the results were compared with a dendrogram built using the GSS with all shared homologous information available in the genomes. Additionally, the genus core and pan-genome were calculated. Core genome sequence identity was analyzed, and the core genome was used as a seed to discriminate abundances between closely related strains in metagenomic samples. Results. A total of 404 proteins are shared by all 108 Streptococcus genomes; these constitute the core genome. The ranges of core identity values across all compared strains and outgroups are reported. Lower sequence identity variation (90-100%) within the core is found in ribosomal and translation-related proteins. Of the core genome, 48 proteins (11.8%) are annotated as hypothetical proteins, and these proteins show the largest sequence identity variation within the core. Core genome sequence identity diminishes as the GSS between species decreases. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny, with the advantage of distinguishing between 16S polytomies (unresolved nodes).
Finally, our proposed core genome was used to distinguish the abundances of closely related strains within human oral metagenomes, yielding relative strain abundances for healthy individuals and for individuals with caries (infected with S. mutans). Discussion. The clinical and food safety importance of the Streptococcus genus, together with its excellent genomic coverage, makes it a testbed for multiple comparative genomic scenarios. Understanding genomic variability and strain relatedness is the goal of tools such as GSS, whose calculation uses pairwise shared homologous sequences from both the core and the pan-genome. Combining the core genome with rapid alignment tools allows abundances to be estimated and strains to be discriminated in metagenomic samples. Here we share both the GSS genomic dendrogram and the core genome with the community to explore possibilities within streptococci.


2021 ◽  
Vol 9 (8) ◽  
pp. 1570
Author(s):  
Chien-Hsun Huang ◽  
Chih-Chieh Chen ◽  
Yu-Chun Lin ◽  
Chia-Hsuan Chen ◽  
Ai-Yun Lee ◽  
...  

The current taxonomy of the Lactiplantibacillus plantarum group comprises 17 closely related species that cannot be distinguished from each other by commonly used 16S rRNA gene sequencing. In this study, a whole-genome-based analysis was carried out to identify highly discriminative target genes whose interspecific sequence identity is significantly lower than that of the 16S rRNA gene or conventional housekeeping genes. In silico analyses of 774 core genes with the cano-wgMLST_BacCompare analytics platform indicated that csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX were the most promising candidates. The mutL gene was then selected, and its discrimination power was further evaluated using Sanger sequencing. Among the type strains, mutL discriminated clearly better than the 16S rRNA gene and conventional phylogenetic marker genes (e.g., dnaJ, dnaK, pheS, recA, and rpoA): its sequence identity (61.6–85.6%; average: 66.6%) was far lower than that of the 16S rRNA gene (96.7–100%; average: 98.4%), so it could be used to separate the tested strains into species clusters. Consequently, species-specific primers were developed for fast and accurate identification of L. pentosus, L. argentoratensis, L. plantarum, and L. paraplantarum. During this study, one strain (BCRC 06B0048, L. pentosus) exhibited not only a relatively low mutL sequence identity (97.0%) but also a low digital DNA–DNA hybridization value (78.1%) with the type strain DSM 20314T, suggesting that it may warrant reclassification as a novel subspecies. Our data demonstrate that mutL can serve as a genome-wide target for identifying and classifying L. plantarum group species and for differentiating novel taxa from known species.


Author(s):  
Jun-Jie Ying ◽  
Zhi-Cheng Wu ◽  
Yuan-Chun Fang ◽  
Lin Xu ◽  
Cong Sun

Parvularcula flava was proposed as a novel member of the genus Parvularcula in 2016. Some time earlier, Aquisalinus flavus had been proposed as a novel species of a novel genus named Aquisalinus. The 16S rRNA gene sequences of the type strains P. flava NH6-79T and A. flavus D11M-2T share 97.9 % sequence identity, much higher than the identities of 92.7–94.3 % between P. flava NH6-79T and type strains of the genus Parvularcula, indicating that the later-proposed taxon Parvularcula flava needs reclassification. Phylogenetic trees based on 16S rRNA gene sequences and on genome sequences both showed that P. flava NH6-79T and A. flavus D11M-2T form a separate branch away from strains of the genera Parvularcula, Marinicaulis and Amphiplicatus. The average amino acid identity and average nucleotide identity values between P. flava NH6-79T and A. flavus D11M-2T were 87.9 and 85.0 %, respectively, much higher than those between P. flava NH6-79T and other closely related type strains (54.3–58.1 % and 68.6–70.4 %, respectively). P. flava NH6-79T and A. flavus D11M-2T also contained summed feature 8 (C18 : 1 ω6c and/or C18 : 1 ω7c) and C16 : 0 as major fatty acids, distinguishing them from other closely related taxa. Based on the results of the phylogenetic, comparative genomic and phenotypic analyses, Parvularcula flava should be reclassified as Aquisalinus luteolus nom. nov., and the description of the genus Aquisalinus is emended.
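The argument above combines two checks: which type strain is nearest by 16S rRNA gene identity, and whether ANI to that neighbor reaches a species-level boundary. The sketch below applies both to the values quoted in the abstract. The 95% ANI cutoff is an assumption based on the commonly used 95–96% species boundary, not a value stated in this abstract; the function names are hypothetical, and the dictionary keeps only the highest Parvularcula value (94.3 %) for brevity.

```python
from typing import Dict

# Commonly used ANI species boundary (~95-96 %); an assumption, not from the abstract.
ANI_SPECIES_CUTOFF = 95.0


def closest_taxon(identities: Dict[str, float]) -> str:
    """Return the reference with the highest sequence identity to the query."""
    return max(identities, key=lambda name: identities[name])


def same_species(ani: float, cutoff: float = ANI_SPECIES_CUTOFF) -> bool:
    """True if an ANI value meets the assumed species-level cutoff."""
    return ani >= cutoff


# 16S rRNA gene identities of P. flava NH6-79T quoted in the abstract.
identities_16s = {
    "A. flavus D11M-2T": 97.9,
    "best Parvularcula type strain": 94.3,
}
nearest = closest_taxon(identities_16s)
print(nearest, same_species(85.0))  # ANI to A. flavus D11M-2T was 85.0 %
```

The nearest neighbor is A. flavus, yet an ANI of 85.0 % falls well below the assumed species cutoff, which is consistent with placing the strain in Aquisalinus as a separate species rather than merging it with A. flavus.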


Author(s):  
Soon Dong Lee ◽  
Yeong-Sik Byeon ◽  
Sung-Min Kim ◽  
Hong Lim Yang ◽  
In Seop Kim

Taxonomic positions of four Gram-negative bacterial strains isolated from larvae of two insect species in Jeju, Republic of Korea, were determined using a polyphasic approach. Strains CWB-B4, CWB-B41 and CWB-B43 were recovered from larvae of Protaetia brevitarsis seulensis, whereas strain BWR-B9T was recovered from larvae of Allomyrina dichotoma. All isolates grew at 10–37 °C, at pH 5.0–9.0 and in the presence of 4 % (w/v) NaCl. The 16S rRNA gene phylogeny showed that the four isolates formed two distinct sublines within the order Enterobacteriales and were closely associated with members of the genus Jinshanibacter. The first group, represented by strain CWB-B4, formed a tight cluster with Jinshanibacter xujianqingii CF-1111T (99.3 % sequence similarity), whereas strain BWR-B9T was most closely related to Jinshanibacter zhutongyuii CF-458T (99.5 % sequence similarity). Analysis of 92 core genes showed that the isolates belonged to the family Budviciaceae and supported the clustering shown in the 16S rRNA gene phylogeny. The genomic DNA G+C content of the isolates was 45.2 mol%. Overall genomic relatedness combined with phenotypic distinctness indicated that the three isolates from Protaetia brevitarsis seulensis are different strains of Jinshanibacter xujianqingii, whereas the isolate from Allomyrina dichotoma represents a new species of the genus Jinshanibacter. On the basis of these results, Jinshanibacter allomyrinae sp. nov. (type strain BWR-B9T=KACC 22153T=NBRC 114879T) and Insectihabitans xujianqingii gen. nov., comb. nov. are proposed, together with emended descriptions of the genera Jinshanibacter, Limnobaculum and Pragia.


2009 ◽  
Vol 75 (22) ◽  
pp. 7153-7162 ◽  
Author(s):  
Junichi Miyazaki ◽  
Ryosaku Higa ◽  
Tomohiro Toki ◽  
Juichiro Ashi ◽  
Urumu Tsunogai ◽  
...  

ABSTRACT The potential for microbial nitrogen fixation in the anoxic methane seep sediments of a mud volcano, Kumano Knoll No. 8, was characterized by molecular phylogenetic analyses. A total of 111 clones of nifH (a gene encoding the Fe protein of the nitrogen fixation enzyme nitrogenase) were obtained from different depths of the sediment cores, and phylogenetic analysis of the clones revealed the genetic diversity of nifH genes. The predominant group detected (methane seep group 2), representing 74% of clonal abundance, was phylogenetically related to nifH sequences obtained from Methanosarcina species but was most closely related to nifH sequences potentially derived from anoxic methanotrophic archaea (ANME-2 archaea). Recovery of the nif gene clusters containing the methane seep group 2 nifH sequences, and subsequent reverse transcription-PCR detection of the nifD and nifH genes, strongly suggested that the genetic components of these gene clusters are operative for in situ assimilation of molecular nitrogen (N2) by the host microorganisms. DNA-based quantitative PCR of the archaeal 16S rRNA gene, the group-specific mcrA gene (encoding the methyl-coenzyme M reductase α subunit), and the nifD and nifH genes showed similar distribution patterns of the archaeal 16S rRNA gene, mcrA groups c-d and e, and the nifD and nifH genes through the sediment cores. These results support the idea that the anoxic methanotrophic archaea ANME-2c could be the microorganisms hosting the nif gene clusters and could play an important role not only in the in situ carbon (methane) cycle but also in the nitrogen cycle in subseafloor sediments.
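The statement that several genes show similar distribution patterns through the sediment cores can be made quantitative with a correlation across depths. The sketch below computes a Pearson correlation on log10-transformed copy numbers, since qPCR abundances often span orders of magnitude. The function names and depth profiles are hypothetical placeholders, not data from the study, and correlation is only one possible way to compare such profiles.

```python
import math
from typing import List, Sequence


def log10_profile(copies: Sequence[float]) -> List[float]:
    """log10-transform gene copy numbers (all values must be > 0)."""
    return [math.log10(c) for c in copies]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene copies per gram of sediment at four depths (shallow to deep).
mcrA = [1e6, 1e7, 1e8, 1e7]
nifH = [2e5, 2e6, 2e7, 2e6]
r = pearson(log10_profile(mcrA), log10_profile(nifH))
print(round(r, 3))
```

A value near 1 indicates that the two genes rise and fall together with depth; with only a handful of depths, such a coefficient is descriptive rather than strong statistical evidence.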


2015 ◽  
Vol 65 (Pt_1) ◽  
pp. 251-259 ◽  
Author(s):  
Patricia L. Tavormina ◽  
Roland Hatzenpichler ◽  
Shawn McGlynn ◽  
Grayson Chadwick ◽  
Katherine S. Dawson ◽  
...  

We report the isolation and growth characteristics of a gammaproteobacterial methane-oxidizing bacterium (Methylococcaceae strain WF1T, ‘whale fall 1’) that shares 98 % 16S rRNA gene sequence identity with uncultivated free-living methanotrophs and the methanotrophic endosymbionts of deep-sea mussels, ≤94.6 % 16S rRNA gene sequence identity with species of the genus Methylobacter and ≤93.6 % 16S rRNA gene sequence identity with species of the genera Methylomonas and Methylosarcina . Strain WF1T represents the first cultivar from the ‘deep sea-1’ clade of marine methanotrophs, which includes members that participate in methane oxidation in sediments and the water column in addition to mussel endosymbionts. Cells of strain WF1T were elongated cocci, approximately 1.5 µm in diameter, and occurred singly, in pairs and in clumps. The cell wall was Gram-negative, and stacked intracytoplasmic membranes and storage granules were evident. The genomic DNA G+C content of WF1T was 40.5 mol%, significantly lower than that of currently described cultivars, and the major fatty acids were 16 : 0, 16 : 1ω9c, 16 : 1ω9t, 16 : 1ω8c and 16 : 2ω9,14. Growth occurred in liquid media at an optimal temperature of 23 °C, and was dependent on the presence of methane or methanol. Atmospheric nitrogen could serve as the sole nitrogen source for WF1T, a capacity that had not been functionally demonstrated previously in members of Methylobacter . On the basis of its unique morphological, physiological and phylogenetic properties, this strain represents the type species within a new genus, and we propose the name Methyloprofundus sedimenti gen. nov., sp. nov. The type strain of Methyloprofundus sedimenti is WF1T ( = LMG 28393T = ATCC BAA-2619T).


2020 ◽  
Author(s):  
Eiseul Kim ◽  
Seung-Min Yang ◽  
Bora Lim ◽  
Si Hong Park ◽  
Bryna Rackerby ◽  
...  

Abstract Background Lactobacillus species are used as probiotics and play an important role in fermented food production. However, 16S rRNA gene sequences offer very limited scope as standard markers for differentiating Lactobacillus species, because several species share similar 16S rRNA gene sequences. In this study, we developed a rapid and accurate method based on comparative genomic analysis for the simultaneous identification of 37 Lactobacillus species commonly used in probiotics and fermented foods. Results To select species-specific sequences or genes, a total of 180 Lactobacillus genome sequences were compared using Python scripts. For 14 of the 37 species, species-specific sequences could not be found because of the similarity of the 16S–23S rRNA gene. Unique genes were selected using comparative genomic analysis, and all of them were confirmed in silico to be specific against 52,478,804 genomes; they were found not to be strain-specific but to be present in all strains of the same species. Species-specific primer pairs were designed from the selected 16S–23S rRNA gene sequences or unique genes of each species. The specificity of the primer pairs was confirmed using reference strains, and the accuracy and efficiency of the polymerase chain reaction (PCR) were confirmed with a standard curve. The PCR method developed in this study accurately differentiates species that could not be distinguished using the 16S rRNA gene alone. These PCR assays were designed to detect and identify 37 Lactobacillus species. The method was then applied to monitor 19 probiotics and 12 dairy products. The species detected in 17 products matched those indicated on their labels, whereas the remaining products contained species other than those listed on the label.
Conclusions The method developed in this study is able to rapidly and accurately distinguish different species of Lactobacillus , and can be used to monitor specific Lactobacillus species in foods such as probiotics and dairy products.

