Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify close related bacterial strains in complex environments

Author(s):  
Hugo R Barajas de la Torre ◽  
Miguel Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D Alcaraz

Background. Comparative genomics between closely related bacterial strains helps to distinguish important features such as pathogenesis, antibiotic resistance, and phylogenetic structure. Streptococcus is relevant to public health and food safety, and it is well represented (>100 genomes) in publicly available databases. Streptococci are cosmopolitan, and there are multiple sources of isolation, from humans to dairy products. Streptococcus has been classified by morphology, serum types, the 16S rRNA gene, and Multi Locus Sequence Types (MLST). The Genomic Similarity Score (GSS) is proposed as a tool to quantify genome-level relatedness between Streptococcus strains, and their core genome is proposed as a simplified tool to assess strain-specific abundances in metagenomic sequences. Methods. A 16S rRNA gene phylogeny was calculated for 108 strains belonging to 16 Streptococcus species, and the results were compared to a dendrogram built using the GSS with all homologous shared information available in the genomes. Additionally, the genus core and pan-genome were calculated. The sequence identity of the core genome was analyzed, and the core genome was used as a seed to discriminate abundances between closely related strains in metagenomic samples. Results. A total of 404 proteins are shared by all 108 Streptococcus genomes; these constitute the core genome. The ranges of core identity values across all the compared strains and outgroups are reported. Lower sequence identity variation (90-100%) within the core belongs to ribosomal and translation-related proteins. Of the core genome, 48 proteins (11.8%) are annotated as hypothetical proteins, and these host the larger sequence identity variations within the core. The sequence identity of the core genome diminishes as the GSS score between species increases. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny, with the advantage of distinguishing between 16S polytomies (unresolved nodes). Finally, the proposed core genome was used to distinguish the abundances of closely related strains within human oral metagenomes, yielding strain relative abundances for healthy and caries-infected (with S. mutans) individuals. Discussion. The clinical and food safety importance of the Streptococcus genus provides a playground to test multiple comparative genomic scenarios, owing to its excellent genomic coverage. Understanding genomic variability and strain relatedness is the goal of tools like GSS, which makes use of both pairwise shared core and pan-genomic homologous sequences for its calculation. Combining the core genome with rapid alignment tools makes it possible to estimate abundance and to discriminate in a strain-specific manner in metagenomic samples. Both the GSS genomic dendrogram and the core genome are shared here with the community to explore possibilities within streptococci.

2018 ◽  
Author(s):  
Hugo R Barajas de la Torre ◽  
Miguel Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D Alcaraz

Background. Comparative genomics between closely related bacterial strains can distinguish important features determining pathogenesis, antibiotic resistance, and phylogenetic structure. The Streptococcus genus is relevant to public health and food safety, and it is well represented (>100 genomes) in publicly available databases. Streptococci are cosmopolitan, with multiple sources of isolation, from humans to dairy products. The Streptococcus genus has been classified by morphology, serotypes, 16S rRNA gene, and Multi Locus Sequence Types (MLST). The Genomic Similarity Score (GSS) is proposed as a tool to quantify genome-level relatedness between species of Streptococcus. The Streptococcus core genome can be used to assess strain-specific abundances in metagenomic sequences. Methods. A 16S rRNA gene phylogeny was calculated for 108 strains belonging to 16 Streptococcus species and compared to a dendrogram using GSS pairwise distances for the same genomes. The core and pan-genome were calculated for these 108 genomes. The core genome sequences were analyzed and used as a resource to discriminate homologous fragment reads from closely related strains in metagenomic samples. Results. A total of 404 proteins are shared by all 108 Streptococcus genomes; these constitute the core genome. The pairwise amino acid identity values of the core proteins for all the compared strains and outgroups are reported. Lower sequence identity variation (90-100%) is predominantly found in core clusters containing ribosomal and translation-related proteins. For 48 core proteins (11.8%) no functional assignment could be made, and those proteins have larger sequence identity variations than other core proteins. The sequence identity of the core genome diminishes as the GSS score between species decreases. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). Finally, the core genome was used to distinguish between closely related species within human oral metagenomes. Discussion. The Streptococcus genus provides a benchmark dataset for comparative genomic studies due to the breadth and depth of its genomic coverage. Comparing metagenomic shotgun fragment reads to the core genome using rapid alignment tools allows species-specific abundance estimates in metagenomic samples. Understanding genomic variability and strain relatedness is the goal of tools like GSS, which makes use of both pairwise shared core and pan-genomic homologous sequences for its calculation.
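
The core and pan-genome described in the Methods above can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of protein-family identifiers (the clustering step is not shown); the strain names and family IDs below are hypothetical toy data, not values from the study.

```python
def core_and_pan(genomes):
    """Return (core, pan) for a dict mapping strain name -> set of protein-family IDs.

    core: families present in every genome (set intersection).
    pan:  families present in at least one genome (set union).
    """
    family_sets = list(genomes.values())
    if not family_sets:
        return set(), set()
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


# Hypothetical toy input: three strains, each with a few protein families.
toy_genomes = {
    "strain_A": {"rpsA", "rplB", "tuf", "hypA"},
    "strain_B": {"rpsA", "rplB", "tuf", "lacZ"},
    "strain_C": {"rpsA", "rplB", "tuf", "hypA", "comC"},
}

core, pan = core_and_pan(toy_genomes)
print(sorted(core))
print(len(pan))
```

With real data the same intersection over 108 genomes would yield the set of families the abstract calls the core genome; how families are defined (orthology method, identity cutoffs) is outside this sketch.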


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6233 ◽  
Author(s):  
Hugo R. Barajas ◽  
Miguel F. Romero ◽  
Shamayim Martínez-Sánchez ◽  
Luis D. Alcaraz

Background The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The Streptococcus genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types subject to in-depth comparative genomic analysis. Methods Core and pan-genomes described the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared to phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny reconstruction was used as a reference against which to compare the resulting average nucleotide identity (ANI) and genome similarity score (GSS) dendrograms. Results The core genome, in this work, consists of 404 proteins that are shared by all 108 Streptococcus genomes. The average identity of the pairwise compared core proteins decreases proportionally with lower GSS scores across species. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history by comparing orthologous proteins. Additionally, GSS proved to be the most useful metric for genus and species comparisons, where ANI metrics failed due to false positives when comparing different species. Discussion Understanding genomic variability and species relatedness is the goal of tools like GSS, which makes use of the maximum pairwise shared orthologous sequences for its calculation. It allows long evolutionary distances (above species) to be included because it uses amino acid alignment scores, rather than nucleotides, and normalizes by positive matches. Newly sequenced species and strains could be easily placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitous conservancy of gene features; thus, it reflects the mosaic structure and dynamism of gene acquisition and loss in bacterial genomes.
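
The abstract does not state the GSS formula, so the following is only an illustrative sketch of a score in the same spirit: alignment scores between paired orthologous proteins are summed and normalized. The normalization used here (dividing by the query genome's self-alignment scores for the same orthologs) is an assumption, and the function name, ortholog IDs and scores are hypothetical.

```python
def genomic_similarity_score(pair_scores, self_scores):
    """Illustrative GSS-like score in [0, 1] under an assumed normalization.

    pair_scores: ortholog ID -> alignment score of the query protein
                 against its ortholog in the other genome.
    self_scores: ortholog ID -> alignment score of the query protein
                 against itself (the maximum attainable score).
    Only orthologs present in both mappings contribute.
    """
    shared = pair_scores.keys() & self_scores.keys()
    max_total = sum(self_scores[k] for k in shared)
    if max_total == 0:
        return 0.0
    return sum(pair_scores[k] for k in shared) / max_total


# Hypothetical scores for two shared orthologs; orth3 has no partner.
pair = {"orth1": 90.0, "orth2": 50.0}
self_hits = {"orth1": 100.0, "orth2": 100.0, "orth3": 80.0}

score = genomic_similarity_score(pair, self_hits)
print(round(score, 3))
```

Because the score is built from whichever orthologs a pair of genomes shares rather than from a fixed universal gene set, it fits the abstract's point that GSS is not restricted to ubiquitously conserved genes.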


Author(s):  
Soon Dong Lee ◽  
Yeong-Sik Byeon ◽  
Sung-Min Kim ◽  
Hong Lim Yang ◽  
In Seop Kim

Taxonomic positions of four Gram-negative bacterial strains, which were isolated from larvae of two insects in Jeju, Republic of Korea, were determined by a polyphasic approach. Strains CWB-B4, CWB-B41 and CWB-B43 were recovered from larvae of Protaetia brevitarsis seulensis, whereas strain BWR-B9T was from larvae of Allomyrina dichotoma. All the isolates grew at 10–37 °C, at pH 5.0–9.0 and in the presence of 4 % (w/v) NaCl. The 16S rRNA gene phylogeny showed that the four isolates formed two distinct sublines within the order Enterobacteriales and were closely associated with members of the genus Jinshanibacter. The first group, represented by strain CWB-B4, formed a tight cluster with Jinshanibacter xujianqingii CF-1111T (99.3 % sequence similarity), whereas strain BWR-B9T was most closely related to Jinshanibacter zhutongyuii CF-458T (99.5 % sequence similarity). The 92 core gene analysis showed that the isolates belonged to the family Budviciaceae and supported the clustering shown in the 16S rRNA gene phylogeny. The genomic DNA G+C content of the isolates was 45.2 mol%. A combination of overall genomic relatedness and phenotypic distinctness supported that the three isolates from Protaetia brevitarsis seulensis are different strains of Jinshanibacter xujianqingii, whereas the one isolate from Allomyrina dichotoma represents a new species of the genus Jinshanibacter. On the basis of the results obtained here, Jinshanibacter allomyrinae sp. nov. (type strain BWR-B9T=KACC 22153T=NBRC 114879T) and Insectihabitans xujianqingii gen. nov., comb. nov. are proposed, with emended descriptions of the genera Jinshanibacter, Limnobaculum and Pragia.


2021 ◽  
Vol 9 (8) ◽  
pp. 1570
Author(s):  
Chien-Hsun Huang ◽  
Chih-Chieh Chen ◽  
Yu-Chun Lin ◽  
Chia-Hsuan Chen ◽  
Ai-Yun Lee ◽  
...  

The current taxonomy of the Lactiplantibacillus plantarum group comprises 17 closely related species that are indistinguishable from each other by commonly used 16S rRNA gene sequencing. In this study, a whole-genome-based analysis was carried out to explore highly discriminating target genes whose interspecific sequence identity is significantly lower than that of 16S rRNA or conventional housekeeping genes. In silico analyses of 774 core genes by the cano-wgMLST_BacCompare analytics platform indicated that the csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX genes were the most promising candidates. Subsequently, the mutL gene was selected, and its discrimination power was further evaluated using Sanger sequencing. Among the type strains, mutL exhibited a clearly more discriminating sequence identity range (61.6–85.6%; average: 66.6%) than the 16S rRNA gene (96.7–100%; average: 98.4%) and the conventional phylogenetic marker genes (e.g., dnaJ, dnaK, pheS, recA, and rpoA), and could be used to separate the tested strains into various species clusters. Consequently, species-specific primers were developed for fast and accurate identification of L. pentosus, L. argentoratensis, L. plantarum, and L. paraplantarum. During this study, one strain (BCRC 06B0048, L. pentosus) exhibited not only a relatively low mutL sequence identity (97.0%) but also a low digital DNA–DNA hybridization value (78.1%) with the type strain DSM 20314T, signifying that it exhibits potential for reclassification as a novel subspecies. Our data demonstrate that mutL can be a genome-wide target for identifying and classifying the L. plantarum group species and for differentiating novel taxa from known species.
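
Sequence identity values such as those quoted for mutL and the 16S rRNA gene are computed from pairwise alignments. As a minimal sketch, assuming two sequences that are already aligned to equal length with "-" as the gap character, percent identity can be taken as matching columns over compared columns. The convention of skipping columns where both sequences have a gap is an assumption (tools differ on how gaps are counted), and the sequences below are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a residue counts as a compared, non-matching column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 5 compared columns, 4 matches.
identity = percent_identity("ATGC-A", "ATGA-A")
print(identity)
```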


2021 ◽  
Vol 9 (6) ◽  
pp. 1307
Author(s):  
Sebastian Böttger ◽  
Silke Zechel-Gran ◽  
Daniel Schmermund ◽  
Philipp Streckbein ◽  
Jan-Falco Wilbrand ◽  
...  

Severe odontogenic abscesses are regularly caused by bacteria of the physiological oral microbiome. However, the culture of these bacteria is often prone to errors and sometimes does not result in any bacterial growth. Furthermore, various authors have found completely different bacterial spectra in odontogenic abscesses. Experimental 16S rRNA gene next-generation sequencing analysis was used to identify the microbiome of the saliva and the pus in 50 patients with a severe odontogenic abscess. Perimandibular and submandibular abscesses were the most commonly observed diseases, at 15 (30%) patients each. Polymicrobial infections were observed in 48 (96%) cases, while the picture of a mono-infection occurred only twice (4%). On average, 31.44 (±12.09) bacterial genera were detected in the pus and 41.32 (±9.00) in the saliva. In most cases, a predominantly anaerobic bacterial spectrum was found in the pus, while saliva showed an oral microbiome similar to that of healthy individuals. In the majority of cases, odontogenic infections are polymicrobial. Our results indicate that these are mainly caused by anaerobic bacterial strains and that aerobic and facultatively anaerobic bacteria seem to play a more minor role than previously described by other authors. The 16S rRNA gene analysis detects significantly more bacteria than conventional methods, and molecular methods should therefore become a part of routine diagnostics in medical microbiology.


Agriculture ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 383 ◽  
Author(s):  
Gustavo Enrique Mendoza-Arroyo ◽  
Manuel Jesús Chan-Bacab ◽  
Ruth Noemi Aguila-Ramírez ◽  
Benjamín Otto Ortega-Morales ◽  
René Efraín Canché Solís ◽  
...  

The excessive use of fertilizers in agriculture is mainly due to the recognized plant requirements for soluble phosphorus. This problem has limited the implementation of sustainable agriculture. A viable alternative is to use phosphate solubilizing soil microorganisms. This work aimed to isolate inorganic phosphorus-solubilizing bacteria from the soils of agroecosystems, to select and identify, based on sequencing and phylogenetic analysis of the 16S rRNA gene, the bacterium with the highest capacity for in vitro solubilization of inorganic phosphate. Additionally, we aimed to determine its primary phosphate solubilizing mechanisms and to evaluate its effect on Habanero pepper seedlings growth. A total of 21 bacterial strains were isolated by their activity on Pikovskaya agar. Of these, strain ITCB-09 exhibited the highest ability to solubilize inorganic phosphate (865.98 µg/mL) through the production of organic acids. This strain produced extracellular polymeric substances and siderophores that have ecological implications for phosphate solubilization. 16S rRNA gene sequence analysis revealed that strain ITCB-09 belongs to the genus Enterobacter. Enterobacter sp. ITCB-09, especially when immobilized in beads, had a positive effect on Capsicum chinense Jacq. seedling growth, indicating its potential as a biofertilizer.


2009 ◽  
Vol 75 (22) ◽  
pp. 7153-7162 ◽  
Author(s):  
Junichi Miyazaki ◽  
Ryosaku Higa ◽  
Tomohiro Toki ◽  
Juichiro Ashi ◽  
Urumu Tsunogai ◽  
...  

ABSTRACT The potential for microbial nitrogen fixation in the anoxic methane seep sediments in a mud volcano, the number 8 Kumano Knoll, was characterized by molecular phylogenetic analyses. A total of 111 of the nifH (a gene coding a nitrogen fixation enzyme, Fe protein) clones were obtained from different depths of the core sediments, and the phylogenetic analysis of the clones indicated the genetic diversity of nifH genes. The predominant group detected (methane seep group 2), representing 74% of clonal abundance, was phylogenetically related to the nifH sequences obtained from the Methanosarcina species but was most closely related to the nifH sequences potentially derived from the anoxic methanotrophic archaea (ANME-2 archaea). The recovery of the nif gene clusters including the nifH sequences of the methane seep group 2 and the subsequent reverse transcription-PCR detection of the nifD and nifH genes strongly suggested that the genetic components of the gene clusters would be operative for the in situ assimilation of molecular nitrogen (N2) by the host microorganisms. DNA-based quantitative PCR of the archaeal 16S rRNA gene, the group-specific mcrA (a gene encoding the methyl-coenzyme M reductase α subunit) gene, and the nifD and nifH genes demonstrated the similar distribution patterns of the archaeal 16S rRNA gene, the mcrA groups c-d and e, and the nifD and nifH genes through the core sediments. These results supported the idea that the anoxic methanotrophic archaea ANME-2c could be the microorganisms hosting the nif gene clusters and could play an important role in not only the in situ carbon (methane) cycle but also the nitrogen cycle in subseafloor sediments.


2007 ◽  
Vol 57 (9) ◽  
pp. 2089-2095 ◽  
Author(s):  
Jung-Hoon Yoon ◽  
So-Jung Kang ◽  
Sooyeon Park ◽  
Tae-Kwang Oh

Two Gram-negative, non-motile, pleomorphic bacterial strains, DS-40T and DS-45T, were isolated from a soil sample collected from Dokdo, Korea, and their exact taxonomic positions were investigated by using a polyphasic approach. Strains DS-40T and DS-45T grew optimally at 25 °C and pH 6.5–7.5 in the presence of 0–1.0 % (w/v) NaCl. They contained MK-7 as the predominant menaquinone and possessed iso-C15 : 0, iso-C17 : 0 3-OH and summed feature 3 (C16 : 1 ω7c and/or iso-C15 : 0 2-OH) as the major fatty acids. The DNA G+C contents of strains DS-40T and DS-45T were 36.0 and 36.8 mol%, respectively. Strains DS-40T and DS-45T shared a 16S rRNA gene sequence similarity of 96.7 % and demonstrated a mean DNA–DNA relatedness level of 12 %. Phylogenetic analyses based on 16S rRNA gene sequences revealed that strains DS-40T and DS-45T were most closely phylogenetically affiliated with the genus Pedobacter of the family Sphingobacteriaceae. Strains DS-40T and DS-45T exhibited 16S rRNA gene sequence similarity values of 91.4–93.7 and 89.9–91.6 % with respect to the type strains of Pedobacter and Sphingobacterium species, respectively. Phenotypic and chemotaxonomic properties, together with the phylogenetic data, support the assignment of strains DS-40T and DS-45T as two distinct species within the genus Pedobacter. On the basis of phenotypic, phylogenetic and genetic data, strains DS-40T and DS-45T represent two novel species of the genus Pedobacter, for which the names Pedobacter lentus sp. nov. and Pedobacter terricola sp. nov. are proposed, respectively. The respective type strains are DS-40T (=KCTC 12875T=JCM 14593T) and DS-45T (=KCTC 12876T=JCM 14594T).

