taxonomic assignments
Recently Published Documents



2022 ◽  
Vol 12 ◽  
Author(s):  
Xiangyang Li ◽  
Zilin Yang ◽  
Zhao Wang ◽  
Weipeng Li ◽  
Guohui Zhang ◽  
...  

Pseudomonas stutzeri is a species complex with extremely broad phenotypic and genotypic diversity. However, very little is known about its diversity, taxonomy and phylogeny at the genomic scale. To address these issues, we systematically and comprehensively defined the taxonomy and nomenclature for this species complex and explored its genetic diversity using hundreds of sequenced genomes. By combining average nucleotide identity (ANI) evaluation and phylogenetic inference approaches, we identified 123 P. stutzeri complex genomes covering at least six well-defined species among all sequenced Pseudomonas genomes; of these, 25 genomes represented novel members of this species complex. ANI values of ≥∼95% and digital DNA-DNA hybridization (dDDH) values of ≥∼60%, in combination with phylogenomic analysis, consistently and robustly supported the division of these strains into 27 genomovars (most likely species to some extent), comprising 16 known and 11 unknown genomovars. We found that 12 strains had incorrect taxonomic assignments, while 16 strains without species names could be assigned to the species level within the species complex. We observed an open pan-genome of the P. stutzeri complex comprising 13,261 gene families, of which approximately 45% do not match any sequence present in the COG database, and a large proportion of accessory genes. The genome contents experienced extensive genetic gain and loss events, which may be one of the major mechanisms driving diversification within this species complex. Surprisingly, we found that the ectoine biosynthesis gene cluster (ect) was present in all genomes of P. stutzeri species complex strains but occurred at very low frequency (43 out of 9548) in other Pseudomonas genomes, suggesting a possible origin of the ancestors of the P. stutzeri species complex in high-osmolarity environments.
Collectively, our study highlights the potential of using whole-genome sequences to re-evaluate the current definition of the P. stutzeri complex, shedding new light on its genomic diversity and evolutionary history.
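
As a rough illustration of how an ANI cutoff of about 95% can partition genomes into groups, the following Python sketch links every pair of genomes at or above the threshold and reports the resulting connected groups. The single-linkage rule, the function name `cluster_by_ani`, the input format and the ANI values are all assumptions made for this example; the study itself combined ANI with dDDH and phylogenomic analysis, which this sketch does not attempt.

```python
from itertools import combinations


def cluster_by_ani(genomes, ani, threshold=95.0):
    """Group genomes into single-linkage clusters at a given ANI threshold.

    genomes:   list of genome identifiers
    ani:       dict mapping frozenset({a, b}) -> ANI percentage (hypothetical input)
    threshold: pairs with ANI >= threshold are linked; missing pairs count as unlinked
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster representative, shortening the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if ani.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Made-up ANI values, for illustration only.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    frozenset(("g1", "g2")): 97.2,
    frozenset(("g2", "g3")): 95.4,
    frozenset(("g1", "g3")): 94.1,
    frozenset(("g3", "g4")): 88.0,
}
groups = cluster_by_ani(genomes, ani)
print(groups)
```

Note that single linkage places g1 and g3 in the same group even though their direct ANI is below the cutoff, because both are linked through g2. This chaining effect is one reason a threshold alone may not settle species boundaries.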


2022 ◽  
Author(s):  
Mark Achtman ◽  
Zhemin Zhou ◽  
Jane Charlesworth ◽  
Laura A. Baxter

Defining bacterial species is traditionally a taxonomic issue, whereas defining bacterial populations is done with population genetics. These assignments are species-specific and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST alleles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate the ability of HierCC to correctly assign hundreds of thousands of genomes to the species/subspecies and population levels for Salmonella, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and revealed differences in vertical inheritance of O serogroups. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
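
The general idea of clustering core genome MLST profiles at several levels of resolution can be sketched in Python as below. This is generic single-linkage clustering on allelic distances, not EnteroBase's HierCC implementation; the function names, the handling of missing alleles, the thresholds and the toy profiles are all assumptions made for illustration.

```python
def allelic_distance(p, q):
    """Count loci at which two cgMLST profiles carry different alleles.

    Loci with a missing call (None) in either profile are skipped,
    which is a simplifying assumption of this sketch.
    """
    return sum(1 for a, b in zip(p, q)
               if a is not None and b is not None and a != b)


def single_linkage(profiles, max_dist):
    """Label strains so that any chain of distances <= max_dist shares a label."""
    names = sorted(profiles)
    label = {n: n for n in names}  # each strain starts in its own cluster
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allelic_distance(profiles[a], profiles[b]) <= max_dist:
                old, new = label[b], label[a]
                if old != new:
                    # Merge the two clusters by relabelling every member of one.
                    for n in names:
                        if label[n] == old:
                            label[n] = new
    return label


def hierarchical_clusters(profiles, thresholds):
    """Run single-linkage clustering once per allelic-distance threshold."""
    return {t: single_linkage(profiles, t) for t in thresholds}


# Toy five-locus profiles (made-up allele numbers).
profiles = {
    "s1": (1, 1, 1, 1, 1),
    "s2": (1, 1, 1, 1, 2),
    "s3": (1, 1, 2, 2, 2),
    "s4": (5, 6, 7, 8, None),
}
levels = hierarchical_clusters(profiles, [0, 1, 2, 4])
for threshold, labels in levels.items():
    print(threshold, labels)
```

Tight thresholds separate closely related strains, while loose thresholds merge them into broader groups, which mirrors the span from transmission chains up to species described in the abstract.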


2021 ◽  
Vol 5 ◽  
Author(s):  
Barbara R. Leite ◽  
Pedro E. Vieira ◽  
Jesús S. Troncoso ◽  
Filipe O. Costa

DNA metabarcoding has great potential to improve marine biomonitoring programs by providing a rapid and accurate assessment of species composition in zoobenthic communities. However, some methodological improvements are still required, especially regarding failed detections, primer efficiency and incompleteness of databases. Here we assessed the efficiency of two different marker loci (COI and 18S) and three primer pairs in marine species detection through DNA metabarcoding of the macrozoobenthic communities colonizing three types of artificial substrates (slate, PVC and granite), sampled between 3 and 15 months of deployment. To accurately compare detection success between markers, we also compared the representativeness of the detected species in public databases and revised the reliability of the taxonomic assignments. Globally, we recorded extensive complementarity in the species detected by each marker, with 69% of the species exclusively detected by either 18S or COI. Individually, each of the three primer pairs recovered, at most, 52% of all species detected in the samples, and the pairs also differed in their ability to amplify specific taxonomic groups. Most of the detected species have reliable reference sequences in their respective databases (82% for COI and 72% for 18S), meaning that when a species was detected by one marker and not by the other, it was most likely due to faulty amplification rather than to a lack of matching sequences in the database. Overall, results showed that the choice of marker and primer affects species detection and indicated that, currently, if only a single marker or primer pair is employed in marine zoobenthos metabarcoding, a fair portion of the diversity may be overlooked.
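
The complementarity figures quoted above (the share of species seen by only one marker, and the share of all species recovered by each marker) are simple set calculations. A minimal Python sketch follows; the function names and the species labels are placeholders invented for this example and do not come from the study's data.

```python
from collections import Counter


def exclusive_fraction(detections):
    """Fraction of all detected species that were found by exactly one marker.

    detections: dict mapping marker name -> set of detected species.
    """
    counts = Counter(sp for species in detections.values() for sp in species)
    total = len(counts)
    exclusive = sum(1 for n in counts.values() if n == 1)
    return exclusive / total if total else 0.0


def recovery_by_marker(detections):
    """Fraction of the pooled species list that each marker recovered."""
    total = len(set().union(*detections.values()))
    if total == 0:
        return {marker: 0.0 for marker in detections}
    return {marker: len(species) / total for marker, species in detections.items()}


# Placeholder detections, not real results.
detections = {
    "COI": {"sp_a", "sp_b", "sp_c", "sp_d"},
    "18S": {"sp_c", "sp_d", "sp_e"},
}
print(exclusive_fraction(detections))
print(recovery_by_marker(detections))
```

The same functions work unchanged if the keys are primer pairs instead of marker loci.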


2021 ◽  
Author(s):  
Francesca Petriglieri ◽  
Caitlin Singleton ◽  
Zivile Kondrotaite ◽  
Morten K. D. Dueholm ◽  
Elizabeth A. McDaniel ◽  
...  

Candidatus Accumulibacter was the first microorganism identified as a polyphosphate-accumulating organism (PAO), important for phosphorus removal from wastewater. This genus is diverse, and the current phylogeny and taxonomic framework appear complicated, with the majority of publicly available genomes classified as Candidatus Accumulibacter phosphatis, despite notable phylogenetic divergence. The ppk1 marker gene allows for a finer-scale differentiation into different types and clades; nevertheless, taxonomic assignments remain confusing and inconsistent across studies. Therefore, a comprehensive re-evaluation is needed to establish a common understanding of this genus, both in terms of naming and basic conserved physiological traits. Here, we provide this re-assessment using a comparison of genome, ppk1, and 16S rRNA gene-based approaches from comprehensive datasets. We identified 15 novel species, along with the well-known Ca. A. phosphatis, Ca. A. deltensis and Ca. A. aalborgensis. To compare the species in situ, we designed new species-specific FISH probes and revealed their morphology and arrangement in activated sludge. Based on the MiDAS global survey, Ca. Accumulibacter species were widespread in WWTPs with phosphorus removal, indicating the process design as a major driver for their abundance. Genome mining for PAO-related pathways and FISH-Raman microspectroscopy confirmed the potential for the PAO metabolism in all Ca. Accumulibacter species, with detection in situ of the typical PAO storage polymers. Genome annotation further revealed fine-scale differences in the nitrate/nitrite reduction pathways. This provides insights into the niche differentiation of these lineages, potentially explaining their coexistence in the same ecosystem while contributing to overall phosphorus and nitrogen removal.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12065
Author(s):  
Katie Miaow ◽  
Donnabella Lacap-Bugler ◽  
Hannah L. Buckley

Microbes are fundamental to Earth’s ecosystems; thus, understanding ecosystem connectivity through microbial dispersal is key to predicting future ecosystem changes in a warming world. However, aerial microbial dispersal remains poorly understood. Few studies have been performed on bioaerosols (microorganisms and biological fragments suspended in the atmosphere), even though they harbor pathogens and allergens. Most environmental microbes grow poorly in culture; therefore, molecular approaches are required to characterize aerial diversity. Bioinformatic tools are needed for processing the next generation sequencing (NGS) data generated from these molecular approaches; however, there are numerous options and choices in the process. These choices can markedly affect key aspects of the data output including relative abundances, diversity, and taxonomy. Bioaerosol samples have relatively little DNA and often contain novel organisms and proportionally high levels of contaminant organisms that are difficult to identify. Therefore, bioinformatics choices are of crucial importance. A bioaerosol dataset for bacteria and fungi based on 16S rRNA gene (16S) and internal transcribed spacer (ITS) DNA sequencing from parks in the metropolitan area of Auckland, Aotearoa New Zealand, was used to develop a process for determining the bioinformatics pipeline that would maximize the amount and quality of data generated. Two popular tools (DADA2 and USEARCH) were compared for amplicon sequence variant (ASV) inference and generation of an ASV table. A scorecard was created and used to assess multiple outputs and make systematic choices about the most suitable option. Read numbers and ASVs were assessed, alpha diversity (Hill numbers) was calculated, and beta diversity (Bray–Curtis distances), differential abundance by site and consistency of ASVs were considered. USEARCH was selected, due to higher consistency in ASVs identified and greater read counts.
Taxonomic assignment is highly dependent on the taxonomic database used. Two popular taxonomy databases were compared in terms of number and confidence of assignments, and a combined approach was developed that uses information in both databases to maximize the number and confidence of taxonomic assignments. This approach increased the assignment rate by 12–15%, depending on the amplicon, and the overall assignment rate was 77% for bacteria and 47% for fungi. Assessment of decontamination using “decontam” and “microDecon” was performed, based on review of ASVs identified as contaminants by each and consideration of the probability of them being legitimate members of the bioaerosol community. For this example, “microDecon’s” subtraction approach for removing background contamination was selected. This study demonstrates a systematic approach to determining the optimal bioinformatics pipeline using a multi-criteria scorecard for microbial bioaerosol data. Example code in the R environment for this data processing pipeline is provided.
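
Two of the scorecard metrics named above, Hill numbers for alpha diversity and Bray-Curtis distance for beta diversity, have standard definitions that fit in a few lines. The sketch below is written in Python purely for illustration (the study's own example code is in R and is not reproduced here); the function names and the toy count vectors are assumptions.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q for one sample.

    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    q = 2 the inverse Simpson index.
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same ASV order."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


# Toy ASV count vectors (made up).
even_sample = [10, 10, 10, 10]
site_a = [10, 0, 5]
site_b = [0, 10, 5]

print([hill_number(even_sample, q) for q in (0, 1, 2)])
print(bray_curtis(site_a, site_b))
```

For a perfectly even sample all Hill orders equal the number of taxa; they diverge as abundances become more uneven, which is why comparing several orders can be informative when evaluating pipelines.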


2021 ◽  
Vol 12 ◽  
Author(s):  
Alex J. Mullins ◽  
Eshwar Mahenthiralingam

Burkholderia sensu lato is a collection of closely related genera within the family Burkholderiaceae that includes species of environmental, industrial, biotechnological, and clinical importance. Multiple species within the complex are the source of diverse specialized metabolites, many of which have been identified through genome mining of their biosynthetic gene clusters (BGCs). However, the full, true genomic diversity of these species and genera, and their biosynthetic capacity have not been investigated. This study sought to cluster and classify over 4000 Burkholderia sensu lato genome assemblies into distinct genomic taxa representing named and uncharacterized species. We delineated 235 species groups by average nucleotide identity analyses that formed seven distinct phylogenomic clades, representing the genera of Burkholderia sensu lato: Burkholderia, Paraburkholderia, Trinickia, Caballeronia, Mycetohabitans, Robbsia, and Pararobbsia. A total of 137 genomic taxa aligned with named species possessing a sequenced type strain, while 93 uncharacterized species groups were demarcated. The 95% ANI threshold proved capable of delineating most genomic species and was only increased to resolve several closely related species. These analyses enabled the assessment of species classifications of over 4000 genomes, and the correction of over 400 genome taxonomic assignments in public databases into existing and uncharacterized genomic species groups. These species groups were genome mined for BGCs, their specialized metabolite capacity calculated per species and genus, and the number of distinct BGCs per species estimated through kmer-based de-replication. Mycetohabitans species dedicated a larger proportion of their relatively small genomes to specialized metabolite biosynthesis, while Burkholderia species harbored more BGCs on average per genome and possessed the most distinct BGCs per species compared to the remaining genera.
Exploring the hidden genomic diversity of this important multi-genus complex contributes to our understanding of their taxonomy and evolutionary relationships, and supports future efforts toward natural product discovery.
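
The abstract mentions estimating the number of distinct BGCs through kmer-based de-replication without giving details. One common way to de-replicate sequences by k-mer content is to compare k-mer sets with the Jaccard index and keep a sequence only if it is not too similar to any representative already kept. The Python sketch below illustrates that general idea; the greedy strategy, the value of k, the similarity cutoff, the function names and the toy sequences are all assumptions and should not be read as the study's method.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dereplicate(sequences, k=4, min_similarity=0.8):
    """Greedily pick representatives so that no two are >= min_similarity alike.

    sequences: dict mapping a name to a nucleotide string, visited in insertion order.
    Returns the names of the retained representatives.
    """
    representatives = {}
    for name, seq in sequences.items():
        profile = kmers(seq, k)
        if all(jaccard(profile, kept) < min_similarity
               for kept in representatives.values()):
            representatives[name] = profile
    return list(representatives)


# Toy sequences standing in for BGC regions (made up, far shorter than real BGCs).
sequences = {
    "bgc1": "ACGTACGTACGT",
    "bgc2": "ACGTACGTACGT",  # identical to bgc1, so it is treated as a duplicate
    "bgc3": "TTTTGGGGCCCC",
}
distinct = dereplicate(sequences)
print(distinct, len(distinct))
```

Because the procedure is greedy, the result can depend on the order in which sequences are visited; production tools typically use sketching techniques to make such comparisons feasible at the scale of thousands of genomes.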


2021 ◽  
Author(s):  
Abhijeet Singh ◽  
Anna Schnurer

AcetoBase is a public repository and database, published in 2019, for formyltetrahydrofolate synthetase (FTHFS) sequences. It is the first systematic collection of bacterial FTHFS nucleotide and protein sequences from genomes and metagenome-assembled genomes (MAGs), as well as sequences generated by clone library sequencing. In addition, AcetoBase was the first to establish a connection between the FTHFS gene and both the Wood-Ljungdahl pathway and 16S rRNA genes. Since the publication of AcetoBase, significant improvements have been seen in the taxonomy of many bacterial lineages and in the accessibility/availability of public genomics and metagenomics data. Thus, an update to the AcetoBase database with new sequence data and taxonomy has been made, along with improvements in web functionality and user interface. The update in AcetoBase reference database version 2 was furthermore evaluated by reanalysis of publicly accessible FTHFS amplicon sequencing data previously analysed with AcetoBase version 1. The latest database update showed significant improvements in the taxonomic assignments of FTHFS sequences. AcetoBase with its enhancements in functionality and content is publicly accessible at https://acetobase.molbio.slu.se.


2021 ◽  
Vol 5 ◽  
Author(s):  
Ľubomír Rajter ◽  
Micah Dunthorn

Although ciliates are one of the most dominant microbial eukaryotic groups in many environments, there is a lack of updated global ciliate alignments and reference trees that can be used for phylogenetic placement methods to analyze environmental metabarcoding data. Here we fill this gap by providing reference alignments and trees for those ciliate taxa with available SSU-rDNA sequences derived from identified species. Each alignment contains 478 ciliate and six outgroup taxa, and they were made using different masking strategies for alignment positions (unmasked, masked and masked except the hypervariable V4 region). We constrained the monophyly of the major ciliate groups based on the recently updated classification of protists and based on phylogenomic data. Taxa of uncertain phylogenetic position were kept unconstrained, except for Mesodinium species that we constrained to form a clade with the Litostomatea. These ciliate reference alignments and trees can be used to perform taxonomic assignments of metabarcoding data, discover novel ciliate clades, estimate species richness, and overlay measured ecological parameters onto the phylogenetic placements.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11942
Author(s):  
Leslie M. Montes-Carreto ◽  
José Luis Aguirre-Noyola ◽  
Itzel A. Solís-García ◽  
Jorge Ortega ◽  
Esperanza Martinez-Romero ◽  
...  

Background: The volcano rabbit is the smallest lagomorph in Mexico; it is monotypic and endemic to the Trans-Mexican Volcanic Belt. It is classified as endangered by Mexican legislation and as critically endangered by the IUCN, in the Red List. Romerolagus diazi consumes large amounts of grasses, seedlings, shrubs, and trees. Pines and oaks contain tannins that can be toxic to the organisms which consume them. The volcano rabbit microbiota may be rich in bacteria capable of degrading fiber and phenolic compounds. Methods: We obtained the fecal microbiome of three adults and one young rabbit collected in Coajomulco, Morelos, Mexico. Taxonomic assignments and gene annotation revealed the possible roles of different bacteria in the rabbit gut. We searched for sequences encoding tannase enzymes and enzymes associated with digestion of plant fibers such as cellulose and hemicellulose. Results: The most representative phyla within the Bacteria domain were Proteobacteria, Firmicutes and Actinobacteria for the young rabbit sample (S1) and adult rabbit sample (S2), which was the only sample not confirmed by sequencing to correspond to the volcano rabbit. Firmicutes, Actinobacteria and Cyanobacteria were found in adult rabbit samples S3 and S4. The most abundant phylum within the Archaea domain was Euryarchaeota. The most abundant genera of the Bacteria domain were Lachnoclostridium (Firmicutes) and Acinetobacter (Proteobacteria), while Methanosarcina predominated among the Archaea. In addition, the potential functions of metagenomic sequences were identified, which include carbohydrate and amino acid metabolism. We obtained genes encoding enzymes for plant fiber degradation such as endo 1,4 β-xylanases, arabinofuranosidases, endoglucanases and β-glucosidases. We also found 18 bacterial tannase sequences.


2021 ◽  
Author(s):  
Physilia Y.S Chua ◽  
Frederik Leerhoi ◽  
Emilia M.R Langkjaer ◽  
Ashot Margaryan ◽  
Christina L Noer ◽  
...  

Recently, there has been a push towards the extended barcode concept of utilising chloroplast genomes (cpGenome) and nuclear ribosomal DNA (nrDNA) sequences for molecular identification of plants instead of the standard barcode regions. These extended barcodes have a wide range of applications, including biodiversity monitoring and assessment, primer design, and evolutionary studies. However, these extended barcodes are not well represented in global reference databases. To fill this gap, we generated cpGenomes and nrDNA reference data from genome skims of 184 plant species collected in Denmark. We further explored the application of our generated reference data for molecular identifications of plants in an environmental DNA metagenomics study. We assembled partial cpGenomes for 82.1% of sequenced species and full or partial nrDNA sequences for 83.7% of species. We added all assemblies to GenBank, of which chloroplast reference data from 101 species and nuclear reference data from 6 species were not previously represented. On average, we recovered 45 genes per species. The rate of recovery of standard barcodes was higher for nuclear barcodes (>89%) than chloroplast barcodes (<60%). Extracted DNA yield did not affect assembly outcome, whereas high GC content affected it negatively. For the in silico simulation of metagenomic reads, taxonomic assignments using the reference data generated had better species resolution (94.9%) as compared to GenBank (18.1%) without any identification errors. Genome skimming generates reference data of both standard barcodes and other loci, contributing to the global DNA reference database for plants.

