identity threshold
Recently Published Documents


TOTAL DOCUMENTS: 15 (FIVE YEARS: 9)

H-INDEX: 4 (FIVE YEARS: 3)

2021 ◽  
Vol 12 ◽  
Author(s):  
Yang Liu ◽  
Tao Pei ◽  
Shuoxing Yi ◽  
Juan Du ◽  
Xianjiao Zhang ◽  
...  

Rapid and accurate strain identification of the most closely related genera Myxococcus, Corallococcus, and Pyxidicoccus can enhance the efficiency of the mining of novel secondary metabolites through dereplication. However, the commonly used 16S rRNA gene sequencing cannot accurately differentiate members of the three genera above, and the whole-genome sequencing is unable to rapidly and inexpensively provide species assignation toward a large number of isolates. To overcome the limitations, the gyrB gene was investigated as a candidate genetic marker for exploring the phylogenetic relationships of bacteria within the three genera and for developing the gyrB-based typing method. Here, the bacterial phylogeny and species affiliations of the three genera were determined based on the phylogenomic reconstruction and the analysis of digital DNA–DNA hybridization values among 90 genomes, further confirming nine novel taxa and assigning over one-third of genomes to defined species. The phylogenetic relationships of these strains based on the gyrB gene sequences were congruent with those based on their genome sequences, allowing the use of the gyrB gene as a molecular marker. The gyrB gene-specific primers for the PCR-amplification and sequencing of bacteria within the three genera were designed and validated for 31 isolates from our group collection. The gyrB-based taxonomic tool proved to be able to differentiate closely related isolates at the species level. Based on the newly proposed 98.6% identity threshold for the 966-bp gyrB gene and the phylogenetic inference, these isolates were assigned into two known species and eight additional putative new species. In summary, this report demonstrated that the gyrB gene is a powerful phylogenetic marker for taxonomy and phylogeny of bacteria within the closely related genera Myxococcus, Corallococcus, and Pyxidicoccus, particularly in the case of hundreds or thousands of isolates in environmental studies.
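As a rough illustration of what a 98.6% identity threshold over a 966-bp alignment implies, the sketch below computes percent identity between two pre-aligned sequences and the number of mismatches the threshold tolerates. It is not the authors' method: the function names, the handling of gap columns, and the choice to treat the threshold as inclusive (at or above 98.6%) are all assumptions made for this example.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, ignoring columns where both are gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # a shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def same_species(a: str, b: str, threshold: float = 98.6) -> bool:
    """Assumption: the threshold is inclusive (identity >= threshold)."""
    return percent_identity(a, b) >= threshold


def max_mismatches(length: int, threshold: float) -> int:
    """Largest mismatch count that still keeps identity at or above the threshold."""
    return math.floor(length * (100.0 - threshold) / 100.0 + 1e-9)


if __name__ == "__main__":
    print(max_mismatches(966, 98.6))
```

Under these assumptions, two 966-bp gyrB fragments could differ at up to 13 positions (953/966, about 98.65%) and still fall within the same species; a 14th difference drops them to about 98.55%.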


Author(s):  
Seung-Hyeon Choi ◽  
Jam-Eon Park ◽  
Ji Young Choi ◽  
Ji-Sun Kim ◽  
Se Won Kang ◽  
...  

A novel bacterial isolate designated as strain AGMB01083T was isolated from Korean cow faeces deposited in the National Institute of Animal Science (Wanju, Republic of Korea). The bacterium is obligately anaerobic, Gram-stain-positive, and motile. Cells are straight or curved rods; flagella and spores are observed. Growth occurs at 20–40 °C (optimum, 35 °C), at pH 7–9 (optimum, pH 7), and in the presence of 0.5–1.0 % (w/v) NaCl. Based on the 16S rRNA gene sequence analysis, the strain belongs to the genus Anaerosporobacter and is most closely related to A. mobilis HY-37-4T (=KCTC5027T, similarity, 95.7 %). The DNA G+C content is 36.2 mol%, determined from the whole-genome sequence. The average nucleotide identity value between strain AGMB01083T and A. mobilis HY-37-4T is 75.5 %, below the interspecies identity threshold value. The major cellular fatty acids (>10 %) of strain AGMB01083T are C16 : 0, C16 : 0 dimethyl acetal (DMA), and C16 : 0 3-OH. Based on the phylogenetic, phenotypic, biochemical, chemotaxonomic, and genomic characterization, strain AGMB01083T is proposed to be a novel species, named Anaerosporobacter faecicola, in the genus Anaerosporobacter. The type strain is AGMB01083T (=KCTC 15857T=NBRC 114517T).


2021 ◽  
Author(s):  
David Peris ◽  
Dabao Sun Lu ◽  
Vilde Bruhn Kinneberg ◽  
Ine-Susanne Hopland Methlie ◽  
Malin Stapnes Dahl ◽  
...  

Balancing selection, an evolutionary force that retains genetic diversity, has been detected in multiple genes and organisms, such as the sexual mating loci in fungi. In tetrapolar basidiomycete fungi, sexual type is determined by two unlinked loci, MATA and MATB. These loci are usually highly diverse, but with conserved domains. Previous studies have revealed that species of the genus Trichaptum (Hymenochaetales, Basidiomycota) possess a tetrapolar mating system, with multiple inferred alleles for MATA and MATB. Here, we sequenced a total of 180 specimens of three Trichaptum species. We characterized the chromosomal location of MATA (chromosome 2) and MATB (chromosome 9), the molecular structure of the MAT regions, and their allelic richness. We found multiple MAT alleles segregating both in the Trichaptum specimens and in the non-Trichaptum species included for comparison. Phylogenetic analyses and various nucleotide statistics suggested that long-term balancing selection has generated trans-species polymorphisms. Mating sequences were classified into different allelic classes based on an identity threshold of higher than 86%. The observed allelic classes could potentially generate 14,560 different mating types. The inferred allelic information mirrored the outcome of in vitro crosses, thus allowing us to support the degree of allelic divergence needed for successful mating. Even with the high amount of divergence, key amino acids in functional domains are conserved. We conclude that the genetic diversity of mating loci in Trichaptum is due to long-term balancing selection that likely promotes sexual outcrossing, with limited recombination and duplication activity. Our large number of sequenced specimens highlighted the importance of sequencing multiple individuals from different species to detect the mating-related genes, the mechanisms generating diversity, and the evolutionary forces maintaining them.


2021 ◽  
Vol 22 (5) ◽  
pp. 2244
Author(s):  
Anton E. Shikov ◽  
Yury V. Malovichko ◽  
Arseniy A. Lobov ◽  
Maria E. Belousova ◽  
Anton A. Nizhnikov ◽  
...  

Bacillus thuringiensis, commonly referred to as Bt, is an object of lasting interest to microbiologists due to its highly effective insecticidal properties, which make Bt a prominent source of biologicals. To categorize the multitude of Bt strains discovered, serotyping assays are utilized in which flagellin serves as a primary seroreactive molecule. Despite its convenience, this approach is not indicative of Bt strains’ phenotypes, nor does it reflect actual phylogenetic relationships within the species. In this respect, comparative genomic and proteomic techniques appear more informative, but their use in Bt strain classification remains limited. In the present work, we used a bottom-up proteomic approach based on fluorescent two-dimensional difference gel electrophoresis (2D-DIGE) coupled with liquid chromatography/tandem mass spectrometry (LC-MS/MS) protein identification to assess which stage of Bt culture, vegetative or spore, would be more informative for strain characterization. To this end, the proteomic differences for the israelensis-attributed strains were assessed to compare sporulating cultures of the virulent derivative to the avirulent one as well as to the vegetative stage virulent bacteria. Using the same approach, virulent spores of the israelensis strain were also compared to the spores of strains belonging to two other major Bt serovars, namely darmstadiensis and thuringiensis. The identified proteins were analyzed regarding the presence of the respective genes in the 104 Bt genome assemblies available at open access with serovar attributions specified. Of 21 proteins identified, 15 were found to be encoded in all the present assemblies at a 67% identity threshold, including several virulence factors. Notably, individual phylogenies of these core genes agreed with neither the serotyping nor the flagellin-based phylogeny but corroborated the reconstruction based on phylogenomics approaches in terms of tree topology similarity. In turn, the distribution of accessory protein genes was not confined to the existing serovars. The obtained results indicate that neither gene presence nor the core gene sequence may serve as a distinctive basis for serovar attribution, undermining the notion that the serotyping system reflects strains’ phenotypic or genetic similarity. We also provide a set of loci that fit the phylogenomics data plausibly and thus may serve for draft phylogeny estimation of novel strains.


2020 ◽  
Author(s):  
João Pedro Saraiva ◽  
Marta Gomes ◽  
René Kallies ◽  
Carsten Vogt ◽  
Antonis Chatzinotas ◽  
...  

Abstract Background: The exponential increase in high-throughput sequencing data and the development of computational sciences and bioinformatics pipelines have advanced our understanding of microbial community composition and distribution in complex ecosystems. Despite these advances, the identification of microbial interactions from genomic data remains a major bottleneck. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content. Results: OrtSuite combines ortholog clustering strategies with genome annotation based on a user-defined set of functions, allowing for hypothesis-driven data analysis. OrtSuite allows users to install and run all workflow components and analyze the generated outputs using a simple pipeline consisting of 23 bash commands and one R command. Annotation is based on a two-stage process. First, only a subset of sequences from each ortholog cluster is aligned to all sequences in the Ortholog-Reaction Association database (ORAdb). Next, all sequences from clusters that meet a user-defined identity threshold are aligned to all sequence sets in ORAdb to which they had a hit. This approach decreases the time needed for functional annotation. Further, OrtSuite identifies putative interspecies interactions from the genomic content of each species, subject to constraints given by the user. Additional control is afforded to the user at several stages of the workflow: 1) The construction of ORAdb only needs to be performed once for each specific process, also allowing manual curation; 2) The identity and sequence similarity thresholds used during the annotation stage can be adjusted; and 3) Constraints related to pathway reaction composition and known species contributions to ecosystem processes can be defined. Conclusions: OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database. Further, this novel workflow allows the identification of interspecies interactions through user-defined constraints. Due to its low computational demands, for small datasets (e.g. maximum 100 genomes) OrtSuite can run on a personal computer. For larger datasets (> 100 genomes), we suggest the use of computer clusters. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuit .
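The two-stage annotation described in this abstract can be sketched in a few lines. This is only a toy model under stated assumptions, not OrtSuite's code: `toy_identity`, `two_stage_annotate`, the random sampling of cluster members, and the in-memory dictionaries standing in for ortholog clusters and ORAdb sequence sets are all hypothetical, and a real workflow would call an aligner rather than compare equal-length strings.

```python
import random
from typing import Callable, Dict, List, Set


def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for an aligner: fraction of matching positions."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def two_stage_annotate(
    clusters: Dict[str, List[str]],
    reference_sets: Dict[str, List[str]],
    threshold: float,
    identity_fn: Callable[[str, str], float] = toy_identity,
    sample_size: int = 2,
    seed: int = 0,
) -> Dict[str, Set[str]]:
    """Map each query sequence to the reference sets it matches."""
    rng = random.Random(seed)
    annotations: Dict[str, Set[str]] = {}
    for members in clusters.values():
        # Stage 1: align only a sample of the cluster against every reference set.
        sample = rng.sample(members, min(sample_size, len(members)))
        hit_sets = [
            name
            for name, refs in reference_sets.items()
            if any(identity_fn(q, r) >= threshold for q in sample for r in refs)
        ]
        # Stage 2: align every member, but only against the sets that were hit.
        for name in hit_sets:
            for q in members:
                if any(identity_fn(q, r) >= threshold for r in reference_sets[name]):
                    annotations.setdefault(q, set()).add(name)
    return annotations


if __name__ == "__main__":
    clusters = {"c1": ["AAAA", "AAAT"], "c2": ["GGGG"]}
    refs = {"R1": ["AAAA"], "R2": ["CCCC"]}
    print(two_stage_annotate(clusters, refs, threshold=0.75))
```

The saving comes from stage 1: clusters whose sampled members hit nothing are never aligned in full, and clusters that do hit are aligned only against the matching reference sets.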


2020 ◽  
Vol 21 (6) ◽  
pp. 2243
Author(s):  
Nicolas K. Shinada ◽  
Peter Schmidtke ◽  
Alexandre G. de Brevern

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth in structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein and protein–DNA interactions, or drug discovery. While protein redundancy has typically been managed using a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known, but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid the bias found in different studies. The methodology is based on binding site superposition and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analysis or virtual screening studies.


2019 ◽  
Author(s):  
Miriam I. Brandt ◽  
Blandine Trouche ◽  
Laure Quintric ◽  
Patrick Wincker ◽  
Julie Poulain ◽  
...  

Abstract: Environmental metabarcoding is an increasingly popular tool for studying biodiversity in marine and terrestrial biomes. With sequencing costs decreasing, multiple-marker metabarcoding, spanning several branches of the tree of life, is becoming more accessible. However, bioinformatic approaches need to adjust to the diversity of taxonomic compartments targeted as well as to the specificities of each barcode gene. We built and tested a pipeline based on Illumina read correction with DADA2 that allows the analysis of metabarcoding data from prokaryotic (16S) and eukaryotic (18S, COI) life compartments. We implemented the option to cluster Amplicon Sequence Variants (ASVs) into Operational Taxonomic Units (OTUs) with swarm v2, a network-based clustering algorithm, and to further curate the ASVs/OTUs based on sequence similarity and co-occurrence rates using a recently developed algorithm, LULU. Finally, flexible taxonomic assignment was implemented via the Ribosomal Database Project (RDP) Bayesian classifier and BLAST. We validate this pipeline with ribosomal and mitochondrial markers using eukaryotic mock communities and 42 deep-sea sediment samples. The results show that ASVs, reflecting genetic diversity, may not be appropriate for alpha diversity estimation of organisms fitting the biological species concept. The results underline the advantages of clustering and LULU-curation for producing more reliable metazoan biodiversity inventories, and show that LULU is an effective tool for filtering metazoan molecular clusters, although the minimum identity threshold applied to co-occurring OTUs has to be increased for 18S. The comparison of BLAST and the RDP Classifier underlined the potential of the latter to deliver very good assignments, but highlighted the need for a concerted effort to build comprehensive, ecosystem-specific databases adapted to the studied communities.


2019 ◽  
Author(s):  
Stijn Wittouck ◽  
Sander Wuyts ◽  
Conor J Meehan ◽  
Vera van Noort ◽  
Sarah Lebeer

Abstract Background: There are over 200 published species within the Lactobacillus Genus Complex (LGC), the majority of which have sequenced type strain genomes available. Although gold standard, genome-based species delimitation cutoffs are accepted by the community, they are seldom checked against currently available genome data. In addition, there are many species-level misclassification issues within the LGC. We constructed a de novo species taxonomy for the LGC based on 2,459 publicly available, decent-quality genomes and using a 94% core nucleotide identity threshold. We reconciled these de novo species with published species and subspecies names by (i) identifying genomes of type strains in our dataset and (ii) performing comparisons based on 16S rRNA sequence identity against type strains. Results: We found that genomes within the LGC could be divided into 239 clusters (de novo species) that were discontinuous and exclusive. Comparison of these de novo species to published species led to the identification of ten sets of published species that can be merged and one species that can be split. Further, we found at least eight genome clusters that constitute new species. Finally, we were able to accurately classify 98 unclassified genomes and reclassify 74 wrongly classified genomes. Conclusions: The current state of LGC species taxonomy is largely consistent with genome data, but there are some inconsistencies as well as genome misclassifications. These inconsistencies should be resolved to evolve towards a meaningful taxonomy where species have a consistent size in terms of sequence divergence.


2019 ◽  
Vol 3 (4) ◽  
pp. 256-259 ◽  
Author(s):  
Marie Lefebvre ◽  
Sébastien Theil ◽  
Yuxin Ma ◽  
Thierry Candresse

Viral metagenomics relies on high-throughput sequencing and on bioinformatic analyses to access the genetic content and diversity of entire viral communities. No universally accepted strategy or tool currently exists to define operational taxonomic units (OTUs) and evaluate viral alpha or beta diversity from virome data. Here we present a new bioinformatic resource, the VirAnnot (automated viral diversity estimation) pipeline, which performs the automated identification of OTUs. Reverse-position-specific BLAST (RPS-Blastn) is used to detect conserved viral protein motifs. The corresponding contigs are then aligned, and a clustering approach is used to group into the same OTU those contigs sharing more than a set identity threshold. A 10% threshold has been validated as producing OTUs that reasonably approach, in many families, the International Committee on Taxonomy of Viruses taxonomy and can therefore be used as a proxy for viral species.
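Grouping aligned sequences into OTUs at an identity threshold can be illustrated with a greedy, representative-based clustering. This is a generic sketch rather than VirAnnot's actual algorithm: the names `pairwise_identity` and `greedy_otus` are invented for the example, the sequences are assumed to be pre-aligned to equal length, and the 0.85 threshold in the usage lines is arbitrary. Membership uses a strict "more than" comparison, following the wording of the abstract.

```python
from typing import Callable, List, Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(
    seqs: Sequence[str],
    threshold: float,
    identity_fn: Callable[[str, str], float] = pairwise_identity,
) -> List[List[int]]:
    """Assign each sequence to the first OTU whose representative it matches.

    Each OTU is a list of indices into `seqs`; its first index is the
    representative. A sequence that matches no representative founds a new OTU.
    """
    otus: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for otu in otus:
            if identity_fn(seqs[otu[0]], seq) > threshold:
                otu.append(i)
                break
        else:
            otus.append([i])
    return otus


if __name__ == "__main__":
    contigs = ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTTTTTT", "TTTTTTTTTA"]
    print(greedy_otus(contigs, threshold=0.85))
```

Greedy clustering depends on input order, since the first sequence seen becomes the representative; production tools usually sort by length or abundance first, or use hierarchical clustering instead.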


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5364 ◽  
Author(s):  
Jacob T. Nearing ◽  
Gavin M. Douglas ◽  
André M. Comeau ◽  
Morgan G.I. Langille

High-depth sequencing of universal marker genes such as the 16S rRNA gene is a common strategy to profile microbial communities. Traditionally, sequence reads are clustered into operational taxonomic units (OTUs) at a defined identity threshold to avoid sequencing errors generating spurious taxonomic units. However, there have been numerous bioinformatic packages recently released that attempt to correct sequencing errors to determine real biological sequences at single nucleotide resolution by generating amplicon sequence variants (ASVs). As more researchers begin to use high resolution ASVs, there is a need for an in-depth and unbiased comparison of these novel “denoising” pipelines. In this study, we conduct a thorough comparison of three of the most widely-used denoising packages (DADA2, UNOISE3, and Deblur) as well as an open-reference 97% OTU clustering pipeline on mock, soil, and host-associated communities. We found from the mock community analyses that although they produced similar microbial compositions based on relative abundance, the approaches identified vastly different numbers of ASVs that significantly impact alpha diversity metrics. Our analysis on real datasets using recommended settings for each denoising pipeline also showed that the three packages were consistent in their per-sample compositions, resulting in only minor differences based on weighted UniFrac and Bray–Curtis dissimilarity. DADA2 tended to find more ASVs than the other two denoising pipelines when analyzing both the real soil data and two other host-associated datasets, suggesting that it could be better at finding rare organisms, but at the expense of possible false positives. The open-reference OTU clustering approach identified considerably more OTUs in comparison to the number of ASVs from the denoising pipelines in all datasets tested. The three denoising approaches were significantly different in their run times, with UNOISE3 running more than 1,200 and 15 times faster than DADA2 and Deblur, respectively. Our findings indicate that, although all pipelines result in similar general community structure, the number of ASVs/OTUs and the resulting alpha-diversity metrics vary considerably and should be considered when attempting to distinguish rare organisms from possible background noise.

