Taxonomy annotation errors in 16S rRNA and fungal ITS sequence databases

2018 ◽  
Author(s):  
Robert C. Edgar

Abstract Sequencing of the 16S ribosomal RNA (rRNA) gene and the fungal Internal Transcribed Spacer (ITS) region is widely used to survey microbial communities. Specialized ribosomal sequence databases have been developed to support this approach, including Greengenes, SILVA and RDP. Most taxonomy annotations in these databases are predictions from sequence rather than authoritative assignments based on studies of type strains or isolates. Here, I investigate the error rates of taxonomy annotations in these databases. I found 253,485 sequences with conflicting annotations in SILVA v128 and Greengenes v13.5 at ranks up to phylum (9,644 conflicts), indicating that the annotation error rate in these databases is ~15%. I found that 34% of non-singleton genera have overlapping subtrees in the Greengenes tree from 2001 according to the RDP taxonomy, most of which are probably due to branching order errors in the Greengenes tree, which is therefore an unreliable guide to phylogeny. Using a blinded test, I estimated that the annotation error rate of the RDP database is ~10%.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5030 ◽  
Author(s):  
Robert Edgar

Sequencing of the 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Specialized 16S rRNA databases have been developed to support this approach including Greengenes, RDP and SILVA. Most taxonomy annotations in these databases are predictions from sequence rather than authoritative assignments based on studies of type strains or isolates. In this work, I investigated the taxonomy annotations and guide trees provided by these databases. Using a blinded test, I estimated that the annotation error rate of the RDP database is ∼10%. The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set (authoritative reference) provided by RDP, indicating that the trees have comparable quality. Pervasive conflicts between tree branching order and type strain taxonomies strongly suggest that the guide trees are unreliable guides to phylogeny. I found 249,490 identical sequences with conflicting annotations in SILVA v128 and Greengenes v13.5 at ranks up to phylum (7,804 conflicts), indicating that the annotation error rate in these databases is ∼17%.
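The comparison of identical sequences across two databases described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the function name `count_conflicts`, the rank list, the dictionary layout and the toy sequences are all assumptions made for the example, and real SILVA or Greengenes files would need their own parsers.

```python
from collections import Counter

# Ranks compared in this sketch (an assumed subset, ordered from phylum down).
RANKS = ["phylum", "class", "order", "family", "genus"]

def count_conflicts(db_a, db_b):
    """Count, per rank, identical sequences whose annotations disagree.

    db_a and db_b map a sequence string to a dict of rank -> taxon name.
    A rank is compared only when both databases name a taxon at that rank.
    """
    conflicts = Counter()
    shared = db_a.keys() & db_b.keys()  # sequences present in both databases
    for seq in shared:
        for rank in RANKS:
            a = db_a[seq].get(rank)
            b = db_b[seq].get(rank)
            if a and b and a != b:
                conflicts[rank] += 1
    return len(shared), conflicts

# Toy data, invented for illustration only.
silva_like = {
    "ACGTACGT": {"phylum": "Firmicutes", "genus": "Bacillus"},
    "TTGACCAA": {"phylum": "Proteobacteria", "genus": "Escherichia"},
    "GGGCCCAA": {"phylum": "Actinobacteria"},
}
greengenes_like = {
    "ACGTACGT": {"phylum": "Firmicutes", "genus": "Paenibacillus"},
    "TTGACCAA": {"phylum": "Bacteroidetes", "genus": "Escherichia"},
    "CCCCAAAA": {"phylum": "Firmicutes"},
}

n_shared, conflicts = count_conflicts(silva_like, greengenes_like)
print(n_shared, dict(conflicts))
```

Skipping ranks that either database leaves blank is a design choice of this sketch; it keeps missing annotations from being counted as disagreements.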



2013 ◽  
Vol 280 (1771) ◽  
pp. 20131177 ◽  
Author(s):  
Ping Sun ◽  
John C. Clamp ◽  
Dapeng Xu ◽  
Bangqin Huang ◽  
Mann Kyoon Shin ◽  
...  

Vorticella includes more than 100 currently recognized species and represents one of the most taxonomically challenging genera of ciliates. Molecular phylogenetic analysis of Vorticella has been performed so far with only sequences coding for small subunit ribosomal RNA (SSU rRNA); only a few of its species have been investigated using other genetic markers owing to a lack of similar sequences for comparison. Consequently, phylogenetic relationships within the genus remain unclear, and molecular discrimination between morphospecies is often difficult because most regions of the SSU rRNA gene are too highly conserved to be helpful. In this paper, we move molecular systematics for this group of ciliates to the infrageneric level by sequencing additional molecular markers—fast-evolving internal transcribed spacer (ITS) regions—in a broad sample of 66 individual samples of 28 morphospecies of Vorticella collected from Asia, North America and Europe. Our phylogenies all featured two strongly supported, highly divergent, paraphyletic clades (I, II) comprising the morphologically defined genus Vorticella . Three major lineages made up clade I, with a relatively well-resolved branching order in each one. The marked divergence of clade II from clade I confirms that the former should be recognized as a separate taxonomic unit as indicated by SSU rRNA phylogenies. We made the first attempt to elucidate relationships between species in clade II using both morphological and multi-gene approaches, and our data supported a close relationship between some morphospecies of Vorticella and Opisthonecta , indicating that relationships between species in the clade are far more complex than would be expected from their morphology. Different patterns of helix III of ITS2 secondary structure were clearly specific to clades and subclades of Vorticella and, therefore, may prove useful for resolving phylogenetic relationships in other groups of ciliates.



2020 ◽  
Author(s):  
Fuchang Yu ◽  
Yangwenna Cao ◽  
Haiyan Wang ◽  
Qiang Liu ◽  
Aiyun Zhao ◽  
...  

Abstract Background: Enterocytozoon bieneusi is a zoonotic gastrointestinal pathogen and can infect both humans and animals. The coypu (Myocastor coypus) is a semi-aquatic rodent, in which few E. bieneusi infections have been reported and the distribution of genotypes and zoonotic potential remains unknown. Methods: A total of 308 fresh fecal samples were collected from seven coypu farms in China to determine the infection rate and the distribution of genotypes of E. bieneusi from coypus using nested-PCR amplification of the internal transcribed spacer (ITS) region of the ribosomal RNA (rRNA) gene. Results: Enterocytozoon bieneusi was detected with an infection rate of 41.2% (n = 127). Four genotypes were identified, including three known genotypes (CHN4 (n = 111), EbpC (n = 8) and EbpA (n = 7)) and a novel genotype named CNCP1 (n = 1). Conclusions: The rare genotype CHN4 was the most common genotype in the present study, and the transmission dynamics of E. bieneusi in coypus were different from other rodents. To the best of our knowledge, this is the first report of E. bieneusi infections in coypus in China. Our study reveals that E. bieneusi in coypus may be a potential infection source to humans.



2021 ◽  
Author(s):  
Jaya Srivastava ◽  
Ritu Hembrom ◽  
Ankita Kumawat ◽  
Petety V. Balaji

The UniProt and BFD databases together hold 2.5 billion protein sequences. A large majority of these proteins have been electronically annotated. Automated annotation pipelines, vis-à-vis manual curation, have the advantage of scale and speed but are fraught with relatively higher error rates. This is because sequence homology does not necessarily translate to functional homology, molecular function specification is hierarchic, and not all functional families have the same amount of experimental data that one can exploit for annotation. Consequently, customization of the annotation workflow is inevitable to minimize annotation errors. In this study, we illustrate possible ways of customizing the search of sequence databases for functional homologs using profile HMMs. Choosing an optimal bit score threshold is a critical step in the application of HMMs. We illustrate ways in which an optimal bit score can be arrived at using four Case Studies. These are the single domain nucleotide sugar 6-dehydrogenase and lysozyme-C families, and SH3 and GT-A domains which are typically found as a part of multi-domain proteins. We also discuss the limitations of using profile HMMs for functional annotation and suggest some possible ways to partially overcome such limitations.
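To make the idea of choosing a bit score threshold concrete, here is one possible sketch. It assumes a set of hits whose family membership is already known and picks the threshold that maximizes the F1 score; that criterion, the function name `best_threshold` and the toy scores are assumptions for illustration and are not taken from the study, which describes its own case-specific procedures.

```python
def best_threshold(scored_hits):
    """Return (threshold, f1) maximizing F1 over candidate thresholds.

    scored_hits: list of (bit_score, is_true_member) pairs.
    A hit is accepted when bit_score >= threshold.
    """
    total_true = sum(1 for _, is_true in scored_hits if is_true)
    best = (None, -1.0)
    # Every observed score is a candidate threshold, tried in ascending order.
    for threshold in sorted({score for score, _ in scored_hits}):
        accepted = [is_true for score, is_true in scored_hits if score >= threshold]
        tp = sum(accepted)         # true members accepted
        fp = len(accepted) - tp    # non-members accepted
        fn = total_true - tp       # true members rejected
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:           # strict '>' keeps the lowest threshold on ties
            best = (threshold, f1)
    return best

# Invented scores: (bit score, known to belong to the functional family?)
hits = [(310.5, True), (295.0, True), (120.2, True),
        (98.7, False), (60.1, False), (45.0, True)]

threshold, f1 = best_threshold(hits)
print(threshold, round(f1, 3))
```

With these toy values the selected threshold excludes both non-members at the cost of one low-scoring true member, which is the kind of trade-off a threshold choice has to weigh.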



2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Fuchang Yu ◽  
Yangwenna Cao ◽  
Haiyan Wang ◽  
Qiang Liu ◽  
Aiyun Zhao ◽  
...  

Abstract Background Enterocytozoon bieneusi is a zoonotic gastrointestinal pathogen and can infect both humans and animals. The coypu (Myocastor coypus) is a semi-aquatic rodent, in which few E. bieneusi infections have been reported and the distribution of genotypes and zoonotic potential remains unknown. Methods A total of 308 fresh fecal samples were collected from seven coypu farms in China to determine the infection rate and the distribution of genotypes of E. bieneusi from coypus using nested-PCR amplification of the internal transcribed spacer (ITS) region of the ribosomal RNA (rRNA) gene. Results Enterocytozoon bieneusi was detected with an infection rate of 41.2% (n = 127). Four genotypes were identified, including three known genotypes (CHN4 (n = 111), EbpC (n = 8) and EbpA (n = 7)) and a novel genotype named CNCP1 (n = 1). Conclusions The rare genotype CHN4 was the most common genotype in the present study, and the transmission dynamics of E. bieneusi in coypus were different from other rodents. To the best of our knowledge, this is the first report of E. bieneusi infections in coypus in China. Our study reveals that E. bieneusi in coypus may be a potential infection source to humans.



2021 ◽  
Vol 26 (5) ◽  
pp. 3008-3013
Author(s):  
DİLEK TEKDAL ◽  
İLKNUR AKÇA ◽  
ASLI KÜÇÜKRECEP ◽  
SELİM ÇETİNER ◽  
...  

The common bean is a valuable food source in the human diet. Leklek is a local variety of common bean (Phaseolus sp.) widely grown in Mersin's Gülnar district, but little is known about this variety. In the present study, bacterial species from root nodules of this common bean variety were identified by PCR amplification and sequencing of the 16S ribosomal RNA (rRNA) gene and the 16S-23S rRNA Internal Transcribed Spacer (ITS) region. The partial 16S rRNA gene and 16S-23S rRNA ITS region sequences were submitted to the NCBI database (accession numbers MT967369 and MT968518, respectively). Amplified sequences were used to construct a phylogenetic tree. Phylogenetic analysis based on the identified sequences showed that the isolate belonged to the genus Microbacterium and was closely related to Microbacterium paraoxydans. The findings presented here provide a clue for understanding this bacterium's role in nodule formation in Phaseolus sp. (variety Leklek).



2014 ◽  
Vol 64 (Pt_8) ◽  
pp. 2566-2572 ◽  
Author(s):  
Atsushi Yamazaki ◽  
Hiroko Kawasaki

We isolated two strains of a novel Lipomyces species from soil collected in Chichibu forest, Saitama prefecture, Japan. Based on their morphological and biochemical characteristics, along with multilocus sequence typing using the D1/D2 domain of the large-subunit (LSU) rRNA gene, the internal transcribed spacer (ITS) region and the translation elongation factor 1 alpha gene (EF-1α), the two strains were shown to represent a novel species of the genus Lipomyces, described as Lipomyces chichibuensis sp. nov. (type strain CB08-2T = NBRC 109582T = CBS 12929T; Mycobank no. MB808164). In addition, we reidentified the type strains of Lipomyces kononenkoae and Lipomyces spencermartinsiae maintained in culture collections based on phenotypic characters and/or DNA–DNA hybridization to ensure correct future identification of species of the genus Lipomyces. The correct type strains of L. kononenkoae and L. spencermartinsiae are NBRC 107661T ( = CBS 2514T) and NBRC 10376T ( = CBS 5608T), respectively.



2020 ◽  
Author(s):  
Fuchang Yu ◽  
Yangwenna Cao ◽  
Haiyan Wang ◽  
Qiang Liu ◽  
Aiyun Zhao ◽  
...  

Abstract Background: Enterocytozoon bieneusi is a zoonotic gastrointestinal pathogen and can infect both humans and animals. Coypus (Myocastor coypus) are semi-aquatic rodents, in which few E. bieneusi infections have been reported and the distribution of genotypes and zoonotic potential remains unknown. Methods: A total of 308 fresh fecal samples were collected from seven coypu farms in China to determine the infection rate and the distribution of genotypes of E. bieneusi from coypus using nested-PCR amplification of the internal transcribed spacer (ITS) region of the ribosomal RNA (rRNA) gene. Results: E. bieneusi was detected with an infection rate of 41.2% (n = 127). Four genotypes were identified, including three known genotypes, CHN4 (n = 111), EbpC (n = 8) and EbpA (n = 7), and a novel genotype named CNCP1 (n = 1). Conclusions: The rare genotype CHN4 was the most common one in the present study, and the transmission dynamics of E. bieneusi in coypus were different from other rodents. This is the first report of E. bieneusi infections in coypus in China. Our study reveals that E. bieneusi in coypus may be a potential infection source to humans.



2019 ◽  
Vol 28 (4) ◽  
pp. 1411-1431 ◽  
Author(s):  
Lauren Bislick ◽  
William D. Hula

Purpose This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material https://doi.org/10.23641/asha.9701690



2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the way SAS represents numeric values resulted in wrong categorization because of a numeric representation error in differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.
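The kind of representation error described in this corrigendum can be reproduced in any language that uses binary floating point. The sketch below uses Python rather than SAS, and the helper `exceeds_cutoff`, the operands and the rounding precision are illustrative assumptions, not values from the original simulation.

```python
def exceeds_cutoff(a, b, cutoff, ndigits=None):
    """Return True when the difference a - b is strictly above the cutoff.

    If ndigits is given, both the difference and the cutoff are rounded
    first, so tiny binary representation errors cannot flip the comparison.
    """
    diff = a - b
    if ndigits is not None:
        diff = round(diff, ndigits)
        cutoff = round(cutoff, ndigits)
    return diff > cutoff

# Mathematically 0.9 - 0.6 equals 0.3, so neither call should report "above".
naive = exceeds_cutoff(0.9, 0.6, 0.3)                # True: misclassified
rounded = exceeds_cutoff(0.9, 0.6, 0.3, ndigits=10)  # False: as intended
print(0.9 - 0.6, naive, rounded)
```

Rounding both sides of the comparison to a fixed number of decimals is one common remedy; comparing within a small tolerance would be an alternative.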


