Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes

2022
Vol 12
Author(s):
Aaron J. Robinson
Hajnalka E. Daligault
Julia M. Kelliher
Erick S. LeBrun
Patrick S. G. Chain

Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted ‘raw’ shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.
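The read-screening step suggested at the end of this abstract could be sketched roughly as follows. This is a toy illustration only, not the authors' method: the function names (`kmers`, `bacterial_fraction`, `filter_reads`), the k-mer length, and the 0.5 cut-off are all assumptions introduced here, and the sketch ignores reverse-complement matches. Because plastid and bacterial rRNAs are highly similar, a real screen would need thresholds tuned so that genuine plastid reads are not discarded.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bacterial_fraction(read, bacterial_index, k):
    """Fraction of a read's distinct k-mers that also occur in the bacterial index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k: nothing to compare
    return len(read_kmers & bacterial_index) / len(read_kmers)


def filter_reads(reads, bacterial_refs, k=8, max_fraction=0.5):
    """Keep reads whose bacterial k-mer fraction is at most max_fraction.

    k and max_fraction are hypothetical defaults chosen for illustration.
    """
    index = set()
    for ref in bacterial_refs:
        index |= kmers(ref, k)
    return [r for r in reads if bacterial_fraction(r, index, k) <= max_fraction]


if __name__ == "__main__":
    bacterial_ref = "ACGTACGTTAGCCGATAGCTAGGCTA"   # made-up reference sequence
    reads = [bacterial_ref[2:20], "TTTTTTTTTTGGGGGGGGGG"]
    # The first read is a substring of the reference and is dropped.
    print(filter_reads(reads, [bacterial_ref]))
```

The surviving reads would then be passed to the plastome assembler; exact k-mer matching is used here only because it keeps the sketch dependency-free.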

2021
Author(s):
Wenjun Fan
Eetu Eklund
Rachel M Sherman
Hester Liu
Stephanie Pitts
...

Polymorphism drives survival under stress and provides adaptability. Genetic polymorphism of ribosomal RNA (rRNA) genes derives from internal repeat variation of this multicopy gene, and from interindividual variation. A considerable amount of rRNA sequence heterogeneity has been proposed but has been challenging to estimate given the scarcity of accurate reference sequences. We identified four rDNA copies on chromosome 21 (GRCh38) with 99% similarity to the recently introduced reference sequence KY962518.1. Pairwise alignment of the rRNA coding sequences of these copies showed differences in sequence and length. We customized a GATK bioinformatics pipeline using the four rDNA loci, spanning a total of 145 kb, for variant calling. We employed whole genome sequencing (WGS) data from the 1000 Genomes Project phase 3 and analyzed variants in 2,504 individuals from 26 populations. Using the pipeline, we identified a total of 3,790 variant positions. The variants were positioned non-randomly on the rRNA gene. Invariant regions included the promoter, early 5' ETS, 5.8S, ITS1 and certain regions of the 28S rRNA, and large areas of the intragenic spacer. The 18S rRNA coding region had very few variants, while a total of 470 variant positions were observed on the 28S rRNA. The majority of the 28S rRNA variants were located on the highly flexible human-expanded rRNA helical folds ES7L and ES27L, suggesting that these represent positions of diversity and are potentially under continuous evolution. These findings provide a genetic view of rRNA heterogeneity and raise the need to functionally assess how the 28S rRNA variants affect ribosome functions.


2021
Vol 95
Author(s):
B. Neov
G.P. Vasileva
G. Radoslavov
P. Hristov
D.T.J. Littlewood
...

Abstract The aim of the study is to test a hypothesis for the phylogenetic relationships among mammalian hymenolepidid tapeworms, based on partial (D1–D3) nuclear 28S ribosomal RNA (rRNA) genes, by estimating new molecular phylogenies for the group based on partial mitochondrial cytochrome c oxidase I (COI) and nuclear 18S rRNA genes, as well as a combined analysis using all three genes. New sequences of COI and 18S rRNA genes were obtained for Coronacanthus integrus, C. magnihamatus, C. omissus, C. vassilevi, Ditestolepis diaphana, Lineolepis scutigera, Spasskylepis ovaluteri, Staphylocystis tiara, S. furcata, S. uncinata, Vaucherilepis trichophorus and Neoskrjabinolepis sp. The phylogenetic analyses confirmed the major clades identified by Haukisalmi et al. (Zoologica Scripta 39: 631–641, 2010): Ditestolepis clade, Hymenolepis clade, Rodentolepis clade and Arostrilepis clade. While the Ditestolepis clade is associated with soricids, the structure of the other three clades suggests multiple evolutionary events of host switching between shrews and rodents. Two of the present analyses (18S rRNA and COI genes) show that the basal relationships of the four mammalian clades are branching at the same polytomy with several hymenolepidids from birds (both terrestrial and aquatic). This may indicate a rapid radiation of the group, with multiple events of colonizations of mammalian hosts by avian parasites.


2021
Vol 12 (1)
Author(s):
Caitlin M. Singleton
Francesca Petriglieri
Jannie M. Kristensen
Rasmus H. Kirkegaard
Thomas Y. Michaelsen
...

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.


Genes
2021
Vol 12 (8)
pp. 1185
Author(s):
Wenqian Wang
Huan Zhang
Jérôme Constant
Charles R. Bartlett
Daozheng Qin

The complete mitogenomes of nine fulgorid species were sequenced and annotated to explore their mitogenome diversity and the phylogenetics of Fulgoridae. All species are from China and belong to five genera: Dichoptera Spinola, 1839 (Dichoptera sp.); Neoalcathous Wang and Huang, 1989 (Neoalcathous huangshanana Wang and Huang, 1989); Limois Stål, 1863 (Limois sp.); Penthicodes Blanchard, 1840 (Penthicodes atomaria (Weber, 1801), Penthicodes caja (Walker, 1851), Penthicodes variegata (Guérin-Méneville, 1829)); Pyrops Spinola, 1839 (Pyrops clavatus (Westwood, 1839), Pyrops lathburii (Kirby, 1818), Pyrops spinolae (Westwood, 1842)). The nine mitogenomes were 15,803 to 16,510 bp in length with 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), 2 ribosomal RNA genes (rRNAs) and a control region (A + T-rich region). Combined with previously reported fulgorid mitogenomes, all PCGs initiate with either the standard start codon of ATN or the nonstandard GTG. The TAA codon was used for termination more often than the TAG codon and the incomplete T codon. The nad1 and nad4 genes varied in length within the same genus. A high percentage of F residues was found in the nad4 and nad5 genes of all fulgorid mitogenomes. The DHU stem of trnV was absent in the mitogenomes of all fulgorids sequenced except Dichoptera sp. Moreover, in most fulgorid mitogenomes, the trnL2, trnR, and trnT genes had an unpaired base in the aminoacyl stem and trnS1 had an unpaired base in the anticodon stem. Similar tandem repeat regions of the control region were found within the same genus. Phylogenetic analyses were conducted based on 13 PCGs and two rRNA genes from 53 species of Fulgoroidea and seven outgroups. The Bayesian inference and maximum likelihood trees had a similar topological structure. The major results show that Fulgoroidea was divided into two groups: Delphacidae and ((Achilidae + (Lophopidae + (Issidae + (Flatidae + Ricaniidae)))) + Fulgoridae).
Furthermore, the monophyly of Fulgoridae was robustly supported, and Aphaeninae was divided into Aphaenini and Pyropsini, which includes Neoalcathous, Pyrops, Datua Schmidt, 1911, and Saiva Distant, 1906. The genus Limois is recovered in the Aphaeninae, and the Limoisini needs further confirmation; Dichoptera sp. was the earliest branch in the Fulgoridae.


2021
Vol 11 (1)
Author(s):
Jiawei Zhou
Shuo Zhang
Jie Wang
Hongmei Shen
Bin Ai
...

Abstract The chloroplast is one of two organelles containing a separate genome that codes for essential and distinct cellular functions such as photosynthesis. Given the importance of chloroplasts in plant metabolism, the genomic architecture and gene content have been strongly conserved through long periods of time and as such are useful molecular tools for evolutionary inferences. At present, complete chloroplast genomes from over 4000 species have been deposited into publicly accessible databases. Despite the large number of complete chloroplast genomes, comprehensive analyses regarding genome architecture and gene content have not been conducted for many lineages with complete species sampling. In this study, we employed the genus Populus to assess how more comprehensively sampled chloroplast genome analyses can be used in understanding chloroplast evolution in a broadly studied lineage of angiosperms. We conducted comparative analyses across Populus in order to elucidate variation in key genome features such as genome size, gene number, gene content, repeat type and number, SSR (Simple Sequence Repeat) abundance, and boundary positioning between the four main units of the genome. We found that some genome annotations were variable across the genus, owing in part to errors in assembly or data checking, and on this basis provided corrected annotations. We also employed complete chloroplast genomes for phylogenetic analyses including the dating of divergence times throughout the genus. Lastly, we utilized re-sequencing data to describe the variations of pan-chloroplast genomes at the population level for P. euphratica. The analyses used in this paper provide a blueprint for the types of analyses that can be conducted with publicly available chloroplast genomes as well as methods for building upon existing datasets to improve evolutionary inference.


mSystems
2020
Vol 5 (1)
Author(s):
Matthew R. Olm
Alexander Crits-Christoph
Spencer Diamond
Adi Lavy
Paula B. Matheus Carnevali
...

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
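The 95% ANI species threshold discussed in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' pipeline: the fragments are assumed to be already aligned and of equal length, gap columns are skipped, and the function names (`fragment_identity`, `average_nucleotide_identity`, `same_species`) are hypothetical. Real ANI tools first find and align homologous fragments across whole genomes.

```python
def fragment_identity(a, b):
    """Percent identity of two aligned, equal-length fragments; gap columns are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned fragments must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_nucleotide_identity(aligned_pairs):
    """Mean percent identity over a list of (fragment_a, fragment_b) aligned pairs."""
    identities = [fragment_identity(a, b) for a, b in aligned_pairs]
    if not identities:
        raise ValueError("no aligned fragments supplied")
    return sum(identities) / len(identities)


def same_species(ani, threshold=95.0):
    """Apply the ~95% ANI species boundary mentioned in the abstract."""
    return ani >= threshold


if __name__ == "__main__":
    pairs = [
        ("ACGTACGTAC", "ACGTACGTAC"),  # identical fragment
        ("ACGTACGTAC", "ACGTACGTAA"),  # one mismatch in ten positions
    ]
    ani = average_nucleotide_identity(pairs)
    print(ani, same_species(ani))
```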


Plants
2020
Vol 9 (12)
pp. 1692
Author(s):
Li Gu
Ting Su
Ming-Tai An
Guo-Xiong Hu

Oreocharis esquirolii, a member of Gesneriaceae, is also known as Thamnocharis esquirolii, which has been regarded as a synonym of the former. The species is endemic to Guizhou, southwestern China, and is evaluated as vulnerable (VU) under the International Union for Conservation of Nature (IUCN) criteria. Until now, the sequence and genome information of O. esquirolii has remained unknown. In this study, we assembled and characterized the complete chloroplast (cp) genome of O. esquirolii using Illumina sequencing data for the first time. The total length of the cp genome was 154,069 bp with a typical quadripartite structure consisting of a pair of inverted repeats (IRs) of 25,392 bp separated by a large single copy region (LSC) of 85,156 bp and a small single copy region (SSC) of 18,129 bp. The genome comprised 114 unique genes with 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Thirty-one repeat sequences and 74 simple sequence repeats (SSRs) were identified. Genome alignment across five plastid genomes of Gesneriaceae indicated a high sequence similarity. Four highly variable sites (rps16-trnQ, trnS-trnG, ndhF-rpl32, and ycf1) were identified. Phylogenetic analysis indicated that O. esquirolii grouped together with O. mileensis, supporting resurrection of the name Oreocharis esquirolii from Thamnocharis esquirolii. The complete cp genome sequence will contribute to further studies in molecular identification, genetic diversity, and phylogeny.
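The quadripartite structure reported above implies a simple consistency check: the total length should equal LSC + SSC + 2 × IR, and the unique gene count should equal the sum of the gene classes. The short sketch below verifies both using only the figures given in the abstract; the function and variable names are illustrative assumptions.

```python
# Region lengths (bp) as reported in the abstract.
LSC = 85_156           # large single copy region
SSC = 18_129           # small single copy region
IR = 25_392            # one inverted repeat; the genome carries two copies
REPORTED_TOTAL = 154_069


def quadripartite_length(lsc, ssc, ir):
    """Total plastome length for a quadripartite genome: LSC + SSC + 2 * IR."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)   # 154069 True

# Unique genes: protein-coding + tRNA + rRNA.
unique_genes = 80 + 30 + 4
print(unique_genes)                     # 114
```

Both totals agree with the reported values, so the region lengths and gene counts in the abstract are internally consistent.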


2004
Vol 186 (9)
pp. 2629-2635
Author(s):
Silvia G. Acinas
Luisa A. Marcelino
Vanja Klepac-Ceraj
Martin F. Polz

ABSTRACT The level of sequence heterogeneity among rrn operons within genomes determines the accuracy of diversity estimation by 16S rRNA-based methods. Furthermore, the occurrence of widespread horizontal gene transfer (HGT) between distantly related rrn operons casts doubt on reconstructions of phylogenetic relationships. For this study, patterns of distribution of rrn copy numbers, interoperonic divergence, and redundancy of 16S rRNA sequences were evaluated. Bacterial genomes display up to 15 operons and operon numbers up to 7 are commonly found, but ∼40% of the organisms analyzed have either one or two operons. Among the Archaea, a single operon appears to dominate and the highest number of operons is five. About 40% of sequences among 380 operons in 76 bacterial genomes with multiple operons were identical to at least one other 16S rRNA sequence in the same genome, and in 38% of the genomes all 16S rRNAs were invariant. For Archaea, the number of identical operons was only 25%, but only five genomes with 21 operons are currently available. These considerations suggest an upper bound of roughly threefold overestimation of bacterial diversity resulting from cloning and sequencing of 16S rRNA genes from the environment; however, the inclusion of genomes with a single rrn operon may lower this correction factor to ∼2.5. Divergence among operons appears to be small overall for both Bacteria and Archaea, with the vast majority of 16S rRNA sequences showing <1% nucleotide differences. Only five genomes with operons with a higher level of nucleotide divergence were detected, and Thermoanaerobacter tengcongensis exhibited the highest level of divergence (11.6%) noted to date. Overall, four of the five extreme cases of operon differences occurred among thermophilic bacteria, suggesting a much higher incidence of HGT in these bacteria than in other groups.


Author(s):
José Gonçalves-Dias
Markus G Stetter

Abstract The combination of genomic, physiological, and population genetic research has accelerated the understanding and improvement of numerous crops. For non-model crops, the lack of interdisciplinary research hinders their improvement. Grain amaranth is an ancient nutritious pseudocereal that has been domesticated three times in different regions of the Americas. We present and employ PopAmaranth, a population genetic genome browser, which provides an accessible representation of the genetic variation of the three grain amaranth species (A. hypochondriacus, A. cruentus, and A. caudatus) and two wild relatives (A. hybridus and A. quitensis) along the A. hypochondriacus reference sequence. We performed population-scale diversity and selection analysis from whole-genome sequencing data of 88 curated accessions that were unambiguously classified both genetically and taxonomically. We employ the platform to show that genetic diversity in the water stress-related MIF1 gene declined during amaranth domestication and provide evidence for convergent saponin reduction between amaranth and quinoa. PopAmaranth is available through amaranthGDB at amaranthgdb.org/popamaranth.html.


1999
Vol 122 (2)
pp. 323-328
Author(s):
M. T. E. P. ALLSOPP
C. M. HATTINGH
S. W. VOGEL
B. A. ALLSOPP

A panel of 16S ribosomal RNA gene probes has been developed for the study of the epidemiology of heartwater: five of these detect different cowdria genotypes; one detects five distinct genotypes; one detects any Group III Ehrlichia species other than Cowdria; and one detects any Group II Ehrlichia species. These probes have been used on PCR-amplified rickettsial 16S rRNA genes from over 200 Amblyomma hebraeum ticks. Control ticks were laboratory-reared and either uninfected or fed on sheep experimentally infected with different cowdria isolates; field ticks were collected from animals in heartwater-endemic areas. All tick-derived DNA samples were also examined by PCR amplification and probing for two other cowdria genes (map1 and pCS20) which have previously been used for heartwater epidemiology. This paper describes the first direct comparison of all currently available DNA probes for heartwater-associated organisms.

