Consensus assessment of the contamination level of publicly available cyanobacterial genomes

2018
Author(s): Luc Cornet, Loïc Meunier, Mick Van Vlierberghe, Raphaël R. Léonard, Benoit Durieu, ...

Abstract

BACKGROUND: Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can cause major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes.

RESULTS: As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods).

CONCLUSIONS: Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make the submission of raw read data along with genome assemblies mandatory in order to facilitate the detection of contaminants in sequence databases.
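The "detected by 5 or 6 methods" criterion can be illustrated with a minimal sketch. It assumes each method has already been reduced to a yes/no contamination call per assembly; the method keys, the function names, and the toy data are hypothetical, and the abstract's consensus of rankings is not reproduced here.

```python
# Hypothetical keys for the six approaches named in the abstract.
METHODS = (
    "ssu_rrna_16s", "ribosomal_proteins", "checkm",
    "kraken", "diamond_blastx", "concoct",
)

def count_detections(calls):
    """Count the methods that flagged contamination.

    `calls` maps a method key to True/False; a missing key counts as
    "no detection" (an assumption of this sketch).
    """
    return sum(1 for method in METHODS if calls.get(method, False))

def is_highly_contaminated(calls, min_methods=5):
    """Apply the abstract's criterion: detection by 5 or 6 methods."""
    return count_detections(calls) >= min_methods

def fraction_highly_contaminated(assemblies):
    """Fraction of assemblies meeting the criterion (0.0 for empty input)."""
    if not assemblies:
        return 0.0
    flagged = sum(1 for calls in assemblies.values() if is_highly_contaminated(calls))
    return flagged / len(assemblies)

# Toy data, not taken from the study.
example = {
    "assembly_A": {m: True for m in METHODS},
    "assembly_B": {"checkm": True, "kraken": True},
    "assembly_C": {m: (m != "concoct") for m in METHODS},
    "assembly_D": {},
}
print(fraction_highly_contaminated(example))
```

Counting boolean calls keeps the sketch independent of how each tool reports contamination; a real pipeline would first need a per-method rule for turning its output into such a call.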


2021, Vol 12
Author(s): Valérian Lupo, Mick Van Vlierberghe, Hervé Vanderschuren, Frédéric Kerff, Denis Baurain, ...

Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single method of detection, which carries a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple methods of detection while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.
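The value of a second, orthogonal method can be sketched as a simple comparison of two sets of per-genome contamination estimates. This is an illustration only: the 5% cutoff, the function name, and the toy accessions are assumptions, and nothing here reflects how CheckM or Physeter compute their estimates.

```python
def missed_by_first(first_pct, second_pct, threshold=5.0):
    """List genomes that look clean to the first method but not to the second.

    Both arguments map a genome identifier to an estimated contamination
    percentage. `threshold` is a hypothetical cutoff, not a value taken
    from the abstract. Only genomes scored by both methods are compared.
    """
    shared = first_pct.keys() & second_pct.keys()
    return sorted(
        genome for genome in shared
        if first_pct[genome] < threshold and second_pct[genome] >= threshold
    )

# Toy estimates with made-up identifiers.
marker_based = {"genome_1": 0.5, "genome_2": 1.2, "genome_3": 12.0}
genome_wide = {"genome_1": 0.8, "genome_2": 22.4, "genome_3": 15.0, "genome_4": 3.0}
print(missed_by_first(marker_based, genome_wide))
```

Restricting the comparison to genomes scored by both methods avoids treating a missing estimate as a clean result.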



2021
Author(s): Matthew Z. DeMaere, Aaron E. Darling

Abstract

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, Hi-C library preparation remains a complex laboratory protocol and diligent quality management is recommended to avoid costly failure. Current wet-lab protocols for Hi-C library QC provide only a crude assay, while commonly used sequence-based QC methods demand a reference genome, the quality of which can skew results. We propose a new, reference-free approach for Hi-C library quality assessment that requires only a modest amount of sequencing data. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first reference-free Hi-C QC tool; it also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
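The observation about ligation-derived k-mers can be illustrated with a small sketch. It assumes a palindromic restriction site cut to leave a 5' overhang that is filled in before blunt-end ligation, so the junction duplicates the overhang; the enzyme sites and function names below are illustrative assumptions, and this is a much cruder measure than the k-mer-based algorithm the abstract describes.

```python
def junction_sequence(site, cut_offset):
    """Sequence expected at a proximity-ligation junction.

    `site` is a palindromic recognition site and `cut_offset` the number of
    bases before the top-strand cut (e.g. 0 for ^GATC, 1 for A^AGCTT).
    Assumes a 5' overhang that is filled in and then blunt-end ligated.
    """
    if not 0 <= cut_offset <= len(site) // 2:
        raise ValueError("cut_offset must describe a 5' overhang or a blunt cut")
    # Filled-in left end + right end starting at the cut position.
    return site[: len(site) - cut_offset] + site[cut_offset:]

def fraction_reads_with_junction(reads, junction):
    """Fraction of reads containing the junction sequence (0.0 if no reads).

    For a palindromic site the junction is its own reverse complement,
    so only one orientation needs to be searched.
    """
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if junction in read.upper())
    return hits / len(reads)

# Illustrative sites (not named in the abstract).
print(junction_sequence("GATC", 0))
print(junction_sequence("AAGCTT", 1))
```

A junction such as the duplicated four-cutter site is short enough to occur by chance in a genome, which is one reason a simple presence count like this is only a rough indicator.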



2021, Vol 9 (8), pp. 1570
Author(s): Chien-Hsun Huang, Chih-Chieh Chen, Yu-Chun Lin, Chia-Hsuan Chen, Ai-Yun Lee, ...

The current taxonomy of the Lactiplantibacillus plantarum group comprises 17 closely related species that cannot be distinguished from each other by commonly used 16S rRNA gene sequencing. In this study, a whole-genome-based analysis was carried out to find highly discriminating target genes whose interspecific sequence identity is significantly lower than that of the 16S rRNA gene or conventional housekeeping genes. In silico analyses of 774 core genes by the cano-wgMLST_BacCompare analytics platform indicated that the csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX genes were the most promising candidates. Subsequently, the mutL gene was selected, and its discrimination power was further evaluated using Sanger sequencing. Among the type strains, mutL exhibited clearly lower interspecific sequence identity (61.6–85.6%; average: 66.6%) than the 16S rRNA gene (96.7–100%; average: 98.4%) and the conventional phylogenetic marker genes (e.g., dnaJ, dnaK, pheS, recA, and rpoA), and could therefore be used to separate the tested strains into species clusters. Consequently, species-specific primers were developed for fast and accurate identification of L. pentosus, L. argentoratensis, L. plantarum, and L. paraplantarum. During this study, one strain (BCRC 06B0048, L. pentosus) exhibited not only a relatively low mutL sequence identity (97.0%) but also a low digital DNA–DNA hybridization value (78.1%) with the type strain DSM 20314T, suggesting that it could be reclassified as a novel subspecies. Our data demonstrate that mutL can be a genome-wide target for identifying and classifying the L. plantarum group species and for differentiating novel taxa from known species.
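The sequence identity figures quoted above are percentages of matching positions between two sequences. A minimal sketch follows, assuming the two sequences have already been aligned to the same length and that columns containing a gap are skipped; the gap handling and the function name are assumptions, since the abstract does not state how identity was computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are ignored (one of several
    common conventions; which one the study used is not stated).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Toy sequences, not real mutL or 16S rRNA data.
print(percent_identity("ACGT", "ACGA"))
```

A gene with low identity between species and high identity within a species gives the widest margin for telling species apart, which is the property the abstract reports for mutL.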



Sensors, 2021, Vol 21 (5), pp. 1736
Author(s): Zengchong Yang, Xiucheng Liu, Bin Wu, Ren Liu

Previous studies on the Lamb wave touchscreen (LWT) assumed that the unknown touch had parameters consistent with the acoustic fingerprints in the reference database. This study investigated, for the first time, the adaptability of the LWT to variations in touch force and touch area. The databases of acoustic fingerprints were collected automatically with an experimental LWT prototype employing three transmitter–receiver pairs. A self-adaptively updated weight coefficient for the transmitter–receiver pairs was used to improve the accuracy of the localization model, which was established with a learning method. The performance of the improved method in locating single- and two-touch actions with reference databases of different parameters was carefully evaluated. The robustness of the LWT to variation in touch force varied with the touch area. Moreover, it was feasible to locate large-area touch actions with reference databases of small touch areas, as long as the unknown touch and the reference databases met the condition of equivalent averaged stress.



2021, Vol 12 (1)
Author(s): Shiho Makino, Tomoko Kawamata, Shintaro Iwasaki, Yoshinori Ohsumi

Abstract

Synthesis and degradation of cellular constituents must be balanced to maintain cellular homeostasis, especially during adaptation to environmental stress. The role of autophagy in the degradation of proteins and organelles is well-characterized. However, autophagy-mediated RNA degradation in response to stress and the potential preference of specific RNAs to undergo autophagy-mediated degradation have not been examined. In this study, we demonstrate selective mRNA degradation by rapamycin-induced autophagy in yeast. Profiling of mRNAs from the vacuole reveals that subsets of mRNAs, such as those encoding amino acid biosynthesis and ribosomal proteins, are preferentially delivered to the vacuole by autophagy for degradation. We also reveal that autophagy-mediated mRNA degradation is tightly coupled with translation by ribosomes. Genome-wide ribosome profiling suggested a high correspondence between ribosome association and targeting to the vacuole. We propose that autophagy-mediated mRNA degradation is a unique and previously unappreciated function of autophagy that affords post-transcriptional gene regulation.



Plants, 2021, Vol 10 (6), pp. 1154
Author(s): Hongjia Zhang, Seong-Gyu Jang, San Mar Lar, Ah-Rim Lee, Fang-Yuan Cao, ...

Starch is a major component of rice, and the amylose content of starch significantly impacts rice quality. OsSS (starch synthase) is a gene family related to the synthesis of amylose and amylopectin, and 10 members have been reported. In the present study, a synteny analysis of a novel family member, which belongs to the OsSSIV subfamily and contains a starch synthase catalytic domain, identified three segmental duplications and multiple duplications in rice and other species. Expression data showed that the OsSS gene family exhibits diverse expression patterns. The prediction of miRNA targets suggested that OsSS genes may be widely regulated by miRNAs, especially miR156s, which target OsSSII-3. Haplotype analysis revealed a relationship between amylose content and diverse genotypes. These results give new insight and a theoretical basis for improving the amylose content and eating quality of rice.



2011, Vol 14 (2), pp. 213-224
Author(s): M. C. Schatz, A. M. Phillippy, D. D. Sommer, A. L. Delcher, D. Puiu, ...


2000, Vol 279 (2), pp. F383-F392
Author(s): M. Ashraf El-Meanawy, Jeffrey R. Schelling, Fatima Pozuelo, Matthew M. Churpek, Eckhard K. Ficker, ...

Chronic renal disease initiation and progression remain incompletely understood. Genome-wide expression monitoring should clarify mechanisms that cause progressive renal disease by determining how clusters of genes coordinately change their activity. Serial analysis of gene expression (SAGE) is a technique of expression profiling, which permits simultaneous, comparative, and quantitative analysis of gene-specific, 9- to 13-bp sequence tags. Using SAGE, we have constructed a tag expression library from ROP-+/+ mouse kidney. Tag sequences were sorted by abundance, and identity was determined by sequence homology searching. Analyses of 3,868 tags yielded 1,453 unique kidney transcripts. Forty-two percent of these transcripts matched mRNA sequence entries with known function, 35% of the transcripts corresponded to expressed sequence tag (EST) entries or cloned genes, whose function has not been established, and 23% represented unidentified genes. Previously characterized transcripts were clustered into functional groups, and those encoding metabolic enzymes, plasma membrane proteins (transporters/receptors), and ribosomal proteins were most abundant (39, 14, and 12% of known transcripts, respectively). The most common, kidney-specific transcripts were kidney androgen-regulated protein (4% of all transcripts), sodium-phosphate cotransporter (0.3%), renal cytochrome P-450 (0.3%), parathyroid hormone receptor (0.1%), and kidney-specific cadherin (0.1%). Comprehensively characterizing and contrasting gene expression patterns in normal and diseased kidneys will provide an alternative strategy to identify candidate pathways, which regulate nephropathy susceptibility and progression, and novel targets for therapeutic intervention.
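The step of sorting tag sequences by abundance and counting unique tags can be sketched as follows. This is an illustration under simple assumptions: tags are exact strings, no sequencing-error correction is applied, and the tag sequences and function names are invented for the example.

```python
from collections import Counter

def tag_abundance(tags):
    """Return (tag, count) pairs sorted from most to least abundant."""
    return Counter(tags).most_common()

def summarize(tags):
    """Total number of tags analysed and number of distinct tags."""
    counts = Counter(tags)
    return {"total_tags": sum(counts.values()), "unique_tags": len(counts)}

def percent_of_total(tags, tag):
    """Share of all tags contributed by one tag, as a percentage."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts[tag] / total

# Invented tag sequences, not data from the study.
tags = ["GTGGCTCACA", "TTTGGAATCC", "GTGGCTCACA", "CAAACCATCC", "GTGGCTCACA"]
print(tag_abundance(tags))
print(summarize(tags))
```

In the study's terms, the 3,868 analysed tags correspond to `total_tags` and the 1,453 unique transcripts to `unique_tags`, with per-transcript percentages derived as in `percent_of_total`.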



Author(s): Nicole Foster, Kor-jent Dijk, Ed Biffin, Jennifer Young, Vicki Thomson, ...

A proliferation of environmental DNA (eDNA) research has increased the reliance on reference sequence databases to assign unknown DNA sequences to known taxa. Without comprehensive reference databases, DNA extracted from environmental samples cannot be correctly assigned to taxa, limiting the use of this genetic information to identify organisms in unknown sample mixtures. For animals, standard metabarcoding practices involve amplification of the mitochondrial cytochrome c oxidase subunit 1 (CO1) region, which is universally amplifiable across the majority of animal taxa. This region, however, does not work well as a DNA barcode for plants and fungi, and there is no similar universal single barcode locus that has the same species resolution. Therefore, generating reference sequences has been more difficult, and several loci have been suggested for use in parallel to reach species-level identification. For this reason, we developed a multi-gene targeted capture approach to generate reference DNA sequences for plant taxa across 20 target chloroplast gene regions in a single assay. We successfully compiled a reference database for 93 temperate coastal plants including seagrasses, mangroves, and saltmarshes/samphires. We demonstrate the importance of a comprehensive reference database to prevent species going undetected in eDNA studies. We also investigate how using multiple chloroplast gene regions impacts the ability to discriminate between taxa.



2015
Author(s): Sanaa Afroz Ahmed, Chien-Chi Lo, Po-E Li, Karen W Davenport, Patrick S.G. Chain

Next-generation sequencing is increasingly being used to examine closely related organisms. However, while genome-wide single nucleotide polymorphisms (SNPs) provide an excellent resource for phylogenetic reconstruction, to date evolutionary analyses have been performed using different ad hoc methods that are often not widely applicable across different projects. To facilitate the construction of robust phylogenies, we have developed a method for genome-wide identification/characterization of SNPs from sequencing reads and genome assemblies. Our phylogenetic and molecular evolutionary (PhaME) analysis software is unique in its ability to take reads and draft/complete genome(s) as input, derive core genome alignments, identify SNPs, construct phylogenies and perform evolutionary analyses. Several examples using genomes and read datasets for bacterial, eukaryotic and viral lineages demonstrate the broad and robust functionality of PhaME. Furthermore, the ability to incorporate raw metagenomic reads from clinical samples with suspected infectious agents shows promise for the rapid phylogenetic characterization of pathogens within complex samples.
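Identifying SNPs from a core genome alignment amounts to finding alignment columns where the genomes disagree. The sketch below is a generic illustration, not PhaME's implementation: it assumes an alignment given as equal-length strings and keeps only columns where every genome has an unambiguous base; the function name and toy sequences are hypothetical.

```python
def snp_columns(alignment):
    """Return 0-based positions of SNP columns in a multiple alignment.

    `alignment` maps a genome name to its aligned sequence. A column counts
    as a SNP when all genomes carry A, C, G or T there and at least two
    different bases are present. Columns with gaps or ambiguity codes are
    skipped (an assumption of this sketch).
    """
    sequences = [seq.upper() for seq in alignment.values()]
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")
    valid_bases = set("ACGT")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in sequences}
        if column <= valid_bases and len(column) > 1:
            positions.append(i)
    return positions

# Toy alignment, not real genome data.
toy = {"genome_1": "ACGTAC", "genome_2": "ACGTTC", "genome_3": "AC-TAC"}
print(snp_columns(toy))
```

The resulting positions could then be used to build a SNP-only alignment for tree inference, which is the kind of downstream step the abstract mentions.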


