GUNC: detection of chimerism and contamination in prokaryotic genomes

Abstract Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15–30% of pre-filtered “high-quality” metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality.

GUNC: Detection of Chimerism and Contamination in Prokaryotic Genomes

DOI: 10.1101/2020.12.16.422776 (2020)

Author(s): Askarbek Orakov, Anthony Fullam, Luis Pedro Coelho, Supriya Khedkar, Damian Szklarczyk, ...

Keyword(s): Source Code, Prokaryotic Genome, High Quality, Link Type, Formidable Challenge, Prokaryotic Genomes, Full Complement

Abstract Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15–30% of pre-filtered “high-quality” metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality. Source code (GPLv3+): https://github.com/grp-bork/gunc
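
The idea of contig lineage homogeneity can be illustrated with a toy computation. This is a minimal sketch, not GUNC's scoring scheme: it assumes every gene has already been assigned a lineage label (the genus names and the helpers `dominant_lineage`, `contig_homogeneity`, and `discordant_gene_fraction` are hypothetical) and measures how much of a genome sits on contigs whose dominant lineage disagrees with the genome-wide majority.

```python
from collections import Counter
from typing import Dict, List


def dominant_lineage(labels: List[str]) -> str:
    """Most common lineage label in a list of per-gene assignments."""
    return Counter(labels).most_common(1)[0][0]


def contig_homogeneity(labels: List[str]) -> float:
    """Fraction of a contig's genes that carry the contig's dominant lineage label."""
    if not labels:
        raise ValueError("contig has no lineage-annotated genes")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def discordant_gene_fraction(contigs: Dict[str, List[str]]) -> float:
    """Share of all genes lying on contigs whose dominant lineage differs
    from the genome-wide dominant lineage (a crude chimerism signal)."""
    all_labels = [label for labels in contigs.values() for label in labels]
    if not all_labels:
        raise ValueError("genome has no lineage-annotated genes")
    genome_lineage = dominant_lineage(all_labels)
    discordant = sum(
        len(labels)
        for labels in contigs.values()
        if labels and dominant_lineage(labels) != genome_lineage
    )
    return discordant / len(all_labels)


# Hypothetical per-gene genus assignments for a three-contig genome.
contigs = {
    "contig_1": ["Escherichia"] * 8 + ["Klebsiella"] * 2,
    "contig_2": ["Bacteroides"] * 5,
    "contig_3": ["Escherichia"] * 10,
}
print(contig_homogeneity(contigs["contig_1"]))  # 0.8
print(discordant_gene_fraction(contigs))        # 0.2
```

Here `contig_2` is perfectly homogeneous on its own, yet it disagrees with the rest of the genome; only by comparing lineage assignments across all genes and contigs does that signal surface. A real tool must also cope with misassigned genes and gaps in the reference database, which this sketch ignores.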

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics, 2021, Vol 3 (2)

DOI: 10.1093/nargab/lqab034

Author(s): Jean-Marc Aury, Benjamin Istace

Keyword(s): Single Molecule, Direct Consequence, High Quality, Sequencing Errors, Coding Regions, Sequencing Technologies, Long Reads, Oxford Nanopore, Long Read, Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This directly affects the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the reading frame of genes. In diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; they generally inspect the assembly nucleotide by nucleotide and provide a correction for each position of the input assembly. As a result, these algorithms cannot properly process diploid genomes and typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm that incorporates phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
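
The haplotype-switching problem can be reproduced on a toy alignment. This sketch assumes reads are already aligned, gap-padded strings over the same coordinates (`-` marks positions a read does not cover) and that every column is covered by at least one read. `per_base_majority` mimics column-by-column polishing, while `phase_consistent_polish` is a deliberately simplified, hypothetical heuristic: it keeps following one read and, when that read ends, switches to the covering read that best agrees with the sequence emitted so far. It is not Hapo-G's actual algorithm.

```python
from collections import Counter
from typing import List, Optional


def per_base_majority(reads: List[str]) -> str:
    """Column-by-column consensus: each position is decided independently."""
    consensus = []
    for i in range(len(reads[0])):
        bases = [read[i] for read in reads if read[i] != "-"]
        consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def _agreement(read: str, consensus: List[str]) -> int:
    """Number of already-emitted positions at which `read` carries the same base."""
    return sum(1 for j, base in enumerate(consensus) if read[j] == base)


def phase_consistent_polish(reads: List[str]) -> str:
    """Follow a single read for as long as it covers the sequence; when it ends,
    switch to the covering read most consistent with what has been emitted."""
    consensus: List[str] = []
    current: Optional[str] = None
    for i in range(len(reads[0])):
        if current is None or current[i] == "-":
            covering = [read for read in reads if read[i] != "-"]
            current = max(covering, key=lambda read: _agreement(read, consensus))
        consensus.append(current[i])
    return "".join(consensus)


# Two haplotypes differing at positions 1 (A/G) and 4 (T/C).
hap1, hap2 = "AAGTTC", "AGGTCC"
reads = [
    "AAGT--",  # hap1
    "AAGT--",  # hap1
    "AGGT--",  # hap2
    "--GTCC",  # hap2
    "--GTCC",  # hap2
    "-AGTTC",  # hap1, spans both heterozygous sites
]
print(per_base_majority(reads))        # AAGTCC: mixes hap1 and hap2
print(phase_consistent_polish(reads))  # AAGTTC: equals hap1
```

The majority-vote result is a sequence found in neither haplotype. The read-following version avoids this only because one read spans both heterozygous sites, which is why phasing information carried by reads matters for diploid polishing.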

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

DOI: 10.1101/2019.12.19.882399 (2019)

Cited by ~5

Author(s): Valentina Peona, Mozes P.K. Blom, Luohao Xu, Reto Burri, Shawn Sullivan, ...

Keyword(s): Dark Matter, Genome Assembly, Sex Chromosome, De Novo, Model Organism, Technology Choice, High Quality, Sequencing Technologies, Downstream Analysis, Genome Assemblies

Abstract Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, some genomic regions, such as repetitive elements and GC-rich regions (genomic “dark matter”), remain difficult to assemble. In this study, we compare the efficiency of currently used sequencing technologies (short, linked, and long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter, starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter, with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. With this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach reconstructed complex chromosomes such as the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not yet a reality for most organisms, but informed technology choice can minimize genome assembly gaps for downstream analysis. We provide a roadmap for tailoring sequencing projects around the completeness of both the coding and non-coding parts of genomes.

SIGAR: Inferring features of genome architecture and DNA rearrangements by split read mapping

DOI: 10.1101/2020.05.05.079426 (2020)

Author(s): Yi Feng, Leslie Y. Beh, Wei-Jen Chang, Laura F. Landweber

Keyword(s): Genome Assembly, Repetitive Sequences, Genome Architecture, DNA Rearrangements, High Quality, Microbial Eukaryotes, Ciliate Species, Split Read, High Level, Genome Assemblies

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Post-zygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. While many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain because of the germline's lower DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Splitread Inference of Genome Architecture and Rearrangements), that infers germline genome architecture and rearrangement features without a germline genome assembly, requiring only short germline DNA sequencing reads. As a proof of principle, 93% of the rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including the fish pathogen Ichthyophthirius multifilii. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys and exploration of DNA rearrangements in species with limited access to DNA material, thereby providing new insights into the evolution of chromosome rearrangements.
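
The split-read idea can be illustrated with exact string matching. In this hypothetical sketch a germline read is compared with a somatic sequence: the longest read prefix and the longest read suffix occurring in the somatic sequence are located, and if they abut there, the unmatched middle of the read is reported as germline-only (eliminated) sequence and the abutment point as a junction. SIGAR itself works from read alignments rather than this toy matching; the function names, sequences, and the `min_len` cutoff are assumptions made for illustration.

```python
from typing import Optional, Tuple


def longest_prefix_hit(read: str, ref: str, min_len: int) -> Optional[Tuple[int, int]]:
    """Length and reference position of the longest read prefix found in ref."""
    for k in range(len(read), min_len - 1, -1):
        pos = ref.find(read[:k])
        if pos != -1:
            return k, pos
    return None


def longest_suffix_hit(read: str, ref: str, min_len: int) -> Optional[Tuple[int, int]]:
    """Length and reference position of the longest read suffix found in ref."""
    for m in range(len(read), min_len - 1, -1):
        pos = ref.find(read[len(read) - m:])
        if pos != -1:
            return m, pos
    return None


def infer_eliminated_segment(read: str, somatic: str, min_len: int = 8) -> Optional[Tuple[int, str]]:
    """Return (junction position in somatic, germline-only sequence) for a split read,
    or None if the read maps contiguously or its flanks do not abut in somatic."""
    prefix = longest_prefix_hit(read, somatic, min_len)
    suffix = longest_suffix_hit(read, somatic, min_len)
    if prefix is None or suffix is None:
        return None
    k, prefix_pos = prefix
    m, suffix_pos = suffix
    if k + m >= len(read):
        return None  # prefix and suffix overlap: nothing is left unmatched
    if prefix_pos + k != suffix_pos:
        return None  # flanks are not adjacent in somatic (e.g. a scrambled arrangement)
    return prefix_pos + k, read[k:len(read) - m]


somatic = "GATTACAGGC" + "TTAACCGGTT"  # two joined somatic segments (toy data)
germline_read = "GATTACAGGC" + "AAAAAAAA" + "TTAACCGGTT"
print(infer_eliminated_segment(germline_read, somatic))  # (10, 'AAAAAAAA')
```

Short repeats shared by the flanks and the eliminated segment make the exact split point ambiguous, so a realistic tool has to handle such microhomology explicitly; the toy sequences above were chosen to avoid it.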

High-Quality Genome Assembly of Peronospora destructor, the Causal Agent of Onion Downy Mildew

Molecular Plant-Microbe Interactions, 2020, Vol 33 (5), pp. 718-720

DOI: 10.1094/mpmi-10-19-0280-a

Author(s): Karthi Natesan, Ji Yeon Park, Cheol-Woo Kim, Dong Suk Park, Young-Seok Kwon, ...

Keyword(s): Downy Mildew, De Novo, GC Content, Comparative Genomic, High Quality, Sequencing Platform, Peronospora Destructor, Genomic Studies, Genome Assemblies, High Quality Genome

Peronospora destructor is an obligate biotrophic oomycete that causes downy mildew on onion (Allium cepa). Onion is an important crop worldwide, but its production is affected by this pathogen. We sequenced the genome of P. destructor using the PacBio sequencing platform; de novo assembly yielded 74 contigs totaling 29.3 Mb, with a GC content of 48.48%. Here, we report the first high-quality genome sequence of P. destructor and compare it with the genome assemblies of other oomycetes. The genome should serve as a useful reference for analyzing P. destructor isolates and for comparative genomic studies of biotrophic oomycetes.
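
An assembly-wide GC content such as the 48.48% reported above is a pooled ratio over all contigs. A minimal sketch, assuming ambiguous bases such as `N` are excluded from the denominator (tools differ on this convention); the function name is hypothetical:

```python
from typing import Iterable


def assembly_gc_percent(contigs: Iterable[str]) -> float:
    """Percentage of G+C among all unambiguous A/C/G/T bases, pooled over contigs."""
    gc = acgt = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / acgt


print(assembly_gc_percent(["ATGC", "GGCCGGNN"]))  # 80.0
```

Pooling matters: averaging the two per-contig values (50% and 100%) would give 75% and overweight the shorter contig.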

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

GigaScience, 2019, Vol 8 (10)

DOI: 10.1093/gigascience/giz122

Cited by ~12

Author(s): Sarah B Kingan, Julie Urban, Christine C Lambert, Primo Baybayan, Anna K Childers, ...

Keyword(s): Invasive Species, Genome Assembly, De Novo, Fragment Size, High Quality, De Novo Genome Assembly, Lycorma Delicatula, Long Read, Genome Assemblies, High Quality Genome

Abstract

Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternative technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region.

Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence-level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frameshift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each assembled into a single contig.

Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
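
Two figures in this abstract, the contig N50 and the fold coverage, follow from simple definitions. The sketch below assumes the usual N50 definition (the contig length at which contigs sorted from longest to shortest first reach half of the total assembly length) and computes coverage as unique sequenced bases divided by genome size; the function names are hypothetical. Plugging in the reported 82 Gb of unique sequence and the 2.3 Gb assembly reproduces the ∼36× figure.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs")


def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total (unique) sequenced bases / genome size."""
    return sequenced_bases / genome_size


print(n50([2, 3, 4, 5, 6]))                  # 5
print(round(fold_coverage(82e9, 2.3e9), 1))  # 35.7
```

Comparing `2 * running` with `total` avoids a floating-point division when testing whether half of the assembly has been reached.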

Alternatives to Research

Mathematics Teacher, 1967, Vol 60 (7), pp. 711-714

DOI: 10.5951/mt.60.7.0711

Author(s): Dan E. Christie, James H. Wells

Keyword(s): Teacher Preparation, Mathematics Education, Mathematics Teaching, Faculty Members, The Other, High Quality, College Teacher, Formidable Challenge, Collegiate Mathematics, Good College

A formidable challenge to excellence is posed by two trends in collegiate mathematics education: one is the admirable flow of numerous students into fields requiring mathematics, with a consequent demand for good college teachers; the other is the lamentable tendency for undergraduate mathematics faculties to be understaffed and overworked. Together, the two trends generate a challenge that the CUPM Panel on College Teacher Preparation now confronts: how can undergraduate faculty members achieve and maintain high quality in mathematics teaching?

A Universal Operon Predictor for Prokaryotic Genomes

Journal of Bioinformatics and Computational Biology, 2009, Vol 07 (01), pp. 19-38

DOI: 10.1142/s0219720009003984

Cited by ~15

Author(s): Guojun Li, Dongsheng Che, Ying Xu

Keyword(s): Structural Information, Prokaryotic Genome, Prediction Method, Gene Clusters, Specific Information, Theoretic Approach, Graph Theoretic, Prokaryotic Genomes, Conserved Gene, Two Kingdoms

Identification of operons at the genome scale in prokaryotic organisms is a key step in deciphering their transcriptional regulation machinery, biological pathways, and networks. While numerous computational methods have been shown to be effective in predicting operons for well-studied organisms such as Escherichia coli K12 and Bacillus subtilis 168, these methods generally do not generalize well beyond the genomes used to train them or closely related genomes, because they rely on organism-specific information. Several methods have tried to address this problem by using only genomic structural information conserved across multiple organisms, but they all suffer from low prediction sensitivity. In this paper, we report a novel operon prediction method that is applicable to any prokaryotic genome with high prediction accuracy. The key idea of the method is to predict operons by identifying gene clusters conserved across multiple genomes and by deriving a key parameter relevant to the distribution of intergenic distances in genomes. We have implemented this method using a graph-theoretic approach to calculate a set of maximum gene clusters in the target genome that are conserved across multiple reference genomes. Our computational results show that this method has higher prediction sensitivity and specificity than most published methods. We have carried out a preliminary study of operons unique to archaea and bacteria, respectively, and derived a number of interesting new insights about operons in these two kingdoms. The software and predicted operons of 365 prokaryotic genomes are available at .
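
The intergenic-distance signal can be illustrated with a naive baseline that groups adjacent genes on the same strand whenever the gap between them is small. This is not the paper's graph-theoretic method, which also uses gene clusters conserved across reference genomes and derives its distance parameter from the genome; the `Gene` type, the coordinates, and the 50 bp cutoff below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_by_intergenic_distance(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Split a gene order into candidate operons: consecutive same-strand genes
    separated by at most `max_gap` bp (overlaps give negative gaps) stay together."""
    if not genes:
        return []
    ordered = sorted(genes, key=lambda g: g.start)
    operons = [[ordered[0]]]
    for prev, gene in zip(ordered, ordered[1:]):
        gap = gene.start - prev.end - 1
        if gene.strand == prev.strand and gap <= max_gap:
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return [[g.name for g in operon] for operon in operons]


genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 921, 1800, "+"),   # 20 bp after geneA
    Gene("geneC", 2100, 2500, "+"),  # 299 bp gap: new unit
    Gene("geneD", 2520, 3000, "-"),  # strand change: new unit
]
print(group_by_intergenic_distance(genes))
# [['geneA', 'geneB'], ['geneC'], ['geneD']]
```

A fixed, hand-picked cutoff is exactly the kind of organism-specific tuning the abstract identifies as a limitation; the paper instead derives its distance parameter from the distribution of intergenic distances in each genome.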

High-quality genome assemblies of male and female Populus × sibirica plants

Systems Biology and Bioinformatics (SBB-2020): The Twelfth International Young Scientists School, 2020

DOI: 10.18699/sbb-2020-32

Keyword(s): High Quality, Male And Female, Genome Assemblies, High Quality Genome

Highly contiguous assemblies of 101 drosophilid genomes

DOI: 10.1101/2020.12.14.422775 (2020)

Author(s): Bernard Y Kim, Jeremy Wang, Danny E. Miller, Olga Barmina, Emily K. Delaney, ...

Keyword(s): Community Resource, High Quality, Public Resource, Oxford Nanopore, Starting Point, Long Read, Wet Lab, Species Groups, Genome Assemblies, High Quality Genome

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they encompass only a small fraction of the genus. Recent advances in long-read sequencing make it possible to generate high-quality genome assemblies for tens or even hundreds of species. Here, we use Oxford Nanopore sequencing to build an open community resource of high-quality assemblies for 101 lines of 95 drosophilid species, encompassing 14 species groups and 35 subgroups, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. These assemblies, along with detailed wet-lab protocols and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution within this key group.
