A high-quality reference genome for a parasitic bivalve with doubly uniparental inheritance (Bivalvia: Unionida)

Genome Biology and Evolution ◽

10.1093/gbe/evab029 ◽

2021 ◽

Author(s):

Chase H Smith

Keyword(s):

Single Molecule ◽

Reference Genome ◽

Economic Value ◽

Freshwater Mussel ◽

Uniparental Inheritance ◽

Doubly Uniparental Inheritance ◽

High Quality ◽

Long Reads ◽

A Genome ◽

Genomic Studies

Abstract From a genomics perspective, bivalves (Mollusca: Bivalvia) have been poorly explored, with the exception of those of high economic value. The bivalve order Unionida, or freshwater mussels, has been of interest in recent genomic studies because of its unique mitochondrial biology and peculiar life cycle. However, genomic studies have been hindered by the lack of a high-quality reference genome. Here, I present a genome assembly of Potamilus streckersoni using Pacific Biosciences single-molecule real-time long reads and 10X Genomics linked-read sequencing. Further, I use RNA sequencing from multiple tissue types and life stages to annotate the reference genome. The final assembly was far superior to any previously published freshwater mussel genome and was represented by 2,368 scaffolds (2,472 contigs) and 1,776,755,624 bp, with a scaffold N50 of 2,051,244 bp. A high proportion of the assembly consisted of repetitive elements (51.03%), aligning with genomic characteristics of other bivalves. The functional annotation returned 52,407 gene models (41,065 protein, 11,342 tRNAs), which was concordant with the estimated number of genes in other freshwater mussel species. This genetic resource, along with future studies developing high-quality genome assemblies and annotations, will be integral toward unraveling the genomic bases of ecologically and evolutionarily important traits in this hyper-diverse group.
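
Several abstracts in this listing report a scaffold or contig N50. As a minimal sketch of how that statistic is usually defined (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly), the Python function below computes it from a list of lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from any of the listed papers or their pipelines.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical values, not data from any paper).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

With the toy input, the total is 300 bp, so the threshold of 150 bp is crossed when the second-longest scaffold is added. A higher N50 for a similar total length generally indicates a more contiguous assembly.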


Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

10.1101/345983 ◽

2018 ◽

Cited By ~ 2

Author(s):

Huilong Du ◽

Chengzhi Liang

Keyword(s):

Single Molecule ◽

High Efficiency ◽

Reference Genome ◽

Repetitive Sequences ◽

Sequencing Data ◽

High Quality ◽

Single Molecule Sequencing ◽

Genome Maps ◽

Long Reads ◽

Novel Method

Abstract Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often due to short contigs that cannot be anchored or are mispositioned on chromosomes. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved on the sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly with HERA. We provided a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves as a new genome assembly/phasing method to generate high-quality sequences for complex genomes and as a curation tool to improve the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.


Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


Genome assembly of the JD17 soybean provides a new reference genome for Comparative genomics

10.1101/2021.11.23.469778 ◽

2021 ◽

Author(s):

Xinxin Yi ◽

Jing Liu ◽

Shengcai Chen ◽

Hao Wu ◽

Min Liu ◽

...

Keyword(s):

Nitrogen Fixation ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Genomic Analysis ◽

Comparative Genomic ◽

High Quality ◽

Genome Wide ◽

A Genome ◽

Cultivated Soybean

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying genetic underpinnings of its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar, Jidou 17 (JD17), with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybeans (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984 - 46,912 PAVs spanning 13.1 - 46.9 Mb in size, and 5 - 53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741 - 3,664,629 SNPs and 446,689 - 800,489 Indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects of these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.


De novo Assembly of the Brugia malayi Genome Using Long Reads from a Single MinION Flowcell

Scientific Reports ◽

10.1038/s41598-019-55908-y ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Joseph R. Fauver ◽

John Martin ◽

Gary J. Weil ◽

Makedonka Mitreva ◽

Peter U. Fischer

Keyword(s):

Single Molecule ◽

New Technologies ◽

Reference Genome ◽

De Novo ◽

Complete Mitochondrial Genome ◽

Nuclear Genome ◽

Brugia Malayi ◽

Field Isolates ◽

Sequencing Technologies ◽

Long Reads

Abstract Filarial nematode infections cause a substantial global disease burden. Genomic studies of filarial worms can improve our understanding of their biology and epidemiology. However, genomic information from field isolates is limited and available reference genomes are often discontinuous. Single-molecule sequencing technologies can reduce the cost of genome sequencing, and long reads produced from these devices can improve the contiguity and completeness of genome assemblies. In addition, these new technologies can make generation and analysis of large numbers of field isolates feasible. In this study, we assessed the performance of the Oxford Nanopore Technologies MinION for sequencing and assembling the genome of Brugia malayi, a human parasite widely used in filariasis research. Using data from a single MinION flowcell, a 90.3 Mb nuclear genome was assembled into 202 contigs with an N50 of 2.4 Mb. This assembly covered 96.9% of the well-defined B. malayi reference genome with 99.2% identity. The complete mitochondrial genome was obtained with individual reads and the nearly complete genome of the endosymbiotic bacteria Wolbachia was assembled alongside the nuclear genome. Long-read data from the MinION produced an assembly that approached the quality of a well-established reference genome using comparatively fewer resources.


Fully resolved assembly of Cryptosporidium parvum

10.1101/2021.07.07.451495 ◽

2021 ◽

Author(s):

Vipin K. Menon ◽

Pablo C. Okhuysen ◽

Cynthia Chappell ◽

Medhat Mahmoud ◽

Qingchang Meng ◽

...

Keyword(s):

Cryptosporidium Parvum ◽

Reference Genome ◽

Single Copy ◽

Comparative Genomic ◽

Infection Prevalence ◽

High Quality ◽

Apicomplexan Parasites ◽

Oxford Nanopore ◽

Genomic Study ◽

Genomic Studies

Background Cryptosporidium parvum is an apicomplexan parasite commonly found across many species, with a global infection prevalence of 7.6%. As such, it is important to understand the diversity and genomic makeup of this prevalent parasite to prevent further spread and to fight infection. The general basis of every genomic study is a high-quality reference genome that is continuous and complete and thus enables comprehensive comparative studies. Findings Here we provide a highly accurate and complete reference genome of Cryptosporidium spp. The assembly is based on Oxford Nanopore reads and was improved using Illumina reads for error correction. The assembly encompasses 8 chromosomes and includes 13 telomeres that were resolved. Overall the assembly shows a high completion rate with 98.4% single-copy BUSCO genes. This is also shown by the identification of 13 telomeric regions across the 8 chromosomes. The consensus accuracy of the established reference genome was further validated by sequence alignment of established genetic markers for C. parvum. Conclusions This high-quality reference genome provides the basis for subsequent studies and comparative genomic studies across the Cryptosporidium clade.


QAlign: Aligning nanopore reads accurately using current-level modeling

10.1101/862813 ◽

2019 ◽

Author(s):

Dhaivat Joshi ◽

Shunfu Mao ◽

Sreeram Kannan ◽

Suhas Diggavi

Keyword(s):

Reference Genome ◽

Genomic Analysis ◽

Vital Role ◽

High Error Rate ◽

Sequencing Technology ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Nanopore Sequencer ◽

Sequencing Process

Abstract Motivation Efficient and accurate alignment of DNA / RNA sequence reads to each other or to a reference genome / transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome / transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running it through a sequence aligner. Results We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability https://github.com/joshidhaivat/QAlign.git
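
The key idea stated in this abstract, converting nucleotide reads into discretized current levels before alignment, can be illustrated with a toy sketch. Everything in the block below is an assumption made for illustration: the per-base values in `TOY_BASE_CURRENT`, the averaging in `toy_kmer_current`, the choice of k = 3 and three levels, and the function names are hypothetical and are not taken from the QAlign code or from any real nanopore pore model (real pore models are measured k-mer tables).

```python
# Hypothetical per-base contribution to a "current level". This stands in
# for a measured k-mer current table and is NOT a real pore model.
TOY_BASE_CURRENT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def toy_kmer_current(kmer):
    """Toy current for a k-mer: the mean of its per-base toy values."""
    return sum(TOY_BASE_CURRENT[base] for base in kmer) / len(kmer)


def quantize_read(read, k=3, levels=3):
    """Map each k-mer of a read to one of `levels` discrete symbols.

    The toy current range [1.0, 4.0] is split into equal-width bins and
    each k-mer's toy current is replaced by the index of its bin.
    """
    lowest, highest = 1.0, 4.0
    width = (highest - lowest) / levels
    symbols = []
    for start in range(len(read) - k + 1):
        current = toy_kmer_current(read[start:start + k])
        # Clamp so the maximum current falls in the last bin.
        bin_index = min(int((current - lowest) / width), levels - 1)
        symbols.append(bin_index)
    return symbols


print(quantize_read("ACGT"))
```

In a pre-processing scheme of this kind, both the reads and the reference would be converted to the same reduced alphabet, so that k-mers a sequencer tends to confuse (because their current levels are similar) map to the same symbol before a standard aligner is run.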


An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps

10.1101/767764 ◽

2019 ◽

Cited By ~ 16

Author(s):

Prashant S. Hosmani ◽

Mirella Flores-Gonzalez ◽

Henri van de Geest ◽

Florian Maumus ◽

Linda V. Bakker ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Proximity Ligation ◽

Contact Maps ◽

Long Reads ◽

Blast Database ◽

Optical Maps ◽

Almost All ◽

454 Sequences

Abstract The original Heinz 1706 reference genome was produced by a large team of scientists from across the globe from a variety of input sources that included 454 sequences in addition to full-length BACs and BAC and fosmid ends sequenced with Sanger technology. We present here the latest tomato reference genome (SL4.0), assembled de novo from PacBio long reads and scaffolded using Hi-C contact maps. The assembly was validated using Bionano optical maps and 10X linked-read sequences. This assembly is highly contiguous with fewer gaps compared to previous genome builds, and almost all scaffolds have been anchored and oriented to the 12 tomato chromosomes. We have found more repeats compared to the previous versions, and one of the largest repeat classes identified is the LTR retrotransposons. We also describe updates to the reference genome and annotation since the last publication. The corresponding ITAG4.0 annotation has 4,794 novel genes along with 29,281 genes preserved from ITAG2.4. Most of the updated genes have extensions in the 5’ and 3’ UTRs, resulting in a doubling of annotated UTRs per gene. The genome and annotation can be accessed using SGN through BLAST database, Pathway database (SolCyc), Apollo, JBrowse genome browser and FTP available at https://solgenomics.net.


Genome Assembly of the Dogface Butterfly Zerene cesonia

Genome Biology and Evolution ◽

10.1093/gbe/evz254 ◽

2019 ◽

Vol 12 (1) ◽

pp. 3580-3585 ◽

Cited By ~ 2

Author(s):

Luis Rodriguez-Caro ◽

Jennifer Fenner ◽

Caleb Benson ◽

Steven M Van Belleghem ◽

Brian A Counterman

Keyword(s):

Genome Assembly ◽

Developmental Plasticity ◽

Hybrid Approach ◽

Single Copy ◽

Z Chromosome ◽

High Quality ◽

Protein Coding ◽

Genomic Change ◽

A Genome ◽

Genomic Studies

Abstract Comparisons of high-quality reference butterfly and moth genomes have been instrumental to advancing our understanding of how hybridization and natural selection drive genomic change during the origin of new species and novel traits. Here, we present a genome assembly of the Southern Dogface butterfly, Zerene cesonia (Pieridae), whose brilliant wing colorations have been implicated in developmental plasticity, hybridization, sexual selection, and speciation. We assembled 266,407,278 bp of the Z. cesonia genome, which accounts for 98.3% of the estimated 271 Mb genome size. Using a hybrid approach involving Chicago libraries with Hi-Rise assembly and a diploid Meraculous assembly, the final haploid genome was assembled. In the final assembly, nearly all autosomes and the Z chromosome were assembled into single scaffolds. The largest 29 scaffolds accounted for 91.4% of the genome assembly, with the remaining ∼8% distributed among another 247 scaffolds and an overall N50 of 9.2 Mb. Tissue-specific RNA-seq informed annotations identified 16,442 protein-coding genes, which included 93.2% of the arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO). The Z. cesonia genome assembly had ∼9% identified as repetitive elements, with a transposable element landscape rich in helitrons. Similar to other Lepidoptera genomes, Z. cesonia showed a high conservation of chromosomal synteny. The Z. cesonia assembly provides a high-quality reference for studies of chromosomal arrangements in the Pierid family, as well as for population, phylogenomic, and functional genomic studies of adaptation and speciation.


Variability of mitochondrial ORFans hints at possible differences in the system of doubly uniparental inheritance of mitochondria among families of freshwater mussels (Bivalvia: Unionida)

BMC Evolutionary Biology ◽

10.1186/s12862-019-1554-5 ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 1

Author(s):

Davide Guerra ◽

Manuel Lopes-Lima ◽

Elsa Froufe ◽

Han Ming Gan ◽

Paz Ondina ◽

...

Keyword(s):

Freshwater Mussels ◽

Complete Mitochondrial Genome ◽

Physiological Role ◽

Freshwater Mussel ◽

Single Species ◽

Open Reading Frames ◽

Uniparental Inheritance ◽

Doubly Uniparental Inheritance ◽

Sexual Systems ◽

Mitogenome Sequence

Abstract Background Supernumerary ORFan genes (i.e., open reading frames without obvious homology to other genes) are present in the mitochondrial genomes of gonochoric freshwater mussels (Bivalvia: Unionida) showing doubly uniparental inheritance (DUI) of mitochondria. DUI is a system in which distinct female-transmitted and male-transmitted mitotypes coexist in a single species. In families Unionidae and Margaritiferidae, the transition from dioecy to hermaphroditism and the loss of DUI appear to be linked, and this event seems to affect the integrity of the ORFan genes. These observations led to the hypothesis that the ORFans have a role in DUI and/or sex determination. Complete mitochondrial genome sequences are however scarce for most families of freshwater mussels, therefore hindering a clear localization of DUI in the various lineages and a comprehensive understanding of the influence of the ORFans on DUI and sexual systems. Therefore, we sequenced and characterized eleven new mitogenomes from poorly sampled freshwater mussel families to gather information on the evolution and variability of the ORFan genes and their protein products. Results We obtained ten complete plus one almost complete mitogenome sequence from ten representative species (gonochoric and hermaphroditic) of families Margaritiferidae, Hyriidae, Mulleriidae, and Iridinidae. ORFan genes are present only in DUI species from Margaritiferidae and Hyriidae, while non-DUI species from Hyriidae, Iridinidae, and Mulleriidae lack them completely, independently of their sexual system. Comparisons among the proteins translated from the newly characterized ORFans and already known ones provide evidence of conserved structures, as well as family-specific features. Conclusions The ORFan proteins show a comparable organization of secondary structures among different families of freshwater mussels, which supports a conserved physiological role, but also have distinctive family-specific features. Given this latter observation and the fact that the ORFans can be either highly mutated or completely absent in species that secondarily lost DUI depending on their respective family, we hypothesize that some aspects of the connection among ORFans, sexual systems, and DUI may differ in the various lineages of unionids.


Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Nature Communications ◽

10.1038/s41467-019-13355-3 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Huilong Du ◽

Chengzhi Liang

Keyword(s):

Single Molecule ◽

Repetitive Sequences ◽

Tartary Buckwheat ◽

Gene Sequences ◽

Assembly Method ◽

Long Reads ◽

A Genome ◽

Genome Assemblies ◽

Reference Genomes

Abstract The abundant repetitive sequences in complex eukaryotic genomes cause fragmented assemblies, which lose value as reference genomes, often due to incomplete gene sequences and unanchored or mispositioned contigs on chromosomes. Here we report a genome assembly method, HERA, which resolves repeats efficiently by constructing a connection graph from an overlap graph. We test HERA on the genomes of rice, maize, human, and Tartary buckwheat with single-molecule sequencing and mapping data. HERA correctly assembles most of the previously unassembled regions, resulting in dramatically improved, highly contiguous genome assemblies with newly assembled gene sequences. For example, the maize contig N50 size reaches 61.2 Mb and the Tartary buckwheat genome comprises only 20 contigs. HERA can also be used to fill gaps and fix errors in reference genomes. The application of HERA will greatly improve the quality of new or existing assemblies of complex genomes.
