Hybrid error correction approach and de novo assembly for MinION sequencing long reads



Oxford Nanopore Sequencing, Hybrid Error Correction, and de novo Assembly of a Eukaryotic Genome

doi:10.1101/013490, 2015, Cited By ~ 23

Author(s): Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchajanya Deshpande, Michael Schatz, ...

Keyword(s): Error Correction, De Novo Assembly, De Novo, Correction Algorithm, Membrane Pore, Complete Representation, Oxford Nanopore, Long Read, Error Correction Algorithm, Sequencing Instrument

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5-50 kbp) at such a high error rate (between ~5% and 40%). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
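The contig N50 quoted above can be illustrated with a minimal sketch. It uses the usual definition (the length L such that contigs of length L or longer cover at least half of the total assembly length). The function name `n50` and the toy contig lengths are hypothetical and are not taken from the paper or from Nanocorr.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    return 0  # empty input: no contigs, so no N50


# Toy contig lengths (hypothetical values, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150 is half of the 300 total
```

A larger N50 means that half of the assembly sits in longer contigs, which is why the abstract uses it to compare the hybrid assembly with the Illumina-only one.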


De novo assembly of the complex genome of Nippostrongylus brasiliensis using MinION long reads

BMC Biology, doi:10.1186/s12915-017-0473-4, 2018, Vol 16 (1), Cited By ~ 16

Author(s): David Eccles, Jodie Chandler, Mali Camberis, Bernard Henrissat, Sergey Koren, ...

Keyword(s): De Novo Assembly, De Novo, Nippostrongylus Brasiliensis, Long Reads, Complex Genome


Fast and accurate de novo genome assembly from long uncorrected reads

doi:10.1101/068122, 2016, Cited By ~ 8

Author(s): Robert Vaser, Ivan Sović, Niranjan Nagarajan, Mile Šikić

Keyword(s): Error Correction, De Novo, High Quality, De Novo Genome Assembly, Consensus Sequences, Long Reads, Oxford Nanopore, Order Of Magnitude, Correction Step, Consensus Module

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error correction and consensus generation steps to obtain high-quality assemblies. We show that the error correction step can be omitted and high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order-alignment-based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets, we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. Racon is available open source under the MIT license at https://github.com/isovic/racon.git.
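To give a feel for what a consensus step produces, here is a deliberately simplified sketch: a column-wise majority vote over reads that are assumed to be already aligned to a common coordinate system, with `-` marking gaps. This is not Racon's partial order alignment and does not use Racon's code or interface. The function name `majority_consensus` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned_reads: Sequence[str], gap: str = "-") -> str:
    """Call a consensus by majority vote in each alignment column.

    Assumes every read has already been padded with `gap` characters
    to the same length. Columns where the gap wins are dropped.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the single most frequent symbol in the column.
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy pre-aligned reads (hypothetical): one substitution error, one insertion.
reads = ["ACGT-A", "ACGTTA", "AGGT-A"]
print(majority_consensus(reads))  # ACGTA
```

A vote like this only works once reads share columns. Building that shared structure from noisy long reads is the hard part, and it is the part that the partial order alignment described in the abstract addresses.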


De novo whole-genome assembly of Chrysanthemum makinoi, a key wild chrysanthemum

G3 Genes|Genomes|Genetics, doi:10.1093/g3journal/jkab358, 2021

Author(s): Natascha van Lieshout, Martijn van Kaauwen, Linda Kodde, Paul Arens, Marinus J M Smulders, ...

Keyword(s): Ab Initio, Genome Assembly, De Novo Assembly, De Novo, Its Sequence, Whole Genome, Annotation Pipeline, Long Reads, Oxford Nanopore, The World

Abstract: Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The cultivated hexaploid varieties are thought to originate from a hybrid of wild chrysanthemums, among which the diploid Chrysanthemum makinoi has been mentioned. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.


Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome

Genome Research, doi:10.1101/gr.191395.115, 2015, Vol 25 (11), pp. 1750-1756, Cited By ~ 223

Author(s): Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchajanya Deshpande, Michael C. Schatz, ...

Keyword(s): Error Correction, De Novo Assembly, De Novo, Eukaryotic Genome, Nanopore Sequencing, Oxford Nanopore


Whole-Genome Sequencing and De Novo Assembly of Malassezia pachydermatis Isolated from the Ear Canal of a Dog with Otitis

Microbiology Resource Announcements, doi:10.1128/mra.00205-21, 2021, Vol 10 (21)

Author(s): S. D’Andreano, J. Viñes, O. Francino

Keyword(s): Genome Sequencing, Genome Sequence, De Novo Assembly, De Novo, Whole Genome, Ear Canal, Malassezia Pachydermatis, Content Type, Long Reads, Genome Assemblies

We have de novo assembled the genome sequence of Malassezia pachydermatis isolated from a canine otitis sample with Nanopore-only long reads. With 99× coverage and 8.23 Mbp, the genome sequence was assembled in 10 contigs, with 6 of them corresponding to chromosomes, improving the scaffolding of previous genome assemblies for the species.
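As a small worked example of the coverage figure, mean depth is commonly computed as total sequenced bases divided by genome size. Under that assumption, which the abstract does not spell out, 99× over 8.23 Mbp would correspond to roughly 815 Mbp of reads. The function name `mean_coverage` and the toy numbers below are hypothetical.

```python
from typing import Iterable


def mean_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean sequencing depth: total bases in reads / genome size in bases."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Toy numbers (hypothetical): 990 bases of reads over a 10-base "genome".
print(mean_coverage([300, 400, 290], 10))  # 99.0
```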


Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements

PLoS ONE, doi:10.1371/journal.pone.0106689, 2014, Vol 9 (9), pp. e106689, Cited By ~ 137

Author(s): Rajiv C. McCoy, Ryan W. Taylor, Timothy A. Blauwkamp, Joanna L. Kelley, Michael Kertesz, ...

Keyword(s): Transposable Elements, De Novo Assembly, De Novo, Long Reads


A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals structural genome variation in rainbow trout

doi:10.1101/2020.12.28.424581, 2020

Author(s): Guangtu Gao, Susana Magadan, Geoffrey C. Waldbieser, Ramey C. Youngblood, Paul A. Wheeler, ...

Keyword(s): Rainbow Trout, Chromosome Number, Genome Assembly, De Novo Assembly, De Novo, Sequence Data, Haploid Chromosome Number, Long Reads, Homozygous Line, Igh Genes

Abstract: Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

Article Summary: A de-novo genome assembly was generated for the Arlee homozygous line of rainbow trout to enable identification and characterization of genome variants towards developing a rainbow trout pan-genome reference. The new assembly was generated using the PacBio sequencing technology and scaffolding with Hi-C contact maps and Bionano optical mapping. A contiguous genome assembly was obtained, with the contig and scaffold N50 over 15.6 Mb and 39 Mb, respectively, and 95% of the assembly in chromosome sequences. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes.
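The gap count and the contig/scaffold distinction above can be made concrete with a minimal sketch. It assumes the common FASTA convention that a gap inside a scaffold is written as a run of `N` characters. The source does not describe how its 438 gaps were counted, so this is an illustration only, and the function names and the toy sequence are hypothetical.

```python
import re
from typing import List

# One gap = one maximal run of N (upper or lower case) in the scaffold.
_GAP_RUN = re.compile(r"[Nn]+")


def count_gaps(scaffold: str) -> int:
    """Count maximal runs of N in a scaffold sequence."""
    return len(_GAP_RUN.findall(scaffold))


def split_contigs(scaffold: str) -> List[str]:
    """Split a scaffold into its contigs at runs of N, dropping empty pieces."""
    return [piece for piece in _GAP_RUN.split(scaffold) if piece]


# Toy scaffold (hypothetical): three contigs joined by two gaps.
toy_scaffold = "ACGTNNNNACGTNNAC"
print(count_gaps(toy_scaffold))     # 2
print(split_contigs(toy_scaffold))  # ['ACGT', 'ACGT', 'AC']
```

Contig N50 is computed over the pieces between gaps and scaffold N50 over the joined sequences, which is why the two values reported in the summary differ.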


FinisherSC: A repeat-aware tool for upgrading de-novo assembly using long reads

doi:10.1101/010215, 2014

Author(s): Ka Kit Lam, Kurt LaButti, Asif Khalak, David Tse

Keyword(s): De Novo Assembly, De Novo, Real Data, Long Reads, High Concordance

We introduce FinisherSC, which is a repeat-aware and scalable tool for upgrading de-novo assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher quality contigs than existing tools while maintaining high concordance.
