Celera Assembler: Latest Research Papers

A Hybrid de novo Assembly of the Sea Pansy (Renilla muelleri) Genome

Abstract

Background: Over 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral "forests", which provide unique niches and three-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several cnidarian genomes are currently available, the majority are from hexacorals. Here, we present a de novo assembly of the R. muelleri genome, making this the first complete draft genome from an octocoral.

Findings: We generated a hybrid de novo assembly using the Maryland Super-Read Celera Assembler v.3.2.6 (MaSuRCA). The final assembly included 4,825 scaffolds and a haploid genome size of 172 Mb. A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the UniProt database. Although the R. muelleri genome is smaller (172 Mb) than other publicly available hexacoral genomes (256-448 Mb), it is similar to the hexacoral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models.

Conclusions: The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity.

MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

DOI: 10.1101/089250
Year: 2016
Cited By: ~2
Author(s): Chuan-Le Xiao, Ying Chen, Shang-qian Xie, Kai-Ning Chen, Yan Wang, ...
Keyword(s): Error Correction, Single Molecule, De Novo, Computational Cost, Pairwise Alignment, Global Alignment, Chinese Han, Celera Assembler, Reference Quality, Molecular Sequencing

Abstract: The high computational cost of current assembly methods for long, noisy single-molecule sequencing (SMS) reads has prevented them from assembling large genomes. We introduce an ultra-fast alignment method based on a novel global alignment score. For large human SMS data, our method is 7X faster than MHAP for pairwise alignment and 15X faster than BLASR for reference mapping. We develop a Mapping, Error Correction and de novo Assembly Tool (MECAT) by integrating our new alignment and error correction methods with the Celera Assembler. MECAT is capable of producing high-quality de novo assemblies of large genomes from SMS reads at low computational cost. MECAT produces reference-quality assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster, and reconstructs the human CHM1 genome with 15% longer NG50 in only 7600 CPU core hours using 54X SMS reads and a Chinese Han genome in 19200 CPU core hours using 102X SMS reads.

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

DOI: 10.1101/071282
Year: 2016
Cited By: ~96
Author(s): Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason R. Miller, Nicholas H. Bergman, ...
Keyword(s): Single Molecule, De Novo, Error Rates, Celera Assembler, Oxford Nanopore, Long Read, Reference Quality, Order Of Magnitude, Assembly Algorithms, Oxford Nanopore Technologies

Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
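
The contig NG50 quoted in this abstract is a standard continuity metric: the contig length at which the running total of contig lengths, taken longest first, reaches half of the estimated genome size. The following is a minimal sketch of that definition only, not code from Canu; the function name `ng50` and the toy contig lengths are illustrative assumptions.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it is undefined.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the (estimated) genome size.
    """
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    # The assembly covers less than half of the genome size.
    return None


if __name__ == "__main__":
    toy_contigs = [40, 30, 20, 10]  # hypothetical contig lengths
    print(ng50(toy_contigs, genome_size=120))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so assemblies of the same genome can be compared even when their total lengths differ.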

Improved assembly of noisy long reads by k-mer validation

DOI: 10.1101/053256
Year: 2016
Author(s): A. Bernardo Carvalho, Eduardo G Dupim, Gabriel Nassar
Keyword(s): Low Cost, Error Rates, Read Length, Celera Assembler, Short Reads, Sequencing Errors, Long Reads, Oxford Nanopore, Long Read, Validation Procedure

Genome assembly depends critically on read length. Two recent technologies, PacBio and Oxford Nanopore, produce read lengths above 20 kb, which yield genome assemblies that are vastly superior to those based on Sanger or short reads. However, the very high error rates of both technologies (around 15%-20%) make assembly computationally expensive and imprecise at repeats longer than the read length. Here we show that the efficiency and quality of the assembly of these noisy reads can be significantly improved at minimal cost by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in the Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error-free, read overlap sensitivity is dramatically increased. Equally important, the validation procedure can be extended to exclude repetitive k-mers, which avoids read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure with one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is likely to yield analogous improvements with alternative long-read technologies and overlappers, such as Oxford Nanopore and BLASR/DAligner.
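
The validation idea in this abstract can be illustrated with a small sketch: collect the k-mers seen in accurate short reads, then keep only those long-read k-mers that appear in that set as candidate alignment seeds. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the toy reads, and the `max_count` cutoff for repetitive k-mers are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Return all k-mers of seq in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trusted_kmers(short_reads, k, max_count=None):
    """Build the set of k-mers observed in the short reads.

    If max_count is given, k-mers seen more than max_count times are
    dropped, as a crude stand-in for excluding repetitive k-mers.
    """
    counts = Counter()
    for read in short_reads:
        counts.update(kmers(read, k))
    return {
        kmer for kmer, n in counts.items()
        if max_count is None or n <= max_count
    }


def validated_seeds(long_read, trusted, k):
    """Return (position, k-mer) pairs of the long read found in trusted."""
    return [
        (pos, kmer)
        for pos, kmer in enumerate(kmers(long_read, k))
        if kmer in trusted
    ]


if __name__ == "__main__":
    short_reads = ["ACGTAC", "GTACGG"]   # toy accurate reads
    noisy_long_read = "ACGTTACGG"        # toy read with one inserted base
    trusted = trusted_kmers(short_reads, k=4)
    for pos, kmer in validated_seeds(noisy_long_read, trusted, k=4):
        print(pos, kmer)
```

In this toy case the k-mers spanning the inserted base are absent from the short reads and are dropped, so only k-mers that the short reads confirm remain as seeds.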

A complete bacterial genome assembled de novo using only nanopore sequencing data

DOI: 10.1101/015552
Year: 2015
Cited By: ~10
Author(s): Nicholas James Loman, Joshua Quick, Jared T Simpson
Keyword(s): De Novo, Bacterial Genome, Reference Sequence, Nucleotide Identity, Sequencing Data, Celera Assembler, Level Data, Complete Bacterial Genome, Sequencing Platforms, K 12

A method for de novo assembly of data from the Oxford Nanopore MinION instrument is presented which is able to reconstruct the sequence of an entire bacterial chromosome in a single contig. Initially, overlaps between nanopore reads are detected. Reads are then subjected to one or more rounds of error correction by a multiple alignment process employing partial order graphs. After correction, reads are assembled using the Celera Assembler. Finally, the assembly is polished using signal-level data from the nanopore employing a novel hidden Markov model. We show that this method is able to assemble nanopore reads from Escherichia coli K-12 MG1655 into a single contig of length 4.6 Mb, permitting a full reconstruction of gene order. The resulting draft assembly has 98.4% nucleotide identity compared to the finished reference genome. After polishing the assembly with our signal-level HMM, the nucleotide identity is improved to 99.4%. We show that MinION sequencing data can be used to reconstruct genomes without the need for a reference sequence or data from other sequencing platforms.

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing

DOI: 10.1101/008003
Year: 2014
Cited By: ~13
Author(s): Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James Drake, Jane M Landolin, ...
Keyword(s): Single Molecule, De Novo, Locality Sensitive Hashing, Model Organisms, Smrt Sequencing, High Coverage, Celera Assembler, Single Molecule Sequencing, Long Reads, Long Read

We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and human from high-coverage SMRT sequencing. The resulting assemblies include fully resolved chromosome arms and close persistent gaps in these important reference genomes, including heterochromatic and telomeric transition sequences. For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars. These results demonstrate that single-molecule sequencing alone can produce near-complete eukaryotic genomes at modest cost.
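
The locality-sensitive hashing idea behind MinHash can be sketched in a few lines: each read is reduced to a short sketch of minimum hash values over its k-mers, and the fraction of matching sketch positions estimates the Jaccard similarity of the two k-mer sets. The code below is a generic textbook MinHash, not MHAP's implementation; the function names, the choice of BLAKE2b as the hash, and the parameter values are assumptions made for illustration.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (requires len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; each seed acts as a separate hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k, num_hashes):
    """Return the MinHash sketch: the minimum hash value per hash function."""
    ks = kmer_set(seq, k)
    return [min(seeded_hash(kmer, seed) for kmer in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch entries."""
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)


if __name__ == "__main__":
    read_a = "ACGTACGTTAGCCGATAGGCTA"   # toy reads
    read_b = "ACGTACGTTAGCCGTTAGGCTA"   # same read with one substitution
    sk_a = minhash_sketch(read_a, k=5, num_hashes=128)
    sk_b = minhash_sketch(read_b, k=5, num_hashes=128)
    print(estimate_jaccard(sk_a, sk_b))
```

Because only the fixed-size sketches are compared, candidate overlaps can be screened without aligning every pair of reads, which is the source of the speedup that this kind of approach aims for.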

Sergey Koren on "Celera Assembler and Automated Finishing"

Source: SciVee
DOI: 10.4016/11500.01
Year: 2009
Keyword(s): Celera Assembler

Consensus generation and variant detection by Celera Assembler

Source: Bioinformatics, Vol 24 (8), pp. 1035-1040
DOI: 10.1093/bioinformatics/btn074
Year: 2008
Cited By: ~50
Author(s): G. Denisov, B. Walenz, A. L. Halpern, J. Miller, N. Axelrod, ...
Keyword(s): Celera Assembler, Variant Detection

Celera Assembler
Recently Published Documents

A Hybrid de novo Assembly of the Sea Pansy (Renilla muelleri) Genome

MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

Improved assembly of noisy long reads by k-mer validation

A complete bacterial genome assembled de novo using only nanopore sequencing data

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing

Sergey Koren on "Celera Assembler and Automated Finishing"

Consensus generation and variant detection by Celera Assembler
