celera assembler
Recently Published Documents



2018 ◽  
Author(s):  
Justin Jiang ◽  
Andrea M. Quattrini ◽  
Warren R. Francis ◽  
Joseph F. Ryan ◽  
Estefanía Rodríguez ◽  
...  

Background: Over 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral "forests" which provide unique niches and three-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several cnidarian genomes are currently available, the majority are from hexacorals. Here, we present a de novo assembly of the R. muelleri genome, making this the first complete draft genome from an octocoral.

Findings: We generated a hybrid de novo assembly using the Maryland Super-Read Celera Assembler v.3.2.6 (MaSuRCA). The final assembly included 4,825 scaffolds and a haploid genome size of 172 Mb. A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the UniProt database. Although the R. muelleri genome is smaller (172 Mb) than other publicly available hexacoral genomes (256-448 Mb), it is similar to the hexacoral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models.

Conclusions: The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity.


2016 ◽  
Author(s):  
Chuan-Le Xiao ◽  
Ying Chen ◽  
Shang-qian Xie ◽  
Kai-Ning Chen ◽  
Yan Wang ◽  
...  

The high computational cost of current assembly methods for long, noisy single-molecule sequencing (SMS) reads has prevented them from assembling large genomes. We introduce an ultra-fast alignment method based on a novel global alignment score. For large human SMS data, our method is 7X faster than MHAP for pairwise alignment and 15X faster than BLASR for reference mapping. We develop a Mapping, Error Correction and de novo Assembly Tool (MECAT) by integrating our new alignment and error correction methods with the Celera Assembler. MECAT is capable of producing high-quality de novo assemblies of large genomes from SMS reads at low computational cost. MECAT produces reference-quality assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster, and reconstructs the human CHM1 genome with 15% longer NG50 in only 7600 CPU core hours using 54X SMS reads and a Chinese Han genome in 19200 CPU core hours using 102X SMS reads.


2016 ◽  
Author(s):  
Sergey Koren ◽  
Brian P. Walenz ◽  
Konstantin Berlin ◽  
Jason R. Miller ◽  
Nicholas H. Bergman ◽  
...  

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
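
The contig NG50 quoted in this abstract is a standard continuity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size. The sketch below is only an illustration of that definition, not code from Canu; the function name `ng50` and the toy contig lengths are hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it is undefined.

    contig_lengths: iterable of contig lengths (in bases).
    genome_size: estimated genome size (in bases), not the assembly size.
    """
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    # The assembly covers less than half of the estimated genome.
    return None


if __name__ == "__main__":
    toy_contigs = [50, 40, 30, 20, 10]  # hypothetical lengths
    print(ng50(toy_contigs, genome_size=200))
    # Passing the total assembly length instead gives the ordinary N50.
    print(ng50(toy_contigs, genome_size=sum(toy_contigs)))
```

Because NG50 is normalized by the genome size rather than by the assembly length, it stays comparable across assemblies of the same genome that differ in total size.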


2016 ◽  
Author(s):  
A. Bernardo Carvalho ◽  
Eduardo G Dupim ◽  
Gabriel Nassar

Genome assembly depends critically on read length. Two recent technologies, PacBio and Oxford Nanopore, produce read lengths above 20 kb, which yield genome assemblies that are vastly superior to those based on Sanger or short reads. However, the very high error rates of both technologies (around 15%-20%) make assembly computationally expensive and imprecise at repeats longer than the read length. Here we show that the efficiency and quality of the assembly of these noisy reads can be significantly improved at minimal cost by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in the Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error-free, read overlap sensitivity is dramatically increased. Equally important, the validation procedure can be extended to exclude repetitive k-mers, which avoids read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure with one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is likely to yield analogous improvements with alternative long-read technologies and overlappers, such as Oxford Nanopore and BLASR/DAligner.
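
The k-mer validation idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmers`, `trusted_kmers`, `validated_seeds`), the tiny k value, and the toy reads are all hypothetical, and reverse complements and repeat filtering are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def trusted_kmers(short_reads, k):
    """Collect every k-mer observed in the accurate short reads."""
    trusted = set()
    for read in short_reads:
        trusted |= kmers(read, k)
    return trusted


def validated_seeds(long_read, trusted, k):
    """Return (position, k-mer) pairs of the long read whose k-mer is trusted.

    K-mers absent from the short reads are treated as sequencing errors
    and are not used as seeds.
    """
    seeds = []
    for i in range(len(long_read) - k + 1):
        kmer = long_read[i:i + k]
        if kmer in trusted:
            seeds.append((i, kmer))
    return seeds


if __name__ == "__main__":
    k = 4  # toy value; real overlappers use longer k-mers
    short_reads = ["ACGTACGT"]   # hypothetical accurate short read
    noisy_long_read = "ACGTTACG"  # hypothetical long read with one insertion
    trusted = trusted_kmers(short_reads, k)
    print(validated_seeds(noisy_long_read, trusted, k))
```

Only the seeds that survive this filter would be passed to the overlap step, so k-mers spanning the erroneous base never contribute spurious or missed matches.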


2015 ◽  
Author(s):  
Nicholas James Loman ◽  
Joshua Quick ◽  
Jared T Simpson

A method for de novo assembly of data from the Oxford Nanopore MinION instrument is presented which is able to reconstruct the sequence of an entire bacterial chromosome in a single contig. Initially, overlaps between nanopore reads are detected. Reads are then subjected to one or more rounds of error correction by a multiple alignment process employing partial order graphs. After correction, reads are assembled using the Celera Assembler. Finally, the assembly is polished using signal-level data from the nanopore employing a novel hidden Markov model. We show that this method is able to assemble nanopore reads from Escherichia coli K-12 MG1655 into a single contig of length 4.6 Mb, permitting a full reconstruction of gene order. The resulting draft assembly has 98.4% nucleotide identity compared to the finished reference genome. After polishing the assembly with our signal-level HMM, the nucleotide identity is improved to 99.4%. We show that MinION sequencing data can be used to reconstruct genomes without the need for a reference sequence or data from other sequencing platforms.


2014 ◽  
Author(s):  
Konstantin Berlin ◽  
Sergey Koren ◽  
Chen-Shan Chin ◽  
James Drake ◽  
Jane M Landolin ◽  
...  

We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and human from high-coverage SMRT sequencing. The resulting assemblies include fully resolved chromosome arms and close persistent gaps in these important reference genomes, including heterochromatic and telomeric transition sequences. For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars. These results demonstrate that single-molecule sequencing alone can produce near-complete eukaryotic genomes at modest cost.
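
The locality-sensitive hashing behind MHAP rests on the MinHash estimate of Jaccard similarity between the k-mer sets of two reads. The sketch below shows that general technique only; it is not MHAP's implementation, and the function names, the use of `hashlib.blake2b` as the hash family, and the toy parameters are assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k, num_hashes):
    """Keep, for each of num_hashes hash functions, the minimum k-mer hash.

    Assumes len(seq) >= k so that the k-mer set is not empty.
    """
    kmer_set = kmers(seq, k)
    return [min(seeded_hash(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    read_a = "ACGTACGTTAGCCGATAGGCTA"  # hypothetical reads
    read_b = "CGTACGTTAGCCGATAGGCTAC"
    sk_a = minhash_sketch(read_a, k=5, num_hashes=64)
    sk_b = minhash_sketch(read_b, k=5, num_hashes=64)
    print(estimate_jaccard(sk_a, sk_b))
```

Comparing two fixed-size sketches is much cheaper than aligning the reads themselves, so candidate overlaps can be screened quickly and only promising pairs passed on to a more careful overlap check.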


2008 ◽  
Vol 24 (8) ◽  
pp. 1035-1040 ◽  
Author(s):  
G. Denisov ◽  
B. Walenz ◽  
A. L. Halpern ◽  
J. Miller ◽  
N. Axelrod ◽  
...  
