Benchmarking of long-read assemblers for prokaryote whole genome sequencing

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.0 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200119 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.0/v1.2.4 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.1.10 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.5.1 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.


Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽

10.12688/f1000research.21782.4 ◽

2021 ◽

Vol 8 ◽

pp. 2138

Author(s):

Ryan R. Wick ◽

Kathryn E. Holt

Keyword(s):

Data Sets ◽

Computationally Efficient ◽

Oxford Nanopore ◽

Long Read ◽

Sequencing Platforms ◽

Computational Resources ◽

Assembly Algorithms ◽

Oxford Nanopore Technologies ◽

Sequence Errors ◽

Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.


Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽

10.12688/f1000research.21782.1 ◽

2019 ◽

Vol 8 ◽

pp. 2138 ◽

Cited By ~ 15

Author(s):

Ryan R. Wick ◽

Kathryn E. Holt

Keyword(s):

Data Sets ◽

Computationally Efficient ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Sequencing Platforms ◽

Computational Resources ◽

Assembly Algorithms ◽

Oxford Nanopore Technologies ◽

Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.


Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽

10.12688/f1000research.21782.2 ◽

2020 ◽

Vol 8 ◽

pp. 2138 ◽

Cited By ~ 4

Author(s):

Ryan R. Wick ◽

Kathryn E. Holt

Keyword(s):

Data Sets ◽

Computationally Efficient ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Sequencing Platforms ◽

Computational Resources ◽

Assembly Algorithms ◽

Oxford Nanopore Technologies ◽

Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of seven long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.7 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 and NECAT v20200119 were the most likely to produce clean contig circularisation. Raven v0.0.8 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.4.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.


Updated Genome Sequence for the Probiotic Bacterium Bifidobacterium animalis subsp. lactis BB-12

Microbiology Resource Announcements ◽

10.1128/mra.00078-21 ◽

2021 ◽

Vol 10 (27) ◽

Author(s):

Kristian Jensen ◽

Kosai Al-Nakeeb ◽

Anna Koza ◽

Ahmad A. Zeidan

Keyword(s):

Genome Sequence ◽

Probiotic Bacterium ◽

Content Type ◽

Short Read Sequencing ◽

Bifidobacterium Animalis ◽

Oxford Nanopore ◽

Hybrid Genome ◽

Long Read ◽

Sequencing Platforms ◽

Oxford Nanopore Technologies

The genome of Bifidobacterium animalis subsp. lactis BB-12 was sequenced using Oxford Nanopore Technologies long-read and Illumina short-read sequencing platforms. A hybrid genome assembly approach was used to construct an updated complete genome sequence for BB-12 containing 1,944,152 bp, with a G+C content of 60.5% and 1,615 genes.


Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device

10.1101/2021.07.01.450747 ◽

2021 ◽

Author(s):

Courtney L. Hall ◽

Rupesh K. Kesharwani ◽

Nicole R. Phillips ◽

John V. Planz ◽

Fritz J. Sedlazeck ◽

...

Keyword(s):

Specific Method ◽

Nanopore Sequencing ◽

Autosomal Strs ◽

Str Typing ◽

Str Loci ◽

Oxford Nanopore ◽

Long Read ◽

Flanking Region ◽

Sequencing Platforms ◽

Oxford Nanopore Technologies

The high variability characteristic of short tandem repeat (STR) markers is harnessed for human identification in forensic genetic analyses. Despite the power and reliability of current typing techniques, sequence-level information both within and around STRs is masked in the length-based profiles generated. Forensic STR typing using next generation sequencing (NGS) has therefore gained attention as an alternative to traditional capillary electrophoresis (CE) approaches. In this proof-of-principle study, we evaluate the forensic applicability of the newest and smallest NGS platform available: the Oxford Nanopore Technologies (ONT) MinION device. Although nanopore sequencing on the handheld MinION offers numerous advantages, including on-site sample processing, the relatively high error rate and lack of forensic-specific analysis software have prevented accurate profiling across STR panels in previous studies. Here we present STRspy, a streamlined method capable of producing length- and sequence-based STR allele designations from noisy, long-read data. To demonstrate the capabilities of STRspy, seven reference samples (female: n = 2; male: n = 5) were amplified at 15 and 30 PCR cycles using the Promega PowerSeq 46GY System and sequenced on the ONT MinION device in triplicate. Basecalled reads were processed with STRspy using a custom database containing alleles reported in the STRSeq BioProject NIST 1036 dataset. Resultant STR allele designations and flanking region single nucleotide polymorphism (SNP) calls were compared to the manufacturer-validated genotypes for each sample. STRspy generated robust and reliable genotypes across all autosomal STR loci amplified with 30 PCR cycles, achieving 100% concordance based on both length and sequence. Furthermore, we were able to identify flanking region SNPs with >90% accuracy. These results demonstrate that nanopore sequencing platforms are capable of revealing additional variation in and around STR loci depending on read coverage. As the first long-read platform-specific method to successfully profile the entire panel of autosomal STRs amplified by a commercially available multiplex, STRspy significantly increases the feasibility of nanopore sequencing in forensic applications.


Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

10.1101/071282 ◽

2016 ◽

Cited By ~ 96

Author(s):

Sergey Koren ◽

Brian P. Walenz ◽

Konstantin Berlin ◽

Jason R. Miller ◽

Nicholas H. Bergman ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Error Rates ◽

Celera Assembler ◽

Oxford Nanopore ◽

Long Read ◽

Reference Quality ◽

Order Of Magnitude ◽

Assembly Algorithms ◽

Oxford Nanopore Technologies

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.


SiLiCO: A Simulator of Long Read Sequencing in PacBio and Oxford Nanopore

10.1101/076901 ◽

2016 ◽

Cited By ~ 2

Author(s):

Ethan Alexander García Baker ◽

Sara Goodwin ◽

W. Richard McCombie ◽

Olivia Mendivil Ramos

Keyword(s):

Reference Data ◽

Supplementary Information ◽

Data Sets ◽

Simulation Tool ◽

Supplementary Data ◽

Structural Variants ◽

Oxford Nanopore ◽

Long Read ◽

Sequencing Platforms ◽

Core Facilities

Summary: Long read sequencing platforms, which include the widely used Pacific Biosciences (PacBio) platform and the emerging Oxford Nanopore platform, aim to produce sequence fragments in excess of 15-20 kilobases, and have proved advantageous in the identification of structural variants and easing genome assembly. However, long read sequencing remains relatively expensive and error prone, and failed sequencing runs represent a significant problem for genomics core facilities. To quantitatively assess the underlying mechanics of sequencing failure, it is essential to have highly reproducible and controllable reference data sets to which sequencing results can be compared. Here, we present SiLiCO, the first in silico simulation tool to generate standardized sequencing results from both of the leading long read sequencing platforms. Availability: SiLiCO is an open source package written in Python. It is freely available at https://www.github.com/ethanagbaker/SiLiCO under the GNU GPL 3.0 license. Supplementary information: Supplementary data are available at Bioinformatics online.


Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

10.21203/rs.3.rs-637036/v1 ◽

2021 ◽

Author(s):

Gábor Torma ◽

Dóra Tombácz ◽

Norbert Moldován ◽

Ádám Fülöp ◽

István Prazsák ◽

...

Keyword(s):

Protein Coding ◽

Rna Molecules ◽

Non Coding Rna ◽

Oxford Nanopore ◽

The Pacific ◽

Viral Genes ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Overlapping Transcripts

In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.


Microbial diversity characterization of seawater in a pilot study using Oxford Nanopore Technologies long-read sequencing

10.21203/rs.3.rs-17068/v2 ◽

2020 ◽

Author(s):

Michael Liem ◽

Tonny Regensburg-Tuïnk ◽

Christiaan Henkel ◽

Hans Jansen ◽

Herman Spaink

Keyword(s):

Microbial Diversity ◽

Environmental Samples ◽

Sea Water ◽

Flow Cells ◽

Oxford Nanopore ◽

Challenging Tasks ◽

Long Read ◽

Close Relatives ◽

Oxford Nanopore Technologies

Objective: Currently the majority of non-culturable microbes in seawater are yet to be discovered. Nanopore sequencing offers a way to overcome the challenging task of identifying the genomes and complex composition of oceanic microbiomes. In this study we evaluate the utility of Oxford Nanopore Technologies (ONT) sequencing to characterize microbial diversity in seawater from multiple locations. We compared the microbial species diversity of environmental samples retrieved from two different locations and time points. Results: With only three ONT flow cells we were able to identify thousands of organisms, including bacteriophages, a large part of them at species level. It was possible to assemble genomes from environmental samples with Flye. In several cases this resulted in >1 Mbp contigs, and in the particular case of a Thioglobus singularis species it even produced a near-complete genome. k-mer analysis reveals that a large part of the data represents species of which close relatives have not yet been deposited to the database. These results show that our approach is suitable for scalable genomic investigations such as monitoring oceanic biodiversity and provides a new platform for education in biodiversity.


Plasmidome analysis of carbapenem-resistant Enterobacteriaceae isolated in Vietnam

10.1101/2020.03.18.996710 ◽

2020 ◽

Author(s):

Aki Hirabayashi ◽

Koji Yahara ◽

Satomi Mitsuhashi ◽

So Nakagawa ◽

Tadashi Imanishi ◽

...

Keyword(s):

Carbapenem Resistance ◽

Genomic Epidemiology ◽

Carbapenem Resistant ◽

Oxford Nanopore ◽

Carbapenemase Gene ◽

Long Read ◽

Severe Infections ◽

Oxford Nanopore Technologies ◽

Carbapenem Resistant Enterobacteriaceae

Carbapenem-resistant Enterobacteriaceae (CRE) represent a serious threat to public health due to limited management of severe infections and high mortality. The rate of resistance of Enterobacteriaceae isolates to major antimicrobials, including carbapenems, is much higher in Vietnam than in Western countries, but the reasons remain unknown due to the lack of genomic epidemiology research. A previous study suggested that carbapenem resistance genes, such as the carbapenemase gene blaNDM-1, spread via plasmids among Enterobacteriaceae in Vietnam. In this study, we performed detection and molecular characterization of blaNDM-1-carrying plasmids in CRE isolated in Vietnam, and identified several possible cases of horizontal transfer of plasmids both within and among species of bacteria. Twenty-five carbapenem-resistant isolates from Enterobacteriaceae clinically isolated in a reference medical institution in Hanoi were sequenced on Illumina short-read sequencers, and 12 isolates harboring blaNDM-1 were sequenced on an Oxford Nanopore Technologies long-read sequencer to obtain complete plasmid sequences. Most of the plasmids co-carried genes conferring resistance to clinically relevant antimicrobials, including third-generation cephalosporins, aminoglycosides, and fluoroquinolones, in addition to blaNDM-1, leading to multidrug resistance of their bacterial hosts. These results provide insight into the genetic basis of CRE in Vietnam, and could help control nosocomial infections.
