SiLiCO: A Simulator of Long Read Sequencing in PacBio and Oxford Nanopore

Abstract Summary Long read sequencing platforms, which include the widely used Pacific Biosciences (PacBio) platform and the emerging Oxford Nanopore platform, aim to produce sequence fragments in excess of 15-20 kilobases, and have proved advantageous in the identification of structural variants and easing genome assembly. However, long read sequencing remains relatively expensive and error prone, and failed sequencing runs represent a significant problem for genomics core facilities. To quantitatively assess the underlying mechanics of sequencing failure, it is essential to have highly reproducible and controllable reference data sets to which sequencing results can be compared. Here, we present SiLiCO, the first in silico simulation tool to generate standardized sequencing results from both of the leading long read sequencing platforms. Availability SiLiCO is an open source package written in Python. It is freely available at https://www.github.com/ethanagbaker/SiLiCO under the GNU GPL 3.0 license. Contact <emails> Supplementary information Supplementary data are available at Bioinformatics online.

Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽ 10.12688/f1000research.21782.1 ◽ 2019 ◽ Vol 8 ◽ pp. 2138 ◽ Cited By ~ 15

Author(s): Ryan R. Wick ◽ Kathryn E. Holt

Keyword(s): Data Sets ◽ Computationally Efficient ◽ Short Read Sequencing ◽ Oxford Nanopore ◽ Long Read ◽ Sequencing Platforms ◽ Computational Resources ◽ Assembly Algorithms ◽ Oxford Nanopore Technologies ◽ Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.

Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽ 10.12688/f1000research.21782.3 ◽ 2020 ◽ Vol 8 ◽ pp. 2138 ◽ Cited By ~ 2

Author(s): Ryan R. Wick ◽ Kathryn E. Holt

Keyword(s): Data Sets ◽ Computationally Efficient ◽ Oxford Nanopore ◽ Long Read ◽ Sequencing Platforms ◽ Computational Resources ◽ Assembly Algorithms ◽ Oxford Nanopore Technologies ◽ Sequence Errors ◽ Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.0 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200119 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.0/v1.2.4 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.1.10 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.5.1 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.

Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽ 10.12688/f1000research.21782.2 ◽ 2020 ◽ Vol 8 ◽ pp. 2138 ◽ Cited By ~ 4

Author(s): Ryan R. Wick ◽ Kathryn E. Holt

Keyword(s): Data Sets ◽ Computationally Efficient ◽ Short Read Sequencing ◽ Oxford Nanopore ◽ Long Read ◽ Sequencing Platforms ◽ Computational Resources ◽ Assembly Algorithms ◽ Oxford Nanopore Technologies ◽ Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of seven long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.7 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 and NECAT v20200119 were the most likely to produce clean contig circularisation. Raven v0.0.8 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.4.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.

Benchmarking of long-read assemblers for prokaryote whole genome sequencing

F1000Research ◽ 10.12688/f1000research.21782.4 ◽ 2021 ◽ Vol 8 ◽ pp. 2138

Author(s): Ryan R. Wick ◽ Kathryn E. Holt

Keyword(s): Data Sets ◽ Computationally Efficient ◽ Oxford Nanopore ◽ Long Read ◽ Sequencing Platforms ◽ Computational Resources ◽ Assembly Algorithms ◽ Oxford Nanopore Technologies ◽ Sequence Errors ◽ Multiple Assembly

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.

Ribbon: intuitive visualization for complex genomic variation

Bioinformatics ◽ 10.1093/bioinformatics/btaa680 ◽ 2020 ◽ Cited By ~ 5

Author(s): Maria Nattestad ◽ Robert Aboukhalil ◽ Chen-Shan Chin ◽ Michael C Schatz

Keyword(s): Genomic Variation ◽ Supplementary Information ◽ Visualization Tool ◽ Visualization Method ◽ Structural Variants ◽ Long Read ◽ Complex Structural ◽ Intuitive View ◽ Genome Comparisons ◽ Shed Light

Abstract Summary Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. Availability and implementation Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. Supplementary information Supplementary data are available at Bioinformatics online.

TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats

Bioinformatics ◽ 10.1093/bioinformatics/btaa440 ◽ 2020 ◽ Vol 36 (Supplement_1) ◽ pp. i75-i83 ◽ Cited By ~ 5

Author(s): Alla Mikheenko ◽ Andrey V Bzikadze ◽ Alexey Gurevich ◽ Karen H Miga ◽ Pavel A Pevzner

Keyword(s): Quality Assessment ◽ Chromosome Segregation ◽ Tandem Repeats ◽ Supplementary Information ◽ Supplementary Data ◽ Assembly Quality ◽ Cellular Processes ◽ Long Reads ◽ Long Read ◽ Eukaryotic Genomes

Abstract Motivation Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there are no tools for their quality assessment. Moreover, since the mapping of error-prone reads to ETRs remains an open problem, it is not clear how to polish draft ETR assemblies. Results To address these problems, we developed the TandemTools software that includes the TandemMapper tool for mapping reads to ETRs and the TandemQUAST tool for polishing ETR assemblies and their quality assessment. We demonstrate that TandemTools not only reveals errors in ETR assemblies but also improves the recently generated assemblies of human centromeres. Availability and implementation https://github.com/ablab/TandemTools. Supplementary information Supplementary data are available at Bioinformatics online.

Updated Genome Sequence for the Probiotic Bacterium Bifidobacterium animalis subsp. lactis BB-12

Microbiology Resource Announcements ◽ 10.1128/mra.00078-21 ◽ 2021 ◽ Vol 10 (27)

Author(s): Kristian Jensen ◽ Kosai Al-Nakeeb ◽ Anna Koza ◽ Ahmad A. Zeidan

Keyword(s): Genome Sequence ◽ Probiotic Bacterium ◽ Content Type ◽ Short Read Sequencing ◽ Bifidobacterium Animalis ◽ Oxford Nanopore ◽ Hybrid Genome ◽ Long Read ◽ Sequencing Platforms ◽ Oxford Nanopore Technologies

The genome of Bifidobacterium animalis subsp. lactis BB-12 was sequenced using Oxford Nanopore Technologies long-read and Illumina short-read sequencing platforms. A hybrid genome assembly approach was used to construct an updated complete genome sequence for BB-12 containing 1,944,152 bp, with a G+C content of 60.5% and 1,615 genes.
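
The G+C content reported in this abstract is the percentage of G and C bases in the assembled sequence. A minimal Python sketch of that calculation is given here for illustration; the function name `gc_content` and the toy sequence are assumptions made for this example and are not taken from the authors' pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()  # accept lower-case (e.g. soft-masked) input
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Toy example (hypothetical sequence, not BB-12 data): 3 of 5 bases are G or C.
toy = "GGCAT"
print(f"{gc_content(toy):.1f}%")
```

For a real assembly, the same function could be applied to the concatenated contig sequences read from a FASTA file.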

Population genomic evidence of selection on structural variants in a natural hybrid zone

10.1101/2022.01.14.476419 ◽ 2022

Author(s): Linyi Zhang ◽ Samridhi Chaturvedi ◽ Chris Nice ◽ Lauren Lucas ◽ Zachariah Gompert

Keyword(s): Reproductive Isolation ◽ Hybrid Zone ◽ Structural Variants ◽ Hybrid Fitness ◽ Genome Wide ◽ Oxford Nanopore ◽ Jackson Hole ◽ Long Read ◽ Genomic Regions

Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Whereas examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long-read (Oxford Nanopore) whole-genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome-wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry-informative SVs exhibiting genomic clines that deviated from null expectations based on genome-average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection on average than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess introgression from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z-linked SVs and inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, but especially deletions, might contribute disproportionately to hybrid fitness and thus (partial) reproductive isolation.

BleTIES: Annotation of natural genome editing in ciliates using long read sequencing

10.1101/2021.05.18.444610 ◽ 2021

Author(s): Brandon K. B. Seah ◽ Estienne C. Swart

Keyword(s): Dna Sequences ◽ Sequence Data ◽ Low Complexity ◽ Supplementary Information ◽ Neighboring Element ◽ Sequencing Technologies ◽ Long Reads ◽ Oxford Nanopore ◽ Long Read ◽ Element Elimination

Ciliates are single-celled eukaryotes that eliminate specific, interspersed DNA sequences (internally eliminated sequences, IESs) from their genomes during development. These are challenging to annotate and assemble because IES-containing sequences are much less abundant in the cell than those without, and IES sequences themselves often contain repetitive and low-complexity sequences. Long read sequencing technologies from Pacific Biosciences and Oxford Nanopore have the potential to reconstruct longer IESs than has been possible with short reads, and also the ability to detect correlations of neighboring element elimination. Here we present BleTIES, a software toolkit for detecting, assembling, and analyzing IESs using mapped long reads. Availability and implementation: BleTIES is implemented in Python 3. Source code is available at https://github.com/Swart-lab/bleties (MIT license), and also distributed via Bioconda. Contact: [email protected] Supplementary information: Benchmarking of BleTIES with published sequence data.

Nanopype: a modular and scalable nanopore data processing pipeline

Bioinformatics ◽ 10.1093/bioinformatics/btz461 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4770-4772

Author(s): Pay Giesselmann ◽ Sara Hetzel ◽ Franz-Josef Müller ◽ Alexander Meissner ◽ Helene Kretzmer

Keyword(s): Data Processing ◽ Supplementary Information ◽ Nanopore Sequencing ◽ Third Generation ◽ Supplementary Data ◽ Seamless Integration ◽ Short Read ◽ Processing Pipeline ◽ Bioinformatics Software ◽ Long Read

Abstract Summary Long-read third-generation nanopore sequencing enables researchers to now address a range of questions that are difficult to tackle with short read approaches. The rapidly expanding user base and continuously increasing throughput have sparked the development of a growing number of specialized analysis tools. However, streamlined processing of nanopore datasets using reproducible and transparent workflows is still lacking. Here we present Nanopype, a nanopore data processing pipeline that integrates a diverse set of established bioinformatics software while maintaining consistent and standardized output formats. Seamless integration into compute cluster environments makes the framework suitable for high-throughput applications. As a result, Nanopype facilitates comparability of nanopore data analysis workflows and thereby should enhance the reproducibility of biological insights. Availability and implementation https://github.com/giesselmann/nanopype, https://nanopype.readthedocs.io. Supplementary information Supplementary data are available at Bioinformatics online.
