Comparison of long read sequencing technologies in resolving bacteria and fly genomes

Abstract: Background: The newest generation of DNA sequencing technology is highlighted by the ability to sequence reads hundreds of kilobases in length, and the increased availability of long read data has democratized the genome sequencing and assembly process. PacBio and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. Released in 2019, the PacBio Sequel II platform advertises substantial enhancements over previous PacBio systems. Results: We used whole-genome sequencing data produced by two PacBio platforms (Sequel II and RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. Sequel II assemblies had higher contiguity and consensus accuracy relative to other methods, even after accounting for differences in sequencing throughput. ONT RAPID libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assemblies or combined ONT and Sequel II libraries for eukaryotic genome assemblies. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. Conclusions: The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.

Comparison of long-read sequencing technologies in interrogating bacteria and fly genomes

G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab083; 2021

Author(s): Eric S Tvedte, Mark Gasser, Benjamin C Sparklin, Jane Michalski, Carl E Hjelmen, ...

Keyword(s): Bacterial Genome, Hybrid Approach, Cost Effective, Fruit Fly, Drosophila Ananassae, Whole Genome Sequencing Data, Sequencing Data, E Coli, Hybrid Approaches, Long Read

Abstract: The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR produced the longest reads compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.

Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish

GigaScience; doi:10.1093/gigascience/giaa067; 2020; Vol 9 (6); Cited By ~ 3

Author(s): Lisa K Johnson, Ruta Sahasrabudhe, James Anthony Gill, Jennifer L Roach, Lutz Froenicke, ...

Keyword(s): North American, De Novo, Draft Genome, Whole Genome Sequencing Data, Sequencing Data, Sequence Coverage, Short Read, Oxford Nanopore, Long Read, Genome Assemblies

Abstract: Background: Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms. Findings: Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30–45× sequence coverage, and the Illumina platform was used to generate 50–160× sequence coverage. Illumina-only assemblies were fragmented with high numbers of contigs, while ONT-only assemblies were error prone with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, were from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database. Conclusions: High-quality genomes can be obtained by using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
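
The N50 values quoted in this abstract are a standard contiguity metric: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size. The sketch below illustrates that definition only. The function name `n50` and the example lengths are hypothetical and are not taken from the paper, which does not state which tool computed its values.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if there are none)."""
    # Walk contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (arbitrary units), for illustration only.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example))
```

For the hypothetical lengths above, the total is 32, and the two longest contigs (8 + 8) already reach the halfway point of 16, so the N50 is 8. A fragmented assembly with many short contigs gives a low N50 even if its total size is the same.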

Comparison of MiSeq, MinION, and hybrid genome sequencing for analysis of Campylobacter jejuni

Scientific Reports; doi:10.1038/s41598-021-84956-6; 2021; Vol 11 (1)

Author(s): Jason M. Neal-McKinney, Kun C. Liu, Christopher M. Lock, Wen-Hsin Wu, Jinxin Hu

Keyword(s): Genome Sequencing, Sequence Data, Bacterial Genome, Illumina Miseq, Trna Genes, Sequencing Data, Data Types, Field Isolates, Hybrid Genome, Genome Assemblies

Abstract: The sequencing, assembly, and analysis of bacterial genomes are central to tracking and characterizing foodborne pathogens. The bulk of bacterial genome sequencing at the US Food and Drug Administration is performed using short-read Illumina MiSeq technology, resulting in highly accurate but fragmented genomic sequences. The MinION sequencer from Oxford Nanopore is an evolving technology that produces long-read sequencing data with low equipment cost. The goal of this study was to compare Campylobacter genome assemblies generated from MiSeq and MinION data independently, as well as hybrid genome assemblies combining both data types. Two reference strains and two field isolates of C. jejuni were sequenced using MiSeq and MinION, and the sequence data were assembled using the software programs SPAdes and Canu, respectively. Hybrid genome assembly was performed using the program Unicycler. Comparison of the C. jejuni 81-176 and RM1221 genome assemblies to the PacBio reference genomes revealed that the SPAdes assemblies had the most accurate nucleotide identity, while the hybrid assemblies were the most contiguous. Assemblies generated only from MinION data using Canu were the least accurate, containing many indels and substitutions that affected downstream analyses. The hybrid sequencing approach was the most useful for detecting plasmids, large genome rearrangements, and repetitive elements such as rRNA and tRNA genes. The full genomes of both C. jejuni field isolates were completed and circularized using hybrid sequencing, and a plasmid was detected in one isolate. Continued development of nanopore sequencing technologies will likely enhance the accuracy of hybrid genome assemblies and enable public health laboratories to routinely generate complete circularized bacterial genome sequences.

Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences

Genome Biology; doi:10.1186/s13059-021-02447-3; 2021; Vol 22 (1)

Author(s): Readman Chiu, Indhu-Shree Rajan-Babu, Jan M. Friedman, Inanc Birol

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Tandem Repeat, Neurological Disorders, Software Tool, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Long Read, Repeat Expansions

Abstract: Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.
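
To make the notion of a tandem repeat copy number concrete, the toy sketch below counts the longest run of back-to-back exact copies of a motif in a single sequence. It is an illustration only and is not Straglr's algorithm, which works from long-read alignments and must tolerate sequencing errors. The function name `longest_tandem_run` and the example sequence are hypothetical.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of consecutive exact copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    # Try every start position so that runs beginning mid-sequence,
    # including runs of self-overlapping motifs, are not missed.
    for start in range(len(sequence) - k + 1):
        count = 0
        pos = start
        while sequence.startswith(motif, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical read fragment containing three tandem copies of CAG.
print(longest_tandem_run("TTCAGCAGCAGAACAG", "CAG"))
```

The example fragment contains a run of three CAG copies followed later by an isolated single copy, so the reported value is 3. Exact matching is assumed here for simplicity; error-prone long reads would need approximate matching.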

SVants – A long-read based method for structural variation detection in bacterial genomes

doi:10.1101/822312; 2019; Cited By ~ 1

Author(s): BM Hanson, JS Johnson, SR Leopold, E Sodergren, GM Weinstock

Keyword(s): Structural Variation, Tandem Repeats, Bacterial Genome, Genetic Material, Bacterial Cells, Sequencing Data, E Coli, Sequencing Technologies, Long Read, New Locations

Abstract: Motivation: Mobile genetic elements (MGEs) are genetic material that can transfer between bacterial cells and move to new locations within a single bacterial genome. These elements range from several hundred to tens of thousands of bases and are often bordered by repeat regions, which makes resolving these elements difficult with short-read sequencing data. The development and availability of long-read sequencing technologies has opened up new opportunities in the study of structural variation, but there is a lack of bioinformatics tools designed to take advantage of these longer reads. Results: We present an assembly-free method for identifying the location of these MGEs when compared to any reference genome (including draft genomes). Using an artificially constructed Escherichia coli genome containing single and tandem repeats of a Tn9 transposon, we demonstrate the ability of SVants to accurately identify multiple insertion sites as well as count the number of repeats of this MGE. Additionally, we show that SVants accurately identifies the transposon of interest, Tn9, but does not erroneously identify existing IS1 regions present within the chromosome of the E. coli artificial reference. Availability and Implementation: SVants is available as open-source software at https://github.com/EpiBlake/SVants
