Rapid, multiplexed, whole genome and plasmid sequencing of foodborne pathogens using long-read nanopore technology

Abstract United States public health agencies are focusing on next-generation sequencing (NGS) to quickly identify and characterize foodborne pathogens. Here, the MinION long-read nanopore sequencer was used to simultaneously sequence the entire chromosome and plasmids of Salmonella enterica subsp. enterica serovar Bareilly and Escherichia coli O157:H7. A rapid, random sequencing approach was developed and coupled with de novo genome assembly within a customized data analysis workflow that can resolve highly repetitive genomic regions. In sequencing runs as short as four hours, using nanopore data alone, full-length genomes were obtained with an average identity of 99.87% for Salmonella Bareilly and 99.89% for E. coli in comparison to the respective MiSeq references. These long-read assemblies provided information on serotype, virulence factors, and antimicrobial resistance genes. Using a custom-developed SNP-selection workflow, the potential of the nanopore-only assemblies (after only 30 minutes of sequencing) for rapid phylogenetic inference was demonstrated, with topology identical to that of the published dataset. To achieve maximum-quality assemblies, the developed bioinformatics workflow employed additional polishing steps to correct the systematic errors produced by the nanopore-only assemblies. Nanopore sequencing provided a shorter turnaround time (10 hours for library preparation and sequencing) compared to other NGS technologies.

Rapid, multiplexed, whole genome and plasmid sequencing of foodborne pathogens using long-read nanopore technology

Scientific Reports ◽

10.1038/s41598-019-52424-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 8

Author(s):

Tonya L. Taylor ◽

Jeremy D. Volkening ◽

Eric DeJesus ◽

Mustafa Simmons ◽

Kiril M. Dimitrov ◽

...

Keyword(s):

Foodborne Pathogens ◽

Capital Investment ◽

De Novo ◽

Turnaround Time ◽

Nanopore Sequencing ◽

Public Health Agencies ◽

Genomic Arrangement ◽

Antimicrobial Resistance Genes ◽

Bacterial Chromosomes ◽

Long Read

Abstract U.S. public health agencies have employed next-generation sequencing (NGS) as a tool to quickly identify foodborne pathogens during outbreaks. Although established short-read NGS technologies are known to provide highly accurate data, long-read sequencing is still needed to resolve highly repetitive genomic regions and genomic arrangement, and to close the sequences of bacterial chromosomes and plasmids. Here, we report the use of long-read nanopore sequencing to simultaneously sequence the entire chromosome and plasmid of Salmonella enterica subsp. enterica serovar Bareilly and Escherichia coli O157:H7. We developed a rapid and random sequencing approach coupled with de novo genome assembly within a customized data analysis workflow that uses publicly available tools. In sequencing runs as short as four hours, using the MinION instrument, we obtained full-length genomes with an average identity of 99.87% for Salmonella Bareilly and 99.89% for E. coli in comparison to the respective MiSeq references. These nanopore-only assemblies provided readily available information on serotype, virulence factors, and antimicrobial resistance genes. We also demonstrate the potential of nanopore sequencing assemblies for rapid preliminary phylogenetic inference. Nanopore sequencing provides additional advantages, such as very low capital investment and footprint, and a shorter turnaround time (10 hours for library preparation and sequencing) compared to other NGS technologies.
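
The identity figures quoted above compare a nanopore assembly against a MiSeq reference. The abstract does not say how identity was calculated, so the following is only a minimal sketch of one common definition: the share of alignment columns in which both sequences carry the same base, with gaps counted as mismatches. The function name and the toy sequences are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_ref: str, aligned_asm: str) -> float:
    """Percent of alignment columns where reference and assembly agree.

    Both inputs must already be aligned (same length, '-' for gaps).
    A gap in either sequence counts as a mismatch; columns that are
    gaps in both sequences are ignored.
    """
    if len(aligned_ref) != len(aligned_asm):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (r, a)
        for r, a in zip(aligned_ref.upper(), aligned_asm.upper())
        if not (r == "-" and a == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for r, a in columns if r == a)
    return 100.0 * matches / len(columns)


# Toy example (hypothetical data): one substitution in ten columns.
identity = percent_identity("ACGTACGTAC", "ACGTACGAAC")
print(f"{identity:.2f}%")
```

Under this definition an identity of 99.87% corresponds to roughly 13 differing columns per 10,000 aligned positions.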

HAPHPIPE: Haplotype Reconstruction and Phylodynamics for Deep Sequencing of Intra-Host Viral Populations

Molecular Biology and Evolution ◽

10.1093/molbev/msaa315 ◽

2020 ◽

Author(s):

Matthew L Bendall ◽

Keylie M Gibson ◽

Margaret C Steiner ◽

Uzma Rentia ◽

Marcos Pérez-Losada ◽

...

Keyword(s):

Deep Sequencing ◽

De Novo ◽

Consensus Sequence ◽

Haplotype Reconstruction ◽

Consensus Sequences ◽

Genome Wide ◽

Genomic Regions ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Abstract Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.
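
To make concrete why a single consensus sequence discards intra-host information, here is a minimal sketch that collapses a set of pre-aligned reads into a majority-rule consensus and, for comparison, counts the distinct full-length variants. This is an illustration only: it is not HAPHPIPE's implementation, and the function name and reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Majority-rule consensus of equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("no reads given")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in range(length):
        counts = Counter(read[column] for read in aligned_reads)
        # Take the most frequent base in this column.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical sample: a majority variant (3 reads) and a minority one (2 reads).
reads = ["ACGTA", "ACGTA", "ACGTA", "ATGTC", "ATGTC"]
print(majority_consensus(reads))   # the minority variant leaves no trace here
print(Counter(reads))              # per-variant counts retain both
```

The consensus reports only the majority variant, whereas the per-variant counts keep the 3:2 split that haplotype reconstruction aims to recover from real, fragmentary reads.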

LongStitch: High-quality genome assembly correction and scaffolding using long reads

10.1101/2021.06.17.448848 ◽

2021 ◽

Author(s):

Lauren Coombe ◽

Janet X Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
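
The abstract states that ntLink relies on lightweight minimizer mappings. As a rough illustration of the general idea (not of ntLink's actual code), the sketch below computes (w, k)-minimizers: for every window of w consecutive k-mers it keeps the smallest one. Plain lexicographic ordering is an assumption made here for readability; practical tools typically order k-mers by a hash value. The function name and example sequence are hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) for the minimizer of each window.

    A window is w consecutive k-mers; its minimizer is the
    lexicographically smallest k-mer (leftmost on ties).
    Consecutive windows sharing a minimizer yield one entry.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        pos = start + offset
        # Minimizer positions never move backwards, so comparing with
        # the last stored position is enough to avoid duplicates.
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


print(minimizers("ACGTTGCA", k=3, w=3))
```

Because only a subset of k-mers is kept, two sequences can be compared by looking for shared minimizers instead of aligning every base, which is what makes this kind of mapping lightweight.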

Target Capture and Massive Sequencing of Genes Transcribed in Mytilus galloprovincialis

BioMed Research International ◽

10.1155/2014/538549 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 6

Author(s):

Umberto Rosani ◽

Stefania Domeneghetti ◽

Alberto Pallavicini ◽

Paola Venier

Keyword(s):

De Novo ◽

454 Sequencing ◽

Pcr Amplification ◽

Gene Sequences ◽

Target Capture ◽

Reference Transcript ◽

Mediterranean Mussel ◽

Massive Sequencing ◽

Genomic Regions ◽

Next Generation Sequencing Ngs

Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing the whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich selected genomic regions. The enrichment of DNA target sequences was validated by real-time PCR, whereas the efficacy of the applied strategy was evaluated by mapping the 454-output reads against reference transcript data already available for M. galloprovincialis and by measuring coverage, SNPs, number of de novo sequenced introns, and complete gene sequences. Focusing on a target size of nearly 1.5 Mbp, we obtained a target coverage which allowed the identification of more than 250 complete introns, 10,741 SNPs, and also complete gene sequences. This study confirms the transcriptome-based enrichment of gDNA regions as a good strategy to expand knowledge on specific subsets of genes also in nonmodel organisms.

Technologies for Pharmacogenomics: A Review

Genes ◽

10.3390/genes11121456 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1456

Author(s):

Maaike van der Lee ◽

Marjolein Kriek ◽

Henk-Jan Guchelaar ◽

Jesse J. Swen

Keyword(s):

Clinical Practice ◽

Rare Variants ◽

Turnaround Time ◽

Sequencing Data ◽

Single Nucleotide ◽

Long Read ◽

The Right ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing ◽

Straightforward Interpretation

The continuous development of new genotyping technologies requires awareness of their potential advantages and limitations concerning utility for pharmacogenomics (PGx). In this review, we provide an overview of technologies that can be applied in PGx research and clinical practice. Most commonly used are single nucleotide variant (SNV) panels which contain a pre-selected panel of genetic variants. SNV panels offer a short turnaround time and straightforward interpretation, making them suitable for clinical practice. However, they are limited in their ability to assess rare and structural variants. Next-generation sequencing (NGS) and long-read sequencing are promising technologies for the field of PGx research. Both NGS and long-read sequencing often provide more data and more options with regard to deciphering structural and rare variants compared to SNV panels—in particular, in regard to the number of variants that can be identified, as well as the option for haplotype phasing. Nonetheless, while useful for research, not all sequencing data can be applied to clinical practice yet. Ultimately, selecting the right technology is not a matter of fact but a matter of choosing the right technique for the right problem.

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

10.1101/072116 ◽

2016 ◽

Cited By ~ 8

Author(s):

Valerie A. Schneider ◽

Tina Graves-Lindsay ◽

Kerstin Howe ◽

Nathan Bouk ◽

Hsiu-Chuan Chen ◽

...

Keyword(s):

De Novo ◽

Genome Mapping ◽

Population Variation ◽

Reference Assembly ◽

Sequence Generation ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies ◽

First Time

Abstract The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009 and reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that while the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

LongStitch: high-quality genome assembly correction and scaffolding using long reads

BMC Bioinformatics ◽

10.1186/s12859-021-04451-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lauren Coombe ◽

Janet X. Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Abstract Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
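
The contiguity gains above are reported as fold changes in NGA50 length. As a hedged sketch of how such a figure can be derived, the function below computes an NG50-style statistic: the largest length L such that blocks of length at least L together cover half of the genome. NGA50 applies the same rule to aligned blocks (contigs split at misassemblies) rather than to raw contigs. Obtaining those aligned blocks requires an alignment against a reference, which is outside this sketch. The function name and all numbers are hypothetical and are not results from the paper.

```python
def ng50(block_lengths: list[int], genome_size: int) -> int:
    """Length L such that blocks >= L cover at least half the genome.

    Returns 0 if the blocks together cover less than half of it.
    Passing aligned-block lengths gives an NGA50-style value.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered * 2 >= genome_size:
            return length
    return 0


# Hypothetical aligned-block lengths before and after scaffolding.
before = ng50([50, 40, 30, 20, 10], genome_size=200)
after = ng50([90, 50, 10], genome_size=200)
print(before, after, f"{after / before:.1f}-fold")
```

The fold improvement is simply the ratio of the statistic after scaffolding to the statistic before it.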

Complete and haplotype-specific sequence assembly of segmental duplication-mediated genome rearrangements using CRISPR-targeted ultra-long read sequencing (CTLR-Seq)

10.1101/2020.10.23.349621 ◽

2020 ◽

Author(s):

Bo Zhou ◽

GiWon Shin ◽

Stephanie U. Greer ◽

Lisanne Vervoort ◽

Yiling Huang ◽

...

Keyword(s):

Segmental Duplication ◽

Base Pair ◽

De Novo ◽

Genome Rearrangements ◽

Pulse Field ◽

Sequencing Analysis ◽

Single Base ◽

Long Read ◽

Single Base Pair ◽

Genomic Regions

Abstract We have developed a generally applicable method based on CRISPR/Cas9-targeted ultra-long read sequencing (CTLR-Seq) to completely and haplotype-specifically resolve, at base-pair resolution, large, complex, and highly repetitive genomic regions that had been previously impenetrable to next-generation sequencing analysis such as large segmental duplication (SegDup) regions and their associated genome rearrangements that stretch hundreds of kilobases. Our method combines in vitro Cas9-mediated cutting of the genome and pulse-field gel electrophoresis to haplotype-specifically isolate intact large (200-550 kb) target regions that encompass previously unresolvable genomic sequences. These target fragments are then sequenced (amplification-free) to produce ultra-long reads at up to 40x on-target coverage using Oxford nanopore technology, allowing for the complete assembly of the complex genomic regions of interest at single base-pair resolution. We applied CTLR-Seq to resolve the exact sequence of SegDup rearrangements that constitute the boundary regions of the 22q11.2 deletion CNV and of the 16p11.2 deletion and duplication CNVs. These CNVs are among the strongest known risk factors for schizophrenia and autism. We then perform de novo assembly to resolve, for the first time, at single base-pair resolution, the sequence rearrangements of the 22q11.2 and 16p11.2 CNVs, mapping out exactly the genes and non-coding regions that are affected by the CNV for different carriers.
