Resolving structural diversity of Carbapenemase-producing gram-negative bacteria using single molecule sequencing

Mapping Intimacies ◽

10.1101/456897 ◽

2018 ◽

Cited By ~ 3

Author(s):

Nicholas Noll ◽

Eric Urich ◽

Daniel Wüthrich ◽

Vladimira Hinic ◽

Adrian Egli ◽

...

Keyword(s):

Single Molecule ◽

Structural Diversity ◽

Genomic Variation ◽

Species Boundaries ◽

Health Crisis ◽

Single Molecule Sequencing ◽

Oxford Nanopore ◽

Almost All ◽

Sequence Types

Carbapenemase-producing bacteria are resistant to almost all commonly used beta-lactam and cephalosporin antibiotics and represent a growing public health crisis. Carbapenemases reside predominantly in mobile genetic elements and spread rapidly across genetic backgrounds and species boundaries. Here, we report more than one hundred finished, high-quality genomes of carbapenemase-producing Enterobacteriaceae, P. aeruginosa and A. baumannii sequenced with Oxford Nanopore and Illumina technologies. We developed a number of high-throughput criteria to assess the quality of fully assembled genomes for which curated references do not exist. Using this diverse collection of closed genomes and plasmids, we demonstrate rapid movement of carbapenemases between genomic neighborhoods, sequence types, and across species boundaries, with distinct patterns for different carbapenemases. Lastly, we present evidence of multiple ancestral recombination events between different Enterobacteriaceae MLSTs. Taken together, our samples suggest a hierarchical picture of genomic variation produced by the evolution of carbapenemase-producing bacteria that will require new models to adequately understand and track.

NanoBLASTer: Fast alignment and characterization of Oxford Nanopore single molecule sequencing reads

2016 IEEE 6th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2016.7802776 ◽

2016 ◽

Cited By ~ 3

Author(s):

Mohammad Ruhul Amin ◽

Steven Skiena ◽

Michael C. Schatz

Keyword(s):

Single Molecule ◽

Single Molecule Sequencing ◽

Oxford Nanopore

Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

10.1101/310458 ◽

2018 ◽

Cited By ~ 9

Author(s):

Ruibang Luo ◽

Fritz J. Sedlazeck ◽

Tak-Wah Lam ◽

Michael C. Schatz

Keyword(s):

Neural Network ◽

Single Molecule ◽

Variant Calling ◽

Accurate Identification ◽

Whole Genome Analysis ◽

Single Molecule Sequencing ◽

Oxford Nanopore ◽

Indel Length ◽

Human Sample ◽

Dna Sequence Variants

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. To meet this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57% and 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
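The precision and F1-score figures in this abstract follow the standard definitions used when a variant caller is compared against a truth set. The sketch below only illustrates that arithmetic; the function name and the counts in the usage note are hypothetical and are not taken from the Clairvoyante evaluation.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from variant-call counts.

    true_pos:  called variants that are present in the truth set
    false_pos: called variants that are absent from the truth set
    false_neg: truth-set variants that were not called
    """
    called = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / expected if expected else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with the made-up counts `precision_recall_f1(90, 10, 30)`, precision is 0.9 and recall is 0.75, so F1 lies between the two and closer to the lower value.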

Prospects for the use of third generation sequencers for quantitative profiling of transcriptome

Biomedical Chemistry Research and Methods ◽

10.18097/bmcrm00086 ◽

2018 ◽

Vol 1 (4) ◽

pp. e00086

Author(s):

S.P. Radko ◽

L.K. Kurbatov ◽

K.G. Ptitsyn ◽

Y.Y. Kiseleva ◽

E.A. Ponomarenko ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Profiling ◽

Third Generation ◽

Sequencing Technology ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Biotechnology Companies ◽

Oxford Nanopore Technologies ◽

Quantitative Profiling

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike second generation sequencers, which sequence relatively short fragments of nucleic acids, the third generation DNA/RNA sequencers developed by the biotechnology companies “PacBio” and “Oxford Nanopore Technologies” allow one to sequence transcripts as single molecules and may be considered potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity, and specificity. In the present review, the features of the single molecule sequencing technologies offered by “PacBio” and “Oxford Nanopore Technologies” are considered along with their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of single molecule sequencing technology as applied to quantitative transcriptome profiling are also discussed.

NanoSim: nanopore sequence read simulator based on statistical characterization

10.1101/044545 ◽

2016 ◽

Cited By ~ 7

Author(s):

Chen Yang ◽

Justin Chu ◽

René L Warren ◽

Inanç Birol

Keyword(s):

Single Molecule ◽

Sequencing Technology ◽

Statistical Characterization ◽

Single Molecule Sequencing ◽

Sequencing Platform ◽

Early Access ◽

Oxford Nanopore ◽

Commercial Technology ◽

Read Simulator ◽

Oxford Nanopore Technologies

Motivation: In 2014, Oxford Nanopore Technologies (ONT) announced a new sequencing platform called MinION. The distinctive features of MinION reads, in particular longer read lengths and single-molecule sequencing, show potential for genome characterization. As yet, the pre-commercial technology is available exclusively through early access, and only a few datasets are publicly available for testing. Further, no software exists that simulates MinION platform reads with genuine ONT characteristics. Results: In this article, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data and allows for adjustments as nanopore sequencing technology improves. Availability: NanoSim is written in Python and R. The source files and manual are available at the Genome Sciences Centre website: http://www.bcgsc.ca/platform/bioinfo/software/nanosim

A chromosome-scale assembly of the major African malaria vector Anopheles funestus

10.1101/492777 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jay Ghurye ◽

Sergey Koren ◽

Scott T Small ◽

Seth Redmond ◽

Paul Howell ◽

...

Keyword(s):

Single Molecule ◽

Reference Genome ◽

Anopheles Funestus ◽

Genomic Variation ◽

Phenotypic Traits ◽

High Quality ◽

Single Molecule Sequencing ◽

Long Read ◽

Haploid Genome Size ◽

Important Disease

Background: Anopheles funestus is one of the three most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito. Findings: Here we present a new high-quality An. funestus reference genome (AfunF3) assembled using 240x coverage of long-read single-molecule sequencing for contigging, combined with 100x coverage of short-read Hi-C data for chromosome scaffolding. The assembled contigs total 446 Mbp of sequence and contain substantial duplication due to alternative alleles present in the sequenced pool of mosquitos from the FUMOZ colony. Using alignment and depth-of-coverage information, these contigs were deduplicated to a 211 Mbp primary assembly, which is closer to the expected haploid genome size of 250 Mbp. This primary assembly consists of 1,053 contigs organized into 3 chromosome-scale scaffolds with an N50 contig size of 632 kbp and an N50 scaffold size of 93.811 Mbp, representing a 100-fold improvement in continuity versus the current reference assembly, AfunF1. Conclusion: This highly contiguous and complete An. funestus reference genome assembly will serve as an improved basis for future studies of genomic variation and organization in this important disease vector.
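The contig and scaffold N50 values quoted in this abstract are contiguity statistics: N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the toy lengths in the usage note are illustrative and do not come from the AfunF3 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For the toy input `n50([2, 3, 4, 5, 6])` the total is 20 bases, and the two longest sequences (6 and 5) are the first to cover at least 10 of them, so the result is 5.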

Transcriptome assembly from long-read RNA-seq alignments with StringTie2

Genome Biology ◽

10.1186/s13059-019-1910-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 54

Author(s):

Sam Kovaka ◽

Aleksey V. Zimin ◽

Geo M. Pertea ◽

Roham Razaghi ◽

Steven L. Salzberg ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Assembly ◽

Rna Seq ◽

Ability To Work ◽

Single Molecule Sequencing ◽

Short Read ◽

New Methods ◽

Long Reads ◽

Long Read

Hapo-G, Haplotype-Aware Polishing Of Genome Assemblies

10.1101/2020.12.14.422624 ◽

2020 ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

Short Reads ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from short reads to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

lra: A long read aligner for sequences and contigs

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009078 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009078

Author(s):

Jingwen Ren ◽

Mark J. P. Chaisson

Keyword(s):

Dynamic Programming ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Concave Function ◽

Single Molecule Sequencing ◽

Link Type ◽

Oxford Nanopore ◽

Concave Cost ◽

Long Read

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to those of current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).
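The difference between linear and concave gap costs can be seen in a toy chaining example. The sketch below is not lra's algorithm: it uses a simple quadratic-time dynamic program rather than sparse dynamic programming, and the cost functions, their parameters, and the anchor format `(read_pos, genome_pos, length)` are assumptions chosen only for illustration.

```python
import math

def linear_gap_cost(gap, per_base=0.5):
    """Hypothetical linear penalty: cost grows in proportion to gap length."""
    return per_base * gap

def concave_gap_cost(gap, scale=2.0):
    """Hypothetical concave penalty: cost grows logarithmically with gap length."""
    return scale * math.log1p(gap)

def best_chain_score(anchors, gap_cost):
    """Score the best chain of exact-match anchors (read_pos, genome_pos, length).

    Quadratic-time illustration: each anchor either starts a new chain or
    extends the best compatible earlier chain, paying gap_cost for the
    difference between the read-side and genome-side offsets.
    """
    anchors = sorted(anchors)
    scores = []
    for i, (xi, yi, li) in enumerate(anchors):
        best = li  # start a new chain at this anchor
        for j in range(i):
            xj, yj, lj = anchors[j]
            # Anchor j must end before anchor i starts on both sequences.
            if xj + lj <= xi and yj + lj <= yi:
                gap = abs((xi - xj) - (yi - yj))
                best = max(best, scores[j] + li - gap_cost(gap))
        scores.append(best)
    return max(scores, default=0)

# Three 20-base anchors; the third sits across a 1000-base genome-side gap.
anchors = [(0, 0, 20), (20, 20, 20), (40, 1040, 20)]
linear_score = best_chain_score(anchors, linear_gap_cost)
concave_score = best_chain_score(anchors, concave_gap_cost)
```

With these made-up parameters the linear penalty makes the 1000-base gap too expensive, so the best chain stops after the second anchor, whereas the concave penalty lets the chain continue across the gap, which is the behaviour one wants when the gap reflects a large structural variant.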

Transcriptome assembly from long-read RNA-seq alignments with StringTie2

10.1101/694554 ◽

2019 ◽

Author(s):

Sam Kovaka ◽

Aleksey V. Zimin ◽

Geo M. Pertea ◽

Roham Razaghi ◽

Steven L. Salzberg ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Assembly ◽

Rna Seq ◽

High Error Rate ◽

Sequencing Technology ◽

Ability To Work ◽

Single Molecule Sequencing ◽

Long Reads ◽

Long Read

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.

NAD tagSeq reveals that NAD+-capped RNAs are mostly produced from a large number of protein-coding genes in Arabidopsis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1903683116 ◽

2019 ◽

pp. 201903683 ◽

Cited By ~ 9

Author(s):

Hailei Zhang ◽

Huan Zhong ◽

Shoudong Zhang ◽

Xiaojian Shao ◽

Min Ni ◽

...

Keyword(s):

Single Molecule ◽

Enzymatic Reaction ◽

Accurate Identification ◽

Protein Coding ◽

Rna Molecules ◽

Rna Transcripts ◽

Protein Coding Genes ◽

Oxford Nanopore ◽

Almost All ◽

Identification And Quantification

The 5′ end of a eukaryotic mRNA transcript generally has a 7-methylguanosine (m7G) cap that protects mRNA from degradation and mediates almost all other aspects of gene expression. Some RNAs in Escherichia coli, yeast, and mammals were recently found to contain an NAD+ cap. Here, we report the development of the method NAD tagSeq for transcriptome-wide identification and quantification of NAD+-capped RNAs (NAD-RNAs). The method uses an enzymatic reaction and then a click chemistry reaction to label NAD-RNAs with a synthetic RNA tag. The tagged RNA molecules can be enriched and directly sequenced using the Oxford Nanopore sequencing technology. NAD tagSeq can allow more accurate identification and quantification of NAD-RNAs, as well as reveal the sequences of whole NAD-RNA transcripts using single-molecule RNA sequencing. Using NAD tagSeq, we found that NAD-RNAs in Arabidopsis were produced by at least several thousand genes, most of which are protein-coding genes, with the majority of these transcripts coming from <200 genes. For some Arabidopsis genes, over 5% of their transcripts were NAD capped. Gene ontology terms overrepresented in the 2,000 genes that produced the highest numbers of NAD-RNAs are related to photosynthesis, protein synthesis, and responses to cytokinin and stresses. The NAD-RNAs in Arabidopsis generally have the same overall sequence structures as the canonical m7G-capped mRNAs, although most of them appear to have a shorter 5′ untranslated region (5′ UTR). The identification and quantification of NAD-RNAs and revelation of their sequence features can provide essential steps toward understanding the functions of NAD-RNAs.