Resolving structural diversity of Carbapenemase-producing gram-negative bacteria using single molecule sequencing

2018 ◽  
Author(s):  
Nicholas Noll ◽  
Eric Urich ◽  
Daniel Wüthrich ◽  
Vladimira Hinic ◽  
Adrian Egli ◽  
...  

Carbapenemase-producing bacteria are resistant to almost all commonly used beta-lactam and cephalosporin antibiotics and represent a growing public health crisis. Carbapenemases reside predominantly in mobile genetic elements and rapidly spread across genetic backgrounds and species boundaries. Here, we report more than one hundred finished, high-quality genomes of carbapenemase-producing Enterobacteriaceae, P. aeruginosa and A. baumannii sequenced with Oxford Nanopore and Illumina technologies. We developed a number of high-throughput criteria to assess the quality of fully assembled genomes for which curated references do not exist. Using this diverse collection of closed genomes and plasmids, we demonstrate rapid movement of carbapenemase between genomic neighborhoods, sequence types, and across species boundaries with distinct patterns for different carbapenemases. Lastly, we present evidence of multiple ancestral recombination events between different Enterobacteriaceae MLSTs. Taken together, our samples suggest a hierarchical picture of genomic variation produced by the evolution of carbapenemase-producing bacteria that will require new models to adequately understand and track.

2018 ◽  
Author(s):  
Ruibang Luo ◽  
Fritz J. Sedlazeck ◽  
Tak-Wah Lam ◽  
Michael C. Schatz

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57%, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
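The abstract above reports precision and F1-score for variant calls. As a reminder of how these metrics relate, here is a minimal sketch; the function name and the counts are hypothetical and are not taken from the Clairvoyante paper or its code.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 from variant-call counts.

    true_pos:  called variants that match the truth set
    false_pos: called variants absent from the truth set
    false_neg: truth-set variants that were not called
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (an assumption, not real benchmark data).
p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, which is why a caller can show high precision on known variants and still a noticeably lower whole-genome F1.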


2018 ◽  
Vol 1 (4) ◽  
pp. e00086
Author(s):  
S.P. Radko ◽  
L.K. Kurbatov ◽  
K.G. Ptitsyn ◽  
Y.Y. Kiseleva ◽  
E.A. Ponomarenko ◽  
...  

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike second-generation sequencers, which sequence relatively short fragments of nucleic acids, the third-generation DNA/RNA sequencers developed by the biotechnology companies “PacBio” and “Oxford Nanopore Technologies” allow one to sequence transcripts as single molecules and may be considered potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity, and specificity. In the present review, the features of the single molecule sequencing technologies offered by “PacBio” and “Oxford Nanopore Technologies” are considered alongside their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of single molecule sequencing technology as applied to quantitative transcriptome profiling are also discussed.


2016 ◽  
Author(s):  
Chen Yang ◽  
Justin Chu ◽  
René L Warren ◽  
Inanç Birol

Motivation: In 2014, Oxford Nanopore Technologies (ONT) announced a new sequencing platform called MinION. The distinctive features of MinION reads, notably longer read lengths and single-molecule sequencing, show potential for genome characterization. As yet, the pre-commercial technology is available exclusively through early access, and only a few datasets are publicly available for testing. Further, no software exists that simulates MinION platform reads with genuine ONT characteristics. Results: In this article, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data and allows for adjustments as nanopore sequencing technology improves. Availability: NanoSim is written in Python and R. The source files and manual are available at the Genome Sciences Centre website: http://www.bcgsc.ca/platform/bioinfo/software/nanosim


2018 ◽  
Author(s):  
Jay Ghurye ◽  
Sergey Koren ◽  
Scott T Small ◽  
Seth Redmond ◽  
Paul Howell ◽  
...  

Background: Anopheles funestus is one of the three most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito. Findings: Here we present a new high-quality An. funestus reference genome (AfunF3) assembled using 240x coverage of long-read single-molecule sequencing for contigging, combined with 100x coverage of short-read Hi-C data for chromosome scaffolding. The assembled contigs total 446 Mbp of sequence and contain substantial duplication due to alternative alleles present in the sequenced pool of mosquitos from the FUMOZ colony. Using alignment and depth-of-coverage information, these contigs were deduplicated to a 211 Mbp primary assembly, which is closer to the expected haploid genome size of 250 Mbp. This primary assembly consists of 1,053 contigs organized into 3 chromosome-scale scaffolds with an N50 contig size of 632 kbp and an N50 scaffold size of 93.811 Mbp, representing a 100-fold improvement in continuity versus the current reference assembly, AfunF1. Conclusion: This highly contiguous and complete An. funestus reference genome assembly will serve as an improved basis for future studies of genomic variation and organization in this important disease vector.
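The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are assumptions for illustration and are unrelated to the AfunF3 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (hypothetical): total is 300, so half is 150.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # 100 + 80 = 180 >= 150, so N50 is 80
```

Note that deduplicating an assembly changes the total length and therefore the N50, so values are only comparable between assemblies of similar total size.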


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Sam Kovaka ◽  
Aleksey V. Zimin ◽  
Geo M. Pertea ◽  
Roham Razaghi ◽  
Steven L. Salzberg ◽  
...  

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


2020 ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from short reads to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 17 (6) ◽  
pp. e1009078
Author(s):  
Jingwen Ren ◽  
Mark J. P. Chaisson

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, with runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).
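To illustrate why the shape of the gap penalty matters when chaining exact matches, the sketch below chains anchors with a simple quadratic-time dynamic program and compares a linear gap cost with a concave one. This is not lra's algorithm or code: the anchor format, the logarithmic cost, the scale parameters and the function names are all assumptions chosen for illustration, and a real SDP implementation would use more efficient data structures.

```python
import math


def linear_gap(gap, per_base=1.0):
    """Gap cost that grows in proportion to gap length."""
    return per_base * gap


def concave_gap(gap, scale=2.0):
    """Gap cost that grows logarithmically, so long gaps stay affordable."""
    return scale * math.log1p(gap)


def best_chain(anchors, gap_cost):
    """Find the highest-scoring chain of exact-match anchors.

    anchors: list of (read_pos, ref_pos, length) tuples.
    Returns (score, chain), where chain is a list of anchors in order.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [a[2] for a in anchors]  # a chain of one anchor scores its length
    prev = [-1] * n
    for j in range(n):
        qj, rj, lj = anchors[j]
        for i in range(j):
            qi, ri, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if qi + li <= qj and ri + li <= rj:
                # Gap length is the difference between the two spacings.
                gap = abs((qj - (qi + li)) - (rj - (ri + li)))
                candidate = score[i] + lj - gap_cost(gap)
                if candidate > score[j]:
                    score[j] = candidate
                    prev[j] = i
    end = max(range(n), key=score.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    chain.reverse()
    return score[end], chain


# Two 20-base matches separated by 500 extra reference bases,
# as a large deletion in the read would produce (toy data).
toy_anchors = [(0, 0, 20), (30, 530, 20)]
print(best_chain(toy_anchors, linear_gap))
print(best_chain(toy_anchors, concave_gap))
```

With the linear cost, joining the two anchors would cost 500 and the chain breaks at the deletion; with the concave cost, the 500-base gap costs about 12.4, so both anchors stay in one chain that spans the structural variant.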


2019 ◽  
Author(s):  
Sam Kovaka ◽  
Aleksey V. Zimin ◽  
Geo M. Pertea ◽  
Roham Razaghi ◽  
Steven L. Salzberg ◽  
...  

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.


Author(s):  
Hailei Zhang ◽  
Huan Zhong ◽  
Shoudong Zhang ◽  
Xiaojian Shao ◽  
Min Ni ◽  
...  

The 5′ end of a eukaryotic mRNA transcript generally has a 7-methylguanosine (m7G) cap that protects mRNA from degradation and mediates almost all other aspects of gene expression. Some RNAs in Escherichia coli, yeast, and mammals were recently found to contain an NAD+ cap. Here, we report the development of the method NAD tagSeq for transcriptome-wide identification and quantification of NAD+-capped RNAs (NAD-RNAs). The method uses an enzymatic reaction and then a click chemistry reaction to label NAD-RNAs with a synthetic RNA tag. The tagged RNA molecules can be enriched and directly sequenced using the Oxford Nanopore sequencing technology. NAD tagSeq can allow more accurate identification and quantification of NAD-RNAs, as well as reveal the sequences of whole NAD-RNA transcripts using single-molecule RNA sequencing. Using NAD tagSeq, we found that NAD-RNAs in Arabidopsis were produced by at least several thousand genes, most of which are protein-coding genes, with the majority of these transcripts coming from <200 genes. For some Arabidopsis genes, over 5% of their transcripts were NAD capped. Gene ontology terms overrepresented in the 2,000 genes that produced the highest numbers of NAD-RNAs are related to photosynthesis, protein synthesis, and responses to cytokinin and stresses. The NAD-RNAs in Arabidopsis generally have the same overall sequence structures as the canonical m7G-capped mRNAs, although most of them appear to have a shorter 5′ untranslated region (5′ UTR). The identification and quantification of NAD-RNAs and revelation of their sequence features can provide essential steps toward understanding the functions of NAD-RNAs.

