Nanopore sequencing of RNA and cDNA molecules expands the transcriptomic toolbox in prokaryotes

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that cDNA-seq offers improved yield and accuracy without bias in quantification compared to direct RNA sequencing. Notably, cDNA-seq can be readily used for simultaneous transcript quantification, accurate detection of transcript 5′ and 3′ boundaries, and analysis of transcriptional units and transcriptional heterogeneity. In summary, we establish Nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features, thereby advancing it to become a standard method for RNA analysis in prokaryotes.

Nanopore sequencing of RNA and cDNA molecules in Escherichia coli

RNA ◽

10.1261/rna.078937.121 ◽

2021 ◽

pp. rna.078937.121

Author(s):

Felix Grünberger ◽

Sébastien Ferreira-Cerca ◽

Dina Grohmann

Keyword(s):

Escherichia Coli ◽

Rna Sequencing ◽

Single Molecule ◽

High Throughput Sequencing ◽

Model Organism ◽

Cost Effective ◽

Rna Seq ◽

Sequencing Platform ◽

Quantitative Measurements ◽

Oxford Nanopore

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that (PCR)-cDNA-seq offers improved yield and accuracy compared to direct RNA sequencing. Notably, (PCR)-cDNA-seq is suitable for quantitative measurements and can be readily used for simultaneous and accurate detection of transcript 5' and 3' boundaries and for analysis of transcriptional units and transcriptional heterogeneity. In summary, based on our comprehensive study, we show Nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features. Nanopore RNA-seq thereby holds the potential to become a valuable alternative method for RNA analysis in prokaryotes.

Parallel and scalable workflow for the analysis of Oxford Nanopore direct RNA sequencing datasets

10.1101/818336 ◽

2019 ◽

Author(s):

Luca Cozzuto ◽

Huanle Liu ◽

Leszek P. Pryszcz ◽

Toni Hermoso Pulido ◽

Julia Ponomarenko ◽

...

Keyword(s):

Rna Sequencing ◽

Single Molecule ◽

Tail Length ◽

Rna Modification ◽

Sequencing Data ◽

Polya Tail ◽

Sequencing Platform ◽

Rna Molecules ◽

Oxford Nanopore ◽

Quality Filtering

The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is virtually capable of detecting any given RNA modification present in the molecule that is being sequenced, as well as providing polyA tail length estimations at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run quality metrics, quality filtering, base-calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts and RNA modifications and to predict polyA tail length and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity for better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without the need to install any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome with single-molecule resolution.

Penguin: A Tool for Predicting Pseudouridine Sites in Direct RNA Nanopore Sequencing Data

10.1101/2021.03.31.437901 ◽

2021 ◽

Author(s):

Doaa Hassan ◽

Daniel Acevedo ◽

Swapna Vidhur Daulatabad ◽

Quoseena Mir ◽

Sarath Chandra Janga

Keyword(s):

Rna Sequencing ◽

Single Molecule ◽

Direct Detection ◽

Support Vector ◽

Sequencing Data ◽

Rna Modifications ◽

Sequencing Platform ◽

Independent Validation ◽

Oxford Nanopore ◽

Validation Testing

Pseudouridine is one of the most abundant RNA modifications, formed when uridines are isomerized by pseudouridine synthase proteins. It plays an important role in many biological processes and is also important in drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule that is being sequenced, but to our knowledge this technology has not been used to identify RNA pseudouridine sites. In this paper, we address this limitation by introducing a tool called Penguin that integrates several machine learning (ML) models (i.e., predictors) to identify RNA pseudouridine sites in Nanopore direct RNA sequencing reads. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore device and the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which, in turn, are able to predict whether the signal is modified by the presence of pseudouridine sites. We have included various predictors in Penguin, including support vector machine (SVM), random forest (RF), and neural network (NN). The results on the two benchmark data sets show that Penguin is able to identify pseudouridine sites with a high accuracy of 93.38% and 92.61% using SVM in random split testing and independent validation testing, respectively. Thus, Penguin outperforms the existing pseudouridine predictors in the literature, which achieved an accuracy of at most 76.0% in independent validation testing. The tool is available on GitHub at https://github.com/Janga-Lab/Penguin.

RNA modifications detection by comparative Nanopore direct RNA sequencing

10.1101/843136 ◽

2019 ◽

Cited By ~ 16

Author(s):

Adrien Leger ◽

Paulo P. Amaral ◽

Luca Pandolfini ◽

Charlotte Capitanchik ◽

Federica Capraro ◽

...

Keyword(s):

Rna Sequencing ◽

Single Molecule ◽

High Throughput Sequencing ◽

Control Sample ◽

Analytical Framework ◽

Rna Modifications ◽

Rna Molecules ◽

Naturally Occurring ◽

Oxford Nanopore ◽

Experimental Approaches

RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. To date, over 150 naturally occurring PTMs have been identified; however, the overwhelming majority of their functions remain elusive. In recent years, a small number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing (DRS) technology has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework to evaluate the presence of modifications in DRS data. To do so, we compare an RNA sample of interest against a non-modified control sample. Our strategy does not require a training set and allows the use of replicates to model biological variability. Here, we demonstrate the ability of Nanocompore to detect RNA modifications at single-molecule resolution in human polyA+ RNAs, as well as in targeted non-coding RNAs. Our results correlate well with orthogonal methods, confirm previous observations on the distribution of N6-methyladenosine sites and provide novel insights into the distribution of RNA modifications in the coding and non-coding transcriptomes. The latest version of Nanocompore can be obtained at https://github.com/tleonardi/nanocompore.

Genome-Wide Identification of 5-Methylcytosine Sites in Bacterial Genomes By High-Throughput Sequencing of MspJI Restriction Fragments

10.1101/2021.02.10.430591 ◽

2021 ◽

Author(s):

Brian P. Anton ◽

Alexey Fomenkov ◽

Victoria Wu ◽

Richard J. Roberts

Keyword(s):

Single Molecule ◽

Dna Sequences ◽

High Throughput Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

Specific Sequence ◽

Genome Wide ◽

Cost Effective Alternative ◽

Simple Column ◽

Sequencing Platforms

Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.

Genome-wide identification of 5-methylcytosine sites in bacterial genomes by high-throughput sequencing of MspJI restriction fragments

PLoS ONE ◽

10.1371/journal.pone.0247541 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0247541

Author(s):

Brian P. Anton ◽

Alexey Fomenkov ◽

Victoria Wu ◽

Richard J. Roberts

Keyword(s):

Single Molecule ◽

Dna Sequences ◽

High Throughput Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

Specific Sequence ◽

Genome Wide ◽

Cost Effective Alternative ◽

Simple Column ◽

Sequencing Platforms

Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.

Improving the Chromosome-Level Genome Assembly of the Siamese Fighting Fish (Betta splendens) in a University Master’s Course

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401205 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2179-2183 ◽

Cited By ~ 1

Author(s):

Stefan Prost ◽

Malte Petersen ◽

Martin Grethlein ◽

Sarah Joy Hahn ◽

Nina Kuschik-Maczollek ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Siamese Fighting Fish ◽

Betta Splendens ◽

High Quality ◽

Sequencing Platform ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Chromosome Level

Ever-decreasing costs along with advances in sequencing and library preparation technologies enable even small research groups to generate chromosome-level assemblies today. Here we report the generation of an improved chromosome-level assembly for the Siamese fighting fish (Betta splendens) that was carried out during a practical university master’s course. The Siamese fighting fish is a popular aquarium fish and an emerging model species for research on aggressive behavior. We updated the current genome assembly by generating a new long-read nanopore-based assembly with subsequent scaffolding to chromosome level using previously published Hi-C data. The use of ∼35× nanopore-based long-read data sequenced on a MinION platform (Oxford Nanopore Technologies) allowed us to generate a baseline assembly of only 1,276 contigs with a contig N50 of 2.1 Mbp and a total length of 441 Mbp. Scaffolding using the Hi-C data resulted in 109 scaffolds with a scaffold N50 of 20.7 Mbp. More than 99% of the assembly is contained in 21 scaffolds. The assembly showed the presence of 96.1% complete BUSCO genes from the Actinopterygii dataset, indicating a high quality of the assembly. We present an improved full chromosome-level assembly of the Siamese fighting fish generated during a university master’s course. The use of ∼35× long-read nanopore data drastically improved the baseline assembly in terms of continuity. We show that relatively inexpensive high-throughput sequencing technologies such as the long-read MinION sequencing platform can be used in educational settings, allowing students to gain practical skills in modern genomics and generate high-quality results that benefit downstream research projects.
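
The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. As a minimal sketch of that definition, the function below computes N50 from a list of lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its data.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, merging contigs into longer scaffolds can only keep N50 the same or raise it, which is consistent with the jump from contig N50 to scaffold N50 reported in the abstract.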

A Comprehensive Coexpression Network Analysis in Vibrio cholerae

mSystems ◽

10.1128/msystems.00550-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Cory D. DuPai ◽

Claus O. Wilke ◽

Bryan W. Davies

Keyword(s):

Network Analysis ◽

Vibrio Cholerae ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Model Organism ◽

Coexpression Network ◽

Sequencing Data ◽

Content Type ◽

The Impact ◽

Coexpression Network Analysis

ABSTRACT Research into the evolution and pathogenesis of Vibrio cholerae has benefited greatly from the generation of high-throughput sequencing data to drive molecular analyses. The steady accumulation of these data sets now provides a unique opportunity for in silico hypothesis generation via coexpression analysis. Here, we leverage all published V. cholerae RNA sequencing data, in combination with select data from other platforms, to generate a gene coexpression network that validates known gene interactions and identifies novel genetic partners across the entire V. cholerae genome. This network provides direct insights into genes influencing pathogenicity, metabolism, and transcriptional regulation, further clarifies results from previous sequencing experiments in V. cholerae (e.g., transposon insertion sequencing [Tn-seq] and chromatin immunoprecipitation sequencing [ChIP-seq]), and expands upon microarray-based findings in related Gram-negative bacteria. IMPORTANCE Cholera is a devastating illness that kills tens of thousands of people annually. Vibrio cholerae, the causative agent of cholera, is an important model organism to investigate both bacterial pathogenesis and the impact of horizontal gene transfer on the emergence and dissemination of new virulent strains. Despite the importance of this pathogen, roughly one-third of V. cholerae genes are functionally unannotated, leaving large gaps in our understanding of this microbe. Through coexpression network analysis of existing RNA sequencing data, this work develops an approach to uncover novel gene-gene relationships and contextualize genes with no known function, which will advance our understanding of V. cholerae virulence and evolution.
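
To illustrate the general idea of a coexpression network, the sketch below links two genes whenever their expression profiles across samples are strongly correlated. The abstract does not state which similarity measure or cutoff the authors used, so the Pearson correlation, the 0.9 threshold, the function names, and the toy expression values are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is an arbitrary illustrative choice.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values across four samples (not real data).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
print(coexpression_edges(toy_expression))
```

In this toy input only geneA and geneB rise and fall together, so they form the single edge of the network; an unannotated gene connected to well-characterized neighbors in this way is what allows it to be placed in a functional context.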

Nanopore device-based fingerprinting of RNA oligos and microRNAs enhanced with an Osmium tag

Scientific Reports ◽

10.1038/s41598-019-50459-8 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Madiha Sultan ◽

Anastassia Kanavarioti

Keyword(s):

Solid State ◽

Nucleotide Sequence ◽

Nucleic Acids ◽

Rna Sequencing ◽

Single Molecule ◽

Selective Labeling ◽

Base Calling ◽

Alpha Hemolysin ◽

Oxford Nanopore ◽

Single Molecule Analysis

Abstract Protein and solid-state nanopores are used for DNA/RNA sequencing as well as for single molecule analysis. We proposed that selective labeling/tagging may improve base-to-base resolution of nucleic acids via nanopores. We have explored one specific tag, the Osmium tetroxide 2,2′-bipyridine (OsBp), which conjugates to pyrimidines and leaves purines intact. Earlier reports using OsBp-tagged oligodeoxyribonucleotides demonstrated proof-of-principle during unassisted voltage-driven translocation via either alpha-Hemolysin or a solid-state nanopore. Here we extend this work to RNA oligos and a third nanopore by employing the MinION, a commercially available device from Oxford Nanopore Technologies (ONT). Conductance measurements demonstrate that the MinION visibly discriminates oligoriboadenylates with sequence A15PyA15, where Py is an OsBp-tagged pyrimidine. Such resolution rivals traditional chromatography, suggesting that nanopore devices could be exploited for the characterization of RNA oligos and microRNAs enhanced by selective labeling. The data also reveal marked discrimination between a single pyrimidine and two consecutive pyrimidines in OsBp-tagged AnPyAn and AnPyPyAn. This observation leads to the conjecture that the MinION/OsBp platform senses a 2-nucleotide sequence, in contrast to the reported 5-nucleotide sequence with native nucleic acids. Such improvement in sensing, enabled by the presence of OsBp, may enhance base-calling accuracy in enzyme-assisted DNA/RNA sequencing.

SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

BioMed Research International ◽

10.1155/2015/780519 ◽

2015 ◽

Vol 2015 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Yuxiang Tan ◽

Yann Tambouret ◽

Stefano Monti

Keyword(s):

Sample Size ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Simulated Data ◽

Real Data ◽

Rna Seq ◽

Sequencing Data ◽

Detection Algorithms ◽

Fusion Detection

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
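
The accuracy measures named in this abstract, precision and recall, can be computed directly once a simulated truth set is available. The following is a minimal sketch of that calculation; the function name `precision_recall` and the gene-pair labels are hypothetical, and the code is not part of SimFuse.

```python
def precision_recall(called, truth):
    """Compute precision and recall of called fusions against a truth set.

    Both arguments are iterables of hashable fusion identifiers,
    here (gene1, gene2) tuples.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)   # simulated fusions that were detected
    false_pos = len(called - truth)  # calls with no simulated counterpart
    false_neg = len(truth - called)  # simulated fusions that were missed
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


# Hypothetical simulated truth set and detector output (placeholder names).
simulated_fusions = {("geneA", "geneB"), ("geneC", "geneD"),
                     ("geneE", "geneF"), ("geneG", "geneH")}
detected_fusions = {("geneA", "geneB"), ("geneC", "geneD"),
                    ("geneX", "geneY")}

precision, recall = precision_recall(detected_fusions, simulated_fusions)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Repeating this calculation on datasets simulated with different numbers of supporting reads would show how a detector's recall changes with read support, which is the kind of comparison the abstract describes.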