Yanocomp: robust prediction of m6A modifications in individual nanopore direct RNA reads.

2021 ◽  
Author(s):  
Matthew T Parker ◽  
Geoffrey J Barton ◽  
Gordon G Simpson

Yanocomp is a tool for predicting the positions and stoichiometries of RNA modifications in Nanopore direct RNA sequencing data. It uses general mixture models to identify differentially modified sites between two conditions, with good support for replicates. Yanocomp models across adjacent kmers and uses a uniform component to account for outliers, improving the accuracy of single molecule predictions. Consequently, Yanocomp can be used to measure modification stoichiometry, and correlate modifications with other RNA processing events. Yanocomp is available under an MIT license at www.github.com/bartongroup/yanocomp.
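
The modelling idea in this abstract can be illustrated with a minimal sketch. This is not Yanocomp's implementation: it assumes a single position with one-dimensional signal values (Yanocomp models across adjacent kmers), simulated data, and hypothetical function names (`fit_mixture`, `simulate`). It fits two Gaussian components (unmodified and modified) plus a uniform component that absorbs outliers, using expectation-maximisation, and reads the modified fraction off the fitted weights.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(values, mu_init, sigma_init, n_iter=200, min_sigma=1e-3):
    """EM fit of two Gaussians plus one uniform outlier component.

    Returns (mus, sigmas, weights); weights[2] is the outlier fraction.
    All names here are illustrative, not part of any published tool.
    """
    lo, hi = min(values), max(values)
    if hi <= lo:
        raise ValueError("values must not all be identical")
    uniform_density = 1.0 / (hi - lo)  # flat density over the observed range
    mus = list(mu_init)
    sigmas = [sigma_init, sigma_init]
    weights = [0.45, 0.45, 0.10]
    n = len(values)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(2)]
            p.append(weights[2] * uniform_density)
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update Gaussian parameters; the uniform has none to update.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)
        weights = [sum(r[k] for r in resp) / n for k in range(3)]
    return mus, sigmas, weights


def simulate(n_unmod, n_mod, n_outlier, seed=0):
    """Simulated signal levels: two read populations plus uniform noise."""
    rng = random.Random(seed)
    values = [rng.gauss(100.0, 2.0) for _ in range(n_unmod)]
    values += [rng.gauss(110.0, 2.0) for _ in range(n_mod)]
    values += [rng.uniform(60.0, 150.0) for _ in range(n_outlier)]
    return values


values = simulate(600, 400, 50)
mus, sigmas, weights = fit_mixture(values, mu_init=(98.0, 112.0), sigma_init=5.0)
# Stoichiometry: modified weight relative to both Gaussian components.
stoichiometry = weights[1] / (weights[0] + weights[1])
print(f"estimated modified fraction: {stoichiometry:.2f}")
```

Because the uniform component takes responsibility for reads far from both Gaussians, those reads do not inflate the fitted variances, which is the intuition behind using it to improve per-read assignments.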

2021 ◽  
Author(s):  
Doaa Hassan ◽  
Daniel Acevedo ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Sarath Chandra Janga

Abstract Pseudouridine is one of the most abundant RNA modifications, formed when uridines are isomerized by pseudouridine synthase proteins. It plays an important role in many biological processes and is also important in drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule being sequenced, but to our knowledge this technology has not been used to identify RNA pseudouridine sites. To address this limitation, we introduce a tool called Penguin that integrates several machine learning (ML) models (i.e., predictors) to identify RNA pseudouridine sites in Nanopore direct RNA sequencing reads. Penguin extracts a set of features from the raw Oxford Nanopore signal and the corresponding basecalled k-mer. These features are used to train the predictors included in Penguin, which in turn predict whether the signal is modified by the presence of pseudouridine. Penguin includes various predictors, including a support vector machine (SVM), a random forest (RF), and a neural network (NN). Results on two benchmark data sets show that Penguin identifies pseudouridine sites with accuracies of 93.38% and 92.61% using the SVM in random split testing and independent validation testing, respectively. Penguin thus outperforms the existing pseudouridine predictors in the literature, which achieved an accuracy of at most 76.0% in independent validation testing. The tool is available on GitHub at https://github.com/Janga-Lab/Penguin.


2020 ◽  
Vol 12 (574) ◽  
pp. eabe4282 ◽  
Author(s):  
Ankit Bharat ◽  
Melissa Querrey ◽  
Nikolay S. Markov ◽  
Samuel Kim ◽  
Chitaru Kurihara ◽  
...  

Lung transplantation can potentially be a life-saving treatment for patients with nonresolving COVID-19–associated respiratory failure. Concerns limiting lung transplantation include recurrence of SARS-CoV-2 infection in the allograft, technical challenges imposed by viral-mediated injury to the native lung, and the potential risk for allograft infection by pathogens causing ventilator-associated pneumonia in the native lung. Additionally, the native lung might recover, resulting in long-term outcomes preferable to those of transplant. Here, we report the results of lung transplantation in three patients with nonresolving COVID-19–associated respiratory failure. We performed single-molecule fluorescence in situ hybridization (smFISH) to detect both positive and negative strands of SARS-CoV-2 RNA in explanted lung tissue from the three patients and in additional control lung tissue samples. We conducted extracellular matrix imaging and single-cell RNA sequencing on explanted lung tissue from the three patients who underwent transplantation and on warm postmortem lung biopsies from two patients who had died from COVID-19–associated pneumonia. Lungs from these five patients with prolonged COVID-19 disease were free of SARS-CoV-2 as detected by smFISH, but pathology showed extensive evidence of injury and fibrosis that resembled end-stage pulmonary fibrosis. Using machine learning, we compared single-cell RNA sequencing data from the lungs of patients with late-stage COVID-19 to that from the lungs of patients with pulmonary fibrosis and identified similarities in gene expression across cell lineages. Our findings suggest that some patients with severe COVID-19 develop fibrotic lung disease for which lung transplantation is their only option for survival.


2011 ◽  
Vol 392 (4) ◽  
Author(s):  
Sven Findeiß ◽  
David Langenberger ◽  
Peter F. Stadler ◽  
Steve Hoffmann

Abstract Many aspects of RNA maturation leave traces in RNA sequencing data in the form of deviations from the reference genomic DNA. These include, in particular, genomically non-encoded nucleotides and chemical modifications. The latter leave their signatures in the form of mismatches and conspicuous patterns of sequencing reads. Modified mapping procedures focusing on particular types of deviations can help to unravel post-transcriptional modification, maturation and degradation processes. Here, we focus on small RNA sequencing data, which are produced in large quantities for the analysis of microRNA expression. Starting from the recovery of many well-known modified sites in tRNAs, we provide evidence that modified nucleotides are a pervasive phenomenon in these data sets. Regarding non-encoded nucleotides, we concentrate on CCA tails, which surprisingly can be found in a diverse collection of transcripts including sub-populations of mature microRNAs. Although small RNA sequencing libraries alone are insufficient to obtain a complete picture, they can inform on many aspects of the complex processes of RNA maturation.


2019 ◽  
Author(s):  
Luca Cozzuto ◽  
Huanle Liu ◽  
Leszek P. Pryszcz ◽  
Toni Hermoso Pulido ◽  
Julia Ponomarenko ◽  
...  

Abstract The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows for direct measurement of RNA molecules without the need for conversion to complementary DNA, fragmentation or amplification. As such, it is capable of detecting virtually any RNA modification present in the molecule being sequenced, as well as providing polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed MasterOfPores. The pipeline converts raw current intensities into multiple types of processed data, providing run quality metrics, quality filtering, base-calling and mapping. The output of the pipeline can in turn be used to compute per-gene counts and RNA modifications, and to predict polyA tail length and RNA isoforms. The software is written using the Nextflow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity to achieve better reproducibility. The MasterOfPores workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without the need to install any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow will significantly simplify the analysis of nanopore direct RNA sequencing data by non-bioinformatics experts, thus boosting the understanding of the (epi)transcriptome with single-molecule resolution.


2018 ◽  
Author(s):  
Wenhao Tang ◽  
François Bertaux ◽  
Philipp Thomas ◽  
Claire Stefanelli ◽  
Malika Saint ◽  
...  

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
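
A minimal sketch of the kind of model this abstract describes, under stated assumptions: it is not bayNorm's code. The binomial capture likelihood comes from the abstract; the negative binomial form of the prior, the simple method-of-moments estimate standing in for the empirical Bayes step, and all function names (`moment_prior`, `posterior_true_count`) are assumptions made for illustration. The capture efficiency `beta` is taken as known and must lie strictly between 0 and 1.

```python
import math


def log_binom_pmf(x, n, beta):
    """log P(x observed | n true molecules, capture efficiency beta)."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(beta) + (n - x) * math.log1p(-beta))


def log_nb_pmf(n, mu, size):
    """log negative binomial prior on the true count (mean mu, dispersion size)."""
    return (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
            + size * math.log(size / (size + mu))
            + n * math.log(mu / (size + mu)))


def moment_prior(counts, beta):
    """Crude prior estimate for one gene from observed counts across cells.

    Uses E[x] = beta * E[n] and
    Var(x) = beta^2 * Var(n) + beta * (1 - beta) * E[n].
    Requires at least two cells and a non-zero mean count.
    """
    n_cells = len(counts)
    mean_x = sum(counts) / n_cells
    var_x = sum((c - mean_x) ** 2 for c in counts) / (n_cells - 1)
    mu = mean_x / beta
    var_n = (var_x - beta * (1.0 - beta) * mu) / beta ** 2
    # Fall back to a near-Poisson prior when no overdispersion is detected.
    size = mu ** 2 / (var_n - mu) if var_n > mu else 1e6
    return mu, size


def posterior_true_count(x, beta, mu, size, n_max=2000):
    """Posterior over the true count n >= x, evaluated on a truncated grid."""
    support = range(x, n_max + 1)
    log_post = [log_binom_pmf(x, n, beta) + log_nb_pmf(n, mu, size)
                for n in support]
    peak = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    probs = [u / total for u in unnorm]
    mean = sum(n * p for n, p in zip(support, probs))
    return probs, mean


probs, post_mean = posterior_true_count(x=5, beta=0.1, mu=50.0, size=2.0)
print(f"posterior mean of the true count: {post_mean:.2f}")
```

The posterior mean, or samples drawn from `probs`, would then serve as the normalised or imputed value for that gene in that cell; a zero observed count still yields a non-degenerate posterior, which is how such a model can impute missing observations.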


2021 ◽  
Author(s):  
Isabel S Naarmann-de Vries ◽  
Christiane Zorbas ◽  
Amina Lemsara ◽  
Maja Bencun ◽  
Sarah Schudy ◽  
...  

The catalytically active component of ribosomes, rRNA, is long studied and heavily modified. However, little is known about functional and pathological consequences of changes in human rRNA modification status. Direct RNA sequencing on the Nanopore platform enables the direct assessment of rRNA modifications. We established a targeted Nanopore direct rRNA sequencing approach and applied it to CRISPR-Cas9 engineered HCT116 cells lacking specific enzymatic activities required to establish defined rRNA base modifications. We analyzed these sequencing data along with wild type samples and in vitro transcribed reference sequences to specifically detect changes in modification status. We show for the first time that direct RNA sequencing is feasible on smaller flow cells, i.e. Flongle flow cells. Our targeted approach reduces RNA input requirements, making it applicable to the analysis of limited samples such as patient-derived material. The analysis of rRNA modifications during cardiomyocyte differentiation of human induced pluripotent stem cells, and of heart biopsies from cardiomyopathy patients, revealed altered modifications of specific sites, among them pseudouridine, 2′-O-methylation of ribose and acetylation of cytidine. Targeted direct rRNA-seq analysis with JACUSA2 opens up the possibility to analyze dynamic changes in rRNA modifications in a wide range of biological and clinical samples.


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Saber Hafezqorani ◽  
Chen Yang ◽  
Theodora Lo ◽  
Ka Ming Nip ◽  
René L Warren ◽  
...  

Abstract Background: Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technology sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not available yet. Findings: We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads. Conclusions: As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.


2020 ◽  
Vol 103 (2) ◽  
pp. 843-857 ◽  
Author(s):  
Shengli Yao ◽  
Fan Liang ◽  
Rafaqat Ali Gill ◽  
Junyan Huang ◽  
Xiaohui Cheng ◽  
...  

2019 ◽  
Author(s):  
Adrien Leger ◽  
Paulo P. Amaral ◽  
Luca Pandolfini ◽  
Charlotte Capitanchik ◽  
Federica Capraro ◽  
...  

Abstract RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. To date, over 150 naturally occurring PTMs have been identified; however, the overwhelming majority of their functions remain elusive. In recent years, a small number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing (DRS) technology has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework to evaluate the presence of modifications in DRS data. To do so, we compare an RNA sample of interest against a non-modified control sample. Our strategy does not require a training set and allows the use of replicates to model biological variability. Here, we demonstrate the ability of Nanocompore to detect RNA modifications at single-molecule resolution in human polyA+ RNAs, as well as in targeted non-coding RNAs. Our results correlate well with orthogonal methods, confirm previous observations on the distribution of N6-methyladenosine sites and provide novel insights into the distribution of RNA modifications in the coding and non-coding transcriptomes. The latest version of Nanocompore can be obtained at https://github.com/tleonardi/nanocompore.


Author(s):  
Felix Grünberger ◽  
Robert Knüppel ◽  
Michael Jüttner ◽  
Martin Fenk ◽  
Andreas Borst ◽  
...  

Abstract The prokaryotic transcriptome is shaped by transcriptional and posttranscriptional events that define the characteristics of an RNA, including transcript boundaries, the base modification status, and processing pathways to yield mature RNAs. Currently, a combination of several specialised short-read sequencing approaches and additional biochemical experiments is required to describe all transcriptomic features. In this study, we present native RNA sequencing of bacterial (E. coli) and archaeal (H. volcanii, P. furiosus) transcriptomes employing the Oxford Nanopore sequencing technology. Based on this approach, we could address multiple transcriptomic characteristics simultaneously with single-molecule resolution. Taking advantage of long RNA reads provided by the Nanopore platform, we could (re-)annotate large transcriptional units and boundaries. Our analysis of transcription termination sites suggests that diverse termination mechanisms are in place in archaea. Moreover, we shed additional light on the poorly understood rRNA processing pathway in archaea. One of the key features of native RNA sequencing is that RNA modifications are retained. We could confirm this ability by analysing the well-known KsgA-dependent methylation sites and mapping N4-acetylcytosine modifications in rRNAs. Notably, we were able to follow the relative temporal order in which these modifications are installed in the rRNA processing pathway.

