Circall: fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dat Thanh Nguyen ◽  
Quang Thinh Trac ◽  
Thi-Hau Nguyen ◽  
Ha-Nam Nguyen ◽  
Nir Ohad ◽  
...  

Abstract. Background: Circular RNA (circRNA) is an emerging class of RNA molecules that is attracting research interest for its potential to serve as a marker for diagnosis and prognosis, or as a therapeutic target, in cancer, cardiovascular, and autoimmune diseases. Current methods for detecting circRNAs from RNA sequencing (RNA-seq) focus mostly on improving the mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict whether a BSJ-supporting read is derived from a true circRNA, which increases the rate of FP circRNAs. Results: We have developed Circall, a novel method for detecting circRNAs from RNA-seq data. Circall controls FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally efficient because it uses a quasi-mapping algorithm for fast and accurate RNA read alignment. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision on the simulated data. On the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. Conclusions: With its better detection performance and shorter computation time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R and is available at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.

2020 ◽  
Author(s):  
Zelin Liu ◽  
Huiru Ding ◽  
Jianqi She ◽  
Chunhua Chen ◽  
Weiguang Zhang ◽  
...  

Abstract. Circular RNAs (circRNAs) are involved in various biological processes and in disease pathogenesis. However, only a small number of functional circRNAs have been identified among hundreds of thousands of circRNA species, partly because most current methods are based on circular junction counts and overlook the fact that circRNA is formed from the host gene by back-splicing (BS). To distinguish between expression originating from BS and that from the host gene, we present DEBKS, a software program to streamline the discovery of differential BS between two rRNA-depleted RNA sequencing (RNA-seq) sample groups. Using real and simulated data and employing RT-qPCR for validation, we demonstrate that DEBKS is efficient and accurate in detecting circRNAs with differential BS events between paired and unpaired sample groups. DEBKS is available at https://github.com/yangence/DEBKS as open-source software.
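The distinction drawn above, between raw circular junction counts and back-splicing relative to the host gene, can be illustrated with a small sketch. It is not the DEBKS model: the function name, the ratio definition, and the read counts are all assumptions chosen for illustration.

```python
def back_splice_ratio(bs_reads, linear_reads):
    """Fraction of junction reads at a locus that support back-splicing.

    Hypothetical helper: bs_reads counts reads spanning the back-splice
    junction, linear_reads counts reads spanning the competing linear
    junctions of the host gene. Returns None when there is no coverage.
    """
    total = bs_reads + linear_reads
    if total == 0:
        return None
    return bs_reads / total


# Toy counts (made up): both samples have 30 back-splice reads, but the
# host gene is expressed at very different levels.
sample_a = back_splice_ratio(30, 70)   # 30 of 100 junction reads
sample_b = back_splice_ratio(30, 270)  # 30 of 300 junction reads
print(sample_a, sample_b)
```

With identical junction counts the two samples still differ in their ratio, which is the kind of change a count-only comparison would miss.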


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Yuxiang Tan ◽  
Yann Tambouret ◽  
Stefano Monti

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
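As a minimal sketch of the accuracy measures mentioned above, precision and recall can be computed by comparing the set of fusions a tool reports against the set planted in the simulation. The function name and the gene pairs below are illustrative assumptions, not part of SimFuse.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of called events vs. known truth."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example: three simulated fusions, of which the tool recovers two
# and reports one additional false positive.
truth = {("BCR", "ABL1"), ("EML4", "ALK"), ("TMPRSS2", "ERG")}
called = {("BCR", "ABL1"), ("EML4", "ALK"), ("GENE_A", "GENE_B")}
p, r = precision_recall(called, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because the simulated truth set is known exactly, both measures can be estimated at each level of supporting-read depth.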


2018 ◽  
Author(s):  
Verboom Karen ◽  
Everaert Celine ◽  
Bolduc Nathalie ◽  
Livak J. Kenneth ◽  
Yigit Nurten ◽  
...  

Abstract. Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3’ end of the transcript, an excessive fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.


2021 ◽  
Author(s):  
Kristoffer Vitting-Seerup

RNA-sequencing (RNA-seq) has revolutionized our understanding of molecular and cellular biology. The bioinformatic tools that quantify the data are a cornerstone of RNA-seq analysis. To evaluate the efficacy of these tools, scientists rely heavily on simulation of RNA-seq. Recently, Varabyou et al. took simulation of RNA-seq data to the next level by providing simulated data that include transcriptional noise. While this represents a significant step forward in our ability to perform realistic benchmarks of RNA-seq tools, the data provided by Varabyou et al. need refinement. In the following, I suggest a few improvements with a specific focus on splicing noise.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 964
Author(s):  
Sarka Benesova ◽  
Mikael Kubista ◽  
Lukas Valihrach

MicroRNAs (miRNAs) are a class of small RNA molecules that have an important regulatory role in multiple physiological and pathological processes. Their disease-specific profiles and presence in biofluids are properties that enable miRNAs to be employed as non-invasive biomarkers. In the past decades, several methods have been developed for miRNA analysis, including small RNA sequencing (RNA-seq). Small RNA-seq enables genome-wide profiling and analysis of known, as well as novel, miRNA variants. Moreover, its high sensitivity allows for profiling of low input samples such as liquid biopsies, which have now found applications in diagnostics and prognostics. Still, due to technical bias and the limited ability to capture the true miRNA representation, its potential remains unfulfilled. The introduction of many new small RNA-seq approaches that tried to minimize this bias has led to the many small RNA-seq protocols seen today. Here, we review all current approaches to cDNA library construction used during the small RNA-seq workflow, with particular focus on their implementation in commercially available protocols. We provide an overview of each protocol and discuss their applicability. We also review recent benchmarking studies comparing each protocol’s performance and summarize the major conclusions that can be gathered from their usage. The results document variable performance of the protocols and highlight their different applications in miRNA research. Taken together, our review provides a comprehensive overview of all the current small RNA-seq approaches, summarizes their strengths and weaknesses, and provides guidelines for their applications in miRNA research.


2018 ◽  
Vol 35 (13) ◽  
pp. 2326-2328 ◽  
Author(s):  
Tobias Jakobi ◽  
Alexey Uvarovskii ◽  
Christoph Dieterich

Abstract. Motivation: Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights. Results: Here, we present circtools, a modular, Python-based framework for computational circRNA analyses. The software includes modules for circRNA detection, internal sequence reconstruction, quality checking, statistical testing, screening for enrichment of RBP binding sites, differential exon RNase R resistance and circRNA-specific primer design. circtools supports researchers with visualization options and data export into commonly used formats. Availability and implementation: circtools is available via https://github.com/dieterich-lab/circtools and http://circ.tools under GPLv3.0. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Shaomin Yang ◽  
Hong Zhou ◽  
Ruth Cruz-Cosme ◽  
Mingde Liu ◽  
Jiayu Xu ◽  
...  

ABSTRACT. Circular RNAs (circRNAs) encoded by DNA genomes have been identified across host and pathogen species as parts of the transcriptome. Accumulating evidence indicates that circRNAs play critical roles in autoimmune diseases and viral pathogenesis. Here we report that RNA viruses of the Betacoronavirus genus of Coronaviridae, SARS-CoV-2, SARS-CoV and MERS-CoV, encode a novel type of circRNAs. Through de novo circRNA analyses of publicly available coronavirus-infection related deep RNA-Sequencing data, we identified 351, 224 and 2,764 circRNAs derived from SARS-CoV-2, SARS-CoV and MERS-CoV, respectively, and characterized two major back-splice events shared by these viruses. Coronavirus-derived circRNAs are more abundant and longer than host genome-derived circRNAs. Using a systematic strategy to amplify and identify back-splice junction sequences, we experimentally identified over 100 viral circRNAs from SARS-CoV-2 infected Vero E6 cells. This collection of circRNAs provided the first line of evidence for the abundance and diversity of coronavirus-derived circRNAs and suggested possible mechanisms driving circRNA biogenesis from RNA genomes. Our findings highlight circRNAs as an important component of the coronavirus transcriptome. Summary: We report for the first time that abundant and diverse circRNAs are generated by SARS-CoV-2, SARS-CoV and MERS-CoV and represent a novel type of circRNAs that differ from circRNAs encoded by DNA genomes.


Author(s):  
Paul L. Auer ◽  
Rebecca W Doerge

RNA sequencing technology is providing data of unprecedented throughput, resolution, and accuracy. Although there are many different computational tools for processing these data, there are a limited number of statistical methods for analyzing them, and even fewer that acknowledge the unique nature of individual gene transcription. We introduce a simple and powerful statistical approach, based on a two-stage Poisson model, for modeling RNA sequencing data and testing for biologically important changes in gene expression. The advantages of this approach are demonstrated through simulations and real data applications.


2015 ◽  
Vol 61 (1) ◽  
pp. 221-230 ◽  
Author(s):  
Jae Hoon Bahn ◽  
Qing Zhang ◽  
Feng Li ◽  
Tak-Ming Chan ◽  
Xianzhi Lin ◽  
...  

Abstract. BACKGROUND: Extracellular RNAs (exRNAs) in human body fluids are emerging as effective biomarkers for detection of diseases. Saliva, as the most accessible and noninvasive body fluid, has been shown to harbor exRNA biomarkers for several human diseases. However, the entire spectrum of exRNA from saliva has not been fully characterized. METHODS: Using high-throughput RNA sequencing (RNA-Seq), we conducted an in-depth bioinformatic analysis of noncoding RNAs (ncRNAs) in human cell-free saliva (CFS) from healthy individuals, with a focus on microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and circular RNAs (circRNAs). RESULTS: Our data demonstrated robust reproducibility of miRNA and piRNA profiles across individuals. Furthermore, individual variability of these salivary RNA species was highly similar to that in other body fluids or cellular samples, despite the direct exposure of saliva to environmental impacts. By comparative analysis of >90 RNA-Seq data sets of different origins, we observed that piRNAs were surprisingly abundant in CFS compared with other body fluid or intracellular samples, with expression levels in CFS comparable to those found in embryonic stem cells and skin cells. Conversely, miRNA expression profiles in CFS were highly similar to those in serum and cerebrospinal fluid. Using a customized bioinformatics method, we identified >400 circRNAs in CFS. These data represent the first global characterization and experimental validation of circRNAs in any type of extracellular body fluid. CONCLUSIONS: Our study provides a comprehensive landscape of ncRNA species in human saliva that will facilitate further biomarker discoveries and lay a foundation for future studies related to ncRNAs in human saliva.


2018 ◽  
Author(s):  
Xianwen Ren ◽  
Liangtao Zheng ◽  
Zemin Zhang

ABSTRACT. Clustering is a prevalent means of analyzing single cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory of the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
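The random projection idea named above can be sketched in a few lines. This is a generic Gaussian random projection on synthetic counts, written in Python with NumPy under assumed dimensions; it is not the sscClust implementation (which is in R), and the function name, matrix sizes, and Poisson toy data are illustrative assumptions.

```python
import numpy as np


def random_project(X, k, seed=0):
    """Project rows of X (cells x genes) onto k random Gaussian directions.

    Scaling by 1/sqrt(k) keeps squared Euclidean distances between rows
    approximately unchanged in expectation (Johnson-Lindenstrauss style).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    R = rng.standard_normal((n_features, k)) / np.sqrt(k)
    return X @ R


# Synthetic expression matrix: 100 cells x 2000 genes (toy data only).
rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(100, 2000)).astype(float)

Z = random_project(X, k=200)
print(X.shape, "->", Z.shape)

# Compare one pairwise squared distance before and after projection.
d_orig = np.sum((X[0] - X[1]) ** 2)
d_proj = np.sum((Z[0] - Z[1]) ** 2)
print("distance ratio:", d_proj / d_orig)
```

Any clustering algorithm can then be run on the reduced matrix Z instead of X, which is where the savings in time and memory would come from.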

