Detecting Circular RNA from High-throughput Sequence Data with de Bruijn Graph

2019 ◽  
Author(s):  
Xin Li ◽  
Yufeng Wu

Abstract Circular RNA is a type of non-coding RNA with a circular structure. Many circular RNAs are stable and contain exons, but are not translated into proteins. Circular RNA has important functions in gene regulation and plays an important role in some human diseases. Several biological methods, such as RNase R treatment, have been developed to identify circular RNA. Multiple bioinformatics tools have also been developed for circular RNA detection with high-throughput sequence data. In this paper, we present CircDBG, a new method for circular RNA detection with a de Bruijn graph. We conduct various experiments to evaluate the performance of CircDBG on both simulated and real data. Our results show that CircDBG finds more reliable circRNAs with low bias, runs faster, and balances accuracy and sensitivity better than existing methods. As a byproduct, we also introduce a new method to classify circular RNAs based on read alignment. Finally, we report a potential chimeric circular RNA found by CircDBG in real sequence data. CircDBG can be downloaded from https://github.com/lxwgcool/CircDBG.
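
To make the de Bruijn graph idea concrete, the following is a minimal sketch and not CircDBG's actual algorithm, which the abstract does not describe. It assumes that reads are plain strings, that nodes are (k-1)-mers, and that a read crossing the junction of a circular molecule closes a directed cycle in the graph. The function names `build_de_bruijn` and `has_cycle` are hypothetical. Genomic repeats also create cycles, so a cycle alone is not evidence of a circular RNA.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def has_cycle(graph):
    """Depth-first search for a directed cycle (recursive, so toy inputs only)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GRAY  # node is on the current DFS path
        for nxt in graph.get(node, ()):
            seen = state.get(nxt, WHITE)
            if seen == GRAY:
                return True  # back edge: the path returns to itself
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Toy data (made up): the second read wraps around the end of the first.
linear_read = "ACGTTGCA"
junction_read = "ACGTTGCAACG"
print(has_cycle(build_de_bruijn([linear_read], 4)))
print(has_cycle(build_de_bruijn([junction_read], 4)))
```

The toy reads are invented for illustration; on real data, k would be much larger and candidate cycles would need to be checked against annotation and read support.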

2019 ◽  
Author(s):  
Camille Marchet ◽  
Mael Kerbiriou ◽  
Antoine Limasset

Abstract Motivation A plethora of methods and applications share the fundamental need to associate information to words for high-throughput sequence analysis. Indexing billions of k-mers quickly becomes a scalability problem, as exact associative indexes can be memory expensive. Recent works take advantage of the properties of k-mer sets to address this challenge. They exploit the overlaps shared among k-mers by using a de Bruijn graph as a compact k-mer set to provide lightweight structures. Results We present Blight, a static and exact index structure that associates unique identifiers to indexed k-mers, rejects alien k-mers, and scales to the largest k-mer sets with a low memory cost. The proposed index combines an extremely compact representation with very high throughput. Besides, its construction from the de Bruijn graph sequences is efficient and does not need supplementary memory. The implementation indexes the k-mers from the human genome with 8 GB within 10 minutes and can scale up to the large axolotl genome with 63 GB within 76 minutes. Furthermore, while being memory efficient, the index allows above a million queries per second on a single CPU in our experiments, and the use of multiple cores raises its throughput. Finally, we also present how the index can practically represent metagenomic and transcriptomic sequencing data to highlight its wide applicative range. Availability The index is implemented as a C++ library, is open source under the AGPL3 license, and is available at github.com/Malfoy/Blight. It is designed as a user-friendly library and comes with sample usage code.
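
The interface described above (a unique identifier for each indexed k-mer, rejection of alien k-mers) can be illustrated with a deliberately naive sketch. This is not Blight's data structure or API: it uses a plain Python dictionary, which is exactly the kind of memory-hungry exact index the abstract aims to improve on. The class name `KmerIndex`, its `query` method, and the choice to treat a k-mer and its reverse complement as the same key are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


class KmerIndex:
    """Hypothetical static, exact k-mer index backed by a dictionary."""

    def __init__(self, sequences, k):
        self.k = k
        self.ids = {}
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                key = canonical(seq[i:i + k])
                # First occurrence gets the next free identifier.
                self.ids.setdefault(key, len(self.ids))

    def query(self, kmer):
        """Return the identifier of an indexed k-mer, or None for an alien k-mer."""
        if len(kmer) != self.k:
            return None
        return self.ids.get(canonical(kmer))


# Toy sequence (made up) standing in for de Bruijn graph sequences.
index = KmerIndex(["ACGTAC"], 3)
print(index.query("CGT"), index.query("TAC"), index.query("TTT"))
```

In this sketch the identifiers can then be used as offsets into an external array that stores whatever information is associated with each k-mer.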


2015 ◽  
Vol 13 (02) ◽  
pp. 1550008
Author(s):  
Farhad Hormozdiari ◽  
Eleazar Eskin

The ability to detect the genetic variations between two individuals is an essential component of genetic studies. In these studies, obtaining the genome sequence of both individuals is the first step toward solving the variation detection problem. The emergence of high-throughput sequencing (HTS) technology has made DNA sequencing practical, and it is widely used by diagnosticians to increase their knowledge about the causal factors in genetic diseases. As HTS advances, more data are generated every day than scientists can process. Genome assembly is one of the existing methods to tackle the variation detection problem. The de Bruijn graph formulation of the assembly problem is widely used in the field. Furthermore, it is the only method which can assemble any genome in linear time. However, it requires an enormous amount of memory to assemble any mammalian-size genome. The high demand for sequencing more individuals and the urge to assemble them are the driving forces for a memory-efficient assembler. In this work, we propose a novel method which builds the de Bruijn graph while consuming less memory. Our proposed method can reduce memory usage by 37% compared to existing methods. In addition, we used a real data set (chromosome 17 of the A/J strain) to illustrate the performance of our method.


2019 ◽  
Vol 35 (18) ◽  
pp. 3250-3256 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Bahar Alipanahi ◽  
Tamer Kahveci ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Motivation Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single-molecule optical maps (called Rmaps), or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the usage of optical maps in the sequence assembly step itself. Results We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. Availability and implementation The software for aligning optical maps to a de Bruijn graph, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. Supplementary information Supplementary data are available at Bioinformatics online.
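
The "numeric representation" that an Rmap gives to a sequence can be illustrated by an in-silico digestion: cut a sequence at every occurrence of a recognition site and record the fragment lengths. The sketch below is a simplification and not part of omGraph. The function name `in_silico_rmap` is hypothetical, the recognition site is an arbitrary example, cuts are placed at the first base of each site, and real Rmaps additionally carry sizing error and missing or spurious cut sites.

```python
def in_silico_rmap(sequence, site):
    """Return the fragment lengths obtained by cutting `sequence` at each `site`."""
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        # Advance by one base so overlapping sites are also found.
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop empty fragments, e.g. when the sequence starts with the site.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy sequence (made up) with two occurrences of the example site.
print(in_silico_rmap("AAGATCTTTTGATCCC", "GATC"))
```

Applying the same digestion to paths in a de Bruijn graph would yield fragment-length sequences that can be compared with an observed Rmap, which is the kind of correspondence an alignment between the two data types relies on.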


2021 ◽  
Vol 12 (8) ◽  
Author(s):  
Chen Yang ◽  
Zezhong Mou ◽  
Siqi Wu ◽  
Yuxi Ou ◽  
Zheyu Zhang ◽  
...  

Abstract Bladder cancer (BC) is a common and lethal urinary malignancy worldwide. Circular RNAs (circRNAs), an emerging class of non-coding RNA, participate in the carcinogenesis of several cancers, including BC. In this study, high-throughput sequencing and RT-qPCR were applied to discover and validate abnormally high expression of circUBE2K in BC tissues. Fluorescence in situ hybridization (FISH) was used to detect hsa_circ_0009154 (circUBE2K) expression and subcellular localization in BC tissues. High circUBE2K expression predicted unfavorable prognoses in BC and correlated with clinical features. CCK8, transwell, EdU and wound healing assays demonstrated that down-regulating circUBE2K decreased BC cell proliferation, invasion, and migration. Further studies showed that circUBE2K promoted BC progression via sponging miR-516b-5p and enhancing ARHGAP5 expression through regulating RhoA activity. Dual-luciferase reporter, FISH and RNA pulldown assays were employed to verify the relationships within the circUBE2K/miR-516b-5p/ARHGAP5/RhoA axis. Down-regulating miR-516b-5p or overexpressing ARHGAP5 restored RhoA activity-mediated BC cell properties after silencing circUBE2K. Subcutaneous xenograft and metastasis models showed that circUBE2K significantly increased BC cell metastasis and proliferation in vivo. Taken together, we found that circUBE2K is a tumor-promoting circRNA in BC that functions as a ceRNA to regulate ARHGAP5 expression via sponging miR-516b-5p.


2018 ◽  
Vol 45 (2) ◽  
pp. 677-691 ◽  
Author(s):  
Jiaxin Li ◽  
Haijun Lin ◽  
Zhenrong Sun ◽  
Guanyi Kong ◽  
Xu Yan ◽  
...  

Background/Aims: Circular RNAs (circRNAs) are a class of long noncoding RNAs with a closed loop structure that regulate gene expression as microRNA sponges. CircRNAs are more enriched in brain tissue, but knowledge of the role of circRNAs in temporal lobe epilepsy (TLE) has remained limited. This study is the first to identify the global expression profiles and characteristics of circRNAs in human temporal cortex tissue from TLE patients. Methods: Temporal cortices were collected from 17 TLE patients and 17 non-TLE patients. Total RNA was isolated, and high-throughput sequencing was used to profile the transcriptome of dysregulated circRNAs. Quantitative PCR was performed for the validation of changed circRNAs. Results: In total, 78983 circRNAs, including 15.29% known and 84.71% novel circRNAs, were detected in this study. Intriguingly, 442 circRNAs were differentially expressed between the TLE and non-TLE groups (fold change≥2.0 and FDR≤0.05). Of these circRNAs, 188 were up-regulated, and 254 were down-regulated in the TLE patient group. Eight circRNAs were validated by real-time PCR. Remarkably, circ-EFCAB2 was intensely up-regulated, while circ-DROSHA expression was significantly lower in the TLE group than in the non-TLE group (P<0.05). Bioinformatic analysis revealed that circ-EFCAB2 binds to miR-485-5p to increase the expression level of the ion channel CLCN6, while circ-DROSHA interacts with miR-1252-5p to decrease the expression level of ATP1A2. Conclusions: The dysregulations of circRNAs may reflect the pathogenesis of TLE and circ-EFCAB2 and circ-DROSHA might be potential therapeutic targets and biomarkers in TLE patients.
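
The selection criterion quoted above (fold change ≥ 2.0 and FDR ≤ 0.05) can be written as a small filter. This is only a sketch of the thresholding step, not the study's analysis pipeline. It assumes that fold change is the ratio of mean expression between groups, that the threshold applies in both directions (so a ratio of 0.5 or lower counts as down-regulated), and that FDR values have already been computed elsewhere. The function name `is_differential` and the numbers in the example are hypothetical.

```python
import math


def is_differential(mean_case, mean_control, fdr, min_fold=2.0, max_fdr=0.05):
    """Apply a two-sided fold-change threshold together with an FDR cutoff.

    Both means must be positive; the comparison is done on the log2 scale so
    that up- and down-regulation are treated symmetrically.
    """
    log_fold = math.log2(mean_case / mean_control)
    return abs(log_fold) >= math.log2(min_fold) and fdr <= max_fdr


# Made-up values: up-regulated, down-regulated, too small a change, not significant.
print(is_differential(8.0, 2.0, 0.01))
print(is_differential(2.0, 8.0, 0.01))
print(is_differential(3.0, 2.0, 0.01))
print(is_differential(8.0, 2.0, 0.20))
```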


2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Gaëtan Benoit ◽  
Claire Lemaitre ◽  
Dominique Lavenier ◽  
Erwan Drezen ◽  
Thibault Dayris ◽  
...  

2021 ◽  
Author(s):  
Thomas Krannich ◽  
Walton Timothy James White ◽  
Sebastian Niehus ◽  
Guillaume Holley ◽  
Bjarni Halldorsson ◽  
...  

With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention than other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly, which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the workflow of PopIns and highlight the novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets.
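
A colored de Bruijn graph annotates each k-mer with the set of samples ("colors") in which it occurs. The sketch below shows only that annotation step on toy contigs; it is not PopIns2's merging algorithm, whose details the abstract does not give. The function names `build_colored_graph` and `shared_kmers`, the sample labels, and the contigs are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def build_colored_graph(contigs_by_sample, k):
    """Map each k-mer to the set of samples whose contigs contain it."""
    colors = defaultdict(set)
    for sample, contigs in contigs_by_sample.items():
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                colors[contig[i:i + k]].add(sample)
    return dict(colors)


def shared_kmers(colors, min_samples):
    """Return the k-mers that are supported by at least `min_samples` samples."""
    return {kmer for kmer, samples in colors.items() if len(samples) >= min_samples}


# Made-up contigs from two samples that overlap in the middle.
contigs = {"s1": ["ACGTA"], "s2": ["CGTAC"]}
colors = build_colored_graph(contigs, 3)
print(sorted(shared_kmers(colors, 2)))
```

Sequence supported by several samples is the natural candidate for merging into a single representative, while k-mers with a single color stay specific to one genome.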

