isoCirc catalogs full-length circular RNA isoforms in human transcriptomes

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ruijiao Xin ◽  
Yan Gao ◽  
Yuan Gao ◽  
Robert Wang ◽  
Kathryn E. Kadash-Edmondson ◽  
...  

Abstract Circular RNAs (circRNAs) have emerged as an important class of functional RNA molecules. Short-read RNA sequencing (RNA-seq) is a widely used strategy to identify circRNAs. However, an inherent limitation of short-read RNA-seq is that it does not experimentally determine the full-length sequences and exact exonic compositions of circRNAs. Here, we report isoCirc, a strategy for sequencing full-length circRNA isoforms, using rolling circle amplification followed by nanopore long-read sequencing. We describe an integrated computational pipeline to reliably characterize full-length circRNA isoforms using isoCirc data. Using isoCirc, we generate a comprehensive catalog of 107,147 full-length circRNA isoforms across 12 human tissues and one human cell line (HEK293), including 40,628 isoforms ≥500 nt in length. We identify widespread alternative splicing events within the internal part of circRNAs, including 720 retained intron events corresponding to a class of exon-intron circRNAs (EIciRNAs). Collectively, isoCirc and the companion dataset provide a useful strategy and resource for studying circRNAs in human transcriptomes.
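
Rolling circle amplification copies a circular template repeatedly, so a single long read should contain several tandem copies of the same circRNA sequence. The following is a minimal sketch of that idea only, assuming an error-free read; it is not the isoCirc pipeline, and the function name `tandem_unit` is hypothetical. Real nanopore reads contain errors and need consensus calling rather than exact matching.

```python
def tandem_unit(read, min_copies=2):
    """Return the shortest repeat unit of an error-free tandem-repeat read.

    The read may end in a partial copy. Returns None if the read does not
    consist of at least `min_copies` exact tandem copies of some unit.
    (Toy illustration; assumes no sequencing errors.)
    """
    n = len(read)
    for period in range(1, n // min_copies + 1):
        # A read has period p if every base equals the base p positions later.
        if all(read[i] == read[i + period] for i in range(n - period)):
            return read[:period]
    return None


if __name__ == "__main__":
    print(tandem_unit("ACGTTACGTTACG"))
    print(tandem_unit("ACGTACGA"))
```

Because the template is circular, the unit returned is one arbitrary rotation of the sequence; where the copy starts within the circle depends on where the read begins.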

2020 ◽  
Vol 48 (18) ◽  
pp. e104-e104 ◽  
Author(s):  
Jingwen Wang ◽  
Bingnan Li ◽  
Sueli Marques ◽  
Lars M Steinmetz ◽  
Wu Wei ◽  
...  

Abstract Eukaryotic transcriptomes are complex, involving thousands of overlapping transcripts. The interleaved nature of the transcriptomes limits our ability to identify regulatory regions, and in some cases can lead to misinterpretation of gene expression. To improve the understanding of the overlapping transcriptomes, we have developed an optimized method, TIF-Seq2, able to sequence simultaneously the 5′ and 3′ ends of individual RNA molecules at single-nucleotide resolution. We investigated the transcriptome of a well-characterized human cell line (K562) and identified thousands of unannotated transcript isoforms. By focusing on transcripts that are challenging to investigate with RNA-Seq, we accurately defined boundaries of lowly expressed unannotated and read-through transcripts putatively encoding fusion genes. We validated our results by targeted long-read sequencing and standard RNA-Seq for chronic myeloid leukaemia patient samples. Taking advantage of TIF-Seq2, we explored transcription regulation among overlapping units and investigated their crosstalk. We show that most overlapping upstream transcripts use poly(A) sites within the first 2 kb of the downstream transcription units. Our work shows that, by pairing the 5′ and 3′ ends of each RNA, TIF-Seq2 can improve the annotation of complex genomes, facilitate accurate assignment of promoters to genes and easily identify transcriptionally fused genes.
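
The 2 kb observation can be expressed as a simple coordinate check once each molecule's poly(A) site (PAS) is paired with the transcription start site (TSS) of the downstream unit. The sketch below assumes that framing; the function name, the coordinate convention, and the inclusive window boundary are illustrative assumptions, not part of TIF-Seq2.

```python
def pas_within_downstream_window(upstream_pas, downstream_tss,
                                 strand="+", window=2000):
    """Check whether an upstream transcript's poly(A) site falls inside the
    first `window` nt of the downstream transcription unit.

    Coordinates are genomic positions on one chromosome (hypothetical
    convention). On the minus strand, "downstream" means smaller coordinates.
    """
    if strand == "+":
        offset = upstream_pas - downstream_tss
    else:
        offset = downstream_tss - upstream_pas
    return 0 <= offset <= window


if __name__ == "__main__":
    print(pas_within_downstream_window(10_500, 10_000))        # PAS 500 nt in
    print(pas_within_downstream_window(13_000, 10_000))        # PAS 3 kb in
    print(pas_within_downstream_window(9_500, 10_000, "-"))    # minus strand
```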


Author(s):  
Akihito Otsuki ◽  
Yasunobu Okamura ◽  
Yuichi Aoki ◽  
Noriko Ishida ◽  
Kazuki Kumada ◽  
...  

Our body responds to environmental stress by changing the expression levels of a series of cytoprotective enzymes/proteins through multilayered regulatory mechanisms, including the KEAP1-NRF2 system. While NRF2 upregulates the expression of many cytoprotective genes, short-read RNA sequencing (RNA-Seq) has fundamental limitations that make it difficult to interpret the effectiveness of cytoprotective gene induction at the transcript level. To precisely delineate isoform usage in the stress response, we conducted independent full-length transcriptome profiling (isoform sequencing; Iso-Seq) analyses of lymphoblastoid cells from three volunteers under normal and electrophilic stress-induced conditions. We first determined the first exon usage in KEAP1 and NFE2L2 (encoding NRF2) and found the presence of transcript diversity. We then examined changes in isoform usage of NRF2 target genes under stress conditions and identified a few isoforms dominantly expressed in the majority of NRF2 target genes. The expression levels of isoforms determined by Iso-Seq analyses showed striking differences from those determined by short-read RNA-Seq; the latter could be misleading with regard to the abundance of transcripts. These results support the view that transcript usage is tightly regulated to produce functional proteins under electrophilic stress. Our present study strongly argues that long-read transcriptome sequencing offers important benefits.


2019 ◽  
Author(s):  
Jingwen Wang ◽  
Bingnan Li ◽  
Sueli Marques ◽  
Lars M. Steinmetz ◽  
Wu Wei ◽  
...  

Abstract Eukaryotic transcriptomes are complex, involving thousands of overlapping transcripts. The interleaved nature of the transcriptome limits our ability to identify regulatory regions and, in some cases, can lead to misinterpretation of gene expression. To improve the understanding of the overlapping transcriptome, we have developed an optimized method, TIF-Seq2, able to sequence simultaneously the 5′ and 3′ ends of individual RNA molecules at single-nucleotide resolution. We investigated the transcriptome of a well-characterized human cell line (K562) and identified thousands of unannotated transcript isoforms. By focusing on transcripts that are challenging to investigate with RNA-seq, we accurately defined boundaries of lowly expressed unannotated and read-through transcripts putatively encoding fusion genes. We validated our results by targeted long-read sequencing and standard RNA-Seq for chronic myeloid leukaemia patient samples. Taking advantage of TIF-Seq2, we explored transcription regulation among the overlapping units and investigated their crosstalk. We show that most overlapping upstream transcripts use poly(A) sites within the first 2 kb of the downstream transcription unit. Our work shows that, by pairing the 5′ and 3′ ends of each RNA, TIF-Seq2 can improve the annotation of complex genomes, facilitate accurate assignment of promoters to genes and easily identify transcriptionally fused genes.
Key points
Study of TSS-PAS co-occurrence allows dissecting complex overlapping transcription units.
Partially overlapping transcription units in human commonly use PAS within the first 2 kb.
TIF-Seq2 facilitates the identification of lowly expressed and transcriptionally fused genes.


Life ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 103
Author(s):  
Benjamin D. Lee ◽  
Eugene V. Koonin

Viroids are a unique class of plant pathogens that consist of small circular RNA molecules, between 220 and 450 nucleotides in size. Viroids encode no proteins and are the smallest known infectious agents. Viroids replicate via the rolling circle mechanism, producing multimeric intermediates which are cleaved to unit length either by ribozymes formed from both polarities of the viroid genomic RNA or by co-opted host RNases. Many viroid-like small circular RNAs are satellites of plant RNA viruses. Ribozyviruses, represented by human hepatitis delta virus, are larger viroid-like circular RNAs that additionally encode the viral nucleocapsid protein. It has been proposed that viroids are direct descendants of primordial RNA replicons that were present in the hypothetical RNA world. We argue, however, that a much later origin of viroids, possibly from recently discovered mobile genetic elements known as retrozymes, is a far more parsimonious evolutionary scenario. Nevertheless, viroids and viroid-like circular RNAs are minimal replicators that are likely to be close to the theoretical lower limit of replicator size and arguably comprise the paradigm for replicator emergence. Thus, although viroid-like replicators are unlikely to be direct descendants of primordial RNA replicators, the study of the diversity and evolution of these ultimate genetic parasites can yield insights into the earliest stages of the evolution of life.


2021 ◽  
Author(s):  
Valentin Waschulin ◽  
Chiara Borsetto ◽  
Robert James ◽  
Kevin K. Newsham ◽  
Stefano Donadio ◽  
...  

Abstract The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological constraints of short-read sequencing, which makes assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were found not only in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error rate and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


2020 ◽  
Author(s):  
Zelin Liu ◽  
Huiru Ding ◽  
Jianqi She ◽  
Chunhua Chen ◽  
Weiguang Zhang ◽  
...  

Abstract Circular RNAs (circRNAs) are involved in various biological processes and in disease pathogenesis. However, only a small number of functional circRNAs have been identified among hundreds of thousands of circRNA species, partly because most current methods are based on circular junction counts and overlook the fact that circRNA is formed from the host gene by back-splicing (BS). To distinguish between expression originating from BS and that from the host gene, we present DEBKS, a software program to streamline the discovery of differential BS between two rRNA-depleted RNA sequencing (RNA-seq) sample groups. By applying real and simulated data and employing RT-qPCR for validation, we demonstrate that DEBKS is efficient and accurate in detecting circRNAs with differential BS events between paired and unpaired sample groups. DEBKS is available at https://github.com/yangence/DEBKS as open-source software.
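
One simple way to separate back-splicing from host gene expression is to normalise back-splice junction reads by the linear junction reads at the same splice sites, then compare that ratio between groups. The sketch below illustrates this idea only; it is an assumption for illustration and not DEBKS's statistical model, and the function names are hypothetical.

```python
import math


def back_splice_ratio(bs_reads, linear_reads):
    """Fraction of junction reads supporting back-splicing (NaN if no reads)."""
    total = bs_reads + linear_reads
    return bs_reads / total if total else float("nan")


def mean_ratio_difference(group_a, group_b):
    """Difference in mean back-splice ratio between two sample groups.

    Each group is a list of (bs_reads, linear_reads) tuples, one per sample.
    Samples with no junction reads are skipped. This is a descriptive
    statistic only; it performs no significance test.
    """
    def mean_ratio(samples):
        ratios = [back_splice_ratio(bs, lin) for bs, lin in samples]
        ratios = [r for r in ratios if not math.isnan(r)]
        return sum(ratios) / len(ratios) if ratios else float("nan")

    return mean_ratio(group_a) - mean_ratio(group_b)


if __name__ == "__main__":
    treated = [(10, 30), (20, 20)]   # hypothetical read counts
    control = [(5, 45), (10, 40)]
    print(mean_ratio_difference(treated, control))
```

A ratio of this kind stays constant when the host gene is simply expressed more, which is why it can distinguish a change in back-splicing from a change in overall transcription.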


Author(s):  
Catherine D. Aimone ◽  
J. Steen Hoyer ◽  
Anna E. Dye ◽  
David O. Deppong ◽  
Siobain Duffy ◽  
...  

Abstract We present an optimized protocol for enhanced amplification and enrichment of viral DNA for Next Generation Sequencing of begomovirus genomes. The rapid ability of these viruses to evolve threatens many crops and underscores the importance of using next generation sequencing efficiently to detect and understand the diversity of these viruses. We combined enhanced rolling circle amplification (RCA) with EquiPhi29 polymerase and size selection to generate a cost-effective, short-read sequencing method. This optimized protocol produced short-read sequencing with at least 50% of the reads mapping to the viral reference genome. We provide other insights into common misconceptions about RCA and lessons we have learned from sequencing single-stranded DNA viruses. Our protocol can be used to examine viral DNA as it moves through the entire pathosystem from host to vector, providing valuable information for viral DNA population studies, and would likely work well with other CRESS DNA viruses.
Highlights
Protocol for short-read, high-throughput sequencing of single-stranded DNA viruses using random primers
Comparison of the sequencing of total DNA versus size-selected DNA
Comparison of phi29 and EquiPhi29 DNA polymerases for rolling circle amplification of viral single-stranded DNA genomes


2018 ◽  
Vol 68 ◽  
pp. S762 ◽  
Author(s):  
A. Mcnaughton ◽  
D. Bonsall ◽  
M.D. Cesare ◽  
A. Brown ◽  
D. Parkes ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Md. Tofazzal Hossain ◽  
Yin Peng ◽  
Shengzhong Feng ◽  
Yanjie Wei

Circular RNAs (circRNAs) are formed by joining the 3′ and 5′ ends of RNA molecules. Identification of circRNAs is an important part of circRNA research. Existing circRNA prediction methods can predict circRNAs with start and end positions in the chromosome but cannot identify the full-length circRNA sequences. We present an R package, FcircSEC (Full Length circRNA Sequence Extraction and Classification), to extract the full-length circRNA sequences based on gene annotation and the output of any circRNA prediction tool whose output has a chromosome, start and end positions, and a strand for each circRNA. To validate FcircSEC, we have used three databases: circbase, circRNAdb, and plantcircbase. With information such as the chromosome and strand of each circRNA as the input, the sequences identified by FcircSEC are consistent with the databases. The novelty of FcircSEC is that it can take the output of state-of-the-art circRNA prediction tools as input and is applicable to human and other species. We also classify the circRNAs as exonic, intronic, and others. The R package FcircSEC is freely available.
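
The extraction step described above can be sketched as follows: given a chromosome sequence, the annotated exons of a host transcript, and the predicted back-splice start, end and strand, concatenate the exons that lie between the two coordinates. This is a minimal illustration in Python, not FcircSEC's code (FcircSEC is an R package); the function names, the 1-based inclusive coordinate convention, and the fallback to the full genomic interval are all assumptions.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N; case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def circ_sequence(chrom_seq, exons, start, end, strand):
    """Build a putative full-length circRNA sequence from annotation.

    chrom_seq : chromosome sequence as a string
    exons     : list of (exon_start, exon_end), 1-based inclusive
    start,end : predicted back-splice coordinates, 1-based inclusive
    strand    : "+" or "-"
    """
    # Keep only exons fully contained in the back-splice interval.
    inner = sorted(e for e in exons if e[0] >= start and e[1] <= end)
    if inner and inner[0][0] == start and inner[-1][1] == end:
        # Boundaries match annotated exon edges: join the exons.
        seq = "".join(chrom_seq[s - 1:e] for s, e in inner)
    else:
        # Assumed fallback: use the whole genomic interval.
        seq = chrom_seq[start - 1:end]
    return reverse_complement(seq) if strand == "-" else seq


if __name__ == "__main__":
    chrom = "AAACCCGGGTTTAAACCC"          # toy chromosome
    exons = [(4, 6), (10, 12)]            # toy annotation
    print(circ_sequence(chrom, exons, 4, 12, "+"))
    print(circ_sequence(chrom, exons, 4, 12, "-"))
```

Annotation-based reconstruction of this kind assumes every annotated exon between the back-splice sites is included, so it cannot detect internal alternative splicing within the circle.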

