Numerous recursive sites contribute to accuracy of splicing of long introns in flies

2018 ◽  
Author(s):  
Athma A. Pai ◽  
Joseph Paggi ◽  
Karen Adelman ◽  
Christopher B. Burge

Abstract: Recursive splicing, a process by which a single intron is removed from pre-mRNA transcripts in multiple distinct segments, has been observed in a small subset of Drosophila melanogaster introns. However, detection of recursive splicing requires observation of splicing intermediates, which are inherently unstable, making it difficult to study. Here we developed new computational approaches to identify recursively spliced introns and applied them, in combination with existing methods, to nascent RNA sequencing data from Drosophila S2 cells. These approaches identified hundreds of novel sites of recursive splicing, expanding the catalog of recursively spliced fly introns by 4-fold. Recursive sites occur in most very long (>40 kb) fly introns, including many genes involved in morphogenesis and development, and tend to occur near the midpoints of introns. Suggesting a possible function for recursive splicing, we observe that fly introns with recursive sites are spliced more accurately than comparably sized non-recursive introns.

2021 ◽  
Author(s):  
Yixin Zhao ◽  
Noah Dukler ◽  
Gilad Barshad ◽  
Shushan Toneyan ◽  
Charles G. Danko ◽  
...  

Abstract: Quantification of mature-RNA isoform abundance from RNA-seq data has been extensively studied, but much less attention has been devoted to quantifying the abundance of distinct precursor RNAs based on nascent RNA sequencing data. Here we address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR). DENR models the nascent RNA read counts at each locus as a mixture of user-provided isoforms. The performance of the baseline algorithm is enhanced by the use of machine-learning predictions of transcription start sites (TSSs) and an adjustment for the typical “shape profile” of read counts along a transcription unit. We show using simulated data that DENR clearly outperforms simple read-count-based methods for estimating the abundances of both whole genes and isoforms. By applying DENR to previously published PRO-seq data from K562 and CD4+ T cells, we find that transcription of multiple isoforms per gene is widespread, and the dominant isoform frequently makes use of an internal TSS. We also identify >200 genes whose dominant isoforms make use of different TSSs in these two cell types. Finally, we apply DENR and StringTie to newly generated PRO-seq and RNA-seq data, respectively, for human CD4+ T cells and CD14+ monocytes, and show that entropy at the pre-RNA level makes a disproportionate contribution to overall isoform diversity, especially across cell types. Altogether, DENR is the first computational tool to enable abundance quantification of pre-RNA isoforms based on nascent RNA sequencing data, and it reveals high levels of pre-RNA isoform diversity in human cells.
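The idea of treating per-locus read counts as a mixture of isoforms can be illustrated with a small sketch. This is not the DENR algorithm itself: it assumes a hypothetical 0/1 design matrix saying which genomic bins each isoform covers, and it swaps in a plain non-negative least-squares fit solved by multiplicative updates. The function name, bin layout, and counts are all made up for illustration.

```python
def fit_isoform_mixture(design, observed, n_iter=500, eps=1e-12):
    """Fit non-negative isoform weights x so that design @ x approximates observed.

    design[b][i] is 1.0 if (hypothetical) isoform i covers bin b, else 0.0.
    observed[b] is the read count in bin b.
    Multiplicative updates keep every weight non-negative throughout.
    """
    n_bins = len(design)
    n_iso = len(design[0])

    # Precompute A^T b and A^T A once; they do not change between iterations.
    atb = [sum(design[b][i] * observed[b] for b in range(n_bins))
           for i in range(n_iso)]
    ata = [[sum(design[b][i] * design[b][j] for b in range(n_bins))
            for j in range(n_iso)]
           for i in range(n_iso)]

    x = [1.0] * n_iso
    for _ in range(n_iter):
        # Simultaneous update: x_i <- x_i * (A^T b)_i / (A^T A x)_i
        denom = [sum(ata[i][j] * x[j] for j in range(n_iso)) + eps
                 for i in range(n_iso)]
        x = [x[i] * atb[i] / denom[i] for i in range(n_iso)]
    return x


# Toy locus with six bins: isoform 0 starts at the annotated TSS (bins 0-5),
# isoform 1 starts at an internal TSS (bins 3-5).
toy_design = [
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 1.0],
    [1.0, 1.0],
]
toy_counts = [2.0, 2.0, 2.0, 7.0, 7.0, 7.0]

if __name__ == "__main__":
    print(fit_isoform_mixture(toy_design, toy_counts))
```

In the toy locus the jump in counts at bin 3 is what lets the fit attribute extra signal to the isoform with the internal TSS; a simple whole-gene read count would not separate the two.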


2022 ◽  
Vol 3 (1) ◽  
pp. 101036
Author(s):  
Adelina Rabenius ◽  
Sajitha Chandrakumaran ◽  
Lea Sistonen ◽  
Anniina Vihervaara

2021 ◽  
Author(s):  
Dean Light ◽  
Roni Haas ◽  
Mahmoud Yazbak ◽  
Tal Elfand ◽  
Tal Blau ◽  
...  

Abstract: Adenosine to inosine (A-to-I) RNA editing, the most prevalent type of RNA editing in metazoans, is carried out by adenosine deaminases (ADARs) in double-stranded RNA regions. Several computational approaches have been recently developed to identify A-to-I RNA editing sites from sequencing data, each addressing a particular issue. Here we present RESIC, an efficient pipeline that combines several approaches for the detection and classification of RNA editing sites. The pipeline can be used for all organisms and can use any number of RNA-sequencing datasets as input. RESIC provides (1) the detection of editing sites in both repetitive and non-repetitive genomic regions; (2) the identification of hyper-edited regions; and (3) optional exclusion of polymorphism sites to increase reliability, based on DNA, and ADAR-mutant RNA sequencing datasets, or SNP databases. We demonstrate the utility of RESIC by applying it to human, successfully overlapping and extending the list of known putative editing sites. We further tested changes in the patterns of A-to-I RNA editing, and RNA abundance of ADAR enzymes, following SARS-CoV-2 infection in human cell lines. Our results suggest that upon SARS-CoV-2 infection, compared to mock, the number of hyper editing sites is increased, and in agreement, the activity of ADAR1, which catalyzes hyper-editing, is enhanced. These results imply the involvement of A-to-I RNA editing in conceiving the unpredicted phenotype of COVID-19 disease. RESIC code is open-source and is easily extendable.


2021 ◽  
Author(s):  
Adam Siepel

Abstract: Nascent RNA sequencing protocols, such as GRO-seq and PRO-seq, are now widely used in the study of eukaryotic transcription, and these experimental techniques have given rise to a variety of statistical and machine-learning methods for data analysis. These computational methods, however, are generally designed to address specialized signal-processing or prediction tasks, rather than directly describing the dynamics of RNA polymerases as they move along the DNA template. Here, I introduce a general probabilistic model that describes the kinetics of transcription initiation, elongation, pause release, and termination, as well as the generation of sequencing read counts. I show that this generative model enables estimation of separate rates of initiation, pause-release, and termination, up to a proportionality constant. Furthermore, if applied to time-course data in a nonequilibrium setting, the model can be used to estimate elongation rates. This model additionally leads naturally to likelihood ratio tests for differences between genes, conditions, or species in various rates of interest. A version of the model in which read counts are assumed to be Poisson-distributed leads to convenient, closed-form solutions for parameter estimates and likelihood ratio tests. I present extensions to Bayesian inference and to a generalized linear model that can be used to discover genomic features associated with rates of elongation. Finally, I address technicalities concerning estimation of library size, normalization and sequencing replicates. Altogether, this modeling framework enables a unified treatment of many common tasks in the analysis of nascent RNA sequencing data.
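To make the Poisson case concrete, here is a minimal sketch of a closed-form rate estimate and a likelihood ratio test between two conditions. It is a generic textbook Poisson comparison, not the model from the paper: it assumes independent per-position read counts within one region, a single rate per condition, and a known library-size scale factor (`depth_a`, `depth_b`). All function and parameter names are hypothetical.

```python
import math


def mle_rate(counts, depth=1.0):
    """Closed-form Poisson MLE of the rate: total reads / (positions * depth)."""
    return sum(counts) / (len(counts) * depth)


def poisson_loglik(counts, rate, depth=1.0):
    """Poisson log-likelihood without the log(k!) terms.

    Those terms do not depend on the rate, so they cancel in a likelihood ratio.
    """
    mu = depth * rate
    if mu <= 0.0:
        # A zero rate is only compatible with all-zero counts.
        return 0.0 if sum(counts) == 0 else float("-inf")
    return sum(k * math.log(mu) - mu for k in counts)


def lrt_two_conditions(counts_a, counts_b, depth_a=1.0, depth_b=1.0):
    """Test H0 (shared rate) against H1 (separate rates per condition).

    Returns (statistic, p_value); the p-value uses the chi-square
    approximation with one degree of freedom.
    """
    # Alternative hypothesis: each condition has its own rate.
    rate_a = mle_rate(counts_a, depth_a)
    rate_b = mle_rate(counts_b, depth_b)
    ll_alt = (poisson_loglik(counts_a, rate_a, depth_a)
              + poisson_loglik(counts_b, rate_b, depth_b))

    # Null hypothesis: one shared rate, again available in closed form.
    exposure = len(counts_a) * depth_a + len(counts_b) * depth_b
    rate_0 = (sum(counts_a) + sum(counts_b)) / exposure
    ll_null = (poisson_loglik(counts_a, rate_0, depth_a)
               + poisson_loglik(counts_b, rate_0, depth_b))

    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(lrt_two_conditions([10, 12, 11], [30, 28, 33]))
```

Because both the null and alternative estimates are simple ratios of read totals to exposure, no numerical optimization is needed, which is the practical appeal of the Poisson assumption.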


2021 ◽  
Vol 12 ◽  
Author(s):  
Cheng-Chih Hsiao ◽  
Roman Sankowski ◽  
Marco Prinz ◽  
Joost Smolders ◽  
Inge Huitinga ◽  
...  

G-protein-coupled receptors (GPCRs) are critical sensors affecting the state of eukaryotic cells. To get systematic insight into the GPCRome of microglia, we analyzed publicly available RNA-sequencing data of bulk and single cells obtained from human and mouse brains. We identified 17 rhodopsin and adhesion family GPCRs robustly expressed in microglia from human brains, including the homeostasis-associated genes CX3CR1, GPR34, GPR183, P2RY12, P2RY13, and ADGRG1. Expression of these microglial core genes was lost upon culture of isolated cells ex vivo but could be acquired by human induced pluripotent stem cell (iPSC)-derived microglial precursors transplanted into mouse brains. CXCR4 and PTGER4 were more highly expressed in subcortical white matter microglia than in cortical grey matter microglia, and ADGRG1 was downregulated in microglia obtained from normal-appearing white and grey matter tissue of multiple sclerosis (MS) brains. Single-cell RNA sequencing of microglia from active lesions, obtained early during MS, revealed downregulation of homeostasis-associated GPCR genes and upregulation of CXCR4 expression in a small subset of MS-associated lesional microglia. Functional presence of low levels of CXCR4 on human microglia was confirmed using flow cytometry and transwell migration towards SDF-1. Microglia abundantly expressed the GPCR downstream signaling mediator genes GNAI2 (αi2), GNAS (αs), and GNA13 (α13), the latter particularly in white matter. Drugs against several microglia GPCRs are available to target microglia in brain diseases. In conclusion, transcriptome profiling allowed us to identify expression of GPCRs that may contribute to brain (patho)physiology and have diagnostic and therapeutic potential in human microglia.


2021 ◽  
Vol 12 ◽  
Author(s):  
Dean Light ◽  
Roni Haas ◽  
Mahmoud Yazbak ◽  
Tal Elfand ◽  
Tal Blau ◽  
...  

Adenosine to inosine (A-to-I) RNA editing, the most prevalent type of RNA editing in metazoans, is carried out by adenosine deaminases (ADARs) in double-stranded RNA regions. Several computational approaches have been recently developed to identify A-to-I RNA editing sites from sequencing data, each addressing a particular issue. Here, we present RNA Editing Sites Identification and Classification (RESIC), an efficient pipeline that combines several approaches for the detection and classification of RNA editing sites. The pipeline can be used for all organisms and can use any number of RNA-sequencing datasets as input. RESIC provides (1) the detection of editing sites in both repetitive and non-repetitive genomic regions; (2) the identification of hyper-edited regions; and (3) optional exclusion of polymorphism sites to increase reliability, based on DNA, and ADAR-mutant RNA sequencing datasets, or SNP databases. We demonstrate the utility of RESIC by applying it to human, successfully overlapping and extending the list of known putative editing sites. We further tested changes in the patterns of A-to-I RNA editing, and RNA abundance of ADAR enzymes, following SARS-CoV-2 infection in human cell lines. Our results suggest that upon SARS-CoV-2 infection, compared to mock, the number of hyper editing sites is increased, and in agreement, the activity of ADAR1, which catalyzes hyper-editing, is enhanced. These results imply the involvement of A-to-I RNA editing in conceiving the unpredicted phenotype of COVID-19 disease. RESIC code is open-source and is easily extendable.
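The polymorphism-exclusion step in item (3) can be pictured with a short sketch. This is not RESIC code or its interface: it assumes mismatches have already been called against a reference and are given as simple records, that known SNPs are available as a set of (chromosome, position) pairs, and that the data are unstranded, so both A-to-G and its reverse-complement T-to-C are kept as editing-like (inosine is read as guanosine by sequencers). Every name below is hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Mismatch(NamedTuple):
    """One reference/read disagreement (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# A-to-I editing appears as A>G, or as T>C when the transcript is on the minus strand.
EDIT_LIKE = {("A", "G"), ("T", "C")}


def candidate_editing_sites(
    mismatches: Iterable[Mismatch],
    known_snps: Set[Tuple[str, int]],
) -> List[Mismatch]:
    """Keep editing-like mismatches that do not coincide with a known polymorphism."""
    kept = []
    for m in mismatches:
        if (m.ref.upper(), m.alt.upper()) not in EDIT_LIKE:
            continue  # not the substitution type produced by A-to-I editing
        if (m.chrom, m.pos) in known_snps:
            continue  # likely a genomic variant rather than an RNA edit
        kept.append(m)
    return kept


if __name__ == "__main__":
    calls = [
        Mismatch("chr1", 100, "A", "G"),
        Mismatch("chr1", 150, "A", "G"),  # listed as a SNP below
        Mismatch("chr1", 200, "C", "T"),  # wrong substitution type
        Mismatch("chr2", 300, "T", "C"),
    ]
    snps = {("chr1", 150)}
    for site in candidate_editing_sites(calls, snps):
        print(site)
```

The same filter works whether the SNP set comes from a database, matched DNA sequencing, or an ADAR-mutant RNA dataset, since each of those just contributes positions to exclude.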


2021 ◽  
Author(s):  
Adelina Rabenius ◽  
Sajitha Chandrakumaran ◽  
Lea Sistonen ◽  
Anniina Vihervaara

Nascent RNA-sequencing tracks transcription at nucleotide resolution. The genomic distribution of engaged transcription complexes, in turn, uncovers functional genomic regions. Here, we provide data-analytical steps to 1) identify transcribed regulatory elements de novo genome-wide, 2) quantify engaged transcription complexes at enhancers, promoter-proximal regions, divergent transcripts, gene bodies and termination windows, and 3) measure distribution of transcription machineries and regulatory proteins across functional genomic regions. This protocol follows RNA synthesis and genome-regulation in mammals, as demonstrated in human K562 erythroleukemia cells.


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 120
Author(s):  
Yiyun Sun ◽  
Dandan Xu ◽  
Chundong Zhang ◽  
Yitao Wang ◽  
Lian Zhang ◽  
...  

We previously demonstrated that proline-rich protein 11 (PRR11) and spindle and kinetochore associated 2 (SKA2) constituted a head-to-head gene pair driven by a prototypical bidirectional promoter. This gene pair synergistically promoted the development of non-small cell lung cancer. However, the signaling pathways leading to the ectopic expression of this gene pair remain obscure. In the present study, we first analyzed the lung squamous cell carcinoma (LSCC) relevant RNA sequencing data from The Cancer Genome Atlas (TCGA) database using the correlation analysis of gene expression and gene set enrichment analysis (GSEA), which revealed that the PRR11-SKA2 correlated gene list highly resembled the Hedgehog (Hh) pathway activation-related gene set. Subsequently, GLI1/2 inhibitor GANT-61 or GLI1/2-siRNA inhibited the Hh pathway of LSCC cells, concomitantly decreasing the expression levels of PRR11 and SKA2. Furthermore, the mRNA expression profile of LSCC cells treated with GANT-61 was detected using RNA sequencing, displaying 397 differentially expressed genes (203 upregulated genes and 194 downregulated genes). Among these, one gene set, including BIRC5, NCAPG, CCNB2, and BUB1, was involved in cell division and interacted with both PRR11 and SKA2. These genes were verified as downregulated genes via RT-PCR, and their high expression significantly correlated with shorter overall survival of LSCC patients. Taken together, our results indicate that GLI1/2 mediates the expression of the PRR11-SKA2-centric gene set that serves as an unfavorable prognostic indicator for LSCC patients, potentializing new combinatorial diagnostic and therapeutic strategies in LSCC.


Author(s):  
Vincent M. Tutino ◽  
Haley R. Zebraski ◽  
Hamidreza Rajabzadeh-Oghaz ◽  
Lee Chaves ◽  
Adam A. Dmytriw ◽  
...  
