Open Problems in Extracellular RNA Data Analysis: Insights From an ERCC Online Workshop

2022 ◽  
Vol 12 ◽  
Author(s):  
Roger P. Alexander ◽  
Robert R Kitchen ◽  
Juan Pablo Tosar ◽  
Matthew Roth ◽  
Pieter Mestdagh ◽  
...  

We now know RNA can survive the harsh environment of biofluids when encapsulated in vesicles or associated with lipoproteins or RNA-binding proteins. These extracellular RNAs (exRNAs) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19–20, 2021) on the unique challenges of exRNA data analysis. The goal was to foster an open dialog about best practices and discuss open problems in the field, focusing initially on small exRNA sequencing data. Video recordings of workshop presentations and discussions are available (https://exRNA.org/exRNAdata2021-videos/). There were three target audiences: experimentalists who generate exRNA sequencing data, computational and data scientists who work with those groups to analyze their data, and experimental and data scientists new to the field. Here we summarize issues explored during the workshop, including progress on an effort to develop an exRNA data analysis challenge to engage the community in solving some of these open problems.

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Mariana G. Ferrarini ◽  
Avantika Lal ◽  
Rita Rebollo ◽  
Andreas J. Gruber ◽  
Andrea Guarracino ◽  
...  

Abstract The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells. Well-known immunoregulatory genes including CSF2, IL32, IL-6 and SERPINA3 were differentially expressed, while immunoregulatory transposable element families were upregulated. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and eukaryotic initiation factor 4 (eIF4b). We also identified a viral sequence variant with a statistically significant skew associated with age of infection, that may contribute to intracellular host–pathogen interactions. These findings can help identify host mechanisms that can be targeted by prophylactics and/or therapeutics to reduce the severity of COVID-19.


2020 ◽  
Vol 117 (42) ◽  
pp. 26520-26530
Author(s):  
Amir K. Foroushani ◽  
Bryan Chim ◽  
Madeline Wong ◽  
Andre Rastegar ◽  
Patrick T. Smith ◽  
...  

The human genome encodes over 1,500 RNA-binding proteins (RBPs), which coordinate regulatory events on RNA transcripts. Most studies of RBPs have concentrated on their action on host protein-encoding mRNAs, which constitute a minority of the transcriptome. A widely neglected subset of our transcriptome derives from integrated retroviral elements, termed endogenous retroviruses (ERVs), that comprise ∼8% of the human genome. Some ERVs have been shown to be transcribed under physiological and pathological conditions, suggesting that sophisticated regulatory mechanisms exist to coordinate their expression and prevent it from becoming ectopic. However, it is unknown how broadly RBPs and ERV transcripts directly interact to provide a posttranscriptional layer of regulation. Here, we implemented a computational pipeline to determine the correlation of expression between individual RBPs and ERVs from single-cell or bulk RNA-sequencing data. One of our top candidates for an RBP negatively regulating ERV expression was RNA-binding motif protein 4 (RBM4). We used photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation to demonstrate that RBM4 indeed bound ERV transcripts at CGG consensus elements. Loss of RBM4 resulted in an elevated transcript level of bound ERVs of the HERV-K and -H families, as well as increased expression of HERV-K envelope protein. We pinpointed RBM4 regulation of HERV-K to a CGG-containing element that is conserved in the LTRs of HERV-K-10, -K-11, and -K-20, and validated the functionality of this site using reporter assays. In summary, we systematically identified RBPs that may regulate ERV function and demonstrate a role for RBM4 in controlling ERV expression.
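
As a rough illustration of the expression-correlation step described in this abstract, the sketch below ranks RBP and ERV pairs by rank correlation across samples, most negative first. It is not the authors' pipeline: the choice of Spearman correlation, the function names, and the toy expression values (`RBP_A`, `RBP_B`, `ERV_X`) are all assumptions made for the example.

```python
from itertools import product
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_rbp_erv_pairs(rbp_expr, erv_expr):
    """Score every (RBP, ERV) pair and sort from most negative correlation."""
    pairs = [
        (rbp, erv, spearman(rbp_expr[rbp], erv_expr[erv]))
        for rbp, erv in product(rbp_expr, erv_expr)
    ]
    return sorted(pairs, key=lambda item: item[2])


# Hypothetical per-sample expression values (one list entry per sample).
rbp_expr = {
    "RBP_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "RBP_B": [2.0, 1.0, 4.0, 3.0, 5.0],
}
erv_expr = {"ERV_X": [9.0, 7.0, 6.0, 3.0, 1.0]}

ranked = rank_rbp_erv_pairs(rbp_expr, erv_expr)
for rbp, erv, rho in ranked:
    print(f"{rbp}\t{erv}\t{rho:.3f}")
```

A strongly negative score only flags a candidate repressor; as the abstract notes, direct binding and regulation still had to be confirmed experimentally.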


2020 ◽  
Vol 21 (12) ◽  
pp. 4302 ◽  
Author(s):  
Debojyoti Das ◽  
Aniruddha Das ◽  
Mousumi Sahu ◽  
Smruti Sambhav Mishra ◽  
Shaheerah Khan ◽  
...  

Circular RNAs (circRNAs) are a large family of noncoding RNAs that have emerged as novel regulators of gene expression. However, little is known about the function of circRNAs in pancreatic β-cells. Here, transcriptomic analysis of mouse pancreatic islet RNA-sequencing data identified 77 differentially expressed circRNAs between mice fed a normal diet and mice fed a high-fat diet. Surprisingly, multiple circRNAs were derived from intron 2 of the preproinsulin 2 (Ins2) gene and are termed circular intronic (ci)-Ins2. The expression of ci-Ins2 transcripts in mouse pancreatic islets and βTC6 cells was confirmed by reverse transcription PCR, DNA sequencing, and RNase R treatment experiments. The level of ci-Ins2 was altered in βTC6 cells upon exposure to elevated levels of palmitate and glucose. Computational analysis predicted the interaction of several RNA-binding proteins with ci-Ins2 and their flanking region, suggesting their role in the ci-Ins2 function or biogenesis. Additionally, bioinformatics analysis predicted the association of several microRNAs with ci-Ins2. Gene ontology and pathway analysis of genes targeted by miRNAs associated with ci-Ins2 suggested the regulation of several key biological processes. Together, our findings indicate that differential expression of circRNAs, especially ci-Ins2 transcripts, may regulate β-cell function and may play a critical role in the development of diabetes.


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Florian Heyl ◽  
Daniel Maticzka ◽  
Michael Uhl ◽  
Rolf Backofen

Abstract Background: Post-transcriptional regulation via RNA-binding proteins plays a fundamental role in every organism, but the regulatory mechanisms remain poorly understood. Nevertheless, they can be elucidated by cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). CLIP-Seq answers questions about the functional role of an RNA-binding protein and its targets by determining binding sites on a nucleotide level and associated sequence and structural binding patterns. In recent years the amount of CLIP-Seq data has skyrocketed, creating a need for automatic data analysis that can deal with different experimental set-ups. However, noncanonical data, new protocols, and a huge variety of tools, especially for peak calling, have made it difficult to define a standard. Findings: CLIP-Explorer is a flexible and reproducible data analysis pipeline for iCLIP data that supports for the first time eCLIP, FLASH, and uvCLAP data. Individual steps like peak calling can be changed to adapt to different experimental settings. We validate CLIP-Explorer on eCLIP data, finding similar or nearly identical motifs for various proteins in comparison with other databases. In addition, we detect new sequence motifs for PTBP1 and U2AF2. Finally, we optimize the peak calling with 3 different peak callers on RBFOX2 data, discuss the difficulty of the peak-calling step, and give advice for different experimental set-ups. Conclusion: CLIP-Explorer finally fills the demand for a flexible CLIP-Seq data analysis pipeline that is applicable to the up-to-date CLIP protocols. The article further shows the limitations of current peak-calling algorithms and the importance of a robust peak detection.


Author(s):  
Jinkai Wang

Abstract Post-transcriptional processing of RNAs plays important roles in a variety of physiological and pathological processes. These processes can be precisely controlled by a series of RNA binding proteins and cotranscriptionally regulated by transcription factors as well as histone modifications. With the rapid development of high-throughput sequencing techniques, multiomics data have been broadly used to study the mechanisms underlying important biological processes. However, how to use these high-throughput sequencing data to elucidate the fundamental regulatory roles of post-transcriptional processes remains a major challenge. This review summarizes the regulatory mechanisms of post-transcriptional processes and the general principles and approaches to dissect these mechanisms by integrating multiomics data as well as public resources.


2021 ◽  
Author(s):  
Baptiste Kerouanton ◽  
Sebastian Schafer ◽  
Lena Ho ◽  
Sonia Chothani ◽  
Owen JL Rackham

Motivation: The creation and analysis of gene regulatory networks have been the focus of bioinformatic research and underpin much of what is known about gene regulation. However, as a result of a bias in the availability of data-types that are collected, the vast majority of gene regulatory network resources and tools have focused on either transcriptional regulation or protein-protein interactions. This has left other areas of regulation, for instance translational regulation, vastly underrepresented despite having been shown to play a critical role in both health and disease. Results: In order to address this we have developed CLIPreg, a package that integrates RNA, Ribo and CLIP-sequencing data in order to construct translational regulatory networks coordinated by RNA-binding proteins. This is the first tool of its type to be created, allowing for detailed investigation into a previously unseen layer of regulation.


GigaScience ◽  
2021 ◽  
Vol 10 (6) ◽  
Author(s):  
Florian Heyl ◽  
Rolf Backofen

Abstract Background: The prediction of binding sites (peak-calling) is a common task in the data analysis of methods such as cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). The predicted binding sites are often further analyzed to predict sequence motifs or structure patterns. When looking at a typical result of such high-throughput experiments, the obtained peak profiles differ largely on a genomic level. Thus, a tool is missing that evaluates and classifies the predicted peaks on the basis of their shapes. We hereby present StoatyDive, a tool that can be used to filter for specific peak profile shapes of sequencing data such as CLIP. Findings: With StoatyDive we are able to classify peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We compare the results to existing tools and show that StoatyDive finds more distinct peak shape clusters for CLIP data. Furthermore, we present StoatyDive’s capabilities as a quality control tool and as a filter to pick different shapes based on biological or technical questions for other CLIP data from different RNA binding proteins with different biological functions and numbers of RNA recognition motifs. We finally show that proteins involved in splicing, such as RBM22 and U2AF1, have potentially sharper-shaped peaks than other RNA binding proteins. Conclusion: StoatyDive finally fills the demand for a peak shape clustering tool for CLIP-Seq data that fine-tunes downstream analysis steps such as structure or sequence motif predictions and that acts as a quality control.


2019 ◽  
Author(s):  
Jack Humphrey ◽  
Nicol Birsa ◽  
Carmelo Milioto ◽  
David Robaldo ◽  
Andrea B Eberle ◽  
...  

Abstract Mutations in the RNA-binding protein FUS cause amyotrophic lateral sclerosis (ALS), a devastating neurodegenerative disease in which the loss of motor neurons induces progressive weakness and death from respiratory failure, typically only 3-5 years after onset. FUS plays a role in numerous aspects of RNA metabolism, including mRNA splicing. However, the impact of ALS-causative mutations on splicing has not been fully characterised, as most disease models have been based on FUS overexpression, which in itself alters its RNA processing functions. To overcome this, we and others have recently created knock-in models, and have generated high depth RNA-sequencing data on FUS mutants in parallel to FUS knockout. We combined three independent datasets with a joint modelling approach, allowing us to compare the mutation-induced changes to genuine loss of function. We find that FUS ALS-mutations induce a widespread loss of function on expression and splicing, with a preferential effect on RNA binding proteins. Mutant FUS induces intron retention changes through RNA binding, and we identify an intron retention event in FUS itself that is associated with its autoregulation. Altered FUS regulation has been linked to disease, and intriguingly, we find FUS autoregulation to be altered not only by FUS mutations, but also in other genetic forms of ALS, including those caused by TDP-43, VCP and SOD1 mutations, supporting the concept that multiple ALS genes interact in a regulatory network.


2020 ◽  
Author(s):  
Mariana Ferrarini ◽  
Avantika Lal ◽  
Rita Rebollo ◽  
Andreas Gruber ◽  
Andrea Guarracino ◽  
...  

Abstract The novel betacoronavirus named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after initially emerging in Wuhan, China. Here we applied a novel, comprehensive bioinformatic strategy to public RNA sequencing and viral genome sequencing data, to better understand how SARS-CoV-2 interacts with human cells. To our knowledge, this is the first meta-analysis to predict host factors that play a specific role in SARS-CoV-2 pathogenesis, distinct from other respiratory viruses. We identified differentially expressed genes, isoforms and transposable element families specifically altered in SARS-CoV-2 infected cells. Well-known immunoregulators including CSF2, IL-32, IL-6 and SERPINA3 were differentially expressed, while immunoregulatory transposable element families were overexpressed. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as hnRNPA1, PABPC1 and eIF4b, which may play important roles in the viral life cycle. We also detected four viral sequence variants in the spike, polymerase, and nonstructural proteins that correlate with severity of COVID-19. The host factors we identified likely represent important mechanisms in the disease profile of this pathogen, and could be targeted by prophylactics and/or therapeutics against SARS-CoV-2.

