The importance of incorporating transcript information in CLIP-seq data analysis

2020 ◽  
Author(s):  
Michael Uhl ◽  
Dinh Van Tran ◽  
Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no study has examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We further quantify the extent of this problem in publicly available datasets, which turns out to be substantial. Finally, by providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we demonstrate that context choice also affects the performance of RBP binding site prediction tools. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Michael Uhl ◽  
Van Dinh Tran ◽  
Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no study has examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify the extent of this problem in publicly available datasets, which turns out to be substantial. By providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we further demonstrate that context choice affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.
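
The difference between genomic and transcript context described in this abstract can be illustrated with a minimal sketch. This is not CLIPcontext's implementation or interface; the function names, the toy sequence, and the exon coordinates are all hypothetical, and the sketch assumes a plus-strand transcript with 0-based, end-exclusive exon intervals. For a site close to an exon border, extending along the genome runs into intron sequence, whereas extending along the spliced transcript continues into the next exon.

```python
def genomic_context(genome, center, ext):
    """Extend the site center by `ext` nt on both sides along the genome."""
    start = max(0, center - ext)
    end = min(len(genome), center + ext + 1)
    return genome[start:end]


def transcript_context(genome, exons, center, ext):
    """Extend the site center by `ext` nt along the spliced transcript.

    `exons` is a sorted list of 0-based, end-exclusive (start, end) genomic
    intervals on the plus strand (a simplifying assumption of this sketch).
    Returns None if the center does not lie inside any exon.
    """
    # Spliced transcript: exon sequences joined, introns dropped.
    transcript = "".join(genome[s:e] for s, e in exons)
    offset = 0  # transcript position where the current exon starts
    for s, e in exons:
        if s <= center < e:
            t_center = offset + (center - s)
            start = max(0, t_center - ext)
            end = min(len(transcript), t_center + ext + 1)
            return transcript[start:end]
        offset += e - s
    return None


# Toy example (hypothetical data): two exons separated by a lowercase intron.
GENOME = "AAAAACCCCC" + "gtgtgtgtag" + "TTTTTGGGGG"
EXONS = [(0, 10), (20, 30)]

if __name__ == "__main__":
    # Site center 2 nt upstream of the exon 1 / intron border.
    print(genomic_context(GENOME, 8, 3))            # context reaches into the intron
    print(transcript_context(GENOME, EXONS, 8, 3))  # context continues into exon 2
```

In this toy case the two contexts share the exonic part of the site but differ in their last two nucleotides, which is the kind of discrepancy that can matter when sequences around exon borders are passed to a binding site prediction tool.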



2016 ◽  
Author(s):  
Yaron Orenstein ◽  
Raghavendra Hosur ◽  
Sean Simmons ◽  
Jadwiga Bienkowska ◽  
Bonnie Berger

We report a newly-identified bias in CLIP data that results from cleaving enzyme specificity. This bias is inadvertently incorporated into standard peak calling methods [1], which identify the most likely locations where proteins bind RNA. We further show how, in downstream analysis, this bias is incorporated into models inferred by the state-of-the-art GraphProt method to predict protein RNA-binding. We call for both experimental controls to measure enzyme specificities and algorithms to identify unbiased CLIP binding sites.


2011 ◽  
Vol 286 (43) ◽  
pp. 37063-37066 ◽  
Author(s):  
Philip J. Uren ◽  
Suzanne C. Burns ◽  
Jianhua Ruan ◽  
Kusum K. Singh ◽  
Andrew D. Smith ◽  
...  

2019 ◽  
Vol 4 (Spring 2019) ◽  
Author(s):  
Alexa Vandenburg

The Norris lab recently identified two RNA binding proteins required for proper neuron-specific splicing. The lab conducted touch-response behavioral assays to assess the function of these proteins in touch-sensing neurons. After isolating C. elegans worms with specific phenotypes, the lab used automated computer tracking and video analysis to record the worms’ behavior. The behavior of mutant worms differed from that of wild-type worms. The Norris lab also discovered two possible RNA binding protein sites in SAD-1, a neuronal gene implicated in the neuronal development of C. elegans [1]. These two binding sites may control the splicing of SAD-1. The lab transferred mutated DNA into the genome of wild-type worms by injecting a mutated plasmid. The newly transformed worms fluoresced green, indicating that the two binding sites control SAD-1 splicing.


Author(s):  
Noa Katz ◽  
Eitamar Tripto ◽  
Sarah Goldberg ◽  
Orna Atar ◽  
Zohar Yakhini ◽  
...  

Abstract The design-build-test (DBT) cycle in synthetic biology is considered to be a major bottleneck for progress in the field. The emergence of high-throughput experimental techniques, such as oligo libraries (OLs), combined with machine learning (ML) algorithms, provides the ingredients for a potential “big-data” solution that can generate a sufficient predictive capability to overcome the DBT bottleneck. In this work, we apply the OL-ML approach to the design of RNA cassettes used in gene editing and RNA tracking systems. RNA cassettes are typically made of repetitive hairpins, which hinders their retention, synthesis, and functionality. Here, we carried out a high-throughput OL-based experiment to generate thousands of new binding sites for the phage coat proteins of bacteriophages MS2 (MCP), PP7 (PCP), and Qβ (QCP). We then applied a neural network to vastly expand this space of binding sites to millions of additional predicted sites, which allowed us to identify the structural and sequence features that are critical for the binding of each RBP. To verify our approach, we designed new non-repetitive binding site cassettes and tested their functionality in U2OS mammalian cells. We found that all our cassettes exhibited multiple trackable puncta. Additionally, we designed and verified two additional cassettes, the first containing sites that can bind both PCP and QCP, and the second with sites that can bind either MCP or QCP, allowing for an additional orthogonal channel. Consequently, we provide the scientific community with a novel resource for rapidly creating functional non-repetitive binding site cassettes using one or more of three phage coat proteins with a variety of binding affinities for any application spanning bacteria to mammalian cells.


2020 ◽  
Author(s):  
Clémentine Delan-Forino ◽  
Christos Spanos ◽  
Juri Rappsilber ◽  
David Tollervey

Abstract During nuclear surveillance in yeast, the RNA exosome functions together with the TRAMP complexes. These include the DEAH-box RNA helicase Mtr4 together with an RNA-binding protein (Air1 or Air2) and a poly(A) polymerase (Trf4 or Trf5). To better determine how RNA substrates are targeted, we analyzed protein and RNA interactions for TRAMP components. Mass spectrometry identified three distinct TRAMP complexes formed in vivo. These complexes preferentially assemble on different classes of transcripts. Unexpectedly, on many substrates, including pre-rRNAs and pre-mRNAs, binding specificity was apparently conferred by Trf4 and Trf5. Clustering of mRNAs by TRAMP association showed co-enrichment for mRNAs with functionally related products, supporting the significance of surveillance in regulating gene expression. We compared binding sites of TRAMP components with multiple nuclear RNA binding proteins, revealing preferential colocalization of subsets of factors. TRF5 deletion reduced Mtr4 recruitment and increased RNA abundance for mRNAs specifically showing high Trf5 binding.


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Florian Heyl ◽  
Daniel Maticzka ◽  
Michael Uhl ◽  
Rolf Backofen

Abstract Background: Post-transcriptional regulation via RNA-binding proteins plays a fundamental role in every organism, but the regulatory mechanisms remain incompletely understood. They can be elucidated by cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). CLIP-Seq answers questions about the functional role of an RNA-binding protein and its targets by determining binding sites at nucleotide resolution, together with the associated sequence and structural binding patterns. In recent years the amount of CLIP-Seq data has skyrocketed, creating the need for an automatic data analysis that can deal with different experimental set-ups. However, noncanonical data, new protocols, and a huge variety of tools, especially for peak calling, have made it difficult to define a standard. Findings: CLIP-Explorer is a flexible and reproducible data analysis pipeline for iCLIP data that supports, for the first time, eCLIP, FLASH, and uvCLAP data. Individual steps like peak calling can be changed to adapt to different experimental settings. We validate CLIP-Explorer on eCLIP data, finding similar or nearly identical motifs for various proteins in comparison with other databases. In addition, we detect new sequence motifs for PTBP1 and U2AF2. Finally, we optimize the peak calling with 3 different peak callers on RBFOX2 data, discuss the difficulty of the peak-calling step, and give advice for different experimental set-ups. Conclusion: CLIP-Explorer fills the demand for a flexible CLIP-Seq data analysis pipeline that is applicable to up-to-date CLIP protocols. The article further shows the limitations of current peak-calling algorithms and the importance of robust peak detection.

