Sequence biases in CLIP experimental data are incorporated in protein RNA-binding models

2016
Author(s):
Yaron Orenstein
Raghavendra Hosur
Sean Simmons
Jadwiga Bienkowska
Bonnie Berger

We report a newly-identified bias in CLIP data that results from cleaving enzyme specificity. This bias is inadvertently incorporated into standard peak calling methods [1], which identify the most likely locations where proteins bind RNA. We further show how, in downstream analysis, this bias is incorporated into models inferred by the state-of-the-art GraphProt method to predict protein RNA-binding. We call for both experimental controls to measure enzyme specificities and algorithms to identify unbiased CLIP binding sites.
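As a rough illustration of how a cleavage-related sequence bias could be measured, the sketch below compares the nucleotide composition at fragment 3' ends with a background composition. This is not the authors' method: the function name `end_nucleotide_enrichment`, the toy fragments, and the uniform background are all assumptions made for this example.

```python
from collections import Counter


def end_nucleotide_enrichment(fragments, background):
    """Ratio of each nucleotide's frequency at fragment 3' ends
    to its frequency in a background sequence (hypothetical helper)."""
    end_counts = Counter(frag[-1] for frag in fragments if frag)
    bg_counts = Counter(background)
    n_end = sum(end_counts.values())
    n_bg = sum(bg_counts.values())
    if n_end == 0 or n_bg == 0:
        raise ValueError("need at least one fragment and a non-empty background")
    return {
        nt: (end_counts[nt] / n_end) / (bg_counts[nt] / n_bg)
        for nt in "ACGU"
        if bg_counts[nt] > 0
    }


# Toy data (assumed): three of four fragments end in G.
fragments = ["AUGG", "CCAG", "UUCG", "GAUA"]
background = "ACGU" * 10  # uniform background composition
enrichment = end_nucleotide_enrichment(fragments, background)
print(enrichment)
```

In this toy input, G is three times more frequent at fragment ends than in the background. Enrichment of that kind reflects where the enzyme cuts rather than where the protein binds, which is the sort of signal an experimental control would be needed to separate out.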

2020
Author(s):
Michael Uhl
Van Dinh Tran
Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take into account genomic read profiles, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We further quantify the extent of this problem in publicly available datasets, which turns out to be substantial. Finally, by providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we demonstrate that context choice also affects the performance of RBP binding site prediction tools. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.


BMC Genomics
2020
Vol 21 (1)
Author(s):
Michael Uhl
Van Dinh Tran
Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take into account genomic read profiles, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify the extent of this problem in publicly available datasets, which turns out to be substantial. By providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we further demonstrate that context choice affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.
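The difference between genomic and transcript context can be made concrete with a toy example. The sketch below is not CLIPcontext's interface. The function names, the half-open coordinate convention, the plus-strand assumption, and the miniature "genome" are all hypothetical and chosen only to show how extending a site near an exon border yields intronic sequence in one case and the neighboring exon in the other.

```python
def genomic_context(genome, start, end, ext):
    """Extend a site [start, end) by ext nt on both sides along the genome."""
    lo = max(0, start - ext)
    hi = min(len(genome), end + ext)
    return genome[lo:hi]


def transcript_context(genome, exons, start, end, ext):
    """Extend the same site along the spliced transcript (plus strand assumed).

    exons: sorted, non-overlapping half-open (start, end) genomic intervals.
    """
    transcript = "".join(genome[s:e] for s, e in exons)

    def to_transcript(pos):
        # Map a genomic position inside an exon to a transcript coordinate.
        offset = 0
        for s, e in exons:
            if s <= pos <= e:
                return offset + (pos - s)
            offset += e - s
        raise ValueError("position is not inside an exon")

    t_start, t_end = to_transcript(start), to_transcript(end)
    lo = max(0, t_start - ext)
    hi = min(len(transcript), t_end + ext)
    return transcript[lo:hi]


# Toy genome (assumed): two exons in upper case, one intron in lower case.
genome = "AAAAACCCCC" + "gggggggggg" + "TTTTTAAAAA"
exons = [(0, 10), (20, 30)]
site = (6, 10)  # site ending exactly at the border of the first exon

genomic = genomic_context(genome, *site, ext=3)
spliced = transcript_context(genome, exons, *site, ext=3)
print(genomic)  # AACCCCCggg  (runs into the intron)
print(spliced)  # AACCCCCTTT  (continues into the next exon)
```

A prediction tool trained on the first sequence sees intronic bases that a protein bound to the mature transcript would never have encountered, which is one way context choice can influence downstream performance.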


Author(s):  
Paolo Marcatili
Anna Tramontano

This chapter provides an overview of current computational methods for PPI network cleansing. The authors first present the problem of identifying reliable PPIs from noisy and incomplete experimental data. Next, they address what results to expect from the different experimental studies, what can be defined as a true interaction, which kinds of data should be integrated when assigning reliability levels to PPIs, and which gold standard should be used in training and testing PPI filtering methods. Finally, Marcatili and Tramontano describe the state of the art in the field, presenting the different classes of algorithms and comparing their results. The aim of the chapter is to guide the reader in choosing the most suitable methods, experiments, and integrative data, and to highlight the most common biases and errors, so as to obtain a portrait of PINs that is not only reliable but also able to correctly retrieve the biological information contained in such data.


2019
Author(s):
Michael Uhl
Van Dinh Tran
Rolf Backofen

Abstract CLIP-seq is the state-of-the-art technique to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates the use of computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCN). In contrast to current CNN methods, GraphProt2 supports variable length input as well as the possibility to accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.


2020
Author(s):
Veronica F. Busa
Alexander V. Favorov
Elana J. Fertig
Anthony K. L. Leung

Abstract The etiology of diseases driven by dysregulated mRNA metabolism can be elucidated by characterizing the responsible RNA-binding proteins (RBPs). Although characterizations of RBPs have mainly focused on their binding sequences, their preferences for RNA structures have received little investigation. We present nearBynding, an R/Bioconductor pipeline that incorporates RBP binding sites and RNA structure information to discern structural binding preferences for an RBP. nearBynding visualizes RNA structure at and proximal to sites of RBP binding transcriptome-wide, analyzes CLIP-seq data without peak-calling, and provides a flexible scaffold to study RBP binding preferences relative to diverse RNA structure data types.


1986
Vol 23 (01)
pp. 35-54
Author(s):
Grant R. Hagen
Edward N. Comstock
John J. Slager

This paper follows two earlier papers, published by the Society in 1962 and 1979, dealing with correlation allowance and design power margin. For some time it has been perceived that a need exists for changes in the numerical quantities which have been specified by the U.S. Navy for correlation allowance coefficients and design power margins. This perception results from the recognition of a growing body of experimental data, both from model experiments and from ship standardization trials, that provide the basis for both correlation and margin policies. In response to this need, an exhaustive investigation was undertaken to establish a sound basis for a revised correlation allowance policy and to evaluate its impact on design power margin policy. The investigation, which led to proposed revisions in both policies, provided the material for this paper. Presented herein are: a review of the state of the art in the areas of correlation allowance and speed-power margin; an updated database derived primarily from model experiments and standardization trials of U.S. Navy ships; an assessment and interpretation of the database; a proposed alternative to the current correlation allowance policy; an evaluation of the impact of applying the proposed policy in determining required speed-power margins for U.S. Navy ships; and a proposed alternative to the current design power margin policy for new U.S. Navy ships.


Author(s):  
Cunjing Ge
Feifei Ma
Xutong Ma
Fan Zhang
Pei Huang
...

Solution counting or solution space quantification (that is, volume computation and volume estimation) for linear constraints (LCs) has found interesting applications in various fields. Experimental data show that integer solution counting is usually more expensive than quantifying the volume of the solution space, while their output values are close. It is therefore helpful to approximate the number of integer solutions by the volume if the error is acceptable. In this paper, we present and prove a bound on this error for LCs. It is the first bound that can be used to approximate integer solution counts. Based on this result, an approximate integer solution counting method for LCs is proposed. Experiments show that our approach is over 20x faster than state-of-the-art integer solution counters. Moreover, this advantage increases with the problem scale.
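The closeness of the integer solution count and the volume can be seen on a small example. The sketch below is only an illustration, not the method or the error bound from the paper: it brute-forces the integer points of the triangle x >= 0, y >= 0, x + y <= n and compares the count with the triangle's area n²/2. The helper name `count_integer_points` and the choice of constraints are assumptions made for this example.

```python
from itertools import product


def count_integer_points(constraints, box):
    """Count integer points in a box that satisfy every linear constraint.

    constraints: list of (coeffs, bound), each meaning sum(c_i * x_i) <= bound.
    box: list of inclusive (lo, hi) integer ranges, one per variable.
    """
    ranges = [range(lo, hi + 1) for lo, hi in box]
    return sum(
        all(
            sum(c * x for c, x in zip(coeffs, point)) <= bound
            for coeffs, bound in constraints
        )
        for point in product(*ranges)
    )


n = 100
# x >= 0, y >= 0, x + y <= n, written in the form a . x <= b
constraints = [((-1, 0), 0), ((0, -1), 0), ((1, 1), n)]
count = count_integer_points(constraints, [(0, n), (0, n)])
volume = n * n / 2  # exact area of the triangle
relative_gap = (count - volume) / count
print(count, volume, round(relative_gap, 4))
```

For n = 100 the count is 5151 against an area of 5000, a gap of about 3%. Brute-force enumeration grows quickly with dimension and range, which is the cost that a volume-based approximation aims to avoid.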


2018
Vol 1 (1)
pp. 235-261
Author(s):
Anob M. Chakrabarti
Nejc Haberman
Arne Praznik
Nicholas M. Luscombe
Jernej Ule

An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein–RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein–RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality and in different peak calling methods affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein–RNA interaction experiments.

