Improving CLIP-Seq Data Analysis by Incorporating Transcript Information

2020
Author(s): Michael Uhl, Dinh Van Tran, Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information about splicing events. So far, no study has examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We further quantify the extent of this problem in publicly available datasets and find it to be substantial. Finally, by providing a tool called CLIPcontext for automatic extraction of transcript and genomic context sequences, we demonstrate that context choice also affects the performance of RBP binding site prediction tools. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.

BMC Genomics, 2020, Vol 21 (1)
Author(s): Michael Uhl, Van Dinh Tran, Rolf Backofen

Abstract Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information about splicing events. So far, no study has examined this issue closely. Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify the extent of this problem in publicly available datasets and find it to be substantial. By providing a tool called CLIPcontext for automatic extraction of transcript and genomic context sequences, we further demonstrate that context choice affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows. Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.
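To illustrate why context choice matters near exon borders, the following minimal Python sketch contrasts a genomic context (which runs into the intron) with a transcript context (which continues into the next exon). It is not CLIPcontext's implementation or interface: the function names, the toy sequence, the 0-based end-exclusive coordinates, and the plus-strand-only handling are all assumptions made for illustration.

```python
def spliced_sequence(genome, exons):
    """Concatenate exon sequences (0-based, end-exclusive, plus strand only)."""
    return "".join(genome[start:end] for start, end in exons)


def genomic_to_transcript(pos, exons):
    """Map a genomic position inside an exon to its offset on the spliced transcript."""
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    raise ValueError("position is not inside any exon")


def genomic_context(genome, center, ext):
    """Extend the site by `ext` nucleotides on each side along the genome."""
    start = max(0, center - ext)
    return genome[start:center + ext + 1]


def transcript_context(genome, exons, center, ext):
    """Extend the site by `ext` nucleotides on each side along the spliced transcript."""
    transcript = spliced_sequence(genome, exons)
    t_pos = genomic_to_transcript(center, exons)
    start = max(0, t_pos - ext)
    return transcript[start:t_pos + ext + 1]


# Hypothetical toy data: uppercase = exon, lowercase = intron.
genome = "AAAACCCCggggggggTTTTGGGG"
exons = [(0, 8), (16, 24)]
center = 6  # site center two nucleotides upstream of the exon border

print(genomic_context(genome, center, 4))            # AACCCCggg
print(transcript_context(genome, exons, center, 4))  # AACCCCTTT
```

In this toy case the two contexts share the exonic part but differ downstream of the border, which is the kind of difference a downstream binding site prediction tool would see.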




2016
Author(s): Yaron Orenstein, Raghavendra Hosur, Sean Simmons, Jadwiga Bienkoswka, Bonnie Berger

We report a newly identified bias in CLIP data that results from cleaving enzyme specificity. This bias is inadvertently incorporated into standard peak calling methods [1], which identify the most likely locations where proteins bind RNA. We further show how, in downstream analysis, this bias is incorporated into models inferred by the state-of-the-art GraphProt method to predict protein-RNA binding. We call for both experimental controls to measure enzyme specificities and algorithms to identify unbiased CLIP binding sites.


2011, Vol 286 (43), pp. 37063-37066
Author(s): Philip J. Uren, Suzanne C. Burns, Jianhua Ruan, Kusum K. Singh, Andrew D. Smith, ...

2019, Vol 4 (Spring 2019)
Author(s): Alexa Vandenburg

The Norris lab recently identified two RNA binding proteins required for proper neuron-specific splicing. The lab conducted touch-response behavioral assays to assess the function of these proteins in touch-sensing neurons. After isolating C. elegans worms with specific phenotypes, the lab used automated computer tracking and video analysis to record the worms’ behavior. The behavior of mutant worms differed from that of wild-type worms. The Norris lab also discovered two possible RNA binding protein sites in SAD-1, a neuronal gene implicated in the neuronal development of C. elegans. These two binding sites may control the splicing of SAD-1. The lab transferred mutated DNA into the genome of wild-type worms by injecting a mutated plasmid. The newly transformed worms fluoresced green, indicating that the two binding sites control SAD-1 splicing.


2021, Vol 19 (02), pp. 2150006
Author(s): Fatemeh Nazem, Fahimeh Ghasemi, Afshin Fassihi, Alireza Mehri Dehnavi

Binding site prediction for new proteins is important in structure-based drug design. The identified binding sites may help in developing treatments for new viral outbreaks when no information is available about the pockets of the relevant proteins, COVID-19 being a case in point. Computational identification of pockets, as an alternative approach, has recently attracted much interest. In this study, binding site prediction is treated as a semantic segmentation problem. An improved 3D version of the U-Net model, trained with the Dice loss function, is used to predict the binding sites accurately. The performance of the proposed model on the independent test datasets and on SARS-CoV-2 shows that the segmentation model can predict binding sites with a more accurate shape than the recently published deep learning model DeepSite. Therefore, the model may help predict the binding sites of proteins and could be used in drug design for novel proteins.
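For readers unfamiliar with the Dice loss mentioned above, here is a minimal sketch of the commonly used soft Dice formulation, 1 - (2 * sum(p * t) + eps) / (sum(p) + sum(t) + eps), applied to a flattened voxel grid. The exact variant used in the study is not stated in the abstract, so the smoothing term `eps` and the function name are assumptions.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over a flattened voxel grid.

    pred:   predicted probabilities in [0, 1], one per voxel
    target: ground-truth labels (0 or 1), one per voxel
    eps:    smoothing term (an assumption; avoids division by zero)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Hypothetical four-voxel examples.
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])   # full overlap
disjoint = dice_loss([1.0, 1.0, 0.0, 0.0], [0, 0, 1, 1])  # no overlap
print(perfect, disjoint)
```

Because the loss is driven by overlap rather than per-voxel accuracy, it is often preferred when the positive class (here, pocket voxels) is a small fraction of the grid.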


Author(s): Noa Katz, Eitamar Tripto, Sarah Goldberg, Orna Atar, Zohar Yakhini, ...

Abstract The design-build-test (DBT) cycle in synthetic biology is considered to be a major bottleneck for progress in the field. The emergence of high-throughput experimental techniques, such as oligo libraries (OLs), combined with machine learning (ML) algorithms, provides the ingredients for a potential “big-data” solution that can generate sufficient predictive capability to overcome the DBT bottleneck. In this work, we apply the OL-ML approach to the design of RNA cassettes used in gene editing and RNA tracking systems. RNA cassettes are typically made of repetitive hairpins, which hinders their retention, synthesis, and functionality. Here, we carried out a high-throughput OL-based experiment to generate thousands of new binding sites for the phage coat proteins of bacteriophages MS2 (MCP), PP7 (PCP), and Qβ (QCP). We then applied a neural network to vastly expand this space of binding sites to millions of additional predicted sites, which allowed us to identify the structural and sequence features that are critical for the binding of each RBP. To verify our approach, we designed new non-repetitive binding site cassettes and tested their functionality in U2OS mammalian cells. We found that all our cassettes exhibited multiple trackable puncta. Additionally, we designed and verified two additional cassettes, the first containing sites that can bind both PCP and QCP, and the second with sites that can bind either MCP or QCP, allowing for an additional orthogonal channel. Consequently, we provide the scientific community with a novel resource for rapidly creating functional non-repetitive binding site cassettes using one or more of three phage coat proteins with a variety of binding affinities for any application spanning bacteria to mammalian cells.


2022
Author(s): Adam Zemla, Jonathan E. Allen, Dan Kirshner, Felice C. Lightstone

We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions (spheres) adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. Currently, the PDBspheres library contains more than 2 million spheres, organized to facilitate searches by sequence and/or structure similarity of protein-ligand binding sites or interfaces between interacting molecules. PDBspheres uses the LGA structure alignment algorithm as the main engine for detecting structure similarities between the protein of interest and library spheres. An all-atom structure similarity metric ensures that sidechain placement is taken into account in the PDBspheres primary assessment of confidence in structural matches. In this paper, we (1) describe the PDBspheres method, (2) demonstrate how PDBspheres can be used to detect and characterize binding sites in protein structures, (3) compare the use of PDBspheres for binding site prediction with seven other binding site prediction methods using a curated dataset of 2,528 ligand-bound and ligand-free crystal structures, and (4) use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of the 4,876 structures in the refined set of the PDBbind 2019 dataset. The PDBspheres library is made publicly available for download at https://proteinmodel.org/AS2TS/PDBspheres
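The idea of a protein region adjacent to a complexed ligand can be sketched as a simple distance filter: keep every residue that has at least one atom within a cutoff of any ligand atom. This is only an illustration of the concept. The abstract does not specify how PDBspheres defines its spheres, so the cutoff value, the data layout, and the function name below are all assumptions.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, radius):
    """Return sorted IDs of residues with an atom within `radius` of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    radius:        distance cutoff in the same units as the coordinates
                   (hypothetical value; not taken from the PDBspheres paper)
    """
    near = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_atoms):
            near.add(residue_id)
    return sorted(near)


# Hypothetical toy coordinates.
protein_atoms = [
    ("ALA1", (0.0, 0.0, 0.0)),
    ("GLY2", (3.0, 0.0, 0.0)),
    ("SER3", (20.0, 0.0, 0.0)),
]
ligand_atoms = [(1.0, 0.0, 0.0)]

print(residues_near_ligand(protein_atoms, ligand_atoms, radius=5.0))
```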

