Large scale interaction analysis of RNA binding proteins/LncRNAs to identify lncRNA nuclear localization mechanisms

Author(s):  
Yile Huang
2016 ◽  
Author(s):  
Shuya Li ◽  
Fanghong Dong ◽  
Yuexin Wu ◽  
Sai Zhang ◽  
Chen Zhang ◽  
...  

Abstract Characterizing the binding behaviors of RNA-binding proteins (RBPs) is important for understanding their functional roles in gene expression regulation. However, current high-throughput experimental methods for identifying RBP targets, such as CLIP-seq and RNAcompete, usually suffer from false positives and false negatives. Here, we develop a deep boosting based machine learning approach, called DeBooster, to accurately model the binding sequence preferences and identify the corresponding binding targets of RBPs from CLIP-seq data. Comprehensive validation tests have shown that DeBooster can outperform other state-of-the-art approaches in predicting RBP targets and recover false negatives that are common in current CLIP-seq data. In addition, we have demonstrated several new potential applications of DeBooster in understanding the regulatory functions of RBPs, including the binding effects of the RNA helicase MOV10 on mRNA degradation, the influence of different binding behaviors of the ADAR proteins on RNA editing, as well as the antagonizing effect of RBP binding on miRNA repression. Moreover, DeBooster may provide an effective index to investigate the effect of pathogenic mutations in RBP binding sites, especially those related to splicing events. We expect that DeBooster will be widely applied to analyze large-scale CLIP-seq experimental data and can provide a practically useful tool for novel biological discoveries in understanding the regulatory mechanisms of RBPs.


2020 ◽  
Vol 21 (20) ◽  
pp. 7803
Author(s):  
Julie Miro ◽  
Anne-Laure Bougé ◽  
Eva Murauer ◽  
Emmanuelle Beyne ◽  
Dylan Da Cunha ◽  
...  

The Duchenne muscular dystrophy (DMD) gene has a complex expression pattern regulated by multiple tissue-specific promoters and by alternative splicing (AS) of the resulting transcripts. Here, we used an RNAi-based approach coupled with DMD-targeted RNA-seq to identify RNA-binding proteins (RBPs) that regulate splicing of its skeletal muscle isoform (Dp427m) in a human muscular cell line. A total of 16 RBPs comprising the major regulators of muscle-specific splicing events were tested. We show that distinct combinations of RBPs maintain the correct inclusion in Dp427m of exons that undergo spatio-temporal AS in other dystrophin isoforms. In particular, our findings revealed the complex networks of RBPs contributing to the splicing of the two short DMD exons 71 and 78, the inclusion of exon 78 in the adult Dp427m isoform being crucial for muscle function. Among the RBPs tested, QKI and DDX5/DDX17 proteins are important determinants of DMD exon inclusion. This is the first large-scale study to determine which RBPs act on the physiological splicing of the DMD gene. Our data shed light on molecular mechanisms contributing to the expression of the different dystrophin isoforms, which could be influenced by a change in the function or expression level of the identified RBPs.


1995 ◽  
Vol 129 (3) ◽  
pp. 551-560 ◽  
Author(s):  
H Siomi ◽  
G Dreyfuss

The heterogeneous nuclear RNP (hnRNP) A1 protein is one of the major pre-mRNA/mRNA binding proteins in eukaryotic cells and one of the most abundant proteins in the nucleus. It is localized to the nucleoplasm and it also shuttles between the nucleus and the cytoplasm. The amino acid sequence of A1 contains two RNP motif RNA-binding domains (RBDs) at the amino terminus and a glycine-rich domain at the carboxyl terminus. This configuration, designated 2x RBD-Gly, is representative of perhaps the largest family of hnRNP proteins. Unlike most nuclear proteins characterized so far, A1 (and most 2x RBD-Gly proteins) does not contain a recognizable nuclear localization signal (NLS). We have found that a segment of ca. 40 amino acids near the carboxyl end of the protein (designated M9) is necessary and sufficient for nuclear localization; attaching this segment to the bacterial protein beta-galactosidase or to pyruvate kinase completely localized these otherwise cytoplasmic proteins to the nucleus. The RBDs and another RNA binding motif found in the glycine-rich domain, the RGG box, are not required for A1 nuclear localization. M9 is a novel type of nuclear localization domain as it does not contain sequences similar to classical basic-type NLS. Interestingly, sequences similar to M9 are found in other nuclear RNA-binding proteins including hnRNP A2.


Nature ◽  
2021 ◽  
Author(s):  
Eric L. Van Nostrand ◽  
Peter Freese ◽  
Gabriel A. Pratt ◽  
Xiaofeng Wang ◽  
Xintao Wei ◽  
...  

BMC Genomics ◽  
2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Lei Deng ◽  
Youzhi Liu ◽  
Yechuan Shi ◽  
Wenhao Zhang ◽  
Chun Yang ◽  
...  

Abstract Background RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to the high cost of biochemical assays. Quite a number of studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences. Results In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., a distributed representation of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
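
The contrast between k-mer tokens and one-hot encoding mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DeepRKE code: the function names, the choice of k = 3, the stride of 1, and the ACGU alphabet are all assumptions made for illustration. In a model of this kind, the integer k-mer indices would typically be fed to a learned embedding layer, whereas one-hot vectors encode each base independently.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k=3, alphabet="ACGU"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k=3):
    """Turn a sequence into k-mer indices, suitable as embedding-layer input."""
    vocab = kmer_vocabulary(k)
    return [vocab[km] for km in kmers(seq, k)]


def one_hot(seq, alphabet="ACGU"):
    """Traditional per-base one-hot encoding, shown for comparison."""
    return [[1 if base == a else 0 for a in alphabet] for base in seq]


if __name__ == "__main__":
    rna = "AUGGC"  # hypothetical example sequence
    print(kmers(rna))
    print(encode(rna))
    print(one_hot(rna))
```

The k-mer view gives each token some local context, which is what allows an embedding to place similar k-mers near each other; the one-hot view carries no such notion of similarity.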


2003 ◽  
Vol 23 (23) ◽  
pp. 8405-8415 ◽  
Author(s):  
Alexander N. Chkheidze ◽  
Stephen A. Liebhaber

ABSTRACT αCPs comprise a subfamily of KH-domain-containing RNA-binding proteins with specificity for C-rich pyrimidine tracts. These proteins play pivotal roles in a broad spectrum of posttranscriptional events. The five major αCP isoforms are encoded by four dispersed loci. Each isoform contains three repeats of the RNA-binding KH domain (KH1, KH2, and KH3) but lacks other identifiable motifs. To explore the complexity of their respective functions, we examined the subcellular localization of each αCP isoform. Immunofluorescence studies revealed three distinct distributions: αCP1 and αCP2 are predominantly nuclear with specific enrichment of αCP1 in nuclear speckles, αCP3 and αCP4 are restricted to the cytoplasm, and αCP2-KL, an αCP2 splice variant, is present at significant levels in both the nucleus and the cytoplasm. We mapped nuclear localization signals (NLSs) for αCP isoforms. αCP2 contains two functionally independent NLSs. Both NLSs appear to be novel and were mapped to a 9-amino-acid segment between KH2 and KH3 (NLS I) and to a 12-amino-acid segment within KH3 (NLS II). NLS I is conserved in αCP1, whereas NLS II is inactivated by two amino acid substitutions. Neither NLS is present in αCP3 or αCP4. Consistent with mapping studies, deletion of NLS I from αCP1 blocks its nuclear accumulation, whereas NLS I and NLS II must both be inactivated to block nuclear accumulation of αCP2. These data demonstrate an unexpected complexity in the compartmentalization of αCP isoforms and identify two novel NLSs that play roles in their respective distributions. This complexity of αCP distribution is likely to contribute to the diverse functions mediated by this group of abundant RNA-binding proteins.


2020 ◽  
Vol 117 (10) ◽  
pp. 5269-5279 ◽  
Author(s):  
John W. Phillips ◽  
Yang Pan ◽  
Brandon L. Tsai ◽  
Zhijie Xie ◽  
Levon Demirdjian ◽  
...  

We sought to define the landscape of alternative pre-mRNA splicing in prostate cancers and the relationship of exon choice to known cancer driver alterations. To do so, we compiled a metadataset composed of 876 RNA-sequencing (RNA-Seq) samples from five publicly available sources representing a range of prostate phenotypes from normal tissue to drug-resistant metastases. We subjected these samples to exon-level analysis with rMATS-turbo, purpose-built software designed for large-scale analyses of splicing, and identified 13,149 high-confidence cassette exon events with variable incorporation across samples. We then developed a computational framework, pathway enrichment-guided activity study of alternative splicing (PEGASAS), to correlate transcriptional signatures of 50 different cancer driver pathways with these alternative splicing events. We discovered that Myc signaling was correlated with incorporation of a set of 1,039 cassette exons enriched in genes encoding RNA binding proteins. Using a human prostate epithelial transformation assay, we confirmed the Myc regulation of 147 of these exons, many of which introduced frameshifts or encoded premature stop codons. Our results connect changes in alternative pre-mRNA splicing to oncogenic alterations common in prostate and many other cancers. We also establish a role for Myc in regulating RNA splicing by controlling the incorporation of nonsense-mediated decay-determinant exons in genes encoding RNA binding proteins.


2019 ◽  
Author(s):  
Jian-You Liao ◽  
Bing Yang ◽  
Yu-Chan Zhang ◽  
Xiao-Juan Wang ◽  
Yushan Ye ◽  
...  

ABSTRACT RNA binding proteins (RBPs) are a large protein family that plays important roles at almost all levels of gene regulation through interacting with RNAs, and contributes to numerous biological processes. However, a complete list of eukaryotic RBPs, including human RBPs, is still unavailable. In this study, we systematically identified RBPs in 162 eukaryotic species based on both computational analysis of RNA binding domains (RBDs) and large-scale RNA binding proteomic (RBPome) data, and established a comprehensive eukaryotic RBP database, EuRBPDB (http://EuRBPDB.syshospital.org:8081). We identified a total of 311,571 RBPs with RBDs and 3,639 non-canonical RBPs without known RBDs. EuRBPDB provides detailed annotations for each RBP, including basic information and functional annotation. Moreover, we systematically investigated RBPs in the context of cancer biology based on published literature and large-scale omics data. To facilitate the exploration of the clinical relevance of RBPs, we additionally designed a cancer web interface to systematically and interactively display the biological features of RBPs in various types of cancers. EuRBPDB has a user-friendly web interface with browse and search functions, as well as a data download function. We expect that EuRBPDB will be a widely used resource and platform for the RNA biology community.


2019 ◽  
Author(s):  
Eric L Van Nostrand ◽  
Gabriel A Pratt ◽  
Brian A Yee ◽  
Emily Wheeler ◽  
Steven M Blue ◽  
...  

Abstract A critical step in uncovering rules of RNA processing is to study the in vivo regulatory networks of RNA binding proteins (RBPs). Crosslinking and immunoprecipitation (CLIP) methods enabled mapping RBP targets transcriptome-wide, but methodological differences present challenges to large-scale integrated analysis across datasets. The development of enhanced CLIP (eCLIP) enabled the large-scale mapping of targets for 150 RBPs in K562 and HepG2, creating a unique resource of RBP interactomes profiled with a standardized methodology in the same cell types. Here we describe our analysis of 223 eCLIP datasets characterizing 150 RBPs in K562 and HepG2 cell lines, revealing a range of binding modalities, including highly resolved positioning around splicing signals and mRNA untranslated regions that associate with distinct RBP functions. Quantification of enrichment for repetitive and abundant multi-copy elements reveals that 70% of RBPs have enrichment for non-mRNA element classes, enables identification of novel ribosomal RNA processing factors and sites, and suggests that association with retrotransposable elements reflects multiple RBP mechanisms of action. Analysis of spliceosomal RBPs indicates that eCLIP resolves AQR association after intronic lariat formation (enabling identification of branch points with single-nucleotide resolution) and provides genome-wide validation for a branch point-based scanning model for 3’ splice site recognition. Further, we show that eCLIP peak co-occurrences across RBPs enable the discovery of novel co-interacting RBPs. Finally, we present a protocol for visualization of RBP:RNA complexes in the eCLIP workflow using biotin and standard chemiluminescent visualization reagents, enabling simplified confirmation of ribonucleoprotein enrichment without radioactivity. This work illustrates the value of integrated analysis across eCLIP profiling of RBPs with widely distinct functions to reveal novel RNA biology. Further, our quantification of both mRNA and other element association will enable further research to identify novel roles of RBPs in regulating RNA processing.

