Zooming in on protein–RNA interactions: a multi-level workflow to identify interaction partners

2020 ◽  
Vol 48 (4) ◽  
pp. 1529-1543
Author(s):  
Alessio Colantoni ◽  
Jakob Rupert ◽  
Andrea Vandelli ◽  
Gian Gaetano Tartaglia ◽  
Elsa Zacco

Interactions between proteins and RNA are at the base of numerous cellular regulatory and functional phenomena. The investigation of the biological relevance of non-coding RNAs has led to the identification of numerous novel RNA-binding proteins (RBPs). However, defining the RNA sequences and structures that are selectively recognised by an RBP remains challenging, since these interactions can be transient and highly dynamic, and may be mediated by unstructured regions in the protein, as in the case of many non-canonical RBPs. Numerous experimental and computational methodologies have been developed to predict, identify and verify the binding between a given RBP and potential RNA partners, but navigating across the vast ocean of data can be frustrating and misleading. In this mini-review, we propose a workflow for the identification of the RNA binding partners of putative, newly identified RBPs. The large pool of potential binders selected by in-cell experiments can be enriched by in silico tools such as catRAPID, which is able to predict the RNA sequences more likely to interact with specific RBP regions with high accuracy. The RNA candidates with the highest potential can then be analysed in vitro to determine the binding strength and to precisely identify the binding sites. The results thus obtained can furthermore validate the computational predictions, offering an all-round solution to the issue of finding the most likely RNA binding partners for a newly identified potential RBP.

2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Abstract Motivation: RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results: We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3′UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/
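
To make the idea of a consensus string over a degenerate alphabet concrete, the following is a minimal sketch of matching a standard IUPAC consensus against an RNA sequence. It is not SSMART's implementation: the abstract does not spell out how the alphabet is extended for secondary structure, so only the standard nucleotide codes are shown, and the function name `find_consensus` is a hypothetical choice made for illustration.

```python
# Standard IUPAC nucleotide codes (RNA alphabet). The structure extension
# used by SSMART is not described in the abstract and is not reproduced here.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def find_consensus(sequence, consensus):
    """Return the 0-based start positions where `consensus` matches `sequence`."""
    hits = []
    width = len(consensus)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Every base in the window must be allowed by the code at that position.
        if all(base in IUPAC_RNA[code] for base, code in zip(window, consensus)):
            hits.append(start)
    return hits
```

A structure-aware variant could apply the same scan to a second string of per-position structure labels, but the choice of labels would be an additional assumption.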


2021 ◽  
Author(s):  
Kevin McKernan ◽  
Anthony M. Kyriakopoulos ◽  
Peter McCullough

Codon optimization describes the process used to increase protein production by use of alternative but synonymous codon changes. In SARS-CoV-2 mRNA vaccines, codon optimizations can result in differential secondary conformations that inevitably affect a protein’s function with significant consequences to the cell. Importantly, when codon optimization increases the GC content of synthetic mRNAs, there can be an inevitable enrichment of G-quartets which potentially form G-quadruplex structures. The emerging G-quadruplexes are favorable binding sites of RNA binding proteins like helicases that inevitably affect epigenetic reprogramming of the cell by altering transcription, translation and replication. In this study, we performed an RNAfold analysis to investigate alterations in secondary structures of mRNAs in SARS-CoV-2 vaccines due to codon optimization. We show a significant increase in the GC content of mRNAs in vaccines as compared to native SARS-CoV-2 RNA sequences encoding the spike protein. As the GC enrichment leads to more G-quadruplex structure formations, these may contribute to potential pathological processes initiated by SARS-CoV-2 molecular vaccination.
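
The GC content comparison mentioned above rests on a simple quantity, sketched here for reference. This is only an illustration of the calculation, not the authors' pipeline; the function name `gc_content` is hypothetical, and no real vaccine or viral sequences are used.

```python
def gc_content(sequence):
    """Fraction of G and C nucleotides in a DNA or RNA sequence (0.0 for empty input)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)
```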


2021 ◽  
Author(s):  
Ionut Atanasoai ◽  
Sofia Papavasileiou ◽  
Natalie Preiss ◽  
Claudia Kutter

Over the past decade, thousands of putative human RNA binding proteins (RBPs) have been identified, increasing the demand for specifying their RNA binding capacities. Here, we developed RNA affinity purification followed by sequencing (RAPseq), which enables in vitro large-scale profiling of RBP binding to native RNAs. First, by employing RAPseq, we found that vertebrate HURs recognize a conserved RNA binding motif and bind predominantly to introns in zebrafish compared to 3'UTRs in human RNAs. Second, our dual RBP assays (co-RAPseq) uncovered cooperative RNA binding of HUR and PTBP1 within an optimal distance of 27 nucleotides. Third, we developed T7-RAPseq to discern m6A-dependent and -independent RNA binding sites of YTHDF1. Fourth, RAPseq of 26 novel non-canonical RBPs revealed specialized moonlighting interactions. Last, five pathological IGF2BP family variants exhibited different RNA binding patterns. Overall, our simple, scalable and versatile method makes it possible to fast-forward RBP-related questions.


2017 ◽  
Author(s):  
Jonathan M. Howard ◽  
Hai Lin ◽  
Garam Kim ◽  
Jolene M Draper ◽  
Maximilian Haeussler ◽  
...  

Abstract Alternative pre-mRNA splicing plays a major role in expanding the transcript output of human genes. This process is regulated, in part, by the interplay of trans-acting RNA binding proteins (RBPs) with myriad cis-regulatory elements scattered throughout pre-mRNAs. These molecular recognition events are critical for defining the protein coding sequences (exons) within pre-mRNAs and directing spliceosome assembly on non-coding regions (introns). One of the earliest events in this process is recognition of the 3’ splice site by U2 small nuclear RNA auxiliary factor 2 (U2AF2). Splicing regulators, such as the heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1), influence spliceosome assembly both in vitro and in vivo, but their mechanisms of action remain poorly described on a global scale. HNRNPA1 also promotes proofreading of 3’ss sequences through a direct interaction with the U2AF heterodimer. To determine how HNRNPA1 regulates U2AF-RNA interactions in vivo, we analyzed U2AF2 RNA binding specificity using individual-nucleotide resolution crosslinking immunoprecipitation (iCLIP) in control and HNRNPA1 over-expression cells. We observed changes in the distribution of U2AF2 crosslinking sites relative to the 3’ splice sites of alternative cassette exons but not constitutive exons upon HNRNPA1 over-expression. A subset of these events shows a concomitant increase of U2AF2 crosslinking at distal intronic regions, suggesting a shift of U2AF2 to “decoy” binding sites. Of the many non-canonical U2AF2 binding sites, Alu-derived RNA sequences represented one of the most abundant classes of HNRNPA1-dependent decoys. Splicing reporter assays demonstrated that mutation of U2AF2 decoy sites inhibited HNRNPA1-dependent exon skipping in vivo. We propose that HNRNPA1 regulates exon definition by modulating the interaction of U2AF2 with decoy or bona fide 3’ splice sites.


2017 ◽  
Author(s):  
Xiaoyong Pan ◽  
Peter Rijnbeek ◽  
Junchi Yan ◽  
Hong-Bin Shen

Abstract RNA regulation depends significantly on the protein partners that bind the RNA, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized, especially from a structural point of view. Hidden informative signals and interdependencies between sequence and structure specificities are two challenging problems for both predicting RBP binding sites and accurately mining sequence and structure motifs. In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify the binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first perform one-hot encoding for both the sequence and the predicted secondary structure, which makes them suitable for subsequent convolution operations. To reveal the hidden binding knowledge from the observations, the CNNs are applied to learn the abstract motif features. Considering the close relationship between sequences and predicted structures, we use the BLSTM to capture the long-range dependencies between binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict the RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets, and the results demonstrate that iDeepS can reliably predict the RBP binding sites on RNAs and outperforms the state-of-the-art methods. An important advantage is that iDeepS is able to automatically extract both binding sequence and structure motifs, which will improve our understanding of the mechanisms underlying the binding specificities of RBPs. iDeepS is available at https://github.com/xypan1232/iDeepS.
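
The one-hot encoding step described above can be sketched as follows. This is an illustrative sketch rather than the iDeepS code: the helper `one_hot`, the example sequence, and the use of a three-symbol dot-bracket alphabet for the structure are all assumptions, since the abstract does not state which structure alphabet the method uses.

```python
def one_hot(symbols, alphabet):
    """Encode a string as a list of one-hot rows, one row per position.

    Symbols outside `alphabet` become all-zero rows.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    rows = []
    for symbol in symbols:
        row = [0] * len(alphabet)
        if symbol in index:
            row[index[symbol]] = 1
        rows.append(row)
    return rows


sequence = "GCAUGC"
structure = "((..))"  # hypothetical dot-bracket prediction for the sequence above

seq_matrix = one_hot(sequence, "ACGU")     # 6 positions x 4 channels
struct_matrix = one_hot(structure, "(.)")  # 6 positions x 3 channels
```

Each matrix has one row per nucleotide position, which is the layout a one-dimensional convolution would slide over.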


BMC Genomics ◽  
2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Lei Deng ◽  
Youzhi Liu ◽  
Yechuan Shi ◽  
Wenhao Zhang ◽  
Chun Yang ◽  
...  

Abstract Background RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to the high cost of biochemical assays. A number of studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences. Results In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
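
Before a word embedding can be trained, each sequence has to be split into k-mer "words". The sketch below shows that preprocessing step only. It is not taken from the DeepRKE repository: the function names, the default `k=3`, and the stride of 1 are assumptions chosen for illustration.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a sequence into overlapping k-mers, the 'words' fed to an embedding model."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocabulary(token_lists):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            # len(vocab) is evaluated before insertion, so ids run 0, 1, 2, ...
            vocab.setdefault(token, len(vocab))
    return vocab
```

The integer ids would then index rows of an embedding matrix, so that similar k-mers can end up with similar vectors, unlike one-hot rows, which are all equally distant from each other.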


2018 ◽  
Author(s):  
Fernando Cid-Samper ◽  
Mariona Gelabert-Baldrich ◽  
Benjamin Lang ◽  
Nieves Lorenzo-Gotor ◽  
Riccardo Delli Ponti ◽  
...  

Summary Recent evidence indicates that specific RNAs promote formation of ribonucleoprotein condensates by acting as scaffolds for RNA-binding proteins (RBPs). We systematically investigated RNA-RBP interaction networks to understand ribonucleoprotein assembly. We found that highly-contacted RNAs are highly structured, have long untranslated regions (UTRs) and contain nucleotide repeat expansions. Among the RNAs with such properties, we identified the FMR1 3’ UTR that harbors CGG expansions implicated in Fragile X-associated Tremor/Ataxia Syndrome (FXTAS). We studied FMR1 binding partners in silico and in vitro and prioritized the splicing regulator TRA2A for further characterization. In a FXTAS cellular model we validated the TRA2A-FMR1 interaction and investigated implications of its sequestration at both transcriptomic and post-transcriptomic levels. We found that TRA2A co-aggregates with FMR1 in a FXTAS mouse model and in post mortem human samples. Our integrative study identifies key components of ribonucleoprotein aggregates, providing links to neurodegenerative disease and allowing the discovery of new therapeutic targets.


2018 ◽  
Vol 4 (4) ◽  
pp. 28 ◽  
Author(s):  
Neil Brockdorff

Xist, the master regulator of X chromosome inactivation in mammals, is a 17 kb lncRNA that acts in cis to silence the majority of genes along the chromosome from which it is transcribed. The two key processes required for Xist RNA function, localisation in cis and recruitment of silencing factors, are genetically separable, at least in part. Recent studies have identified Xist RNA sequences and associated RNA-binding proteins (RBPs) that are important for these processes. Notably, several of the key Xist RNA elements correspond to local tandem repeats. In this review, I use examples to illustrate different modes whereby tandem repeat amplification has been exploited to allow orthodox RBPs to confer new functions for Xist-mediated chromosome inactivation. I further discuss the potential generality of tandem repeat expansion in the evolution of functional long non-coding RNAs (lncRNAs).


2021 ◽  
Author(s):  
Hongli Ma ◽  
Han Wen ◽  
Zhiyuan Xue ◽  
Guojun Li ◽  
Zhaolei Zhang

RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vitro and in vivo experiments such as RNAcompete and eCLIP have revealed, respectively, the in vitro binding preferences of RBPs for RNA oligomers and their in vivo binding sites in cells. Analysis of these binding data showed that the structural properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes the predicted secondary structures of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via discrete molecular dynamics folding simulations for four selected RBPs, and RNA-protein docking for LIN28. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.
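
One common way to turn a predicted secondary structure into a graph is to treat nucleotides as nodes and to connect them by backbone and base-pair edges. The sketch below illustrates that general idea under the assumption that the structure is given in dot-bracket notation without pseudoknots. It is not RNANetMotif's actual graph construction, which the abstract does not describe, and the function name `structure_graph` is hypothetical.

```python
def structure_graph(sequence, dot_bracket):
    """Build an undirected graph from a sequence and its dot-bracket structure.

    Nodes are (position, nucleotide) tuples; edges are (i, j, kind) tuples,
    where kind is "backbone" for neighbours (i, i+1) or "pair" for base pairs.
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    edges = [(i, i + 1, "backbone") for i in range(len(sequence) - 1)]
    stack = []  # positions of '(' still waiting for their partner
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            edges.append((stack.pop(), i, "pair"))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    nodes = list(enumerate(sequence))
    return nodes, edges
```

Subgraph enrichment would then operate on many such graphs, one per bound RNA sequence.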


2021 ◽  
Vol 15 ◽  
Author(s):  
Lichao Zhang ◽  
Zihong Huang ◽  
Liang Kong

Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and material. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, it is necessary to construct a dataset with more samples to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Method: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among three classifier algorithms, which showed that the novel features are stable and fit multiple classifiers. These results showed that the proposed method is effective and robust for identifying noncoding RNA binding sites. Conclusion: Our method based on multi-information sources is effective at representing the binding site information among ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme based on CSBPI_Site indicate that our model is valuable for identifying functional sites.

