Predicting localized affinity of RNA binding proteins to transcripts with convolutional neural networks

2021 ◽  
Author(s):  
Alexander Kitaygorodsky ◽  
Emily Jin ◽  
Yufeng Shen

RNA binding proteins (RBPs) are important regulators of transcriptional and post-transcriptional processes. Computational prediction of localized RBP binding affinity to transcripts is important for interpreting genetic variation, especially variants outside protein-coding regions. Here we describe POLARIS (Prediction Of Localized Affinity for RBPs In Sequence), a new deep-learning method for fast, site-specific prediction of RBP binding affinity across the transcribed genome. POLARIS has two modules: (1) a convolutional neural network (CNN) that predicts overall RBP binding within a region based on transcript sequence content and expression level; and (2) a Gradient-weighted Class Activation Mapping (GradCAM) implementation for efficient signal backpropagation to individual sequence positions. We trained the model using enhanced crosslinking and immunoprecipitation (eCLIP) data from ENCODE. POLARIS achieves a median AUC of ~0.96 for 160 RBPs across three different cell lines, substantially higher than selected popular published methods trained and tested on the same data sets. When tested on data from a different cell line with the same RBPs, the overall performance is maintained, supporting the model's capacity for cell-type-specific affinity prediction. Finally, the GradCAM module allows the model to identify the informative sites in a region that drive prediction. The localized prediction facilitates interpretation of the results and provides a basis for inferring the functional impact of noncoding variants.
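
The two-module idea (a CNN score for a region, then Grad-CAM to push that score back to positions) can be sketched on a toy model. The block below is an illustration only, not the POLARIS architecture: it assumes a single convolution layer, ReLU, global average pooling and one linear output, and every filter, weight and function name is hypothetical. For this small architecture the gradient of the score with respect to each activation is known in closed form, so no autograd library is needed.

```python
# Toy Grad-CAM for a 1D CNN: conv -> ReLU -> global average pool -> linear score.
# All filters, weights, and names are made up for illustration; this is not POLARIS.

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA string as a list of 4-element one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_relu(x, filters):
    """Valid 1D convolution followed by ReLU.
    x: list of positions, each a 4-vector; filters: K filters, each width x 4.
    Returns K activation maps of length len(x) - width + 1."""
    width = len(filters[0])
    n_out = len(x) - width + 1
    maps = []
    for f in filters:
        row = []
        for p in range(n_out):
            s = sum(f[i][c] * x[p + i][c] for i in range(width) for c in range(4))
            row.append(max(0.0, s))
        maps.append(row)
    return maps

def grad_cam(maps, out_weights):
    """Grad-CAM for score = sum_k w_k * mean_p A_k[p].
    Here d(score)/d(A_k[p]) = w_k / n, so the channel weight alpha_k
    (the gradient averaged over positions) is also w_k / n."""
    n = len(maps[0])
    alphas = [w / n for w in out_weights]
    return [max(0.0, sum(a * m[p] for a, m in zip(alphas, maps))) for p in range(n)]

# Hypothetical filters: one matching "UGU" (positive evidence for binding),
# one matching "CCC" (negative evidence).
filters = [one_hot("UGU"), one_hot("CCC")]
out_weights = [1.0, -1.0]

maps = conv1d_relu(one_hot("CCAUGUACC"), filters)
cam = grad_cam(maps, out_weights)
peak = cam.index(max(cam))  # index of the 3-base window with the strongest signal
```

Each entry of `cam` corresponds to one filter-width window, so a peak at window index `i` points to bases `i` through `i + 2` of the input. A real model would obtain the gradients by backpropagation through a trained network rather than from a closed-form expression.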

2019 ◽  
Vol 14 (7) ◽  
pp. 621-627 ◽  
Author(s):  
Youhuang Bai ◽  
Xiaozhuan Dai ◽  
Tiantian Ye ◽  
Peijing Zhang ◽  
Xu Yan ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily longer than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs exist in different genomes ranging from animals to plants. Objective: PlncRNADB is a searchable database of lncRNA sequences and annotation in plants. Methods: We built a pipeline for lncRNA prediction in plants, providing a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. Results: More than five thousand lncRNAs are collected from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays) in PlncRNADB. Moreover, our database provides the relationship between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface. Conclusion: PlncRNADB can serve as a reference database to investigate the lncRNAs and their interaction with RNA-binding proteins in plants. The PlncRNADB is freely available at http://bis.zju.edu.cn/PlncRNADB/.


2006 ◽  
Vol 26 (8) ◽  
pp. 3295-3307 ◽  
Author(s):  
Tomoko Kawai ◽  
Ashish Lal ◽  
Xiaoling Yang ◽  
Stefanie Galban ◽  
Krystyna Mazan-Mamczarz ◽  
...  

Stresses affecting the endoplasmic reticulum (ER) globally modulate gene expression patterns by altering posttranscriptional processes such as translation. Here, we use tunicamycin (Tn) to investigate ER stress-triggered changes in the translation of cytochrome c, a pivotal regulator of apoptosis. We identified two RNA-binding proteins that associate with its ∼900-bp-long, adenine- and uridine-rich 3′ untranslated region (UTR): HuR, which displayed affinity for several regions of the cytochrome c 3′UTR, and T-cell-restricted intracellular antigen 1 (TIA-1), which preferentially bound the segment proximal to the coding region. HuR did not appear to influence the cytochrome c mRNA levels but instead promoted cytochrome c translation, as HuR silencing greatly diminished the levels of nascent cytochrome c protein. By contrast, TIA-1 functioned as a translational repressor of cytochrome c, with interventions to silence TIA-1 dramatically increasing cytochrome c translation. Following treatment with Tn, HuR binding to cytochrome c mRNA decreased, and both the presence of cytochrome c mRNA within actively translating polysomes and the rate of cytochrome c translation declined. Taken together, our data suggest that the translation rate of cytochrome c is determined by the opposing influences of HuR and TIA-1 upon the cytochrome c mRNA. Under unstressed conditions, cytochrome c mRNA is actively translated, but in response to ER stress agents, both HuR and TIA-1 contribute to lowering its biosynthesis rate. We propose that HuR and TIA-1 function coordinately to maintain precise levels of cytochrome c production under unstimulated conditions and to modify cytochrome c translation when damaged cells are faced with molecular decisions to follow a prosurvival or a prodeath path.


2020 ◽  
Vol 477 (2) ◽  
pp. 509-524
Author(s):  
Oumayma Rouis ◽  
Cédric Broussard ◽  
François Guillonneau ◽  
Jean-Baptiste Boulé ◽  
Emmanuelle Delagoutte

DNA hemicatenanes (HCs) are four-way junctions in which one strand of a double-stranded helix is catenated with one strand of another double-stranded DNA. Frequently mentioned as DNA replication, recombination and repair intermediates, they have been proposed to participate in the spatial organization of chromosomes and in the regulation of gene expression. To explore potential roles of HCs in genome metabolism, we sought to purify proteins capable of specifically binding HCs by fractionating nuclear extracts from HeLa cells. This approach identified three RNA-binding proteins: the Tudor-staphylococcal nuclease domain 1 (SND1) protein and two proteins from the Drosophila behavior human splicing family, the paraspeckle protein component 1 and the splicing factor proline- and glutamine-rich protein. Since these proteins were only partially pure after fractionation, truncated forms of these proteins were expressed in Escherichia coli and purified to near homogeneity. The specificity of their interaction with HCs was re-examined in vitro. The two truncated purified SND1 proteins exhibited specificity for HCs, opening the interesting possibility of a link between the basic transcription machinery and HC structures via SND1.


2020 ◽  
Vol 21 (18) ◽  
pp. 6835
Author(s):  
Jonas Weiße ◽  
Julia Rosemann ◽  
Vanessa Krauspe ◽  
Matthias Kappler ◽  
Alexander W. Eckert ◽  
...  

Nearly 7.5% of all human protein-coding genes have been assigned to the class of RNA-binding proteins (RBPs), and over the past decade, RBPs have been increasingly recognized as important regulators of molecular and cellular homeostasis. RBPs regulate the post-transcriptional processing of their target RNAs, i.e., alternative splicing, polyadenylation, stability and turnover, localization, or translation as well as editing and chemical modification, thereby tuning gene expression programs of diverse cellular processes such as cell survival and malignant spread. Importantly, metastases are the major cause of cancer-associated deaths in general, and particularly in oral cancers, which account for 2% of the global cancer mortality. However, the roles and architecture of RBPs and RBP-controlled expression networks during the diverse steps of the metastatic cascade are only incompletely understood. In this review, we offer a brief overview of RBPs and their general contribution to post-transcriptional regulation of gene expression. Subsequently, we highlight selected examples of RBPs that have been shown to play a role in oral cancer cell migration, invasion, and metastasis. Finally, we present targeting strategies that have been developed to interfere with the function of some of these RBPs.


2020 ◽  
Vol 21 (8) ◽  
pp. 2969 ◽  
Author(s):  
Katharina Jonas ◽  
George A. Calin ◽  
Martin Pichler

The majority of the genome is transcribed into pieces of non-(protein) coding RNA, among which long non-coding RNAs (lncRNAs) constitute a large group of particularly versatile molecules that govern basic cellular processes including transcription, splicing, RNA stability, and translation. The frequent deregulation of numerous lncRNAs in cancer is known to contribute to virtually all hallmarks of cancer. An important regulatory mechanism of lncRNAs is the post-transcriptional regulation mediated by RNA-binding proteins (RBPs). So far, however, only a small number of known cancer-associated lncRNAs have been found to be regulated by the interaction with RBPs like human antigen R (HuR), ARE/poly(U)-binding/degradation factor 1 (AUF1), insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1), and tristetraprolin (TTP). These RBPs regulate, by various means, two aspects in particular, namely the stability and the localization of lncRNAs. Importantly, these RBPs themselves are commonly deregulated in cancer and might thus play a major role in the deregulation of cancer-related lncRNAs. There are, however, still many open questions, for example regarding the context specificity of these regulatory mechanisms that, in part, is based on the synergistic or competitive interaction between different RBPs. There is also a lack of knowledge on how RBPs facilitate the transport of lncRNAs between different cellular compartments.


2015 ◽  
Vol 71 (2) ◽  
pp. 196-208 ◽  
Author(s):  
Benjamin S. Gully ◽  
Kunal R. Shah ◽  
Mihwa Lee ◽  
Kate Shearston ◽  
Nicole M. Smith ◽  
...  

Proteins of the pentatricopeptide repeat (PPR) superfamily are characterized by tandem arrays of a degenerate 35-amino-acid α-hairpin motif. PPR proteins are typically single-stranded RNA-binding proteins with essential roles in organelle biogenesis, RNA editing and mRNA maturation. A modular, predictable code for sequence-specific binding of RNA by PPR proteins has recently been revealed, which opens the door to the de novo design of bespoke proteins with specific RNA targets, with widespread biotechnological potential. Here, the design and production of a synthetic PPR protein based on a consensus sequence and the determination of its crystal structure to 2.2 Å resolution are described. The crystal structure displays helical disorder, resulting in electron density representing an infinite superhelical PPR protein. A structural comparison with related tetratricopeptide repeat (TPR) proteins, and with native PPR proteins, reveals key roles for conserved residues in directing the structure and function of PPR proteins. The designed proteins have high solubility and thermal stability, and can form long tracts of PPR repeats. Thus, consensus-sequence synthetic PPR proteins could provide a suitable backbone for the design of bespoke RNA-binding proteins with the potential for high specificity.


2020 ◽  
Vol 17 ◽  
Author(s):  
Yongmei Li ◽  
Baicai Yang ◽  
Yali Zhang ◽  
Kaiwen Hei ◽  
Mingming Xiao

Background: Investigating the interactions between RNA and proteins is essential to understanding how these macromolecular complexes exert their functions. RNA pull-down is a classic technique for enriching RNA binding proteins. However, a large number of non-specific binding proteins may be enriched during sample preparation, interfering with the downstream mass spectrometric analyses and causing false positives. Objective: In this study we examined the background contaminants in RNA pull-down experiments using mass spectrometric analysis. Methods: Antisense MALAT1 was first synthesized using in vitro transcription and incubated with cellular proteins extracted from HepG2 cells. The non-specific binding proteins were isolated using streptavidin-conjugated magnetic beads and separated by SDS-PAGE. Each gel lane was divided into nine bands and digested with trypsin for the downstream LC-MS/MS analyses. Results: 191 protein groups were identified as non-specific binding proteins in RNA pull-down samples. Comparison between different sample preparation conditions showed that the level of background contaminants was mostly induced by the solid phase support and not affected by the studied RNA. In addition, using a more stringent detergent and smaller streptavidin magnetic beads could reduce the amount of background interfering proteins. Conclusion: This study provides a reference for distinguishing bona fide RNA interacting proteins from the background contaminants. The results also demonstrate that sample preparation conditions have a great impact on the level of enriched background contaminants, shedding new light on the optimization of RNA pull-down experiments.


2018 ◽  
Author(s):  
Ei-Wen Yang ◽  
Jae Hoon Bahn ◽  
Esther Yun-Hua Hsiao ◽  
Boon Xin Tan ◽  
Yiwei Sun ◽  
...  

Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants influencing RNA processing and gene expression phenotypes. Recently, genome-wide detection of in vivo binding sites of RNA binding proteins (RBPs) has been greatly facilitated by the enhanced UV crosslinking and immunoprecipitation (eCLIP) protocol. Hundreds of eCLIP-Seq data sets were generated from HepG2 and K562 cells during the ENCODE3 phase. These data afford a valuable opportunity to examine allele-specific binding (ASB) of RBPs. To this end, we developed a new computational algorithm, called BEAPR (Binding Estimation of Allele-specific Protein-RNA interaction). In identifying statistically significant ASB sites, BEAPR takes into account UV cross-linking induced sequence propensity and technical variations between replicated experiments. Using simulated data and actual eCLIP-Seq data, we show that BEAPR largely outperforms the commonly used Chi-squared test and Fisher’s exact test. Importantly, BEAPR overcomes the inherent over-dispersion problem of the other methods. Complemented by experimental validations, we demonstrate that ASB events are significantly associated with genetic regulation of splicing and mRNA abundance, supporting the usage of this method to pinpoint functional genetic variants in post-transcriptional gene regulation. Many variants with ASB patterns of RBPs were found as genetic variants with cancer or other disease relevance. About 38% of ASB variants were in linkage disequilibrium with single nucleotide polymorphisms from genome-wide association studies. Overall, our results suggest that BEAPR is an effective method to reveal ASB patterns in eCLIP and can inform functional interpretation of disease-related genetic variants.
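
For orientation, one of the baseline tests named above can be sketched in a few lines. The block below is not BEAPR; it is a plain two-sided Fisher's exact test on a 2x2 table of read counts. The table layout (reference and alternative allele reads in an eCLIP sample versus a control sample) and the function name are assumptions made for illustration, since the abstract does not say how the baseline tables were built.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): row 1 = eCLIP reads (ref, alt),
    row 2 = control reads (ref, alt).
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Made-up counts: 8 ref / 2 alt reads in eCLIP, 1 ref / 5 alt reads in control.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

The test treats every read as an independent draw and has no term for variation between replicates, which is where extra dispersion in real count data can make it overconfident.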


1998 ◽  
Vol 26 (22) ◽  
pp. 5036-5044 ◽  
Author(s):  
G. A. R. Doyle ◽  
P. F. Leeds ◽  
A. J. Fleisig ◽  
J. Ross ◽  
N. A. Betz ◽  
...  
