GraphProt2: A novel deep learning-based method for predicting binding sites of RNA-binding proteins

2019 ◽  
Author(s):  
Michael Uhl ◽  
Van Dinh Tran ◽  
Rolf Backofen

Abstract CLIP-seq is the state-of-the-art technique to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates the use of computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCN). In contrast to current CNN methods, GraphProt2 supports variable length input as well as the possibility to accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.

2020 ◽  
Author(s):  
Kotaro Chihara ◽  
Lars Barquist ◽  
Kenichi Takasugi ◽  
Naohiro Noda ◽  
Satoshi Tsuneda

ABSTRACT Posttranscriptional regulation of gene expression in bacteria is performed by a complex and hierarchical signaling cascade. Pseudomonas aeruginosa harbors two redundant RNA-binding proteins RsmA/RsmN (RsmA/N), which play a critical role in balancing acute and chronic infections. However, in vivo binding sites on target transcripts and the overall impact on the physiology remain unclear. In this study, we applied in vivo UV crosslinking immunoprecipitation followed by RNA-sequencing (UV CLIP-seq) to detect RsmA/N binding sites at single-nucleotide resolution and mapped more than 500 peaks to approximately 400 genes directly bound by RsmA/N in P. aeruginosa. This also demonstrated the ANGGA sequence in apical loops skewed towards 5′ UTRs as a consensus motif for RsmA/N binding. Genetic analysis combined with CLIP-seq results identified previously unrecognized RsmA/N targets involved in LPS modification. Moreover, the small non-coding RNAs RsmY/RsmZ, which sequester RsmA/N away from target mRNAs, are positively regulated by the RsmA/N-mediated translational repression of hptB, encoding a histidine phosphotransfer protein, and cafA, encoding a cytoplasmic axial filament protein, thus providing a possible mechanistic explanation for homeostasis of the Rsm system. Our findings present the global RsmA/N-RNA interaction network that exerts pleiotropic effects on gene expression in P. aeruginosa.
IMPORTANCE The ubiquitous bacterium Pseudomonas aeruginosa is notorious as an opportunistic pathogen causing life-threatening acute and chronic infections in immunocompromised patients. P. aeruginosa infection processes are governed by two major gene regulatory systems, namely, the GacA/GacS (GacAS) two-component system and the RNA-binding proteins RsmA/RsmN (RsmA/N). RsmA/N function as translational repressors or activators, acting directly by competing with the ribosome. In this study, we identified hundreds of RsmA/N regulatory target RNAs and the consensus motifs for RsmA/N binding by UV crosslinking in vivo. Moreover, our CLIP-seq revealed that RsmA/N posttranscriptionally regulate cell wall organization and exert feedback control on GacAS-RsmA/N systems. Many genes including small regulatory RNAs identified in this study are attractive targets for further elucidating the regulatory mechanisms of RsmA/N in P. aeruginosa.
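The ANGGA consensus reported in the abstract above can be illustrated with a simple sequence scan. This is a minimal sketch that assumes N means any of A, C, G, or U. It looks at primary sequence only and does not model the apical-loop context or the CLIP-seq peak calling described in the study. The function name and the toy sequence are hypothetical.

```python
import re

# ANGGA consensus from the abstract; N is assumed to be any nucleotide.
# The lookahead lets overlapping occurrences be reported as well.
ANGGA = re.compile(r"(?=(A[ACGU]GGA))")

def find_angga(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for each ANGGA occurrence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input too
    return [(m.start(), m.group(1)) for m in ANGGA.finditer(rna)]

# Hypothetical toy sequence, not taken from the study.
hits = find_angga("GGCACGGAUUAAGGAUG")
print(hits)
```

A real analysis would restrict such hits to predicted loop regions and to crosslink-supported peaks; the scan only shows what the degenerate motif notation means.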


BMC Genomics ◽  
2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Lei Deng ◽  
Youzhi Liu ◽  
Yechuan Shi ◽  
Wenhao Zhang ◽  
Chun Yang ◽  
...  

Abstract Background RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to the high cost of biochemical assays. A number of studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences. Results In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
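The contrast between k-mer "words" and one-hot encoding mentioned in the abstract above can be sketched as follows. This is an illustrative preprocessing step only, not the DeepRKE implementation: the choice of k=3, the stride, the function names, and the toy sequence are all assumptions, and the embedding training itself is not shown.

```python
def kmers(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mers ("words")."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab: dict[str, int] = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(seq: str, alphabet: str = "ACGU") -> list[list[int]]:
    """Traditional per-nucleotide one-hot encoding, shown for comparison."""
    return [[int(base == a) for a in alphabet] for base in seq]

seq = "AUGGCUAUGG"                    # hypothetical toy RNA sequence
tokens = kmers(seq, k=3)              # k-mer "sentence" for an embedding model
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]      # integer ids an embedding layer could look up
print(tokens, ids, one_hot(seq[:2]))
```

In a one-hot scheme every nucleotide is equally distant from every other, whereas learned k-mer embeddings can place similar k-mers near each other, which is the property the abstract credits for the improved performance.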


2018 ◽  
Vol 200 (16) ◽  
Author(s):  
Kayley H. Janssen ◽  
Manisha R. Diaz ◽  
Cindy J. Gode ◽  
Matthew C. Wolfgang ◽  
Timothy L. Yahr

ABSTRACT The Gram-negative opportunistic pathogen Pseudomonas aeruginosa has distinct genetic programs that favor either acute or chronic virulence gene expression. Acute virulence is associated with twitching and swimming motility, expression of a type III secretion system (T3SS), and the absence of alginate, Psl, or Pel polysaccharide production. Traits associated with chronic infection include growth as a biofilm, reduced motility, and expression of a type VI secretion system (T6SS). The Rsm posttranscriptional regulatory system plays important roles in the inverse control of phenotypes associated with acute and chronic virulence. RsmA and RsmF are RNA-binding proteins that interact with target mRNAs to control gene expression at the posttranscriptional level. Previous work found that RsmA activity is controlled by at least three small, noncoding regulatory RNAs (RsmW, RsmY, and RsmZ). In this study, we took an in silico approach to identify additional small RNAs (sRNAs) that might function in the sequestration of RsmA and/or RsmF (RsmA/RsmF) and identified RsmV, a 192-nucleotide (nt) transcript with four predicted RsmA/RsmF consensus binding sites. RsmV is capable of sequestering RsmA and RsmF in vivo to activate translation of tssA1, a component of the T6SS, and to inhibit T3SS gene expression. Each of the predicted RsmA/RsmF consensus binding sites contributes to RsmV activity. Electrophoretic mobility shift assays show that RsmF binds RsmV with >10-fold higher affinity than RsmY and RsmZ. Gene expression studies revealed that the temporal expression pattern of RsmV differs from those of RsmW, RsmY, and RsmZ. These findings suggest that each sRNA may play a distinct role in controlling RsmA and RsmF activity.
IMPORTANCE The members of the CsrA/RsmA family of RNA-binding proteins play important roles in posttranscriptional control of gene expression. The activity of CsrA/RsmA proteins is controlled by small noncoding RNAs that function as decoys to sequester CsrA/RsmA from target mRNAs. Pseudomonas aeruginosa has two CsrA family proteins (RsmA and RsmF) and at least four sequestering sRNAs (RsmV [identified in this study], RsmW, RsmY, and RsmZ) that control RsmA/RsmF activity. RsmY and RsmZ are the primary sRNAs that sequester RsmA/RsmF, and RsmV and RsmW appear to play smaller roles. Differences in the temporal and absolute expression levels of the sRNAs and in their binding affinities for RsmA/RsmF may provide a mechanism of fine-tuning the output of the Rsm system in response to environmental cues.


2019 ◽  
Author(s):  
Alexander Gulliver Bjørnholt Grønning ◽  
Thomas Koed Doktor ◽  
Simon Jonas Larsen ◽  
Ulrika Simone Spangsberg Petersen ◽  
Lise Lolle Holm ◽  
...  

ABSTRACT Nucleotide variants can cause functional changes by altering protein-RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modelling of protein-RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and predicting protein binding to nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modelling RNA-protein binding. Importantly, we demonstrate that DeepCLIP is able to reliably predict the functional effects of contextually dependent nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP can be freely used at http://deepclip.compbio.sdu.dk.
Highlights
We have designed DeepCLIP as a simple neural network that requires only CLIP binding sites as input. The architecture and parameter settings of DeepCLIP make it an efficient classifier that is robust to train, so that high-performing models are easy to train and recreate.
Using an extensive benchmark dataset, we demonstrate that DeepCLIP outperforms existing tools in classification. Furthermore, DeepCLIP provides direct information about the neural network’s decision process through visualization of binding motifs and a binding profile that directly indicates sequence elements contributing to the classification.
To show that DeepCLIP models generalize to different datasets, we have demonstrated that predictions correlate with in vivo and in vitro experiments using quantitative binding assays and minigenes.
Identifying the binding sites for regulatory RNA-binding proteins is fundamental for efficient design of (therapeutic) antisense oligonucleotides. Employing a reported disease-associated mutation, we demonstrate that DeepCLIP can be used for design of therapeutic antisense oligonucleotides that block regions important for binding of regulatory proteins and correct aberrant splicing.
Using DeepCLIP binding profiles, we uncovered a possible position-dependent mechanism behind the reported tissue-specificity of a group of TDP-43 repressed pseudoexons.
We have made DeepCLIP available as an online tool for training and application of protein-RNA binding deep learning models and prediction of the potential effects of clinically detected sequence variations (http://deepclip.compbio.sdu.dk/). We also provide DeepCLIP as a configurable stand-alone program (http://www.github.com/deepclip).


2018 ◽  
Author(s):  
Peter A Noble ◽  
Alexander E. Pozhitkov

ABSTRACT Our previous study found more than 500 transcripts significantly increased in abundance in the zebrafish and mouse several hours to days postmortem relative to live controls. Because the current literature suggests that most mRNAs are post-transcriptionally regulated in stressful conditions, we reasoned that the postmortem transcripts must contain sequence features (3 to 9 mers) that are distinct from those in the rest of the transcriptome, specifically binding sites for proteins and/or non-coding RNAs involved in regulation. Our new study identified 5117 and 2245 over-represented sequence features in the mouse and zebrafish, respectively. Some of these features were disproportionately distributed along the transcripts, with high densities in the 3′ UTR region of the zebrafish (0.3 mers/nt) and the ORFs of the mouse (0.6 mers/nt). Yet, the highest density (2.3 mers/nt) occurred in the ORFs of 11 mouse transcripts that lacked UTRs. Our results suggest that these transcripts might serve as ‘molecular sponges’ that sequester RNA binding proteins and/or microRNAs, increasing the stability and gene expression of other transcripts. In addition, some features were identified as binding sites for Rbfox and Hud proteins that are also involved in increasing transcript stability and gene expression. Hence, our results are consistent with the hypothesis that transcripts involved in responding to extreme stress have sequence features that make them different from the rest of the transcriptome, which presumably has implications for post-transcriptional regulation in disease, starvation, and cancer.
ABBREVIATIONS UTR, untranslated regions; ORFs, open reading frames; OP, overabundant transcript pool; CP, control transcript pool; FP, false positive; RBP, RNA binding proteins; ncRNA, non-coding RNA; miRNA, microRNA
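The "mers/nt" densities quoted in the abstract above can be read as feature occurrences per nucleotide of a transcript region. The sketch below is one plausible way to compute such a number and is an assumption, not the authors' procedure: it counts every (possibly overlapping) occurrence of any k-mer in a given feature set and divides by the region length. The function name and inputs are hypothetical.

```python
def feature_density(region: str, features: set[str]) -> float:
    """Occurrences of any listed k-mer per nucleotide of the region.

    Overlapping occurrences are counted, so values above 1.0 are possible
    when many features of different lengths start at the same position.
    """
    if not region:
        return 0.0
    lengths = {len(f) for f in features}
    hits = sum(
        1
        for k in lengths
        for i in range(len(region) - k + 1)
        if region[i:i + k] in features
    )
    return hits / len(region)

# Hypothetical toy region and feature set, not data from the study.
density = feature_density("AUGAUGA", {"AUG", "UGAU"})
print(round(density, 3))
```

Counting overlapping hits is one way a density could exceed 1 mer/nt, as the 2.3 mers/nt figure in the abstract does; whether the original analysis counted this way is not stated there.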


Antioxidants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 552
Author(s):  
Jasmine Harley ◽  
Benjamin E. Clarke ◽  
Rickie Patani

RNA binding proteins fulfil a wide number of roles in gene expression. Multiple mechanisms of RNA binding protein dysregulation have been implicated in the pathomechanisms of several neurodegenerative diseases including amyotrophic lateral sclerosis (ALS). Oxidative stress and mitochondrial dysfunction also play important roles in these diseases. In this review, we highlight the mechanistic interplay between RNA binding protein dysregulation, oxidative stress and mitochondrial dysfunction in ALS. We also discuss different potential therapeutic strategies targeting these pathways.


2021 ◽  
Vol 9 (3) ◽  
pp. 34
Author(s):  
Thomas E. Forman ◽  
Brenna J. C. Dennison ◽  
Katherine A. Fantauzzo

Cranial neural crest (NC) cells delaminate from the neural folds in the forebrain to the hindbrain during mammalian embryogenesis and migrate into the frontonasal prominence and pharyngeal arches. These cells generate the bone and cartilage of the frontonasal skeleton, among other diverse derivatives. RNA-binding proteins (RBPs) have emerged as critical regulators of NC and craniofacial development in mammals. Conventional RBPs bind to specific sequence and/or structural motifs in a target RNA via one or more RNA-binding domains to regulate multiple aspects of RNA metabolism and ultimately affect gene expression. In this review, we discuss the roles of RBPs other than core spliceosome components during human and mouse NC and craniofacial development. Where applicable, we review data on these same RBPs from additional vertebrate species, including chicken, Xenopus and zebrafish models. Knockdown or ablation of several RBPs discussed here results in altered expression of transcripts encoding components of developmental signaling pathways, as well as reduced cell proliferation and/or increased cell death, indicating that these are common mechanisms contributing to the observed phenotypes. The study of these proteins offers a relatively untapped opportunity to provide significant insight into the mechanisms underlying gene expression regulation during craniofacial morphogenesis.


2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Abstract Motivation RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3′ UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/ Contact [email protected]
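A consensus string over a degenerate alphabet, as mentioned in the abstract above, can be illustrated with the standard IUPAC nucleotide codes. The sketch covers sequence only: SSMART's extension of the alphabet to secondary-structure preferences is not specified in the abstract and is not reproduced here. The function names and the example consensus are hypothetical.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}

def matches(consensus: str, site: str) -> bool:
    """True if every base of the site is allowed by the consensus code."""
    return len(consensus) == len(site) and all(
        base in IUPAC[code] for code, base in zip(consensus, site)
    )

def scan(consensus: str, rna: str) -> list[int]:
    """0-based start positions where the consensus matches the sequence."""
    width = len(consensus)
    return [
        i for i in range(len(rna) - width + 1)
        if matches(consensus, rna[i:i + width])
    ]

# Hypothetical consensus and toy sequence: Y = pyrimidine (C or U).
print(scan("YGCY", "AUGCUCGCC"))
```

A sequence-structure alphabet would need one symbol per combination of nucleotide set and structural state, which is the kind of extension the abstract describes.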


2021 ◽  
Author(s):  
Rui Fu ◽  
Kimberly Wellman ◽  
Amber Baldwin ◽  
Juilee Rege ◽  
Kathryn Walters ◽  
...  

ABSTRACT Angiotensin II (AngII) binds to the type I angiotensin receptor in the adrenal cortex to initiate a cascade of events leading to the production of aldosterone, a master regulator of blood pressure. Despite extensive characterization of the transcriptional and enzymatic control of adrenocortical steroidogenesis, there are still major gaps in our knowledge related to precise regulation of AngII-induced gene expression kinetics. Specifically, we do not know the regulatory contribution of RNA-binding proteins (RBPs) and RNA decay, which can control the timing of stimulus-induced gene expression. To investigate this question, we performed a high-resolution RNA-seq time course of the AngII stimulation response and 4-thiouridine pulse labeling in a steroidogenic human cell line (H295R). We identified twelve temporally distinct gene expression responses that contained mRNA encoding proteins known to be important for various steps of aldosterone production, such as cAMP signaling components and steroidogenic enzymes. AngII response kinetics for many of these mRNAs revealed a coordinated increase in both synthesis and decay. These findings were validated in primary human adrenocortical cells stimulated ex vivo with AngII. Using a candidate siRNA screen, we identified a subset of RNA-binding proteins and RNA decay factors that activate or repress AngII-stimulated aldosterone production. Among the repressors of aldosterone was BTG2, which promotes deadenylation and global RNA decay. BTG2 was induced in response to AngII stimulation and promoted the repression of mRNAs encoding pro-steroidogenic factors, indicating the existence of an incoherent feedforward loop controlling aldosterone homeostasis. Together, these data support a model in which coordinated increases in transcription and regulated RNA decay facilitate the major transcriptomic changes required to implement a pro-steroidogenic gene expression program that is temporally restricted to prevent aldosterone overproduction.

