SEA: Simple Enrichment Analysis of motifs

2021 ◽  
Author(s):  
Timothy L. Bailey ◽  
Charles E. Grant

Motif enrichment algorithms can identify known sequence motifs that are present to a statistically significant degree in DNA, RNA and protein sequences. Databases of such known motifs exist for DNA- and RNA-binding proteins, as well as for many functional protein motifs. The SEA ("Simple Enrichment Analysis") algorithm presented here uses a simple, consistent approach for detecting the enrichment of motifs in DNA, RNA or protein sequences, as well as in sequences using user-defined alphabets. SEA can identify known motifs that are enriched in a single set of input sequences, and can also perform differential motif enrichment analysis when presented with an additional set of control sequences. Using in vivo DNA (ChIP-seq) data as input to SEA, and validating motifs with reference motifs derived from in vitro data, we show that SEA is faster than three widely-used motif enrichment algorithms (AME, CentriMo and Pscan), while delivering comparable accuracy. We also show that, in contrast to other motif enrichment algorithms, SEA reports accurate estimates of statistical significance. SEA is easy to use via its web server at https://meme-suite.org, and is fully integrated with the widely-used MEME Suite of sequence analysis tools, which can be freely downloaded at the same web site for non-commercial use.

2021 ◽  
Author(s):  
András L. Szabó ◽  
Anna Sánta ◽  
Zoltán Gáspári

Protein phase separation has been shown to be a major governing factor in multiple cellular processes, especially ones concerning RNA and RNA-binding proteins. Despite many key observations, the exact structural characteristics of proteins involved in the process are still not fully deciphered. In this work we show that proteins harbouring sequences with specific regions of charged residues are significantly associated with phase separation phenomena. In particular, regions with repetitive arrays of alternating charges (termed charged residue repeats, CRRs) show the strongest association, whereas segments with generally high charge density (charge-dense regions, CDRs) and single alpha-helices (SAHs) also show detectable but weaker connections. Phase separation is known to contribute to the formation of membrane-less organelles (MLOs) and, to an extent, the aggregation of proteins. The causes and consequences of phase separation have been a rigorously researched topic in the last few years, as the condensation of specific phase-separating proteins is known to promote several diseases. In this work we carried out a computational analysis to examine the presence of repetitive segments with high charge density in proteins prone to phase separation. Free resources such as the Charged Single α-Helix (CSAH) web server and the PhaSepDB online database were used to examine possible links between the charged side-chain content of protein sequences and their partition into membrane-less condensates. Furthermore, we developed a novel algorithm aimed at detecting a larger variety of charged protein segments, in order to examine their relationship to the phenomenon. Fisher’s exact test of independence was applied to several generated data sets to confirm the correlation between charged residue repeats (CRRs) and charge-dense regions (CDRs) within human protein sequences and their affinity for phase separation.
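The kind of association test named in this abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, standard-library version of a one-sided Fisher's exact test on a 2x2 table, where the row and column meanings (protein has a CRR or not, protein is phase-separating or not) and all counts are hypothetical values chosen for illustration.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the chance of seeing at least `a` items in the top-left cell
    if rows and columns were independent.
    """
    row1 = a + b            # assumed meaning: proteins with a CRR
    col1 = a + c            # assumed meaning: phase-separating proteins
    n = a + b + c + d       # all proteins in the data set
    max_a = min(row1, col1)
    # Sum hypergeometric probabilities for every table at least as extreme.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, max_a + 1))
    return tail / comb(n, col1)


# Hypothetical counts, for illustration only (not taken from the paper).
p_value = fisher_exact_greater(a=30, b=70, c=50, d=850)
print(f"one-sided p = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for moderately sized tables; for very large counts a log-space implementation or a statistics library would likely be a better choice.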


2021 ◽  
Author(s):  
Klara Kuret ◽  
Aram Gustav Amalietti ◽  
Jernej Ule

Background: Crosslinking and immunoprecipitation (CLIP) is a method used to identify in vivo RNA–protein binding sites on a transcriptome-wide scale. With the increasing amounts of available data for RNA-binding proteins (RBPs), it is important to understand to what degree the enriched motifs specify the RNA binding profiles of RBPs in cells. Results: We develop positionally-enriched k-mer analysis (PEKA), a computational tool for efficient analysis of enriched motifs from individual CLIP datasets, which minimises the impact of technical and regional genomic biases by internal data normalisation. We cross-validate PEKA with mCross, and show that background correction by size-matched input does not generally improve the specificity of detected motifs. We identify motif classes with common enrichment patterns across eCLIP datasets and across RNA regions, while also observing variations in the specificity and the extent of motif enrichment across eCLIP datasets, between variant CLIP protocols, and between CLIP and in vitro binding data. Thereby we gain insights into the contributions of technical and regional genomic biases to the enriched motifs, and find how motif enrichment features relate to the domain composition and low-complexity regions (LCRs) of the studied proteins. Conclusions: Our study provides insights into the overall contributions of regional binding preferences, protein domains and LCRs to the specificity of protein-RNA interactions, and shows the value of cross-motif and cross-RBP comparison for data interpretation. Our results are presented for exploratory analysis via an online platform in an RBP-centric and motif-centric manner (https://imaps.goodwright.com/apps/peka/). PEKA is available from https://github.com/ulelab/peka.
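The general idea of positional k-mer enrichment with an internal background can be sketched as follows. This is not PEKA's algorithm: the function names, the windowing scheme (a fixed flank around an assumed crosslink position, with the rest of the same sequence as background) and the pseudocount are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def proximal_enrichment(seqs, center: int, flank: int, k: int,
                        pseudocount: float = 1.0) -> dict:
    """Ratio of each k-mer's frequency near `center` to its frequency elsewhere.

    `center` is the assumed crosslink position within each sequence, and the
    regions outside [center - flank, center + flank) of the same sequences
    serve as the background (a simple form of internal normalisation).
    """
    near, far = Counter(), Counter()
    for s in seqs:
        lo, hi = max(0, center - flank), min(len(s), center + flank)
        near.update(kmer_counts(s[lo:hi], k))
        far.update(kmer_counts(s[:lo], k))
        far.update(kmer_counts(s[hi:], k))
    kmers = set(near) | set(far)
    near_total = sum(near.values()) + pseudocount * len(kmers)
    far_total = sum(far.values()) + pseudocount * len(kmers)
    return {
        km: ((near[km] + pseudocount) / near_total)
            / ((far[km] + pseudocount) / far_total)
        for km in kmers
    }


# Toy input: one made-up sequence with a UG-rich stretch around position 6.
scores = proximal_enrichment(["AAAAUGUGAAAA"], center=6, flank=2, k=2)
for kmer, ratio in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(kmer, round(ratio, 3))
```

Taking the background from the same sequences means that compositional biases shared by the whole region tend to cancel out in the ratio, which is the intuition behind normalising within a dataset rather than against an external control.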


2018 ◽  
Vol 19 (12) ◽  
pp. 4075 ◽  
Author(s):  
Martyna Urbanek-Trzeciak ◽  
Edyta Jaworska ◽  
Wlodzimierz Krzyzosiak

MicroRNAs (miRNAs) are short, non-coding post-transcriptional gene regulators. In mammalian cells, mature miRNAs are produced from primary precursors (pri-miRNAs) using canonical protein machinery, which includes Drosha/DGCR8 and Dicer, or the non-canonical mirtron pathway. In plant cells, mature miRNAs are excised from pri-miRNAs by the DICER-LIKE1 (DCL1) protein complex. The involvement of multiple regulatory proteins that bind directly to distinct miRNA precursors in a sequence- or structure-dependent manner adds to the complexity of the miRNA maturation process. Here, we present a web server that enables searches for miRNA precursors that can be recognized by diverse RNA-binding proteins based on known sequence motifs to facilitate the identification of other proteins involved in miRNA biogenesis. The database used by the web server contains known human, murine, and Arabidopsis thaliana pre-miRNAs. The web server can also be used to predict new RNA-binding protein motifs based on a list of user-provided sequences. We show examples of miRNAmotif applications, presenting precursors that contain motifs recognized by Lin28, MCPIP1, and DGCR8 and predicting motifs within pre-miRNA precursors that are recognized by two DEAD-box helicases, DDX1 and DDX17. miRNAmotif is released as open-source software under the MIT License. The code is available at GitHub (www.github.com/martynaut/mirnamotif). The webserver is freely available at http://mirnamotif.ibch.poznan.pl.
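Searching a set of precursors for a known sequence motif can be sketched in a few lines. The code below is not taken from miRNAmotif; it assumes motifs are written with standard IUPAC nucleotide codes over the RNA alphabet, and the function names, the precursor names and sequences, and the example motif are all made up for illustration.

```python
import re

# Standard IUPAC nucleotide codes, expressed over the RNA alphabet.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate an IUPAC motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC_RNA[ch] for ch in motif.upper())
    # A lookahead is zero-width, so matches may start at adjacent positions.
    return re.compile(f"(?=({body}))")


def find_motif(motif: str, precursors: dict) -> dict:
    """Return {name: [0-based start positions]} for precursors with a hit."""
    pattern = motif_to_regex(motif)
    hits = {}
    for name, seq in precursors.items():
        positions = [m.start() for m in pattern.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical precursor sequences and an illustrative motif.
precursors = {"pre-x": "UUGGAGGAGCC", "pre-y": "ACACACAC"}
print(find_motif("GGAG", precursors))
```

Compiling the motif to a lookahead pattern keeps overlapping occurrences, which matters for short, repetitive motifs; a plain pattern would skip any hit that starts inside a previous match.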


2020 ◽  
Author(s):  
Timothy L. Bailey

Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins. The STREME algorithm presented here advances the state-of-the-art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive, thorough and rapid than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs and Weeder). STREME’s capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME accurately estimates and reports the statistical significance of each motif that it discovers. STREME is easy to use via its web server at http://meme-suite.org, and is fully integrated with the widely-used MEME Suite of sequence analysis tools, which can be freely downloaded at the same web site for non-commercial use.


Parasitology ◽  
2012 ◽  
Vol 139 (8) ◽  
pp. 998-1004 ◽  
Author(s):  
X. CUI ◽  
T. LEI ◽  
D. Y. YANG ◽  
P. HAO ◽  
Q. LIU

Immune mapped protein 1 (IMP1) is a newly discovered protein in Eimeria maxima. It is recognized as a potential vaccine candidate against E. maxima and a highly conserved protein in apicomplexan parasites. Although the Neospora caninum IMP1 (NcIMP1) orthologue of E. maxima IMP1 was predicted in the N. caninum genome, it had not yet been identified and characterized. In this study, a cDNA sequence encoding NcIMP1 was cloned by RT-PCR from RNA isolated from Nc1 tachyzoites. NcIMP1 was encoded by an open reading frame of 1182 bp, which encoded a protein of 393 amino acids with a predicted molecular weight of 42·9 kDa. Sequence analysis showed that there was neither a signal peptide nor a transmembrane region present in the NcIMP1 amino acid sequence. However, several kinds of functional protein motifs, including an N-myristoylation site and a palmitoylation site, were predicted. Recombinant NcIMP1 (rNcIMP1) was expressed in Escherichia coli, and purified rNcIMP1 was then used to prepare specific antisera in mice. Mouse polyclonal antibodies raised against rNcIMP1 recognized a native IMP1 protein of approximately 43 kDa. Immunofluorescence analysis showed that NcIMP1 was localized on the membrane of N. caninum tachyzoites. The N-myristoylation site and the palmitoylation site were found to contribute to the localization of NcIMP1. Furthermore, the rNcIMP1-specific antibodies could inhibit cell invasion by N. caninum tachyzoites in vitro. All the results indicate that NcIMP1 is likely to be a membrane protein of N. caninum and may be involved in parasite invasion.


1991 ◽  
Vol 11 (2) ◽  
pp. 894-905
Author(s):  
R A Voelker ◽  
W Gibson ◽  
J P Graves ◽  
J F Sterling ◽  
M T Eisenberg

The nucleotide sequence of the Drosophila melanogaster suppressor of sable [su(s)] gene has been determined. Comparison of genomic and cDNA sequences indicates that an approximately 7,860-nucleotide primary transcript is processed into an approximately 5-kb message, expressed during all stages of the life cycle, that contains an open reading frame capable of encoding a 1,322-amino-acid protein of approximately 150 kDa. The putative protein contains an RNA recognition motif-like region and a highly charged arginine-, lysine-, serine-, aspartic or glutamic acid-rich region that is similar to a region contained in several RNA-processing proteins. In vitro translation of in vitro-transcribed RNA from a complete cDNA yields a product whose size agrees with the size predicted by the open reading frame. Antisera against su(s) fusion proteins recognize the in vitro-translated protein and detect a protein of identical size in the nuclear fractions from tissue culture cells and embryos. The protein is also present in smaller amounts in cytoplasmic fractions of embryos. That the su(s) protein has regions similar in structure to RNA-processing proteins is consistent with its known role in affecting the transcript levels of those alleles that it suppresses.


2003 ◽  
Vol 23 (19) ◽  
pp. 7055-7067 ◽  
Author(s):  
Shelly A. Waggoner ◽  
Stephen A. Liebhaber

ABSTRACT Posttranscriptional controls in higher eukaryotes are central to cell differentiation and developmental programs. These controls reflect sequence-specific interactions of mRNAs with one or more RNA binding proteins. The α-globin poly(C) binding proteins (αCPs) comprise a highly abundant subset of K homology (KH) domain RNA binding proteins and have a characteristic preference for binding single-stranded C-rich motifs. αCPs have been implicated in translation control and stabilization of multiple cellular and viral mRNAs. To explore the full contribution of αCPs to cell function, we have identified a set of mRNAs that associate in vivo with the major αCP2 isoforms. One hundred sixty mRNA species were consistently identified in three independent analyses of αCP2-RNP complexes immunopurified from a human hematopoietic cell line (K562). These mRNAs could be grouped into subsets encoding cytoskeletal components, transcription factors, proto-oncogenes, and cell signaling factors. Two mRNAs were linked to ceroid lipofuscinosis, indicating a potential role for αCP2 in this infantile neurodegenerative disease. Surprisingly, αCP2 mRNA itself was represented in αCP2-RNP complexes, suggesting autoregulatory control of αCP2 expression. In vitro analyses of representative target mRNAs confirmed direct binding of αCP2 within their 3′ untranslated regions. These data expand the list of mRNAs that associate with αCP2 in vivo and establish a foundation for modeling its role in coordinating pathways of posttranscriptional gene regulation.


2019 ◽  
Author(s):  
Isabelle Leticia Zaboroski Silva ◽  
Anny Waloski Robert ◽  
Guillermo Cabrera Cabo ◽  
Lucia Spangenberg ◽  
Marco Augusto Stimamiglio ◽  
...  

Posttranscriptional regulation plays a fundamental role in the biology of embryonic stem cells (ESCs). Many studies have demonstrated that multiple mRNAs are coregulated by one or more RNA binding proteins (RBPs) that orchestrate the expression of these molecules. A family of RBPs, known as PUF (Pumilio-FBF), is highly conserved among species and has been associated with the undifferentiated and differentiated states of different cell lines. In humans, two homologs of the PUF family have been found: Pumilio 1 (PUM1) and Pumilio 2 (PUM2). To understand the role of these proteins in human ESCs (hESCs), we first demonstrated the influence of the silencing of PUM1 and PUM2 on pluripotency genes. OCT4 and NANOG mRNA levels decreased significantly with the knockdown of Pumilio, suggesting that PUMILIO proteins play a role in the maintenance of pluripotency in hESCs. Furthermore, we observed that the hESCs silenced for PUM1 and 2 exhibited an improvement in efficiency of in vitro cardiomyogenic differentiation. Using in silico analysis, we identified mRNA targets of PUM1 and PUM2 expressed during cardiomyogenesis. With the reduction of PUM1 and 2, these target mRNAs would be active and could be involved in the progression of cardiomyogenesis.


Author(s):  
Moumita Mukherjee ◽  
Srikanta Goswami

RNA-binding proteins (RBPs) play a significant role in multiple cellular processes, and their deregulation is strongly associated with cancer. However, there is not adequate evidence regarding the global alteration and functions of RBPs in pancreatic cancer, interrogated in a systematic manner. In this study, we have prepared an exhaustive list of RBPs from multiple sources, downloaded gene expression microarray data from a total of 241 pancreatic tumors and 124 normal pancreatic tissues, performed a meta-analysis, and obtained differentially expressed RBPs (DE-RBPs) using the Limma package of R Bioconductor. The results were validated in microarray datasets and the Cancer Genome Atlas (TCGA) RNA sequencing dataset for pancreatic adenocarcinoma (PAAD). Pathway enrichment analysis was performed using DE-RBPs, and we also constructed the protein–protein interaction (PPI) network to detect key modules and hub-RBPs. Coding and noncoding targets for top altered and hub RBPs were identified, and altered pathways modulated by these targets were also investigated. Our meta-analysis identified 45 upregulated and 15 downregulated RBPs as differentially expressed in pancreatic cancer, and pathway enrichment analysis demonstrated their important contribution to tumor development. As a result of PPI network analysis, 26 hub RBPs were detected, and coding and noncoding targets for all these RBPs were categorized. Functional exploration showed that pathways related to epithelial-to-mesenchymal transition (EMT), cell migration, and metastasis emerge as the major pathways affected by the targets of these RBPs. Our study identified a unique meta-signature of 26 hub-RBPs that primarily modulate tumor cell migration and metastasis in pancreatic cancer. IGF2BP3, ISG20, NIP7, PRDX1, RCC2, RUVBL1, SNRPD1, PAIP2B, and SIDT2 were found to play the most prominent role in the regulation of EMT in the process. The findings not only contribute to understanding the biology of RBPs in pancreatic cancer but also help to evaluate their candidacy as possible therapeutic targets.


2018 ◽  
Author(s):  
Alina Munteanu ◽  
Neelanjan Mukherjee ◽  
Uwe Ohler

Motivation: RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. Results: We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3′UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/.

