Searching for ncRNAs in eukaryotic genomes: Maximizing biological input with RNAmotif

2004 ◽  
Vol 1 (1) ◽  
pp. 64-79 ◽  
Author(s):  
Lesley J. Collins ◽  
Thomas J. Macke ◽  
David Penny

Summary Non-coding RNAs (ncRNAs) contain both characteristic secondary-structure and short sequence motifs. However, “complex” ncRNAs (RNA bound to proteins in ribonucleoprotein complexes) can be hard to identify in genomic sequence data. Programs able to search for ncRNAs were previously limited to ncRNA molecules that either align very well or have highly conserved secondary-structure. The RNAmotif program uses additional information to find ncRNA gene candidates through the design of an appropriate “descriptor” to model sequence motifs, secondary-structure and protein/RNA binding information. This enables searches for those ncRNAs that contain variable secondary-structure and limited sequence motif information. Applying the biologically-based concept of “positive and negative controls” to the RNAmotif search technique, we can now go beyond the testing phase to successfully search real genomes, complete with their background noise and related molecules. Descriptors are designed for two “complex” ncRNAs, the U5 snRNA (from the spliceosome) and RNase P RNA, which successfully uncover these sequences from some eukaryotic genomes. We include explanations about the construction of the input “descriptors” from known biological information, to allow searches for other ncRNAs. RNAmotif maximizes the input of biological knowledge into a search for an ncRNA gene and now allows the investigation of some of the hardest-to-find, yet important, genes in some very interesting eukaryotic organisms.

2012 ◽  
Vol 2012 ◽  
pp. 1-5 ◽  
Author(s):  
Hamed Bostan ◽  
Naomie Salim ◽  
Zeti Azura Hussein ◽  
Peter Klappa ◽  
Mohd Shahir Shamsir

Computational approaches to predicting the disulphide bonding state and its connectivity pattern are based on various descriptors. One such descriptor is the set of amino acid sequence motifs flanking cysteine residues. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. The Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.
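
As an illustration of what a cysteine flanking motif is, the following minimal sketch extracts a fixed-width window around every cysteine in a protein sequence. The function name, the window size (`flank=2`), and the `-` padding character are assumptions chosen for the example; the abstract does not state the window width or the extraction procedure used by CMD.

```python
def cysteine_flanking_motifs(sequence, flank=2, pad="-"):
    """Return (1-based position, motif) for every cysteine in `sequence`.

    Each motif is the cysteine plus `flank` residues on either side.
    Windows that run past the sequence ends are filled with `pad`.
    NOTE: window size and padding are illustrative assumptions,
    not the parameters used by the CMD authors.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    motifs = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # In `padded`, residue i sits at index i + flank, so the
            # window of width 2*flank + 1 centred on it starts at i.
            motifs.append((i + 1, padded[i:i + 2 * flank + 1]))
    return motifs


if __name__ == "__main__":
    for position, motif in cysteine_flanking_motifs("MKCAAC"):
        print(position, motif)
```

Grouping the resulting motifs by observed bonding state would be the natural next step toward the "always bonded" and "always nonbonded" sets the abstract describes, but that requires structural annotation not shown here.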


2021 ◽  
Vol 4 (9) ◽  
pp. e202000659
Author(s):  
Mengge Shan ◽  
Xinjun Ji ◽  
Kevin Janssen ◽  
Ian M Silverman ◽  
Jesse Humenik ◽  
...  

Two features of eukaryotic RNA molecules that regulate their post-transcriptional fates are RNA secondary structure and RNA-binding protein (RBP) interaction sites. However, a comprehensive global overview of the dynamic nature of these sequence features during erythropoiesis has never been obtained. Here, we use our ribonuclease-mediated structure and RBP-binding site mapping approach to reveal the global landscape of RNA secondary structure and RBP–RNA interaction sites and the dynamics of these features during this important developmental process. We identify dynamic patterns of RNA secondary structure and RBP binding throughout the process and determine a set of corresponding protein-bound sequence motifs along with their dynamic structural and RBP-binding contexts. Finally, using these dynamically bound sequences, we identify a number of RBPs that have known and putative key functions in post-transcriptional regulation during mammalian erythropoiesis. In total, this global analysis reveals new post-transcriptional regulators of mammalian blood cell development.


2016 ◽  
Author(s):  
David Heller ◽  
Martin Vingron ◽  
Ralf Krestel ◽  
Uwe Ohler ◽  
Annalisa Marsico

Abstract RNA-binding proteins (RBPs) play important roles in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. To what extent RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. RNA motif finders that produce informative motifs and simultaneously capture the relationship between primary sequence and different RNA secondary structures are missing. We developed ssHMM, an RNA motif finder that combines a hidden Markov model (HMM) with Gibbs sampling to learn the joint sequence and structure binding preferences of RBPs from high-throughput data, such as CLIP-Seq sequences, and visualizes them as a graph. Evaluations on synthetic data showed that ssHMM reliably recovers fuzzy sequence motifs in 80 to 100% of the cases. It produces motifs with higher information content than existing tools and is faster than other methods on large datasets. Examples of new sequence-structure motifs identified by ssHMM for uncharacterized RBPs are also discussed. ssHMM is freely available on GitHub at https://github.molgen.mpg.de/heller/ssHMM.


2003 ◽  
Vol 23 (17) ◽  
pp. 5959-5971 ◽  
Author(s):  
Hui Zhu ◽  
Robert A. Hasman ◽  
Katherine M. Young ◽  
Nancy L. Kedersha ◽  
Hua Lou

ABSTRACT Alternative RNA processing of human calcitonin/CGRP pre-mRNA is regulated by an intronic enhancer element. Previous studies have demonstrated that multiple sequence motifs within the enhancer and a number of trans-acting factors play critical roles in the regulation. Here, we report the identification of TIAR as a novel player in the regulation of human calcitonin/CGRP alternative RNA processing. TIAR binds to the U tract sequence motif downstream of a pseudo 5′ splice site within the previously characterized intron enhancer element. Binding of TIAR promotes inclusion of the alternative 3′-terminal exon located more than 200 nucleotides upstream from the U tract. In cells that preferentially include this exon, overexpression of a mutant TIAR that lacks the RNA binding domains suppressed inclusion of this exon. In this report, we also demonstrate an unusual novel interaction between U6 snRNA and the pseudo 5′ splice site, which was shown previously to bind U1 snRNA. Interestingly, TIAR binding to the U tract sequence depends on the interaction of not only U1 but also U6 snRNA with the pseudo 5′ splice site. Conversely, TIAR binding promotes U6 snRNA binding to its target. The synergistic relationship between TIAR and U6 snRNA strongly suggests a novel role of U6 snRNP in regulated alternative RNA processing.


1998 ◽  
Vol 143 (4) ◽  
pp. 887-899 ◽  
Author(s):  
Jonathan S. Rosenblum ◽  
Lucy F. Pemberton ◽  
Neris Bonifaci ◽  
Günter Blobel

La (SS-B) is a highly expressed protein that is able to bind 3′-oligouridylate and other common RNA sequence/structural motifs. By virtue of these interactions, La is present in a myriad of nuclear and cytoplasmic ribonucleoprotein complexes in vivo where it may function as an RNA-folding protein or RNA chaperone. We have recently characterized the nuclear import pathway of the S. cerevisiae La, Lhp1p. The soluble transport factor, or karyopherin, that mediates the import of Lhp1p is Kap108p/Sxm1p. We have now determined a 113-amino acid domain of Lhp1p that is brought to the nucleus by Kap108p. Unexpectedly, this domain does not coincide with the previously identified nuclear localization signal of human La. Furthermore, when expressed in Saccharomyces cerevisiae, the nuclear localization of Schizosaccharomyces pombe, Drosophila, and human La proteins is independent of Kap108p. We have been able to reconstitute the nuclear import of human La into permeabilized HeLa cells using the recombinant human factors karyopherin α2, karyopherin β1, Ran, and p10. As such, the yeast and human La proteins are imported using different sequence motifs and dissimilar karyopherins. Our results are consistent with an intermingling of the nuclear import and evolution of La.


2018 ◽  
Author(s):  
Peter K. Koo ◽  
Sean R. Eddy

Abstract Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs - assembling partial features into whole features in deeper layers - tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.


2020 ◽  
Vol 402 (1) ◽  
pp. 89-98
Author(s):  
Nathalie Meiser ◽  
Nicole Mench ◽  
Martin Hengesbach

Abstract N6-methyladenosine (m6A) is the most abundant modification in mRNA. The core of the human N6-methyltransferase complex (MTC) is formed by a heterodimer consisting of METTL3 and METTL14, which specifically catalyzes m6A formation within an RRACH sequence context. Using recombinant proteins in a site-specific methylation assay that allows determination of quantitative methylation yields, our results show that this complex methylates its target RNAs in a manner that depends not only on sequence but also on secondary structure. Furthermore, we demonstrate the role of specific protein domains on both RNA binding and substrate turnover, focusing on postulated RNA binding elements. Our results show that one zinc finger motif within the complex is sufficient to bind RNA; however, both zinc fingers are required for methylation activity. We show that the N-terminal domain of METTL3 alters the secondary structure dependence of methylation yields. Our results demonstrate that a cooperative effect of all RNA-binding elements in the METTL3–METTL14 complex is required for efficient catalysis, and that binding of further proteins affecting the NTD of METTL3 may regulate substrate specificity.
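
The RRACH sequence context mentioned above can be written out with the standard IUPAC nucleotide codes (R = A or G, H = A, C, or U). The sketch below scans an RNA sequence for candidate sites and reports the index of the central adenosine. The function name and the choice to accept DNA-style input (T converted to U) are assumptions made for the example. A sequence match only marks a candidate: as the abstract notes, methylation also depends on secondary structure, which this scan ignores.

```python
import re

# RRACH with IUPAC codes expanded: R = [AG], H = [ACU].
# The lookahead lets overlapping candidate sites all be reported.
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")


def find_rrach_sites(sequence):
    """Return (0-based index of the central A, matched 5-mer) for each RRACH hit.

    Hypothetical helper for illustration; sequence context alone does not
    establish that a site is actually methylated.
    """
    rna = sequence.upper().replace("T", "U")
    # The candidate adenosine is the third base of the 5-mer, hence + 2.
    return [(m.start() + 2, m.group(1)) for m in RRACH.finditer(rna)]


if __name__ == "__main__":
    print(find_rrach_sites("GGACAGACU"))
```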


Oncogene ◽  
2021 ◽  
Author(s):  
Panagiotis Papoutsoglou ◽  
Dorival Mendes Rodrigues-Junior ◽  
Anita Morén ◽  
Andrew Bergman ◽  
Fredrik Pontén ◽  
...  

Abstract Activation of the transforming growth factor β (TGFβ) pathway modulates the expression of genes involved in cell growth arrest, motility, and embryogenesis. An expression screen for long noncoding RNAs indicated that TGFβ induced mir-100-let-7a-2-mir-125b-1 cluster host gene (MIR100HG) expression in diverse cancer types, thus confirming an earlier demonstration of TGFβ-mediated transcriptional induction of MIR100HG in pancreatic adenocarcinoma. MIR100HG depletion attenuated TGFβ signaling, expression of TGFβ-target genes, and TGFβ-mediated cell cycle arrest. Moreover, MIR100HG silencing inhibited both normal and cancer cell motility and enhanced the cytotoxicity of cytostatic drugs. MIR100HG overexpression had an inverse impact on TGFβ signaling responses. Screening for downstream effectors of MIR100HG identified the ligand TGFβ1. MIR100HG and TGFB1 mRNA formed ribonucleoprotein complexes with the RNA-binding protein HuR, promoting TGFβ1 cytokine secretion. In addition, TGFβ regulated let-7a-2–3p, miR-125b-5p, and miR-125b-1–3p expression, all encoded by MIR100HG intron-3. Certain intron-3 miRNAs may be involved in TGFβ/SMAD-mediated responses (let-7a-2–3p) and others (miR-100, miR-125b) in resistance to cytotoxic drugs mediated by MIR100HG. In support of a model whereby TGFβ induces MIR100HG, which then enhances TGFβ1 secretion, analysis of human carcinomas showed that MIR100HG expression correlated with expression of TGFB1 and its downstream extracellular target TGFBI. Thus, MIR100HG controls the magnitude of TGFβ signaling via TGFβ1 autoinduction and secretion in carcinomas.


1993 ◽  
Vol 13 (5) ◽  
pp. 3002-3014
Author(s):  
K Kudrycki ◽  
C Stein-Izsak ◽  
C Behn ◽  
M Grillo ◽  
R Akeson ◽  
...  

We report characterization of several domains within the 5' flanking region of the olfactory marker protein (OMP) gene that may participate in regulating transcription of this and other olfactory neuron-specific genes. Analysis by electrophoretic mobility shift assay and DNase I footprinting identifies two regions that contain a novel sequence motif. Interactions between this motif and nuclear proteins were detected only with nuclear protein extracts derived from olfactory neuroepithelium, and this activity is more abundant in olfactory epithelium enriched in immature neurons. We have designated a factor(s) involved in this binding as Olf-1. The Olf-1-binding motif consensus sequence was defined as TCCCC(A/T)NGGAG. Studies with transgenic mice indicate that a 0.3-kb fragment of the OMP gene containing one Olf-1 motif is sufficient for olfactory tissue-specific expression of the reporter gene. Some of the other identified sequence motifs also interact specifically with olfactory nuclear protein extracts. We propose that Olf-1 is a novel, olfactory neuron-specific trans-acting factor involved in the cell-specific expression of OMP.
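
The Olf-1 consensus TCCCC(A/T)NGGAG translates directly into a regular expression, with (A/T) as a two-base class and N as any base. The sketch below searches both strands of a DNA sequence for exact consensus matches. The function names and the convention of reporting minus-strand hits by their forward-strand start coordinate are assumptions for the example; the original study defined the motif by mobility shift and footprinting experiments, not by a sequence scan like this one.

```python
import re

# TCCCC(A/T)NGGAG: (A/T) -> [AT], N -> [ACGT]; lookahead allows overlaps.
OLF1 = re.compile(r"(?=(TCCCC[AT][ACGT]GGAG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse-complement an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_olf1_sites(dna):
    """Return sorted (start, strand, site) tuples for consensus matches.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    covered by the site; `site` is read 5'->3' on the matching strand.
    Hypothetical helper: an exact-match scan, with no mismatch tolerance.
    """
    dna = dna.upper()
    hits = [(m.start(), "+", m.group(1)) for m in OLF1.finditer(dna)]
    n = len(dna)
    for m in OLF1.finditer(reverse_complement(dna)):
        # Map the match on the reverse complement back to forward coordinates.
        start = n - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_olf1_sites("AATCCCCAGGGAGTT"))
```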


2008 ◽  
Vol 86 (1) ◽  
pp. 31-36 ◽  
Author(s):  
Zachery R. Belak ◽  
Andrew Ficzycz ◽  
Nick Ovsenek

YY1 (Yin Yang 1) is present in the Xenopus oocyte cytoplasm as a constituent of messenger ribonucleoprotein complexes (mRNPs). Association of YY1 with mRNPs requires direct RNA-binding activity. Previously, we have shown YY1 has a high affinity for U-rich RNA; however, potential interactions with plausible in vivo targets have not been investigated. Here we report a biochemical characterization of the YY1–RNA interaction including an investigation of the stability, potential 5′-methylguanosine affinity, and specificity for target RNAs. The formation of YY1–RNA complexes in vitro was highly resistant to thermal, ionic, and detergent disruption. The endogenous oocyte YY1–mRNA interactions were also found to be highly stable. Specific YY1–RNA interactions were observed with selected mRNA and 5S RNA probes. The affinity of YY1 for these substrates was within an order of magnitude of that for its cognate DNA element. Experiments aimed at determining the potential role of the 7-methylguanosine cap on RNA-binding reveal no significant difference in the affinity of YY1 for capped or uncapped mRNA. Taken together, the results show that the YY1–RNA interaction is highly stable, and that YY1 possesses the ability to interact with structurally divergent RNA substrates. These data are the first to specifically document the interaction between YY1 and potential in vivo targets.

