ProTG4: A Web Server to Approximate the Sequence of a Generic Protein From an in Silico Library of Translatable G-Quadruplex (TG4)-Mapped Peptides

2021 ◽  
Vol 15 ◽  
pp. 117793222110458
Author(s):  
Siddhartha Kundu

An RNA G-quadruplex in the protein-coding segment of mRNA is translatable [Formula: see text] and may impact protein translation. This can be consequent to staggered ribosomal synthesis and/or result in an increased frequency of missense translational events. A mathematical model of the peptides that encompass the substituted amino acids, i.e., the [Formula: see text]-mapped peptidome, has been previously studied. However, the significance of this model and its relevance to disease biology remain to be established. ProTG4 computes a confidence-of-sequence-identity [Formula: see text]-score, which is the average weighted length of every matched [Formula: see text]-mapped peptide in a generic protein sequence. The weighted length is the product of the length of the peptide and the probability of its non-random occurrence in a library of randomly generated sequences of equivalent lengths. This is then averaged over the entire length of the protein sequence. ProTG4 is simple to operate, has clear instructions, and is accompanied by a set of ready-to-use examples. The rationale of the study, the algorithms, and the computational pipeline deployed are also described on the web page. Analyses by ProTG4 of taxonomically diverse protein sequences suggest that there is significant homology to [Formula: see text]-mapped peptides. These findings, especially in potentially infectious and infesting agents, offer plausible explanations for the aetiology and pathogenesis of certain proteopathies. ProTG4 can also provide a quantitative measure to identify and annotate the canonical form of a generic protein sequence from its known isoforms. The article presents several case studies and discusses the relevance of ProTG4-assisted peptide analysis in gaining insights into various mechanisms of disease biology (mistranslation, alternative splicing, amino acid substitutions).
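The score described in the abstract above (peptide length times probability of non-random occurrence, summed over matched peptides and averaged over the protein length) can be sketched roughly as follows. This is a minimal illustration and not ProTG4's actual implementation: the function name `weighted_length_score`, the toy peptide library, and the probability values are all hypothetical, and it assumes that each matched peptide is counted once regardless of how many times it occurs.

```python
def weighted_length_score(protein, peptide_probs):
    """Sketch of an average weighted-length score (hypothetical helper).

    protein       -- amino-acid sequence of the protein being scored
    peptide_probs -- dict mapping each library peptide to an assumed
                     probability of non-random occurrence (0..1); how that
                     probability is estimated is outside this sketch
    """
    if not protein:
        raise ValueError("protein sequence must not be empty")

    total = 0.0
    for peptide, prob in peptide_probs.items():
        # Assumption: a peptide "matches" if it occurs as an exact substring,
        # and it contributes once even if it occurs several times.
        if peptide and peptide in protein:
            total += len(peptide) * prob  # weighted length of this peptide

    # Average over the entire length of the protein sequence.
    return total / len(protein)


# Toy data, invented purely for illustration.
toy_protein = "MKTAYIAKQR"
toy_library = {"KTA": 0.5, "AKQR": 1.0, "WWW": 0.9}
print(weighted_length_score(toy_protein, toy_library))  # (3*0.5 + 4*1.0) / 10 = 0.55
```

In this toy case two of the three library peptides match, so the unmatched peptide contributes nothing and the sum of weighted lengths (5.5) is divided by the protein length (10).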

2008 ◽  
Vol 33 (2) ◽  
pp. 139-147 ◽  
Author(s):  
Chunxiang Zhang

Genomic evidence reveals that gene expression in humans is precisely controlled in cellular, tissue-type, temporal, and condition-specific manners. Completely understanding the regulatory mechanisms of gene expression is therefore one of the most important issues in genomic medicine. Surprisingly, recent analyses of the human and animal genomes have demonstrated that the majority of RNA transcripts are relatively small, noncoding RNAs (sncRNAs), rather than large, protein-coding messenger RNAs (mRNAs). Moreover, these sncRNAs may represent a novel important layer of regulation for gene expression. The most important breakthrough in this new area is the discovery of microRNAs (miRNAs). miRNAs comprise a novel class of endogenous, small, noncoding RNAs that negatively regulate gene expression via degradation or translational inhibition of their target mRNAs. As a group, miRNAs may directly regulate ∼30% of the genes in the human genome. In keeping with the nomenclature of RNomics, which is to study sncRNAs on the genomic scale, “microRNomics” is coined here to describe a novel subdiscipline of genomics that studies the identification, expression, biogenesis, structure, regulation of expression, targets, and biological functions of miRNAs on the genomic scale. A growing body of exciting evidence suggests that miRNAs are important regulators of cell differentiation, proliferation/growth, mobility, and apoptosis. These miRNAs therefore play important roles in development and physiology. Consequently, dysregulation of miRNA function may lead to human diseases such as cancer, cardiovascular disease, liver disease, immune dysfunction, and metabolic disorders. microRNomics may be a newly emerging approach for human disease biology.


Blood ◽  
2006 ◽  
Vol 108 (12) ◽  
pp. 3646-3653 ◽  
Author(s):  
Ramesh A. Shivdasani

The existence and roles of a class of abundant regulatory RNA molecules have recently come into sharp focus. Micro-RNAs (miRNAs) are small (approximately 22 bases), non–protein-coding RNAs that recognize target sequences of imperfect complementarity in cognate mRNAs and either destabilize them or inhibit protein translation. Although mechanisms of miRNA biogenesis have been elucidated in some detail, there is limited appreciation of their biological functions. Reported examples typically focus on miRNA regulation of a single tissue-restricted transcript, often one encoding a transcription factor, that controls a specific aspect of development, cell differentiation, or physiology. However, computational algorithms predict up to hundreds of putative targets for individual miRNAs, single transcripts may be regulated by multiple miRNAs, and miRNAs may either eliminate target gene expression or serve to fine-tune transcript and protein levels. Theoretical considerations and early experimental results hence suggest diverse roles for miRNAs as a class. One appealing possibility, that miRNAs eliminate low-level expression of unwanted genes and hence refine unilineage gene expression, may be especially amenable to evaluation in models of hematopoiesis. This review summarizes current understanding of miRNA mechanisms, outlines some of the important outstanding questions, and describes studies that attempt to define miRNA functions in hematopoiesis.


2019 ◽  
Author(s):  
Xujun Wang ◽  
Jingru Tian ◽  
Peng Cui ◽  
Stephen Mastriano ◽  
Dingyao Zhang ◽  
...  

MicroRNAs (miRNAs) regulate protein-coding gene expression primarily through cognate binding sites in the 3′ untranslated regions (3′ UTRs). Seed sites are sequences in messenger RNAs (mRNAs) that form perfect Watson-Crick base-pairing with a miRNA’s seed region, which can effectively reduce mRNA abundance and/or repress protein translation. Some seedless sites, which do not form perfect seed-pairing with a miRNA, can also lead to target repression, often with lower efficacy. Here we report the surprising finding that when seedless sites and seed sites are co-present in the same 3′ UTR, seedless sites attenuate strong-seed-site-mediated target suppression, independent of 3′ UTR length. This attenuation effect is detectable in >70% of transcriptomic datasets examined, in which specific miRNAs are experimentally increased or decreased. The attenuation effect is confirmed by 3′ UTR reporter assays and mediated through base-pairing between miRNA and seedless sites. Furthermore, this seedless-site-based attenuation effect could affect seed sites of the same miRNA or another miRNA, thus partially explaining the variability in target suppression and miRNA-mediated gene upregulation. Our findings reveal an unexpected principle of miRNA-mediated gene regulation, and could impact the understanding of many miRNA-regulated biological processes.
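The notion of a seed site in the abstract above (an mRNA sequence with perfect Watson-Crick pairing to the miRNA seed region) can be illustrated with a short sketch. It is not the authors' method: the helper names are hypothetical, the seed is assumed to be miRNA nucleotides 2 to 7 (a common convention that the abstract itself does not specify), and the sequences are illustrative only.

```python
# Watson-Crick complement table for RNA (no G:U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match(mirna, start=2, end=7):
    """Return the mRNA sequence that pairs perfectly with the miRNA seed.

    Assumption: the seed spans miRNA nucleotides start..end (1-based,
    inclusive), here 2..7. The site on the mRNA is the reverse complement.
    """
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_sites(utr, mirna):
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_match(mirna)
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)  # allow overlapping matches
    return positions


# Illustrative 22-nt miRNA-like sequence and a made-up 3' UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGG"
print(seed_match(mirna))            # UACCUC
print(find_seed_sites(utr, mirna))  # [4]
```

A seedless site, in the abstract's sense, would be one that pairs with the miRNA without containing this exact match, so a search like this one would not report it.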


2017 ◽  
Author(s):  
Weibing Yang ◽  
Raymond Wightman ◽  
Elliot M. Meyerowitz

In eukaryotic cells, most RNA molecules are exported into the cytoplasm after being transcribed in the nucleus. Long noncoding RNAs (lncRNAs) have been found to reside and function primarily inside the nucleus, but nuclear localization of protein-coding messenger RNAs (mRNAs) has been considered rare in both animals and plants. Here we show that two mRNAs, transcribed from the CDC20 and CCS52B (plant orthologue of CDH1) genes, are specifically sequestered inside the nucleus during the cell cycle. CDC20 and CDH1 both function as coactivators of the anaphase-promoting complex or cyclosome (APC/C) E3 ligase to trigger cyclin B (CYCB) destruction. In the Arabidopsis thaliana shoot apical meristem (SAM), we find CDC20 and CCS52B are co-expressed with CYCBs in mitotic cells. CYCB transcripts can be exported and translated, whereas CDC20 and CCS52B mRNAs are strictly confined to the nucleus at prophase and the cognate proteins are not translated until the redistribution of the mRNAs to the cytoplasm after nuclear envelope breakdown (NEBD) at prometaphase. The 5′ untranslated region (UTR) is necessary and sufficient for CDC20 mRNA nuclear localization as well as protein translation. Mitotic enrichment of CDC20 and CCS52B transcripts enables the timely and rapid activation of APC/C, while their nuclear sequestration at prophase appears to protect cyclins from precocious degradation.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Will Putzbach ◽  
Ashley Haluck-Kangas ◽  
Quan Q Gao ◽  
Aishe A Sarshad ◽  
Elizabeth T Bartom ◽  
...  

CD95/Fas ligand binds to the death receptor CD95 to induce apoptosis in sensitive cells. We previously reported that CD95L mRNA is enriched in sequences that, when converted to si/shRNAs, kill all cancer cells by targeting critical survival genes (Putzbach et al., 2017). We now report that expression of full-length CD95L mRNA itself is highly toxic to cells and induces a similar form of cell death. We demonstrate that small (s)RNAs derived from CD95L are loaded into the RNA-induced silencing complex (RISC), which is required for the toxicity, and that processing of CD95L mRNA into sRNAs is independent of both Dicer and Drosha. We provide evidence that, in addition to the CD95L transgene, a number of endogenous protein-coding genes involved in regulating protein translation, particularly under low miRNA conditions, can be processed to sRNAs and loaded into the RISC, suggesting a new level of cell fate regulation involving RNAi.


2013 ◽  
Vol 21 (3-4) ◽  
pp. 118-124 ◽  
Author(s):  
Rajendra Bhadane ◽  
Rupali Bhadane ◽  
Dhananjay Meshram

Guanine-rich sequences have the ability to fold into stable four-stranded structures called G-quadruplexes under physiological concentrations of Na+ or K+. G-quadruplexes are found in telomeres, being stable structures under the control of telomerase binding proteins. They are also identified throughout the genome and are enriched in promoter regions of protein-coding genes, upstream and downstream of the transcription initiation sites. A number of these promoter quadruplexes have been investigated for several proto-oncogenes. The formation of these quadruplexes can lead to chemical intervention of gene expression using a G-quadruplex binding ligand. We review location, configuration, and stabilization of these quadruplexes in some of the important promoters with regard to their potential as anticancer targets.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 390-390
Author(s):  
Paul F. Bray ◽  
Steven E. McKenzie ◽  
Leonard C. Edelstein ◽  
Srikanth Nagalla ◽  
Kathleen Delgrosso ◽  
...  

Abstract 390. A conspicuous lesson that has emerged from the 1000 Genomes Project is the greater genetic variation in the population than previously appreciated. Transcriptomics is rapidly assuming a prominent role in the understanding of basic molecular mechanisms accounting for variation within the normal population and disease states. Besides protein-coding RNAs, the importance of non-coding RNAs (ncRNAs) – primarily as regulators of gene expression – is well recognized but largely unexplored. The platelet transcriptome reflects megakaryocyte RNA content at the time of proplatelet release, subsequent splicing events, selective packaging and platelet RNA stability. An accurate understanding of the platelet transcriptome has both biological (improved understanding of platelet protein translation and the mechanisms of megakaryocyte/platelet gene expression) and clinical (novel biomarkers of disease) relevance. We carried out transcriptome sequencing of total RNA isolated from leukocyte-depleted platelet preparations from four healthy adults using an AB/LT SOLiD™ system. For each individual, we constructed 3 libraries: a) long (≥ 40 nucleotides) total RNA, b) long RNA depleted of rRNA, and c) short (< 40 nucleotides) RNA. ∼1 billion reads from the 12 datasets were mapped on each chromosome and strand of the human genome. About one-third mapped uniquely, similar to other unbiased methods like SAGE. Normalizing for transcript length and scale using β-actin expression level provided the ability to appropriately scale expression within a read-set and to compare expression levels across read-sets. Of the known protein-coding loci, ∼9,500 were present in human platelets. Plotting the number of protein-coding genes as a function of the level of normalized expression underscored different gene estimates between total and rRNA-depleted RNA preparations, and substantial inter-individual variation in the less abundant genes. 
RT-PCR validated the RNA-seq estimates of transcript levels exhibiting a range of >3 orders of magnitude of normalized read counts (r=0.7757; p=0.0001). A strong correlation was measured between mRNAs identified by RNA-seq and 3 published microarray datasets for well-expressed mRNAs, although RNA-seq identified many more transcripts of lower abundance. Unexpectedly, ribosomal RNA depletion significantly and adversely affected estimates of the relative abundance of transcripts including members of the RNA interference pathway DGCR8, DROSHA, XPO5, DICER1, EIF2C1-4, which exhibited large differences (up to 32-fold) between the total and rRNA-depleted preparations. A rigorous and highly stringent approach identified bona fide intronic regions that gave rise to 6,992 and 1,236 currently uncharacterized long and short RNA transcripts, respectively. We discovered numerous previously unreported antisense transcripts: 1) to known protein-coding regions of the genome, 2) 10 miRNA precursors where each locus generated 1–2 distinct antisense transcripts, presumably mature and “star” miRNAs, and 3) long and short RNAs antisense to several known repeat families. We did not observe enrichment of long-intergenic ncRNAs. We considered various possible explanations for the ∼60% of sequence reads that could not be mapped on the genome. Much more lenient parameter settings accounted for only ∼6.5% of sequenced reads. An even smaller fraction of reads was observed when considering all possible combinations of exon-exon junctions in the genome (12,382,819 junctions) and the highly polymorphic HLA region of chr 6, indicating these did not contribute in any substantive manner to the platelet transcriptome. Lastly, RNA-seq was highly reproducible (>97 for 1 subject studied on 4 occasions). In summary, our work reveals a richness and diversity of platelet RNA molecules, suggesting a context where platelet biology transcends protein- and mRNA-centric descriptions. 
We will provide a publicly available web tool of these data embedded in a local mirror of the UCSC genome browser, facilitating the elucidation of previously unappreciated molecular species and molecular interactions. This will eventually permit an improved understanding of the molecular mechanisms that regulate platelet physiology and that contribute to disorders of thrombosis, hemostasis and inflammation. Disclosures: No relevant conflicts of interest to declare.
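The abstract above mentions normalizing read counts for transcript length and scaling by β-actin expression, without giving the formula. One plausible reading is sketched below, purely as an assumption: reads are divided by transcript length, then expressed relative to the length-normalized β-actin value. The function name, the use of `ACTB` as the dictionary key for β-actin, and all numbers are hypothetical.

```python
def scale_to_reference(read_counts, lengths_nt, reference="ACTB"):
    """Length-normalize read counts and scale them to a reference transcript.

    read_counts -- dict: transcript name -> mapped read count
    lengths_nt  -- dict: transcript name -> transcript length in nucleotides
    reference   -- transcript used for scaling (assumed: beta-actin, ACTB)

    Returns a dict of expression values in which the reference equals 1.0,
    so that values can be compared across read-sets.
    """
    ref_density = read_counts[reference] / lengths_nt[reference]
    if ref_density == 0:
        raise ValueError("reference transcript has no reads")

    scaled = {}
    for name, count in read_counts.items():
        density = count / lengths_nt[name]    # reads per nucleotide
        scaled[name] = density / ref_density  # relative to the reference
    return scaled


# Invented numbers for illustration only.
counts = {"ACTB": 1000, "GENE_X": 500}
lengths = {"ACTB": 2000, "GENE_X": 500}
print(scale_to_reference(counts, lengths))  # {'ACTB': 1.0, 'GENE_X': 2.0}
```

Here the hypothetical GENE_X has half the reads of the reference but a quarter of its length, so its scaled expression comes out at twice the reference value.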


Nutrients ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 1831
Author(s):  
Pui-Pik Law ◽  
Michelle Holland

Protein-encoding genes constitute a small fraction of mammalian genomes. In addition to the protein-coding genes, there are other functional units within the genome that are transcribed, but not translated into protein, the so-called non-coding RNAs. There are many types of non-coding RNAs that have been identified and shown to have important roles in regulating gene expression either at the transcriptional or post-transcriptional level. A number of recent studies have highlighted that dietary manipulation in mammals can influence the expression or function of a number of classes of non-coding RNAs that contribute to the protein translation machinery. The identification of protein translation as a common target for nutritional regulation underscores the need to investigate how this may mechanistically contribute to phenotypes and diseases that are modified by nutritional intervention. Finally, we describe the state of the art and the application of emerging ‘-omics’ technologies to address the regulation of protein translation in response to diet.


2014 ◽  
Vol 369 (1652) ◽  
pp. 20130504 ◽  
Author(s):  
Neil R. Smalheiser

If mRNAs were the only RNAs made by a neuron, there would be a simple mapping of mRNAs to proteins. However, microRNAs and other non-coding RNAs (ncRNAs; endo-siRNAs, piRNAs, BC1, BC200, antisense and long ncRNAs, repeat-related transcripts, etc.) regulate mRNAs via effects on protein translation as well as transcriptional and epigenetic mechanisms. Not only are genes ON or OFF, but their ability to be translated can be turned ON or OFF at the level of synapses, supporting an enormous increase in information capacity. Here, I review evidence that ncRNAs are expressed pervasively within dendrites in mammalian brain; that some are activity-dependent and highly enriched near synapses; and that synaptic ncRNAs participate in plasticity responses including learning and memory. Ultimately, ncRNAs can be viewed as the post-it notes of the neuron. They have no literal meaning of their own, but derive their functions from where (and to what) they are stuck. This may explain, in part, why ncRNAs differ so dramatically from protein-coding genes, both in terms of the usual indicators of functionality and in terms of evolutionary constraints. ncRNAs do not appear to be direct mediators of synaptic transmission in the manner of neurotransmitters or receptors, yet they orchestrate synaptic plasticity—and may drive species-specific changes in cognition.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Corentin Meyer ◽  
Nicolas Scalzitti ◽  
Anne Jeannin-Girardon ◽  
Pierre Collet ◽  
Olivier Poch ◽  
...  

Background: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon–intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. Results: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. Conclusions: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon–intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction.

