Conserved long-range base pairings are associated with pre-mRNA processing of human genes

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Svetlana Kalmykova ◽  
Marina Kalinina ◽  
Stepan Denisov ◽  
Alexey Mironov ◽  
Dmitry Skvortsov ◽  
...  

Abstract The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. Double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3’-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.

Author(s):  
Svetlana Kalmykova ◽  
Marina Kalinina ◽  
Stepan Denisov ◽  
Alexey Mironov ◽  
Dmitry Skvortsov ◽  
...  

Abstract The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. While DNA employs it for genome replication, RNA molecules fold into complicated secondary and tertiary structures. Current knowledge on functional RNA structures in human protein-coding genes is focused on locally-occurring base pairs. However, chemical crosslinking and proximity ligation experiments have demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved long-range RNA structures in the human transcriptome, which consists of 1.1 million pairs of conserved complementary regions (PCCRs). PCCRs tend to occur within introns proximally to splice sites, suppress intervening exons, circumscribe circular RNAs, and exert an obstructive effect on cryptic and inactive splice sites. The double-stranded structure of PCCRs is supported by a significant decrease of icSHAPE nucleotide accessibility, high abundance of A-to-I RNA editing sites, and frequent nearby occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNA Pol II slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. Additionally, transcript starts and ends are strongly enriched in regions between complementary parts of PCCRs, leading to an intriguing hypothesis that RNA folding coupled with splicing could mediate co-transcriptional suppression of premature cleavage and polyadenylation events. The PCCR detection procedure is highly sensitive with respect to bona fide validated RNA structures, at the expense of a high false positive rate that cannot be reduced without loss of sensitivity. The catalog of PCCRs is visualized through a UCSC Genome Browser track hub to facilitate further genome research on long-range RNA structures.


Author(s):  
Fabian Amman ◽  
Stephan H. Bernhart ◽  
Gero Doose ◽  
Ivo L. Hofacker ◽  
Jing Qin ◽  
...  

2021 ◽  
pp. 1-21
Author(s):  
Roberta Migale ◽  
Michelle Neumann ◽  
Robin Lovell-Badge

The development of sexually dimorphic gonads is a unique process that starts with the specification of the bipotential genital ridges and culminates with the development of fully differentiated ovaries and testes in females and males, respectively. Research on sex determination has been mostly focused on the identification of sex determination genes, the majority of which encode proteins, specifically transcription factors such as SOX9 in the testes and FOXL2 in the ovaries. Our understanding of which factors may be critical for sex determination has benefited from the study of human disorders of sex development (DSD) and animal models, such as the mouse and the goat, as these often replicate the same phenotypes observed in humans when mutations or chromosomal rearrangements arise in protein-coding genes. Despite the advances made so far in explaining the role of key factors such as SRY, SOX9, and FOXL2 and the genes they control, what may regulate these factors upstream is not entirely understood, often resulting in the inability to correctly diagnose DSD patients. The role of non-coding DNA, which represents 98% of the human genome, in sex determination has only recently begun to be fully appreciated. In this review, we summarize the current knowledge on the long-range regulation of 2 important sex determination genes, SOX9 and FOXL2, and discuss the challenges that lie ahead and the many avenues of research yet to be explored in the sex determination field.


2020 ◽  
Author(s):  
Marina Kalinina ◽  
Dmitry Skvortsov ◽  
Svetlana Kalmykova ◽  
Timofei Ivanov ◽  
Olga Dontsova ◽  
...  

Abstract The mammalian Ate1 gene encodes an arginyl transferase enzyme with tumor suppressor function that depends on the inclusion of one of the two mutually exclusive exons (MXE), exons 7a and 7b. We report that the molecular mechanism underlying MXE splicing in Ate1 involves five conserved regulatory intronic elements R1–R5, of which R1 and R4 compete for base pairing with R3, while R2 and R5 form an ultra-long-range RNA structure spanning 30 kb. In minigenes, single and double mutations that disrupt base pairings in R1R3 and R3R4 lead to the loss of MXE splicing, while compensatory triple mutations that restore RNA structure revert splicing to that of the wild type. In the endogenous Ate1 pre-mRNA, blocking the competing base pairings by LNA/DNA mixmers complementary to R3 leads to the loss of MXE splicing, while the disruption of the R2R5 interaction changes the ratio of MXE. That is, Ate1 splicing is controlled by two independent, dynamically interacting, and functionally distinct RNA structure modules. Exon 7a becomes more included in response to RNA Pol II slowdown; however, it fails to do so when the ultra-long-range R2R5 interaction is disrupted, indicating that the exon 7a/7b ratio depends on co-transcriptional RNA folding. In sum, these results demonstrate that splicing is coordinated both in time and in space over very long distances, and that the interaction of these components is mediated by RNA structure.


2016 ◽  
Vol 44 (4) ◽  
pp. 1051-1057 ◽  
Author(s):  
Jessica G. Hardy ◽  
Chris J. Norbury

Most mammalian protein-coding genes are subject to alternative cleavage and polyadenylation (APA), which can generate distinct mRNA 3′UTRs with differing regulatory potential. Although this process has been intensely studied in recent years, it remains unclear how and to what extent cleavage site selection is regulated under different physiological conditions. The cleavage factor Im (CFIm) complex is a core component of the mammalian cleavage machinery, and the observation that its depletion causes transcriptome-wide changes in cleavage site use makes it a key candidate regulator of APA. This review aims to summarize current knowledge of the CFIm complex and explores the evidence surrounding its potential contribution to the regulation of APA.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 108
Author(s):  
Miroslav Pribyl ◽  
Zdenek Hodny ◽  
Iva Kubikova

Among the ~22,000 human genes, very few remain that have unknown functions. One such example is suprabasin (SBSN). Originally described as a component of the cornified envelope, the function of stratified epithelia-expressed SBSN is unknown. Both the lack of knowledge about the gene role under physiological conditions and the emerging link of SBSN to various human diseases, including cancer, attract research interest. The association of SBSN expression with poor prognosis of patients suffering from oesophageal carcinoma, glioblastoma multiforme, and myelodysplastic syndromes suggests that SBSN may play a role in human tumourigenesis. Three SBSN isoforms code for the secreted proteins with putative function as signalling molecules, yet with poorly described effects. In this first review about SBSN, we summarised the current knowledge accumulated since its original description, and we discuss the potential mechanisms and roles of SBSN in both physiology and pathology.


Archaea ◽  
2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Reema K. Gudhka ◽  
Brett A. Neilan ◽  
Brendan P. Burns

Halococcus hamelinensis was the first archaeon isolated from stromatolites. These geomicrobial ecosystems are thought to be some of the earliest known on Earth, yet, despite their evolutionary significance, the role of Archaea in these systems is still not well understood. Detailed here is the genome sequencing and analysis of an archaeon isolated from stromatolites. The genome of H. hamelinensis consisted of 3,133,046 base pairs with an average G+C content of 60.08% and contained 3,150 predicted coding sequences or ORFs, 2,196 (68.67%) of which were protein-coding genes with functional assignments and 954 (29.83%) of which were of unknown function. Codon usage of the H. hamelinensis genome was consistent with a highly acidic proteome, a major adaptive mechanism towards high salinity. Amino acid transport and metabolism, inorganic ion transport and metabolism, energy production and conversion, ribosomal structure, and unknown function COG genes were overrepresented. The genome of H. hamelinensis also revealed characteristics reflecting its survival in its extreme environment, including putative genes/pathways involved in osmoprotection, oxidative stress response, and UV damage repair. Finally, genome analyses indicated the presence of putative transposases as well as positive matches of genes of H. hamelinensis against various genomes of Bacteria, Archaea, and viruses, suggesting the potential for horizontal gene transfer.


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Claudia P. Spampinato ◽  
Diego F. Gomez-Casati

Different model organisms, such as Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse, and cultured human cell lines, among others, were used to study the mechanisms of several human diseases. Since human genes and proteins have been structurally and functionally conserved in plant organisms, the use of plants, especially Arabidopsis thaliana, as a model system to relate molecular defects to clinical disorders has recently increased. Here, we briefly review our current knowledge of human diseases of nuclear and mitochondrial origin and summarize the experimental findings of plant homologs implicated in each process.

