Intron exon boundary junctions in human genome have in-built unique structural and energetic signals

2021 ◽  
Vol 49 (5) ◽  
pp. 2674-2683
Author(s):  
Akhilesh Mishra ◽  
Priyanka Siwach ◽  
Pallavi Misra ◽  
Simran Dhiman ◽  
Ashutosh Kumar Pandey ◽  
...  

Abstract Precise identification of correct exon–intron boundaries is a prerequisite for analyzing the location and structure of genes. The existing framework for genomic signals delineating exons and introns in a genomic segment seems insufficient, predominantly because of poor sequence consensus and the limitations of training on available experimental data sets. We present here a novel concept for characterizing exon–intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed boundary junctions on both sides of all the exons (328,368) of protein-coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. Study of sequence conservation at these sites shows very poor consensus. It is observed that DNA adopts a unique structural and energy state at the boundary junctions. Also, the signals are somewhat different for housekeeping and tissue-specific genes. Clustering of the 31 parameters into four derived vectors gives some additional insights into the physical mechanisms involved in this biological process. Sites of structural and energy signals correlate well with the positions playing important roles in pre-mRNA splicing.

Author(s):  
Nicolas Rodrigue ◽  
Thibault Latrille ◽  
Nicolas Lartillot

Abstract In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have focused either on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distribution of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
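
As an illustration of the kind of annotation scan described in the abstract above, the following is a minimal sketch of flagging genes whose coordinates overlap another gene on the same chromosome. It is not the authors' pipeline: the tuple layout, the gene names, the inclusive coordinates, and the choice to ignore strand are all assumptions made for this example, and the subtype classification (divergent, tandem, and so on) is not attempted.

```python
from collections import defaultdict


def overlapping_genes(genes):
    """Return names of genes that overlap at least one other gene.

    `genes` is an iterable of (name, chrom, start, end) tuples with
    inclusive coordinates. Strand is ignored (an assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    flagged = set()
    for intervals in by_chrom.values():
        intervals.sort()  # sweep left to right by start coordinate
        max_end, max_name = None, None  # furthest-reaching gene seen so far
        for start, end, name in intervals:
            if max_end is not None and start <= max_end:
                # The current gene starts inside the furthest-reaching earlier gene.
                flagged.add(name)
                flagged.add(max_name)
            if max_end is None or end > max_end:
                max_end, max_name = end, name
    return flagged


# Hypothetical toy annotation, not real Ensembl records.
toy_genes = [
    ("geneA", "chr1", 100, 500),
    ("geneB", "chr1", 400, 900),
    ("geneC", "chr1", 1000, 1200),
    ("geneD", "chr2", 100, 200),
]
overlaps = overlapping_genes(toy_genes)
fraction = len(overlaps) / len(toy_genes)
```

On real annotations the same fraction corresponds to the abstract's reported 4,951 of 19,200 genes, which is roughly a quarter.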


2015 ◽  
Vol 11 (5) ◽  
pp. 1378-1388 ◽  
Author(s):  
Ting Liu ◽  
Kui Lin

The relationships among the types of transcripts, the classes of coding SNPs and the population frequencies in the human genome.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. SCI-13-SCI-13
Author(s):  
Adrian Krainer

Abstract SCI-13 Most eukaryotic protein-coding genes have one or more introns, and their transcripts can undergo alternative splicing, giving rise to multiple isoforms. Accurate splicing is essential for normal gene expression, and alternative splicing is a key mechanism for expanding the proteome and regulating the expression of diverse protein isoforms. This session will review the general mechanisms of pre-mRNA splicing and the regulation of alternative splicing. In addition, the process of how abnormal splicing arises as a result of intronic or exonic mutations in particular genes, or more globally as a result of splicing-factor misregulation, as well as the contribution of splicing misregulation to cancer, will be described. Lastly, the current status of targeted therapeutics development, focusing on antisense approaches to correct abnormal splicing of specific genes or to modulate alternative splicing, will be discussed. Disclosures: Krainer: ISIS Pharmaceuticals: Consultancy, Patents & Royalties, Research Funding.


2018 ◽  
Author(s):  
Weixue Mu ◽  
Ting Yang ◽  
Xin Liu

Abstract Brassicales is a diverse angiosperm order with about 4,700 recognized species. Here, we assembled and described the complete plastid genomes from four species of Brassicales: Capparis urophylla F.Chun (Capparaceae), Carica papaya L. (Caricaceae), Cleome rutidosperma DC. (Cleomaceae), and Moringa oleifera Lam. (Moringaceae), including two plastid genomes newly assembled for two families (Capparaceae and Moringaceae). The four plastid genomes are 159,680 base pairs on average in length and encode 78 protein-coding genes. The genomes each contain a typical structure of a Large Single-Copy (LSC) region and a Small Single-Copy (SSC) region separated by two Inverted Repeat (IR) regions. We performed the maximum-likelihood (ML) phylogenetic analysis using three different data sets of 66 protein-coding genes (ntAll, ntNo3rd and AA). Our phylogenetic results from the different data sets are congruent, and are consistent with previous phylogenetic studies of Brassicales.


Author(s):  
Aysha Divan ◽  
Janice A. Royds

Biological functions require protein and the protein makeup of a cell determines its behaviour and identity. Proteins, therefore, are the most abundant molecules in the body except for water. The approximately 20,000 protein coding genes in the human genome can, by alternative splicing, multiple translation starts, and post-translational modifications, produce over 1,000,000 different proteins, collectively called ‘the proteome’. It is the size of the proteome and not the genome that defines the complexity of an organism. ‘Proteins’ describes the composition and structure of proteins and how they are studied. What information is required in order to understand how proteins work and what happens when this function is impaired in disease?


Blood ◽  
2009 ◽  
Vol 114 (22) ◽  
pp. 3260-3260
Author(s):  
Rosana A Silveira ◽  
Angela A Fachel ◽  
Yuri B Moreira ◽  
Marcia T Delamain ◽  
Carmino Antonio De Souza ◽  
...  

Abstract 3260 Poster Board III-1 Background: CML treatment with tyrosine kinase inhibitors induces high and durable rates of complete cytogenetic response. Despite treatment efficacy, a significant proportion of patients develop resistance to these drugs. We measured gene expression profiles in an attempt to identify gene pathways that may be associated with dasatinib resistance. Patients and Methods: Mononuclear cells were separated from peripheral blood samples from seven CML patients resistant to imatinib, collected before and after dasatinib treatment. Three patients who achieved partial cytogenetic response (Ph-positive cells: 1% - 35%) within twelve months were considered responders (R), whereas four patients who failed to achieve PCyR within 12 months of treatment were classified as non-responders. RNA samples prepared from peripheral mononuclear cells were hybridized to Agilent Technologies 4×44K Whole Human Genome Microarrays (WHGM) and 4×44K intronic-exonic custom oligoarrays. The latter was developed by Verjovski-Almeida's group (Nakaya et al, Genome Biology 2007, 8:R43) and contains sense and antisense probes that map to intronic regions in the human genome representing totally (TIN) and partially (PIN) intronic non-coding RNAs (ncRNAs), in addition to probes for the corresponding protein-coding genes of the same loci. Raw microarray data were normalized with the Affy package in the R statistical language, implemented in the Bioconductor platform. Each sample was labeled in replicate with Cy3 or Cy5 and the two were considered technical replicates. Two independent statistical approaches, SAM (Significance Analysis of Microarrays) and Golub's discrimination score (SNR, Signal to Noise Ratio, with permutations), were used to identify differentially expressed transcripts between responder and non-responder patients.
For the intronic-exonic platform, the analysis parameters were FDR 10%, SNR>1.5 and p<0.01, and for the WHGM platform the parameters were FDR 5%, SNR>1.5 and p<0.001. For this latter platform, we also performed a patient leave-one-out analysis. Functions of differentially expressed transcripts were annotated and compared using GO Biological Process categories (www.genetools.microarray.ntu.no/egon). Results: We identified 34 ncRNAs with altered expression (26 over and 8 underexpressed in responders) in pre-treatment samples and 33 ncRNAs (20 over and 13 underexpressed in responders) in post-treatment samples. Functions associated with protein-coding genes from the same genomic loci as those of the intronic differentially expressed ncRNAs were: regulation of transcription (PRMT5, SOD2, SSBP3, BCL7A, MLL), signal transduction (PRKCB1, RASGRP2, NF1, PXN) and apoptosis (BCL2, PCSK6, TNFAIP8, EIF4G2). WHGM platform data analysis showed 63 and 250 protein-coding genes differentially expressed in pre and post-treatment samples, respectively. We observed a higher number of protein-coding genes with altered expression after treatment in the following functions: cell communication, immune response and metabolic process (p<0.02). Conclusions: Overall, these findings indicate that protein-coding genes and intronic ncRNAs may be related to dasatinib resistance and response to treatment. In particular, altered expression of ncRNAs transcribed from the introns of ‘regulation of transcription’ genes could be part of an important alternative mechanism of gene expression control during emergence of resistance. Support: FAPESP (2005/60266-8) Disclosures: No relevant conflicts of interest to declare.
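
The signal-to-noise discrimination score named in the methods above can be sketched for a single transcript as follows. This is a hedged illustration rather than the authors' analysis code: it uses the commonly cited Golub form, (mean of class A minus mean of class B) divided by (standard deviation of A plus standard deviation of B), and it replaces random permutations with an exact enumeration of all group relabelings, which is feasible for a 3-versus-4 design. The expression values are invented for the example.

```python
from itertools import combinations
from statistics import mean, stdev


def snr(group_a, group_b):
    """Golub-style signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def exact_permutation_p(group_a, group_b):
    """Two-sided p-value from enumerating every relabeling of the samples.

    Assumes each relabeled group has non-zero spread; identical values
    within both groups would make the denominator zero.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(snr(group_a, group_b))
    indices = range(len(pooled))
    total = hits = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so the observed labeling always counts as a hit.
        if abs(snr(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Invented log-expression values for one transcript (illustrative only).
responders = [8.1, 7.9, 8.4]
non_responders = [6.0, 6.3, 5.8, 6.1]
score = snr(responders, non_responders)
p_value = exact_permutation_p(responders, non_responders)
```

A transcript would pass the abstract's SNR>1.5 threshold when `abs(score)` exceeds 1.5; how the study combined this score with its permutation p-values and FDR control is not specified in the abstract.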


2009 ◽  
Vol 2009 ◽  
pp. 1-6 ◽  
Author(s):  
Noam Shomron ◽  
Carmit Levy

MicroRNAs (miRNAs) are often hosted in introns of protein-coding genes. The fact that the same transcriptional unit can potentially give rise to both miRNA and mRNA transcripts raises the intriguing question of the level of interaction between these processes. Recent studies from transcription, pre-mRNA splicing, and miRNA-processing perspectives have investigated these relationships and yielded interesting, yet somewhat controversial findings. Here we discuss major studies in the field.


Open Biology ◽  
2017 ◽  
Vol 7 (6) ◽  
pp. 170073 ◽  
Author(s):  
Joana Guiro ◽  
Shona Murphy

In addition to protein-coding genes, RNA polymerase II (pol II) transcribes numerous genes for non-coding RNAs, including the small-nuclear (sn)RNA genes. snRNAs are an important class of non-coding RNAs, several of which are involved in pre-mRNA splicing. The molecular mechanisms underlying expression of human pol II-transcribed snRNA genes are less well characterized than for protein-coding genes and there are important differences in expression of these two gene types. Here, we review the DNA features and proteins required for efficient transcription of snRNA genes and co-transcriptional 3′ end formation of the transcripts.


2021 ◽  
Author(s):  
Noah Dukler ◽  
Mehreen R Mughal ◽  
Ritika Ramani ◽  
Yi-Fei Huang ◽  
Adam Siepel

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious.
Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
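
The phrase "fractional depletion of rare single-nucleotide variants" in the abstract above can be made concrete with a naive point estimate. The sketch below is only an assumption-laden illustration of that idea: it compares the per-site rate of rare variants in a target set with the rate in matched neutral sites, and it does not reproduce ExtRaINSIGHT's statistical model or its controls for local and neighbor-dependent mutation rate. The function name, parameters, and counts are all hypothetical.

```python
def fractional_depletion(rare_target, sites_target, rare_neutral, sites_neutral):
    """Naive estimate: 1 - (rare-variant rate in target / rate in neutral sites).

    A value near 0 means no depletion relative to neutral sites; a value
    near 1 means rare variants are almost absent from the target set.
    """
    if sites_target <= 0 or sites_neutral <= 0:
        raise ValueError("site counts must be positive")
    if rare_neutral <= 0:
        raise ValueError("neutral set must contain at least one rare variant")
    rate_target = rare_target / sites_target
    rate_neutral = rare_neutral / sites_neutral
    return 1.0 - rate_target / rate_neutral


# Hypothetical counts: 3% of target sites carry a rare variant versus
# 5% of matched neutral sites.
lambda_s = fractional_depletion(
    rare_target=300,
    sites_target=10_000,
    rare_neutral=5_000,
    sites_neutral=100_000,
)
```

With these invented counts the estimate is 0.4, meaning 40% fewer rare variants than expected under neutrality.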

