Genetic variation in mRNA coding sequences of highly conserved genes

2001 ◽  
Vol 5 (3) ◽  
pp. 113-118 ◽  
Author(s):  
ANNELOOR L. M. A. TEN ASBROEK ◽  
JEFFREY OLSEN ◽  
DAVID HOUSMAN ◽  
FRANK BAAS ◽  
VINCE STANTON

The frequency and distribution of genetic polymorphism in the human genome is a question of major importance. We have studied this in highly conserved genes, which encode crucial functions such as DNA replication, mRNA transcription, and translation. Evolutionary comparisons suggest that these genes are under particularly strong selective pressure, and their frequency of nucleotide sequence polymorphism would be expected to represent a minimum estimate for sequence variation throughout the genome. We have analyzed the complete coding sequence and the 3′-untranslated region (3′-UTR) of 22 human genes, most of which have homologs in all cellular organisms and all of which are at least 25% amino acid identical to homologs in yeast. Comparisons with similar studies of less conserved human disease genes indicate that 1) evolutionarily conserved genes are, on average, less polymorphic than disease related genes; 2) the difference in polymorphism levels is attributable almost entirely to reduced levels of variation in protein coding sequences, whereas noncoding sequences have similar levels of polymorphism; and 3) the character of polymorphism, in terms of the spectrum and frequency of mutational changes, is similar.

2017 ◽  
Author(s):  
Jochen Weile ◽  
Song Sun ◽  
Atina G. Cote ◽  
Jennifer Knapp ◽  
Marta Verby ◽  
...  

Abstract Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon-mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features, and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.


Biology ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 63
Author(s):  
Xiu-Xiu Guo ◽  
Xiao-Jian Qu ◽  
Xue-Jie Zhang ◽  
Shou-Jin Fan

Aristidoideae is a subfamily in the PACMAD clade of family Poaceae, including three genera, Aristida, Stipagrostis, and Sartidia. In this study, the plastomes of Aristida adscensionis and Stipagrostis pennata were newly sequenced, and a total of 16 Aristidoideae plastomes were compared. All plastomes were conserved in genome size, gene number, structure, and IR boundary. Repeat sequence analysis showed that forward and palindromic repeats were the most common repeat types. The number of SSRs ranged from 30 (Sartidia isaloensis) to 54 (Aristida purpurea). Codon usage analysis showed that plastome genes preferred to use codons ending with A/T. A total of 12 highly variable regions were screened, including four protein coding sequences (matK, ndhF, infA, and rpl32) and eight non-coding sequences (rpl16-1-rpl16-2, ccsA-ndhD, trnY-GUA-trnD-GUC, ndhF-rpl32, petN-trnC-GCA, trnT-GGU-trnE-UUC, trnG-GCC-trnfM-CAU, and rpl32-trnL-UAG). Furthermore, the phylogenetic position of this subfamily and its intergeneric relationships need to be illuminated. All Maximum Likelihood and Bayesian Inference trees strongly support the monophyly of Aristidoideae and of each of the three genera, and the clade of Aristidoideae and Panicoideae was sister to the other subfamilies in the PACMAD clade. Within Aristidoideae, Aristida is sister to the clade composed of Stipagrostis and Sartidia. The divergence between C4 Stipagrostis and C3 Sartidia was estimated at 11.04 Ma, which may be associated with the drought event in the Miocene period. Finally, the differences in carbon fixation patterns, geographical distributions, and ploidy may be related to the difference in species numbers among these three genera. This study provides insights into the phylogeny and evolution of the subfamily Aristidoideae.
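
The codon usage observation above (a preference for codons ending in A/T) can be illustrated with a minimal sketch. This is not the authors' pipeline; the function name `third_position_at_fraction` is hypothetical, and the sketch assumes an in-frame coding sequence given as a plain string.

```python
def third_position_at_fraction(cds):
    """Fraction of codons whose third base is A or T (U is treated as T).

    Assumes `cds` starts in frame; a trailing partial codon is ignored.
    Returns None when no codon has an unambiguous third base.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # drop any incomplete final codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    counted = [base for base in thirds if base in "ACGT"]  # skip N and gaps
    if not counted:
        return None
    return sum(base in "AT" for base in counted) / len(counted)
```

A value well above 0.5 across many genes would be consistent with the A/T-ending preference described in the abstract, although a real codon usage analysis would normally work per amino acid (for example with relative synonymous codon usage) rather than with this single summary number.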


2020 ◽  
Author(s):  
Dmitry Biba ◽  
Galya Klink ◽  
Georgii Bazykin

Abstract Insertions and deletions of lengths not divisible by 3 in protein-coding sequences cause frameshifts that usually induce premature stop codons and may carry a high fitness cost. However, this cost can be circumvented by a second compensatory indel restoring the reading frame. The role of such compensatory frameshifting mutations (CFMs) in evolution has not been studied systematically. Here, we use whole-genome alignments of protein coding genes of 100 vertebrate species, and of 122 insect species, studying the prevalence of CFMs in their divergence. After stringent filtering, we detect a total of 11 high-confidence genes carrying pairs of CFMs, including three human genes: RAB36, ARHGAP6 and NCR3LG1. CFMs tended to occur in genes under relaxed negative selection, indicating that they are typically prevented at functionally important genes. In some instances, mutations closely predating or following the CFMs restored the biochemical similarity of the frameshifted segment to the ancestral sequence, possibly reducing or negating the fitness cost of a CFM. Typically, however, the resulting sequence bore no similarity to the ancestral one, indicating that the CFMs can uncover radically novel regions of sequence space. In total, CFMs represent a potentially important and previously overlooked source of novel variation in amino acid sequences.
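
The arithmetic behind a compensatory pair can be sketched in a few lines. This is an illustration of the reading-frame logic only, not the detection method used in the study; the function names are hypothetical, and indels are assumed to be given as signed lengths (insertions positive, deletions negative).

```python
def net_frame_offset(indel_lengths):
    """Net reading-frame offset (0, 1, or 2) after applying all indels.

    Each length is signed: +n for an insertion of n bases, -n for a deletion.
    """
    return sum(indel_lengths) % 3


def is_compensatory_pair(first, second):
    """True when each indel alone shifts the frame but together they restore it."""
    each_shifts = first % 3 != 0 and second % 3 != 0
    frame_restored = (first + second) % 3 == 0
    return each_shifts and frame_restored
```

For example, a 1-base insertion followed by a 1-base deletion, or a 1-base and a 2-base insertion, both qualify, whereas two in-frame 3-base indels do not, since neither causes a frameshift on its own. Only the segment between the two indels is read in the shifted frame.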


2019 ◽  
Vol 14 (3) ◽  
pp. 211-225 ◽  
Author(s):  
Ming Fang ◽  
Xiujuan Lei ◽  
Ling Guo

Background: Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. Essential proteins are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is a very important topic not only for a better comprehension of the minimal requirements for cellular life, but also for a more efficient discovery of human disease genes and drug targets. Traditionally, because the experimental identification of essential proteins is complex, it has required considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for discovering disease genes and for drug design, and has great potential for applications in basic and synthetic biology research. Objective: The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss the challenges and directions for further research.


2019 ◽  
Vol 8 (23) ◽  
Author(s):  
Si Chul Kim ◽  
Hyo Jung Lee

Here, we report the draft genome sequence of Pseudorhodobacter sp. strain E13, a Gram-negative, aerobic, nonflagellated, and rod-shaped bacterium which was isolated from the Yellow Sea in South Korea. The assembled genome sequence is 3,878,578 bp long with 3,646 protein-coding sequences in 159 contigs.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Svetlana Kalmykova ◽  
Marina Kalinina ◽  
Stepan Denisov ◽  
Alexey Mironov ◽  
Dmitry Skvortsov ◽  
...  

Abstract The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. Double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3′-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

Abstract Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′-UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


Genetics ◽  
1996 ◽  
Vol 143 (1) ◽  
pp. 537-548 ◽  
Author(s):  
Sudhir Kumar

Abstract Maximum likelihood methods were used to study the differences in substitution rates among the four nucleotides and among different nucleotide sites in mitochondrial protein-coding genes of vertebrates. In the 1st+2nd codon position data, the frequency of nucleotide G is negatively correlated with evolutionary rates of genes, substitution rates vary substantially among sites, and the transition/transversion rate bias (R) is two to five times larger than that expected at random. Generally, the largest transition biases and greatest differences in substitution rates among sites are found in the highly conserved genes. The 3rd positions in placental mammal genes exhibit strong nucleotide composition biases and the transitional rates exceed transversional rates by one to two orders of magnitude. Tamura-Nei and Hasegawa-Kishino-Yano models with gamma distributed variable rates among sites (gamma parameter, α) adequately describe the nucleotide substitution process in 1st+2nd position data. In these data, ignoring differences in substitution rates among sites leads to the largest biases while estimating substitution rates. Kimura's two-parameter model with variable rates among sites performs satisfactorily in likelihood estimation of R, α, and overall amount of evolution for 1st+2nd position data. It can also be used to estimate pairwise distances with appropriate values of α for a majority of genes.
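
As a rough illustration of the pairwise distance mentioned above, the following sketch computes the standard Kimura two-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed proportions of transitional and transversional differences. It deliberately omits the gamma correction for rate variation among sites that the abstract discusses, so it is a simplification and not the paper's procedure; the function name and the assumption of two pre-aligned, equal-length sequences are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Uniform rates among sites are assumed (no gamma correction).
    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

The same counts give a crude observed transition/transversion ratio (`transitions / transversions`), which is distinct from the rate bias R estimated by maximum likelihood in the study.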

