Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models

2016 ◽  
Author(s):  
Jesse D. Bloom

Abstract Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. I describe a new approach that uses a null model based on high-throughput lab measurements of a gene's site-specific amino-acid preferences. This null model makes it possible to identify diversifying selection for amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual sites, approaches like the one here can improve our ability to identify many interesting forms of selection.

Genetics ◽  
2000 ◽  
Vol 155 (1) ◽  
pp. 431-449 ◽  
Author(s):  
Ziheng Yang ◽  
Rasmus Nielsen ◽  
Nick Goldman ◽  
Anne-Mette Krabbe Pedersen

Abstract Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ω ratios. We develop models that account for heterogeneous ω ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of ω among sites. In all data sets analyzed, the selective pressure indicated by the ω ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average ω ratio across sites is <1, but in which some sites are clearly under diversifying selection with ω > 1. Genes undergoing positive selection include the β-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for ω and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of ω among sites from real data sets.
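The three ω regimes named in the abstract above can be illustrated with a minimal Python sketch. The function name `classify_omega`, the tolerance `tol`, and the per-site values are hypothetical choices made for illustration only; they are not taken from the paper, whose models estimate the distribution of ω among sites by likelihood rather than by thresholding point estimates.

```python
def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Label a site by its omega = dN/dS ratio (illustrative threshold rule only)."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"


# Hypothetical per-site omega values (dS fixed at 1.0 so dN equals omega).
site_omegas = [0.1, 0.2, 2.5, 0.05]
labels = [classify_omega(w, 1.0) for w in site_omegas]
mean_omega = sum(site_omegas) / len(site_omegas)
```

With these made-up values the gene-wide average ω is below 1 while one site is labelled diversifying, which is the situation the abstract describes for genes with previously unsuspected selection.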


2020 ◽  
Author(s):  
Asma Awadi ◽  
Hichem Ben Slimen ◽  
Helmut Schaschl ◽  
Felix Knauer ◽  
Franz Suchentrunk

Abstract Background: Animal mitochondria play a central role in energy production in the cells through the oxidative phosphorylation (OXPHOS) pathway. Recent studies of selection on different mitochondrial OXPHOS genes have revealed the adaptive implications of amino acid changes in these subunits. In hares, climatic variation and/or introgression were suggested to be at the origin of such adaptation. Here we looked for evidence of positive selection in three mitochondrial OXPHOS genes, using tests of selection, protein structure modelling and effects of amino acid substitutions on the protein function and stability. We also used statistical models to test for climate and introgression effects on sites under positive selection. Results: Our results revealed seven sites under positive selection in ND4 and three sites in Cytb. However, no sites under positive selection were observed in the COX1 gene. All three subunits presented a high number of codons under negative selection. Sites under positive selection were mapped on the three-dimensional structure of the predicted models for the respective mitochondrial subunit. Of the ten amino acid replacements inferred to have evolved under positive selection for both subunits, six were located in the transmembrane domain. On the other hand, three codons were identified as sites lining proton translocation channels. Furthermore, four codons were identified as destabilizing with a significant variation of Δ vibrational entropy energy between wild-type and mutant. Moreover, the PROVEAN analysis suggested that among all positively selected sites two fixed amino acid replacements altered the protein functioning. The statistical model runs indicated significant effects of climate on the presence of ND4 and Cytb protein variants, but no effect of trans-specific mitochondrial DNA introgression. Conclusions: Positive selection was observed in several codons in two OXPHOS genes. 
We found that substitutions in the positively selected codons have structural and functional impacts on the encoded proteins. Our results concordantly suggest that adaptations have strongly affected the evolution of mtDNA of hare species, with potential effects on protein function. Environmental/climatic changes appear to be a major trigger of this adaptation, whereas trans-specific introgressive hybridization seems to play no major role in the occurrence of protein variants.


BMC Biology ◽  
2019 ◽  
Vol 17 (1) ◽  
Author(s):  
Frida Belinky ◽  
Itamar Sela ◽  
Igor B. Rogozin ◽  
Eugene V. Koonin

Abstract Background Single nucleotide substitutions in protein-coding genes can be divided into synonymous (S), with little fitness effect, and non-synonymous (N) ones that alter amino acids and thus generally have a greater effect. Most of the N substitutions are affected by purifying selection that eliminates them from evolving populations. However, additional mutations of nearby bases potentially could alleviate the deleterious effect of single substitutions, making them subject to positive selection. To elucidate the effects of selection on double substitutions in all codons, it is critical to differentiate selection from mutational biases. Results We addressed the evolutionary regimes of within-codon double substitutions in 37 groups of closely related prokaryotic genomes from diverse phyla by comparing the fractions of double substitutions within codons to those of the equivalent double S substitutions in adjacent codons. Under the assumption that substitutions occur one at a time, all within-codon double substitutions can be represented as “ancestral-intermediate-final” sequences (where “intermediate” refers to the first single substitution and “final” refers to the second substitution) and can be partitioned into four classes: (1) SS, S intermediate–S final; (2) SN, S intermediate–N final; (3) NS, N intermediate–S final; and (4) NN, N intermediate–N final. We found that the selective pressure on the second substitution markedly differs among these classes of double substitutions. Analogous to single S (synonymous) substitutions, SS double substitutions evolve neutrally, whereas analogous to single N (non-synonymous) substitutions, SN double substitutions are subject to purifying selection. In contrast, NS show positive selection on the second step because the original amino acid is recovered. 
The NN double substitutions are heterogeneous and can be subject to either purifying or positive selection, or evolve neutrally, depending on the amino acid similarity between the final or intermediate and the ancestral states. Conclusions The results of the present, comprehensive analysis of the evolutionary landscape of within-codon double substitutions reaffirm the largely conservative regime of protein evolution. However, the second step of a double substitution can be subject to positive selection when the first step is deleterious. Such positive selection can result in frequent crossing of valleys on the fitness landscape.
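The four classes of within-codon double substitutions described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `classify_double_substitution` is hypothetical, the standard genetic code is used, and both the intermediate and the final codon are labelled S or N relative to the ancestral amino acid, which is one reading of the abstract (it fits the statement that in NS "the original amino acid is recovered"). It is not the authors' pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_double_substitution(ancestral: str, intermediate: str, final: str) -> str:
    """Return 'SS', 'SN', 'NS' or 'NN' for an ancestral-intermediate-final codon path.

    Assumption: S/N labels compare each state with the ancestral amino acid.
    """
    one_step_each = (
        hamming(ancestral, intermediate) == 1
        and hamming(intermediate, final) == 1
        and hamming(ancestral, final) == 2
    )
    if not one_step_each:
        raise ValueError("path must consist of two single substitutions at different positions")
    anc_aa = CODON_TABLE[ancestral]
    first = "S" if CODON_TABLE[intermediate] == anc_aa else "N"
    second = "S" if CODON_TABLE[final] == anc_aa else "N"
    return first + second
```

For example, the path TCT (Ser) to ACT (Thr) to AGT (Ser) is classified as NS because the second step restores the ancestral serine, while CTT to CTA to TTA stays leucine throughout and is classified as SS.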


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Allison J Shultz ◽  
Timothy B Sackton

Consistent patterns of positive selection in functionally similar genes can suggest a common selective pressure across a group of species. We use alignments of orthologous protein-coding genes from 39 species of birds to estimate parameters related to positive selection for 11,000 genes conserved across birds. We show that functional pathways related to the immune system, recombination, lipid metabolism, and phototransduction are enriched for positively selected genes. By comparing our results with mammalian data, we find a significant enrichment for positively selected genes shared between taxa, and that these shared selected genes are enriched for viral immune pathways. Using pathogen-challenge transcriptome data, we show that genes up-regulated in response to pathogens are also enriched for positively selected genes. Together, our results suggest that pathogens, particularly viruses, consistently target the same genes across divergent clades, and that these genes are hotspots of host-pathogen conflict over deep evolutionary time.


2012 ◽  
Vol 93 (11) ◽  
pp. 2408-2418 ◽  
Author(s):  
Donald B. Smith ◽  
Jeff Vanek ◽  
Sandeep Ramalingam ◽  
Ingolfur Johannessen ◽  
Kate Templeton ◽  
...  

The presence of a hypervariable region (HVR) within the genome of hepatitis E virus (HEV) remains unexplained. Previous studies have described the HVR as a proline-rich spacer between flanking functional domains of the ORF1 polyprotein. Others have proposed that the region has no function, that it reflects a hypermutable region of the virus genome, that it is derived from the insertion and evolution of host sequences or that it is subject to positive selection. This study attempts to differentiate between these explanations by documenting the evolutionary processes occurring within the HVR. We have measured the diversity of HVR sequences within acutely infected individuals or amongst sequences derived from epidemiologically linked samples and, surprisingly, find relative homogeneity amongst these datasets. We found no evidence of positive selection for amino acid substitution in the HVR. Through an analysis of published sequences, we conclude that the range of HVR diversity observed within virus genotypes can be explained by the accumulation of substitutions and, to a much lesser extent, through deletions or duplications of this region. All published HVR amino acid sequences display a relative overabundance of proline and serine residues that cannot be explained by a local bias towards cytosine in this part of the genome. Although all published HVRs contain one or more SH3-binding PxxP motifs, this motif does not occur more frequently than would be expected from the proportion of proline residues in these sequences. Taken together, these observations are consistent with the hypothesis that the HVR has a structural role that is dependent upon length and amino acid composition, rather than a specific sequence.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Randi Istrup Juul ◽  
Morten Muhlig Nielsen ◽  
Malene Juul ◽  
Lars Feuerbach ◽  
Jakob Skou Pedersen

Abstract Large sets of whole cancer genomes make it possible to study mutation hotspots genome-wide. Here we detect, categorize, and characterize site-specific hotspots using 2279 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes project and provide a resource of annotated hotspots genome-wide. We investigate the excess of hotspots in both protein-coding and gene regulatory regions and develop measures of positive selection and functional impact for individual hotspots. Using cancer allele fractions, expression aberrations, mutational signatures, and a variety of genomic features, such as potential gain or loss of transcription factor binding sites, we annotate and prioritize all highly mutated hotspots. Genome-wide we find more high-frequency SNV and indel hotspots than expected given mutational background models. Protein-coding regions are generally enriched for SNV hotspots compared to other regions. Gene regulatory hotspots show enrichment of potential same-patient second-hit missense mutations, consistent with enrichment of hotspot driver mutations compared to singletons. For protein-coding regions, splice-sites, promoters, and enhancers, we see an excess of hotspots associated with cancer genes. Interestingly, missense hotspot mutations in tumor suppressors are associated with elevated expression, suggesting localized amino-acid changes with functional impact. For individual non-coding hotspots, only a small number show clear signs of positive selection, including known sites in the TERT promoter and the 5’ UTR of TP53. Most of the new candidates have few mutations and limited driver evidence. However, a hotspot in an enhancer of the oncogene POU2AF1, which may create a transcription factor binding site, presents multiple lines of driver-consistent evidence.


2020 ◽  
Author(s):  
Aayatti Mallick Gupta ◽  
Sukhendu Mandal

The non-synonymous mutations of SARS-CoV-2 have been identified in isolates sequenced across several COVID-19-affected countries in Asia, Africa, Europe, North, Central, and South America, and Oceania, covering the period from its emergence in Dec 2019 to April 2020. The spike surface glycoprotein of SARS-CoV-2 is the most important hotspot for amino acid alterations, followed by the ORF1a/ORF1ab polyproteins. The D614G mutation in the spike glycoprotein and P4715L in RdRp co-occur across the samples and are important determinants of SARS-CoV-2 evolution from its emergence in China to the present epicenter. Both mutations have increased in number since March 2020 to become the dominant subtype of SARS-CoV-2. Notably, the mutations P4715L in RdRp, G251V in ORF3a, and S1498F in the PL2 domain of NSP3 are associated with epitope loss that may influence pathogenesis caused by antibody escape variants. Phylogenomics showed two distinct clades: (i) a green clade with ancestral viral samples from China and most of Asia isolated between Dec 2019 and Feb 2020, and (ii) a red clade with the evolved variants isolated from Europe and the Americas from Mar 2020 to April 2020. The evolved variants show loss of epitopes from their different proteins. The Indian isolates of SARS-CoV-2 are distributed across both clades. Positively selected mutations in the red clade are becoming predominant globally. These findings have important implications for SARS-CoV-2 transmission, pathogenesis, and immune interventions.


2016 ◽  
Vol 19 (3) ◽  
pp. 461-469
Author(s):  
B. Ślaska ◽  
L. Grzybowska-Szatkowska ◽  
M. Bugno-Poniewierska ◽  
A. Gurgul ◽  
A. Śmiech ◽  
...  

Abstract The aim of the study was to identify polymorphisms and mutations in the mitochondrial ND4 gene and to analyse the associations between the occurrence of molecular changes in mtDNA and phenotypic traits in tumours in German Shepherd dogs. Fifty samples obtained from blood and tumour tissues of German Shepherd dogs with diagnosed tumours were analysed. DNA extraction, amplification, and sequencing of the mtDNA ND4 gene, and bioinformatics, statistical, and in silico protein coding SNP analyses were performed. ND4 mutations and/or polymorphisms were noted in eleven nucleotide positions in nearly half of the examined dogs. All the changes were substitution mutations. A majority of the changes identified were homoplasmic. In one dog with osteosarcoma, blood heteroplasmy was detected. In two positions of the ND4 gene, non-synonymous mutations leading to amino acid changes in the ND4 protein were found. Analyses carried out to determine the deleterious effect of mutations indicated probabilities of almost 97% and 62% that the single amino acid substitutions p.G239V and p.I401T, respectively, have a negative impact on the function of the protein. The results of statistical analyses indicate a significant association between the occurrence of mutations in three loci of the ND4 gene and the location of tumours. The mutations identified may be a result of cell adaptation to the changes in the environment occurring during carcinogenesis. The high frequency of mutations in the tumours may indicate genetic instability of mtDNA, which may also play a role in carcinogenesis.

