Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3391 ◽  
Author(s):  
Dariya K. Sydykova ◽  
Claus O. Wilke

Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
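
The comparison described above comes down to correlating two per-site rate vectors computed on the same alignment. The following is a minimal sketch of that step only, not the authors' pipeline: the function names `pearson` and `relative_rates` are hypothetical, and the numbers are made-up illustrative values rather than results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_rates(rates):
    """Rescale rates so their mean is 1 (rate relative to the alignment average)."""
    mean = sum(rates) / len(rates)
    return [r / mean for r in rates]


# Hypothetical per-site values for illustration only (not data from the paper).
dnds = [0.05, 0.10, 0.40, 0.80, 1.20]   # site-wise dN/dS
aa_rates = [0.2, 0.3, 1.1, 2.0, 3.4]    # site-wise amino-acid rates

r = pearson(relative_rates(dnds), relative_rates(aa_rates))
print(f"correlation between the two rate measures: {r:.3f}")
```

Rescaling to a mean of 1 does not change the correlation; it is included only because site-wise amino-acid rates are typically reported relative to the alignment average.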

2017 ◽  
Author(s):  
Dariya K. Sydykova ◽  
Claus O Wilke

Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN/dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN/dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN/dS, using either dN/dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN/dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN/dS, and the correlation strengths increase in alignments with higher sequence divergence and a higher number of taxa. Moreover, Rate4Site scores correlate nearly perfectly with inferred dN/dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN/dS in a variety of natural sequence alignments. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield near-identical inferences.


2017 ◽  
Author(s):  
Dariya K Sydykova ◽  
Claus O Wilke

Many applications require the calculation of site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving protein. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not known how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that the measurement process can only recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to one. Rate measurements using other matrices are quantitatively close but not mathematically correct. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.


2018 ◽  
Author(s):  
Dariya K. Sydykova ◽  
Claus O. Wilke

In the field of molecular evolution, we commonly calculate site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving proteins. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not well understood how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that for realistic analysis settings the measurement process will recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to 1/19. We also show that rate measurements using other matrices are quantitatively close but in general not mathematically equivalent. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.

Significance Statement

Maximum likelihood inference is widely used to infer model parameters from sequence data in an evolutionary context. One major challenge in such inference procedures is the problem of having to identify the appropriate model used for inference. Model parameters usually are meaningful only to the extent that the model is appropriately specified and matches the process that generated the data. However, in practice, we don’t know what process generated the data, and most models in actual use are misspecified. To circumvent this problem, we show here that we can employ maximum likelihood inference to make defined and meaningful measurements on arbitrary processes. Our approach uses misspecification as a deliberate strategy, and this strategy results in robust and meaningful parameter inference.
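
One way to read the value 1/19 for the naïve matrix: with 20 amino acids, each residue has 19 possible replacements, so a rate of 1/19 per replacement gives a total substitution rate of exactly 1 per unit time. The sketch below illustrates that reading under stated assumptions. It treats the 1/19 values directly as off-diagonal substitution rates of a Jukes-Cantor-like matrix, which is an assumption, and the function names are hypothetical rather than taken from the authors' code.

```python
N_STATES = 20  # the 20 amino acids


def naive_rate_matrix(n=N_STATES):
    """Jukes-Cantor-like rate matrix: every off-diagonal entry is 1/(n-1).

    Assumption for illustration: the 1/19 values are used directly as
    substitution rates, so each diagonal entry is -1 and rows sum to zero.
    """
    off = 1.0 / (n - 1)
    return [[-1.0 if i == j else off for j in range(n)] for i in range(n)]


def mean_rate(q, freqs):
    """Expected number of substitutions per unit time at stationarity."""
    return -sum(f * q[i][i] for i, f in enumerate(freqs))


Q = naive_rate_matrix()
uniform = [1.0 / N_STATES] * N_STATES  # stationary distribution of this matrix

print(f"off-diagonal rate: {Q[0][1]:.4f}")
print(f"mean substitution rate: {mean_rate(Q, uniform):.4f}")
```

Under this normalization a site-specific rate multiplier can be read directly as the expected number of substitutions per unit time at that site.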


1996 ◽  
Vol 76 (3) ◽  
pp. 887-926 ◽  
Author(s):  
H. A. Fozzard ◽  
D. A. Hanck

Cardiac and nerve Na channels have broadly similar functional properties and amino acid sequences, but they demonstrate specific differences in gating, permeation, ionic block, modulation, and pharmacology. Resolution of three-dimensional structures of Na channels is unlikely in the near future, but a number of amino acid sequences from a variety of species and isoforms are known so that channel differences can be exploited to gain insight into the relationship of structure to function. The combination of molecular biology to create chimeras and channels with point mutations and high-resolution electrophysiological techniques to study function encourages the idea that predictions of structure from function are possible. With the goal of understanding the special properties of the cardiac Na channel, this review examines the structural (sequence) similarities between the cardiac and nerve channels and considers what is known about the relationship of structure to function for voltage-dependent Na channels in general and for the cardiac Na channels in particular.


1987 ◽  
Vol 7 (6) ◽  
pp. 2231-2242 ◽  
Author(s):  
J E Rudolph ◽  
M Kimble ◽  
H D Hoyle ◽  
M A Subler ◽  
E C Raff

The genomic DNA sequence and deduced amino acid sequence are presented for three Drosophila melanogaster beta-tubulins: a developmentally regulated isoform beta 3-tubulin, the wild-type testis-specific isoform beta 2-tubulin, and an ethyl methanesulfonate-induced assembly-defective mutation of the testis isoform, B2t8. The testis-specific beta 2-tubulin is highly homologous to the major vertebrate beta-tubulins, but beta 3-tubulin is considerably diverged. Comparison of the amino acid sequences of the two Drosophila isoforms to those of other beta-tubulins indicates that these two proteins are representative of an ancient sequence divergence event which at least preceded the split between lines leading to vertebrates and invertebrates. The intron/exon structures of the genes for beta 2- and beta 3-tubulin are not the same. The structure of the gene for the variant beta 3-tubulin isoform, but not that of the testis-specific beta 2-tubulin gene, is similar to that of vertebrate beta-tubulins. The mutation B2t8 in the gene for the testis-specific beta 2-tubulin defines a single amino acid residue required for normal assembly function of beta-tubulin. The sequence of the B2t8 gene is identical to that of the wild-type gene except for a single nucleotide change resulting in the substitution of lysine for glutamic acid at residue 288. This position falls at the junction between two major structural domains of the beta-tubulin molecule. Although this hinge region is relatively variable in sequence among different beta-tubulins, the residue corresponding to glu 288 of Drosophila beta 2-tubulin is highly conserved as an acidic amino acid not only in all other beta-tubulins but in alpha-tubulins as well.


1996 ◽  
Vol 315 (3) ◽  
pp. 807-814 ◽  
Author(s):  
Said MODARESSI ◽  
Bruno CHRIST ◽  
Jutta BRATKE ◽  
Stefan ZAHN ◽  
Tilman HEISE ◽  
...  

In human liver, phosphoenolpyruvate carboxykinase (PCK; EC 4.1.1.32) is about equally distributed between cytosol and mitochondria in contrast with rat liver in which it is essentially a cytosolic enzyme. Recently, the isolation of the gene and cDNA of the human cytosolic enzyme has been reported [Ting, Burgess, Chamberlain, Keith, Falls and Meisler (1993) Genomics 16, 698–706; Stoffel, Xiang, Espinosa, Cox, Le Beau and Bell (1993) Hum. Mol. Genet. 2, 1–4]. It was the goal of this investigation to isolate the cDNA of the human mitochondrial form of hepatic PCK. A human liver cDNA library was screened with a rat cytosolic PCK cDNA probe comprising sequences from exons 2 to 9. A cDNA clone was isolated which had overall a 68% DNA sequence and a 70% deduced amino acid sequence identity with the human cytosolic PCK cDNA. Without the flanking 270 bases (=90 amino acids) each at the 5′ and 3′ end, the sequence identity was 73% on the DNA and 78% on the amino acid level. The isolated cDNA had an open reading frame of 1920 bp; it was 54 bp (equivalent to 18 amino acids) longer than that of human or rat cytosolic PCK cDNA. The isolated cDNA was cloned into the eukaryotic expression vector pcDNAI and transfected into human embryonal kidney cells HEK293; PCK activity was increased by 3-fold in the mitochondria, which normally contain 70% of total PCK activity, but not in the cytosol. The isolated cDNA was also transfected into cultured rat hepatocytes; again, PCK activity was enhanced by about 40-fold in the mitochondria, which normally possess only 10% of total PCK activity, but not in the cytosol. In the rat hepatocytes only the endogenous cytosolic PCK and not the transfected mitochondrial PCK was induced 3-fold with glucagon. Comparison of the amino acid sequences deduced from the isolated cDNA with human and rat cytosolic PCK showed that the additional 18 amino acids were located at the N-terminus of the protein and probably constitute a mitochondrial targeting signal. Northern-blot analyses revealed the human mitochondrial PCK mRNA to be 2.25 kb long, about 0.6 kb shorter than the mRNA of the cytosolic PCK. Primer extension experiments showed that the 5′-untranslated region of mitochondrial PCK mRNA was 134 nucleotides in length.

