Site-Specific Structural Constraints on Protein Sequence Evolutionary Divergence: Local Packing Density versus Solvent Exposure

Functional and biophysical constraints result in site-dependent patterns of protein sequence variability. It is commonly assumed that the key structural determinant of site-specific rates of evolution is the Relative Solvent Accessibility (RSA). However, a recent study found that amino acid substitution rates correlate better with two Local Packing Density (LPD) measures, the Weighted Contact Number (WCN) and the Contact Number (CN), than with RSA. This work aims at a more thorough assessment. To this end, in addition to substitution rates, we considered four other sequence variability scores, four measures of solvent accessibility (SA), and other CN measures. We compared all properties for each protein of a structurally and functionally diverse representative dataset of monomeric enzymes. We show that the best sequence variability measures take into account phylogenetic tree topology. More importantly, we show that both LPD measures (WCN and CN) correlate better than all of the SA measures, regardless of the sequence variability score used. Moreover, the independent contribution of the best LPD measure is approximately four times larger than that of the best SA measure. This study strongly supports the conclusion that a site’s packing density rather than its solvent accessibility is the main structural determinant of its rate of evolution.
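The two local packing density measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: CN counts the other residues within a cutoff radius, and WCN sums the inverse squared distances to all other residues. The function names, the 13 Å default cutoff, and the toy coordinates are illustrative assumptions, not values taken from the study.

```python
import math

def contact_number(coords, cutoff=13.0):
    """CN_i: number of other residues within `cutoff` angstroms of residue i.
    The default cutoff is an illustrative assumption."""
    return [
        sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= cutoff)
        for i, p in enumerate(coords)
    ]

def weighted_contact_number(coords):
    """WCN_i: sum over all other residues j of 1 / r_ij**2.
    Assumes no two residues share identical coordinates."""
    return [
        sum(1.0 / math.dist(p, q) ** 2 for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    ]

# Toy input (made up): four points on a line, spaced 5 angstroms apart.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
cn = contact_number(toy, cutoff=6.0)
wcn = weighted_contact_number(toy)
print(cn)
print(wcn)
```

In this toy chain the interior points receive higher CN and WCN than the end points, which mirrors the idea that densely packed sites sit in the protein interior. Real analyses would take one coordinate per residue (for example the C-alpha atom or the side-chain center) from a structure file.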

Dissecting the roles of local packing density and longer-range effects in protein sequence evolution

Proteins Structure Function and Bioinformatics ◽ 10.1002/prot.25034 ◽ 2016 ◽ Vol 84 (6) ◽ pp. 841-854 ◽ Cited By ~ 14

Author(s): Amir Shahmoradi ◽ Claus O. Wilke

Keyword(s): Packing Density ◽ Protein Sequence ◽ Sequence Evolution ◽ Local Packing Density ◽ Protein Sequence Evolution

Dissecting the roles of local packing density and longer-range effects in protein sequence evolution

10.1101/023499 ◽ 2015

Author(s): Amir Shahmoradi ◽ Claus O. Wilke

Keyword(s): Packing Density ◽ Solvent Accessibility ◽ Voronoi Cell ◽ Sequence Evolution ◽ Relative Importance ◽ Site Specific ◽ Contact Number ◽ Evolutionary Variation ◽ Local Packing Density ◽ Protein Sequence Evolution

What are the structural determinants of protein sequence evolution? A number of site-specific structural characteristics have been proposed, most of which are broadly related to either the density of contacts or the solvent accessibility of individual residues. Most importantly, there has been disagreement in the literature over the relative importance of solvent accessibility and local packing density for explaining site-specific sequence variability in proteins. We show here that this discussion has been confounded by the definition of local packing density. The most commonly used measures of local packing, such as the contact number and the weighted contact number, represent by definition the combined effects of local packing density and longer-range effects. As an alternative, we here propose a truly local measure of packing density around a single residue, based on the Voronoi cell volume. We show that the Voronoi cell volume, when calculated relative to the geometric center of amino-acid side chains, behaves nearly identically to the relative solvent accessibility, and both can explain, on average, approximately 34% of the site-specific variation in evolutionary rate in a data set of 209 enzymes. An additional 10% of variation can be explained by non-local effects that are captured in the weighted contact number. Consequently, evolutionary variation at a site is determined by the combined action of the immediate amino-acid neighbors of that site and of effects mediated by more distant amino acids. We conclude that instead of contrasting solvent accessibility and local packing density, future research should emphasize the relative importance of immediate contacts and longer-range effects on evolutionary variation.
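The idea of one predictor explaining "additional" variation on top of another can be sketched as the gain in R² when a second predictor joins a linear model. The sketch below is not the authors' analysis pipeline; it uses the standard closed-form R² for a two-predictor linear regression, and all function names and per-site values are hypothetical, made up for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2_two_predictors(y, x1, x2):
    """R^2 of the least-squares fit y ~ x1 + x2, computed from pairwise
    correlations. Assumes x1 and x2 are not perfectly collinear."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# Hypothetical per-site values for one protein (invented numbers).
rate = [0.2, 0.5, 0.4, 0.9, 1.1, 1.6]       # site-specific evolutionary rate
rsa = [0.05, 0.10, 0.30, 0.35, 0.60, 0.80]  # relative solvent accessibility
wcn = [1.9, 1.4, 1.6, 1.1, 0.9, 0.5]        # weighted contact number

r2_rsa = pearson(rate, rsa) ** 2
r2_joint = r2_two_predictors(rate, rsa, wcn)
additional = r2_joint - r2_rsa  # variance explained by WCN beyond RSA
print(r2_rsa, r2_joint, additional)
```

Because the joint model nests the single-predictor model, the gain is never negative; how large it is depends on how much of the second predictor's signal is independent of the first.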

Nanotribology of Octadecyltrichlorosilane Monolayers and Silicon: Self-Mated versus Unmated Interfaces and Local Packing Density Effects

Langmuir ◽ 10.1021/la063644e ◽ 2007 ◽ Vol 23 (18) ◽ pp. 9242-9252 ◽ Cited By ~ 64

Author(s): Erin E. Flater ◽ W. Robert Ashurst ◽ Robert W. Carpick

Keyword(s): Packing Density ◽ Density Effects ◽ Local Packing Density

Amino-acid site variability among natural and designed proteins

10.7287/peerj.preprints.74 ◽ 2013

Author(s): Eleisha L. Jackson ◽ Noah Ollikainen ◽ Arthur W. Covert III ◽ Tanja Kortemme ◽ Claus O. Wilke

Keyword(s): Amino Acid ◽ Protein Design ◽ Protein Sequences ◽ Structural Constraints ◽ Scoring Functions ◽ Solvent Exposure ◽ Backbone Flexibility ◽ Hydrophobic Residues ◽ Designed Proteins ◽ Site Variability

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.

Cross-species comparison of site-specific evolutionary-rate variation in influenza haemagglutinin

Philosophical Transactions of the Royal Society B Biological Sciences ◽ 10.1098/rstb.2012.0334 ◽ 2013 ◽ Vol 368 (1614) ◽ pp. 20120334 ◽ Cited By ~ 17

Author(s): Austin G. Meyer ◽ Eric T. Dawson ◽ Claus O. Wilke

Keyword(s): Protein Structure ◽ Sialic Acid ◽ Avian Influenza ◽ Evolutionary Rate ◽ Solvent Accessibility ◽ Rate Variation ◽ Structural Constraints ◽ Binding Region ◽ Site Specific ◽ Sialic Acid Binding

We investigate the causes of site-specific evolutionary-rate variation in influenza haemagglutinin (HA) between human and avian influenza, for subtypes H1, H3, and H5. By calculating the evolutionary-rate ratio, ω = dN/dS, as a function of a residue's solvent accessibility in the three-dimensional protein structure, we show that solvent accessibility has a significant but relatively modest effect on site-specific rate variation. By comparing rates within HA subtypes among host species, we derive an upper limit to the amount of variation that can be explained by structural constraints of any kind. Protein structure explains only 20–40% of the variation in ω. Finally, by comparing ω at sites near the sialic-acid-binding region to ω at other sites, we show that ω near the sialic-acid-binding region is significantly elevated in both human and avian influenza, with the exception of avian H5. We conclude that protein structure, HA subtype, and host biology all impose distinct selection pressures on sites in influenza HA.

Measuring site-specific glycosylation similarity between influenza A virus variants with statistical certainty

10.1101/2020.03.13.991380 ◽ 2020

Author(s): Deborah Chang ◽ William E. Hackett ◽ Lei Zhong ◽ Xiu-Feng Wan ◽ Joseph Zaia

Keyword(s): Influenza A Virus ◽ Protein Sequence ◽ Influenza A ◽ Viral Fitness ◽ Antigenic Drift ◽ Viral Escape ◽ Strain Selection ◽ Expression Vectors ◽ Site Specific ◽ Measuring Site

Influenza A virus (IAV) mutates rapidly, resulting in antigenic drift and poor year-to-year vaccine effectiveness. One challenge in designing effective vaccines is that genetic mutations frequently cause amino acid variations in IAV envelope protein hemagglutinin (HA) that create new N-glycosylation sequons; resulting N-glycans cause antigenic shielding, allowing viral escape from adaptive immune responses. Vaccine candidate strain selection currently involves correlating antigenicity with HA protein sequence among circulating strains, but quantitative comparison of site-specific glycosylation information may improve the ability to design vaccines with broader effectiveness against evolving strains. However, there is poor understanding of the influence of glycosylation on immunodominance, antigenicity, and immunogenicity of HA, and there are no well-tested methods for comparing glycosylation similarity among virus samples. Here, we present a method for statistically rigorous quantification of similarity between two related virus strains that considers the presence and abundance of glycopeptide glycoforms. We demonstrate the strength of our approach by determining that there was a quantifiable difference in glycosylation at the protein level between wild-type IAV HA from A/Switzerland/9715293/2013 (SWZ13) and a mutant strain of SWZ13, even though no N-glycosylation sequons were changed. We determined site-specifically that WT and mutant HA have varying similarity at the glycosylation sites of the head domain, reflecting competing pressures to evade host immune response while retaining viral fitness. To our knowledge, our results are the first to quantify changes in glycosylation state that occur in related proteins of considerable glycan heterogeneity. Our results provide a method for understanding how changes in glycosylation state are correlated with variations in protein sequence, which is necessary for improving IAV vaccine strain selection. Understanding glycosylation will be especially important as we find new expression vectors for vaccine production, as glycosylation state depends greatly on the host species.

Measuring Site-specific Glycosylation Similarity between Influenza A Virus Variants with Statistical Certainty

Molecular & Cellular Proteomics ◽ 10.1074/mcp.ra120.002031 ◽ 2020 ◽ Vol 19 (9) ◽ pp. 1533-1545

Author(s): Deborah Chang ◽ William E. Hackett ◽ Lei Zhong ◽ Xiu-Feng Wan ◽ Joseph Zaia

Keyword(s): Influenza A Virus ◽ Protein Sequence ◽ Influenza A ◽ Viral Fitness ◽ Antigenic Drift ◽ Viral Escape ◽ Strain Selection ◽ Expression Vectors ◽ Site Specific ◽ Measuring Site

Influenza A virus (IAV) mutates rapidly, resulting in antigenic drift and poor year-to-year vaccine effectiveness. One challenge in designing effective vaccines is that genetic mutations frequently cause amino acid variations in IAV envelope protein hemagglutinin (HA) that create new N-glycosylation sequons; resulting N-glycans cause antigenic shielding, allowing viral escape from adaptive immune responses. Vaccine candidate strain selection currently involves correlating antigenicity with HA protein sequence among circulating strains, but quantitative comparison of site-specific glycosylation information may improve the ability to design vaccines with broader effectiveness against evolving strains. However, there is poor understanding of the influence of glycosylation on immunodominance, antigenicity, and immunogenicity of HA, and there are no well-tested methods for comparing glycosylation similarity among virus samples. Here, we present a method for statistically rigorous quantification of similarity between two related virus strains that considers the presence and abundance of glycopeptide glycoforms. We demonstrate the strength of our approach by determining that there was a quantifiable difference in glycosylation at the protein level between WT IAV HA from A/Switzerland/9715293/2013 (SWZ13) and a mutant strain of SWZ13, even though no N-glycosylation sequons were changed. We determined site-specifically that WT and mutant HA have varying similarity at the glycosylation sites of the head domain, reflecting competing pressures to evade host immune response while retaining viral fitness. To our knowledge, our results are the first to quantify changes in glycosylation state that occur in related proteins of considerable glycan heterogeneity. Our results provide a method for understanding how changes in glycosylation state are correlated with variations in protein sequence, which is necessary for improving IAV vaccine strain selection. Understanding glycosylation will be especially important as we find new expression vectors for vaccine production, as glycosylation state depends greatly on the host species.