Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences

Solvent accessible surface area (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning-based approaches trained on solved structures. Here we ask to what extent the solvent exposure of a residue can be associated with the pathogenicity of its variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease-associated than others. For all residue types, the proportion of disease-related SRVs increases markedly when the wild-type residue is buried and decreases when it is exposed; the extent of the increase depends on the residue type. With an in-house predictor based on deep learning and performing at the state of the art, we confirm this tendency on a large data set of variant residues occurring in 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent accessible surface area is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with buried than with exposed residues.
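The buried/exposed comparison above rests on converting absolute SASA into a relative accessibility (RSA) and thresholding it. The sketch below illustrates that bookkeeping only; the maximum-ASA table, the 20% burial threshold, and the function names are assumptions chosen for illustration, not the authors' pipeline or values.

```python
# Illustrative reference maximum ASA values in square angstroms (ASSUMPTION:
# substitute the scale used by your own SASA pipeline).
MAX_ASA = {
    "ALA": 129.0, "ARG": 274.0, "ASN": 195.0, "ASP": 193.0, "CYS": 167.0,
    "GLN": 225.0, "GLU": 223.0, "GLY": 104.0, "HIS": 224.0, "ILE": 197.0,
    "LEU": 201.0, "LYS": 236.0, "MET": 224.0, "PHE": 240.0, "PRO": 159.0,
    "SER": 155.0, "THR": 172.0, "TRP": 285.0, "TYR": 263.0, "VAL": 174.0,
}


def relative_accessibility(resname: str, asa: float) -> float:
    """RSA = absolute ASA divided by the residue's reference maximum, capped at 1."""
    return min(asa / MAX_ASA[resname], 1.0)


def disease_fraction_by_class(variants, threshold=0.20):
    """Return (disease fraction among buried sites, disease fraction among exposed sites).

    variants: iterable of (resname, asa, is_disease) tuples.
    A wild-type residue counts as buried when RSA < threshold (hypothetical cutoff).
    """
    counts = {"buried": [0, 0], "exposed": [0, 0]}  # [disease count, total count]
    for resname, asa, is_disease in variants:
        rsa = relative_accessibility(resname, asa)
        cls = "buried" if rsa < threshold else "exposed"
        counts[cls][0] += int(is_disease)
        counts[cls][1] += 1

    def frac(c):
        return c[0] / c[1] if c[1] else float("nan")

    return frac(counts["buried"]), frac(counts["exposed"])


if __name__ == "__main__":
    # Made-up toy variants, not data from the study.
    toy = [("GLY", 5.0, True), ("TRP", 20.0, True), ("LEU", 10.0, False),
           ("LYS", 150.0, False), ("GLU", 120.0, True), ("SER", 100.0, False)]
    print(disease_fraction_by_class(toy))
```

Splitting per residue type, as in the abstract, would simply key the counts on `(resname, class)` instead of class alone.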

Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins

The Scientific World JOURNAL ◽

10.1155/2014/971258 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Shambhu Malleshappa Gowder ◽

Jhinuk Chatterjee ◽

Tanusree Chaudhuri ◽

Kusum Paul

Keyword(s):

Tertiary Structure ◽

Structural Information ◽

Solvent Accessibility ◽

Conservation Score ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Data Set ◽

Hydrophobic Residues ◽

Monomeric Proteins

The analysis of protein structures provides abundant information about the factors governing protein folding and stability, the amino acids preferred in different protein environments, the location of residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, while polar side chains are exposed to solvent. The present work uses both sequence and structural information and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. The solvent accessibility of each protein was determined with the NACCESS software, and homologous sequences were then collected to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for each hydrophobic residue being buried or solvent exposed were assigned on the basis of its conservation score and information about its flanking regions. In the absence of a three-dimensional structure, the ability to predict the surface accessibility of hydrophobic residues directly from sequence is of great help in choosing sites for chemical modification or specific mutations, and in studies of protein stability and molecular interactions.
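The per-residue inputs described above (exposure state, conservation score, flanking context) can be gathered as in the sketch below. It assumes RSA values have already been computed (for example from NACCESS output) and a conservation score per position is available; the 25% exposure cutoff, the ±2 window, and the names are hypothetical, and no attempt is made to reproduce the authors' confidence-score formula.

```python
from dataclasses import dataclass

HYDROPHOBIC = set("VLIFM")  # Val, Leu, Ile, Phe, Met, as named in the abstract


@dataclass
class HydrophobicSite:
    position: int           # 0-based index in the sequence
    residue: str            # one-letter code
    exposed: bool           # True when RSA >= exposure_cutoff
    conservation: float     # per-position conservation score (supplied as input)
    hydrophobic_flank: int  # hydrophobic residues within +/- window positions


def hydrophobic_sites(seq, rsa, conservation, exposure_cutoff=0.25, window=2):
    """Collect exposure, conservation and flanking features for each hydrophobic residue."""
    if not (len(seq) == len(rsa) == len(conservation)):
        raise ValueError("seq, rsa and conservation must have equal length")
    sites = []
    for i, aa in enumerate(seq):
        if aa not in HYDROPHOBIC:
            continue
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        flank = sum(1 for j in range(lo, hi) if j != i and seq[j] in HYDROPHOBIC)
        sites.append(HydrophobicSite(i, aa, rsa[i] >= exposure_cutoff,
                                     conservation[i], flank))
    return sites
```

A sequence-only predictor would replace the `rsa` input with predicted values, leaving the rest of the feature collection unchanged.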

The structural determinants of intra-protein compensatory substitutions

10.1101/2021.11.11.468231 ◽

2021 ◽

Author(s):

Shilpi Chaurasia ◽

Julien Y Dutheil

Keyword(s):

Fitness Landscape ◽

General Scheme ◽

Protein Structures ◽

Three Dimensional ◽

Charge Compensation ◽

Dimensional Structure ◽

Solvent Exposure ◽

Structural Determinants ◽

Sequence Alignments ◽

Compensatory Mutations

Compensatory substitutions occur when a mutation is positively selected because it restores the fitness lost through a previous deleterious mutation. How frequently such mutations occur in evolution, and which structural and functional contexts permit their emergence, remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in a protein's history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues evolve slowly, are located in the protein core outside secondary structure motifs, and are in contact more often than expected by chance, even after accounting for their evolutionary rate and solvent exposure. Residues coevolving for charge compensation are an exception to this general scheme: they evolve faster than non-coevolving sites, contradicting the predictions of simple coevolutionary models but resembling stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.
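The claim that coevolving residues are "in contact more often than expected by chance" can be illustrated with a naive enrichment ratio: the contact frequency among coevolving pairs divided by that among randomly drawn residue pairs. This sketch assumes one representative coordinate per residue (e.g. Cβ) and an 8 Å contact cutoff; unlike the study, it does not control for evolutionary rate or solvent exposure.

```python
import math
import random


def in_contact(coords, i, j, cutoff=8.0):
    """True when residues i and j are within `cutoff` angstroms (assumed contact definition)."""
    return math.dist(coords[i], coords[j]) <= cutoff


def contact_enrichment(coords, coevolving_pairs, n_random=10000, cutoff=8.0, seed=0):
    """Observed contact frequency of coevolving pairs over that of random residue pairs."""
    rng = random.Random(seed)
    n = len(coords)
    observed = sum(in_contact(coords, i, j, cutoff)
                   for i, j in coevolving_pairs) / len(coevolving_pairs)
    hits = 0
    for _ in range(n_random):
        i, j = rng.sample(range(n), 2)  # two distinct residues
        hits += in_contact(coords, i, j, cutoff)
    expected = hits / n_random
    return observed / expected if expected else float("inf")
```

A ratio well above 1 indicates enrichment; a proper null model would sample random pairs matched on rate and exposure, as the abstract describes.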

FRACTAL ASPECTS OF PROTEIN STRUCTURE AND DYNAMICS

Fractals ◽

10.1142/s0218348x93000198 ◽

1993 ◽

Vol 01 (02) ◽

pp. 179-189 ◽

Cited By ~ 6

Author(s):

T. GREGORY DEWEY

Keyword(s):

Dynamic Properties ◽

Linear Chain ◽

Protein Structures ◽

Three Dimensional ◽

Alpha Helix ◽

Large Data ◽

Thermal Fluctuations ◽

Radius Of Gyration ◽

Data Set ◽

Long Range Correlations

Proteins have well-defined three-dimensional structures dictated by their amino acid sequence. Despite this great specificity, general structural and dynamic properties exist. This work demonstrates scaling relationships for the radius of gyration and surface area across a large data set of proteins; the results show that proteins scale as collapsed polymers. Thermal fluctuations are examined for two proteins by analyzing Debye-Waller factors derived from X-ray crystallographic data, revealing long-range correlations between fluctuations along the backbone. A disordered Ising model is presented that gives similar correlations. To further examine the role of multiple connectivity in protein structures, the vibrational spectrum of an alpha helix (a linear chain with H-bonds) is analyzed using recursion relations derived with a decimation technique.
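A scaling relationship such as R_g ∝ N^ν is usually estimated by a straight-line fit in log-log space; for a compact (collapsed) object one expects ν ≈ 1/3 for the radius of gyration. The sketch below shows that fit with ordinary least squares; the synthetic data in the test are generated with ν = 1/3 and are not values from the paper.

```python
import math


def fit_power_law(n_residues, values):
    """Least-squares fit of log(value) = log(a) + nu * log(N); returns (a, nu)."""
    xs = [math.log(n) for n in n_residues]
    ys = [math.log(v) for v in values]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    nu = sxy / sxx                # slope in log-log space = scaling exponent
    a = math.exp(my - nu * mx)    # prefactor recovered from the intercept
    return a, nu
```

The same function applies to surface area versus chain length, where a compact object would give an exponent near 2/3.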

Two Functional States of the CD11b A-Domain: Correlations with Key Features of Two Mn2+-complexed Crystal Structures

The Journal of Cell Biology ◽

10.1083/jcb.143.6.1523 ◽

1998 ◽

Vol 143 (6) ◽

pp. 1523-1534 ◽

Cited By ~ 96

Author(s):

Rui Li ◽

Philippe Rieu ◽

Diana L. Griffith ◽

David Scott ◽

M. Amin Arnaout

Keyword(s):

Active Form ◽

Three Dimensional ◽

Dimensional Structure ◽

Side Chain ◽

Wild Type ◽

Solvent Exposure ◽

Open Conformation ◽

Functional States ◽

Open Versus ◽

Metal Ligand

In the presence of bound Mn2+, the three-dimensional structure of the ligand-binding A-domain from the integrin CR3 (CD11b/CD18) is shown to exist in the “open” conformation previously described only for a crystalline Mg2+ complex. The open conformation is distinguished from the “closed” form by the solvent exposure of F302, a direct T209–Mn2+ bond, and the presence of a glutamate side chain in the MIDAS site. Approximately 10% of wild-type CD11b A-domain is present in an “active” state (binds to activation-dependent ligands, e.g., iC3b and the mAb 7E3). In the isolated domain and in the holoreceptor, the percentage of the active form can be quantitatively increased or abolished in F302W and T209A mutants, respectively. The iC3b-binding site is located on the MIDAS face and includes conformationally sensitive residues that undergo significant shifts in the open versus closed structures. We suggest that stabilization of the open structure is independent of the nature of the metal ligand and that the open conformation may represent the physiologically active form.

PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

Advances in Bioinformatics ◽

10.1155/2010/436036 ◽

2010 ◽

Vol 2010 ◽

pp. 1-9 ◽

Cited By ~ 14

Author(s):

Adeel Malik ◽

Ahmad Firoz ◽

Vivekanand Jha ◽

Shandar Ahmad

Keyword(s):

Binding Sites ◽

De Novo ◽

Solvent Accessibility ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

Carbohydrate Binding ◽

Structural And Functional Properties ◽

Three Dimensional Models

Understanding the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) or noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. A central repository of knowledge about these protein-carbohydrate complexes, together with preprocessed data on predicted structures, is therefore important. Such a resource is significantly enhanced by de novo tools that can predict carbohydrate-binding sites for proteins whose binding-site structure has not been determined experimentally. PROCARB is an open-access database comprising three independently working components: (i) the Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB); (ii) the Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure; and (iii) the CBS-Pred prediction module, consisting of web servers that predict carbohydrate-binding sites from a single sequence or a server-generated position-specific scoring matrix (PSSM). Several precomputed structural and functional properties of the complexes are also included for quick analysis, in particular information about function, secondary structure, solvent accessibility, hydrogen bonds, and literature references. In addition, each protein in the database is mapped to UniProt, Pfam, PDB, and other resources.
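Sequence-based binding-site predictors commonly describe each residue by the residues in a window around it. The sketch below shows a generic sliding-window one-hot encoding for the single-sequence case; it is an assumption for illustration and is not claimed to be the encoding used by CBS-Pred. For the PSSM case, each one-hot row would be replaced by the corresponding 20-value PSSM row.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def window_one_hot(seq, center, half_window=3):
    """One-hot encode residues at positions center-half_window .. center+half_window.

    Each position contributes 21 values: one per standard amino acid plus a final
    flag set for positions outside the sequence (padding) or non-standard residues.
    """
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        vec = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq) and seq[pos] in INDEX:
            vec[INDEX[seq[pos]]] = 1
        else:
            vec[-1] = 1
        features.extend(vec)
    return features
```

The resulting fixed-length vector (21 × (2·half_window + 1) values) can then be passed to any classifier that labels the central residue as binding or non-binding.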

Structural analysis of mutations in the Drosophila beta 2-tubulin isoform reveals regions in the beta-tubulin molecular required for general and for tissue-specific microtubule functions.

Genetics ◽

10.1093/genetics/139.1.267 ◽

1995 ◽

Vol 139 (1) ◽

pp. 267-286 ◽

Cited By ~ 2

Author(s):

J D Fackenthal ◽

J A Hutchens ◽

F R Turner ◽

E C Raff

Keyword(s):

Amino Acid ◽

Three Dimensional ◽

Variable Region ◽

Internal Variable ◽

Microtubule Assembly ◽

Dimensional Structure ◽

Wild Type ◽

Beta Tubulin ◽

Assembly Kinetics ◽

Beta 2

We have determined the lesions in a number of mutant alleles of beta Tub85D, the gene that encodes the testis-specific beta 2-tubulin isoform in Drosophila melanogaster. Mutations responsible for different classes of functional phenotypes are distributed throughout the beta 2-tubulin molecule. There is a telling correlation between the degree of phylogenetic conservation of the altered residues and the number of different microtubule categories disrupted by the lesions. The majority of lesions occur at positions that are evolutionarily highly conserved in all beta-tubulins; these lesions disrupt general functions common to multiple classes of microtubules. However, a single allele, B2t6, contains an amino acid substitution within an internal cluster of variable amino acids that has been identified as an isotype-defining domain in vertebrate beta-tubulins. Correspondingly, B2t6 disrupts only a subset of microtubule functions, resulting in misspecification of the morphology of the doublet microtubules of the sperm tail axoneme. We previously demonstrated that beta 3, a developmentally regulated Drosophila beta-tubulin isoform, confers the same restricted morphological phenotype in a dominant way when it is coexpressed in the testis with wild-type beta 2-tubulin. We show here by complementation analysis that beta 3 and the B2t6 product disrupt a common aspect of microtubule assembly. We therefore conclude that the amino acid sequence of the beta 2-tubulin internal variable region is required for generation of correct axoneme morphology but not for general microtubule functions. As we have previously reported, the beta 2-tubulin carboxy-terminal isotype-defining domain is required for suprastructural organization of the axoneme. We demonstrate here that the beta 2 variant lacking the carboxy terminus and the B2t6 variant complement each other for mild-to-moderate meiotic defects but do not complement for proper axonemal morphology.
Our results are consistent with the hypothesis drawn from comparisons of vertebrate beta-tubulins that the two isotype-defining domains interact in a three-dimensional structure in wild-type beta-tubulins. We propose that the integrity of this structure in the Drosophila testis beta 2-tubulin isoform is required for proper axoneme assembly but not necessarily for general microtubule functions. On the basis of our observations we present a model for regulation of axoneme microtubule morphology as a function of tubulin assembly kinetics.

Substitution of murine ferrochelatase glutamate-287 with glutamine or alanine leads to porphyrin substrate-bound variants

Biochemical Journal ◽

10.1042/bj3560217 ◽

2001 ◽

Vol 356 (1) ◽

pp. 217-222 ◽

Cited By ~ 5

Author(s):

Ricardo FRANCO ◽

Alice S. PEREIRA ◽

Pedro TAVARES ◽

Arianna MANGRAVITA ◽

Michael J. BARBER ◽

...

Keyword(s):

Absorption Spectra ◽

Catalytic Mechanism ◽

Protoporphyrin Ix ◽

Three Dimensional ◽

Iron Chelation ◽

Enzymic Activity ◽

Dimensional Structure ◽

Wild Type ◽

Wild Type Enzyme ◽

Flow Experiments

Ferrochelatase (EC 4.99.1.1) is the terminal enzyme of the haem biosynthetic pathway and catalyses the insertion of iron into the protoporphyrin IX ring. Glutamate-287 (E287) of murine mature ferrochelatase is conserved in all known ferrochelatase sequences, is located at the active site of the enzyme (as inferred from the three-dimensional structure of Bacillus subtilis ferrochelatase), and is critical for enzyme activity. Substitution of E287 with either glutamine (Q) or alanine (A) yielded variants with lower enzymic activity than wild-type ferrochelatase and with absorption spectra different from those of the wild-type enzyme. In contrast to the wild-type enzyme, the absorption spectra of the variants indicate that, as purified, they contain protoporphyrin IX. Identification and quantification of the porphyrin bound to the E287-directed variants indicate that approx. 80% of the total porphyrin is protoporphyrin IX. Significantly, rapid stopped-flow experiments show that reaction of the E287A and E287Q variants with Zn2+ results in the formation of bound Zn-protoporphyrin IX, indicating that the endogenously bound protoporphyrin IX can be used as a substrate. Taken together, these findings suggest that the E287A and E287Q variants still impose on the porphyrin substrate the structural strain that is a critical step in the ferrochelatase catalytic mechanism, but do not release the product. Thus E287 in murine ferrochelatase appears to be critical to the catalytic process by controlling product release.

Beyond History: The List of The Most Well Studied Human Protein Structures

10.20944/preprints202008.0655.v1 ◽

2020 ◽

Author(s):

Zhenlu Li ◽

Matthias Buck

Keyword(s):

Protein Structures ◽

Protein Sequences ◽

Human Protein ◽

Current Status ◽

Protein Database ◽

X Ray ◽

X Ray Crystallography ◽

Protein Biophysics ◽

The Relationship ◽

Past Trend

Of the roughly 20,000 canonical human protein sequences, 6,747 had full or partial medium- to high-resolution structures determined by X-ray crystallography or other methods as of July 2020. Which of these proteins dominate the Protein Data Bank (PDB), and why? In this paper, we list the top 272 protein structures ranked by their number of PDB depositions. This set of proteins accounts for more than 40% of all available human PDB entries and reflects both past trends and the current status of protein science. We briefly discuss the relationship of some of the prominent protein structures to protein biophysics research and mention their relevance to human diseases. The information may inspire researchers who are new to protein science, and it also provides a 2020 snapshot of the state of protein science.

Prediction of Structural and Functional Aspects of Protein

Advances in Secure Computing, Internet Services, and Applications - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4940-8.ch016 ◽

2014 ◽

pp. 317-333

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. Knowledge of the methods and algorithms used to predict protein structure, together with a thorough understanding of protein structure and function, is critical for the advancement of biology and the life sciences, as well as for the development of better drugs, higher-yield crops, and even synthetic biofuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between sequence information alone and three-dimensional structure, which remains one of the greatest challenges in molecular biology. It presents a comprehensive examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving insight into future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering the general context of structure prediction, the prediction of structural features, tertiary structure prediction, and function prediction. The basic ideas and advances in each of these directions are discussed in detail.

MRPC (Missing Regions in Polypeptide Chains): a knowledgebase

Journal of Applied Crystallography ◽

10.1107/s1600576719012330 ◽

2019 ◽

Vol 52 (6) ◽

pp. 1422-1426

Author(s):

Rajendran Santhosh ◽

Namrata Bankoti ◽

Adgonda Malgonnavar Padmashri ◽

Daliah Michael ◽

Jeyaraman Jeyakanthan ◽

...

Keyword(s):

Protein Structures ◽

Three Dimensional ◽

Protein Molecule ◽

Data Bank ◽

Protein Crystal ◽

Dimensional Structure ◽

Protein Structure Analysis ◽

Three Dimensional Structure ◽

X Ray Crystallography ◽

Polypeptide Chains

Missing regions in protein crystal structures are regions that cannot be resolved, mainly owing to poor electron density (when the three-dimensional structure was solved by X-ray crystallography). These missing regions are known to have high B factors and may represent loops that could be part of the active site of the protein. They are therefore likely to provide valuable information and to play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed that provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the database lets users obtain these data for non-homologous protein structures (at 25% and 90% cutoffs). A user-friendly graphical interface with various options has been incorporated, including the option to view the three-dimensional structure of the protein together with its missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at http://cluster.physics.iisc.ac.in/mrpc.
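One simple way to locate missing regions in a deposited chain is to look for gaps in the residue numbering of the modelled residues. The sketch below assumes sequential integer numbering without insertion codes; terminal gaps are only found when the expected first and last residue numbers (e.g. from the SEQRES sequence) are supplied. PDB-format files also list unobserved residues explicitly in REMARK 465, which is more reliable when present. This is an illustrative approach, not the method used to build MRPC.

```python
def missing_segments(observed, first=None, last=None):
    """Return (start, end) runs of residue numbers absent from `observed`.

    observed: residue numbers present in the model (assumed sequential integers).
    first/last: expected chain bounds; default to the observed minimum/maximum.
    """
    present = set(observed)
    if not present:
        return []
    first = min(present) if first is None else first
    last = max(present) if last is None else last
    gaps, start = [], None
    for n in range(first, last + 1):
        if n not in present:
            if start is None:
                start = n            # a new missing run begins
        elif start is not None:
            gaps.append((start, n - 1))
            start = None             # the run ended at the previous residue
    if start is not None:
        gaps.append((start, last))   # run extends to the C-terminal bound
    return gaps
```

For example, a chain modelled at residues 1-3, 7-8 and 12 of a 14-residue construct has missing segments 4-6, 9-11 and 13-14.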