The structural determinants of intra-protein compensatory substitutions

2021 ◽  
Author(s):  
Shilpi Chaurasia ◽  
Julien Y Dutheil

Compensatory substitutions occur when a mutation is positively selected because it restores the fitness lost through a previous deleterious mutation. How frequently such mutations occur in evolution, and which structural and functional contexts permit their emergence, remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in the history of a protein) but widespread (the majority of proteins experienced at least one). Typical coevolving residues evolve slowly, are located in the protein core outside secondary structure motifs, and are in contact more often than expected by chance, even after accounting for their evolutionary rate and solvent exposure. Residues coevolving for charge compensation are an exception to this general scheme: they evolve faster than non-coevolving sites, contradicting the predictions of simple coevolutionary models but resembling stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.
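The abstract does not spell out how charge compensation is scored. As a simplified illustration only (not the authors' phylogenetic pipeline), one can ask whether two substitutions mapped to the same branch each change the charge at their site while leaving the summed charge of the pair constant. The charge table, the treatment of His as neutral, and the function names are assumptions made for this sketch.

```python
# Formal side-chain charges at neutral pH; treating His as neutral is an
# assumption made for this sketch.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge(aa: str) -> int:
    """Return the formal side-chain charge of a one-letter residue code."""
    return CHARGE.get(aa.upper(), 0)


def is_charge_compensating(site_a: tuple[str, str], site_b: tuple[str, str]) -> bool:
    """Hypothetical test for one pair of substitutions on the same branch.

    Each argument is an (ancestral, derived) residue pair. The pair counts as
    compensating when the first site changes charge and the second site
    changes it by the opposite amount, so the summed charge is unchanged.
    """
    (a_old, a_new), (b_old, b_new) = site_a, site_b
    delta_a = charge(a_new) - charge(a_old)
    delta_b = charge(b_new) - charge(b_old)
    return delta_a != 0 and delta_a + delta_b == 0


# Usage: a salt-bridge swap K/E -> E/K preserves the pair's net charge.
print(is_charge_compensating(("K", "E"), ("E", "K")))  # True
```

A real analysis would first infer ancestral states and map substitutions onto the tree, then test whether such compensating pairs co-occur on branches more often than expected by chance.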

2021 ◽  
Vol 7 ◽  
Author(s):  
Castrense Savojardo ◽  
Matteo Manfredi ◽  
Pier Luigi Martelli ◽  
Rita Casadio

Solvent accessibility (solvent-accessible surface area, SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and predicted from protein sequences with machine-learning approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of a variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease-associated than others. For all residues, the proportion of disease-related SRVs increases markedly when the wild-type residue is buried and decreases when it is exposed; the extent of the increase depends on the residue type. With an in-house predictor based on deep learning and performing at the state of the art, we confirm this tendency on a large dataset of variant residues occurring in some 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent-accessible surface area is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with buried residues than with exposed ones.
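The buried/exposed comparison above rests on relative solvent accessibility (RSA), the absolute SASA of a residue divided by a maximum value for that residue type. The sketch below shows that normalisation and the per-class fraction of disease-related variants. The maximum-SASA values (only a few residue types are listed), the 20% burial cutoff, and the tuple format for variants are all assumptions for illustration; the abstract does not state which scale or threshold the authors used.

```python
# Approximate maximum SASA values in Å^2 (assumed here; several published
# scales exist, so substitute the one your own pipeline uses).
MAX_ASA = {"A": 129.0, "C": 167.0, "G": 104.0, "L": 201.0, "W": 285.0, "Y": 263.0}


def relative_accessibility(residue: str, sasa: float) -> float:
    """Relative solvent accessibility: absolute SASA over the residue's maximum."""
    return sasa / MAX_ASA[residue]


def is_buried(residue: str, sasa: float, threshold: float = 0.2) -> bool:
    """Classify a residue as buried when its RSA is below an assumed 20% cutoff."""
    return relative_accessibility(residue, sasa) < threshold


def disease_fraction(variants, buried: bool) -> float:
    """Fraction of disease-related variants within one burial class.

    variants: iterable of (residue, sasa, is_disease) tuples (hypothetical format).
    """
    in_class = [v for v in variants if is_buried(v[0], v[1]) == buried]
    if not in_class:
        return 0.0
    return sum(1 for v in in_class if v[2]) / len(in_class)


# Usage: half of glycine's assumed maximum SASA gives an RSA of 0.5.
print(relative_accessibility("G", 52.0))  # 0.5
```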


Author(s):  
Arun G. Ingale

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure, together with a thorough knowledge of protein structure and function, is critical for the advancement of biology and the life sciences, as well as for the development of better drugs, higher-yield crops, and even synthetic biofuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between sequence information and three-dimensional structure, which remains one of the greatest challenges in molecular biology. The chapter presents a comprehensive examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, and offers insight into future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering the background of structure prediction, the prediction of structural features, tertiary structure prediction, and functional insights. The basic ideas and advances in each of these directions are discussed in detail.


2019 ◽  
Vol 52 (6) ◽  
pp. 1422-1426
Author(s):  
Rajendran Santhosh ◽  
Namrata Bankoti ◽  
Adgonda Malgonnavar Padmashri ◽  
Daliah Michael ◽  
Jeyaraman Jeyakanthan ◽  
...  

Missing regions in protein crystal structures are regions that cannot be resolved, mainly owing to poor electron density (when the three-dimensional structure is solved by X-ray crystallography). These missing regions are known to have high B factors and may represent loops that could form part of the active site of the protein. They are therefore likely to provide valuable information and to play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed that provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the database offers an option to obtain these data for non-homologous protein structures (at 25 and 90% sequence-identity cutoffs). A user-friendly graphical interface with various options has been incorporated, including a provision to view the three-dimensional structure of the protein along with its missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at http://cluster.physics.iisc.ac.in/mrpc.
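The abstract does not describe how MRPC extracts missing regions. A minimal sketch of the underlying idea, assuming the deposited sequence is numbered consecutively from 1 and that we already have the residue numbers that carry coordinates, is to scan for runs of absent numbers. Real PDB entries complicate this with numbering offsets and insertion codes, which the hypothetical function below ignores.

```python
def missing_regions(seqres_length: int, observed: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (start, end) ranges of residues absent from the model.

    seqres_length: number of residues in the deposited sequence, numbered 1..N
    (a simplifying assumption for this sketch).
    observed: residue numbers that have coordinates.
    """
    present = set(observed)
    regions = []
    start = None
    for i in range(1, seqres_length + 1):
        if i not in present and start is None:
            start = i                        # a gap opens at residue i
        elif i in present and start is not None:
            regions.append((start, i - 1))   # the gap closed just before residue i
            start = None
    if start is not None:                    # gap runs to the C-terminus
        regions.append((start, seqres_length))
    return regions


# Usage: residues 3-5 and 8 are modelled in a 10-residue chain.
print(missing_regions(10, [3, 4, 5, 8]))  # [(1, 2), (6, 7), (9, 10)]
```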


2018 ◽  
Vol 19 (11) ◽  
pp. 3401 ◽  
Author(s):  
Ashutosh Srivastava ◽  
Tetsuro Nagai ◽  
Arpita Srivastava ◽  
Osamu Miyashita ◽  
Florence Tama

Protein structural biology has come a long way since the determination of the first three-dimensional protein structure, that of myoglobin, about six decades ago. Over this period, X-ray crystallography has been the most important experimental method for gaining atomic-resolution insight into protein structures. However, as the role of dynamics in protein function gained recognition, the inability of X-ray crystallography to capture dynamics came to the forefront. Computational methods have proved immensely successful in understanding protein dynamics in solution, and they continue to improve in terms of both the scale and the types of systems that can be studied. In this review, we briefly discuss the limitations of X-ray crystallography in studying protein dynamics, and then provide an overview of different computational methods that are instrumental in understanding the dynamics of proteins and biomacromolecular complexes.
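Molecular dynamics is one widely used family of the computational methods surveyed here. As a toy sketch (not taken from the review), the velocity Verlet integrator at the core of most MD engines can be shown on a single particle attached to a harmonic spring; the spring constant, mass, and time step are arbitrary assumptions. Near-constant total energy over the trajectory is the usual sanity check for such an integrator.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    Returns the trajectory as a list of (position, velocity) pairs.
    """
    traj = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        traj.append((x, v))
    return traj


# Toy system (assumed for illustration): spring constant k = 1, mass m = 1.
k, m = 1.0, 1.0
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
print(max(energies) - min(energies) < 1e-3)  # True: energy is well conserved
```

Production MD codes apply the same update to thousands of atoms with force fields, thermostats, and constraints; the scheme is favoured because it is time-reversible and keeps the energy drift small.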


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Shambhu Malleshappa Gowder ◽  
Jhinuk Chatterjee ◽  
Tanusree Chaudhuri ◽  
Kusum Paul

The analysis of protein structures provides extensive information about the factors governing protein folding and stability, the amino acids preferred in particular protein environments, the location of residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, whereas polar side chains are exposed to solvent. The present work uses both sequence and structural information and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant dataset of 218 monomeric proteins. The solvent accessibility of each protein was determined using the NACCESS software, and homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for being buried or solvent exposed were assigned to hydrophobic residues on the basis of conservation scores and knowledge of the regions flanking each hydrophobic residue. In the absence of a three-dimensional structure, the ability to predict the surface accessibility of hydrophobic residues directly from sequence is of great help in choosing sites for chemical modification or specific mutations, and in studies of protein stability and molecular interactions.
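The abstract does not name the conservation score used. Shannon entropy of an alignment column is one common choice, so the sketch below assumes it: lower entropy means a more conserved position, and normalising by log2(20) maps the score onto [0, 1]. The gap handling (gaps simply ignored) is also an assumption.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters.

    Lower entropy means the position is more conserved.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation(column: str, gap: str = "-") -> float:
    """Normalised conservation in [0, 1]: 1 - H / log2(20)."""
    return 1.0 - column_entropy(column, gap) / math.log2(20)


# Usage: a column split evenly between Leu and Val carries 1 bit of entropy.
print(column_entropy("LLVV"))  # 1.0
```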


2005 ◽  
Vol 391 (1) ◽  
pp. 1-15 ◽  
Author(s):  
K. V. Brinda ◽  
Avadhesha Surolia ◽  
Sarawathi Vishveshwara

The unique three-dimensional structure of both monomeric and oligomeric proteins is encoded in their sequence. The biological functions of proteins depend on their tertiary and quaternary structures, and hence it is important to understand the determinants of quaternary association in proteins. Although a large number of investigations have been carried out in this direction, the underlying principles of protein oligomerization are yet to be completely understood. Recently, new insights into this problem have been gained from the analysis of structure graphs of proteins belonging to the legume lectin family. The legume lectins are an interesting family of proteins with very similar tertiary structures but varied quaternary structures. Hence they have become a very good model with which to analyse the role of primary structure in determining the modes of quaternary association. The present review summarizes the results of a legume lectin study as well as those of a similar analysis carried out here on the animal lectins, namely galectins, pentraxins, calnexin, calreticulin and the rhesus rotavirus VP4 sialic-acid-binding domain. The lectin structure graphs have been used to obtain clusters of non-covalently interacting amino acid residues at the intersubunit interfaces. This analysis, combined with traditional sequence alignment methods, has provided the signature sequence motifs for the different kinds of quaternary association seen in lectins. Furthermore, the network representation of the lectin oligomers has enabled us to detect the residues that make extensive interactions (‘hubs’) across the oligomeric interfaces and that can be targeted for interface-destabilizing mutations. The present review also provides an overview of the methodology involved in representing oligomeric protein structures as connected networks of amino acid residues. Further, it illustrates the potential of such a representation in elucidating the structural determinants of protein–protein association in general, and will be of significance to protein chemists and structural biologists.
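The structure-graph idea can be sketched with plain adjacency sets. The review derives edges from the strength of non-covalent atomic interactions; this simplified sketch assumes the interface contacts have already been computed as residue pairs, then finds hubs by a degree threshold and clusters as connected components. The residue labels and the `min_degree` cutoff are hypothetical.

```python
from collections import defaultdict


def build_interface_graph(contacts):
    """Build an undirected residue graph from (residue_a, residue_b) contact pairs.

    Residues are labels such as "A:Y45" (a hypothetical chain:residue format).
    """
    graph = defaultdict(set)
    for a, b in contacts:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hubs(graph, min_degree: int = 4):
    """Residues with at least min_degree partners (cutoff assumed for illustration)."""
    return sorted(r for r, nbrs in graph.items() if len(nbrs) >= min_degree)


def clusters(graph):
    """Connected components of the graph, each returned as a sorted list."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:                      # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(sorted(component))
    return result


contacts = [("A:Y45", "B:D12"), ("A:Y45", "B:E15"), ("A:Y45", "B:K20"),
            ("A:Y45", "B:R22"), ("A:L60", "B:V30")]
g = build_interface_graph(contacts)
print(hubs(g))  # ['A:Y45']
```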


1980 ◽  
Vol 13 (3) ◽  
pp. 339-386 ◽  
Author(s):  
O. B. Ptitsyn ◽  
A. V. Finkelstein

(A) Evolutionary similarities of protein structures

Two decades have passed since the three-dimensional structure of the first globular protein, sperm whale myoglobin, was solved (Kendrew et al. 1960). Its structure, which now looks so simple and familiar, then seemed unusually complicated. The structures solved next, lysozyme (Blake et al. 1965), ribonuclease (Kartha, Bello & Harker, 1967), chymotrypsin (Matthews et al. 1967) and carboxypeptidase (Lipscomb et al. 1969), redoubled the feeling of amazement, and even of some confusion, in the face of these extremely complicated, intricate and, above all, completely dissimilar protein structures. Some consolation against this background was the evident and far-reaching similarity between the three-dimensional structures of myoglobin and the hemoglobin subunits (Perutz, Kendrew & Watson, 1965), and an analogous similarity between the structures of chymotrypsin and the other serine proteases, elastase (Shotton & Watson, 1970) and trypsin (Stroud, Kay & Dickerson, 1972). However, this similarity was easily explained by the far-reaching homology between the primary structures of myoglobin and hemoglobin, and between those of the serine proteases.


2000 ◽  
Vol 33 (1) ◽  
pp. 176-183 ◽  
Author(s):  
Guoguang Lu

In order to facilitate the comparison of protein three-dimensional structures, software has been developed for comparing structures and for searching databases for structures similar to a given protein. The program identifies the residues that occupy similar positions, in both main-chain and side-chain atoms, in two proteins. Its unique functions also include database processing via Internet- and Web-based servers for different types of users. The method and its friendly user interface cope with many of the problems that frequently occur in protein structure comparison, such as detecting structurally equivalent residues, misalignment caused by coincident matches of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenient user interfaces. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score, ‘structure diversity’, is proposed in order to estimate the evolutionary distance between proteins from comparisons of their three-dimensional structures. The program has also been used as part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and the results of systematic tests are presented and discussed.
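As a much simplified sketch of detecting structurally equivalent residues (not the published algorithm, which also uses side-chain atoms), one can assume the two structures are already superposed, pair each Cα of the first protein with the nearest Cα of the second within a distance cutoff, and report the RMSD over the matched pairs. The 3.0 Å cutoff and function names are assumptions.

```python
import math


def equivalent_residues(ca_a, ca_b, cutoff: float = 3.0):
    """Pair each Cα of A with the nearest Cα of B lying within cutoff (Å).

    Assumes both coordinate lists are already superposed in a common frame;
    the 3.0 Å cutoff is an illustrative assumption.
    Returns (index_in_a, index_in_b, distance) triples.
    """
    pairs = []
    for i, p in enumerate(ca_a):
        best_j, best_d = None, cutoff
        for j, q in enumerate(ca_b):
            d = math.dist(p, q)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j, best_d))
    return pairs


def rmsd(pairs) -> float:
    """Root-mean-square deviation over the matched pairs."""
    if not pairs:
        return 0.0
    return math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))


a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (20.0, 0.0, 0.0)]
print(rmsd(equivalent_residues(a, b)))  # 1.0
```

This naive nearest-neighbour matching does not enforce a one-to-one or sequential correspondence, so several residues of A can land on the same residue of B; that is exactly the kind of coincident Cα match the abstract says a dedicated comparison program must handle.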


2004 ◽  
Vol 02 (03) ◽  
pp. 471-495 ◽  
Author(s):  
LUIGI PALOPOLI ◽  
GIORGIO TERRACINA

Predicting the three-dimensional structure of proteins is a difficult task. In the last few years several approaches have been proposed for this task, taking into account different chemical and physical properties of proteins. As a result, a growing number of protein structure prediction tools is becoming available, some of them specialized either for particular aspects of the prediction or for particular categories of proteins; however, they are still not sufficiently accurate and reliable for predicting all kinds of proteins. In this context, it is useful to apply different prediction tools jointly and combine their results in order to improve the quality of the predictions. However, several problems have to be solved to make this a viable possibility. In this paper a framework and a tool are proposed that allow: (i) the definition of a common reference application domain for different prediction tools; (ii) the characterization of prediction tools through the evaluation of quality parameters; (iii) the characterization of the performance of a team of predictors jointly applied to a prediction problem; (iv) the selection of the best team for a prediction problem; and (v) the integration of the results of the predictors in a team to obtain a single prediction. A system implementing the various steps of the proposed framework (CooPPS) has been developed, and several experiments testing the effectiveness of the proposed approach have been carried out.
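The abstract does not say how CooPPS integrates the outputs of a team. Weighted voting is one plausible scheme, sketched here for per-residue secondary-structure strings (H, E, C) under the assumption that each predictor has a reliability weight, for example its measured accuracy. Predictor names and weights are hypothetical; ties are broken alphabetically so the result is deterministic.

```python
from collections import defaultdict


def combine_predictions(predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Combine per-residue secondary-structure strings by weighted voting.

    predictions: predictor name -> predicted string, all of equal length.
    weights: predictor name -> reliability weight (missing names default to 1.0).
    """
    lengths = {len(s) for s in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    (n,) = lengths
    consensus = []
    for i in range(n):
        score = defaultdict(float)
        for name, seq in predictions.items():
            score[seq[i]] += weights.get(name, 1.0)
        # max() returns the first maximal key, so sorting makes ties alphabetical
        consensus.append(max(sorted(score), key=score.__getitem__))
    return "".join(consensus)


preds = {"p1": "HHHC", "p2": "HHEC", "p3": "CEEC"}
print(combine_predictions(preds, {"p1": 0.8, "p2": 0.7, "p3": 0.6}))  # HHEC
```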


2021 ◽  
Author(s):  
Vineeth Chelur ◽  
U. Deva Priyakumar

Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the active binding site of a protein helps to discover and optimise these interactions, leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to a drug molecule, so predicting the binding site quickly and accurately from sequence alone, without the three-dimensional structure, is challenging. Deep learning has been applied to a variety of biochemical tasks with considerable success. In this paper, a residual neural network (leveraging skip connections) is implemented to predict a protein's most active binding site. The network is trained on sc-PDB, an annotated database of druggable binding sites from the Protein Data Bank. Features extracted from multiple sequence alignments (MSAs) of the protein generated with DeepMSA, such as the position-specific scoring matrix (PSSM), secondary structure (SS3), and relative solvent accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance between the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that has good interactions with a ligand.
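To make the class-imbalance point concrete, here is a plain-Python sketch of a weighted binary cross-entropy in which the positive (binding) class is up-weighted. The abstract does not state the exact weighting used; setting `pos_weight` to the ratio of non-binding to binding residues is a common heuristic and is assumed here, as are the function names.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with the positive (binding) class up-weighted.

    pos_weight > 1 increases the penalty for missing binding residues,
    countering their scarcity relative to non-binding residues.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def balanced_pos_weight(y_true) -> float:
    """Assumed heuristic: ratio of negative (non-binding) to positive labels."""
    pos = sum(y_true)
    return (len(y_true) - pos) / pos


# Usage: one binding residue among four gives a positive-class weight of 3.
print(balanced_pos_weight([1, 0, 0, 0]))  # 3.0
```

Deep-learning frameworks provide the same idea natively, for example as a positive-class weight argument on their binary cross-entropy losses.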

