Deep geometric representations for modeling effects of mutations on protein-protein binding affinity

2021 ◽  
Vol 17 (8) ◽  
pp. e1009284
Author(s):  
Xianggen Liu ◽  
Yunan Luo ◽  
Pengyong Li ◽  
Sen Song ◽  
Jian Peng

Modeling the impact of amino acid mutations on protein-protein interaction plays a crucial role in protein engineering and drug design. In this study, we develop GeoPPI, a novel structure-based deep-learning framework to predict the change of binding affinity upon mutations. Based on the three-dimensional structure of a protein, GeoPPI first learns a geometric representation that encodes topology features of the protein structure via a self-supervised learning scheme. These representations are then used as features for training gradient-boosting trees to predict the changes of protein-protein binding affinity upon mutations. We find that GeoPPI is able to learn meaningful features that characterize interactions between atoms in protein structures. In addition, through extensive experiments, we show that GeoPPI achieves new state-of-the-art performance in predicting the binding affinity changes upon both single- and multi-point mutations on six benchmark datasets. Moreover, we show that GeoPPI can accurately estimate the difference of binding affinities between a few recently identified SARS-CoV-2 antibodies and the receptor-binding domain (RBD) of the S protein. These results demonstrate the potential of GeoPPI as a powerful and useful computational tool in protein design and engineering. Our code and datasets are available at: https://github.com/Liuxg16/GeoPPI.
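For reference, the quantity being predicted, the change of binding affinity upon mutation, is commonly written as a difference of binding free energies. The notation below is a widespread convention and an assumption here; the abstract does not state which sign convention GeoPPI uses.

```latex
\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}}
```

Under this assumed convention, a positive value means the mutation weakens binding, because a more negative binding free energy corresponds to a tighter complex.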

2019 ◽  
Author(s):  
Martin Simonovsky ◽  
Joshua Meyers

Motivation: Protein binding site comparison (pocket matching) is of importance in drug discovery. Identification of similar binding sites can help guide efforts for hit finding, understanding polypharmacology and characterization of protein function. The design of pocket matching methods has traditionally involved much intuition, and has employed a broad variety of algorithms and representations of the input protein structures. We regard the high heterogeneity of past work and the recent availability of large-scale benchmarks as an indicator that a data-driven approach may provide a new perspective. Results: We propose DeeplyTough, a convolutional neural network that encodes a three-dimensional representation of protein binding sites into descriptor vectors that may be compared efficiently in an alignment-free manner by computing pairwise Euclidean distances. The network is trained with supervision: (i) to provide similar pockets with similar descriptors, (ii) to separate the descriptors of dissimilar pockets by a minimum margin, and (iii) to achieve robustness to nuisance variations. We evaluate our method using three large-scale benchmark datasets, on which it demonstrates excellent performance for held-out data coming from the training distribution and competitive performance when the trained network is required to generalize to datasets constructed independently. Availability: https://github.com/BenevolentAI/[email protected],[email protected]
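The alignment-free comparison described above can be illustrated with a minimal sketch. It assumes each pocket has already been encoded as a fixed-length descriptor vector; the function names, the toy descriptors, and the particular contrastive form of the margin loss are hypothetical choices for illustration, not taken from the DeeplyTough code.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(descriptors):
    """Map each unordered pair of pocket names to its descriptor distance.

    `descriptors` is a dict {pocket_name: vector}; names are sorted so that
    every key is a tuple (p, q) with p < q.
    """
    names = sorted(descriptors)
    return {
        (p, q): euclidean(descriptors[p], descriptors[q])
        for p, q in combinations(names, 2)
    }


def contrastive_loss(distance, similar, margin=1.0):
    """Generic contrastive loss (an assumed form, not the paper's exact loss).

    Similar pairs are pulled together (penalty grows with distance);
    dissimilar pairs are penalized only while closer than `margin`.
    """
    if similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    # Toy two-dimensional descriptors; real descriptors would be longer.
    toy = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 1.0]}
    for pair, dist in pairwise_distances(toy).items():
        print(pair, round(dist, 3))
```

Because no structural alignment is needed, comparing a query pocket against a library reduces to distance computations over precomputed vectors, which is what makes the comparison efficient.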


2020 ◽  
Vol 18 (04) ◽  
pp. 2050019
Author(s):  
K. G. Kulikov ◽  
T. V. Koshlan

We introduce a new method for determining the stability of protein complexes under point substitutions of amino acid residues, taking into account the three-dimensional structure of the complex. A theorem is formulated and proven that provides a criterion for the stability of protein molecules. An algorithm and software package were developed for analyzing protein interactions using their three-dimensional structures from the PDB.


2021 ◽  
Author(s):  
Yuan Zhang ◽  
Arunima Mandal ◽  
Kevin Cui ◽  
Xiuwen Liu ◽  
Jinfeng Zhang

We present ProDCoNN-server, a web server for protein sequence design and prediction from a given protein structure. The server is based on a previously developed deep learning model for protein design, ProDCoNN, which achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets. The prediction is very fast compared with other protein sequence prediction servers: it takes only a few minutes on average for a query protein. Two models can be selected for different purposes: BBO for full sequence prediction, extendable to multiple sequence generation, and BBS for single position prediction when the types of the other residues are known. ProDCoNN-server outputs the predicted sequence and the probability matrix for each amino acid at each predicted residue. The probability matrix can also be visualized as a sequence logo figure (BBO) or a probability distribution plot (BBS). The server is available at: https://prodconn.stat.fsu.edu/.


2017 ◽  
Vol 83 (20) ◽  
Author(s):  
Sabino Pacheco ◽  
Isabel Gómez ◽  
Jorge Sánchez ◽  
Blanca-Ines García-Gómez ◽  
Mario Soberón ◽  
...  

ABSTRACT Bacillus thuringiensis three-domain Cry toxins kill insects by forming pores in the apical membrane of larval midgut cells. Oligomerization of the toxin is an important step for pore formation. Domain I helix α-3 participates in toxin oligomerization. Here we identify an intramolecular salt bridge within helix α-3 of Cry4Ba (D111-K115) that is conserved in many members of the family of three-domain Cry toxins. Single point mutations such as D111K or K115D resulted in proteins severely affected in toxicity. These mutants were also altered in oligomerization, and the mutant K115D was more sensitive to protease digestion. The double point mutant with reversed charges, D111K-K115D, recovered both oligomerization and toxicity, suggesting that this salt bridge is highly important for conservation of the structure of helix α-3 and necessary to promote the correct oligomerization of the toxin. IMPORTANCE Domain I has been shown to be involved in oligomerization through helix α-3 in different Cry toxins, and mutations affecting oligomerization also elicit changes in toxicity. The three-dimensional structure of the Cry4Ba toxin reveals an intramolecular salt bridge in helix α-3 of domain I. Mutations that disrupt this salt bridge resulted in changes in Cry4Ba oligomerization and toxicity, while a double point reciprocal mutation that restored the salt bridge resulted in recovery of toxin oligomerization and toxicity. These data highlight the role of oligomer formation as a key step in Cry4Ba toxicity.


2019 ◽  
Author(s):  
Sushant Kumar ◽  
Arif Harmanci ◽  
Jagath Vytheeswaran ◽  
Mark B. Gerstein

A rapid decline in sequencing cost has made large-scale genome sequencing studies feasible. One of the fundamental goals of these studies is to catalog all pathogenic variants. Numerous methods and tools have been developed to interpret point mutations and small insertions and deletions. However, there is a lack of approaches for identifying pathogenic genomic structural variations (SVs). That said, SVs are known to play a crucial role in many diseases by altering the sequence and three-dimensional structure of the genome. Previous studies have suggested a complex interplay of genomic and epigenomic features in the emergence and distribution of SVs. However, the exact mechanism of pathogenesis for SVs in different diseases is not straightforward to decipher. Thus, we built an agnostic machine-learning-based workflow, called SVFX, to assign a “pathogenicity score” to somatic and germline SVs in various diseases. In particular, we generated somatic and germline training models, which included genomic, epigenomic, and conservation-based features for SV call sets in diseased and healthy individuals. We then applied SVFX to SVs in six different cancer cohorts and a cardiovascular disease (CVD) cohort. Overall, SVFX achieved high accuracy in identifying pathogenic SVs. Moreover, we found that predicted pathogenic SVs in cancer cohorts were enriched among known cancer genes and many cancer-related pathways (including Wnt signaling, Ras signaling, DNA repair, and ubiquitin-mediated proteolysis). Finally, we note that SVFX is flexible and can be easily extended to identify pathogenic SVs in additional disease cohorts.


2021 ◽  
Vol 7 ◽  
Author(s):  
Castrense Savojardo ◽  
Matteo Manfredi ◽  
Pier Luigi Martelli ◽  
Rita Casadio

Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask to what extent solvent exposure of residues can be associated with the pathogenicity of a variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease-related SRVs largely increases when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on a deep learning procedure and performing at the state of the art, we are able to confirm the above tendency by analyzing a large data set of residues subjected to variations and occurring in some 12,494 human protein sequences still lacking three-dimensional structure (derived from HUMSAVAR). Our data support the notion that surface accessible area is a distinguishing property of residues that undergo variation and that pathogenicity is more frequently associated with buried residues than with exposed ones.


2021 ◽  
Author(s):  
Klara Markova ◽  
Antonin Kunka ◽  
Klaudia Chmelova ◽  
Martin Havlasek ◽  
Petra Babkova ◽  
...  

The functionality of a protein depends on its unique three-dimensional structure, which is a result of the folding process when the nascent polypeptide follows a funnel-like energy landscape to reach a global energy minimum. Computer-encoded algorithms are increasingly employed to stabilize native proteins for use in research and biotechnology applications. Here, we reveal a unique example where the computational stabilization of a monomeric α/β-hydrolase enzyme (Tm = 73.5°C; ΔTm > 23°C) affected the protein folding energy landscape. Introduction of eleven single-point stabilizing mutations based on force field calculations and evolutionary analysis yielded catalytically active domain-swapped intermediates trapped in local energy minima. Crystallographic structures revealed that these stabilizing mutations target cryptic hinge regions and newly introduced secondary interfaces, where they make extensive non-covalent interactions between the intertwined misfolded protomers. The existence of domain-swapped dimers in a solution is further confirmed experimentally by data obtained from SAXS and crosslinking mass spectrometry. Unfolding experiments showed that the domain-swapped dimers can be irreversibly converted into native-like monomers, suggesting that the domain-swapping occurs exclusively in vivo. Our findings uncovered hidden protein-folding consequences of computational protein design, which need to be taken into account when applying a rational stabilization to proteins of biological and pharmaceutical interest.


Author(s):  
Arun G. Ingale

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents a broad examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving insight into the future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering background on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional insights. The basic ideas and advances of these directions are discussed in detail.


2019 ◽  
Vol 52 (6) ◽  
pp. 1422-1426
Author(s):  
Rajendran Santhosh ◽  
Namrata Bankoti ◽  
Adgonda Malgonnavar Padmashri ◽  
Daliah Michael ◽  
Jeyaraman Jeyakanthan ◽  
...  

Missing regions in protein crystal structures are those regions that cannot be resolved, mainly owing to poor electron density (if the three-dimensional structure was solved using X-ray crystallography). These missing regions are known to have high B factors and could represent loops with a possibility of being part of an active site of the protein molecule. Thus, they are likely to provide valuable information and play a crucial role in the design of inhibitors and drugs and in protein structure analysis. In view of this, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed which provides information about the missing regions in protein structures available in the Protein Data Bank. In addition, the new database has an option for users to obtain the above data for non-homologous protein structures (25 and 90%). A user-friendly graphical interface with various options has been incorporated, with a provision to view the three-dimensional structure of the protein along with the missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at the URL http://cluster.physics.iisc.ac.in/mrpc.

