Accurate prediction of residue-residue contacts across homo-oligomeric protein interfaces through deep learning

Mapping Intimacies ◽

10.1101/2020.09.13.295196 ◽

2020 ◽

Author(s):

Yumeng Yan ◽

Sheng-You Huang

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Structure Prediction ◽

High Accuracy ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Contact Prediction ◽

Protein Interfaces ◽

Residue Contacts ◽

Oligomeric Protein

Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial for understanding their molecular mechanisms and for developing drugs that target protein-protein interactions. Recently, deep learning has led to a breakthrough in intra-protein contact prediction, achieving unusually high accuracy in recent CASP structure prediction challenges. However, due to the limited number of known homologous protein-protein interactions and the difficulty of generating joint multiple sequence alignments (MSAs) of two interacting proteins, advances in inter-protein contact prediction remain limited. Here, we propose a deep learning model, named DeepHomo, to predict inter-protein residue-residue contacts across homo-oligomeric protein interfaces by integrating evolutionary coupling, sequence conservation, distance map, docking pattern, and physicochemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-CAPRI targets. DeepHomo achieved a high accuracy of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis (DCA) and machine learning (ML)-based approaches. Integrating predicted contacts into protein docking with blindly predicted monomer structures also significantly improved the docking accuracy. The present study demonstrates the success of DeepHomo in inter-protein contact prediction. It is anticipated that DeepHomo will have far-reaching implications for inter-protein contact prediction and structure prediction of protein-protein interactions.


Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1611861114 ◽

2016 ◽

Vol 113 (52) ◽

pp. 15018-15023 ◽

Cited By ~ 24

Author(s):

Juan Rodriguez-Rivas ◽

Simone Marsili ◽

David Juan ◽

Alfonso Valencia

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Accurate Information ◽

Twilight Zone ◽

Sequence Information ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Multiple Sequence ◽

Protein Interfaces ◽

Recent Developments

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.


Evolutionary couplings detect side-chain interactions

10.1101/447409 ◽

2018 ◽

Cited By ~ 1

Author(s):

Adam J. Hockenberry ◽

Claus O. Wilke

Keyword(s):

Protein Interactions ◽

De Novo ◽

Protein Structures ◽

Side Chain ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Practical Applications ◽

Residue Contacts ◽

Coupling Algorithms ◽

Evolutionary Coupling

Patterns of amino acid covariation in large protein sequence alignments can inform the prediction of de novo protein structures, binding interfaces, and mutational effects. While algorithms that detect these so-called evolutionary couplings between residues have proven useful for practical applications, less is known about how and why these methods perform so well, and what insights into biological processes can be gained from their application. Evolutionary coupling algorithms are commonly benchmarked by comparison to true structural contacts derived from solved protein structures. However, the methods used to determine true structural contacts are not standardized, and different definitions of structural contacts may have important consequences for interpreting the results of evolutionary coupling analyses and understanding their overall utility. Here, we show that evolutionary coupling analyses are significantly more likely to identify structural contacts between side-chain atoms than between backbone atoms. We use both simulations and empirical analyses to show that purely backbone-based definitions of true residue–residue contacts (i.e., based on the distance between Cα atoms) may underestimate the accuracy of evolutionary coupling algorithms by as much as 40%, and that a commonly used reference point (Cβ atoms) underestimates the accuracy by 10–15%. These findings show that co-evolutionary outcomes differ according to which atoms participate in residue–residue interactions and suggest that accounting for different interaction types may lead to further improvements in contact-prediction methods.

Significance Statement: Evolutionary couplings between residues within a protein can provide valuable information about protein structures, protein-protein interactions, and the mutability of individual residues. However, the mechanistic factors that determine whether two residues will co-evolve remain unknown. We show that structural proximity by itself is not sufficient for co-evolution to occur between residues. Rather, evolutionary couplings between residues are specifically governed by interactions between side-chain atoms. By contrast, intramolecular contacts between atoms in the protein backbone display only a weak signature of evolutionary coupling. These findings highlight that different types of stabilizing contacts exist within protein structures and that these types have a differential impact on the evolution of protein structures that should be considered in co-evolutionary applications.
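
To illustrate how the choice of contact definition changes what counts as a "true" contact, the following minimal Python sketch compares Cα, Cβ, and side-chain distances for one residue pair. The toy coordinates, the function names, and the single 8 Å cutoff are assumptions made for illustration only; they are not taken from the paper.

```python
import math

# Hypothetical toy residues: atom name -> (x, y, z) coordinates in angstroms.
res_a = {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)}
res_b = {"CA": (10.0, 0.0, 0.0), "CB": (8.5, 0.0, 0.0), "CG": (7.0, 0.0, 0.0)}

BACKBONE = {"N", "CA", "C", "O"}
CUTOFF = 8.0  # illustrative cutoff in angstroms (an assumption, not the paper's value)


def ca_distance(a, b):
    """Distance between the C-alpha atoms of two residues."""
    return math.dist(a["CA"], b["CA"])


def cb_distance(a, b):
    """Distance between C-beta atoms, falling back to C-alpha (e.g. glycine)."""
    return math.dist(a.get("CB", a["CA"]), b.get("CB", b["CA"]))


def min_side_chain_distance(a, b):
    """Smallest distance between any pair of non-backbone atoms, or None."""
    side_a = [xyz for name, xyz in a.items() if name not in BACKBONE]
    side_b = [xyz for name, xyz in b.items() if name not in BACKBONE]
    if not side_a or not side_b:
        return None
    return min(math.dist(p, q) for p in side_a for q in side_b)


for label, d in [
    ("C-alpha", ca_distance(res_a, res_b)),
    ("C-beta", cb_distance(res_a, res_b)),
    ("side chain", min_side_chain_distance(res_a, res_b)),
]:
    print(f"{label}: {d:.1f} A, contact={d < CUTOFF}")
```

In this toy pair the side chains point toward each other, so the pair is a contact under the Cβ and side-chain definitions but not under the Cα definition, which is the kind of discrepancy the abstract describes.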


AttentiveDist: Protein Inter-Residue Distance Prediction Using Deep Learning with Attention on Quadruple Multiple Sequence Alignments

10.1101/2020.11.24.396770 ◽

2020 ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Prediction Models ◽

3D Structure ◽

Evolutionary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments ◽

Distance Prediction

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s features at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles, further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.
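
The idea of weighting several MSAs per residue pair can be sketched in a few lines of Python. This is only a schematic illustration under stated assumptions: the function names are hypothetical, the attention scores are supplied by hand rather than learned, and each MSA contributes a single scalar feature per pair. It is not the AttentiveDist architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Combine per-MSA pair features with per-pair attention weights.

    features[k][i][j]: feature from MSA k for residue pair (i, j)
    scores[k][i][j]:   unnormalized attention score for the same entry
                       (learned in a real network, hand-set here)
    """
    n_msa = len(features)
    length = len(features[0])
    combined = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(length):
            weights = softmax([scores[k][i][j] for k in range(n_msa)])
            combined[i][j] = sum(
                weights[k] * features[k][i][j] for k in range(n_msa)
            )
    return combined


# Four hypothetical MSAs (e.g. built with different E-values), two residues.
features = [[[v, v], [v, v]] for v in (0.2, 0.4, 0.6, 0.8)]
uniform = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
combined = attend(features, uniform)
print(combined[0][1])
```

With uniform scores the result is the plain average of the four features; raising the score of one MSA at a given pair shifts that pair's combined value toward that MSA, independently of every other pair.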


Deep graph learning of inter-protein contacts

10.1101/2021.08.14.456342 ◽

2021 ◽

Author(s):

Ziwei Xie ◽

Jinbo Xu

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Language Model ◽

Learning Method ◽

Invariant Representation ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Graph Learning ◽

Interfacial Contact

Motivation: Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as that of intra-protein contact prediction. Results: We propose a new deep learning method, GLINTER (Graph Learning of INTER-protein contacts), for interfacial contact prediction of dimers, leveraging a rotationally invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments (MSAs). Tested on the 13th and 14th CASP-CAPRI datasets, the average top L/10 precision achieved by GLINTER is 54.35% on the homodimers and 51.56% on all the dimers, much higher than the 30.43% obtained by the latest deep learning method DeepHomo on the homodimers and the 14.69% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve selection of docking decoys.
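
For readers unfamiliar with the "top L/10 precision" metric quoted above, the following Python sketch shows one common way to compute it: rank predicted pairs by score, keep the top L/k, and report the fraction that are true contacts. The function name, the toy data, and the choice of which sequence length to pass as L are assumptions for illustration; the abstract does not specify these details.

```python
def top_k_precision(scores, true_contacts, length, divisor=10):
    """Precision of the top (length // divisor) highest-scoring pairs.

    scores:        dict mapping (i, j) residue pairs to predicted scores
    true_contacts: set of (i, j) pairs that are contacts in the structure
    length:        sequence length L (which chain's length to use is left
                   to the caller; this is an assumption of the sketch)
    """
    k = max(1, length // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    # If fewer than k pairs were predicted, divide by the number available.
    return hits / len(ranked)


# Hypothetical predictions for a 20-residue chain.
predicted = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.3}
native = {(0, 5), (2, 9)}

print(top_k_precision(predicted, native, length=20, divisor=10))
```

Here L/10 is 2, so only the two highest-scoring pairs are evaluated; passing `divisor=5` gives the top L/5 variant used by other methods.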


A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers

10.1101/2021.09.19.460941 ◽

2021 ◽

Author(s):

Raj Shekhor Roy ◽

Farhan Quadir ◽

Elham Soltanikazemi ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Tertiary Structure ◽

Quaternary Structure ◽

High Accuracy ◽

Residual Network ◽

Sequence Alignments ◽

Learning Methods ◽

Tertiary Structures ◽

Residue Contacts ◽

Contact Predictions

Deep learning has revolutionized protein tertiary structure prediction recently. The cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to lack of advanced deep learning methods in the field. Because interchain residue-residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue-residue contacts in homodimers from residue-residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue-residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset, and CASP14-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.15%, and 24.81% respectively, which is substantially better than two existing deep learning interchain contact prediction methods. Moreover, our experiments demonstrate that using predicted tertiary structure or intrachain contacts of monomers in the unbound state as input, DRCon still performs reasonably well, even though its accuracy is lower than when true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers.


Conservation of co-evolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone

10.1101/067587 ◽

2016 ◽

Author(s):

Juan Rodriguez-Rivas ◽

Simone Marsili ◽

David Juan ◽

Alfonso Valencia

Keyword(s):

Protein Interactions ◽

Homology Modelling ◽

Accurate Information ◽

Sequence Information ◽

Evolutionary Analysis ◽

Interacting Proteins ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Multiple Sequence ◽

Protein Interfaces

Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue co-evolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that co-evolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a novel domain-centred protocol to study the interplay between residue co-evolution and structural conservation of protein-protein interfaces. We show that sequence-based co-evolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence, where standard homology modelling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic co-evolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this novel approach.

Significance statement: Interacting proteins tend to co-evolve through interdependent changes at the interaction interface. This phenomenon leads to patterns of coordinated mutations that can be exploited to systematically predict contacts between interacting proteins in prokaryotes. We explore the hypothesis that co-evolving contacts at protein interfaces are preferentially conserved through long evolutionary periods. We demonstrate that co-evolving residues in prokaryotes identify inter-protein contacts that are particularly well conserved in the corresponding structure of their eukaryotic homologues. Therefore, these contacts have likely been important to maintain protein-protein interactions during evolution. We show that this property can be used to reliably predict interacting residues between eukaryotic proteins with homologues in prokaryotes even if they are very distantly related in sequence.


Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0013 ◽

2014 ◽

Vol 10 (4) ◽

Cited By ~ 4

Author(s):

Stuart Tetchner ◽

Tomasz Kosciolek ◽

David T. Jones

Keyword(s):

Protein Interactions ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Initial Assessment ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Data Dependencies ◽

Long Time ◽

Alternative Approaches

The prospect of identifying contacts in protein structures purely from aligned protein sequences has lured researchers for a long time, but progress has been modest until recently. Here, we reviewed the most successful methods for identifying structural contacts from sequence and how these methods differ and made an initial assessment of the overlap of predicted contacts by alternative approaches. We then discussed the limitations of these methods and possibilities for future development and highlighted the recent applications of contacts in tertiary structure prediction, identifying the residues at the interfaces of protein-protein interactions, and the use of these methods in disentangling alternative conformational states. Finally, we identified the current challenges in the field of contact prediction, concentrating on the limitations imposed by available data, dependencies on the sequence alignments, and possible future developments.


A Reproducibility Analysis-based Statistical Framework for Residue-Residue Evolutionary Coupling Detection

10.1101/2021.02.01.429092 ◽

2021 ◽

Author(s):

Yunda Si ◽

Chengfei Yan

Keyword(s):

Protein Interactions ◽

Rna Structure ◽

Protein Protein Interactions ◽

Multiple Sequence ◽

Contact Prediction ◽

Statistical Framework ◽

Residue Contacts ◽

Direct Coupling Analysis ◽

General Statistical ◽

Evolutionary Coupling

Direct coupling analysis (DCA) has been widely used to predict residue-residue contacts to assist protein/RNA structure and interaction prediction. However, effectively selecting residue pairs for contact prediction according to the result of DCA is a non-trivial task, since the number of highly predictive residue pairs and the coupling scores obtained from DCA are highly dependent on the number and the length of the homologous sequences forming the multiple sequence alignment, the detailed settings of the DCA algorithm, the functional characteristics of the macromolecule, etc. In this study, we present a general statistical framework for selecting predictive residue pairs through significant evolutionary coupling detection, referred to as IDR-DCA, which is based on reproducibility analysis of the coupling scores from replicated DCA. IDR-DCA was applied to select residue pairs for contact prediction for 150 proteins, 30 protein-protein interactions and 36 RNAs, using three widely used DCA software packages to perform the DCA. We show that with the application of IDR-DCA, the predictive residue pairs can be effectively selected through a universal threshold that is independent of the DCA software.


Deep Learning in the Study of Protein-Related Interactions

Protein and Peptide Letters ◽

10.2174/0929866526666190723114142 ◽

2020 ◽

Vol 27 (5) ◽

pp. 359-369 ◽

Cited By ~ 1

Author(s):

Cheng Shi ◽

Jiaxing Chen ◽

Xinyue Kang ◽

Guiling Zhao ◽

Xingzhen Lao ◽

...

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Physiological Data ◽

Great Promise ◽

Complex Data ◽

Protein Protein Interactions ◽

Learning Patterns ◽

Introductory Overview ◽

Protein Research ◽

Neural Network Theory

Protein-related interaction prediction is critical to understanding life processes, biological functions, and mechanisms of drug action. Experimental methods used to determine protein-related interactions have always been costly and inefficient. In recent years, advances in biological and medical technology have provided us with an explosion of biological and physiological data, and deep learning-based algorithms have shown great promise in extracting features and learning patterns from complex data. Deep learning has now emerged in protein research. In this review, we provide an introductory overview of deep neural network theory and its unique properties. We mainly focus on the application of this technology to the prediction of protein-related interactions over the past five years, including protein-protein, protein-RNA/DNA, and protein-drug interactions, among others. Finally, we discuss some of the challenges that deep learning currently faces.


Short Linear Motifs Characterizing Snake Venom and Mammalian Phospholipases A2

Toxins ◽

10.3390/toxins13040290 ◽

2021 ◽

Vol 13 (4) ◽

pp. 290

Author(s):

Caterina Peggion ◽

Fiorella Tonello

Keyword(s):

Snake Venom ◽

Protein Interactions ◽

Biological Properties ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Toxic Activity ◽

Short Linear Motifs ◽

Phospholipases A2 ◽

Group I ◽

Linear Motifs

Snake venom phospholipases A2 (PLA2s) have sequences and structures very similar to those of mammalian group I and II secretory PLA2s, but they possess many toxic properties, ranging from the inhibition of coagulation to the blockage of nerve transmission and the induction of muscle necrosis. The biological properties of these proteins are due not only to their enzymatic activity but also to protein–protein interactions that are still unidentified. Here, we compare sequence alignments of snake venom and mammalian PLA2s, grouped according to their structure and biological activity, looking for differences that can explain their different behavior. This bioinformatics analysis revealed three distinct regions, two central and one C-terminal, with amino acid compositions that distinguish the different categories of PLA2s. In these regions, we identified short linear motifs (SLiMs), peptide modules involved in protein–protein interactions, that are conserved in mammalian but not in snake venom PLA2s, or vice versa. The different SLiM content of snake venom PLA2s relative to mammalian PLA2s may result in the formation of protein membrane complexes having a toxic activity, or in the formation of complexes whose activity cannot be blocked due to the lack of switches in the toxic PLA2s, such as the motif recognized by the prolyl isomerase Pin1.
