Harnessing protein folding neural networks for peptide–protein docking

Highly accurate protein structure predictions by deep neural networks such as AlphaFold2 and RoseTTAFold have tremendous impact on structural biology and beyond. Here, we show that, although these deep learning approaches have originally been developed for the in silico folding of protein monomers, AlphaFold2 also enables quick and accurate modeling of peptide–protein interactions. Our simple implementation of AlphaFold2 generates peptide–protein complex models without requiring multiple sequence alignment information for the peptide partner, and can handle binding-induced conformational changes of the receptor. We explore what AlphaFold2 has memorized and learned, and describe specific examples that highlight differences compared to the state-of-the-art peptide docking protocol PIPER-FlexPepDock. These results show that AlphaFold2 holds great promise for providing structural insight into a wide range of peptide–protein complexes, serving as a starting point for the detailed characterization and manipulation of these interactions.


Harnessing protein folding neural networks for peptide-protein docking

10.1101/2021.08.01.454656 ◽

2021 ◽

Author(s):

Tomer Tsaban ◽

Julia K Varga ◽

Orly Avraham ◽

Ziv Ben Aharon ◽

Alisa Khramushin ◽

...

Keyword(s):

Neural Networks ◽

Protein Interactions ◽

Protein Complexes ◽

Protein Docking ◽

Peptide Sequence ◽

Multiple Sequence ◽

C Terminus ◽

Docking Protocol ◽

Wide Range ◽

Simple Implementation

Highly accurate protein structure predictions by the recently published deep neural networks such as AlphaFold2 and RoseTTAFold are truly impressive achievements, and will have a tremendous impact far beyond structural biology. If peptide-protein binding can be seen as a final complementing step in the folding of a protein monomer, we reasoned that these approaches might be applicable to the modeling of such interactions. We present a simple implementation of AlphaFold2 to model the structure of peptide-protein interactions, enabled by linking the peptide sequence to the protein C-terminus via a poly-glycine linker. We show on a large non-redundant set of 162 peptide-protein complexes that peptide-protein interactions can indeed be modeled accurately. Importantly, prediction is fast and works without multiple sequence alignment information for the peptide partner. We compare performance on a smaller, representative set to the state-of-the-art peptide docking protocol PIPER-FlexPepDock, and describe in detail specific examples that highlight advantages of the two approaches, pointing to possible further improvements and insights in the modeling of peptide-protein interactions. Peptide-mediated interactions play important regulatory roles in functional cells. Thus, the present advance holds much promise for significant impact, by bringing into reach a wide range of peptide-protein complexes, and providing important starting points for detailed study and manipulation of many specific interactions.
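The linker construct mentioned in this abstract can be illustrated with a short sketch. It only builds the single-chain input sequence (receptor, then a glycine linker, then the peptide) and reports where the peptide ends up. The function names, the default linker length of 30, and the toy sequences are assumptions made for illustration; the abstract does not state them.

```python
def build_linked_sequence(receptor: str, peptide: str, linker_length: int = 30) -> str:
    """Append the peptide to the receptor C-terminus via a poly-glycine linker.

    linker_length is a hypothetical default, not a value taken from the abstract.
    """
    if linker_length < 0:
        raise ValueError("linker_length must be non-negative")
    return receptor + "G" * linker_length + peptide


def peptide_residue_range(receptor: str, peptide: str, linker_length: int = 30) -> tuple[int, int]:
    """Return the 1-based (first, last) residue numbers of the peptide in the linked chain."""
    first = len(receptor) + linker_length + 1
    return first, first + len(peptide) - 1


# Toy sequences, invented for this example.
linked = build_linked_sequence("MKT", "AC", linker_length=4)
print(linked)
print(peptide_residue_range("MKT", "AC", linker_length=4))
```

Tracking the peptide residue range is useful afterwards, since the linker residues would normally be cut from the predicted model before the peptide pose is analysed.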


Harnessing protein folding neural networks for peptide-protein docking

10.21203/rs.3.rs-781411/v1 ◽

2021 ◽

Author(s):

Tomer Tsaban ◽

Julia Varga ◽

Orly Avraham ◽

Ziv Ben-Aharon ◽

Alisa Khramushin ◽

...

Keyword(s):

Neural Networks ◽

Protein Interactions ◽

Protein Complexes ◽

Protein Docking ◽

Peptide Sequence ◽

Multiple Sequence ◽

C Terminus ◽

Docking Protocol ◽

Wide Range ◽

Simple Implementation

Highly accurate protein structure predictions by the recently published deep neural networks such as AlphaFold2 and RoseTTAFold are truly impressive achievements, and will have a tremendous impact far beyond structural biology. If peptide-protein binding can be seen as a final complementing step in the folding of a protein monomer, we reasoned that these approaches might be applicable to the modeling of such interactions. We present a simple implementation of AlphaFold2 to model the structure of peptide-protein interactions, enabled by linking the peptide sequence to the protein C-terminus via a poly-glycine linker. We show on a large non-redundant set of 162 peptide-protein complexes that peptide-protein interactions can indeed be modeled accurately. Importantly, prediction is fast and works without multiple sequence alignment information for the peptide partner. We compare performance on a smaller, representative set to the state-of-the-art peptide docking protocol PIPER-FlexPepDock, and describe in detail specific examples that highlight advantages of the two approaches, pointing to possible further improvements and insights in the modeling of peptide-protein interactions. Peptide-mediated interactions play important regulatory roles in functional cells. Thus, the present advance holds much promise for significant impact, by bringing into reach a wide range of peptide-protein complexes, and providing important starting points for detailed study and manipulation of many specific interactions.


Text mining for modeling of protein complexes enhanced by machine learning

Bioinformatics ◽

10.1093/bioinformatics/btaa823 ◽

2020 ◽

Author(s):

Varsha D Badal ◽

Petras J Kundrotas ◽

Ilya A Vakser

Keyword(s):

Machine Learning ◽

Text Mining ◽

Protein Interactions ◽

Full Text ◽

Protein Complexes ◽

Protein Docking ◽

Supplementary Information ◽

Support Vector ◽

Learning Approaches ◽

Protein Protein Interactions

Motivation: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins. Results: We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles. Availability: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information: Supplementary data are available at Bioinformatics online.


Improving Protein Docking with Constraint Programming and Coevolution Data

10.1101/002329 ◽

2014 ◽

Author(s):

Ludwig Krippahl ◽

Fábio Madeira

Keyword(s):

Protein Interactions ◽

Constraint Programming ◽

Protein Complexes ◽

Protein Sequences ◽

Protein Docking ◽

Statistical Parameters ◽

Sequence Alignments ◽

Multiple Sequence ◽

Simplified Approach ◽

Multiple Sequence Alignments

Background: Constraint programming (CP) is usually seen as a rigid approach, focusing on crisp, precise distinctions between what is allowed as a solution and what is not. At first sight, this makes it seem inadequate for bioinformatics applications that rely mostly on statistical parameters and optimisation. The prediction of protein interactions, or protein docking, is one such application, and this apparent problem with CP is particularly evident when constraints are provided by noisy data, as is the case when using the statistical analysis of Multiple Sequence Alignments (MSA) to extract coevolution information. The goal of this paper is to show that this first impression is misleading and that CP is a useful technique for improving protein docking even with data as vague and noisy as the coevolution indicators that can be inferred from MSA. Results: Here we focus on the study of two protein complexes. In one case we used a simplified estimator of interaction propensity to infer a set of five candidate residues for the interface and used that set to constrain the docking models. Even with this simplified approach and considering only the interface of one of the partners, there is a visible focusing of the models around the correct configuration. Considering a set of 400 models with the best geometric contacts, this constraint increases the number of models close to the target (RMSD < 5 Å) from 2 to 5 and decreases the RMSD of all retained models from 26 Å to 17.5 Å. For the other example we used a more standard estimate of coevolving residues, from the Co-Evolution Analysis using Protein Sequences (CAPS) software. Using a group of three residues identified from the sequence alignment as potentially co-evolving to constrain the search, the number of complexes similar to the target among the 50 highest scoring docking models increased from 3 in the unconstrained docking to 30 in the constrained docking.
Conclusions: Although only a proof-of-concept application, our results show that, with suitably designed constraints, CP allows us to integrate coevolution data, which can be inferred from databases of protein sequences, even though the data is noisy and often fuzzy, with no well-defined discontinuities. This also shows, more generally, that CP in bioinformatics need not be limited to the more crisp cases of finite domains and explicit rules but can also be applied to a broader range of problems that depend on statistical measurements and continuous data.
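The effect reported in the Results (more near-native models and a lower RMSD among retained models once an interface constraint is applied) can be mimicked with a small sketch. Note that this is a plain post-hoc filter over finished models, not constraint programming applied during the search as in the paper. The class, the function names, the "at least two candidate residues" rule, and all data are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DockingModel:
    name: str
    rmsd: float           # RMSD to the target structure, in angstroms
    interface: frozenset  # residue numbers of one partner found at the interface


def satisfies(model: DockingModel, candidates: frozenset, min_hits: int) -> bool:
    """True if at least min_hits candidate residues lie at the model's interface."""
    return len(model.interface & candidates) >= min_hits


def summarize(models: list, cutoff: float = 5.0) -> tuple:
    """Return (number of models with RMSD below cutoff, mean RMSD of the models)."""
    near = sum(1 for m in models if m.rmsd < cutoff)
    avg = mean(m.rmsd for m in models) if models else float("nan")
    return near, avg


# Hypothetical models and a hypothetical set of five candidate interface residues.
models = [
    DockingModel("m1", 3.0, frozenset({10, 11, 12})),
    DockingModel("m2", 30.0, frozenset({50, 51})),
    DockingModel("m3", 4.5, frozenset({10, 40})),
    DockingModel("m4", 20.0, frozenset({11, 12, 60})),
]
candidates = frozenset({10, 11, 12, 13, 14})

kept = [m for m in models if satisfies(m, candidates, min_hits=2)]
print(summarize(models))
print(summarize(kept))
```

In this toy data the filter also discards one near-native model (m3), which shows why a noisy constraint is a trade-off rather than a guaranteed improvement.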


Predicting direct physical interactions in multimeric proteins with deep learning

10.1101/2021.11.09.467949 ◽

2021 ◽

Author(s):

Mu Gao ◽

Davi Nakajima An ◽

Jerry M Parks ◽

Jeffrey Skolnick

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Network Models ◽

Protein Docking ◽

Protein Protein Interactions ◽

Neural Network Models ◽

Sequence Alignments ◽

Multiple Sequence ◽

Atomic Structures ◽

Cytochrome C Biogenesis

Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Very recently, AlphaFold2 has been shown to be remarkably accurate for predicting the atomic structures of individual proteins. Here, we demonstrate that the same neural network models developed for AlphaFold2 can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches that require paired multiple sequence alignments, our method, AF2Complex, works without using such paired alignments. It achieves higher accuracy than complex strategies that combine AlphaFold2 and protein-protein docking. New metrics are then introduced for predicting direct protein-protein interactions between arbitrary protein pairs. The approach is successfully validated on some challenging CASP14 multimeric targets, a small but appropriate benchmark set, and the E. coli proteome. Lastly, using the cytochrome c biogenesis system as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.


Mass spectrometry-based cross-linking study shows that the Psb28 protein binds to cytochrome b559 in Photosystem II

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1620360114 ◽

2017 ◽

Vol 114 (9) ◽

pp. 2224-2229 ◽

Cited By ~ 18

Author(s):

Daniel A. Weisz ◽

Haijun Liu ◽

Hao Zhang ◽

Sundarapandian Thangapandian ◽

Emad Tajkhorshid ◽

...

Keyword(s):

Mass Spectrometry ◽

Photosystem Ii ◽

Protein Interactions ◽

Protein Complexes ◽

Protein Docking ◽

Cross Linking ◽

Protein Protein Interactions ◽

Cytochrome B559 ◽

Reaction Center Complex ◽

Assembly Intermediate

Photosystem II (PSII), a large pigment protein complex, undergoes rapid turnover under natural conditions. During assembly of PSII, oxidative damage to vulnerable assembly intermediate complexes must be prevented. Psb28, the only cytoplasmic extrinsic protein in PSII, protects the RC47 assembly intermediate of PSII and assists its efficient conversion into functional PSII. Its role is particularly important under stress conditions when PSII damage occurs frequently. Psb28 is not found, however, in any PSII crystal structure, and its structural location has remained unknown. In this study, we used chemical cross-linking combined with mass spectrometry to capture the transient interaction of Psb28 with PSII. We detected three cross-links between Psb28 and the α- and β-subunits of cytochrome b559, an essential component of the PSII reaction-center complex. These distance restraints enable us to position Psb28 on the cytosolic surface of PSII directly above cytochrome b559, in close proximity to the QB site. Protein–protein docking results also support Psb28 binding in this region. Determination of the Psb28 binding site and other biochemical evidence allow us to propose a mechanism by which Psb28 exerts its protective effect on the RC47 intermediate. This study also shows that isotope-encoded cross-linking with the “mass tags” selection criteria allows confident identification of more cross-linked peptides in PSII than has been previously reported. This approach thus holds promise to identify other transient protein–protein interactions in membrane protein complexes.


Metallocluster transactions: dynamic protein interactions guide the biosynthesis of Fe–S clusters in bacteria

Biochemical Society Transactions ◽

10.1042/bst20180365 ◽

2018 ◽

Vol 46 (6) ◽

pp. 1593-1603 ◽

Cited By ~ 9

Author(s):

Chenkang Zheng ◽

Patricia C. Dos Santos

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Specific Protein ◽

Biological Synthesis ◽

Cluster Assembly ◽

Sequence Motifs ◽

Protein Protein Interactions ◽

Cysteine Desulfurase ◽

Wide Range ◽

Domains Of Life

Iron–sulfur (Fe–S) clusters are ubiquitous cofactors present in all domains of life. The chemistries catalyzed by these inorganic cofactors are diverse and their associated enzymes are involved in many cellular processes. Despite the wide range of structures reported for Fe–S clusters inserted into proteins, the biological synthesis of all Fe–S clusters starts with the assembly of simple units of 2Fe–2S and 4Fe–4S clusters. Several systems have been associated with the formation of Fe–S clusters in bacteria with varying phylogenetic origins and number of biosynthetic and regulatory components. All systems, however, construct Fe–S clusters through a similar biosynthetic scheme involving three main steps: (1) sulfur activation by a cysteine desulfurase, (2) cluster assembly by a scaffold protein, and (3) guided delivery of Fe–S units to either final acceptors or biosynthetic enzymes involved in the formation of complex metalloclusters. Another unifying feature on the biological formation of Fe–S clusters in bacteria is that these systems are tightly regulated by a network of protein interactions. Thus, the formation of transient protein complexes among biosynthetic components allows for the direct transfer of reactive sulfur and Fe–S intermediates preventing oxygen damage and reactions with non-physiological targets. Recent studies revealed the importance of reciprocal signature sequence motifs that enable specific protein–protein interactions and consequently guide the transactions between physiological donors and acceptors. Such findings provide insights into strategies used by bacteria to regulate the flow of reactive intermediates and provide protein barcodes to uncover yet-unidentified cellular components involved in Fe–S metabolism.


Protein-protein docking using learned three-dimensional representations

10.1101/738690 ◽

2019 ◽

Cited By ~ 1

Author(s):

Georgy Derevyanko ◽

Guillaume Lamoureux

Keyword(s):

Protein Interactions ◽

Network Architecture ◽

Protein Complexes ◽

Three Dimensional ◽

Spatial Arrangement ◽

Protein Docking ◽

Protein Protein Interactions ◽

Translational Invariance ◽

Shape Complementarity ◽

Spatial Features

Protein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists in engineering a small number of spatial features for each protein, and in minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.
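One possible way to write the correlation-based score described in this abstract is sketched below. The notation is illustrative and not taken from the paper: $f_k^{A}$ and $f_k^{B}$ stand for the $k$-th feature map of each protein, $R$ and $\mathbf{t}$ for the rotation and translation of the second protein, and the weights $w_k$ are an assumed, simple way of combining the per-feature correlations.

```latex
S(R, \mathbf{t}) \;=\; \sum_{k=1}^{K} w_k \int f_k^{A}(\mathbf{x})\,
  f_k^{B}\!\left(R^{-1}(\mathbf{x} - \mathbf{t})\right)\, \mathrm{d}\mathbf{x},
\qquad K = 48 \text{ for the learned features.}
```

In the conventional approach, $K$ would be a small number of hand-engineered features, and docking amounts to searching over $(R, \mathbf{t})$ for the best value of $S$.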


Dissecting the conformation of glycans and their interactions with proteins

Journal of Biomedical Science ◽

10.1186/s12929-020-00684-5 ◽

2020 ◽

Vol 27 (1) ◽

Author(s):

Sheng-Hung Wang ◽

Tsai-Jung Wu ◽

Chien-Wei Lee ◽

John Yu

Keyword(s):

Conformational Changes ◽

Molecular Interactions ◽

Protein Interactions ◽

Protein Complexes ◽

Unmet Need ◽

Protein Molecule ◽

Structural Basis ◽

Glycan Structure ◽

Chronic Obstructive ◽

Technology Platforms

The use of in silico strategies to develop the structural basis for a rational optimization of glycan-protein interactions remains a great challenge. This problem derives, in part, from the lack of technologies to quantitatively and qualitatively assess the complex assembling between a glycan and the targeted protein molecule. Since there is an unmet need for developing new sugar-targeted therapeutics, many investigators are searching for technology platforms to elucidate various types of molecular interactions within glycan-protein complexes and aid in the development of glycan-targeted therapies. Here we discuss three important technology platforms commonly used in the assessment of the complex assembly of glycosylated biomolecules, such as glycoproteins or glycosphingolipids: Biacore analysis, molecular docking, and molecular dynamics simulations. We will also discuss the structural investigation of glycosylated biomolecules, including conformational changes of glycans and their impact on molecular interactions within the glycan-protein complex. For glycoproteins, secreted protein acidic and rich in cysteine (SPARC), which is associated with various lung disorders, such as chronic obstructive pulmonary disease (COPD) and lung cancer, will be taken as an example showing that the core fucosylation of N-glycan in SPARC regulates protein-binding affinity with extracellular matrix collagen. For glycosphingolipids (GSLs), Globo H ceramide, an important tumor-associated GSL which is being actively investigated as a target for new cancer immunotherapies, will be used to demonstrate how glycan structure plays a significant role in enhancing angiogenesis in tumor microenvironments.


Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1611861114 ◽

2016 ◽

Vol 113 (52) ◽

pp. 15018-15023 ◽

Cited By ~ 24

Author(s):

Juan Rodriguez-Rivas ◽

Simone Marsili ◽

David Juan ◽

Alfonso Valencia

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Accurate Information ◽

Twilight Zone ◽

Sequence Information ◽

Protein Protein Interactions ◽

Sequence Alignments ◽

Multiple Sequence ◽

Protein Interfaces ◽

Recent Developments

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
