SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences

Bioinformatics, 2019, Vol 35 (14), pp. i343-i353
doi:10.1093/bioinformatics/btz324
Cited By ~ 10

Author(s): Jian Zhang, Lukasz Kurgan

Keyword(s): Protein Binding, Protein Interactions, RNA Binding, Protein Complexes, Predictive Performance, Protein Docking, Supplementary Information, Binding Residue, Binding Residues, The Cross

Abstract

Motivation: Accurate predictions of protein-binding residues (PBRs) enhance understanding of molecular-level rules governing protein–protein interactions, help protein–protein docking and facilitate annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of protein partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, prohibiting genome-scale use.

Results: We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes the cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce the cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We utilize an innovative two-layer architecture where the first layer generates a prediction of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69% and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins.

Availability and implementation: SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/.

Supplementary information: Supplementary data are available at Bioinformatics online.
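
The two-layer idea described in this abstract can be illustrated with a minimal sketch. It assumes that a first layer has already produced one propensity per partner type for each residue, and that the second layer is a simple logistic model that rewards protein-binding evidence and penalises evidence for the other partner types. The weights, bias, threshold and function names are hypothetical and chosen for illustration only; they are not SCRIBER's trained parameters or its actual model.

```python
import math

# Hypothetical second-layer parameters (illustrative, NOT SCRIBER's trained values).
LAYER2_WEIGHTS = {"protein": 4.0, "rna": -2.0, "dna": -2.0, "ligand": -2.0}
LAYER2_BIAS = -1.0


def layer2_score(propensities):
    """Re-score protein binding from the four first-layer propensities."""
    z = LAYER2_BIAS + sum(
        weight * propensities[kind] for kind, weight in LAYER2_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def repredict(residues, threshold=0.5):
    """Return one binary protein-binding call per residue."""
    return [layer2_score(p) >= threshold for p in residues]


# Hypothetical first-layer outputs for two residues.
residues = [
    {"protein": 0.8, "rna": 0.1, "dna": 0.1, "ligand": 0.1},  # mostly protein-binding signal
    {"protein": 0.6, "rna": 0.9, "dna": 0.2, "ligand": 0.1},  # strong RNA-binding signal
]
print(repredict(residues))  # [True, False]
```

In this toy setting the second residue has a moderate protein-binding propensity but is not called a PBR, because the RNA-binding evidence lowers its second-layer score. This is the kind of overlap reduction the abstract attributes to the second layer.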

PROBselect: accurate prediction of protein-binding residues from proteins sequences via dynamic predictor selection

Bioinformatics, 2020, Vol 36 (Supplement_2), pp. i735-i744
doi:10.1093/bioinformatics/btaa806

Author(s): Fuhao Zhang, Wenbo Shi, Jian Zhang, Min Zeng, Min Li, ...

Keyword(s): Protein Binding, Protein Interactions, Predictive Performance, Protein Docking, Supplementary Information, Protein Protein Interactions, Cross Prediction, Predictive Quality, Protein Functions, Binding Residues

Abstract

Motivation: Knowledge of protein-binding residues (PBRs) improves our understanding of protein–protein interactions, contributes to the prediction of protein functions and facilitates protein–protein docking calculations. While many sequence-based predictors of PBRs were published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods.

Results: We empirically investigate predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights for the cross-prediction concern, dissects complementarity between predictors and demonstrates that predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on the dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein.

Availability and implementation: PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index.

Supplementary information: Supplementary data are available at Bioinformatics online.
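
Protein-level dynamic predictor selection, as outlined in this abstract, can be sketched as follows. The sketch assumes one linear regression model per candidate predictor that maps sequence-derived features to an expected quality score; the predictor with the highest estimate is then chosen for that protein. The predictor names, feature names (`length`, `disorder`) and coefficients are hypothetical placeholders and do not come from PROBselect.

```python
# Hypothetical regression models, one per candidate predictor (illustrative values).
MODELS = {
    "predictor_A": {"bias": 0.5, "weights": {"length": 0.1, "disorder": -0.3}},
    "predictor_B": {"bias": 0.4, "weights": {"length": 0.0, "disorder": 0.2}},
}


def estimate_quality(features, model):
    """Linear regression estimate of a predictor's quality on one protein."""
    return model["bias"] + sum(
        weight * features[name] for name, weight in model["weights"].items()
    )


def select_predictor(features, models):
    """Return (predictor name, estimated quality) for the best-scoring predictor."""
    estimates = {name: estimate_quality(features, m) for name, m in models.items()}
    best = max(estimates, key=estimates.get)
    return best, estimates[best]


# Two hypothetical proteins described by normalised sequence-level features.
print(select_predictor({"length": 0.3, "disorder": 0.7}, MODELS))
print(select_predictor({"length": 0.3, "disorder": 0.1}, MODELS))
```

Returning the estimate together with the selected name mirrors the abstract's point that the user is also told the expected predictive quality for the given input protein.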

InterPep2: global peptide–protein docking using interaction surface templates

Bioinformatics, 2020, Vol 36 (8), pp. 2458-2465
doi:10.1093/bioinformatics/btaa005
Cited By ~ 2

Author(s): Isak Johansson-Åkhe, Claudio Mirabello, Björn Wallner

Keyword(s): Protein Interactions, Protein Complexes, Structural Features, Protein Docking, Supplementary Information, Peptide Ligand, Protein Protein Interactions, Intrinsically Disordered, Intrinsically Disordered Regions, Improved Performance

Abstract

Motivation: Interactions between proteins and peptides or peptide-like intrinsically disordered regions are involved in many important biological processes, such as gene expression and cell life-cycle regulation. Experimentally determining the structure of such interactions is time-consuming and difficult because of the inherent flexibility of the peptide ligand. Although several prediction methods exist, most are limited in performance or availability.

Results: InterPep2 is a freely available method for predicting the structure of peptide–protein interactions. Improved performance is obtained by using templates from both peptide–protein and regular protein–protein interactions, and by a random forest trained to predict the DockQ-score for a given template using sequence and structural features. When tested on 252 bound peptide–protein complexes from structures deposited after the complexes used in the construction of the training and templates sets of InterPep2, InterPep2-Refined correctly positioned 67 peptides within 4.0 Å LRMSD among top10, similar to another state-of-the-art template-based method which positioned 54 peptides correctly. However, InterPep2 displays a superior ability to evaluate the quality of its own predictions. On a previously established set of 27 non-redundant unbound-to-bound peptide–protein complexes, InterPep2 performs on par with leading methods. The extended InterPep2-Refined protocol managed to correctly model 15 of these complexes within 4.0 Å LRMSD among top10, without using templates from homologs. In addition, combining the template-based predictions from InterPep2 with ab initio predictions from PIPER-FlexPepDock resulted in 22% more near-native predictions compared to the best single method (22 versus 18).

Availability and implementation: The program is available from: http://wallnerlab.org/InterPep2.

Supplementary information: Supplementary data are available at Bioinformatics online.

Text mining for modeling of protein complexes enhanced by machine learning

Bioinformatics, 2020
doi:10.1093/bioinformatics/btaa823

Author(s): Varsha D Badal, Petras J Kundrotas, Ilya A Vakser

Keyword(s): Machine Learning, Text Mining, Protein Interactions, Full Text, Protein Complexes, Protein Docking, Supplementary Information, Support Vector, Learning Approaches, Protein Protein Interactions

Abstract

Motivation: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, absence of post-processing of the spotted residues reduced usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.

Results: We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to utilize the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles.

Availability: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.

Supplementary information: Supplementary data are available at Bioinformatics online.

pyDockEneRes: per-residue decomposition of protein–protein docking energy

Bioinformatics, 2019, Vol 36 (7), pp. 2284-2285
doi:10.1093/bioinformatics/btz884
Cited By ~ 1

Author(s): Miguel Romero-Durana, Brian Jiménez-García, Juan Fernández-Recio

Keyword(s): Binding Affinity, Protein Interactions, Structural Model, Protein Complexes, Complex Structure, Protein Docking, Supplementary Information, Scoring Functions, Residue Decomposition, Docking Energy

Abstract

Motivation: Protein–protein interactions are key to understanding biological processes at the molecular level. As a complement to experimental characterization of protein interactions, computational docking methods have become useful tools for the structural and energetics modeling of protein–protein complexes. A key aspect of such algorithms is the use of scoring functions to evaluate the generated docking poses and try to identify the best models. When the scoring functions are based on energetic considerations, they can help not only to provide a reliable structural model for the complex, but also to describe energetic aspects of the interaction. This is the case of the scoring function used in pyDock, a combination of electrostatics, desolvation and van der Waals energy terms. Its correlation with experimental binding affinity values of protein–protein complexes was explored in the past, but the per-residue decomposition of the docking energy was never systematically analyzed.

Results: Here, we present pyDockEneRes (pyDock Energy per-Residue), a web server that provides pyDock docking energy partitioned at the residue level, giving a much more detailed description of the docking energy landscape. Additionally, pyDockEneRes computes the contribution to the docking energy of the side-chain atoms. This fast approach can be applied to characterize a complex structure in order to identify energetically relevant residues (hot-spots) and estimate binding affinity changes upon mutation to alanine.

Availability and implementation: The server does not require registration by the user and is freely accessible for academics at https://life.bsc.es/pid/pydockeneres.

Supplementary information: Supplementary data are available at Bioinformatics online.
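
A per-residue partition of an interface energy can be sketched under a simple assumption: the total energy is a sum of pairwise receptor-ligand residue terms, and each residue is credited with the sum of the pair terms it takes part in. The abstract does not describe how pyDock actually partitions its energy, so the pairwise input, the residue labels, the energy values and the hot-spot cutoff below are all hypothetical.

```python
from collections import defaultdict


def per_residue_energy(pair_energies):
    """Sum pairwise interface energies onto receptor and ligand residues."""
    receptor = defaultdict(float)
    ligand = defaultdict(float)
    for (rec_res, lig_res), energy in pair_energies.items():
        receptor[rec_res] += energy
        ligand[lig_res] += energy
    return dict(receptor), dict(ligand)


def hot_spots(residue_energies, cutoff=-1.0):
    """Residues whose summed energy is at or below the (assumed) cutoff."""
    return sorted(res for res, e in residue_energies.items() if e <= cutoff)


# Hypothetical pairwise energies in arbitrary units (more negative = more favourable).
pairs = {
    ("A:ARG45", "B:ASP12"): -2.5,
    ("A:ARG45", "B:GLU15"): -1.0,
    ("A:LEU50", "B:ASP12"): -0.25,
}
receptor, ligand = per_residue_energy(pairs)
print(receptor)            # {'A:ARG45': -3.5, 'A:LEU50': -0.25}
print(hot_spots(receptor)) # ['A:ARG45']
```

With this bookkeeping, the per-residue values on either side of the interface add up to the same total, so the decomposition adds detail without changing the overall score.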

Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins

Bioinformatics, 2020, Vol 36 (18), pp. 4729-4738
doi:10.1093/bioinformatics/btaa573
Cited By ~ 2

Author(s): Jian Zhang, Sina Ghadermarzi, Lukasz Kurgan

Keyword(s): Protein Binding, The Other, Disordered Proteins, Supplementary Information, Supplementary Data, Protein Partners, Structure Disorder, Binding Residues, The Cross, Protein Nucleic Acid

Abstract

Motivation: There are over 30 sequence-based predictors of the protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy where the structure-/disorder-specific models may not be able to cross-over to accurately predict the other type. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs among residues that interact with non-protein partners (nucleic acids and small ligands). We address these issues by performing a first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors using a comprehensive benchmark set with the structure- and disorder-derived annotations of PBRs (to analyze the cross-over) and the protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions).

Results: Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, the considered predictors generally fail to cross-over, with the exception of SCRIBER. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER for the structure-annotated proteins and disoRDPbind for the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines results produced by disoRDPbind and SCRIBER to accurately predict disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross-over structure- and disorder-annotated proteins and produces a relatively low amount of cross-predictions, offering an accurate alternative to predict PBRs.

Availability and implementation: HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/.

Supplementary information: Supplementary data are available at Bioinformatics online.

RNA-Centric Approaches to Profile the RNA–Protein Interaction Landscape on Selected RNAs

Non-Coding RNA, 2021, Vol 7 (1), pp. 11
doi:10.3390/ncrna7010011
Cited By ~ 1

Author(s): André P. Gerber

Keyword(s): Mass Spectrometry, Protein Interactions, Regulatory Networks, RNA Binding, RNA Binding Proteins, Protein Complexes, Cell Protein, Transcriptional Regulatory Networks, Technological Advances

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While the technological advances in RNA sequencing have significantly expanded the repertoire of RNAs, recently developed biochemical approaches combined with sensitive mass-spectrometry have revealed hundreds of previously unrecognized and potentially novel RNA-binding proteins. Nevertheless, a major challenge remains to understand how the thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances to approach this problem through systematic identification of proteins that interact with particular RNAs in living cells. Thereby, a specific focus is given to in vivo approaches that involve crosslinking of RNA–protein interactions through ultraviolet irradiation or treatment of cells with chemicals, followed by capture of the RNA under study with antisense-oligonucleotides and identification of bound proteins with mass-spectrometry. Several recent studies defining interactomes of long non-coding RNAs, viral RNAs, as well as mRNAs are highlighted, and short reference is given to recent in-cell protein labeling techniques. These recent experimental improvements could open the door for broader applications and to study the remodeling of RNA–protein complexes upon different environmental cues and in disease.

Mass spectrometry-based cross-linking study shows that the Psb28 protein binds to cytochrome b559 in Photosystem II

Proceedings of the National Academy of Sciences, 2017, Vol 114 (9), pp. 2224-2229
doi:10.1073/pnas.1620360114
Cited By ~ 18

Author(s): Daniel A. Weisz, Haijun Liu, Hao Zhang, Sundarapandian Thangapandian, Emad Tajkhorshid, ...

Keyword(s): Mass Spectrometry, Photosystem II, Protein Interactions, Protein Complexes, Protein Docking, Cross Linking, Protein Protein Interactions, Cytochrome B559, Reaction Center Complex, Assembly Intermediate

Photosystem II (PSII), a large pigment protein complex, undergoes rapid turnover under natural conditions. During assembly of PSII, oxidative damage to vulnerable assembly intermediate complexes must be prevented. Psb28, the only cytoplasmic extrinsic protein in PSII, protects the RC47 assembly intermediate of PSII and assists its efficient conversion into functional PSII. Its role is particularly important under stress conditions when PSII damage occurs frequently. Psb28 is not found, however, in any PSII crystal structure, and its structural location has remained unknown. In this study, we used chemical cross-linking combined with mass spectrometry to capture the transient interaction of Psb28 with PSII. We detected three cross-links between Psb28 and the α- and β-subunits of cytochrome b559, an essential component of the PSII reaction-center complex. These distance restraints enable us to position Psb28 on the cytosolic surface of PSII directly above cytochrome b559, in close proximity to the QB site. Protein–protein docking results also support Psb28 binding in this region. Determination of the Psb28 binding site and other biochemical evidence allow us to propose a mechanism by which Psb28 exerts its protective effect on the RC47 intermediate. This study also shows that isotope-encoded cross-linking with the “mass tags” selection criteria allows confident identification of more cross-linked peptides in PSII than has been previously reported. This approach thus holds promise to identify other transient protein–protein interactions in membrane protein complexes.

Protein-protein docking using learned three-dimensional representations

2019
doi:10.1101/738690
Cited By ~ 1

Author(s): Georgy Derevyanko, Guillaume Lamoureux

Keyword(s): Protein Interactions, Network Architecture, Protein Complexes, Three Dimensional, Spatial Arrangement, Protein Docking, Protein Protein Interactions, Translational Invariance, Shape Complementarity, Spatial Features

Abstract

Protein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists in engineering a small number of spatial features for each protein, and in minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.

UNRES-Dock—protein–protein and peptide–protein docking by coarse-grained replica-exchange MD simulations

Bioinformatics, 2020
doi:10.1093/bioinformatics/btaa897
Cited By ~ 1

Author(s): Paweł Krupa, Agnieszka S Karczyńska, Magdalena A Mozolewska, Adam Liwo, Cezary Czaplewski

Keyword(s): Protein Complexes, MD Simulations, Protein Docking, Conformational Space, Coarse Grained, Supplementary Information, Replica Exchange, Variable Degree, Single Chain, Simulation Speed

Abstract

Motivation: The majority of the proteins in living organisms occur as homo- or hetero-multimeric structures. Although there are many tools to predict the structures of single-chain proteins or protein complexes with small ligands, peptide–protein and protein–protein docking is more challenging. In this work, we utilized multiplexed replica-exchange molecular dynamics (MREMD) simulations with the physics-based heavily coarse-grained UNRES model, which provides more than a 1000-fold simulation speed-up compared with all-atom approaches, to predict structures of protein complexes.

Results: We present a new protein–protein and peptide–protein docking functionality of the UNRES package, which includes a variable degree of conformational flexibility. The UNRES-Dock protocol was tested on a set of 55 complexes with sizes from 43 to 587 amino-acid residues, showing that structures of the complexes can be predicted with good quality if the sampling of the conformational space is sufficient, especially for flexible peptide–protein systems. The developed automated protocol has been implemented in the standalone UNRES package and in the UNRES server.

Availability and implementation: UNRES server: http://unres-server.chem.ug.edu.pl; UNRES package and data used in testing of UNRES-Dock: http://unres.pl.

Supplementary information: Supplementary data are available at Bioinformatics online.
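
For readers unfamiliar with replica exchange, the core step is a Metropolis test that decides whether two replicas running at different temperatures swap configurations. The sketch below shows the standard temperature replica-exchange criterion, with acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]) and β = 1/(k_B T). It is a generic illustration, not code from the UNRES package; the function names and example energies are assumptions, and the multiplexing used in MREMD (several replicas per temperature) is not modelled.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        return 1.0  # always accept; also avoids overflow in exp for large delta
    return math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the exchange is accepted for one random draw."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


# Hypothetical energies (kcal/mol) for replicas at 300 K and 320 K.
print(swap_probability(-100.0, 300.0, -105.0, 320.0))  # colder replica is higher in energy
print(swap_probability(-105.0, 300.0, -100.0, 320.0))  # colder replica is already lower
```

The first call returns 1.0, since the swap moves the lower-energy configuration to the colder temperature. The second returns a value below 1, so that swap is accepted only part of the time.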

RNA–protein interactions: disorder, moonlighting and junk contribute to eukaryotic complexity

Open Biology, 2019, Vol 9 (6), pp. 190096
doi:10.1098/rsob.190096
Cited By ~ 14

Author(s): Anna Balcerak, Alicja Trebinska-Stryjewska, Ryszard Konopinski, Maciej Wakula, Ewa Anna Grzybowska

Keyword(s): Protein Interactions, Regulatory Networks, RNA Binding, RNA Binding Proteins, Protein Complexes, Binding Motifs, Coding Regions, Regulatory Circuits, Proteins Interactions

RNA–protein interactions are crucial for most biological processes in all organisms. However, it appears that the complexity of RNA-based regulation increases with the complexity of the organism, creating additional regulatory circuits, the scope of which is only now being revealed. It is becoming apparent that previously unappreciated features, such as disordered structural regions in proteins or non-coding regions in DNA leading to higher plasticity and pliability in RNA–protein complexes, are in fact essential for complex, precise and fine-tuned regulation. This review addresses the issue of the role of RNA–protein interactions in generating eukaryotic complexity, focusing on the newly characterized disordered RNA-binding motifs, moonlighting of metabolic enzymes, RNA-binding proteins interactions with different RNA species and their participation in regulatory networks of higher order.
