scholarly journals Patch-DCA: improved protein interface prediction by utilizing structural information and clustering DCA scores

Author(s):  
Amir Vajdi ◽  
Kourosh Zarringhalam ◽  
Nurit Haspel

Abstract Motivation Over the past decade, there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein–protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein–protein inter-residue contacts remains relatively limited. Results In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein–protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the average relative precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN. Availability and implementation https://github.com/BioMLBoston/PatchDCA Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Amir Vajdi ◽  
Kourosh Zarringhalam ◽  
Nurit Haspel

AbstractOver the past decade there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein-protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein-protein inter-residue contacts remains relatively limited.In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein-protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70 % and 24 % and the precision by 80 % and 36 % in comparison with two state-of-the-art methods PSICOV and GREMLIN.


Genes ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 394 ◽  
Author(s):  
Xiu-Juan Liu ◽  
Xiu-Jun Gong ◽  
Hua Yu ◽  
Jia-Hui Xu

Nowadays, various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most trivial tasks. Disclosing the significances and contributions of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for the prediction performances, but also the practical clues of biological experiment designs. In this study, we propose a model stacking framework by orchestrating multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures(AC_Struc), which were extracted based on evolutionary information, sequence composition, physiochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. Then, a logistic regression model was applied to evaluate the contributions of these individual classifiers and to make the final prediction. When performing on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various predicted models flexibly with good performances.


2019 ◽  
Author(s):  
David Becerra ◽  
Alexander Butyaev ◽  
Jérôme Waldispühl

Abstract Motivation Protein folding is a dynamic process through which polypeptide chains reach their native 3D structures. Although the importance of this mechanism is widely acknowledged, very few high-throughput computational methods have been developed to study it. Results In this paper, we report a computational platform named P3Fold that combines statistical and evolutionary information for predicting and analyzing protein folding routes. P3Fold uses coarse-grained modeling and efficient combinatorial schemes to predict residue contacts and evaluate the folding routes of a protein sequence within minutes or hours. To facilitate access to this technology, we devise graphical representations and implement an interactive web interface that allows end-users to leverage P3Fold predictions. Finally, we use P3Fold to conduct large and short scale experiments on the human proteome that reveal the broad conservation and variations of structural intermediates within protein families. Availability and implementation A Web server of P3Fold is freely available at http://csb.cs.mcgill.ca/P3Fold. Supplementary information Supplementary data are available at Bioinformatics online.


Biomolecules ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 938
Author(s):  
Kriti Chopra ◽  
Bhawna Burdak ◽  
Kaushal Sharma ◽  
Ajit Kembhavi ◽  
Shekhar C. Mande ◽  
...  

Decrypting the interface residues of the protein complexes provides insight into the functions of the proteins and, hence, the overall cellular machinery. Computational methods have been devised in the past to predict the interface residues using amino acid sequence information, but all these methods have been majorly applied to predict for prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence is different between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here, we report a new hybrid pipeline for predicting the protein-protein interaction interfaces in a pairwise manner from the amino acid sequence information of the interacting proteins. It is based on the framework of Co-evolution, machine learning (Random Forest), and Network Analysis named CoRNeA trained specifically on eukaryotic protein complexes. We use Co-evolution, physicochemical properties, and contact potential as major group of features to train the Random Forest classifier. We also incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions keeping in mind that the amino acid sequence of a protein also holds information for its own folding and not only the interface propensities. Our prediction on example datasets shows that CoRNeA not only enhances the prediction of true interface residues but also reduces false positive rates significantly.


Author(s):  
Gen Li ◽  
Swagata Pahari ◽  
Adithya Krishna Murthy ◽  
Siqi Liang ◽  
Robert Fragoza ◽  
...  

Abstract Motivation Vast majority of human genetic disorders are associated with mutations that affect protein–protein interactions by altering wild-type binding affinity. Therefore, it is extremely important to assess the effect of mutations on protein–protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches use structural information to deliver the predictions, which precludes them to be applicable on genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers are frequently dealing with assessing effect of mutations for which there is no structure available. Results Here, we report a Gradient Boosting Decision Tree machine learning algorithm, the SAAMBE-SEQ, which is completely sequence-based and does not require structural information at all. SAAMBE-SEQ utilizes 80 features representing evolutionary information, sequence-based features and change of physical properties upon mutation at the mutation site. The approach is shown to achieve Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross validation in a benchmarking test against experimentally determined binding free energy change (ΔΔG). Further, a blind test (no-STRUC) is compiled collecting experimental ΔΔG upon mutation for protein complexes for which structure is not available and used to benchmark SAAMBE-SEQ resulting in PCC in the range of 0.37–0.46. The accuracy of SAAMBE-SEQ method is found to be either better or comparable to most advanced structure-based methods. SAAMBE-SEQ is very fast, available as webserver and stand-alone code, and indeed utilizes only sequence information, and thus it is applicable for genome-scale investigations to study the effect of mutations on protein–protein interactions. Availability and implementation SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Sherlyn Jemimah ◽  
Masakazu Sekijima ◽  
M Michael Gromiha

Abstract Motivation Protein–protein interactions are essential for the cell and mediate various functions. However, mutations can disrupt these interactions and may cause diseases. Currently available computational methods require a complex structure as input for predicting the change in binding affinity. Further, they have not included the functional class information for the protein–protein complex. To address this, we have developed a method, ProAffiMuSeq, which predicts the change in binding free energy using sequence-based features and functional class. Results Our method shows an average correlation between predicted and experimentally determined ΔΔG of 0.73 and mean absolute error (MAE) of 0.86 kcal/mol in 10-fold cross-validation and correlation of 0.75 with MAE of 0.94 kcal/mol in the test dataset. ProAffiMuSeq was also tested on an external validation set and showed results comparable to structure-based methods. Our method can be used for large-scale analysis of disease-causing mutations in protein–protein complexes without structural information. Availability and implementation Users can access the method at https://web.iitm.ac.in/bioinfo2/proaffimuseq/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Kriti Chopra ◽  
Bhawna Burdak ◽  
Kaushal Sharma ◽  
Ajit Kembavi ◽  
Shekhar C. Mande ◽  
...  

AbstractComputational methods have been devised in the past to predict the interface residues using amino acid sequence information but have been majorly applied to predict for prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence are different between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here we report a new hybrid pipeline for the prediction of protein-protein interaction interfaces from the amino acid sequence information alone based on the framework of Co-evolution, machine learning (Random forest) and Network Analysis named CoRNeA trained specifically on eukaryotic protein complexes. We incorporate the intra contact information of the individual proteins to eliminate false positives from the predictions as the amino acid sequence also holds information for its own folding along with the interface propensities. Our prediction on various case studies shows that CoRNeA can successfully identify minimal interacting regions of two partner proteins with higher precision and recall.


2012 ◽  
Vol 28 (20) ◽  
pp. 2608-2614 ◽  
Author(s):  
Ryan Brenke ◽  
David R. Hall ◽  
Gwo-Yu Chuang ◽  
Stephen R. Comeau ◽  
Tanggis Bohnuud ◽  
...  

Abstract Motivation: An effective docking algorithm for antibody–protein antigen complex prediction is an important first step toward design of biologics and vaccines. We have recently developed a new class of knowledge-based interaction potentials called Decoys as the Reference State (DARS) and incorporated DARS into the docking program PIPER based on the fast Fourier transform correlation approach. Although PIPER was the best performer in the latest rounds of the CAPRI protein docking experiment, it is much less accurate for docking antibody–protein antigen pairs than other types of complexes, in spite of incorporating sequence-based information on the location of the paratope. Analysis of antibody–protein antigen complexes has revealed an inherent asymmetry within these interfaces. Specifically, phenylalanine, tryptophan and tyrosine residues highly populate the paratope of the antibody but not the epitope of the antigen. Results: Since this asymmetry cannot be adequately modeled using a symmetric pairwise potential, we have removed the usual assumption of symmetry. Interaction statistics were extracted from antibody–protein complexes under the assumption that a particular atom on the antibody is different from the same atom on the antigen protein. The use of the new potential significantly improves the performance of docking for antibody–protein antigen complexes, even without any sequence information on the location of the paratope. We note that the asymmetric potential captures the effects of the multi-body interactions inherent to the complex environment in the antibody–protein antigen interface. Availability: The method is implemented in the ClusPro protein docking server, available at http://cluspro.bu.edu. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Zachary VanAernum ◽  
Florian Busch ◽  
Benjamin J. Jones ◽  
Mengxuan Jia ◽  
Zibo Chen ◽  
...  

It is important to assess the identity and purity of proteins and protein complexes during and after protein purification to ensure that samples are of sufficient quality for further biochemical and structural characterization, as well as for use in consumer products, chemical processes, and therapeutics. Native mass spectrometry (nMS) has become an important tool in protein analysis due to its ability to retain non-covalent interactions during measurements, making it possible to obtain protein structural information with high sensitivity and at high speed. Interferences from the presence of non-volatiles are typically alleviated by offline buffer exchange, which is timeconsuming and difficult to automate. We provide a protocol for rapid online buffer exchange (OBE) nMS to directly screen structural features of pre-purified proteins, protein complexes, or clarified cell lysates. Information obtained by OBE nMS can be used for fast (<5 min) quality control and can further guide protein expression and purification optimization.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background:: Many pathways regarding healthy cells and/or linked to diseases onset and progression depend on large assemblies including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective:: This review concerns with PIDs recognizing post-translationally modified peptide sequences and intends to provide the scientific community with state of art knowledge on their 3D structures, binding topologies and potential applications in the drug discovery field. Method:: Several databases, such as the Pfam (Protein family), the SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched to look for different domain families and gain structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through Pubmed and analyzed. Results and Conclusion:: PIDs are rather versatile as concerning their binding preferences. Many of them recognize specifically only determined amino acid stretches with post-translational modifications, a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases including cancer. The tremendous amount of available structural data led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully role out, among different families, PIDs that can be considered reliable therapeutic targets, however, attacking PIDs rather than catalytic domains of a particular protein may represent a route to obtain selective inhibitors.


Sign in / Sign up

Export Citation Format

Share Document