Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms

Molecules ◽  
2021 ◽  
Vol 27 (1) ◽  
pp. 41
Author(s):  
Brandan Dunham ◽  
Madhavi K. Ganapathiraju

Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and accurately assessing the performance of algorithms is required before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, leading to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on ‘illogical’ and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only-based algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available in open source.
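The effect of dataset composition described in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the sensitivity, false-positive rate, and class ratios are hypothetical values chosen only to show how the same classifier yields very different precision on a balanced dataset and on a heavily imbalanced one.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected precision of a classifier with fixed TPR and FPR when a
    fraction `prevalence` of the evaluated protein pairs truly interact."""
    true_pos = tpr * prevalence            # expected share of true positives
    false_pos = fpr * (1.0 - prevalence)   # expected share of false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity, 5% false-positive rate.
balanced = precision_at_prevalence(0.90, 0.05, 0.5)       # 1:1 positives to negatives
imbalanced = precision_at_prevalence(0.90, 0.05, 0.001)   # illustrative 1:999 ratio

print(f"precision on balanced data:   {balanced:.3f}")
print(f"precision on imbalanced data: {imbalanced:.3f}")
```

Sensitivity and false-positive rate do not depend on class proportions, so a method can look strong on a balanced test set while most of its predictions on a realistic set are false positives.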

2019 ◽  
Vol 2019 ◽  
pp. 1-7 ◽  
Author(s):  
Xue Li ◽  
Lifeng Yang ◽  
Xiaopan Zhang ◽  
Xiong Jiao

Protein-protein interactions (PPIs) play a crucial role in various biological processes. To better comprehend the pathogenesis and treatment of various diseases, it is necessary to learn the details of these interactions. However, current experimental methods still suffer from many false positives and false negatives. Computational prediction of protein-protein interactions has become an increasingly important approach that can overcome the obstacles of the experimental methods. In this work, we proposed a novel computational domain-based method for PPI prediction, and an SVM model for the prediction was built based on the physicochemical properties of the domains. The outcomes of the SVM and the domain-domain score were used to construct the prediction model for protein-protein interaction. The predicted results demonstrated that domain-based research can enhance the ability to predict protein interactions.


Author(s):  
Morihiro Hayashida ◽  
Tatsuya Akutsu

Protein-protein interactions play various essential roles in cellular systems. Many methods have been developed for inference of protein-protein interactions from protein sequence data. In this paper, the authors focus on methods based on domain-domain interactions, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. In these methods, the probabilities of domain-domain interactions are inferred from known protein-protein interaction data and protein domain data, and then prediction of interactions is performed based on these probabilities and contents of domains of given proteins. This paper overviews several fundamental methods, which include association method, expectation maximization-based method, support vector machine-based method, linear programming-based method, and conditional random field-based method. This paper also reviews a simple evolutionary model of protein domains, which yields a scale-free distribution of protein domains. By combining with a domain-based protein interaction model, a scale-free distribution of protein-protein interaction networks is also derived.
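The association-style inference described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the protein and domain names are hypothetical, each domain-pair score is estimated as the fraction of protein pairs containing that domain pair that are known to interact, and the combination rule assumes that domain pairs interact independently.

```python
from collections import defaultdict
from itertools import combinations


def domain_pairs(doms_a, doms_b):
    """All unordered domain pairs with one domain taken from each protein."""
    return {frozenset((x, y)) for x in doms_a for y in doms_b}


def association_scores(domains, interactions):
    """Score each domain pair as (interacting protein pairs containing it)
    divided by (all protein pairs containing it)."""
    known = {frozenset(p) for p in interactions}
    total = defaultdict(int)
    hits = defaultdict(int)
    for a, b in combinations(sorted(domains), 2):
        for dp in domain_pairs(domains[a], domains[b]):
            total[dp] += 1
            if frozenset((a, b)) in known:
                hits[dp] += 1
    return {dp: hits[dp] / total[dp] for dp in total}


def interaction_probability(doms_a, doms_b, scores):
    """Probability that at least one domain pair interacts, assuming
    independence between domain pairs (an assumption of this sketch)."""
    p_none = 1.0
    for dp in domain_pairs(doms_a, doms_b):
        p_none *= 1.0 - scores.get(dp, 0.0)
    return 1.0 - p_none


# Hypothetical toy data: four proteins, three domains, two known interactions.
domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"A"}, "P4": {"B", "C"}}
interactions = [("P1", "P2"), ("P3", "P4")]

scores = association_scores(domains, interactions)
prob = interaction_probability({"A"}, {"B", "C"}, scores)
print(f"P(interaction) for a protein with A and a protein with B, C: {prob:.2f}")
```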


Author(s):  
Tatsuya Akutsu ◽  
Morihiro Hayashida

Many methods have been proposed for inference of protein-protein interactions from protein sequence data. This chapter focuses on methods based on domain-domain interactions, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. In these methods, the probabilities of domain-domain interactions are inferred from known protein-protein interaction data and protein domain data, and then prediction of interactions is performed based on these probabilities and contents of domains of given proteins. This chapter overviews several fundamental methods, which include association method, expectation maximization-based method, support vector machine-based method, and linear programming-based method. This chapter also reviews a simple evolutionary model of protein domains, which yields a scale-free distribution of protein domains. By combining with a domain-based protein interaction model, a scale-free distribution of protein-protein interaction networks is also derived.


Biotechnology ◽  
2019 ◽  
pp. 406-427
Author(s):  
Morihiro Hayashida ◽  
Tatsuya Akutsu

Protein-protein interactions play various essential roles in cellular systems. Many methods have been developed for inference of protein-protein interactions from protein sequence data. In this paper, the authors focus on methods based on domain-domain interactions, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. In these methods, the probabilities of domain-domain interactions are inferred from known protein-protein interaction data and protein domain data, and then prediction of interactions is performed based on these probabilities and contents of domains of given proteins. This paper overviews several fundamental methods, which include association method, expectation maximization-based method, support vector machine-based method, linear programming-based method, and conditional random field-based method. This paper also reviews a simple evolutionary model of protein domains, which yields a scale-free distribution of protein domains. By combining with a domain-based protein interaction model, a scale-free distribution of protein-protein interaction networks is also derived.


2020 ◽  
Vol 21 (6) ◽  
pp. 454-463 ◽  
Author(s):  
Mst. Shamima Khatun ◽  
Watshara Shoombuatong ◽  
Md. Mehedi Hasan ◽  
Hiroyuki Kurata

Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identification of the PPIs is pivotal, which contributes to many biological processes including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to solve such restrictions. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational prediction of PPIs. Initially, we briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, working principles, and their performances. Finally, we discuss the caveats and future perspective of the next generation algorithms for the prediction of PPIs.


2019 ◽  
Vol 27 (01) ◽  
pp. 1-18
Author(s):  
YUANMIAO GUI ◽  
RUJING WANG ◽  
YUANYUAN WEI ◽  
XUE WANG

Protein–protein interaction (PPI) is very important for various biological processes and has given rise to a series of prediction-computing methods. In spite of different computing methods in relation to PPI prediction, PPI network projects fail to perform on a large scale. Aiming at ensuring that PPI can be predicted effectively, we used a deep neural network (DNN) for the study of PPI prediction that is based on an amino acid sequence. We present a novel DNN-PPI model with an auto covariance (AC) descriptor and a conjoint triad (CT) descriptor for the prediction of PPI that is based only on the protein sequence information. The 10-fold cross-validation indicated that the best DNN-PPI model with CT achieved 97.65% accuracy, 98.96% recall and a 98.51% area under the curve (AUC). The model exhibits a prediction accuracy of 94.20–97.10% for other external datasets. All of these suggest the high validity of the proposed algorithm in relation to various species.
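For readers unfamiliar with the conjoint triad (CT) descriptor mentioned in this abstract, the sketch below shows one common formulation. It is an illustration under stated assumptions, not the authors' code: the seven-class amino acid grouping and the (count - min) / max normalization are reproduced from memory of the usual CT definition and should be checked against the original CT publication, and the example sequences are hypothetical.

```python
from itertools import product

# Seven-class amino acid grouping commonly used for the CT descriptor
# (assumed here; verify against the original CT publication).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# Every ordered triple of classes gets one feature slot: 7 ** 3 = 343.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of normalized triad frequencies."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues along the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None not in window:  # skip windows containing unknown residues
            counts[TRIAD_INDEX[window]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(counts)
    return [(c - lo) / hi for c in counts]


# Hypothetical sequences; a protein pair is encoded by concatenation.
vec_a = conjoint_triad("AGVAGV")
vec_b = conjoint_triad("MKRDECILF")
pair_features = vec_a + vec_b
print(len(pair_features))
```

Concatenating the two vectors is only one way to represent a pair; the resulting 686 features depend on the order of the two proteins, which a model may need to account for.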


2021 ◽  
Vol 16 ◽  
Author(s):  
Fee Faysal Ahmed ◽  
Mst Shamima Khatun ◽  
Md. Parvez Mosharaf ◽  
Md. Nurul Haque Mollah

Background: Protein-protein interactions (PPIs) play a vital role in a wide range of biological processes, from cell-cell interactions to developmental control, in all organisms. However, experimental identification of PPIs is often laborious, time-consuming and costly compared to computational prediction. There are several computational prediction models in the literature based on complete training samples, but none of them deals with partial training samples. Objective: The objective of this work was to develop an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework. Methods: We proposed an effective computational PPI prediction model by combining a random forest (RF) classifier and autocorrelation (AC) sequence encoding features with a 1:2 ratio of positive-PPI and unknown-PPI samples. Results: We observed that the proposed prediction model produces the highest average performance scores of sensitivity (94.62%), AUC (0.92) and pAUC (0.189) with the training datasets and sensitivity (88.14%), AUC (0.89) and pAUC (0.176) with the test datasets of 5-fold cross-validation compared to other candidate predictors based on LDA, LOGI, ADA, NB, KNN and SVM classifiers. It also achieved the highest performance scores of TPR (91.82%) and pAUC (0.174) at FPR = 20% with AUC (0.948) compared to other candidate predictors. Conclusion: Overall performance of the developed model revealed that our proposed predictor might be useful to elucidate the biological function of unseen PPIs from a large number of candidate proteins in Arabidopsis thaliana.
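The partial AUC (pAUC) reported in this abstract restricts the area under the ROC curve to a low false-positive-rate region. The sketch below shows one way to compute it, assuming the unnormalized definition, under which the largest possible value at FPR = 0.2 is 0.2. The abstract does not state which definition the authors used, and the labels and scores here are hypothetical.

```python
def roc_points(labels, scores):
    """ROC curve as a list of (FPR, TPR) points, from the strictest
    threshold to the loosest; tied scores are added as one step."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the curve at the cut-off and stop there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for two interacting (1) and two non-interacting (0) pairs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.1]
curve = roc_points(labels, scores)
print(partial_auc(curve, 0.2), partial_auc(curve, 1.0))
```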


2012 ◽  
Vol 22 (1) ◽  
pp. 7-14
Author(s):  
Bui Phuong Thuy ◽  
Trinh Xuan Hoang

Proteins interact with one another, resulting in complex functions in living organisms. Like many other real-world networks, the networks of protein-protein interactions possess a certain degree of ordering, such as the scale-free property. The latter means that the probability $P$ of finding a protein that interacts with $k$ other proteins follows a power law, $P(k) \sim k^{-\gamma}$. Protein interaction networks (PINs) have been studied by using a stochastic model, the duplication-divergence model, which is based on mechanisms of gene duplication and divergence during evolution. In this work, we show that this model can be used to fit experimental data on the PIN of the yeast Saccharomyces cerevisiae at two different time instances simultaneously. Our study shows that the evolution of the PIN given by the model is consistent with growing experimental data over time, and that the scale-free property of the protein interaction network is robust against random deletion of interactions.
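A generic duplication-divergence process can be sketched in a few lines. The variant below is an assumption for illustration only, since the abstract does not specify the exact rules or parameter values used: at each step a random protein is duplicated, each inherited interaction is lost with probability `delta`, and the copy is linked to its parent with probability `p_link`. Both parameter names and their values are hypothetical.

```python
import random
from collections import Counter


def duplication_divergence(n_nodes, delta, p_link, seed=0):
    """Grow an undirected network by repeated node duplication.

    delta  : probability that the copy loses an inherited interaction
    p_link : probability that the copy interacts with its parent
    Copies that keep no interactions stay in the network as isolated nodes.
    """
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}  # start from a single interacting pair
    while len(neighbors) < n_nodes:
        parent = rng.choice(list(neighbors))
        child = len(neighbors)
        # Divergence: keep each inherited interaction with probability 1 - delta.
        kept = {nb for nb in neighbors[parent] if rng.random() > delta}
        if rng.random() < p_link:
            kept.add(parent)
        neighbors[child] = kept
        for nb in kept:
            neighbors[nb].add(child)
    return neighbors


network = duplication_divergence(n_nodes=200, delta=0.6, p_link=0.1)
degree_counts = Counter(len(nbs) for nbs in network.values())
for k in sorted(degree_counts):
    print(k, degree_counts[k])
```

Plotting the printed degree counts on log-log axes is the usual way to inspect whether $P(k)$ is compatible with a power law; this sketch makes no claim about the resulting exponent.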


Author(s):  
Yu-Miao Zhang ◽  
Jun Wang ◽  
Tao Wu

In this study, the Agrobacterium infection medium, infection duration, detergent, and cell density were optimized. The sorghum-based infection medium (SbIM), a 10-20 min infection time, the addition of 0.01% Silwet L-77, and the Agrobacterium optical density at 600 nm (OD600) improved the competence of onion epidermal cells to support Agrobacterium infection at >90% efficiency. Protein-protein interactions of cyclin-dependent kinase D-2 (CDKD-2) and cytochrome c-type biogenesis protein (CYCH) were localized. The optimized procedure is a quick and efficient system for examining protein subcellular localization and protein-protein interaction.


2019 ◽  
Vol 26 (21) ◽  
pp. 3890-3910 ◽  
Author(s):  
Branislava Gemovic ◽  
Neven Sumonja ◽  
Radoslav Davidovic ◽  
Vladimir Perovic ◽  
Nevena Veljkovic

Background: The significant number of protein-protein interactions (PPIs) discovered by harnessing concomitant advances in the fields of sequencing, crystallography, spectrometry and two-hybrid screening suggests astonishing prospects for remodelling drug discovery. The PPI space, which includes up to 650 000 entities, is a remarkable reservoir of potential therapeutic targets for every human disease. In order to allow modern drug discovery programs to leverage this, we should be able to discern complete PPI maps associated with a specific disorder and the corresponding normal physiology. Objective: Here, we review community-available computational programs for predicting PPIs and web-based resources for storing experimentally annotated interactions. Methods: We compared the capacities of the prediction tools iLoops, Struct2Net, HOMCOS, COTH, PrePPI, InterPreTS and PRISM to predict recently discovered protein interactions. Results: We described sequence-based and structure-based PPI prediction tools and addressed their peculiarities. Additionally, since the usefulness of prediction algorithms critically depends on the quality and quantity of the experimental data they are built on, we extensively discussed community resources for protein interactions. We focused on the active and recently updated primary and secondary PPI databases, repositories specialized by subject or species, as well as databases that include both experimental and predicted PPIs. Conclusion: PPI complexes are the basis of important physiological processes and, therefore, possible targets for cell-penetrating ligands. Reliable computational PPI predictions can speed up new target discoveries through prioritization of therapeutically relevant protein–protein complexes for experimental studies.

