Systems Biology Inferring edge function in protein-protein interaction networks

Abstract Motivation: Post-translational modifications (PTMs) regulate many key cellular processes. Numerous studies have linked the topology of protein-protein interaction (PPI) networks to many biological phenomena such as key regulatory processes and disease. However, these methods fail to give insight into the functional nature of these interactions. On the other hand, pathways are commonly used to gain biological insight into the function of PPIs in the context of cascading interactions, sacrificing the coverage of networks for rich functional annotations on each PPI. We present a machine learning approach that uses Gene Ontology, InterPro and Pfam annotations to infer the edge functions in PPI networks, allowing us to combine the high coverage of networks with the information richness of pathways. Results: An ensemble method combining Logistic Regression and Random Forest classifiers, trained on a high-quality set of annotated interactions with a total of 18 unique labels, achieves a high average F1 score of 0.88 despite not taking advantage of multi-label dependencies. When applied to the human interactome, our method confidently classifies 62% of interactions at a probability of 0.7 or higher. Availability: Software and data are available at https://github.com/DavisLaboratory/pyPPI Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
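The abstract does not say how the Logistic Regression and Random Forest outputs are combined, so the following is only a minimal sketch that assumes soft voting (averaging per-label probabilities) and then keeps labels at or above the 0.7 probability cut-off mentioned in the results. The label names and probability values are hypothetical.

```python
def soft_vote(prob_a, prob_b):
    """Average two classifiers' per-label probabilities (assumed soft voting)."""
    return {label: (prob_a[label] + prob_b[label]) / 2 for label in prob_a}


def confident_labels(probs, threshold=0.7):
    """Return the labels whose combined probability reaches the threshold."""
    return sorted(label for label, p in probs.items() if p >= threshold)


# Hypothetical per-label probabilities for a single interaction.
# Each label is scored independently, with no multi-label dependencies.
lr_probs = {"phosphorylation": 0.9, "activation": 0.6, "inhibition": 0.1}
rf_probs = {"phosphorylation": 0.8, "activation": 0.7, "inhibition": 0.2}

combined = soft_vote(lr_probs, rf_probs)
print(confident_labels(combined))
```

Scoring each label on its own mirrors the "no multi-label dependencies" remark; a chained or structured classifier would be a different design.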

Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique

Bioinformatics ◽

10.1093/bioinformatics/bty995 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2395-2402 ◽

Cited By ~ 39

Author(s):

Xiaoying Wang ◽

Bin Yu ◽

Anjun Ma ◽

Cheng Chen ◽

Bingqiang Liu ◽

...

Keyword(s):

Ensemble Learning ◽

Protein Interaction ◽

Feature Selection Method ◽

Feature Space ◽

Supplementary Information ◽

Sequence Profile ◽

Protein Protein Interaction ◽

Interaction Sites ◽

Imbalance Problem ◽

Ppi Networks

Abstract Motivation The prediction of protein–protein interaction (PPI) sites is key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the great abundance of sequences and the imbalance issue in samples. Results A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2–15.7% and 6.1–18.9% higher than the other existing tools, respectively. Availability and implementation The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. Supplementary information Supplementary data are available at Bioinformatics online.
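To illustrate the oversampling step, here is a minimal pure-Python sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. It is not the EL-SMURF code; the function name, the parameter defaults and the toy feature vectors are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and its neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for interface residues (the minority class).
interface = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(interface, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies in feature space.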

Heuristic Modularity for Complex Identification in Protein-Protein Interaction Networks

Iraqi Journal of Science ◽

10.24996/ijs.2019.60.8.22 ◽

2019 ◽

pp. 1846-1859

Author(s):

Amenah H. H. Abdulateef ◽

Bara'a A. Attea ◽

Ahmed N. Rashid

Keyword(s):

Protein Interaction ◽

Protein Complexes ◽

Building Blocks ◽

Detection Accuracy ◽

Protein Protein Interaction ◽

Protein Levels ◽

Cellular Processes ◽

Ppi Networks ◽

Protein Protein Interaction Networks ◽

Different Levels

Because of its significant role in understanding cellular processes, the decomposition of Protein-Protein Interaction (PPI) networks into essential building blocks, or complexes, has received much attention in functional bioinformatics research in recent years. One of the well-known bi-clustering descriptors for identifying communities and complexes in complex networks, such as PPI networks, is the modularity function. The contribution of this paper is to introduce heuristic optimization models that can collaborate with the modularity function to improve its detection ability. The definitions of the formulated heuristics are based on nodes and different levels of their neighbor properties. The modularity function and the formulated heuristics are then injected into the mechanism of a single-objective Evolutionary Algorithm (EA) tailored specifically to tackle the problem, and thus to identify possible complexes from PPI networks. In the experiments, different overlapping scores are used to evaluate the detection accuracy at both the complex and protein levels. According to the evaluation metrics, the results reveal that the introduced heuristics are able to improve the accuracy of the existing modularity function when identifying protein complexes in the tested PPI networks.
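For readers unfamiliar with the modularity function, the sketch below computes the standard Newman-Girvan modularity of a partition, Q = sum over communities of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The abstract does not give the exact formulation the paper uses, so this variant, the function name and the toy network are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity for an undirected, unweighted graph without self-loops."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy network: two triangles joined by a single bridging edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_complexes = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_complex = {node: 0 for node in two_complexes}

print(round(modularity(edges, two_complexes), 4))
print(round(modularity(edges, one_complex), 4))
```

Splitting the toy network into its two triangles scores higher than lumping every node together, which is the signal an evolutionary search over partitions would try to maximize.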

Spectral clustering for detecting protein complexes in protein–protein interaction (PPI) networks

Mathematical and Computer Modelling ◽

10.1016/j.mcm.2010.06.015 ◽

2010 ◽

Vol 52 (11-12) ◽

pp. 2066-2074 ◽

Cited By ~ 25

Author(s):

Guimin Qin ◽

Lin Gao

Keyword(s):

Protein Interaction ◽

Spectral Clustering ◽

Protein Complexes ◽

Protein Protein Interaction ◽

Ppi Networks

DISTANCE-WISE PATHWAY DISCOVERY FROM PROTEIN–PROTEIN INTERACTION NETWORKS WEIGHTED BY SEMANTIC SIMILARITY

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014500048 ◽

2014 ◽

Vol 12 (01) ◽

pp. 1450004 ◽

Cited By ~ 3

Author(s):

SLAVKA JAROMERSKA ◽

PETR PRAUS ◽

YOUNG-RAE CHO

Keyword(s):

Signaling Pathways ◽

Semantic Similarity ◽

Protein Interaction ◽

Protein Pair ◽

Semantic Analysis ◽

Topological Analysis ◽

Cellular Mechanisms ◽

Semantic Similarity Measure ◽

Protein Protein Interaction ◽

Ppi Networks

Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins to perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein–protein interaction (PPI) networks. However, predicting directed pathways from the undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
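The abstract does not specify the distance condition or the local heuristic, so the sketch below is a stand-in rather than the authors' algorithm: it runs plain Dijkstra shortest-path search on a similarity-weighted PPI network, taking each edge's distance to be 1 minus its semantic similarity. The protein identifiers and similarity values are hypothetical.

```python
import heapq


def shortest_semantic_path(similarity, start, end):
    """Dijkstra search where edge distance is assumed to be 1 - semantic similarity."""
    graph = {}
    for (u, v), s in similarity.items():
        graph.setdefault(u, []).append((v, 1.0 - s))
        graph.setdefault(v, []).append((u, 1.0 - s))
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == end:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if end not in best:
        return None  # no path connects the two proteins
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical semantic similarities for interacting protein pairs.
similarity = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.2, ("C", "D"): 0.9}
path = shortest_semantic_path(similarity, "A", "D")
print(path)
```

The direct A to C edge is skipped because its low similarity makes it more costly than the detour through B, which shows how semantic weighting can change the path chosen on an otherwise undirected network.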

A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

10.21203/rs.3.rs-537545/v1 ◽

2021 ◽

Author(s):

Zhihong Zhang ◽

Sai Hu ◽

Wei Yan ◽

Bihai Zhao ◽

Lei Wang

Keyword(s):

Protein Interaction ◽

Matrix Factorization ◽

Biological Data ◽

Protein Domain ◽

Biological Information ◽

Ppi Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Non Negative Matrix Factorization

Abstract Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that a large number of false negatives and false positives exist in PPI data. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we proposed a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage in this process was to construct a weighted PPI network by combining the information of protein domain, protein complex and the topology characteristic of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with whole enough weight of edges. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated with homology and subcellular localization information. In order to verify the effectiveness of the NDM method, we compared NDM with other state-of-the-art essential protein prediction methods. The comparison of the results obtained from different methods indicated that our NDM model has better performance in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which led to improved performance in essential protein identification. This will also provide a new perspective for other predictions based on protein-protein interaction networks.
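As an illustration of the final ranking stage only, the sketch below runs PageRank by power iteration on a weighted, undirected network. How NDM uses its initial scores is not detailed in the abstract; here they are assumed to serve as the restart distribution. The function name, damping factor, edge weights and initial scores are all hypothetical, and the matrix factorization stage is not shown.

```python
def weighted_pagerank(weights, initial, damping=0.85, iterations=100):
    """Power-iteration PageRank on a connected, undirected, positively weighted graph."""
    nodes = sorted(initial)
    total = sum(initial.values())
    # Assumption: the normalised initial scores act as the restart distribution.
    restart = {n: initial[n] / total for n in nodes}
    neighbours = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    strength = {n: sum(neighbours[n].values()) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour passes on its score in proportion to the edge weight.
            incoming = sum(scores[u] * w / strength[u] for u, w in neighbours[n].items())
            updated[n] = (1 - damping) * restart[n] + damping * incoming
        scores = updated
    return scores


# Hypothetical weighted PPI network: a hub protein H with three partners.
weights = {("H", "A"): 1.0, ("H", "B"): 1.0, ("H", "C"): 1.0}
initial = {"H": 1.0, "A": 1.0, "B": 1.0, "C": 1.0}
scores = weighted_pagerank(weights, initial)
print(max(scores, key=scores.get))
```

With uniform initial scores the hub ranks first purely through topology; changing the initial scores shifts the restart mass and therefore the final ranking.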

PPInfer: a Bioconductor package for inferring functionally related proteins using protein interaction networks

F1000Research ◽

10.12688/f1000research.12947.1 ◽

2017 ◽

Vol 6 ◽

pp. 1969

Author(s):

Dongmin Jung ◽

Xijin Ge

Keyword(s):

Protein Interaction ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Biological Processes ◽

Bioconductor Package ◽

Biological Functions ◽

Ppi Network ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Related Proteins

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).

PPInfer: a Bioconductor package for inferring functionally related proteins using protein interaction networks

F1000Research ◽

10.12688/f1000research.12947.3 ◽

2018 ◽

Vol 6 ◽

pp. 1969 ◽

Cited By ~ 3

Author(s):

Dongmin Jung ◽

Xijin Ge

Keyword(s):

Protein Interaction ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Biological Processes ◽

Bioconductor Package ◽

Biological Functions ◽

Ppi Network ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Related Proteins

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).

Extracting Biological Significant Subnetworks from Protein-Protein Interactions Induced by Differentially Expressed Genes of HIV-1 Vpr Variants

International Journal of System Dynamics Applications ◽

10.4018/ijsda.2015100103 ◽

2015 ◽

Vol 4 (4) ◽

pp. 35-51 ◽

Cited By ~ 1

Author(s):

Bandana Barman ◽

Anirban Mukhopadhyay

Keyword(s):

Differentially Expressed Genes ◽

Protein Interaction ◽

Protein Interactions ◽

Protein Interaction Network ◽

Interaction Network ◽

Differentially Expressed ◽

Wild Type ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Hiv 1

Identifying protein interaction networks is very important for finding the cell signaling pathway for a particular disease. The authors identified the differentially expressed genes between two sample groups of HIV-1: wild-type HIV-1 Vpr and mutant HIV-1 Vpr. They performed a statistical t-test and computed the false discovery rate (FDR) to identify genes with increased expression (up-regulated) or decreased expression (down-regulated). In the test, the authors computed q-values to identify the minimum FDR that occurs. As a result, they found 172 differentially expressed genes between the wild-type HIV-1 Vpr and the HIV-1 mutant Vpr, R80A: 68 up-regulated genes and 104 down-regulated genes. From the 172 differentially expressed genes, the authors built a protein-protein interaction network with string-db and then clustered the PPI network into subnetworks with cytoscape3.0. Lastly, the authors studied the significance of the subnetworks using gene ontology and also studied the KEGG pathways of those subnetworks.
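The abstract does not name the FDR procedure used to obtain the q-values. As a minimal sketch, the code below assumes the widely used Benjamini-Hochberg adjustment, in which the i-th smallest of n p-values is scaled by n / i and the results are made monotone from the largest rank downwards. The p-values are hypothetical, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical t-test p-values for four genes.
pvalues = [0.01, 0.04, 0.03, 0.005]
qvalues = benjamini_hochberg(pvalues)
significant = [i for i, q in enumerate(qvalues) if q <= 0.05]
print(significant)
```

Genes whose adjusted value falls at or below the chosen FDR level (0.05 here, also an assumption) would be called differentially expressed; the sign of the expression change then separates up-regulated from down-regulated genes.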

Dynamics of Protein-Protein Interaction Network in Plasmodium Falciparum

Biological Data Mining in Protein Interaction Networks ◽

10.4018/978-1-60566-398-2.ch015 ◽

2009 ◽

pp. 257-284

Author(s):

Smita Mohanty ◽

Shashi Bhushan Pandit ◽

Narayanaswamy Srinivasan

Keyword(s):

Plasmodium Falciparum ◽

Protein Interaction ◽

Protein Interaction Network ◽

Cellular Localization ◽

Interaction Network ◽

Malarial Parasite ◽

Strategic Integration ◽

Protein Protein Interaction ◽

Cellular Processes ◽

Expression Of Genes

Integration of organism-wide protein interactome data with information on expression of genes, cellular localization of proteins and their functions has proved extremely useful in developing biologically intuitive interaction networks. This chapter highlights the dynamics of the protein interaction network across different stages in the lifecycle of Plasmodium falciparum, a malarial parasite, and the implications of the network dynamics for different physiological processes. The main focus of the chapter is the integration of information on experimentally derived interactions of P. falciparum proteins with expression data and analysis of the implications of interactions in different cellular processes. Extensive analysis has been made to quantify the interaction dynamics across various stages, as well as to correlate it with the dynamics of the cellular pathways involving the interacting proteins. The authors’ analysis demonstrates the power of strategic integration of genome-wide datasets in extracting information on dynamics of biological pathways and processes.

Multifaceted protein–protein interaction prediction based on Siamese residual RCNN

Bioinformatics ◽

10.1093/bioinformatics/btz328 ◽

2019 ◽

Vol 35 (14) ◽

pp. i305-i314 ◽

Cited By ~ 27

Author(s):

Muhao Chen ◽

Chelsea J -T Ju ◽

Guangyu Zhou ◽

Xuelu Chen ◽

Tianran Zhang ◽

...

Keyword(s):

Protein Interaction ◽

Mutual Influence ◽

State Of The Art ◽

Supplementary Information ◽

Interaction Type ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Protein Interaction Prediction ◽

Ppi Prediction ◽

Limited Coverage

Abstract Motivation: Sequence-based protein–protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI predictions using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in the Siamese architecture, which leverages both robust local features and contextualized information, which are significant for capturing the mutual influence of protein sequences. PIPR relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short. Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git. Supplementary information: Supplementary data are available at Bioinformatics online.
