Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique

2018 ◽  
Vol 35 (14) ◽  
pp. 2395-2402 ◽  
Author(s):  
Xiaoying Wang ◽  
Bin Yu ◽  
Anjun Ma ◽  
Cheng Chen ◽  
Bingqiang Liu ◽  
...  

Abstract. Motivation: The prediction of protein–protein interaction (PPI) sites is key to mutation design, catalytic reactions and the reconstruction of PPI networks. It is a challenging task given the large abundance of sequences and the class imbalance in the samples. Results: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were used to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. Performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2–15.7% and 6.1–18.9% higher than those of other existing tools, respectively. Availability and implementation: The source code and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. Supplementary information: Supplementary data are available at Bioinformatics online.
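The two preprocessing steps named in this abstract, sliding-window feature extraction and SMOTE oversampling, can be illustrated with a minimal pure-Python sketch. This is not the EL-SMURF code: the function names, the zero-padding at chain ends, the window size and the neighbour count `k` are all assumptions made for illustration.

```python
import random


def sliding_window_features(profile, window=3):
    """Concatenate the feature rows of each residue and its neighbours.

    `profile` holds one equal-length feature row per residue; `window`
    is assumed odd. Positions past either chain end are zero-padded
    (an assumption, not necessarily what EL-SMURF does).
    """
    half = window // 2
    pad = [0.0] * len(profile[0])
    features = []
    for i in range(len(profile)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(row)
    return features


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical toy data: three residues with two features each.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windowed = sliding_window_features(toy_profile, window=3)

# Hypothetical interface (minority) residues in feature space.
interface = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote(interface, n_new=5)
```

Each synthetic point lies on a line segment between two real minority samples, so oversampling happens in feature space rather than by duplicating residues.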

2018 ◽  
Author(s):  
Daniel Esposito ◽  
Joseph Cursons ◽  
Melissa Davis

Abstract. Motivation: Post-translational modifications (PTMs) regulate many key cellular processes. Numerous studies have linked the topology of protein-protein interaction (PPI) networks to many biological phenomena such as key regulatory processes and disease. However, these methods fail to give insight into the functional nature of these interactions. On the other hand, pathways are commonly used to gain biological insight into the function of PPIs in the context of cascading interactions, sacrificing the coverage of networks for rich functional annotations on each PPI. We present a machine learning approach that uses Gene Ontology, InterPro and Pfam annotations to infer the edge functions in PPI networks, allowing us to combine the high coverage of networks with the information richness of pathways. Results: An ensemble method combining Logistic Regression and Random Forest classifiers, trained on a high-quality set of annotated interactions with a total of 18 unique labels, achieves a high average F1 score of 0.88 despite not taking advantage of multi-label dependencies. When applied to the human interactome, our method confidently classifies 62% of interactions at a probability of 0.7 or higher. Availability: Software and data are available at https://github.com/DavisLaboratory/pyPPI. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
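As a rough illustration of how two classifiers' per-label probabilities might be combined and then filtered at the 0.7 confidence level mentioned above, the sketch below averages the two probability sets and keeps labels at or above the threshold. The abstract does not say how the two models are combined, so the simple averaging is an assumption, as are the function name, the label names and the probability values.

```python
def ensemble_predict(prob_a, prob_b, threshold=0.7):
    """Average two classifiers' per-label probabilities and return the
    averaged scores plus the labels that reach the threshold.

    Simple averaging is an assumed combination rule; each label is
    scored independently, ignoring multi-label dependencies.
    """
    averaged = {label: (prob_a[label] + prob_b[label]) / 2 for label in prob_a}
    confident = sorted(label for label, p in averaged.items() if p >= threshold)
    return averaged, confident


# Hypothetical probabilities for one interaction from the two models.
prob_logreg = {"activation": 0.9, "inhibition": 0.2, "binding": 0.6}
prob_forest = {"activation": 0.8, "inhibition": 0.4, "binding": 0.7}

scores, labels = ensemble_predict(prob_logreg, prob_forest)
print(labels)  # prints ['activation']
```

Raising the threshold trades coverage for confidence: fewer interactions receive a label, but each assigned label has stronger support from both models.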


Author(s):  
Yasanthi Hirimutugoda

Proteins are the workhorses of the cell that perform biological functions by interacting with other proteins. Many statistical methods for protein-protein interaction (PPI) have been studied without considering time-dependent changes in the networks and their functions. I introduce a novel method that models PPI networks as dynamic in nature, evolving according to a time-varying multivariate distribution, using Conditional Random Fields (CRF). This research is directed towards implementing this new combinatorial algorithm on massively parallel architectures such as Graphics Processing Units (GPUs) for efficient computation on large-scale bioinformatics datasets. I compared Conditional Random Fields (CRF) with the proposed novel method, which combines CRF with the Block Coordinate Descent algorithm, on a human protein-protein interaction data set. Both are implemented on a GPU-accelerated computing architecture, and the proposed method showed advantages in predicting protein-protein interaction sites. I also show that the proposed approach is 6.13% more efficient than standalone CRF++ in predicting protein-protein interaction sites.


2014 ◽  
Vol 12 (01) ◽  
pp. 1450004 ◽  
Author(s):  
SLAVKA JAROMERSKA ◽  
PETR PRAUS ◽  
YOUNG-RAE CHO

Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins to perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein–protein interaction (PPI) networks. However, predicting directed pathways from undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
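The general idea of building a path step by step under a distance condition can be sketched as below. This is a simplified stand-in, not the authors' algorithm: the distance condition here is "move one hop closer to the ending protein", and the local heuristic is reduced to "take the heaviest such edge". The graph layout, protein names and weights are hypothetical, and the graph is assumed to be undirected and stored symmetrically.

```python
from collections import deque


def hop_distances(graph, target):
    """Breadth-first hop counts from every reachable node to `target`."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def greedy_pathway(graph, source, target):
    """Walk from source to target, at each step following the heaviest
    edge among neighbours that are one hop closer to the target.

    Returns None when the target cannot be reached from the source.
    """
    dist = hop_distances(graph, target)
    if source not in dist:
        return None
    path = [source]
    while path[-1] != target:
        here = path[-1]
        closer = [n for n in graph[here] if dist[n] == dist[here] - 1]
        path.append(max(closer, key=lambda n: graph[here][n]))
    return path


# Hypothetical undirected PPI network; weights stand in for semantic
# similarity scores between interacting proteins.
ppi = {
    "A": {"B": 0.9, "C": 0.4},
    "B": {"A": 0.9, "D": 0.3},
    "C": {"A": 0.4, "D": 0.8},
    "D": {"B": 0.3, "C": 0.8, "E": 0.7},
    "E": {"D": 0.7},
}

print(greedy_pathway(ppi, "A", "E"))  # prints ['A', 'B', 'D', 'E']
```

The greedy choice is purely local: from A it follows the stronger A-B edge even though the route through C has a higher total weight, which is why a heuristic that estimates the strength of the whole hypothetical path, as the abstract describes, can matter.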

