Background:
Protein-protein interactions (PPIs) play a key role in various biological
processes. Many methods have been developed to predict protein-protein interactions and protein
interaction networks. However, many existing methods are limited because they rely on a large
number of homologous proteins and interaction marks.
Methods:
In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a
sequence-based feature representation for identifying protein-protein interactions. Our method first
constructs a sequence-based feature vector to represent each pair of proteins, via Multivariate Mutual
Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). Then, we feed the
638-dimensional features into an integrated learning model that classifies protein pairs as interacting
or non-interacting. This integrated model embeds Random Forest in the AdaBoost framework and turns
weak classifiers into a single strong classifier. We also employ double fault detection to
suppress over-adaptation during the training process.
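As an illustration of the sequence-based representation, the following is a minimal sketch of a normalized Moreau-Broto autocorrelation descriptor for one physicochemical property. It is not taken from the RF-Ada-DF code: the function names (`normalize_scale`, `nmbac`, `pair_features`), the choice of the Kyte-Doolittle hydropathy index as the property, and the lag values are all assumptions made for the example. The properties and maximum lag actually used to build the 638-dimensional vector are not stated here.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index, used here only as an example property
# scale (an assumption; the properties used by the authors may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize_scale(scale):
    """Standardize a property scale to zero mean and unit standard deviation."""
    values = list(scale.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return one autocorrelation value per lag in 1..max_lag.

    For lag d and sequence length n the value is
    (1 / (n - d)) * sum_i x[i] * x[i + d], where x holds the normalized
    property values of the residues. Non-standard residue letters raise
    a KeyError.
    """
    z = normalize_scale(scale)
    x = [z[aa] for aa in sequence]
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, scale, max_lag):
    """Concatenate the descriptors of both proteins into one pair vector."""
    return nmbac(seq_a, scale, max_lag) + nmbac(seq_b, scale, max_lag)


# Toy sequences, for illustration only.
example = pair_features("ACDEFGHIK", "LMNPQRSTVWY", KYTE_DOOLITTLE, max_lag=4)
print(len(example))
```

Repeating the computation for several properties and appending the MMI features would give a fixed-length vector per protein pair, whatever the sequence lengths, which is what a classifier such as Random Forest requires.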
Results:
To evaluate the performance of our method, we conduct several comprehensive tests for PPI
prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, an
accuracy increase of 0.57%. On the S. cerevisiae dataset, our method achieves
95.77% accuracy and 93.36% sensitivity, an accuracy increase of 0.76%. On the
Human dataset, our method achieves 98.16% accuracy and 96.80% sensitivity, an accuracy
increase of 0.6%. Experiments show that our method achieves better results than other
outstanding methods for sequence-based PPI prediction. The datasets and code are available at
https://github.com/guofei-tju/RF-Ada-DF.git.
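
For reference, accuracy and sensitivity are presumably computed with the usual confusion-matrix definitions. The sketch below assumes binary labels with 1 for an interacting pair and 0 for a non-interacting pair. The function name and the toy labels are hypothetical and are not taken from the released code.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Compute accuracy and sensitivity (recall on the positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity


# Toy labels, for illustration only.
acc, sens = accuracy_and_sensitivity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```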