Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence

Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, they have inevitable drawbacks such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation that applies the discrete cosine transform (DCT) to the substitution matrix representation (SMR) and from using a weighted sparse representation based classifier (WSRC). When performed on the PPI datasets of Yeast, Human, and H. pylori, the method achieved average accuracies as high as 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results indicate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict PPIs in five other species datasets.
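The DCT step in this abstract can be illustrated with a minimal sketch. It assumes the protein has already been encoded as a numeric matrix (for SMR, one row per residue); the function names, the orthonormal DCT-II scaling, and the choice to keep a square top-left block of low-frequency coefficients are illustrative assumptions, not details taken from the paper.

```python
import math

def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out

def dct_2d(matrix):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def low_frequency_features(matrix, keep):
    """Flatten the top-left keep x keep block of DCT coefficients.

    `keep` is a hypothetical parameter; it must not exceed either
    dimension of the matrix.
    """
    coeffs = dct_2d(matrix)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]
```

Because only a fixed block of coefficients is kept, proteins of different lengths map to feature vectors of the same size, which is what a downstream classifier needs.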

Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information

Scientific Reports ◽

10.1038/s41598-021-96265-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yang Li ◽

Zheng Wang ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Wen-Zhun Huang ◽

...

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Large Scale ◽

False Positive Rate ◽

Computational Method ◽

Evolutionary Information ◽

Local Alignment ◽

Protein Interaction Data ◽

Sequence Information ◽

Protein Protein Interactions

Abstract: Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is expensive in both time and labor and carries the risk of a high false positive rate. In search of a more efficient solution, the biology community is looking for computational methods to quickly discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to the PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.

Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks

Current Gene Therapy ◽

10.2174/1566523219666190917155959 ◽

2019 ◽

Vol 19 (4) ◽

pp. 232-241 ◽

Cited By ~ 5

Author(s):

Xuegong Chen ◽

Wanwan Shi ◽

Lei Deng

Keyword(s):

Protein Interactions ◽

Experimental Studies ◽

Treatment Strategies ◽

Computational Method ◽

Biological Information ◽

Support Vector ◽

Protein Protein Interactions ◽

Efficient Treatment ◽

Disease Associations ◽

Previous State

Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.

INFERRING PROTEIN-PROTEIN INTERACTIONS FROM MESSENGER RNA EXPRESSION PROFILES WITH SVM

Journal of Biological Systems ◽

10.1142/s0218339005001525 ◽

2005 ◽

Vol 13 (03) ◽

pp. 287-298 ◽

Cited By ~ 1

Author(s):

JUN CAI ◽

YING HUANG ◽

LIANG JI ◽

YANDA LI

Keyword(s):

High Throughput ◽

Protein Interactions ◽

Messenger Rna ◽

Expression Profiles ◽

Support Vector ◽

Svm Classifier ◽

Good Prediction ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

High Throughput Experiments

In post-genomic biology, researchers in the field of proteomics focus their attention on the networks of protein interactions that control the lives of cells and organisms. Protein-protein interactions play a useful role in dynamic cellular machinery. In this paper, we developed a method to infer protein-protein interactions based on the theory of the support vector machine (SVM). For a given pair of proteins, a new strategy of calculating the cross-correlation function of mRNA expression profiles was used to encode SVM vectors. We compared the performance with other methods of inferring protein-protein interactions. Results from five-fold cross-validation suggested that our SVM model achieved good prediction performance. This shows that expression profiles at the transcription level, as well as sequence contents, can be used to distinguish physical or functional interactions of proteins. Lastly, we applied our SVM classifier to evaluate the data quality of interaction data sets from four high-throughput experiments. The results show that high-throughput experiments sacrifice some accuracy in the determination of interactions because of the limitations of experimental technologies.
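As a rough illustration of encoding a protein pair from expression data, the sketch below computes a normalized cross-correlation between two equal-length mRNA expression profiles at several lags and uses the values as a feature vector. The abstract does not give the exact encoding, so the normalization, the lag window, and the names `cross_correlation`, `encode_pair`, and `max_lag` are assumptions.

```python
import math

def cross_correlation(x, y, lag):
    """Normalized cross-correlation of two equal-length profiles at one lag."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0:
        return 0.0  # a flat profile carries no correlation signal
    if lag >= 0:
        pairs = zip(dx[:n - lag], dy[lag:])
    else:
        pairs = zip(dx[-lag:], dy[:n + lag])
    return sum(a * b for a, b in pairs) / denom

def encode_pair(x, y, max_lag):
    """Feature vector for a protein pair: correlations at lags -max_lag..max_lag."""
    return [cross_correlation(x, y, lag) for lag in range(-max_lag, max_lag + 1)]
```

At lag 0 this reduces to the Pearson correlation of the two profiles; the nonzero lags let a classifier pick up co-expression that is shifted in time.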

ACT-SVM: Prediction of Protein-Protein Interactions Based on Support Vector Basis Model

Scientific Programming ◽

10.1155/2020/8866557 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Wenzheng Ma ◽

Yi Cao ◽

Wenzheng Bao ◽

Bin Yang ◽

Yuehui Chen

Keyword(s):

Protein Interactions ◽

Prediction Accuracy ◽

Support Vector ◽

Svm Classifier ◽

Protein Protein Interactions ◽

Svm Model ◽

Novel Method ◽

H Pylori ◽

Almost All ◽

Human Dataset

Interactions between proteins play important roles in organisms and are involved in almost all activities in the cell. Research on protein-protein interactions (PPIs) can make a huge contribution to the prevention and treatment of diseases. Currently, many machine learning methods have been proposed to predict PPIs. In this article, we propose a novel method, ACT-SVM, that can effectively predict PPIs. The ACT-SVM model maps protein sequences to numerical features, performs feature extraction twice on the protein sequence to obtain vector A and descriptor CT, and combines them into a single vector. Then, the feature vectors of the protein pair are merged as the input of the support vector machine (SVM) classifier. We use nonredundant H. pylori and human datasets to verify the prediction performance of our method. The proposed method has a prediction accuracy of 0.727897 on the H. pylori dataset and 0.838799 on the human dataset. The results demonstrate that this method is a stable and reliable prediction model for PPIs.
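The abstract does not expand "CT". Assuming it refers to the widely used conjoint triad descriptor, a minimal sketch looks like the following: residues are mapped to seven classes, every window of three consecutive residues is counted, and the 343 counts are normalized. The class grouping and the min-max normalization are assumptions here, and vector A is not sketched because the abstract does not define it.

```python
from itertools import product

# Seven residue classes commonly used for the conjoint triad descriptor
# (assumed; the abstract does not list them).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(7), repeat=3))  # 7^3 = 343 possible triads

def conjoint_triad(sequence):
    """Return a 343-dimensional, min-max normalized triad frequency vector."""
    counts = dict.fromkeys(TRIADS, 0)
    # Residues outside the 20 standard amino acids are skipped.
    encoded = [CLASS_OF[aa] for aa in sequence if aa in CLASS_OF]
    for i in range(len(encoded) - 2):
        counts[tuple(encoded[i:i + 3])] += 1
    values = [counts[t] for t in TRIADS]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def pair_features(seq_a, seq_b):
    """Concatenate the descriptors of both proteins into one classifier input."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

Concatenation makes the pair representation order-dependent, so in practice a pair is often presented in both orders or the two vectors are combined symmetrically.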

Detecting Protein-Protein Interactions with a Novel Matrix-Based Protein Sequence Representation and Support Vector Machines

BioMed Research International ◽

10.1155/2015/867516 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 26

Author(s):

Zhu-Hong You ◽

Jianqiang Li ◽

Xin Gao ◽

Zhou He ◽

Lin Zhu ◽

...

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Molecular Mechanisms ◽

Computational Approach ◽

Support Vector ◽

Biological Processes ◽

Protein Protein Interactions ◽

Technological Advances ◽

Vector Machines ◽

Sequence Representation

Proteins and their interactions lie at the heart of most underlying biological processes. Consequently, correct detection of protein-protein interactions (PPIs) is of fundamental importance for understanding the molecular mechanisms in biological systems. Although advances in high-throughput experimental technology make it possible to detect a large number of PPIs, the data generated through these methods is unreliable and may not include all possible PPIs. To address this problem, this study develops a novel computational approach to effectively detect protein interactions. The approach is based on a novel matrix-based representation of the protein sequence combined with the support vector machine (SVM) algorithm, and it fully considers the sequence order and dipeptide information of the protein primary sequence. When performed on yeast PPI datasets, the proposed method reaches 90.06% prediction accuracy with 94.37% specificity at a sensitivity of 85.74%, indicating that this predictor is a useful tool for predicting PPIs. The results also demonstrate that our approach can be a helpful supplement to experimentally detected interactions.
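The three reported figures are related through the confusion matrix. The sketch below uses a hypothetical balanced test set (10,000 interacting and 10,000 non-interacting pairs, with counts chosen only to reproduce the quoted sensitivity and specificity) to show that, when the classes are balanced, accuracy is the mean of sensitivity and specificity. The balance assumption and the counts are not taken from the paper.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of interacting pairs recovered
    specificity = tn / (tn + fp)   # fraction of non-interacting pairs rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts for a balanced test set, matching the quoted rates.
acc, sens, spec = confusion_metrics(tp=8574, fn=1426, tn=9437, fp=563)
```

Under these assumed counts the accuracy is (8574 + 9437) / 20000 = 0.90055, which agrees with the reported 90.06% up to rounding.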

Prediction of Protein-Protein Interactions from Protein Sequences by Combining MatPCA Feature Extraction Algorithms and Weighted Sparse Representation Models

Mathematical Problems in Engineering ◽

10.1155/2020/5764060 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Zheng Wang ◽

Yang Li ◽

Zhu-Hong You ◽

Li-Ping Li ◽

Xin-Ke Zhan ◽

...

Keyword(s):

Feature Extraction ◽

Sparse Representation ◽

Protein Interactions ◽

Biological Activities ◽

Vital Role ◽

Computational Method ◽

Sequence Information ◽

Protein Protein Interactions ◽

Sparse Representation Classifier ◽

H Pylori

Identifying protein-protein interactions (PPIs) plays a vital role in a number of biological activities such as signal transduction, transcriptional regulation, and apoptosis. Although advances in high-throughput technologies have generated large amounts of PPI data for different species, they only cover a small part of the entire PPI network. Furthermore, traditional experimental methods are generally expensive, time-consuming, tedious, and prone to high false-positive rates. Therefore, to overcome this problem, it is necessary to develop a novel computational method for predicting PPIs. In this article, we propose an efficient computational method to detect protein-protein interactions using only protein sequence information, which integrates the MatPCA feature extraction algorithm and the weighted sparse representation classifier. As a result, when predicting PPIs on yeast, human, and H. pylori datasets, the proposed method achieves superior prediction performance with an average accuracy of 94.55%, 97.48%, and 83.64%, respectively. These experimental results further illustrate that the proposed method is reliable and robust in predicting PPIs, which can be regarded as a useful complement to the experimental method.
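To make the weighted sparse representation classifier more concrete, here is a small sketch of the general idea: the query is coded as a sparse combination of training samples under a weighted L1 penalty, and the class whose samples reconstruct it best wins. The solver (plain iterative soft thresholding), the Gaussian-style locality weights that penalize distant samples more, and the parameters `lam`, `sigma`, and `iters` are all assumptions for illustration; the paper's exact formulation may differ.

```python
import math

def soft_threshold(value, threshold):
    """Shrink a value toward zero by `threshold` (proximal step for the L1 penalty)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def wsrc_predict(train, labels, query, lam=0.1, sigma=1.0, iters=500):
    """Classify `query` by weighted sparse coding over the training samples."""
    dim, n = len(query), len(train)
    # Locality weights (assumed form): samples far from the query get a larger penalty.
    weights = [math.exp(sum((q - t) ** 2 for q, t in zip(query, s)) / (2 * sigma ** 2))
               for s in train]
    # Step size from the squared Frobenius norm, an upper bound on the Lipschitz constant.
    step = 1.0 / sum(v * v for s in train for v in s)
    coef = [0.0] * n
    for _ in range(iters):
        recon = [sum(coef[j] * train[j][d] for j in range(n)) for d in range(dim)]
        resid = [recon[d] - query[d] for d in range(dim)]
        for j in range(n):
            grad = sum(resid[d] * train[j][d] for d in range(dim))
            coef[j] = soft_threshold(coef[j] - step * grad, step * lam * weights[j])
    # Assign the class whose own coefficients give the smallest reconstruction error.
    best_label, best_err = None, float("inf")
    for label in sorted(set(labels)):
        recon = [sum(coef[j] * train[j][d] for j in range(n) if labels[j] == label)
                 for d in range(dim)]
        err = math.sqrt(sum((query[d] - recon[d]) ** 2 for d in range(dim)))
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

On a toy set such as `train = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` with `labels = [0, 0, 1, 1]`, a query close to the first two samples is reconstructed almost entirely from class 0, so that class has the smaller residual.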

An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions

Evolutionary Bioinformatics ◽

10.1177/1176934319879920 ◽

2019 ◽

Vol 15 ◽

pp. 117693431987992 ◽

Cited By ~ 1

Author(s):

Ji-Yong An ◽

Yong Zhou ◽

Yu-Jun Zhao ◽

Zi-Ji Yan

Keyword(s):

Feature Extraction ◽

Protein Interactions ◽

Functional Organization ◽

Extraction Methods ◽

Amino Acid Sequences ◽

Evolutionary Information ◽

Support Vector ◽

Svm Classifier ◽

Protein Protein Interactions ◽

Local Coding

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifiers to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combines a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on the PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporates global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key features by employing the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were acquired by using the multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we also compared the proposed method with previous sequence-based approaches on the yeast datasets by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods.
Conclusion: The results show that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .
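The two feature extractors named in the Method section can be sketched on a generic scoring matrix (one row per residue, one column per amino acid type). This is a hedged illustration only: the bigram feature follows the common definition of summing products of scores at adjacent positions, the grouping into contiguous equal segments is an assumption, and the local coding step that produces the CPSSM is not reproduced because the abstract does not describe it.

```python
def bigram_features(pssm):
    """Bigram feature: entry (m, n) sums pssm[i][m] * pssm[i + 1][n] over positions i."""
    width = len(pssm[0])
    feats = [0.0] * (width * width)
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                feats[m * width + n] += cur[m] * nxt[n]
    return feats

def local_average_groups(pssm, groups):
    """Average the rows inside `groups` contiguous segments (assumed grouping).

    Requires at least `groups` rows so that no segment is empty.
    """
    length, width = len(pssm), len(pssm[0])
    feats = []
    for g in range(groups):
        start, end = g * length // groups, (g + 1) * length // groups
        segment = pssm[start:end]
        for col in range(width):
            feats.append(sum(row[col] for row in segment) / len(segment))
    return feats

def fused_features(pssm, groups):
    """Multifeatures fusion, assumed here to be simple concatenation."""
    return local_average_groups(pssm, groups) + bigram_features(pssm)
```

With a 20-column matrix, the bigram part always has 400 entries and the group part has 20 entries per group, so the fused vector has a fixed length regardless of sequence length.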

Computational Prediction of Protein-Protein Interactions in Plants Using Only Sequence Information

10.21203/rs.3.rs-411601/v1 ◽

2021 ◽

Author(s):

Jie Pan ◽

Zhu Hong You ◽

Li Ping Li ◽

Chang-Qing Yu ◽

Xin-Ke Zhan

Keyword(s):

Plant Cell ◽

Protein Interactions ◽

Cell Function ◽

Functional Organization ◽

Computational Prediction ◽

Support Vector ◽

Svm Classifier ◽

Sequence Information ◽

Protein Protein Interactions ◽

Representation Scheme

Abstract: Protein-protein interactions (PPIs) in plants play a significant role in plant biology and the functional organization of cells. Although a large amount of plant PPI data has been generated by high-throughput techniques, the complexity of the plant cell means that the PPI pairs currently obtained by experimental methods cover only a small fraction of the complete plant PPI network. In addition, the experimental approaches for identifying PPIs in plants are laborious, time-consuming, and costly. Hence, it is highly desirable to develop more efficient approaches to detect PPIs in plants. In this study, we present a novel computational model combining a weighted sparse representation-based classifier (WSRC) with a novel inverse fast Fourier transform (IFFT) representation scheme, which is applied to the position-specific scoring matrix (PSSM) to extract features from plant protein sequences. When the proposed method was applied to the plant PPI datasets of Maize, Rice and Arabidopsis thaliana (Arabidopsis), we achieved accuracies of 89.12%, 84.72% and 71.74%, respectively. To further assess the prediction performance of the proposed approach, we compared it with the state-of-the-art support vector machine (SVM) classifier. To the best of our knowledge, we are the first to employ protein sequence information to predict PPIs in plants. Experimental results demonstrate that the proposed method has great potential to become a powerful tool for exploring plant cell function.

Prediction of Protein-Protein Interactions Based on Molecular Interface Features and the Support Vector Machine

Current Bioinformatics ◽

10.2174/1574893611308010003 ◽

2013 ◽

Vol 8 (1) ◽

pp. 3-8 ◽

Cited By ~ 1

Author(s):

Weiqiang Zhou ◽

Hong Yan ◽

Xiaodan Fan ◽

Quan Hao

Keyword(s):

Support Vector Machine ◽

Protein Interactions ◽

Support Vector ◽

Protein Protein Interactions

Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

BioMed Research International ◽

10.1155/2016/4783801 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 13

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zhu-Hong You ◽

Yu-Hong Fang ◽

Yu-Jun Zhao ◽

...

Keyword(s):

Protein Sequences ◽

Relevance Vector Machine ◽

Experimental Results ◽

Computational Method ◽

Support Vector ◽

Svm Classifier ◽

Local Phase ◽

Local Phase Quantization ◽

Phase Quantization ◽

Better Than

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
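The 5-fold cross-validation protocol mentioned here can be sketched as an index splitter: the samples are shuffled once, dealt into five folds, and each fold serves as the test set exactly once. The function name, the fixed seed, and the absence of class stratification are assumptions; the abstract does not say how the folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # deal indices round-robin into k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A reported accuracy under this protocol is typically the mean of the k per-fold accuracies, with each model trained only on the corresponding training indices.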
