INFERRING PROTEIN-PROTEIN INTERACTIONS FROM MESSENGER RNA EXPRESSION PROFILES WITH SVM

2005 ◽  
Vol 13 (03) ◽  
pp. 287-298 ◽  
Author(s):  
JUN CAI ◽  
YING HUANG ◽  
LIANG JI ◽  
YANDA LI

In post-genomic biology, proteomics researchers focus on the networks of protein interactions that control the lives of cells and organisms. Protein-protein interactions play an important role in the dynamic cellular machinery. In this paper, we developed a method to infer protein-protein interactions based on support vector machines (SVMs). For a given pair of proteins, a new strategy of calculating the cross-correlation function of their mRNA expression profiles was used to encode the SVM input vectors. We compared its performance with that of other methods of inferring protein-protein interactions. Five-fold cross-validation suggested that our SVM model achieved good prediction performance. This shows that expression profiles at the transcription level, as well as sequence content, can be used to distinguish physical or functional interactions of proteins. Lastly, we applied our SVM classifier to evaluate the data quality of interaction data sets from four high-throughput experiments. The results show that high-throughput experiments sacrifice some accuracy in determining interactions because of the limitations of experimental technologies.
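
The abstract does not spell out how the cross-correlation function is turned into an SVM vector, so the following is only a minimal sketch under stated assumptions: both expression profiles have the same length, the cross-correlation is mean-centred and normalized, and the values at lags from `-max_lag` to `max_lag` form the feature vector. The function name `cross_correlation` and the parameter `max_lag` are hypothetical and do not come from the paper.

```python
from math import sqrt


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-length expression profiles.

    Returns one value per lag in -max_lag..max_lag; this list is the
    (assumed) feature vector for the protein pair.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0.0:
        # A flat profile carries no correlation information.
        return [0.0] * (2 * max_lag + 1)
    features = []
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:  # only overlapping time points contribute
                total += dx[t] * dy[u]
        features.append(total / norm)
    return features


# Toy profiles (hypothetical values, not data from the paper).
profile_a = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_b = [5.0, 4.0, 3.0, 2.0, 1.0]
print(cross_correlation(profile_a, profile_a, max_lag=1))
print(cross_correlation(profile_a, profile_b, max_lag=1))
```

The resulting fixed-length vectors could then be passed to any SVM implementation together with interacting and non-interacting labels. The zero-lag entry is the ordinary Pearson correlation of the two profiles.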

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Mayumi Kamada ◽  
Yusuke Sakuma ◽  
Morihiro Hayashida ◽  
Tatsuya Akutsu

Proteins in living organisms perform various important functions by interacting with other proteins and molecules. Therefore, many efforts have been made to investigate and predict protein-protein interactions (PPIs). Analysis of the strengths of PPIs is also important because these strengths are involved in the functionality of proteins. In this paper, we propose several feature space mappings from protein pairs using protein domain information to predict the strengths of PPIs. Moreover, we perform computational experiments employing two machine learning methods, support vector regression (SVR) and relevance vector machine (RVM), on a dataset obtained from biological experiments. The prediction results showed that both SVR and RVM with our proposed features outperformed the best existing method.


2019 ◽  
Author(s):  
Franziska Seeger ◽  
Anna Little ◽  
Yang Chen ◽  
Tina Woolf ◽  
Haiyan Cheng ◽  
...  

Protein-protein interactions regulate many essential biological processes and play an important role in health and disease. The process of experimentally characterizing protein residues that contribute the most to protein-protein interaction affinity and specificity is laborious. Thus, developing models that accurately characterize hotspots at protein-protein interfaces provides important information about how to inhibit therapeutically relevant protein-protein interactions. During the course of the ICERM WiSDM workshop 2017, we combined the KFC2a protein-protein interaction hotspot prediction features with Rosetta scoring function terms and interface filter metrics. A 2-way and 3-way forward selection strategy was employed to train support vector machine classifiers, as was a reverse feature elimination strategy. From these results, we identified subsets of KFC2a and Rosetta combined features that show improved performance over KFC2a features alone.


Author(s):  
Morihiro Hayashida ◽  
Tatsuya Akutsu

Protein-protein interactions play various essential roles in cellular systems. Many methods have been developed for inferring protein-protein interactions from protein sequence data. In this paper, the authors focus on methods based on domain-domain interactions, where a domain is defined as a region within a protein that either performs a specific function or constitutes a stable structural unit. In these methods, the probabilities of domain-domain interactions are inferred from known protein-protein interaction data and protein domain data, and interactions are then predicted from these probabilities and the domain contents of the given proteins. This paper overviews several fundamental methods, which include the association method, the expectation maximization-based method, the support vector machine-based method, the linear programming-based method, and the conditional random field-based method. This paper also reviews a simple evolutionary model of protein domains, which yields a scale-free distribution of protein domains. By combining it with a domain-based protein interaction model, a scale-free distribution of protein-protein interaction networks is also derived.
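
As an illustration of the first method in this list, the sketch below follows the usual formulation of the association method: a domain pair is scored by the fraction of protein pairs containing it that are known to interact, and two proteins are predicted to interact unless every one of their domain pairs fails to. The abstract itself gives no formulas, so this formulation, the function names, and the toy data are assumptions; self-interactions are ignored for simplicity.

```python
from collections import Counter
from itertools import product


def domain_pair_scores(protein_domains, interacting_pairs):
    """Score each domain pair by (interacting protein pairs containing it)
    divided by (all protein pairs containing it).

    protein_domains: dict mapping protein name -> set of domain names
    interacting_pairs: set of frozenset({protein_a, protein_b})
    """
    proteins = sorted(protein_domains)
    seen = Counter()
    hits = Counter()
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            pairs = {frozenset((a, b))
                     for a, b in product(protein_domains[p], protein_domains[q])}
            interacting = frozenset((p, q)) in interacting_pairs
            for pair in pairs:
                seen[pair] += 1
                if interacting:
                    hits[pair] += 1
    return {pair: hits[pair] / seen[pair] for pair in seen}


def interaction_probability(domains_p, domains_q, scores):
    """Probability that at least one domain pair interacts, treating
    domain pairs as independent; unseen pairs score 0."""
    pairs = {frozenset((a, b)) for a, b in product(domains_p, domains_q)}
    prob_none = 1.0
    for pair in pairs:
        prob_none *= 1.0 - scores.get(pair, 0.0)
    return 1.0 - prob_none


# Toy data (hypothetical proteins and domains).
domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"A"}, "P4": {"B"}}
known = {frozenset(("P1", "P2"))}
scores = domain_pair_scores(domains, known)
print(interaction_probability({"A"}, {"B"}, scores))
```

In this toy set the domain pair A-B occurs in four protein pairs, one of which interacts, so any protein pair consisting of one A-only and one B-only protein is assigned probability 0.25.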


2019 ◽  
Vol 16 (4) ◽  
pp. 263-274
Author(s):  
Chunhua Zhang ◽  
Sijia Guo ◽  
Jingbo Zhang ◽  
Xizi Jin ◽  
Yanwen Li ◽  
...  

Protein-protein interactions play an important role in biological and cellular processes. Biochemical experiments are the most reliable approach to identifying protein-protein interactions, but they are time-consuming and expensive. This is one important reason why only a small fraction of the complete protein-protein interaction network is available so far. Hence, accurate computational methods for predicting protein-protein interactions are greatly needed. In this work, we proposed a new weighted feature fusion algorithm for protein-protein interaction prediction, which extracts both protein sequence features and evolutionary features in order to use both global and local information to identify protein-protein interactions. The method employs the maximum margin criterion for feature selection and a support vector machine for classification. Experimental results on 11188 protein pairs showed that our method had better performance and robustness. Performed on the independent database of Helicobacter pylori, the method achieved 99.59% sensitivity and 93.66% prediction accuracy, while the maximum margin criterion is 88.03%. The results indicated that our method was more efficient in predicting protein-protein interactions than six other state-of-the-art methods.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Wenzheng Ma ◽  
Yi Cao ◽  
Wenzheng Bao ◽  
Bin Yang ◽  
Yuehui Chen

The interactions between proteins play important roles in many organisms and are involved in almost all activities in the cell. Research on protein-protein interactions (PPIs) can make a huge contribution to the prevention and treatment of diseases. Currently, many prediction methods based on machine learning have been proposed to predict PPIs. In this article, we propose a novel method, ACT-SVM, that can effectively predict PPIs. The ACT-SVM model maps protein sequences to digital features, performs feature extraction twice on the protein sequence to obtain vector A and descriptor CT, and combines them into a vector. Then, the feature vectors of the protein pair are merged as the input of the support vector machine (SVM) classifier. We use nonredundant H. pylori and human datasets to verify the prediction performance of our method. The proposed method has a prediction accuracy of 0.727897 on the H. pylori dataset and 0.838799 on the human dataset. The results demonstrate that this method can be regarded as a stable and reliable prediction model of PPIs.


2019 ◽  
Vol 15 ◽  
pp. 117693431987992 ◽  
Author(s):  
Ji-Yong An ◽  
Yong Zhou ◽  
Yu-Jun Zhao ◽  
Zi-Ji Yan

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifier approaches to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combined local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporates global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key features by employing the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were acquired by using the multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we also compared the proposed method with previous sequence-based approaches on the yeast datasets by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods.
The results show that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .


2014 ◽  
Vol 11 (90) ◽  
pp. 20130860 ◽  
Author(s):  
Véronique Hamon ◽  
Raphael Bourgeas ◽  
Pierre Ducrot ◽  
Isabelle Theret ◽  
Laura Xuereb ◽  
...  

Over the last 10 years, protein–protein interactions (PPIs) have shown increasing potential as new therapeutic targets. As a consequence, PPIs are today the most screened target class in high-throughput screening (HTS). The development of broad chemical libraries dedicated to these particular targets is essential; however, the chemical space associated with this ‘high-hanging fruit’ is still under debate. Here, we analyse the properties of 40 non-redundant small molecules present in the 2P2I database (http://2p2idb.cnrs-mrs.fr/) to define a general profile of orthosteric inhibitors and propose an original protocol to filter general screening libraries using a support vector machine (SVM) with 11 standard Dragon molecular descriptors. The filtering protocol has been validated using external datasets from PubChem BioAssay and results from in-house screening campaigns. This external blind validation demonstrated the ability of the SVM model to reduce the size of the filtered chemical library by eliminating up to 96% of the compounds as well as enhancing the proportion of active compounds by up to a factor of 8. We believe that the resulting chemical space identified in this paper will provide the scientific community with a concrete support to search for PPI inhibitors during HTS campaigns.
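
To make the two validation figures concrete, the sketch below computes the library reduction and the enrichment factor for a filtered screening library. The counts are hypothetical and chosen only so that the arithmetic reproduces a 96% reduction and an eightfold enrichment; they are not data from the paper, and the function name `filtering_summary` is illustrative.

```python
def filtering_summary(n_total, n_active, n_kept, n_active_kept):
    """Return (fraction of library removed, enrichment factor).

    The enrichment factor is the hit rate after filtering divided by
    the hit rate before filtering.
    """
    reduction = 1.0 - n_kept / n_total
    hit_rate_before = n_active / n_total
    hit_rate_after = n_active_kept / n_kept
    return reduction, hit_rate_after / hit_rate_before


# Hypothetical library: 10,000 compounds with 100 actives (1% hit rate).
# Suppose the filter keeps 400 compounds, 32 of them active (8% hit rate).
reduction, enrichment = filtering_summary(10_000, 100, 400, 32)
print(f"library reduced by {reduction:.0%}, enrichment factor {enrichment:.1f}")
```

The two numbers trade off against each other: in this example the filter discards 68 of the 100 actives, which is the price of screening 25 times fewer compounds.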


2010 ◽  
Vol 9 ◽  
pp. CIN.S3899 ◽  
Author(s):  
Jianghui Xiong ◽  
Juan Liu ◽  
Simon Rayner ◽  
Yinghui Li ◽  
Shanguang Chen

Cancer is a disease associated with the deregulation of multiple gene networks. Microarray data has permitted researchers to identify gene panel markers for diagnosis or prognosis of cancer but these are not sufficient to make specific mechanistic assertions about phenotype switches. We propose a strategy to identify putative mechanisms of cancer phenotypes by protein-protein interactions (PPI). We first extracted the logic status of a PPI via the relative expression of the corresponding gene pair. The joint association of a gene pair on a cancer phenotype was calculated by entropy minimization and assessed using a support vector machine. A typical predictor is “If Src high-expression, and Cav-1 low-expression, then cancer.” We achieved 90% accuracy on test data with a majority of predictions associated with the MAPK pathway, focal adhesion, apoptosis and cell cycle. Our results can aid in the development of phenotype discrimination biomarkers and identification of putative therapeutic interference targets for drug development.
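
The abstract does not define "logic status" or the entropy criterion precisely, so the sketch below is one plausible reading, stated as an assumption: the status of a gene pair in a sample is whether the first gene is expressed more highly than the second, and a pair is informative when the conditional entropy of the phenotype given that status is low. The function names and all expression values are hypothetical.

```python
from collections import Counter
from math import log2


def pair_status(sample, gene_a, gene_b):
    """True when gene_a is expressed more highly than gene_b in this sample."""
    return sample[gene_a] > sample[gene_b]


def conditional_entropy(statuses, labels):
    """Entropy (in bits) of the phenotype labels given the pair status.

    0 means the status determines the phenotype exactly; for two balanced
    classes, 1 means the status is uninformative.
    """
    n = len(labels)
    total = 0.0
    for status in set(statuses):
        group = [lab for st, lab in zip(statuses, labels) if st == status]
        counts = Counter(group)
        entropy = -sum((c / len(group)) * log2(c / len(group))
                       for c in counts.values())
        total += len(group) / n * entropy
    return total


# Toy expression values (hypothetical, not from the paper).
samples = [
    {"Src": 9.1, "Cav-1": 3.2},
    {"Src": 8.4, "Cav-1": 2.9},
    {"Src": 4.0, "Cav-1": 7.5},
    {"Src": 3.6, "Cav-1": 8.1},
]
labels = ["cancer", "cancer", "normal", "normal"]
statuses = [pair_status(s, "Src", "Cav-1") for s in samples]
print(conditional_entropy(statuses, labels))
```

Ranking all gene pairs that correspond to known PPIs by this entropy and keeping the lowest-scoring ones would give candidate rules of the kind quoted in the abstract, which a classifier could then assess on held-out samples.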


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Yu-An Huang ◽  
Zhu-Hong You ◽  
Xin Gao ◽  
Leon Wong ◽  
Lirong Wang

Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, they have inevitable drawbacks such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation by using discrete cosine transform (DCT) on substitution matrix representation (SMR) and from using a weighted sparse representation based classifier (WSRC). When applied to the PPI datasets of Yeast, Human, and H. pylori, the method achieved excellent results, with average accuracies as high as 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results indicate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict PPIs in five other species datasets.
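
A sketch of the DCT step may help show why it yields fixed-length features from matrices of varying size. The code applies an unnormalized two-dimensional DCT-II to a small matrix and keeps the low-frequency corner as the feature vector. How the paper builds the SMR matrix and how many coefficients it keeps are not stated in the abstract, so the input matrix, the function names, and the `size` parameter are assumptions.

```python
from math import cos, pi


def dct_1d(signal):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(signal)
    return [sum(signal[t] * cos(pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]


def dct_2d(matrix):
    """2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def low_frequency_block(matrix, size):
    """Flatten the top-left size x size block of DCT coefficients.

    Whatever the number of rows in the input (for example, one row per
    residue), the output always has size * size entries.
    """
    coeffs = dct_2d(matrix)
    return [value for row in coeffs[:size] for value in row[:size]]


# Toy matrix standing in for an SMR matrix (hypothetical values).
toy = [
    [1.0, 2.0, 0.0, -1.0],
    [0.0, 1.0, 3.0, 2.0],
    [2.0, -1.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 1.0],
    [3.0, 1.0, -2.0, 0.0],
]
print(len(low_frequency_block(toy, 3)))
```

Because most of the energy of a smooth signal concentrates in the low-frequency coefficients, truncating to that block compresses the representation while giving proteins of different lengths feature vectors of the same size.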


2021 ◽  
Author(s):  
Jie Pan ◽  
Zhu Hong You ◽  
Li Ping Li ◽  
Chang-Qing Yu ◽  
Xin-Ke Zhan

Protein-protein interactions (PPIs) in plants play a significant role in plant biology and the functional organization of cells. Although a large amount of plant PPI data has been generated by high-throughput techniques, the complexity of the plant cell means that the PPI pairs currently obtained by experimental methods cover only a small fraction of the complete plant PPI network. In addition, the experimental approaches for identifying PPIs in plants are laborious, time-consuming, and costly. Hence, it is highly desirable to develop more efficient approaches to detect PPIs in plants. In this study, we present a novel computational model combining a weighted sparse representation-based classifier (WSRC) with a novel inverse fast Fourier transform (IFFT) representation scheme, which is applied to the position-specific scoring matrix (PSSM) to extract features from plant protein sequences. When the proposed method was applied to the plant PPI datasets of maize, rice and Arabidopsis thaliana (Arabidopsis), we achieved high accuracies of 89.12%, 84.72% and 71.74%, respectively. To further assess the prediction performance of the proposed approach, we compared it with the state-of-the-art support vector machine (SVM) classifier. To the best of our knowledge, we are the first to employ protein sequence information to predict PPIs in plants. Experimental results demonstrate that the proposed method has great potential to become a powerful tool for exploring plant cell function.

