A novel method to predict protein-protein interactions based on the information of protein sequence

Author(s):  
Zhu-Hong You ◽  
Zhong Ming ◽  
Haiyun Huang ◽  
Xiaogang Peng
2016 ◽  
Vol 5 (4) ◽  
pp. 93-98
Author(s):  
Wen Sun ◽  
Lin Han ◽  
Wenmao Xu ◽  
Yazhen Sun

Abstract. Objective: The objective of this work is to find a novel network-level method for exploring the disrupted pathways associated with periodontitis (PD). Methods: First, the differentially expressed genes (DEGs) between PD patients and cognitively normal subjects were inferred with the LIMMA package. Then, the protein-protein interactions (PPIs) in each pathway were explored with the Empirical Bayesian (EB) co-expression program. Specifically, we constructed a random model, took its 100th weight value as the threshold for disrupted PPI pathways, and computed the weight value of each pathway. We then identified as disrupted the pathways whose weight value exceeded the threshold. Pathway enrichment analysis of the DEGs was carried out with the Expression Analysis Systematic Explorer (EASE) test. Finally, the two methods were compared, and the one that yielded more numerous and more significant pathways was selected as the better method. Results: The LIMMA analysis yielded 524 DEGs in total. We determined 0.115222 as the threshold value for disrupted PPI pathways. At weight value > 0.115222, 258 disrupted PPI pathways were identified. In addition, the 524 DEGs were enriched in 4 pathways at EASE = 0.1. Conclusion: We proposed a novel network method for inferring the disrupted pathways in PD. The disrupted pathways might be underlying biomarkers for the treatment of PD.
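The thresholding step in the abstract above can be illustrated with a minimal Python sketch. It assumes that "the 100th weight value" means the 100th largest weight obtained from the random model; the abstract does not spell this out. The function names, the synthetic null weights, and the pathway names are hypothetical and are not taken from the paper.

```python
import random


def threshold_from_null(null_weights, rank=100):
    """Return the rank-th largest weight of the random (null) model.

    Assumption: the "100th weight value" is read as the 100th largest
    weight among the randomized pathways.
    """
    if len(null_weights) < rank:
        raise ValueError("need at least `rank` null weights")
    return sorted(null_weights, reverse=True)[rank - 1]


def disrupted_pathways(pathway_weights, threshold):
    """Keep only the pathways whose weight strictly exceeds the threshold."""
    return {name: w for name, w in pathway_weights.items() if w > threshold}


# Synthetic null weights, for illustration only (not the paper's data).
rng = random.Random(0)
null_weights = [rng.random() * 0.2 for _ in range(10_000)]
threshold = threshold_from_null(null_weights, rank=100)

# Hypothetical observed pathway weights.
observed = {"pathway_A": 0.25, "pathway_B": 0.05}
selected = disrupted_pathways(observed, threshold)
print(sorted(selected))
```

With the paper's reported threshold of 0.115222, the same filter would be `disrupted_pathways(observed, 0.115222)`.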


2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract. Background: Protein-protein interactions (PPIs) take part in dynamic pathological and biological processes throughout life. A thorough understanding of PPIs is therefore crucial for elucidating disease occurrence, achieving optimal drug-target therapeutic effects and describing protein complex structures. However, compared to the protein sequences available from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this gap, much research effort has been devoted to facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with graph topological features from the PPI networks. Specifically, our method takes into account the protein sequence information and also learns a topological representation for each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. We then use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.
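As a rough illustration of how a CBOW model can consume metapaths, the sketch below turns one walk over a network into (context, target) training pairs, which is the input format CBOW uses when it predicts a node from its neighbors in the walk. The walk, the node labels (`P*` for proteins, `F*` for families), and the window size are assumptions made for the example; the abstract does not describe the metapath schema or the model's hyperparameters.

```python
def cbow_pairs(walk, window=2):
    """Build CBOW (context, target) pairs from a single walk.

    For each position, the target is the node at that position and the
    context is the list of nodes within `window` steps on either side.
    """
    pairs = []
    for i, target in enumerate(walk):
        left = walk[max(0, i - window):i]
        right = walk[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


# Hypothetical metapath alternating protein and family nodes.
walk = ["P1", "F1", "P2", "F2", "P3"]
pairs = cbow_pairs(walk, window=1)
for context, target in pairs:
    print(context, "->", target)
```

In a full pipeline these pairs would be fed to an embedding model that learns one vector per node; that training step is omitted here.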


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Li ◽  
Zheng Wang ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang ◽  
...  

Abstract. Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Recognizing protein–protein interactions is therefore very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false positive rate. The biology community is therefore looking for computational methods that can discover protein interaction data quickly and efficiently at scale. In this paper, we propose a computational method for predicting PPIs from protein sequence information that combines orthogonal locality preserving projections (OLPP) with a rotation forest (RoF) model. Specifically, the protein sequence is first converted into a position-specific scoring matrix (PSSM) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize each protein as a fixed-length feature vector by applying OLPP to its PSSM. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.


2017 ◽  
Vol 45 (12) ◽  
pp. 7094-7105 ◽  
Author(s):  
Milana Frenkel-Morgenstern ◽  
Alessandro Gorohovski ◽  
Somnath Tagore ◽  
Vaishnovi Sekar ◽  
Miguel Vazquez ◽  
...  

2014 ◽  
Vol 12 (06) ◽  
pp. 1442008 ◽  
Author(s):  
Jung-Hsien Chiang ◽  
Jiun-Huang Ju

Protein–protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Automated identification of PPIs from biomedical literature repositories could therefore be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using these features. The proposed approach achieves an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results obtained with only the conventional co-occurrence feature (averaged micro-F of 68.79% and averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and the averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F and from 62.18% to 76.27% in macro-F), we show that the proposed approach achieves a micro-F of 81.88% and a macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein–protein interactions.
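The abstract above does not name its six page-count measures. As a hedged illustration of the general idea, the Python sketch below computes three co-occurrence measures that are commonly derived from page counts (Jaccard, Dice, and pointwise mutual information). The choice of measures, the function names, and the counts are assumptions for the example, not the paper's definitions or data.

```python
import math


def jaccard(count_a, count_b, count_ab):
    """Jaccard coefficient from page counts of A, B, and A-and-B."""
    union = count_a + count_b - count_ab
    return count_ab / union if union > 0 else 0.0


def dice(count_a, count_b, count_ab):
    """Dice coefficient from page counts of A, B, and A-and-B."""
    total = count_a + count_b
    return 2 * count_ab / total if total > 0 else 0.0


def pmi(count_a, count_b, count_ab, n_docs):
    """Pointwise mutual information (log base 2) from page counts.

    Returns 0.0 when any count is zero, so the feature stays finite.
    """
    if min(count_a, count_b, count_ab) <= 0:
        return 0.0
    return math.log2((count_ab * n_docs) / (count_a * count_b))


# Hypothetical page counts for two protein mentions in a 1000-document index.
features = [
    jaccard(100, 50, 25),
    dice(100, 50, 25),
    pmi(100, 50, 25, 1000),
]
print(features)
```

A feature vector of this kind, one per candidate protein pair, is what a classifier such as Random Forest would be trained on.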


Cells ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 122 ◽  
Author(s):  
Yanbin Wang ◽  
Zhu-Hong You ◽  
Shan Yang ◽  
Xiao Li ◽  
Tong-Hai Jiang ◽  
...  

Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years because of limitations in both methodology and technology. Inspired by the similarity between biological sequences and languages, a biological language processing technology may provide a new theoretical perspective and a feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions using only protein sequences. The model was constructed from a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolutional neural network (CNN). Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model that learns a distributed representation for each “bio-word”. Bio2Vec supplies a framework that allows researchers to consider the context information and implicit semantic information of a biological sequence. A remarkable improvement in PPI prediction performance has been observed with the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
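To make the "bio-word" idea concrete, here is a minimal Python sketch that splits a protein sequence into overlapping k-mers and counts them. Treating k-mers as the "bio-words" is an assumption: the abstract does not specify how its segmentation system works. The function name, the value of `k`, and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_words(sequence, k=3):
    """Split a sequence into overlapping k-mers (a stand-in for "bio-words")."""
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical short amino-acid sequence.
words = kmer_words("MKTAYIAK", k=3)
vocabulary = Counter(words)
print(words)
```

The resulting word lists play the role of sentences for a word representation model, which would then learn one vector per "bio-word".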

