Protein-Protein Interactions Prediction Based on Graph Energy and Protein Sequence Information

Molecules ◽  
2020 ◽  
Vol 25 (8) ◽  
pp. 1841 ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
Rui Gao

Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identification of PPIs. In this paper, a novel prediction method is proposed for predicting PPIs using graph energy, named PPI-GE. Particularly, in the process of feature extraction, we designed two new feature extraction methods, the physicochemical graph energy based on the ionization equilibrium constant and isoelectric point and the contact graph energy based on the contact information of amino acids. The dipeptide composition method was used for order information of amino acids. After multi-information fusion, principal component analysis (PCA) was implemented for eliminating noise and a robust weighted sparse representation-based classification (WSRC) classifier was applied for sample classification. The prediction accuracies based on the five-fold cross-validation of the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, in five independent data sets and two significant PPI networks, the comparative experimental results also demonstrate that PPI-GE obtained better performance than the compared methods.
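The dipeptide composition step mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes the usual definition (the relative frequency of each of the 400 ordered pairs of adjacent standard residues). The function name `dipeptide_composition` and the handling of non-standard symbols are assumptions and are not taken from PPI-GE.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-dimensional dipeptide frequency vector of a sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2].upper()
        # Assumption: pairs containing non-standard symbols (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For a protein pair, the two vectors would typically be concatenated with the other feature groups before dimensionality reduction. How PPI-GE fuses them is not described in the abstract.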

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Zheng Wang ◽  
Yang Li ◽  
Zhu-Hong You ◽  
Li-Ping Li ◽  
Xin-Ke Zhan ◽  
...  

Identifying protein-protein interactions (PPIs) plays a vital role in a number of biological activities such as signal transduction, transcriptional regulation, and apoptosis. Although advances in high-throughput technologies have generated large amounts of PPI data for different species, they only cover a small part of the entire PPI network. Furthermore, traditional experimental methods are generally expensive, time-consuming, tedious, and prone to high false-positive rates. Therefore, to overcome this problem, it is necessary to develop a novel computational method for predicting PPIs. In this article, we propose an efficient computational method to detect protein-protein interactions using only protein sequence information, which integrates the MatPCA feature extraction algorithm and the weighted sparse representation classifier. As a result, when predicting PPIs on yeast, human, and H. pylori datasets, the proposed method achieves superior prediction performance with an average accuracy of 94.55%, 97.48%, and 83.64%, respectively. These experimental results further illustrate that the proposed method is reliable and robust in predicting PPIs, which can be regarded as a useful complement to the experimental method.


2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract. Background: Protein-protein interactions (PPIs) take part in dynamic pathological and biological processes throughout life. It is therefore crucial to understand PPIs thoroughly in order to explain disease occurrence, achieve optimal drug-target therapeutic effects, and describe protein complex structures. However, compared with the protein sequences available from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this gap, many research efforts have sought to facilitate the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with graph topological features from PPI networks. Specifically, our method takes into account not only the protein sequence information but also the topological representation of each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. We then use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Kanchan Jha ◽  
Sriparna Saha

Abstract: Protein is the primary building block of living organisms. It interacts with other proteins and is thereby involved in various biological processes. Knowledge of protein–protein interactions (PPIs) helps in predicting, and hence understanding, the functionality of proteins and the causes and growth of diseases, and in designing new drugs. However, there is a vast gap between the available protein sequences and the identified protein–protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods depend only on sequence-based information of proteins. With the advancement of technology, other types of protein information have become available, such as 3D structure information. Deep learning techniques are now adopted successfully in various domains, including bioinformatics. The current work therefore focuses on using different modalities, namely 3D structures and sequence-based information of proteins, together with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first obtain several representations of proteins using their 3D coordinate information and three attributes of amino acids, the building blocks of proteins: hydropathy index, isoelectric point, and charge. A pre-trained ResNet50 model, a type of convolutional neural network, is used to extract features from these representations. Autocovariance and conjoint triad, two widely used sequence-based methods for encoding proteins, are used here as another modality. A stacked autoencoder is used to obtain a compact form of the sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs. We experimented on the human PPI dataset and the Saccharomyces cerevisiae PPI dataset and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
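As an illustration of the conjoint triad encoding named in this abstract, the sketch below counts triads of amino acid classes along a sequence. The seven-class grouping is the commonly cited one for this encoding and is an assumption here, since the abstract does not list it. The names `conjoint_triad` and `encode_pair` are hypothetical, and raw counts are returned without the normalisation a full pipeline would likely apply.

```python
from itertools import product
from typing import List

# Assumed seven-class residue grouping; not stated in the abstract.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}
# 7 * 7 * 7 = 343 possible class triads, indexed in a fixed order.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> List[int]:
    """Count each class triad over all windows of three consecutive residues."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for i in range(len(classes) - 2):
        window = tuple(classes[i:i + 3])
        if None not in window:  # skip windows with non-standard residues
            counts[TRIAD_INDEX[window]] += 1
    return counts


def encode_pair(seq_a: str, seq_b: str) -> List[int]:
    """Concatenate the triad vectors of two proteins into one pair feature."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

The 686-dimensional pair vector is the kind of sequence-based input that a stacked autoencoder could then compress. The exact dimensions and preprocessing used by the authors are not given in the abstract.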


2019 ◽  
Author(s):  
Miguel Andrade ◽  
Camila Pontes ◽  
Werner Treptow

Abstract: Here, we investigate the contributions of coevolutive, evolutive and stochastic information in determining protein-protein interactions (PPIs) based on primary sequences of two interacting protein families A and B. Specifically, under the assumption that coevolutive information is imprinted on the interacting amino acids of two proteins in contrast to other (evolutive and stochastic) sources spread over their sequences, we dissect those contributions in terms of compensatory mutations at physically-coupled and uncoupled amino acids of A and B. We find that physically-coupled amino acids at short-range distances store the largest per-contact mutual information content, with a significant fraction of that content resulting from coevolutive sources alone. The information stored in coupled amino acids is further shown to discriminate multi-sequence alignments (MSAs) with the largest expectation fraction of PPI matches, a conclusion that holds against various definitions of intermolecular contacts and binding modes. When compared to the informational content resulting from evolution at long-range interactions, the mutual information in physically-coupled amino acids is the strongest signal to distinguish PPIs derived from cospeciation and, likely, the unique indication in case of molecular coevolution in independent genomes, as the evolutive information must vanish for uncorrelated proteins. Significance: The problem of predicting protein-protein interactions (PPIs) based on multi-sequence alignments (MSAs) appears not completely resolved to date. In previous studies, one or more sources of information were taken into account without clarifying the isolated contributions of coevolutive, evolutive and stochastic information in resolving the problem. By benefiting from data sets made available in the sequence- and structure-rich era, we revisit the field to show that physically-coupled amino acids of proteins store the largest (per contact) information content to discriminate MSAs with the largest expectation fraction of PPI matches, a result that should guide new developments in the field, aiming at characterizing protein interactions in general.
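The per-contact mutual information discussed in this abstract can be illustrated with a plug-in estimate between one alignment column of family A and one of family B. This is only a sketch under stated assumptions: the rows of the two columns are assumed to be already paired (row i of A interacts with row i of B), no finite-sample or phylogenetic correction is applied, and the function name `mutual_information` is hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Plug-in mutual information (in bits) between two paired MSA columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same set of paired rows")
    n = len(col_a)
    if n == 0:
        return 0.0
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Two perfectly co-varying columns with two equally frequent residues each carry 1 bit, while statistically independent columns carry 0 bits.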


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Alhadi Bustamam ◽  
Mohamad I. S. Musti ◽  
Susilo Hartomo ◽  
Shirley Aprilia ◽  
Patuan P. Tampubolon ◽  
...  

Abstract. Background: There are two significant problems associated with predicting protein-protein interactions using the sequences of amino acids. The first problem is representing each sequence as a feature vector, and the second is designing a model that can identify the protein interactions. Thus, effective feature extraction methods can lead to improved model performance. In this study, we used two types of feature extraction methods, global encoding and pseudo-substitution matrix representation (PseudoSMR), to represent the sequences of amino acids in human proteins and Human Immunodeficiency Virus type 1 (HIV-1) to address the classification problem of predicting protein-protein interactions. We also compared principal component analysis (PCA) with independent principal component analysis (IPCA) as methods for transforming Rotation Forest. Results: The results show that using global encoding and PseudoSMR as feature extraction methods successfully represents the amino acid sequence for the Rotation Forest classifier with PCA or with IPCA. This can be seen from the comparison of the results of evaluation metrics, which were >73% across the six different parameters. The accuracy of both methods was >74%. The results for the other model performance criteria, such as sensitivity, specificity, precision, and F1-score, were all >73%. The data used in this study can be accessed using the following link: https://www.dsc.ui.ac.id/research/amino-acid-pred/. Conclusions: Both global encoding and PseudoSMR can successfully represent the sequences of amino acids. Rotation Forest (PCA) performed better than Rotation Forest (IPCA) in terms of predicting protein-protein interactions between HIV-1 and human proteins. Both the Rotation Forest (PCA) classifier and the Rotation Forest (IPCA) classifier performed better than other classifiers, such as Gradient Boosting, K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine (SVM). Rotation Forest (PCA) and Rotation Forest (IPCA) have accuracy, sensitivity, specificity, precision, and F1-score values >70% while the other classifiers have values <70%.
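The evaluation metrics named in this abstract follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function name `classification_metrics` and the example counts are illustrative assumptions and do not reproduce any result from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on interacting pairs
    specificity = ratio(tn, tn + fp)  # recall on non-interacting pairs
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

With hypothetical counts of 40 true positives, 10 false positives, 45 true negatives and 5 false negatives, accuracy is 0.85 and precision is 0.80.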


Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1090 ◽  
Author(s):  
Edwin Rodriguez Horta ◽  
Pierre Barrat-Charlaix ◽  
Martin Weigt

Global coevolutionary models of protein families have become increasingly popular due to their capacity to predict residue–residue contacts from sequence information, but also to predict fitness effects of amino acid substitutions or to infer protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces single and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring phylogenetic relations between them. This simplification is known to lead to potentially biased estimates of the parameters of the model, decreasing their biological relevance. Current workarounds for this problem, such as reweighting sequences, are poorly understood and not principled. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in estimating the frequencies of amino acids. Using artificial data, we show that a Potts model inferred using these corrected frequencies performs better in predicting contacts and fitness effects of mutations. First tests on real protein data, which are only partially successful, are also presented.
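To make the reweighting workaround mentioned in this abstract concrete, the sketch below down-weights each sequence by the number of alignment members that are at least a given fraction identical to it, then computes weighted single-site frequencies. It illustrates the common heuristic the authors criticise, not their phylogeny-aware correction. The 0.8 identity threshold and both function names are assumptions.

```python
from typing import Dict, List


def sequence_weights(msa: List[str], threshold: float = 0.8) -> List[float]:
    """Weight each sequence by 1 / (number of sequences at least `threshold` identical)."""

    def identity(s: str, t: str) -> float:
        return sum(x == y for x, y in zip(s, t)) / len(s)

    # Every sequence counts itself, so the denominator is always >= 1.
    return [1.0 / sum(identity(s, t) >= threshold for t in msa) for s in msa]


def site_frequencies(msa: List[str], position: int, weights: List[float]) -> Dict[str, float]:
    """Weighted frequency of each residue at one alignment column."""
    total = sum(weights)
    freqs: Dict[str, float] = {}
    for seq, w in zip(msa, weights):
        freqs[seq[position]] = freqs.get(seq[position], 0.0) + w / total
    return freqs
```

In a toy alignment with two identical sequences, the duplicates share a single unit of weight, so a residue they both carry counts for less than it would under uniform weighting.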


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Sun Sook Chung ◽  
Joseph C F Ng ◽  
Anna Laddach ◽  
N Shaun B Thomas ◽  
Franca Fraternali

Abstract: Direct drug targeting of mutated proteins in cancer is not always possible and efficacy can be nullified by compensating protein–protein interactions (PPIs). Here, we establish an in silico pipeline to identify specific PPI sub-networks containing mutated proteins as potential targets, which we apply to mutation data of four different leukaemias. Our method is based on extracting cyclic interactions of a small number of proteins topologically and functionally linked in the Protein–Protein Interaction Network (PPIN), which we call short loop network motifs (SLM). We uncover a new property of PPINs named ‘short loop commonality’ to measure indirect PPIs occurring via common SLM interactions. This detects ‘modules’ of PPI networks enriched with annotated biological functions of proteins containing mutation hotspots, exemplified by FLT3 and other receptor tyrosine kinase proteins. We further identify functional dependency or mutual exclusivity of short loop commonality pairs in large-scale cellular CRISPR–Cas9 knockout screening data. Our pipeline provides a new strategy for identifying new therapeutic targets for drug discovery.
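The idea of extracting cyclic interactions among a small number of proteins can be illustrated by enumerating the smallest such cycles, three-protein loops, in an undirected interaction network. This sketch makes several assumptions: it covers only loops of length three, the protein labels are placeholders, and the function names are hypothetical. The authors' pipeline and their short loop commonality measure are not reproduced here.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, ignoring self-interactions."""
    adjacency: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def three_protein_loops(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return every set of three mutually interacting proteins, each listed once."""
    adjacency = build_adjacency(edges)
    loops = set()
    for node, neighbours in adjacency.items():
        # Two neighbours of `node` that also interact with each other close a loop.
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                loops.add(tuple(sorted((node, a, b))))
    return sorted(loops)
```

For a placeholder network with edges P1-P2, P2-P3, P3-P1 and P3-P4, the only three-protein loop is (P1, P2, P3). P4 hangs off the loop and is not part of any cycle.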


2018 ◽  
Vol 14 ◽  
pp. 2881-2896 ◽  
Author(s):  
Laura Carro

Antibiotics are potent pharmacological weapons against bacterial infections; however, the growing antibiotic resistance of microorganisms is compromising the efficacy of the currently available pharmacotherapies. Even though antimicrobial resistance is not a new problem, antibiotic development has failed to match the growth of resistant pathogens; hence, it is critical to discover new anti-infective drugs with novel mechanisms of action that will help reduce the burden of multidrug-resistant microorganisms. Protein–protein interactions (PPIs) are involved in a myriad of vital cellular processes and have become an attractive target to treat diseases. Therefore, targeting PPI networks in bacteria may offer a new and unconventional point of intervention to develop novel anti-infective drugs which can combat the ever-increasing rate of multidrug-resistant bacteria. This review describes the progress achieved towards the discovery of molecules that disrupt PPI systems in bacteria for which inhibitors have been identified and whose targets could represent an alternative lead discovery strategy to obtain new anti-infective molecules.


2021 ◽  
Vol 12 ◽  
Author(s):  
Peng Wang ◽  
Yuanyuan Shi ◽  
Yadong Li ◽  
Lili Zhang ◽  
Sihao Qu ◽  
...  

Background: Pulmonary Fibrosis (PF) is an interstitial lung disease characterized by excessive accumulation of extracellular matrix in the lungs, which disrupts the structure and gas exchange of the alveoli. There are only two approved therapies for PF, nintedanib (Nib) and pirfenidone. Therefore, the use of Chinese medicine for PF is attracting attention. Tianlongkechuanling (TL) is an effective Chinese formula that has been applied clinically to alleviate PF, which can enhance lung function and quality of life. Purpose: The potential effects and specific mechanisms of TL have not yet been fully explored. In the present study, proteomics was performed to explore the therapeutic protein targets of TL on Bleomycin (BLM)-induced Pulmonary Fibrosis. Method: BLM-induced PF mouse models were established. Hematoxylin-eosin staining and Masson staining were used to analyze histopathological changes and collagen deposition. To screen for differentially expressed proteins among the Control, BLM, BLM + TL and BLM + Nib (BLM + nintedanib) groups, quantitative proteomics was performed using tandem mass tag (TMT) labeling with nanoLC-MS/MS (nano liquid chromatography-mass spectrometry). Changes in the profiles of the expressed proteins were analyzed using the bioinformatics tools Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The protein–protein interactions (PPI) were established by STRING. Expressions of α-smooth muscle actin (α-SMA), Collagen I (Col1a1), Fibronectin (Fn1) and enzymes in the arginase-ornithine pathway were detected by Western blot or RT-PCR. Result: TL treatments significantly ameliorated BLM-induced collagen deposition in lung tissues. Moreover, TL can inhibit the protein expression of α-SMA and the mRNA expression of Col1a1 and Fn1. Using TMT technology, we observed 253 differentially expressed proteins related to PPI networks and involved in different KEGG pathways. The arginase-ornithine pathway was highly significant. The expression of arginase1 (Arg1), carbamoyltransferase (OTC), carbamoyl-phosphate synthase (CPS1), argininosuccinate synthase (ASS1), ornithine aminotransferase (OAT), argininosuccinate lyase (ASL) and inducible nitric oxide synthase (iNOS) was significantly decreased after TL treatments. Conclusion: Administration of TL in BLM-induced mice resulted in decreased pulmonary fibrosis. Our findings suggest that downregulation of the arginase-ornithine pathway, with reduced arginase biosynthesis, is a central mechanism of TL's effect and a potential treatment strategy for pulmonary fibrosis.

