DNN-PPI: A LARGE-SCALE PREDICTION OF PROTEIN–PROTEIN INTERACTIONS BASED ON DEEP NEURAL NETWORKS

Protein–protein interactions (PPIs) are important for many biological processes and have given rise to a series of computational prediction methods. Despite the variety of methods available for PPI prediction, PPI network projects still fail to perform on a large scale. To predict PPIs effectively, we used a deep neural network (DNN) that operates on amino acid sequences. We present a novel DNN-PPI model with an auto covariance (AC) descriptor and a conjoint triad (CT) descriptor that predicts PPIs from protein sequence information alone. Ten-fold cross-validation indicated that the best DNN-PPI model with CT achieved 97.65% accuracy, 98.96% recall and an area under the curve (AUC) of 98.51%. The model achieved a prediction accuracy of 94.20–97.10% on other external datasets. These results suggest that the proposed algorithm is valid across various species.
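To make the conjoint triad (CT) descriptor mentioned above concrete, here is a minimal sketch of one common way to compute it. The seven-class amino acid grouping and the simple frequency normalisation are assumptions (the abstract specifies neither), and the function names `conjoint_triad` and `pair_features` are hypothetical.

```python
from itertools import product

# Seven amino-acid classes often used for conjoint triad encoding.
# ASSUMPTION: the abstract does not list the grouping used by DNN-PPI.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# Every ordered triple of classes: 7 ** 3 = 343 possible triads.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(seq):
    """Return a 343-dimensional vector of triad frequencies for one sequence."""
    # Residues outside the 20 standard amino acids are dropped before
    # windowing (a simplification chosen for this sketch).
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIADS)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    total = sum(counts)
    # ASSUMPTION: plain frequency normalisation; other schemes exist.
    return [c / total for c in counts] if total else [0.0] * len(TRIADS)


def pair_features(seq_a, seq_b):
    """Concatenate the descriptors of two proteins into one input vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

A protein pair would then be represented by a 686-dimensional vector that a classifier such as a DNN could take as input; how DNN-PPI actually combines the two proteins is not stated in the abstract.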

Short loop functional commonality identified in leukaemia proteome highlights crucial protein sub-networks

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab010 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Sun Sook Chung ◽

Joseph C F Ng ◽

Anna Laddach ◽

N Shaun B Thomas ◽

Franca Fraternali

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Interaction Network ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Short Loop ◽

New Strategy ◽

Loop Network ◽

Protein Protein Interaction Network

Abstract Direct drug targeting of mutated proteins in cancer is not always possible and efficacy can be nullified by compensating protein–protein interactions (PPIs). Here, we establish an in silico pipeline to identify specific PPI sub-networks containing mutated proteins as potential targets, which we apply to mutation data of four different leukaemias. Our method is based on extracting cyclic interactions of a small number of proteins topologically and functionally linked in the Protein–Protein Interaction Network (PPIN), which we call short loop network motifs (SLM). We uncover a new property of PPINs named ‘short loop commonality’ to measure indirect PPIs occurring via common SLM interactions. This detects ‘modules’ of PPI networks enriched with annotated biological functions of proteins containing mutation hotspots, exemplified by FLT3 and other receptor tyrosine kinase proteins. We further identify functional dependency or mutual exclusivity of short loop commonality pairs in large-scale cellular CRISPR–Cas9 knockout screening data. Our pipeline provides a new strategy for identifying new therapeutic targets for drug discovery.
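As a rough illustration of what extracting cyclic interactions from a PPIN can look like, the sketch below enumerates the shortest possible loops (three mutually interacting proteins) in an undirected edge list. This is an assumption-laden simplification: the abstract does not give the loop lengths or the functional criteria used for SLMs, and the function name `triangles` and the protein labels are hypothetical.

```python
from itertools import combinations


def triangles(edges):
    """Return all 3-protein cycles in an undirected interaction edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = set()
    for node, neighbours in adj.items():
        # Two neighbours of `node` that also interact close a loop of length 3.
        for a, b in combinations(sorted(neighbours), 2):
            if b in adj[a]:
                found.add(tuple(sorted((node, a, b))))
    return sorted(found)


# Hypothetical toy network: P1, P2 and P3 form a loop; P4 hangs off P3.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")]
loops = triangles(toy_edges)
```

Longer loops would need a bounded-depth cycle search instead of this neighbour-pair check.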

Multimodal deep representation learning for protein interaction identification and protein family classification

BMC Bioinformatics ◽

10.1186/s12859-019-3084-y ◽

2019 ◽

Vol 20 (S16) ◽

Cited By ~ 4

Author(s):

Da Zhang ◽

Mansur Kabuka

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Representation Learning ◽

Superior Performance ◽

Sequence Information ◽

Protein Protein Interactions ◽

Learning Framework ◽

Topological Features ◽

Ppi Networks ◽

Ppi Prediction

Abstract Background: Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. It is therefore crucial to understand PPIs thoroughly in order to explain disease occurrence, achieve optimal drug-target therapeutic effects and describe protein complex structures. However, compared to the protein sequences available from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this, many research efforts have sought to facilitate the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with graph topological features from the PPI networks. Specifically, our method takes into account not only the protein sequence information but also the topological representation of each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. Following that, we use supervised deep neural networks to identify PPIs and classify protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.

Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information

Scientific Reports ◽

10.1038/s41598-021-96265-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yang Li ◽

Zheng Wang ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Wen-Zhun Huang ◽

...

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Large Scale ◽

False Positive Rate ◽

Computational Method ◽

Evolutionary Information ◽

Local Alignment ◽

Protein Interaction Data ◽

Sequence Information ◽

Protein Protein Interactions

Abstract Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries the risk of a high false positive rate. In search of a better solution, the biology community is looking for computational methods to discover massive protein interaction data quickly and efficiently. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to the PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.

On the structure of protein–protein interaction networks

Biochemical Society Transactions ◽

10.1042/bst0311491 ◽

2003 ◽

Vol 31 (6) ◽

pp. 1491-1496 ◽

Cited By ~ 47

Author(s):

A. Thomas ◽

R. Cannings ◽

N.A.M. Monk ◽

C. Cannings

Keyword(s):

Power Law ◽

Protein Interactions ◽

Large Scale ◽

Underlying Structure ◽

Protein Protein Interactions ◽

Yeast Two Hybrid ◽

Protein Protein Interaction ◽

Human Proteins ◽

Approximate Power ◽

Two Hybrid

We present a simple model for the underlying structure of protein–protein pairwise interaction graphs that is based on the way in which proteins attach to each other in experiments such as yeast two-hybrid assays. We show that data on the interactions of human proteins lend support to this model. The frequency of the number of connections per protein under this model does not follow a power law, in contrast to the reported behaviour of data from large-scale yeast two-hybrid screens of yeast protein–protein interactions. Sampling sub-graphs from the underlying graphs generated with our model, in a way analogous to the sampling performed in large-scale yeast two-hybrid searches, gives degree distributions that differ subtly from the power law and that fit the observed data better than the power law itself. Our results show that the observation of approximate power law behaviour in a sampled sub-graph does not imply that the underlying graph follows a power law.

Mining for Candidate Genes Related to Pancreatic Cancer Using Protein-Protein Interactions and a Shortest Path Approach

BioMed Research International ◽

10.1155/2015/623121 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 9

Author(s):

Fei Yuan ◽

Yu-Hang Zhang ◽

Sibao Wan ◽

ShaoPeng Wang ◽

Xiang-Yin Kong

Keyword(s):

Pancreatic Cancer ◽

Candidate Genes ◽

Shortest Path ◽

Protein Interactions ◽

Large Scale ◽

Permutation Test ◽

Computational Method ◽

Large Network ◽

Protein Protein Interactions ◽

Protein Protein Interaction

Pancreatic cancer (PC) is a highly malignant tumor derived from pancreas tissue and is one of the leading causes of death from cancer. Its molecular mechanism has been partially revealed by validating its oncogenes and tumor suppressor genes; however, the available data remain insufficient for medical workers to design effective treatments. Large-scale identification of PC-related genes can promote studies on PC. In this study, we propose a computational method for mining new candidate PC-related genes. A large network was constructed using protein-protein interaction information, and a shortest path approach was applied to mine new candidate genes based on validated PC-related genes. In addition, a permutation test was adopted to further select key candidate genes. Finally, for all discovered candidate genes, the likelihood that the genes are novel PC-related genes is discussed based on their currently known functions.
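The shortest path step described above can be sketched as follows: connect every pair of validated genes by a shortest path in the interaction network and count how often each non-validated gene lies on such a path. This is a simplified illustration, not the authors' implementation. It assumes an unweighted network (breadth-first search) and takes only one shortest path per pair; the function names and gene labels are hypothetical, and the permutation test is not shown.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def candidate_genes(edges, seeds):
    """Count how often each non-seed gene lies on a seed-to-seed shortest path."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(adj, a, b) or []
        for node in path[1:-1]:  # inner nodes only
            if node not in seeds:
                counts[node] = counts.get(node, 0) + 1
    return counts


# Hypothetical toy network: S1, S2, S3 are validated genes; X and Y are not.
toy_edges = [("S1", "X"), ("X", "S2"), ("S2", "Y"), ("Y", "S3")]
candidates = candidate_genes(toy_edges, {"S1", "S2", "S3"})
```

Genes with high counts would then be the ones passed to a significance filter such as the permutation test mentioned in the abstract.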

Parallel PPI Prediction Performance Study on HPC Platforms

Journal of Circuits System and Computers ◽

10.1142/s0218126615500747 ◽

2015 ◽

Vol 24 (05) ◽

pp. 1550074 ◽

Cited By ~ 1

Author(s):

Ali A. El-Moursy ◽

Wael S. Afifi ◽

Fadi N. Sibai ◽

Salwa M. Nassar

Keyword(s):

Protein Interactions ◽

Execution Time ◽

High Performance ◽

Large Scale ◽

Parallel Implementation ◽

Prediction Method ◽

Protein Protein Interactions ◽

Performance Study ◽

Ppi Prediction ◽

Performance Computing

STRIKE is an algorithm that predicts protein–protein interactions (PPIs) by determining that proteins interact if they contain similar substrings of amino acids. STRIKE achieves a reasonable improvement over existing PPI prediction methods. Despite its high accuracy as a PPI prediction method, STRIKE has a long execution time and is therefore considered a compute-intensive application. In this paper, we develop and implement a parallel STRIKE algorithm for high-performance computing (HPC) systems. Using a large-scale cluster, the execution time of the parallel implementation of this bioinformatics algorithm was reduced from about a week on a serial uniprocessor machine to about 16.5 h on 16 computing nodes, and down to about 2 h on 128 parallel nodes. Communication overheads between nodes are thoroughly studied.
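The reported timings can be turned into approximate speedup and parallel efficiency figures. The sketch below takes "about a week" as exactly 168 hours, which is an assumption, so the results are only indicative; the function name is hypothetical.

```python
def speedup_and_efficiency(serial_hours, parallel_hours, nodes):
    """Speedup = T_serial / T_parallel; efficiency = speedup / node count."""
    speedup = serial_hours / parallel_hours
    return speedup, speedup / nodes


SERIAL_HOURS = 7 * 24  # ASSUMPTION: "about a week" taken as exactly 168 h

for nodes, hours in [(16, 16.5), (128, 2.0)]:
    s, e = speedup_and_efficiency(SERIAL_HOURS, hours, nodes)
    print(f"{nodes} nodes: speedup ~{s:.1f}x, efficiency ~{e:.0%}")
```

Under that assumption the speedup is roughly 10x on 16 nodes and 84x on 128 nodes, which corresponds to a parallel efficiency of about two thirds in both cases.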

Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms

Molecules ◽

10.3390/molecules27010041 ◽

2021 ◽

Vol 27 (1) ◽

pp. 41

Author(s):

Brandan Dunham ◽

Madhavi K. Ganapathiraju

Keyword(s):

Protein Interactions ◽

Computational Prediction ◽

Protein Protein Interactions ◽

Scale Free ◽

Protein Protein Interaction ◽

Resource Limitations ◽

Positive Class ◽

Benchmark Datasets ◽

Ppi Prediction ◽

Benchmark Evaluation

Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and the performance of algorithms must be assessed accurately before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, leading to exaggerated performance. We re-implemented various published algorithms, evaluated them on datasets with realistic data compositions, and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on ‘illogical’ and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and by the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available as open source.
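The effect of dataset composition on reported performance can be illustrated with a short calculation: for a fixed classifier, precision depends strongly on the fraction of positive pairs in the evaluation set. The true and false positive rates and the class fractions below are hypothetical values chosen for illustration, not figures from the paper.

```python
def precision(tpr, fpr, positive_fraction):
    """Expected precision for a classifier with given TPR and FPR
    on a dataset where `positive_fraction` of the pairs truly interact."""
    true_pos = tpr * positive_fraction
    false_pos = fpr * (1.0 - positive_fraction)
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# HYPOTHETICAL classifier: 90% true positive rate, 5% false positive rate.
balanced = precision(0.90, 0.05, 0.5)     # 1:1 positives to negatives
realistic = precision(0.90, 0.05, 0.001)  # 1 positive per 1000 pairs
```

With these illustrative numbers the same classifier has a precision of about 0.95 on the balanced set but under 0.02 on the sparse one, which is the kind of gap that evaluation on realistic compositions exposes.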

Development of a cell-free split-luciferase biochemical assay as a tool for screening for inhibitors of challenging protein-protein interaction targets

Wellcome Open Research ◽

10.12688/wellcomeopenres.15675.1 ◽

2020 ◽

Vol 5 ◽

pp. 20

Author(s):

Rachel Cooley ◽

Neesha Kara ◽

Ning Sze Hui ◽

Jonathan Tart ◽

Chloë Roustan ◽

...

Keyword(s):

Protein Interaction ◽

Drug Screening ◽

Protein Interactions ◽

Mammalian Cells ◽

Large Scale ◽

Cost Effective ◽

Biochemical Assay ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Phosphoinositide 3 Kinase

Targeting the interaction of proteins with weak binding affinities or low solubility represents a particular challenge for drug screening. The NanoLuc® Binary Technology (NanoBiT®) was originally developed to detect protein-protein interactions in live mammalian cells. Here we report the successful translation of the NanoBit cellular assay into a biochemical, cell-free format using mammalian cell lysates. We show that the assay is suitable for the detection of both strong and weak protein interactions such as those involving the binding of RAS oncoproteins to either RAF or phosphoinositide 3-kinase (PI3K) effectors respectively, and that it is also effective for the study of poorly soluble protein domains such as the RAS binding domain of PI3K. Furthermore, the RAS interaction assay is sensitive and responds to both strong and weak RAS inhibitors. Our data show that the assay is robust, reproducible, cost-effective, and can be adapted for small and large-scale screening approaches. The NanoBit Biochemical Assay offers an attractive tool for drug screening against challenging protein-protein interaction targets, including the interaction of RAS with PI3K.

Augmenting protein network embeddings with sequence information

10.1101/730481 ◽

2019 ◽

Cited By ~ 2

Author(s):

Hassan Kané ◽

Mohamed Coulibali ◽

Ali Abdalla ◽

Pelkins Ajanoh

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Quaternary Structure ◽

Protein Function Prediction ◽

Representation Learning ◽

Specific Protein ◽

Sequence Information ◽

Protein Protein Interactions ◽

Tissue Specific ◽

Protein Protein Interaction

Abstract Computational methods that infer the function of proteins are key to understanding life at the molecular level. In recent years, representation learning has emerged as a powerful paradigm to discover new patterns among entities as varied as images, words, speech, and molecules. In typical representation learning, there is only one source of data or one level of abstraction at which the learned representation occurs. However, proteins can be described by their primary, secondary, tertiary, and quaternary structure or even as nodes in protein-protein interaction networks. Given that protein function is an emergent property of all these levels of interactions, in this work we learn joint representations from both amino acid sequence and multilayer networks representing tissue-specific protein-protein interactions. We show that simple machine learning models trained on these hybrid representations outperform existing network-based methods on the task of tissue-specific protein function prediction on 13 out of 13 tissues. Furthermore, these representations outperform existing ones by 14% on average.

Development of a cell-free split-luciferase biochemical assay as a tool for screening for inhibitors of challenging protein-protein interaction targets

Wellcome Open Research ◽

10.12688/wellcomeopenres.15675.2 ◽

2020 ◽

Vol 5 ◽

pp. 20

Author(s):

Rachel Cooley ◽

Neesha Kara ◽

Ning Sze Hui ◽

Jonathan Tart ◽

Chloë Roustan ◽

...

Keyword(s):

Protein Interaction ◽

Drug Screening ◽

Protein Interactions ◽

Mammalian Cells ◽

Large Scale ◽

Cost Effective ◽

Biochemical Assay ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Phosphoinositide 3 Kinase

Targeting the interaction of proteins with weak binding affinities or low solubility represents a particular challenge for drug screening. The NanoLuc® Binary Technology (NanoBiT®) was originally developed to detect protein-protein interactions in live mammalian cells. Here we report the successful translation of the NanoBit cellular assay into a biochemical, cell-free format using mammalian cell lysates. We show that the assay is suitable for the detection of both strong and weak protein interactions such as those involving the binding of RAS oncoproteins to either RAF or phosphoinositide 3-kinase (PI3K) effectors respectively, and that it is also effective for the study of poorly soluble protein domains such as the RAS binding domain of PI3K. Furthermore, the RAS interaction assay is sensitive and responds to both strong and weak RAS inhibitors. Our data show that the assay is robust, reproducible, cost-effective, and can be adapted for small and large-scale screening approaches. The NanoBit Biochemical Assay offers an attractive tool for drug screening against challenging protein-protein interaction targets, including the interaction of RAS with PI3K.
