Large-scale identification of human protein function using topological features of interaction network

Human protein-protein interaction networks are critical to understanding cell biology and interpreting genetic and genomic data, but they are challenging to produce in individual large-scale experiments. We describe a general computational framework that, through data integration and quality control, provides a scored human protein-protein interaction network (InWeb_IM). Compared with five comparable resources, InWeb_IM has 2.8 times more interactions (~585K) and a superior functional signal, showing that the added interactions reflect real cellular biology. InWeb_IM is a versatile resource for accurate and cost-efficient functional interpretation of massive genomic datasets, as illustrated by annotating candidate genes from >4,700 cancer genomes and genes involved in neuropsychiatric diseases.

Application of dynamic expansion tree for finding large network motifs in biological networks

PeerJ ◽

10.7717/peerj.6917 ◽

2019 ◽

Vol 7 ◽

pp. e6917 ◽

Cited By ~ 1

Author(s):

Sabyasachi Patra ◽

Anjali Mohapatra

Keyword(s):

Biological Networks ◽

Protein Function ◽

Large Scale ◽

Network Motif ◽

Graph Isomorphism ◽

Interaction Network ◽

Motif Finding ◽

Network Motifs ◽

Large Network ◽

Scalable Network

Network motifs play an important role in the structural analysis of biological networks. Identifying such motifs has many important applications, such as understanding the modularity and large-scale structure of biological networks, classifying networks into superfamilies, and annotating protein function. However, identifying large network motifs is challenging because it involves the graph isomorphism problem. Although this problem has been studied extensively using different computational approaches, there is still considerable scope for improvement. Motivated by these challenges, an efficient and scalable network motif finding algorithm based on a dynamic expansion tree is proposed. The novelty of the proposed algorithm is that it avoids computationally expensive graph isomorphism tests and overcomes the space limitation of the static expansion tree (SET), which enables it to find large motifs. In this algorithm, the embeddings corresponding to a child node of the expansion tree are obtained from the embeddings of its parent node, either by adding a vertex or by adding an edge. This process does not involve any graph isomorphism check. The time complexities of vertex addition and edge addition are O(n) and O(1), respectively. The growth of a dynamic expansion tree (DET) depends on which patterns are present in the target network, and pruning branches of the DET substantially reduces the space requirement compared with the SET. The proposed algorithm has been tested on a protein–protein interaction network obtained from the MINT database, and it identifies large network motifs faster than most existing motif finding algorithms.
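
The two expansion operations can be illustrated with a minimal sketch. It is not the authors' DET implementation, and it rests on several assumptions: each embedding is stored as an ordered tuple of target-network vertices (position k holds the vertex matched to pattern vertex k), matching is non-induced, automorphic duplicates are not removed, and the function names are hypothetical. It shows why edge addition needs a single adjacency lookup per embedding while vertex addition needs a scan over the current embedding.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]   # target network as undirected adjacency sets
Embedding = Tuple[int, ...]   # embedding[k] = target vertex matched to pattern vertex k


def single_edge_embeddings(graph: Graph) -> List[Embedding]:
    """Embeddings of the root pattern (a single edge): every ordered adjacent pair."""
    return [(u, v) for u in graph for v in graph[u]]


def add_edge(embeddings: List[Embedding], graph: Graph, i: int, j: int) -> List[Embedding]:
    """Child pattern = parent pattern plus an edge between pattern vertices i and j.

    A parent embedding survives only if the target network also has that edge,
    which is one hash-set lookup, i.e. O(1) per embedding.
    """
    return [e for e in embeddings if e[j] in graph[e[i]]]


def add_vertex(embeddings: List[Embedding], graph: Graph, i: int) -> List[Embedding]:
    """Child pattern = parent pattern plus a new vertex attached to pattern vertex i.

    Each parent embedding is extended by every neighbour of its i-th vertex that is
    not already used; the `u not in e` scan over the n-vertex tuple is O(n).
    """
    children = []
    for e in embeddings:
        for u in graph[e[i]]:
            if u not in e:
                children.append(e + (u,))
    return children


if __name__ == "__main__":
    # Hypothetical toy network: triangle 0-1-2 with a pendant vertex 3 attached to 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    edges = single_edge_embeddings(g)       # ordered single-edge embeddings
    paths = add_vertex(edges, g, 1)         # ordered 3-vertex paths
    triangles = add_edge(paths, g, 0, 2)    # orderings of the single triangle
    print(len(edges), len(paths), len(triangles))
```

On the toy network this prints `8 10 6`: eight ordered edges, ten ordered 3-vertex paths, and six orderings of the one triangle. No isomorphism test is needed because every child embedding is derived from a parent embedding of a known pattern.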

Discovering disease-genes by topological features in human protein–protein interaction network

Bioinformatics ◽

10.1093/bioinformatics/btl467 ◽

2006 ◽

Vol 22 (22) ◽

pp. 2800-2805 ◽

Cited By ~ 265

Author(s):

Jianzhen Xu ◽

Yongjin Li

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Interaction Network ◽

Human Protein ◽

Disease Genes ◽

Protein Protein Interaction ◽

Topological Features ◽

Protein Protein Interaction Network

Protein Interaction Network-based Deep Learning Framework for Identifying Disease-Associated Human Proteins

10.1101/2021.06.03.446973 ◽

2021 ◽

Author(s):

Barnali Das ◽

Pralay Mitra

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Protein Interaction Network ◽

Large Scale ◽

Protein Identification ◽

Interaction Network ◽

Host Protein ◽

Convolutional Network ◽

Topological Features ◽

Disease Protein

Infectious diseases are among the foremost public health issues in humans. Identifying novel disease-associated proteins enables efficient recognition of novel therapeutic targets. Here, we develop a Graph Convolutional Network (GCN)-based model called PINDeL to identify disease-associated host proteins by integrating the human Protein Locality Graph and its corresponding topological features. By combining a GCN with the protein interaction network, PINDeL achieves the highest accuracy (83.45%), with AUROC and AUPRC values of 0.90 and 0.88, respectively. With high accuracy, recall, F1-score, specificity, AUROC, and AUPRC, PINDeL outperforms other existing machine-learning and deep-learning techniques for disease gene/protein identification in humans. Applied to an independent dataset of 24,320 proteins not used for training, validation, or testing, PINDeL predicts 6,448 new disease-protein associations, of which we verify 3,196 disease proteins through experimental evidence such as disease ontology, Gene Ontology, and KEGG pathway enrichment analyses. Our investigation shows that 748 experimentally verified proteins are indeed responsible for pathogen-host protein interactions, of which 22 disease proteins are associated with multiple diseases such as cancer, aging, chem-dependency, pharmacogenomics, normal variation, infection, and immune-related diseases. This GCN-based prediction model is highly useful for large-scale disease-protein association prediction and hence will provide crucial insights into disease pathogenesis and further aid the development of novel therapeutics.
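
The abstract does not specify PINDeL's layers. For orientation, the widely used GCN propagation rule of Kipf and Welling is H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix of the interaction network, H holds per-protein input features (for example, topological features), D is the degree matrix of A + I, and W is a learned weight matrix. It can be sketched in plain Python as follows. Whether PINDeL uses exactly this normalization is an assumption, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by the Kipf-Welling GCN."""
    n = len(adj)
    a_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_self]  # degree including the self-loop, so never 0
    return [[a_self[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step H' = ReLU(A_norm @ H @ W).

    Each protein's new features mix its own features with those of its
    interaction partners, then pass through the linear map W.
    """
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in z]


# Hypothetical toy usage: a 3-protein path 0-1-2, one-hot features, one output channel.
a_hat = normalized_adjacency([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
out = gcn_layer(a_hat, [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]], [[1.0], [-1.0], [0.0]])
```

Stacking a few such layers and ending with a per-node sigmoid would yield a disease-association score for each protein; learning W requires an automatic-differentiation framework and is omitted from this sketch.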

Prediction of Gastric Cancer-Related Proteins Based on Graph Fusion Method

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.739715 ◽

2021 ◽

Vol 9 ◽

Author(s):

Hao Zhang ◽

Ruisi Xu ◽

Meng Ding ◽

Ying Zhang

Keyword(s):

Gastric Cancer ◽

Early Diagnosis ◽

High Speed ◽

Large Scale ◽

Interaction Network ◽

Proteomics Data ◽

Speed Increase ◽

Topological Features ◽

Potential Biomarker ◽

Related Proteins

Gastric cancer is a common malignant tumor of the digestive system with no specific symptoms. Owing to limited knowledge of its pathogenesis, patients are usually diagnosed at an advanced stage, when no effective treatment is available. The proteome has unique tissue and temporal specificity and can reflect the influence of external factors, which makes it a potential source of biomarkers for early diagnosis. Therefore, discovering gastric cancer-related proteins could greatly help researchers design drugs and develop early diagnosis kits. However, identifying gastric cancer-related proteins through biological experiments is time-consuming and costly. With the rapid growth of data, mining proteomics data on a large scale through computational methods has become a hot topic. Based on the hypothesis that the stronger the association between two proteins, the more likely they are to be associated with the same disease, in this paper we constructed both a disease similarity network and a protein interaction network. A Graph Convolutional Network (GCN) was then applied to extract topological features of these networks. Finally, XGBoost was used to identify the relationship between proteins and gastric cancer. In 10-fold cross-validation experiments, our method achieved a high area under the ROC curve (AUC, 0.85) and area under the precision-recall curve (AUPR, 0.76), demonstrating its effectiveness.
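
The abstract reports AUC and AUPR without describing how they were computed. A minimal sketch, assuming binary labels (1 = gastric cancer-related) and one classifier score per held-out protein: ROC AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), and average precision is used as one common step-wise estimate of AUPR. The function names are hypothetical, and the choice of AUPR estimator is an assumption.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at the rank of each positive (step-wise AUPR)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Hypothetical scores for six held-out proteins.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
```

For these toy values, 8 of the 9 positive-negative pairs are ordered correctly (AUC = 8/9), and the precisions at the three positives are 1, 1 and 3/4 (average precision = 2.75/3). In a 10-fold setting the metrics would be computed per held-out fold and averaged, or on the pooled out-of-fold scores; the abstract does not say which. The pairwise loop is O(P × N), which is adequate for a sketch; a rank-based formula scales better.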

Human Protein Structural Interaction Network: Domain Effects on Network Topology and Protein Function

Progress in Biochemistry and Biophysics ◽

10.3724/sp.j.1206.2009.00640 ◽

2010 ◽

Vol 37 (5) ◽

pp. 517-526 ◽

Cited By ~ 2

Author(s):

Li-Na Chen ◽

Qian Wang ◽

Yu-Kui Shang ◽

Liang-Cai Zhang ◽

Zhao Sun ◽

...

Keyword(s):

Network Topology ◽

Protein Function ◽

Interaction Network ◽

Human Protein ◽

Structural Interaction

Faculty Opinions recommendation of Interaction between intrinsically disordered proteins frequently occurs in a human protein-protein interaction network.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1164197.624888 ◽

2009 ◽

Author(s):

Vladimir Uversky

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Intrinsically Disordered Proteins ◽

Interaction Network ◽

Human Protein ◽

Disordered Proteins ◽

Protein Protein Interaction ◽

Intrinsically Disordered ◽

Protein Protein Interaction Network

Faculty Opinions recommendation of Extreme multifunctional proteins identified from a human protein interaction network.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.725542134.793507289 ◽

2015 ◽

Author(s):

Sheila McCormick

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Interaction Network ◽

Human Protein ◽

Multifunctional Proteins ◽

Human Protein Interaction

Applications of random graphs

10.1093/oso/9780198709893.003.0011 ◽

2017 ◽

Author(s):

A.C.C. Coolen ◽

A. Annibale ◽

E.S. Roberts

Keyword(s):

Social Networks ◽

Random Graphs ◽

Interaction Network ◽

Power Grids ◽

Protein Protein Interaction ◽

Topological Features ◽

First Case ◽

The Stability ◽

Protein Protein Interaction Network

This chapter reviews graph generation techniques in the context of applications. The first case study is power grids, where proposed strategies to prevent blackouts have been tested on tailored random graphs. The second case study is social networks. Applications of random graphs to social networks are extremely wide-ranging; the aspect examined here is modelling the spread of disease on a social network, and how a particular construction based on projecting from a bipartite graph successfully captures some of the clustering observed in real social networks. The third case study concerns null models of food webs, discussing the constraints specific to this application and the topological features that may contribute to the stability of an ecosystem. The final case study is taken from molecular biology and discusses the importance of unbiased graph sampling when assessing whether motifs are over-represented in a protein–protein interaction network.
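
Testing whether a motif is over-represented usually means comparing its count in the real network with counts in randomized networks that share the same degree sequence. The following is an illustration under stated assumptions, not the chapter's own method: triangles stand in for the motif, double-edge swaps provide the randomization, and a z-score is the comparison; all function names and default parameters are hypothetical. A rejected swap leaves the graph unchanged instead of being retried, so each move and its reverse are proposed with equal probability. Details of this kind determine whether the sampling is unbiased.

```python
import random
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def count_triangles(edges: List[Edge]) -> int:
    """Number of triangles in a simple undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is counted once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def degree_preserving_shuffle(edges: List[Edge], n_swaps: int, rng: random.Random) -> List[Edge]:
    """Randomize a simple graph by double-edge swaps that keep every vertex degree.

    A proposed swap {a,b},{c,d} -> {a,d},{c,b} is rejected (graph unchanged)
    if it would create a self-loop or a duplicate edge.
    """
    edge_list = list(edges)
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len({a, b, c, d}) < 4 or new1 in present or new2 in present:
            continue  # rejected move: stay at the current graph
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def triangle_z_score(
    edges: List[Edge], n_random: int = 200, swaps_per_edge: int = 10, seed: int = 0
) -> Tuple[int, float, Optional[float]]:
    """Observed triangle count, null-model mean, and z-score (None if the null has no spread)."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [
        count_triangles(degree_preserving_shuffle(edges, swaps_per_edge * len(edges), rng))
        for _ in range(n_random)
    ]
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else None
    return observed, mu, z
```

A large positive z-score suggests triangles are more frequent than expected given the degrees alone. The number of swaps per edge is a mixing heuristic, not a guarantee that the chain has converged.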
