Large-scale identification of human protein function using topological features of interaction network

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Zhanchao Li ◽  
Zhiqing Liu ◽  
Wenqian Zhong ◽  
Menghua Huang ◽  
Na Wu ◽  
...  
2016 ◽  
Author(s):  
T Li ◽  
R Wernersson ◽  
RB Hansen ◽  
H Horn ◽  
JM Mercer ◽  
...  

Human protein-protein interaction networks are critical to understanding cell biology and interpreting genetic and genomic data, but are challenging to produce in individual large-scale experiments. We describe a general computational framework that, through data integration and quality control, provides a scored human protein-protein interaction network (InWeb_IM). Compared with five similar resources, InWeb_IM has 2.8 times more interactions (~585K) and a superior functional signal, showing that the added interactions reflect real cellular biology. InWeb_IM is a versatile resource for accurate and cost-efficient functional interpretation of massive genomic datasets, as illustrated by annotating candidate genes from >4,700 cancer genomes and genes involved in neuropsychiatric diseases.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6917 ◽  
Author(s):  
Sabyasachi Patra ◽  
Anjali Mohapatra

Network motifs play an important role in the structural analysis of biological networks. Identifying such motifs leads to many important applications, such as understanding the modularity and large-scale structure of biological networks, classifying networks into super-families, and annotating protein function. However, identifying large network motifs is challenging because it involves the graph isomorphism problem. Although this problem has been studied extensively using different computational approaches, there is still considerable scope for improvement. Motivated by these challenges, an efficient and scalable network motif finding algorithm using a dynamic expansion tree is proposed. The novelty of the proposed algorithm is that it avoids computationally expensive graph isomorphism tests and overcomes the space limitation of the static expansion tree (SET), which enables it to find large motifs. In this algorithm, the embeddings corresponding to a child node of the expansion tree are obtained from the embeddings of its parent node, either by adding a vertex or by adding an edge. This process does not involve any graph isomorphism check. The time complexities of vertex addition and edge addition are O(n) and O(1), respectively. The growth of a dynamic expansion tree (DET) depends on the availability of patterns in the target network. Pruning branches in the DET significantly reduces the space requirement compared with the SET. The proposed algorithm has been tested on a protein–protein interaction network obtained from the MINT database and is able to identify large network motifs faster than most existing motif finding algorithms.
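The idea of deriving a child pattern's embeddings from its parent's can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the tuple-based embedding representation, and the toy graph are all assumptions, and the sketch ignores symmetry handling and the pruning of the expansion tree.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]      # undirected target network as adjacency sets
Embedding = Tuple[int, ...]      # position i holds the target vertex matched to pattern vertex i


def extend_by_vertex(graph: Graph, embeddings: List[Embedding], anchor: int) -> List[Embedding]:
    """Child pattern = parent pattern plus one new vertex attached to pattern vertex `anchor`.

    For each parent embedding we scan the neighbours of one matched vertex,
    which is at most n vertices per embedding.
    """
    children: List[Embedding] = []
    for emb in embeddings:
        used = set(emb)
        for w in sorted(graph[emb[anchor]]):
            if w not in used:
                children.append(emb + (w,))
    return children


def extend_by_edge(graph: Graph, embeddings: List[Embedding], i: int, j: int) -> List[Embedding]:
    """Child pattern = parent pattern plus an edge between pattern vertices i and j.

    Each parent embedding needs a single adjacency lookup, with no isomorphism test.
    """
    return [emb for emb in embeddings if emb[j] in graph[emb[i]]]


# Hypothetical toy network: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
edge_embeddings = [(u, v) for u in sorted(toy) for v in sorted(toy[u])]
path_embeddings = extend_by_vertex(toy, edge_embeddings, anchor=1)   # pattern 0-1-2
triangle_embeddings = extend_by_edge(toy, path_embeddings, 0, 2)     # close the path
```

In this sketch every surviving embedding is an ordered match, so one triangle in the target appears once per automorphism; a real motif finder would need to account for that when counting.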


2021 ◽  
Author(s):  
Barnali Das ◽  
Pralay Mitra

Infectious diseases in humans are among the foremost public health issues. Identifying novel disease-associated proteins enables efficient recognition of novel therapeutic targets. Here, we develop a Graph Convolutional Network (GCN)-based model called PINDeL to identify disease-associated host proteins by integrating the human Protein Locality Graph and its corresponding topological features. By combining the GCN with the protein interaction network, PINDeL achieves a highest accuracy of 83.45%, with AUROC and AUPRC values of 0.90 and 0.88, respectively. With high accuracy, recall, F1-score, specificity, AUROC, and AUPRC, PINDeL outperforms other existing machine-learning and deep-learning techniques for disease gene/protein identification in humans. Applying PINDeL to an independent dataset of 24320 proteins, none of which were used for training, validation, or testing, predicts 6448 new disease-protein associations, of which we verify 3196 disease-proteins through experimental evidence such as disease ontology, Gene Ontology, and KEGG pathway enrichment analyses. Our investigation shows that 748 experimentally verified proteins are indeed responsible for pathogen-host protein interactions, of which 22 disease-proteins are associated with multiple diseases such as cancer, aging, chem-dependency, pharmacogenomics, normal variation, infection, and immune-related diseases. This Graph Convolutional Network-based prediction model is highly useful for large-scale disease-protein association prediction and hence will provide crucial insights into disease pathogenesis and further aid in developing novel therapeutics.
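For readers unfamiliar with graph convolution, a single propagation step can be sketched as below. This uses the widely used symmetric-normalization rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). The abstract does not describe PINDeL's actual architecture, so the function names, the dense list-of-lists representation, and the choice of this particular rule are assumptions made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, adequate for small illustrative graphs."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution step: average over neighbours, project, apply ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, x) for x in row] for row in propagated]


# Hypothetical two-protein network with one interaction and one feature per protein.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
node_features = [[1.0], [3.0]]
layer_weights = [[1.0]]
hidden = gcn_layer(adjacency, node_features, layer_weights)
```

Each output row mixes a node's own features with those of its interaction partners, which is how topological context enters the learned representation.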


Author(s):  
Hao Zhang ◽  
Ruisi Xu ◽  
Meng Ding ◽  
Ying Zhang

Gastric cancer is a common malignant tumor of the digestive system with no specific symptoms. Because knowledge of its pathogenesis is limited, patients are usually diagnosed at an advanced stage and lack effective treatment options. The proteome has unique tissue and time specificity and can reflect the influence of external factors, which makes it a potential source of biomarkers for early diagnosis. Therefore, discovering gastric cancer-related proteins could greatly help researchers design drugs and develop an early diagnosis kit. However, identifying gastric cancer-related proteins by biological experiments is time-consuming and costly. With the rapid growth of data, mining knowledge from proteomics data on a large scale through computational methods has become a topic of active interest. Based on the hypothesis that the stronger the association between two proteins, the more likely they are to be associated with the same disease, in this paper we constructed both a disease similarity network and a protein interaction network. Then, Graph Convolutional Networks (GCN) were applied to extract topological features of these networks. Finally, XGBoost was used to identify the relationship between proteins and gastric cancer. Results of 10-fold cross-validation experiments show a high area under the curve (AUC) (0.85) and area under the precision-recall curve (AUPR) (0.76), which demonstrates the effectiveness of our method.
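The evaluation described above rests on two ranking metrics and a fold split, which can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, AUPR is estimated here as average precision (one common estimator, not necessarily the one the authors used), and the labels and scores are made-up values.

```python
from typing import List, Sequence


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split sample indices into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5).

    Assumes at least one positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up predictions for four proteins (1 = disease-related).
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.1]
example_auc = roc_auc(example_labels, example_scores)            # 0.75
example_ap = average_precision(example_labels, example_scores)   # (1 + 2/3) / 2
folds = k_fold_indices(10, 10)
```

In a cross-validation run, each fold in turn would serve as the held-out set, and the metrics would be averaged over the folds.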


2010 ◽  
Vol 37 (5) ◽  
pp. 517-526 ◽  
Author(s):  
Li-Na CHEN ◽  
Qian WANG ◽  
Yu-Kui SHANG ◽  
Liang-Cai ZHANG ◽  
Zhao SUN ◽  
...  

Author(s):  
A.C.C. Coolen ◽  
A. Annibale ◽  
E.S. Roberts

This chapter reviews graph generation techniques in the context of applications. The first case study is power grids, where proposed strategies to prevent blackouts have been tested on tailored random graphs. The second case study concerns social networks. Applications of random graphs to social networks are extremely wide-ranging; the aspect examined here is modelling the spread of disease on a social network, and how a construction based on projecting from a bipartite graph successfully captures some of the clustering observed in real social networks. The third case study covers null models of food webs, discussing the constraints specific to this application and the topological features that may contribute to the stability of an ecosystem. The final case study is taken from molecular biology and discusses the importance of unbiased graph sampling when assessing whether motifs are over-represented in a protein–protein interaction network.
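A common way to build a null model for motif over-representation is to randomize the network while keeping every node's degree fixed. The sketch below shows plain degree-preserving edge switching. It is not taken from the chapter: the function names are hypothetical, the input is assumed to be a simple undirected graph without duplicate edges, and whether such a switching procedure samples graphs uniformly is exactly the kind of question the chapter raises.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def _norm(u: int, v: int) -> Edge:
    """Store each undirected edge with its smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def edge_swap_randomize(edges: List[Edge], n_attempts: int, seed: int = 0) -> List[Edge]:
    """Attempt n_attempts switches (a-b, c-d) -> (a-d, c-b), keeping the graph simple.

    Every accepted switch leaves the degree of a, b, c and d unchanged.
    """
    rng = random.Random(seed)
    edge_list = [_norm(u, v) for u, v in edges]
    edge_set = set(edge_list)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:          # consider both orientations of the second edge
            c, d = d, c
        if a == d or c == b:            # would create a self-loop
            continue
        new1, new2 = _norm(a, d), _norm(c, b)
        if new1 in edge_set or new2 in edge_set:   # would create a multi-edge
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


# Hypothetical small network: a 6-cycle with one chord.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
randomized = edge_swap_randomize(original, n_attempts=100, seed=1)
```

Motif counts in the real network would then be compared against counts in many such randomized copies, which is only meaningful if the copies are sampled without bias.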

