Appraisal Study of Similarity-Based and Embedding-Based Link Prediction Methods on Graphs

2021 ◽  
Author(s):  
Md Kamrul Islam ◽  
Sabeur Aridhi ◽  
Malika Smail-Tabbone

The task of inferring missing links or predicting future ones in a graph based on its current structure is referred to as link prediction. Link prediction methods based on pairwise node similarity are well-established approaches in the literature and, although heuristic, show good prediction performance on many real-world graphs. Graph embedding approaches, on the other hand, learn low-dimensional representations of the nodes of a graph; they are capable of capturing inherent graph features and thus support the subsequent link prediction task. This appraisal paper studies a selection of methods from both categories on several benchmark (homogeneous) graphs with different properties from various domains. Beyond the intra- and inter-category comparison of the methods' performance, our aim is also to uncover interesting connections between Graph Neural Network (GNN)-based methods and heuristic ones as a means to alleviate their well-known black-box limitation.
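To make the two categories concrete, here is a minimal Python sketch that is not taken from the paper: common-neighbor and Jaccard scores stand in for similarity-based heuristics, and an inner product of node vectors stands in for one common way of turning embeddings into link scores. The toy graph, the function names, and the hand-written embedding vectors are illustrative assumptions; in practice the vectors would come from a trained embedding method.

```python
import numpy as np


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Similarity heuristic: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Similarity heuristic: shared neighbors normalized by the union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def embedding_score(emb, u, v):
    """Embedding-based score: inner product of the two node vectors."""
    return float(np.dot(emb[u], emb[v]))


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
adj = build_adjacency(edges)
# Hypothetical 2-d embeddings standing in for the output of a learned model.
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.8, -0.6], [0.0, 1.0]])

print(common_neighbors(adj, 0, 3), jaccard(adj, 0, 3))  # 2 1.0
print(embedding_score(emb, 0, 3))  # 0.0
```

Both kinds of score rank candidate (non-adjacent) pairs; the heuristics read the score directly off the local structure, whereas the embedding score is only as good as the learned vectors.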



2020 ◽  
Vol 8 (2) ◽  
Author(s):  
Leo Torres ◽  
Kevin S Chan ◽  
Tina Eliassi-Rad

Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and behind many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric rather than spectral properties, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding (GLEE), and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction.
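The contrast between the two views can be sketched with NumPy. `laplacian_eigenmap` is classic LE: it takes the eigenvectors of the smallest non-zero eigenvalues, so that neighbors land close together. `laplacian_gram_factor` illustrates the geometric reading: because the Laplacian $L$ is positive semidefinite, it factors as $L = S S^\top$, so dot products between rows of $S$ reproduce the Laplacian entries (-1 for an edge, 0 for a non-edge). This is a sketch under our own assumptions, not the authors' implementation of the proposed Geometric Laplacian Eigenmap Embedding; the function names and the choice to keep the d largest eigenvalues when truncating are ours.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def laplacian_eigenmap(adj, d):
    """Classic LE: eigenvectors of the d smallest non-zero eigenvalues of L.

    Assumes a connected graph, so only the first eigenvalue is zero.
    """
    _, vecs = np.linalg.eigh(laplacian(adj))  # eigenvalues in ascending order
    return vecs[:, 1:d + 1]


def laplacian_gram_factor(adj, d):
    """Rows s_i whose dot products s_i . s_j approximate L_ij.

    L = Q diag(lam) Q^T is positive semidefinite, so S = Q sqrt(lam) gives
    S S^T = L when all eigenpairs are kept; keeping the d largest is an
    assumed truncation.
    """
    vals, vecs = np.linalg.eigh(laplacian(adj))
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Path graph 0-1-2-3 as a toy example.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

S = laplacian_gram_factor(A, d=4)
print(np.round(S @ S.T, 6))  # matches L: -1 on edges, 0 on non-edges, degrees on the diagonal
print(laplacian_eigenmap(A, d=2).shape)  # (4, 2)
```

In the LE view, distances between rows matter; in the factorization view, inner products between rows carry the adjacency information directly, which is what makes them usable for reconstruction and link scoring.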


2020 ◽  
Vol 31 (11) ◽  
pp. 2050158
Author(s):  
Xiang-Chun Liu ◽  
Dian-Qing Meng ◽  
Xu-Zhen Zhu ◽  
Yang Tian

Link prediction based on node similarity has become one of the most effective prediction methods for complex networks. When calculating the similarity between two unconnected endpoints, most scholars evaluate the influence of an endpoint based on its degree. However, this approach ignores the differences in the contributions that neighbor nodes make to an endpoint (NC). Through extensive investigation and analysis, the paper quantifies the contribution of neighbor nodes to an endpoint and proposes the NC index to evaluate endpoint influence accurately. Extensive experiments on 12 real datasets indicate that the proposed algorithm significantly increases link prediction accuracy and shows an obvious advantage over traditional algorithms.
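For reference, two classical degree-based similarity indices of the kind the abstract contrasts against are Adamic-Adar (AA) and Resource Allocation (RA). Both weight each common neighbor z of the two endpoints by its degree k(z): AA adds 1/log k(z) and RA adds 1/k(z). The sketch below is a minimal Python version of these baselines on a toy graph; it does not implement the NC index, whose formula is not given here.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    # A common neighbor has degree >= 2, so log(degree) > 0 and never divides by zero.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    # Each shared neighbor passes on 1/degree of a unit of "resource".
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


adj = build_adjacency([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])
print(resource_allocation(adj, 0, 3))  # 2/3: neighbors 1 and 2 both have degree 3
print(adamic_adar(adj, 0, 3))          # 2 / ln 3
```

Low-degree common neighbors count for more than hubs in both indices; RA penalizes high degree more strongly than AA because it divides by the degree itself rather than its logarithm.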


2020 ◽  
Author(s):  
Mustafa Coşkun ◽  
Mehmet Koyutürk

Motivation: Link prediction is an important and well-studied problem in computational biology, with a broad range of applications including disease gene prioritization, drug-disease associations, and drug response in cancer. The general principle in link prediction is to use the topological characteristics and, if available, the attributes of the nodes in the network to predict new links that are likely to emerge or disappear. Recently, graph representation learning methods, which aim to learn a low-dimensional representation of the topological characteristics and attributes of the nodes, have drawn increasing attention as a way to solve the link prediction problem via the learnt low-dimensional features. Most prominently, Graph Convolutional Network (GCN)-based network embedding methods have demonstrated great promise in link prediction due to their ability to capture non-linear information in the network. To date, GCN-based network embedding algorithms use a Laplacian matrix as the convolution matrix in their convolution layers, and the effect of the convolution matrix on algorithm performance has not been comprehensively characterized in the context of link prediction in biomedical networks. On the other hand, for a variety of biomedical link prediction tasks, traditional node similarity measures such as Common Neighbor, Adamic-Adar, and others have shown promising results; hence there is a need to systematically evaluate node similarity measures as convolution matrices in terms of their usability and potential to further the state of the art.

Results: We select 8 representative node similarity measures as convolution matrices within the single-layered GCN graph embedding method and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, and protein-protein interaction (PPI) prediction. Our experimental results demonstrate that node similarity-based convolution matrices significantly improve GCN-based embedding algorithms and deserve more attention in future biomedical link prediction.

Availability: Our method is implemented as a Python library and is available at [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
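The idea of swapping the convolution matrix can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' library: it shows a single GCN layer ReLU(C X W) where C is either the usual self-loop renormalized adjacency (the convolution matrix derived from the normalized Laplacian) or a symmetrically normalized common-neighbor matrix, with links scored by the inner product of the resulting embeddings. The normalization, the one-hot features, and the random, untrained weights are our assumptions; a real model would train W on observed links.

```python
import numpy as np


def sym_normalize(m):
    """D^{-1/2} M D^{-1/2}, with D the row sums of M."""
    deg = m.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard against isolated nodes
    return inv_sqrt[:, None] * m * inv_sqrt[None, :]


def laplacian_style_conv(adj):
    """Renormalized adjacency with self-loops, as in the standard GCN layer."""
    return sym_normalize(adj + np.eye(adj.shape[0]))


def common_neighbor_conv(adj):
    """(A @ A)[i, j] counts common neighbors of i and j (diagonal = degree)."""
    return sym_normalize(adj @ adj)


def single_layer_gcn(conv, features, weight):
    """One GCN layer: ReLU(C X W) for a chosen convolution matrix C."""
    return np.maximum(conv @ features @ weight, 0.0)


A = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

rng = np.random.default_rng(0)
X = np.eye(4)                # featureless graph: one-hot node features
W = rng.normal(size=(4, 2))  # untrained weights, for illustration only

for name, conv in [("laplacian", laplacian_style_conv(A)),
                   ("common-neighbor", common_neighbor_conv(A))]:
    Z = single_layer_gcn(conv, X, W)
    scores = Z @ Z.T         # inner-product decoder: higher means a more likely link
    print(name, round(float(scores[0, 3]), 3))
```

Only the convolution matrix changes between the two runs; everything else in the layer and the decoder stays fixed, which is the kind of controlled comparison the abstract describes.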


2014 ◽  
Vol 651-653 ◽  
pp. 1748-1752
Author(s):  
Fu Li Xie ◽  
Guang Quan Cheng

With the development of network science, the link prediction problem has attracted increasing attention. Among link prediction approaches, similarity-based methods have been the most widely studied. Previous methods mainly characterize the similarity of two nodes by their common neighbors. In contrast, this paper takes the view of the nodes' network environment: it analyzes the links around a pair of nodes and derives node similarity from the similarity of those links, providing a new way to solve the link prediction problem. The paper establishes a link prediction model based on the similarity between links and presents the LE index. Finally, the LE index is tested on five real datasets and compared with existing similarity-based link prediction methods; the experimental results show that it achieves good prediction accuracy and, in particular, outperforms the other methods on the Yeast network.


Author(s):  
Hongchang Gao ◽  
Heng Huang

Network embedding has attracted a surge of attention in recent years. It aims to learn low-dimensional representations of the nodes in a network, which benefit downstream tasks such as node classification and link prediction. Most existing approaches learn node representations based only on the topological structure, yet in many real-world applications nodes are associated with rich attributes. It is therefore important to learn node representations from both the topological structure and the node attributes. In this paper, we propose a novel deep attributed network embedding approach that can capture high non-linearity and preserve various proximities in both the topological structure and the node attributes. In addition, a novel strategy is proposed to guarantee that the learned node representations encode consistent and complementary information from the topological structure and node attributes. Extensive experiments on benchmark datasets have verified the effectiveness of the proposed approach.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution based on the embeddings of entities and entity types is used to map the head and tail entities to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks: link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
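The abstract only outlines the scoring pipeline, so the following NumPy sketch is a guess at its general shape rather than TransET itself: entity and entity-type vectors are combined by circular convolution to obtain type-specific head and tail representations, which are then scored with a TransE-style translation distance. The function names (including the hypothetical `type_specific` mapping), the use of the L2 norm, and the identity (delta) type vector in the example are illustrative assumptions.

```python
import numpy as np


def circular_convolution(a, b):
    """(a * b)[k] = sum_i a[i] * b[(k - i) mod n], computed via the FFT."""
    n = a.shape[-1]
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=n)


def type_specific(entity, entity_type):
    """Hypothetical mapping of an entity to a type-specific representation."""
    return circular_convolution(entity, entity_type)


def translation_score(h, r, t, h_type, t_type):
    """TransE-style plausibility: -||h' + r - t'||, higher is more plausible."""
    return -np.linalg.norm(type_specific(h, h_type) + r - type_specific(t, t_type))


rng = np.random.default_rng(0)
dim = 8
h, r = rng.normal(size=dim), rng.normal(size=dim)
t = h + r  # a tail that is an exact translation of the head

# A delta vector is the identity for circular convolution.
identity_type = np.zeros(dim)
identity_type[0] = 1.0
print(translation_score(h, r, t, identity_type, identity_type))  # close to 0
```

With a non-trivial type vector, the convolution mixes the coordinates of the entity embedding, so the same entity can be placed differently depending on its type before the translation check.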


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Tianwen Luo ◽  
Yutong Wang ◽  
Xuefeng Shan ◽  
Ye Bai ◽  
Chun Huang ◽  
...  

Background: The identification of homogeneous and heterogeneous risk factors for different types of metastases in colorectal cancer (CRC) may shed light on the aetiology and help individualize prophylactic treatment. The present study characterized differences in incidence and identified the homogeneous and heterogeneous risk factors associated with distant metastases in CRC.

Methods: CRC patients registered in the SEER database between 2010 and 2016 were included in this study. Logistic regression was used to analyse homogeneous and heterogeneous risk factors for the occurrence of different types of metastases. Nomograms were constructed to predict the risk of developing metastases, and their performance was quantitatively assessed using receiver operating characteristic (ROC) curves and calibration curves.

Results: A total of 204,595 eligible CRC patients were included in the study, 17.07% of whom had distant metastases. The overall incidences of liver, lung, bone, and brain metastases were 15.34%, 5.22%, 1.26%, and 0.29%, respectively. The incidence of distant metastases differed by age, gender, and primary CRC site. Poorly differentiated grade, more lymphatic metastasis, higher carcinoembryonic antigen (CEA), and different metastatic organs were all positively associated with the four patterns of metastases. In contrast, age, sex, race, insurance status, position, and T stage were heterogeneously associated with metastases. The calibration and ROC curves showed good performance for predicting distant metastases.

Conclusions: The incidence of distant metastases in CRC exhibited distinct differences, and the patients had both homogeneous and heterogeneous associated risk factors. Although only a limited set of risk factors was included in the present study, the established nomogram showed good prediction performance.
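As a small illustration of what the ROC-based assessment summarizes, the area under the ROC curve equals the probability that a randomly chosen case with metastases receives a higher predicted risk than a randomly chosen case without them, with ties counting one half. The pure-Python sketch below computes it that way; the scores are made up, and this is not the study's analysis code.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted risks for patients with and without metastases.
print(roc_auc([0.9, 0.7, 0.4], [0.3, 0.4, 0.1]))  # 8.5 / 9, about 0.944
```

The pairwise loop is quadratic in the number of cases; for a cohort of the size reported here, a rank-based computation would be the practical choice, but it yields the same value.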

