Embedding Methods or Link-based Similarity Measures, Which is Better for Link Prediction?

2020 ◽

Vol 9 (5) ◽

pp. 1690-1696

Keyword(s):

Biological Networks ◽

Link Prediction ◽

Preferential Attachment ◽

Similarity Measures ◽

Telecommunication Networks ◽

Different Dimensions ◽

Low Dimensional ◽

Embedding Methods ◽

Node Embeddings ◽

Interacting Components

Networks have proved very helpful in modelling complex systems with interacting components. Problems across many domains involve systems that can be modelled as a network with links between interacting components. The link prediction problem deals with predicting missing links in a given network. Link prediction has applications across disciplines including biological networks, transportation networks, social networks and telecommunication networks. In this paper, we use node embedding methods to encode the nodes into low-dimensional embeddings and predict links based on edge embeddings computed by taking the Hadamard product of the embeddings of the participating nodes. We further compare the accuracy of models trained on embeddings of different dimensions. We also study how accuracy changes when additional features are introduced alongside node embeddings of various dimensions. The additional features include overlap measures such as Jaccard similarity, the Adamic-Adar score and the dot product between node embeddings, as well as heuristic features, i.e. Common Neighbors, Resource Allocation, Preferential Attachment and the Friend TNS score.
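
The feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the toy graph, the 4-dimensional embeddings and the helper names (`edge_features` and so on) are all assumptions made for the example, and the Friend TNS score is left out.

```python
import math

# Toy undirected graph as an adjacency map (hypothetical data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

# Hypothetical 4-dimensional node embeddings; a real pipeline would learn
# these with a node embedding method.
embeddings = {"b": [0.5, -1.0, 2.0, 0.0], "d": [2.0, 0.5, 0.5, 3.0]}


def common_neighbors(g, u, v):
    return len(g[u] & g[v])


def jaccard(g, u, v):
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    # A shared neighbour of two distinct nodes has degree >= 2,
    # so log(degree) is never zero here.
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def resource_allocation(g, u, v):
    return sum(1.0 / len(g[w]) for w in g[u] & g[v])


def preferential_attachment(g, u, v):
    return len(g[u]) * len(g[v])


def hadamard(x, y):
    # Element-wise product of two equal-length vectors.
    return [a * b for a, b in zip(x, y)]


def edge_features(g, emb, u, v):
    # Edge embedding followed by the dot product and the heuristic scores.
    edge_embedding = hadamard(emb[u], emb[v])
    return edge_embedding + [
        sum(edge_embedding),  # dot product of the two node embeddings
        common_neighbors(g, u, v),
        jaccard(g, u, v),
        adamic_adar(g, u, v),
        resource_allocation(g, u, v),
        preferential_attachment(g, u, v),
    ]


features = edge_features(graph, embeddings, "b", "d")
print(features)
```

A vector like `features` would then be fed to a binary classifier that labels the candidate pair as linked or not linked. The choice of classifier is not stated in the abstract.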

An information theoretic approach to link prediction in multiplex networks

Scientific Reports ◽

10.1038/s41598-021-92427-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Hossein Jafari ◽

Amir Mahdi Abdolhosseini-Qomi ◽

Masoud Asadpour ◽

Maseud Rahgozar ◽

Naser Yazdani

Keyword(s):

Real World ◽

Link Prediction ◽

Large Scale ◽

Similarity Measures ◽

Prediction Method ◽

General Purpose ◽

Fast Method ◽

Theoretic Approach ◽

Multiplex Networks ◽

Wide Range

The entities of real-world networks are connected via different types of connections (i.e., layers). Link prediction in multiplex networks is the task of finding missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, we find that it outperforms the compared methods (both baseline and state-of-the-art) in most instances when predicting links. Furthermore, SimBins adds only minor computational overhead to the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
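
The inter-layer correlation mentioned in this abstract can be checked on a toy example by binning node pairs by their similarity in one layer and measuring how often each bin is linked in the target layer. The sketch below illustrates only that observation; it is not the SimBins algorithm, whose details are not given here, and the data, the bin count and the function name `connection_rate_by_bin` are assumptions.

```python
def connection_rate_by_bin(pairs, similarity, target_edges, n_bins=2):
    """Fraction of node pairs linked in the target layer, per similarity bin.

    `similarity` holds scores from another (auxiliary) layer and is assumed
    to lie in [0, 1]. Empty bins yield None.
    """
    totals = [0] * n_bins
    linked = [0] * n_bins
    for pair in pairs:
        # Map the score to a bin index, clamping a score of 1.0 into the last bin.
        b = min(int(similarity[pair] * n_bins), n_bins - 1)
        totals[b] += 1
        if pair in target_edges:
            linked[b] += 1
    return [l / t if t else None for l, t in zip(linked, totals)]


# Hypothetical node pairs, auxiliary-layer similarities and target-layer links.
pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
similarity = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("a", "d"): 0.1,
    ("b", "c"): 0.6, ("b", "d"): 0.2, ("c", "d"): 0.3,
}
target_edges = {("a", "b"), ("a", "c"), ("c", "d")}

rates = connection_rate_by_bin(pairs, similarity, target_edges)
print(rates)
```

In this toy data the high-similarity bin has the higher connection rate, which is the kind of positive correlation the abstract reports.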

MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach

Scientific Reports ◽

10.1038/s41598-021-87987-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Léo Pio-Lopez ◽

Alberto Valdeolivas ◽

Laurent Tichit ◽

Élisabeth Remy ◽

Anaïs Baudot

Keyword(s):

Heterogeneous Networks ◽

Heterogeneous Network ◽

Link Prediction ◽

Network Embedding ◽

Multiplex Networks ◽

Multiplex Network ◽

Gene Associations ◽

Different Types ◽

Embedding Methods ◽

Node Embeddings

Network embedding approaches are gaining momentum for analysing a large variety of networks. Indeed, these approaches have demonstrated their effectiveness in tasks such as community detection, node classification, and link prediction. However, very few network embedding methods have been specifically designed to handle multiplex networks, i.e. networks composed of different layers sharing the same set of nodes but having different types of edges. Moreover, to our knowledge, existing approaches cannot embed multiple nodes from multiplex-heterogeneous networks, i.e. networks composed of several multiplex networks containing both different types of nodes and edges. In this study, we propose MultiVERSE, an extension of the VERSE framework using Random Walks with Restart on Multiplex (RWR-M) and Multiplex-Heterogeneous (RWR-MH) networks. MultiVERSE is a fast and scalable method to learn node embeddings from multiplex and multiplex-heterogeneous networks. We evaluate MultiVERSE on several biological and social networks and demonstrate its performance. MultiVERSE outperforms most of the other methods in the tasks of link prediction and network reconstruction for multiplex network embedding, and is also efficient in link prediction for multiplex-heterogeneous network embedding. Finally, we apply MultiVERSE to study rare disease-gene associations using link prediction and clustering. MultiVERSE is freely available on GitHub at https://github.com/Lpiol/MultiVERSE.
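
For readers unfamiliar with Random Walk with Restart, the sketch below shows the basic iteration on an ordinary single-layer graph. It is a simplified illustration and not the RWR-M or RWR-MH procedure used by MultiVERSE. The toy graph, the restart probability of 0.3 and the function name are assumptions, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to `seed` with probability `restart` and otherwise moves to
    a uniformly chosen neighbour. Assumes no isolated nodes."""
    nodes = sorted(adj)
    p0 = {n: 1.0 if n == seed else 0.0 for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes to the seed; the rest is spread along edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(path, seed="a")
print(scores)
```

The resulting scores act as a proximity measure relative to the seed node: nodes closer to the seed receive more probability mass.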

Semantic Similarity Measures for Topological Link Prediction

Computational Science and Its Applications – ICCSA 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-58814-4_10 ◽

2020 ◽

pp. 132-142 ◽

Cited By ~ 1

Author(s):

Giulio Biondi ◽

Valentina Franzoni

Keyword(s):

Semantic Similarity ◽

Link Prediction ◽

Similarity Measures

Candidate gene prioritization using graph embedding

10.1101/2020.02.03.927913 ◽

2020 ◽

Author(s):

Quan Do ◽

Pierre Larmande

Keyword(s):

Link Prediction ◽

Graph Embedding ◽

Prediction Performance ◽

Knowledge Graph ◽

Learning Techniques ◽

Number Of Genes ◽

Important Amount ◽

Candidate Gene Prioritization ◽

Gene Information ◽

Embedding Methods

Candidate gene prioritization ranks, among a large number of genes, those that are strongly associated with a phenotype or a disease. Due to the large amount of data that needs to be integrated and analysed, gene-to-phenotype association is still a challenging task. In this paper, we evaluate a knowledge graph approach combined with embedding methods to overcome these challenges. We first introduce a dataset of rice genes created from several open-access databases. Then, we use the Translating Embedding model and the Convolution Knowledge Base model to vectorize gene information. Finally, we evaluate the results using link prediction performance, and the vector representations using some unsupervised learning techniques.
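
As a rough illustration of how a translation-based embedding model scores a candidate link, the sketch below uses the commonly cited translating-embeddings scoring rule, in which a triple (head, relation, tail) is plausible when head + relation lies close to tail. The vectors, the entity names and the use of the L2 norm are assumptions for the example and are not taken from the paper.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail.
    Higher (closer to zero) means the triple is considered more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Candidate tail names ordered from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Hypothetical 2-dimensional embeddings.
gene = [1.0, 0.0]
associated_with = [0.0, 1.0]
traits = {"trait_x": [1.0, 1.0], "trait_y": [3.0, -1.0]}

ranking = rank_tails(gene, associated_with, traits)
print(ranking)
```

Link prediction performance is then typically measured by how highly the true tail entity appears in such a ranking.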

JANE: Jointly Adversarial Network Embedding

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/192 ◽

2020 ◽

Author(s):

Liang Yang ◽

Yuexue Wang ◽

Junhua Gu ◽

Chuan Wang ◽

Xiaochun Cao ◽

...

Keyword(s):

Link Prediction ◽

Real Data ◽

Semantic Space ◽

Network Embedding ◽

Generative Adversarial Network ◽

Adversarial Learning ◽

Adversarial Network ◽

Node Clustering ◽

Topology Information ◽

Embedding Methods

Motivated by the capability of Generative Adversarial Networks to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this important ability is lost in existing adversarially regularized network embedding methods, because their embedding results are directly compared to samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed to jointly distinguish the real and fake combinations of the embeddings, topology information and node features. JANE contains three pluggable components: the Embedding module, the Generator module and the Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient. Extensive experiments demonstrate the superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).

A new study of using temporality and weights to improve similarity measures for link prediction of social networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-17770 ◽

2018 ◽

Vol 34 (4) ◽

pp. 2667-2678

Author(s):

Farshad Aghabozorgi ◽

Mohammad Reza Khayyambashi

Keyword(s):

Social Networks ◽

Link Prediction ◽

Similarity Measures

Multivariate Time Series Link Prediction for Evolving Heterogeneous Network

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622018500530 ◽

2019 ◽

Vol 18 (01) ◽

pp. 241-286 ◽

Cited By ~ 8

Author(s):

Alper Ozcan ◽

Sule Gunduz Oguducu

Keyword(s):

Time Series ◽

Real World ◽

Link Prediction ◽

Multivariate Time Series ◽

Similarity Measures ◽

Dynamic Structure ◽

Social Bookmarking ◽

Node Connectivity ◽

Worldwide Web ◽

Heterogeneous Social Networks

Link prediction is considered one of the key tasks in various data mining applications for recommendation systems, bioinformatics, security and the World Wide Web. The majority of previous work in link prediction focuses on homogeneous networks, which consider only one type of node and link. However, real-world networks have heterogeneous interactions and complicated dynamic structure, which make link prediction a more challenging task. In this paper, we study the problem of link prediction in dynamic, undirected, weighted/unweighted, heterogeneous social networks, which are composed of multiple types of nodes and links that change over time. We propose a novel method, called Multivariate Time Series Link Prediction for evolving heterogeneous networks, that incorporates (1) temporal evolution of the network; (2) correlations between link evolution and multi-typed relationships; (3) local and global similarity measures; and (4) node connectivity information. Our proposed method and previously proposed time series methods are evaluated experimentally on a real-world bibliographic network (DBLP) and a social bookmarking network (Delicious). Experimental results show that the proposed method outperforms the previous methods in terms of AUC in different test cases.
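
Several of the abstracts on this page report results as AUC. In link prediction, AUC is commonly read as the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counting as one half. The sketch below computes it by direct pairwise comparison; the scores are made-up values and the function name is an assumption.

```python
def link_prediction_auc(positive_scores, negative_scores):
    """Probability that a true link outscores a non-link (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predictor scores for held-out links and for sampled non-links.
held_out_links = [0.9, 0.7, 0.4]
non_links = [0.8, 0.3, 0.2, 0.1]

auc = link_prediction_auc(held_out_links, non_links)
print(round(auc, 4))
```

A value of 0.5 corresponds to random scoring and 1.0 to a perfect ranking. The pairwise loop is quadratic, so larger evaluations usually rely on a rank-based formula or random sampling of pairs.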

Discovering spurious links in multiplex networks based on interlayer relevance

Journal of Complex Networks ◽

10.1093/comnet/cnz007 ◽

2019 ◽

Vol 7 (5) ◽

pp. 641-658 ◽

Cited By ~ 2

Author(s):

Zeynab Samei ◽

Mahdi Jalili

Keyword(s):

Social Networks ◽

Social Networking ◽

Link Prediction ◽

Large Scale ◽

Transportation Networks ◽

Similarity Index ◽

Similarity Measures ◽

Computation Complexity ◽

Multiplex Networks ◽

Large Scale Networks

Many real-world complex systems can be better modelled as multiplex networks, where the same individuals develop connections in multiple layers. Examples include social networks between individuals on multiple social networking platforms, and transportation networks between cities based on air, rail and road networks. Accurately predicting spurious links in multiplex networks is a challenging issue. In this article, we show that one can effectively use interlayer information to build an algorithm for spurious link prediction. We propose a similarity index that combines intralayer similarity with interlayer relevance for link prediction. The proposed similarity index is used to rank the node pairs and identify those that are likely to be spurious. Our experimental results show that the proposed metric is much more accurate than intralayer similarity measures in correctly predicting spurious links. The proposed method is unsupervised and has low computational complexity, and thus can be effectively applied to spurious link prediction in large-scale networks.

On Investigating Both Effectiveness and Efficiency of Embedding Methods in Task of Similarity Computation of Nodes in Graphs

Applied Sciences ◽

10.3390/app11010162 ◽

2020 ◽

Vol 11 (1) ◽

pp. 162

Author(s):

Masoud Reyhani Hamedani ◽

Sang-Wook Kim

Keyword(s):

Social Relations ◽

Dimensional Space ◽

Similarity Measures ◽

Parameter Tuning ◽

Original Graph ◽

Wide Range ◽

Similarity Computation ◽

Effectiveness And Efficiency ◽

Low Dimensional ◽

Embedding Methods

One of the important tasks in a graph is to compute the similarity between two nodes; link-based similarity measures (in short, similarity measures) are well-known, conventional techniques for this task that exploit the relations between nodes (i.e., links) in the graph. Graph embedding methods (in short, embedding methods) convert nodes in a graph into vectors in a low-dimensional space by preserving social relations among nodes in the original graph. Instead of applying a similarity measure to the graph to compute the similarity between nodes a and b, we can take the proximity between the corresponding vectors of a and b obtained by an embedding method as the similarity between a and b. Although embedding methods have been analyzed in a wide range of machine learning tasks such as link prediction and node classification, they have not been investigated in terms of similarity computation of nodes. In this paper, we investigate both the effectiveness and the efficiency of embedding methods in the task of similarity computation of nodes by comparing them with those of similarity measures. To the best of our knowledge, this is the first work that examines the application of embedding methods to this particular task. Based on the results of our extensive experiments with five well-known and publicly available datasets, we made the following observations about embedding methods: (1) they are less effective than similarity measures on all datasets except one, (2) they are less efficient than similarity measures on all datasets except one, (3) they have more parameters than similarity measures, leading to a time-consuming parameter tuning process, and (4) increasing the number of dimensions does not necessarily improve their effectiveness in computing the similarity of nodes.
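
The idea of using vector proximity as node similarity can be sketched as below. Cosine similarity is used here as one common proximity choice; the abstract does not say which proximity function the authors use, and the 2-dimensional embeddings and function names are made up for the example.

```python
import math


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def most_similar(emb, node, k=2):
    """The k nodes whose embeddings are closest to that of `node`."""
    others = [n for n in emb if n != node]
    others.sort(key=lambda n: cosine_similarity(emb[node], emb[n]), reverse=True)
    return others[:k]


# Hypothetical 2-dimensional node embeddings.
node_vectors = {
    "a": [1.0, 0.0],
    "b": [2.0, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

neighbours = most_similar(node_vectors, "a")
print(neighbours)
```

Ranking nodes this way needs no access to the original graph once the embeddings exist, which is the trade-off against link-based similarity measures that the paper examines.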
