Adaptive Similarity Function with Structural Features of Network Embedding for Missing Link Prediction

Link prediction is a fundamental problem in data science, which usually calls for unfolding the mechanisms that govern the micro-dynamics of networks. In this regard, using features obtained from network embedding to predict links has drawn widespread attention. Although methods based on edge features or node similarity have been proposed to solve the link prediction problem, many technical challenges remain due to the unique structural properties of networks, especially when the networks are sparse. From the graph mining perspective, we first give empirical evidence of the inconsistency between heuristic and learned edge features. Then, we propose a novel link prediction framework, AdaSim, by introducing an Adaptive Similarity function using features obtained from network embedding based on random walks. The node feature representations are obtained by optimizing a graph-based objective function. Instead of generating edge features using binary operators, we perform link prediction solely by leveraging the node features of the network. We define a flexible similarity function with one tunable parameter, which serves as a penalty on the original similarity measure. The optimal value of this parameter is learned through supervised learning and is thus adaptive to the data distribution. To evaluate the performance of our proposed algorithm, we conduct extensive experiments on eleven disparate real-world networks. Experimental results show that AdaSim achieves better performance than state-of-the-art algorithms and is robust to different sparsities of the networks.
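
The abstract does not give the form of AdaSim's similarity function. As a minimal sketch of the general idea only (a one-parameter similarity on node embeddings whose parameter is fitted on labelled node pairs), the Python below assumes a hypothetical `penalized_similarity` that subtracts an `alpha`-weighted Euclidean-distance penalty from cosine similarity, and picks `alpha` from a grid by maximizing AUC. The penalty form, the grid search and all function names are illustrative assumptions, not the authors' method.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def penalized_similarity(x, y, alpha):
    """Hypothetical one-parameter similarity: cosine similarity minus an
    alpha-weighted penalty on the Euclidean distance between embeddings."""
    return cosine(x, y) - alpha * float(np.linalg.norm(x - y))

def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive pair outscores a random
    negative pair (ties count as 0.5)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def fit_alpha(emb, pos_pairs, neg_pairs, grid):
    """Return the (alpha, AUC) on the grid that maximizes AUC on labelled pairs.
    emb maps node id -> embedding vector."""
    best_alpha, best_auc = None, -1.0
    for alpha in grid:
        pos = [penalized_similarity(emb[u], emb[v], alpha) for u, v in pos_pairs]
        neg = [penalized_similarity(emb[u], emb[v], alpha) for u, v in neg_pairs]
        score = auc(pos, neg)
        if score > best_auc:  # strict '>' keeps the first alpha on ties
            best_alpha, best_auc = alpha, score
    return best_alpha, best_auc
```

When the grid contains 0, the score reduces to plain cosine similarity at that point, so the search can only match or improve on the unpenalized measure on the labelled pairs it is fitted to.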

Link prediction via layer relevance of multiplex networks

International Journal of Modern Physics C ◽

10.1142/s0129183117501017 ◽

2017 ◽

Vol 28 (08) ◽

pp. 1750101 ◽

Cited By ~ 7

Author(s):

Yabing Yao ◽

Ruisheng Zhang ◽

Fan Yang ◽

Yongna Yuan ◽

Qingshuang Sun ◽

...

Keyword(s):

Structural Properties ◽

Link Prediction ◽

Structural Information ◽

Similarity Index ◽

Single Layer ◽

Structural Features ◽

Prediction Performance ◽

Multiplex Networks ◽

Multiplex Network ◽

Node Similarity

In complex networks, existing link prediction methods primarily focus on the internal structural information derived from single-layer networks. However, the role of interlayer information is hardly recognized in multiplex networks, which provide more diverse structural features than single-layer networks. In fact, the structural properties and functions of one layer can affect those of other layers in multiplex networks. In this paper, the effect of interlayer structural properties on link prediction performance is investigated in multiplex networks. By utilizing both intralayer and interlayer information, we propose a novel “Node Similarity Index” based on “Layer Relevance” (NSILR) of multiplex networks for link prediction. The performance of the NSILR index is validated on each layer of seven real-world multiplex networks. Experimental results show that the NSILR index can significantly improve prediction performance compared with traditional methods, which consider only intralayer information. Furthermore, the more relevant the layers are, the greater the performance improvement.
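
The NSILR formula is not given in the abstract. The sketch below only illustrates the general idea of weighting evidence from other layers by how relevant they are to the target layer. It assumes, hypothetically, that layer relevance is the Jaccard overlap of two layers' edge sets and that the base index is common neighbours; `layer_relevance` and `relevance_weighted_cn` are illustrative names, not the paper's definitions.

```python
import numpy as np

def layer_relevance(a1, a2):
    """Hypothetical relevance: Jaccard overlap of the edge sets of two layers
    (undirected, so only the upper triangle is inspected)."""
    iu = np.triu_indices_from(a1, k=1)
    e1, e2 = a1[iu] > 0, a2[iu] > 0
    union = np.logical_or(e1, e2).sum()
    return float(np.logical_and(e1, e2).sum() / union) if union else 0.0

def relevance_weighted_cn(layers, target):
    """Common-neighbour counts in the target layer plus common-neighbour counts
    from every other layer, each scaled by its relevance to the target layer."""
    a_t = layers[target]
    scores = (a_t @ a_t).astype(float)
    for k, a_k in enumerate(layers):
        if k != target:
            scores += layer_relevance(a_t, a_k) * (a_k @ a_k)
    np.fill_diagonal(scores, 0.0)  # a node is not a link candidate with itself
    return scores
```

Under this assumption, a layer that shares no edges with the target contributes nothing, which mirrors the abstract's observation that more relevant layers help more.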

Proximity Measures as Graph Convolution Matrices for Link Prediction in Biological Networks

10.1101/2020.11.14.382655 ◽

2020 ◽

Author(s):

Mustafa Coşkun ◽

Mehmet Koyutürk

Keyword(s):

Link Prediction ◽

Similarity Measures ◽

Graph Representation ◽

Supplementary Information ◽

Great Promise ◽

Network Embedding ◽

Common Neighbor ◽

Node Similarity ◽

Topological Characteristics ◽

Low Dimensional

Motivation: Link prediction is an important and well-studied problem in computational biology, with a broad range of applications including disease gene prioritization, drug–disease associations, and drug response in cancer. The general principle in link prediction is to use the topological characteristics and, if available, the attributes of the nodes in the network to predict new links that are likely to emerge or disappear. Recently, graph representation learning methods, which aim to learn low-dimensional representations of the topological characteristics and attributes of nodes, have drawn increasing attention for solving the link prediction problem via the learnt low-dimensional features. Most prominently, Graph Convolution Network (GCN)-based network embedding methods have demonstrated great promise in link prediction due to their ability to capture non-linear information in the network. To date, GCN-based network embedding algorithms use a Laplacian matrix as the convolution matrix in their convolution layers, and the effect of the convolution matrix on algorithm performance has not been comprehensively characterized in the context of link prediction in biomedical networks. On the other hand, for a variety of biomedical link prediction tasks, traditional node similarity measures such as Common Neighbor, Adamic-Adar, and others have shown promising results; hence there is a need to systematically evaluate node similarity measures as convolution matrices in terms of their usability and potential to further the state of the art.

Results: We select 8 representative node similarity measures as convolution matrices within the single-layered GCN graph embedding method and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug–disease association (DDA) prediction, drug–drug interaction (DDI) prediction, and protein–protein interaction (PPI) prediction. Our experimental results demonstrate that node similarity-based convolution matrices significantly improve GCN-based embedding algorithms and deserve more attention in future biomedical link prediction.

Availability: Our method is implemented as a Python library and is available at [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
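
To make the idea of a similarity measure acting as a convolution matrix concrete, the sketch below builds Common Neighbor and Adamic-Adar matrices from an adjacency matrix and plugs one into a single GCN-style layer `H = ReLU(S X W)`. The self-loops and row normalization in `row_normalize` are assumptions chosen for the example; the paper's own normalization and its library's API are not described here.

```python
import numpy as np

def common_neighbors(adj):
    """S[i, j] = number of shared neighbours of i and j."""
    return adj @ adj

def adamic_adar(adj):
    """S[i, j] = sum over shared neighbours k of 1 / log(deg(k)).
    Degree-1 nodes are skipped: they cannot be a common neighbour of two distinct nodes."""
    deg = adj.sum(axis=1)
    inv_log = np.zeros_like(deg, dtype=float)
    mask = deg > 1
    inv_log[mask] = 1.0 / np.log(deg[mask])
    return adj @ np.diag(inv_log) @ adj

def row_normalize(s):
    """Assumed preprocessing: add self-loops so no row is all zero, then make rows sum to 1."""
    s = s + np.eye(s.shape[0])
    return s / s.sum(axis=1, keepdims=True)

def gcn_layer(conv, features, weight):
    """One GCN layer H = ReLU(S X W) with an arbitrary convolution matrix S."""
    return np.maximum(conv @ features @ weight, 0.0)
```

Swapping `adamic_adar` for `common_neighbors` (or a Laplacian-based matrix) changes only the argument passed as `conv`, which is the kind of substitution the comparison above is about.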

W-MMP2Vec: Topic-driven network embedding model for link prediction in content-based heterogeneous information network

Intelligent Data Analysis ◽

10.3233/ida-205168 ◽

2021 ◽

Vol 25 (3) ◽

pp. 711-738

Author(s):

Phu Pham ◽

Phuc Do

Keyword(s):

Link Prediction ◽

Representation Learning ◽

Information Network ◽

Network Embedding ◽

Heterogeneous Information Network ◽

Heterogeneous Information ◽

Learning Framework ◽

Novel Approach ◽

Proposed Model ◽

Meta Path

Link prediction on heterogeneous information networks (HINs) is considered a challenging problem due to the complexity and diversity of node and link types. Challenges remain for meta-path-based link prediction in HINs. Previous works on link prediction in HINs via network embedding mainly focus on exploiting node features rather than the existing relations, in the form of meta-paths, between nodes. In fact, predicting the existence of new links between non-linked nodes in this way is unconvincing. Moreover, recent HIN-based embedding models also lack a thorough evaluation of the topic similarity between text-based nodes along given meta-paths. To tackle these challenges, in this paper, we propose a novel topic-driven, multiple meta-path-based HIN representation learning framework, namely W-MMP2Vec. Our model improves the quality of node representations by combining multiple meta-paths and calculating a topic similarity weight for each meta-path during network embedding learning in content-based HINs. To validate our approach, we apply the W-MMP2Vec model to several link prediction tasks in both content-based and non-content-based HINs (DBLP, IMDB and BlogCatalog). The experimental results demonstrate the effectiveness of the proposed model, which outperforms recent state-of-the-art HIN representation learning models.
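
W-MMP2Vec's topic-similarity weighting is not reproduced here. As background, many meta-path-based HIN embedding methods generate random walks constrained to follow a meta-path, such as author–paper–author (`"APA"`), and feed them to a skip-gram model. The sketch below shows only such a meta-path-guided walk; the adjacency-list format and the `metapath_walk` name are assumptions.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose node types follow a symmetric meta-path.

    adj: dict node -> list of neighbours; node_type: dict node -> type letter;
    metapath: e.g. "APA" (must start and end with the type of `start`).
    """
    assert metapath[0] == metapath[-1] and node_type[start] == metapath[0]
    cycle = metapath[:-1]  # "APA" -> "AP", repeated as A, P, A, P, ...
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [v for v in adj[walk[-1]] if node_type[v] == wanted]
        if not candidates:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(candidates))
    return walk

adj = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2"], "p2": ["a2"]}
types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
print(metapath_walk(adj, types, "a1", "APA", 5, random.Random(0)))
```

Combining several meta-paths, as the abstract describes, would amount to generating walks for each meta-path and weighting their contributions; how W-MMP2Vec derives those weights from topic similarity is not specified here.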

ALPINE: Active Link Prediction Using Network Embedding

Applied Sciences ◽

10.3390/app11115043 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5043

Author(s):

Xi Chen ◽

Bo Kang ◽

Jefrey Lijffijt ◽

Tijl De Bie

Keyword(s):

Active Learning ◽

Protein Interactions ◽

Link Prediction ◽

Prediction Accuracy ◽

Real Data ◽

Network Embedding ◽

Protein Protein Interactions ◽

Additional Information ◽

The Cost ◽

Active Link

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, can do this by relying only on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve link prediction accuracy, but owing to the cost, queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest, and we developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link statuses to query by estimating the improvement in link prediction accuracy to be gained by querying them. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.
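
ALPINE's own strategies estimate the expected gain in link prediction accuracy from querying a pair, which requires the underlying embedding model. As a much simpler stand-in that conveys one step of an active-learning loop, the sketch below uses plain uncertainty sampling: among unobserved pairs, query those whose predicted link probability is closest to 0.5. The function name and inputs are hypothetical, and this is not necessarily one of the strategies described in the paper.

```python
import numpy as np

def select_queries(candidate_pairs, link_probs, budget):
    """Uncertainty sampling: return the `budget` unobserved pairs whose
    predicted link probability is closest to 0.5 (most uncertain first)."""
    distance = np.abs(np.asarray(link_probs, dtype=float) - 0.5)
    order = np.argsort(distance, kind="stable")[:budget]
    return [candidate_pairs[i] for i in order]

pairs = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(select_queries(pairs, [0.9, 0.45, 0.1, 0.6], budget=2))
```

In a full loop, the queried statuses would be added to the observed network, the embedding and link probabilities recomputed, and the selection repeated until the query budget is spent.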

Towards effective link prediction: A hybrid similarity model

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200344 ◽

2020 ◽

pp. 1-14

Author(s):

Longjie Li ◽

Lu Wang ◽

Hongsheng Luo ◽

Xiaoyun Chen

Keyword(s):

Link Prediction ◽

Structural Similarity ◽

Research Direction ◽

Structural Features ◽

Proposed Model ◽

Similarity Model ◽

Weight Calculation ◽

Stable Performance ◽

Grey Relation ◽

Important Research Direction

Link prediction is an important research direction in complex network analysis and has drawn increasing attention from researchers in various fields. So far, a plethora of structural similarity-based methods have been proposed to solve the link prediction problem. To achieve stable performance on different networks, this paper proposes a hybrid similarity model for link prediction. In the proposed model, the Grey Relation Analysis (GRA) approach is employed to integrate four carefully selected similarity indexes, which are designed according to different structural features. In addition, to adaptively estimate the weight of each index based on the observed network structure, a new weight calculation method is presented that considers the distribution of similarity scores. Because it takes several separate similarity indexes into account, the proposed method is applicable to many different types of networks. Experimental results on 10 benchmark networks show that the proposed method outperforms other prediction methods in terms of accuracy and stability.
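
Grey Relation Analysis can combine several similarity indexes into one score per candidate pair. The sketch below is a generic GRA aggregation under stated assumptions: each index column is min-max normalized, the reference sequence is the ideal value 1 for every index, the distinguishing coefficient `rho` is the conventional 0.5, and the weights default to equal values because the paper's distribution-based weighting method is not specified in the abstract. Rows are candidate node pairs and columns are similarity indexes.

```python
import numpy as np

def grey_relational_scores(index_scores, weights=None, rho=0.5):
    """index_scores: (n_pairs, n_indexes) matrix of similarity scores.
    Returns one hybrid score per pair (its weighted grey relational grade)."""
    x = np.asarray(index_scores, dtype=float)
    # Min-max normalize each index column to [0, 1]; constant columns map to 0.
    span = x.max(axis=0) - x.min(axis=0)
    span[span == 0] = 1.0
    x = (x - x.min(axis=0)) / span
    # Distance of every entry from the ideal reference value 1.
    delta = np.abs(1.0 - x)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:  # every entry already ideal: all pairs relate perfectly
        return np.ones(x.shape[0])
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # grey relational coefficients
    if weights is None:
        weights = np.full(x.shape[1], 1.0 / x.shape[1])  # assumed equal weights
    return coeff @ np.asarray(weights, dtype=float)
```

For three pairs scored by two indexes as `[[1, 2], [3, 4], [2, 3]]`, the grades are 1/3, 1 and 0.5, so the second pair ranks first. Plugging in non-uniform `weights` is where an adaptive weighting scheme such as the paper's would enter.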

Scene Complexity: A New Perspective on Understanding the Scene Semantics of Remote Sensing and Designing Image-Adaptive Convolutional Neural Networks

Remote Sensing ◽

10.3390/rs13040742 ◽

2021 ◽

Vol 13 (4) ◽

pp. 742

Author(s):

Jian Peng ◽

Xiaoming Mei ◽

Wenbo Li ◽

Liang Hong ◽

Bingyu Sun ◽

...

Keyword(s):

Remote Sensing ◽

Neural Networks ◽

Fundamental Problem ◽

Semantic Representation ◽

Feature Learning ◽

Essential Elements ◽

Complex Scene ◽

Feature Representations ◽

The Right ◽

The Relationship

Scene understanding of remote sensing images is of great significance in various applications. Its fundamental problem is how to construct representative features. Various convolutional neural network architectures have been proposed for automatically learning features from images. However, is the current practice of configuring the same architecture to learn from all the data, while ignoring the differences between images, the right one? It seems contrary to intuition: some images are clearly easier to recognize than others. This problem reflects the gap between the characteristics of the images and the features learned by specific network structures. Unfortunately, the literature so far lacks an analysis of the relationship between the two. In this paper, we explore this problem from three aspects: first, we build a vision-based evaluation pipeline of scene complexity to characterize the intrinsic differences between images; second, we analyze the relationship between semantic concepts and feature representations, i.e., the scalability and hierarchy of features, which are the essential elements in CNNs of different architectures, for remote sensing scenes of different complexity; third, we introduce CAM, a visualization method that explains feature learning within neural networks, to analyze the relationship between scenes of different complexity and semantic feature representations. The experimental results show that a complex scene needs deeper and multi-scale features, whereas a simpler scene needs lower-level and single-scale features. Besides, the concept of a complex scene depends more on the joint semantic representation of multiple objects. Furthermore, we propose a framework for predicting the scene complexity of an image and use it to design a depth- and scale-adaptive model. This model achieves higher performance with fewer parameters than the original model, demonstrating the potential significance of scene complexity.
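
CAM (Class Activation Mapping) weights the last convolutional feature maps by the classifier weights of a target class, which applies when the network ends in global average pooling followed by a linear layer. The minimal NumPy sketch below computes that weighted sum and rescales it to [0, 1]; the array shapes are assumptions, and upsampling the map to the input resolution is omitted.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (K, H, W) activations of the last conv layer.
    fc_weights: (C, K) weights of the linear layer after global average pooling.
    Returns an (H, W) map: sum_k w[class_idx, k] * feature_maps[k], scaled to [0, 1]."""
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    cam = cam - cam.min()
    return cam / cam.max() if cam.max() > 0 else cam
```

High values mark the spatial regions that contributed most to the chosen class score, which is what makes CAM useful for comparing how networks respond to simple and complex scenes.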

MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach

Scientific Reports ◽

10.1038/s41598-021-87987-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Léo Pio-Lopez ◽

Alberto Valdeolivas ◽

Laurent Tichit ◽

Élisabeth Remy ◽

Anaïs Baudot

Keyword(s):

Heterogeneous Networks ◽

Heterogeneous Network ◽

Link Prediction ◽

Network Embedding ◽

Multiplex Networks ◽

Multiplex Network ◽

Gene Associations ◽

Different Types ◽

Embedding Methods ◽

Node Embeddings

Network embedding approaches are gaining momentum to analyse a large variety of networks. Indeed, these approaches have demonstrated their effectiveness in tasks such as community detection, node classification, and link prediction. However, very few network embedding methods have been specifically designed to handle multiplex networks, i.e. networks composed of different layers sharing the same set of nodes but having different types of edges. Moreover, to our knowledge, existing approaches cannot embed multiple nodes from multiplex-heterogeneous networks, i.e. networks composed of several multiplex networks containing both different types of nodes and edges. In this study, we propose MultiVERSE, an extension of the VERSE framework using Random Walks with Restart on Multiplex (RWR-M) and Multiplex-Heterogeneous (RWR-MH) networks. MultiVERSE is a fast and scalable method to learn node embeddings from multiplex and multiplex-heterogeneous networks. We evaluate MultiVERSE on several biological and social networks and demonstrate its performance. MultiVERSE indeed outperforms most of the other methods in the tasks of link prediction and network reconstruction for multiplex network embedding, and is also efficient in link prediction for multiplex-heterogeneous network embedding. Finally, we apply MultiVERSE to study rare disease-gene associations using link prediction and clustering. MultiVERSE is freely available on github at https://github.com/Lpiol/MultiVERSE.
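
MultiVERSE builds on Random Walks with Restart on multiplex networks (RWR-M). The sketch below is a simplified RWR-M computed by power iteration on a supra-adjacency matrix: intra-layer blocks are scaled by `1 - delta`, each node is coupled to its own replicas in the other layers with weight `delta / (L - 1)`, and columns are normalized to make the matrix column-stochastic. The restart probability 0.7, `delta = 0.5` and this exact normalization are assumptions for illustration; MultiVERSE's own parameterization may differ. Roughly speaking, per-node scores of this kind are the similarity profile that a VERSE-style method tries to reproduce in embedding space.

```python
import numpy as np

def multiplex_rwr(layers, seed, restart=0.7, delta=0.5, tol=1e-10, max_iter=1000):
    """layers: list of L (n, n) adjacency matrices over the same n nodes.
    Returns an (n,) vector of visiting probabilities relative to `seed`."""
    n_layers, n = len(layers), layers[0].shape[0]
    supra = np.zeros((n_layers * n, n_layers * n))
    for a in range(n_layers):
        for b in range(n_layers):
            if a == b:
                block = (1 - delta) * layers[a]  # intra-layer edges
            else:
                block = (delta / (n_layers - 1)) * np.eye(n)  # node-to-replica coupling
            supra[a * n:(a + 1) * n, b * n:(b + 1) * n] = block
    col_sums = supra.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = supra / col_sums  # divide each column by its sum
    # Restart distribution: the seed node, spread evenly over its layer replicas.
    p0 = np.zeros(n_layers * n)
    p0[[layer * n + seed for layer in range(n_layers)]] = 1.0 / n_layers
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (transition @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p.reshape(n_layers, n).sum(axis=0)  # aggregate replicas per node
```

Because the transition matrix is column-stochastic and the restart vector sums to 1, the returned scores also sum to 1.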

Network embedding based link prediction in dynamic networks

Future Generation Computer Systems ◽

10.1016/j.future.2021.09.024 ◽

2021 ◽

Author(s):

Shashi Prakash Tripathi ◽

Rahul Kumar Yadav ◽

Abhay Kumar Rai

Keyword(s):

Link Prediction ◽

Dynamic Networks ◽

Network Embedding

Semisupervised Community Preserving Network Embedding with Pairwise Constraints

Complexity ◽

10.1155/2020/7953758 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Dong Liu ◽

Yan Ru ◽

Qinpeng Li ◽

Shibin Wang ◽

Jianwei Niu

Keyword(s):

Community Structure ◽

Link Prediction ◽

Learning Algorithms ◽

Nonnegative Matrix ◽

Machine Learning Algorithms ◽

Network Visualization ◽

Network Embedding ◽

Pairwise Constraints ◽

Node Clustering ◽

Low Dimensional

Network embedding aims to learn low-dimensional representations of nodes in networks. It preserves the structure and internal attributes of the networks while representing nodes as low-dimensional, dense, real-valued vectors. These vectors are used as inputs to machine learning algorithms for network analysis tasks such as node clustering, classification, link prediction, and network visualization. Network embedding algorithms that consider community structure impose a higher level of constraint on the similarity of nodes, making the learned node embeddings more discriminative. However, existing network representation learning algorithms are mostly unsupervised models; the pairwise constraint information, which represents community membership, is not effectively used to obtain node embeddings that are more consistent with prior knowledge. This paper proposes a semisupervised modularized nonnegative matrix factorization model, SMNMF, which preserves the community structure for network embedding; the pairwise constraint (must-link and cannot-link) information is effectively fused with the adjacency matrix and node similarity matrix of the network, so that the node representations learned by the model are more interpretable. Experimental results on eight real network datasets show that, compared with representative network embedding methods, the node representations learned after incorporating the pairwise constraints achieve higher accuracy in the node clustering task, and the results of the link prediction and network visualization tasks indicate that the semisupervised model SMNMF is more discriminative than unsupervised ones.
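
SMNMF's modularity term and its exact fusion of constraints with the node similarity matrix are not spelled out in the abstract. The sketch below only conveys the basic mechanics under a hypothetical fusion rule: must-link pairs are set to 1 and cannot-link pairs to 0 in the adjacency matrix, which is then factorized as `A ≈ W H` with the standard Lee-Seung multiplicative updates, so that the rows of `W` serve as nonnegative node embeddings. `constrained_nmf` is an illustrative name.

```python
import numpy as np

def constrained_nmf(adj, must_link, cannot_link, dim, n_iter=200, seed=0):
    """Hypothetical constraint fusion followed by plain NMF (A ≈ W H).
    Returns (W, fused adjacency); rows of W are node embeddings."""
    a = np.array(adj, dtype=float)
    for i, j in must_link:  # assumed rule: must-link pairs become edges
        a[i, j] = a[j, i] = 1.0
    for i, j in cannot_link:  # assumed rule: cannot-link pairs lose their edge
        a[i, j] = a[j, i] = 0.0
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    w = rng.random((n, dim)) + 1e-3
    h = rng.random((dim, n)) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative and do not increase ||A - WH||_F.
        h *= (w.T @ a) / (w.T @ w @ h + eps)
        w *= (a @ h.T) / (w @ h @ h.T + eps)
    return w, a
```

A community-preserving model would add further terms (for example a modularity-based one) to this objective; they are left out here because their form is not given above.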

JANE: Jointly Adversarial Network Embedding

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/192 ◽

2020 ◽

Author(s):

Liang Yang ◽

Yuexue Wang ◽

Junhua Gu ◽

Chuan Wang ◽

Xiaochun Cao ◽

...

Keyword(s):

Link Prediction ◽

Real Data ◽

Semantic Space ◽

Network Embedding ◽

Generative Adversarial Network ◽

Adversarial Learning ◽

Adversarial Network ◽

Node Clustering ◽

Topology Information ◽

Embedding Methods

Motivated by the capability of Generative Adversarial Networks to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this important ability is lost in existing adversarially regularized network embedding methods, because their embedding results are directly compared with samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed to jointly distinguish real and fake combinations of the embeddings, topology information and node features. JANE contains three pluggable components: an Embedding module, a Generator module and a Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient updates. Extensive experiments demonstrate the superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).
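
JANE's objective is optimized by alternating updates of the players in a min-max problem. As a toy illustration of alternating gradient descent-ascent only (deterministic gradients, no networks), the sketch below solves `min_x max_y f(x, y) = 0.5*x**2 + x*y - 0.5*y**2`, whose saddle point is (0, 0). The objective, step size and function name are assumptions chosen so that the iteration converges; they have nothing to do with JANE's actual loss.

```python
def alternating_gda(x, y, lr=0.1, steps=200):
    """Alternating gradient descent-ascent on the toy saddle problem
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (minimize over x, maximize over y)."""
    for _ in range(steps):
        x = x - lr * (x + y)  # descent step for the minimizing player (df/dx = x + y)
        y = y + lr * (x - y)  # ascent step using the already-updated x (df/dy = x - y)
    return x, y

print(alternating_gda(1.0, 1.0))
```

Starting from (1, 1) with `lr = 0.1`, the iterates spiral into the saddle point. The update for `y` uses the new `x`, which is what distinguishes alternating from simultaneous updates; in adversarial training the two "players" would be the generator side and the discriminator, each updated on its own loss.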
