Explainable and Efficient Link Prediction in Real-World Network Data

Stacking models for nearly optimal link prediction in complex networks

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1914950117 ◽

2020 ◽

Vol 117 (38) ◽

pp. 23393-23400 ◽

Cited By ~ 4

Author(s):

Amir Ghasemian ◽

Homa Hosseinmardi ◽

Aram Galstyan ◽

Edoardo M. Airoldi ◽

Aaron Clauset

Keyword(s):

Social Networks ◽

Data Collection ◽

Real World ◽

Link Prediction ◽

State Of The Art ◽

Network Data ◽

Prediction Errors ◽

Partially Observed ◽

Speed Up ◽

Large Corpus

Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speed up network data collection and improve network model validation. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 550 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity using network-based metalearning to construct a series of “stacked” models that combine predictors into a single algorithm. Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy. Applied to real-world networks, stacked models are superior, but their accuracy varies strongly by domain, suggesting that link prediction may be fundamentally easier in social networks than in biological or technological networks. These results indicate that the state of the art for link prediction comes from combining individual algorithms, which can achieve nearly optimal predictions. We close with a brief discussion of limitations and opportunities for further improvements.
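
The stacking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three predictors (common neighbors, Jaccard coefficient, preferential attachment), the logistic-regression meta-learner, the toy network, and all function names are assumptions chosen for brevity. Scores from the individual predictors become features, and a supervised model learns how to combine them from links that were deliberately hidden.

```python
import math
from itertools import combinations


def build_adjacency(edges, nodes):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pair_features(adj, u, v):
    """Scores from three individual predictors, used as stacking features."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return [
        float(len(common)),                          # common neighbors
        len(common) / len(union) if union else 0.0,  # Jaccard coefficient
        float(len(adj[u]) * len(adj[v])),            # preferential attachment
    ]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(X, y, lr=0.05, epochs=300):
    """Logistic regression trained by stochastic gradient descent (a stand-in meta-learner)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, x):
    """Combined link probability for one feature vector."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy network: two triangles joined by the edge (2, 3).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
held_out = [(1, 2), (4, 5)]  # links hidden to act as training positives
observed = [e for e in edges if e not in held_out]
adj = build_adjacency(observed, nodes)

# Candidate pairs are all unconnected pairs in the observed network.
candidates = [p for p in combinations(nodes, 2) if p not in observed]
X = [pair_features(adj, u, v) for u, v in candidates]
y = [1.0 if p in held_out else 0.0 for p in candidates]

model = fit_meta_learner(X, y)
scores = {p: predict(model, x) for p, x in zip(candidates, X)}
```

In practice the meta-learner would be evaluated on links hidden separately from those used for training; the sketch trains and scores on the same pairs only to keep it short.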

Accurate similarity index based on activity and connectivity of node for link prediction

International Journal of Modern Physics B ◽

10.1142/s0217979215501088 ◽

2015 ◽

Vol 29 (17) ◽

pp. 1550108 ◽

Cited By ~ 9

Author(s):

Longjie Li ◽

Lvjian Qian ◽

Xiaoping Wang ◽

Shishun Luo ◽

Xiaoyun Chen

Keyword(s):

Complex Networks ◽

Real World ◽

Link Prediction ◽

Similarity Index ◽

Experimental Results ◽

Network Data ◽

Prediction Problem ◽

Similarity Indices ◽

Average Activity ◽

Fundamental Requirement

Recent years have witnessed a growth in available network data; however, much of that data is incomplete. Link prediction, which can find the missing links of a network, plays an important role in the research and analysis of complex networks. Based on the assumption that two unconnected nodes which are highly similar are very likely to interact, most existing algorithms solve the link prediction problem by computing node similarities. The fundamental requirement of those algorithms is an accurate and effective similarity index. In this paper, we propose a new similarity index, namely similarity based on activity and connectivity (SAC), which performs link prediction more accurately. To compute the similarity between two nodes, this index employs the average activity of these two nodes in their common neighborhood and the connectivities between them and their common neighbors. The higher the average activity is and the stronger the connectivities are, the more similar the two nodes are. The proposed index not only distinguishes the contributions of paths but also incorporates the influence of endpoints. Therefore, it can achieve better prediction results. To verify the performance of SAC, we conduct experiments on 10 real-world networks. Experimental results demonstrate that SAC outperforms the compared baselines.
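
The abstract does not give the SAC formula, so the sketch below does not implement it. It shows three standard common-neighbor similarity indices (common neighbors, Adamic-Adar, resource allocation) of the kind such proposals are usually compared against, and how any index can be used to rank unconnected pairs. The toy graph and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return float(len(adj[u] & adj[v]))


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree)."""
    # A common neighbor always has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    """Shared neighbors weighted by 1 / degree."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


def rank_missing_links(adj, index):
    """Score every unconnected pair with `index`, highest score first."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(((p, index(adj, *p)) for p in pairs), key=lambda item: -item[1])


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranking = rank_missing_links(adj, resource_allocation)
```

Here the pair ("a", "d") ranks first because it shares two neighbors, each of degree 3; a new index such as SAC would plug into `rank_missing_links` in the same way as the baselines.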

Higher-order temporal network effects through triplet evolution

Scientific Reports ◽

10.1038/s41598-021-94389-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Qing Yao ◽

Bingsheng Chen ◽

Tim S. Evans ◽

Kim Christensen

Keyword(s):

Real World ◽

Link Prediction ◽

Higher Order ◽

Prediction Algorithm ◽

Interaction Patterns ◽

Temporal Networks ◽

World Systems ◽

Order Interaction ◽

Space And Time ◽

Pairwise Interactions

We study the evolution of networks through ‘triplets’ (three-node graphlets). We develop a method to compute a transition matrix to describe the evolution of triplets in temporal networks. To identify the importance of higher-order interactions in the evolution of networks, we compare both artificial and real-world data to a model based on pairwise interactions only. The significant differences between the computed matrix and the calculated matrix from the fitted parameters demonstrate that non-pairwise interactions exist for various real-world systems in space and time, such as our data sets. Furthermore, this also reveals that different patterns of higher-order interaction are involved in different real-world situations. To test our approach, we then use these transition matrices as the basis of a link prediction algorithm. We investigate our algorithm’s performance on four temporal networks, comparing our approach against ten other link prediction methods. Our results show that higher-order interactions in both space and time play a crucial role in the evolution of networks, as we find that our method, along with two other methods based on non-local interactions, gives the best overall performance. The results also confirm the concept that the higher-order interaction patterns, i.e., triplet dynamics, can help us understand and predict the evolution of different real-world systems.
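
A triplet transition matrix of the kind described here can be sketched as follows. The abstract does not specify the state space, so this sketch assumes the simplest one: a triplet's state is the number of edges among its three nodes (0 to 3). The snapshot data and function names are likewise assumptions.

```python
from itertools import combinations


def triplet_state(edge_set, triple):
    """State of a node triple = number of edges among its three nodes (0 to 3)."""
    return sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))


def transition_matrix(nodes, snapshots):
    """Row-normalized counts of triplet state changes between consecutive snapshots."""
    counts = [[0] * 4 for _ in range(4)]
    frames = [{frozenset(e) for e in snap} for snap in snapshots]
    for before, after in zip(frames, frames[1:]):
        for triple in combinations(nodes, 3):
            counts[triplet_state(before, triple)][triplet_state(after, triple)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that never occur are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical temporal network: the edge (0, 2) appears in the second snapshot.
nodes = [0, 1, 2, 3]
snapshots = [
    [(0, 1), (1, 2)],
    [(0, 1), (1, 2), (0, 2)],
]
T = transition_matrix(nodes, snapshots)
```

Entry `T[i][j]` estimates the probability that a triplet with `i` edges has `j` edges in the next snapshot; in this toy example the only two-edge triplet closes into a triangle, so `T[2][3]` is 1.0.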

An information theoretic approach to link prediction in multiplex networks

Scientific Reports ◽

10.1038/s41598-021-92427-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Hossein Jafari ◽

Amir Mahdi Abdolhosseini-Qomi ◽

Masoud Asadpour ◽

Maseud Rahgozar ◽

Naser Yazdani

Keyword(s):

Real World ◽

Link Prediction ◽

Large Scale ◽

Similarity Measures ◽

Prediction Method ◽

General Purpose ◽

Fast Method ◽

Theoretic Approach ◽

Multiplex Networks ◽

Wide Range

The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is about finding missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based automatic general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, our findings indicate that SimBins outperforms the compared methods (both baseline and state-of-the-art methods) in most instances when predicting links. Furthermore, it is discussed that SimBins imposes minor computational overhead on the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.

Assessment of mixed network processes with shared inputs and undesirable factors

Operations Research and Decisions ◽

10.37190/ord200106 ◽

2020 ◽

Vol 30 (1) ◽

Author(s):

Maryam Nematizadeh ◽

Alireza Amirteimoori ◽

Sohrab Kordrostami ◽

Mohsen Vaez-Ghasemi

Keyword(s):

Data Envelopment Analysis ◽

Real World ◽

Network Data ◽

Data Envelopment ◽

Parallel Section ◽

Mixed Network ◽

Weak Disposability ◽

Proposed Model ◽

Shared Inputs

In the real world, there are processes whose structures are like a parallel-series mixed network. Network data envelopment analysis (NDEA) is one of the appropriate methods for assessing the performance of processes with these structures. In the paper, mixed processes with two parallel and series components are considered, in which the first component or parallel section consists of the shared inputs, and the second component or series section consists of undesirable factors. By considering the weak disposability assumption for undesirable factors, a DEA approach based on the network slack-based measure (NSBM) is introduced to evaluate the performance of processes with mixed structures. The proposed model is illustrated with a real case study. Then, the model is developed to discriminate efficient units.

Visualizing real-world networks

10.32920/ryerson.14665824 ◽

2021 ◽

Author(s):

Lyndsay Roach

Keyword(s):

Complex Networks ◽

Real World ◽

The Internet ◽

Network Data ◽

Community Structures ◽

Large Networks ◽

Computing Power ◽

On Line ◽

Using Data ◽

The Impact

The study of networks has been propelled by improvements in computing power, enabling our ability to mine and store large amounts of network data. Moreover, the ubiquity of the internet has afforded us access to records of interactions that have previously been invisible. We are now able to study complex networks with anywhere from hundreds to billions of nodes; however, it is difficult to visualize large networks in a meaningful way. We explore the process of visualizing real-world networks. We first discuss the properties of complex networks and the mechanisms used in the network visualizing software Gephi. Then we provide examples of voting, trade, and linguistic networks using data extracted from on-line sources. We investigate the impact of hidden community structures on the analysis of these real-world networks.

Network Data Characteristics

Statistical Techniques for Network Security ◽

10.4018/978-1-59904-708-9.ch004 ◽

2011 ◽

pp. 104-122

Author(s):

Yu Wang

Keyword(s):

Real World ◽

Random Variables ◽

Network Data ◽

Traffic Data ◽

Real World Data ◽

Additional Information ◽

Data Points ◽

Key Features ◽

Basic Concepts ◽

Data Elements

Data represents the natural phenomena of our real world. Data is constructed by rows and columns; usually rows represent the observations and columns represent the variables. Observations, also called subjects, records, or data points, represent a phenomenon in the real world, and variables, also known as data elements or data fields, represent the characteristics of observations in data. Variables take different values for different observations, which can make observations independent of each other. Figure 4.1 illustrates a section of TCP/IP traffic data, in which the rows are individual network traffics, and the columns, separated by a space, are characteristics of the traffics. In this example, the first column is a session index of each connection and the second column is the date when the connection occurred. In this chapter, we will discuss some fundamental key features of variables and network data. We will present detailed discussions on variable characteristics and distributions in Sections Random Variables and Variables Distributions, and describe network data modules in Section Network Data Modules. The material covered in this chapter will help readers who do not have a solid background in this area gain an understanding of the basic concepts of variables and data. Additional information can be found in Introduction to the Practice of Statistics by Moore and McCabe (1998).

Link prediction based on network embedding and similarity transferring methods

Modern Physics Letters B ◽

10.1142/s0217984920501699 ◽

2020 ◽

Vol 34 (16) ◽

pp. 2050169

Author(s):

Wei Yu ◽

Xiaoyu Liu ◽

Bo Ouyang

Keyword(s):

Real World ◽

Link Prediction ◽

Free Parameter ◽

Network Science ◽

Ad Hoc ◽

Prediction Algorithm ◽

Network Embedding ◽

Science Community ◽

The Cost ◽

Accuracy Of Prediction

In network science, link prediction is a technique used to predict missing or future relationships based on currently observed connections. The network science community has recently paid much attention to this direction. However, most present approaches predict links based on ad hoc similarity definitions. To address this issue, we propose a link prediction algorithm named Transferring Similarity Based on Adjacency Embedding (TSBAE). TSBAE is based on network embedding, where the potential information of the structure is preserved in the embedded vector space, and the similarity is inherently captured by the distance of these vectors. Furthermore, to accommodate the fact that the similarity should be transferable, indirect similarity between nodes is incorporated to improve the accuracy of prediction. The experimental results on 10 real-world networks show that TSBAE outperforms the baseline algorithms in the task of link prediction, at the cost of tuning a free parameter in the prediction.

Enhancing link prediction by exploring community membership of nodes

International Journal of Modern Physics B ◽

10.1142/s021797921950382x ◽

2019 ◽

Vol 33 (31) ◽

pp. 1950382

Author(s):

Shenshen Bai ◽

Shiyu Fang ◽

Longjie Li ◽

Rui Liu ◽

Xiaoyun Chen

Keyword(s):

Community Structure ◽

Prediction Model ◽

Link Prediction ◽

Prediction Accuracy ◽

Network Data ◽

Data Link ◽

Structure Information ◽

Community Membership

With the proliferation of available network data, link prediction has become increasingly important and captured growing attention from various disciplines. To enhance the prediction accuracy by making full use of community structure information, this paper proposes a new link prediction model, namely CMS, in which different community memberships of nodes are investigated. Under the CMS model, different memberships can have different influences on link formation. To estimate the connection likelihood between two nodes, the CMS model weights the contribution of each shared neighbor according to the corresponding community membership. Three CMS-based methods are derived by introducing three forms of contribution that neighbors make. Extensive experiments on 12 networks are conducted to evaluate the performance of CMS-based methods. The results show that CMS-based methods are more effective and robust than baselines.

Link prediction based on local weighted paths for complex networks

International Journal of Modern Physics C ◽

10.1142/s012918311750053x ◽

2017 ◽

Vol 28 (04) ◽

pp. 1750053

Author(s):

Yabing Yao ◽

Ruisheng Zhang ◽

Fan Yang ◽

Yongna Yuan ◽

Rongjing Hu ◽

...

Keyword(s):

Complex Networks ◽

Real World ◽

Link Prediction ◽

Structural Similarity ◽

Prediction Performance ◽

Topological Feature ◽

Topological Features ◽

Node Similarity ◽

Weighted Paths ◽

Path Dependent

As a significant problem in complex networks, link prediction aims to find the missing and future links between two unconnected nodes by estimating the existence likelihood of potential links. It plays an important role in understanding the evolution mechanism of networks and has broad applications in practice. In order to improve prediction performance, a variety of structural similarity-based methods that rely on different topological features have been put forward. As one topological feature, the path information between node pairs is utilized to calculate the node similarity. However, many path-dependent methods neglect the different contributions of paths for a pair of nodes. In this paper, a local weighted path (LWP) index is proposed to differentiate the contributions between paths. The LWP index considers the effect of the link degrees of intermediate links and the connectivity influence of intermediate nodes on paths to quantify the path weight in the prediction procedure. The experimental results on 12 real-world networks show that the LWP index outperforms seven other prediction baselines.
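
The LWP formula is not given in the abstract, so the sketch below does not implement it. It shows the standard local path index, S = A^2 + εA^3, as an example of a path-dependent baseline that treats all paths of the same length alike, which is the limitation the abstract points to. The value of ε, the toy graph, and the function names are assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_index(adjacency, epsilon=0.01):
    """Local path similarity: walks of length 2 plus epsilon times walks of length 3."""
    a2 = matmul(adjacency, adjacency)
    a3 = matmul(a2, adjacency)
    n = len(adjacency)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
lp = local_path_index(A)
```

Nodes 0 and 2 are two steps apart and score 1.0, while nodes 0 and 3 are three steps apart and score only ε. Every length-2 path contributes the same amount regardless of which intermediate node it passes through; a weighted-path index replaces that uniform contribution with a per-path weight.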
