On spectral embedding performance and elucidating network structure in stochastic blockmodel graphs

2019 ◽  
Vol 7 (3) ◽  
pp. 269-291 ◽  
Author(s):  
Joshua Cape ◽  
Minh Tang ◽  
Carey E. Priebe

Abstract: Statistical inference on graphs often proceeds via spectral methods involving low-dimensional embeddings of matrix-valued graph representations such as the graph Laplacian or adjacency matrix. In this paper, we analyze the asymptotic information-theoretic relative performance of Laplacian spectral embedding and adjacency spectral embedding for block assignment recovery in stochastic blockmodel graphs by way of Chernoff information. We investigate the relationship between spectral embedding performance and underlying network structure (e.g., homogeneity, affinity, core-periphery, and (un)balancedness) via a comprehensive treatment of the two-block stochastic blockmodel and the class of K-blockmodels exhibiting homogeneous balanced affinity structure. Our findings support the claim that, for a particular notion of sparsity, loosely speaking, “Laplacian spectral embedding favors relatively sparse graphs, whereas adjacency spectral embedding favors not-too-sparse graphs.” We also provide evidence in support of the claim that “adjacency spectral embedding favors core-periphery network structure.”
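As a rough illustration of the two embeddings compared in this abstract, the sketch below samples a stochastic blockmodel graph and computes an adjacency spectral embedding and a Laplacian spectral embedding with NumPy. It assumes the normalized form D^{-1/2} A D^{-1/2} for the Laplacian and scales eigenvectors by the square root of the absolute eigenvalues; the abstract specifies neither choice, and all function names and parameters here are hypothetical.

```python
import numpy as np


def sample_sbm(block_sizes, B, rng):
    """Sample a symmetric, hollow adjacency matrix from a stochastic blockmodel.

    block_sizes: number of nodes per block; B: K x K block probability matrix.
    """
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]  # n x n matrix of edge probabilities
    n = labels.size
    # Draw each edge once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels


def adjacency_spectral_embedding(M, d):
    """Embed via the d eigenpairs of symmetric M that are largest in magnitude."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def laplacian_spectral_embedding(A, d):
    """Embed via D^{-1/2} A D^{-1/2} (assumed normalization, see lead-in)."""
    deg = A.sum(axis=1)
    # Isolated nodes get a zero scaling factor instead of a division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return adjacency_spectral_embedding(L, d)
```

Calling `sample_sbm([50, 50], np.array([[0.5, 0.1], [0.1, 0.5]]), np.random.default_rng(0))` and passing the resulting matrix to either embedding function with `d=2` yields one two-dimensional point per node, which can then be clustered to recover block assignments.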

2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Koya Sato ◽  
Mizuki Oka ◽  
Alain Barrat ◽  
Ciro Cattuto

Abstract: Low-dimensional vector representations of network nodes have proven successful for feeding graph data to machine learning algorithms and for improving performance across diverse tasks. Most of the embedding techniques, however, have been developed with the goal of achieving dense, low-dimensional encoding of network structure and patterns. Here, we present a node embedding technique aimed at providing low-dimensional feature vectors that are informative of dynamical processes occurring over temporal networks – rather than of the network structure itself – with the goal of enabling prediction tasks related to the evolution and outcome of these processes. We achieve this by using a lossless modified supra-adjacency representation of temporal networks and building on standard embedding techniques for static graphs based on random walks. We show that the resulting embedding vectors are useful for prediction tasks related to paradigmatic dynamical processes, namely epidemic spreading over empirical temporal networks. In particular, we illustrate the performance of our approach for the prediction of nodes’ epidemic states in single instances of a spreading process. We show how framing this task as a supervised multi-label classification task on the embedding vectors allows us to estimate the temporal evolution of the entire system from a partial sampling of nodes at random times, with potential impact for nowcasting infectious disease dynamics.
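To make the idea of a supra-adjacency representation concrete, here is a minimal sketch of the textbook construction: one copy of every node per time slice, with each copy linked forward to its own copy in the next slice. This is not the lossless modified variant the abstract refers to, whose details are not given here; the function names and the `coupling` parameter are hypothetical. Random walks on the resulting static graph could then be fed to a standard walk-based embedding method.

```python
import numpy as np


def supra_adjacency(snapshots, coupling=1.0):
    """Stack T snapshots (each n x n) into one (T*n) x (T*n) matrix.

    Diagonal blocks hold the within-slice adjacency; node i at time t is
    linked forward to node i at time t + 1 with weight `coupling`.
    """
    T = len(snapshots)
    n = snapshots[0].shape[0]
    S = np.zeros((T * n, T * n))
    idx = np.arange(n)
    for t, A in enumerate(snapshots):
        S[t * n:(t + 1) * n, t * n:(t + 1) * n] = A
        if t + 1 < T:
            S[t * n + idx, (t + 1) * n + idx] = coupling
    return S


def random_walk(S, start, length, rng):
    """Walk that picks uniformly among out-neighbours (edge weights ignored)."""
    walk = [start]
    for _ in range(length):
        nbrs = np.flatnonzero(S[walk[-1]])
        if nbrs.size == 0:  # dead end, e.g. an isolated node in the last slice
            break
        walk.append(int(rng.choice(nbrs)))
    return walk
```

Each index `k` in the supra-adjacency matrix corresponds to node `k % n` at time slice `k // n`, so a walk is a sequence of (node, time) pairs.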


Author(s):  
Kishlay Jha ◽  
Guangxu Xun ◽  
Aidong Zhang

Abstract: Motivation. Many real-world biomedical interactions such as ‘gene-disease’, ‘disease-symptom’ and ‘drug-target’ are modeled as a bipartite network structure. Learning meaningful representations for such networks is a fundamental problem in the research area of Network Representation Learning (NRL). NRL approaches aim to translate the network structure into low-dimensional vector representations that are useful to a variety of biomedical applications. Despite significant advances, the existing approaches still have certain limitations. First, a majority of these approaches do not model the unique topological properties of bipartite networks. Consequently, their straightforward application to the bipartite graphs yields unsatisfactory results. Second, the existing approaches typically learn representations from static networks. This is limiting for the biomedical bipartite networks that evolve at a rapid pace, and thus necessitate the development of approaches that can update the representations in an online fashion. Results. In this research, we propose a novel representation learning approach that accurately preserves the intricate bipartite structure, and efficiently updates the node representations. Specifically, we design a customized autoencoder that captures the proximity relationship between nodes participating in the bipartite bicliques (2 × 2 sub-graph), while preserving both the global and local structures. Moreover, the proposed structure-preserving technique is carefully interleaved with the central tenets of continual machine learning to design an incremental learning strategy that updates the node representations in an online manner. Taken together, the proposed approach produces meaningful representations with high fidelity and computational efficiency. Extensive experiments conducted on several biomedical bipartite networks validate the effectiveness and rationality of the proposed approach.


2020 ◽  
Vol 117 (27) ◽  
pp. 15403-15408
Author(s):  
Lawrence K. Saul

We propose a latent variable model to discover faithful low-dimensional representations of high-dimensional data. The model computes a low-dimensional embedding that aims to preserve neighborhood relationships encoded by a sparse graph. The model both leverages and extends current leading approaches to this problem. Like t-distributed Stochastic Neighborhood Embedding, the model can produce two- and three-dimensional embeddings for visualization, but it can also learn higher-dimensional embeddings for other uses. Like LargeVis and Uniform Manifold Approximation and Projection, the model produces embeddings by balancing two goals—pulling nearby examples closer together and pushing distant examples further apart. Unlike these approaches, however, the latent variables in our model provide additional structure that can be exploited for learning. We derive an Expectation–Maximization procedure with closed-form updates that monotonically improve the model’s likelihood: In this procedure, embeddings are iteratively adapted by solving sparse, diagonally dominant systems of linear equations that arise from a discrete graph Laplacian. For large problems, we also develop an approximate coarse-graining procedure that avoids the need for negative sampling of nonadjacent nodes in the graph. We demonstrate the model’s effectiveness on datasets of images and text.


2020 ◽  
Vol 7 (2) ◽  
pp. 190714 ◽  
Author(s):  
Omar Shetta ◽  
Mahesan Niranjan

The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projections for visualization. A class of problems that have not gained much attention is detecting outliers in datasets, arising from reasons such as gross experimental, reporting or labelling errors. These could also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, which is a serious problem with omics data which are often found in very high dimensions. We develop an outlier detection method based on structured low-rank approximation methods. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method has better clustering and visualization performance on the recovered low-dimensional projection when compared with popular dimensionality reduction techniques.


2015 ◽  
Vol 768 ◽  
pp. 549-571 ◽  
Author(s):  
Aditya G. Nair ◽  
Kunihiko Taira

We examine discrete vortex dynamics in two-dimensional flow through a network-theoretic approach. The interaction of the vortices is represented with a graph, which allows the use of network-theoretic approaches to identify key vortex-to-vortex interactions. We employ sparsification techniques on these graph representations based on spectral theory to construct sparsified models and evaluate the dynamics of vortices in the sparsified set-up. Identification of vortex structures based on graph sparsification and sparse vortex dynamics is illustrated through an example of point-vortex clusters interacting amongst themselves. We also evaluate the performance of sparsification with increasing number of point vortices. The sparsified-dynamics model developed with spectral graph theory requires a reduced number of vortex-to-vortex interactions but agrees well with the full nonlinear dynamics. Furthermore, the sparsified model derived from the sparse graphs conserves the invariants of discrete vortex dynamics. We highlight the similarities and differences between the present sparsified-dynamics model and reduced-order models.


2020 ◽  
Vol 34 (04) ◽  
pp. 6737-6745
Author(s):  
Ce Zhang ◽  
Hady W. Lauw

Oftentimes documents are linked to one another in a network structure, e.g., academic papers cite other papers, Web pages link to other pages. In this paper we propose a holistic topic model to learn meaningful and unified low-dimensional representations for networked documents that seek to preserve both textual content and network structure. On the basis of reconstructing not only the input document but also its adjacent neighbors, we develop two neural encoder architectures. Adjacent-Encoder, or AdjEnc, induces competition among documents for topic propagation, and reconstruction among neighbors for semantic capture. Adjacent-Encoder-X, or AdjEnc-X, extends this to also encode the network structure in addition to document content. We evaluate our models on real-world document networks quantitatively and qualitatively, outperforming comparable baselines comprehensively.


2017 ◽  
Vol 58 ◽  
pp. 185-229 ◽  
Author(s):  
James Cussens ◽  
Matti Järvisalo ◽  
Janne H. Korhonen ◽  
Mark Bartlett

The challenging task of learning structures of probabilistic graphical models is an important problem within modern AI research. Recent years have witnessed several major algorithmic advances in structure learning for Bayesian networks - arguably the most central class of graphical models - especially in what is known as the score-based setting. A successful generic approach to optimal Bayesian network structure learning (BNSL), based on integer programming (IP), is implemented in the GOBNILP system. Despite the recent algorithmic advances, current understanding of foundational aspects underlying the IP based approach to BNSL is still somewhat lacking. Understanding fundamental aspects of cutting planes and the related separation problem is important not only from a purely theoretical perspective, but also since it holds out the promise of further improving the efficiency of state-of-the-art approaches to solving BNSL exactly. In this paper, we make several theoretical contributions towards these goals: (i) we study the computational complexity of the separation problem, proving that the problem is NP-hard; (ii) we formalise and analyse the relationship between three key polytopes underlying the IP-based approach to BNSL; (iii) we study the facets of the three polytopes both from the theoretical and practical perspective, providing, via exhaustive computation, a complete enumeration of facets for low-dimensional family-variable polytopes; and, furthermore, (iv) we establish a tight connection of the BNSL problem to the acyclic subgraph problem.


Author(s):  
Daokun Zhang ◽  
Jie Yin ◽  
Xingquan Zhu ◽  
Chengqi Zhang

This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most of the existing works on network embedding only consider network structure, but ignore user-generated content that could be potentially helpful in learning a better joint network representation. Different from rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile with network structure to jointly learn a vector representation of a social network. The theme of UPP-SNE is to embed user profile information via a nonlinear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.


2014 ◽  
Vol 12 (05) ◽  
pp. 1450025 ◽  
Author(s):  
Shang Gao ◽  
Ibrahim Karakira ◽  
Salim Afra ◽  
Ghada Naji ◽  
Reda Alhajj ◽  
...  

A network is a powerful structure which reveals valuable characteristics of the underlying data. However, previous work on evaluating the predictive performance of network-based biomarkers does not take nodal connectedness into account. We argue that it is necessary to maximize the benefit from the network structure by employing appropriate techniques. To address this, we aim to learn a weight coefficient for each node in the network from a quantitative measure such as gene expression data. The weight coefficients are computed from an optimization problem which minimizes the total weighted difference between nodes in a network structure; this can be expressed in terms of the graph Laplacian. After obtaining the coefficient vector for the network markers, we can then compute the corresponding network predictor. We demonstrate the effectiveness of the proposed method by conducting experiments using published breast cancer biomarkers with three patient cohorts. Network markers are first grouped based on GO terms related to cancer hallmarks. We compare the predictive performance of each network marker group across gene expression datasets. We also evaluate the network predictor against the average method for feature aggregation. The reported results show that the predictive performance of network markers is generally not consistent across patient cohorts.
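The remark that a total weighted difference between nodes can be expressed through the graph Laplacian rests on a standard identity: for a symmetric weight matrix A with Laplacian L = D - A, the sum over edges of a_ij (w_i - w_j)^2 equals w^T L w. The sketch below checks this numerically. It assumes squared differences and the unnormalized Laplacian, since the abstract does not state the exact objective, and the function names are hypothetical.

```python
import numpy as np


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A for a symmetric weight matrix A."""
    return np.diag(A.sum(axis=1)) - A


def total_weighted_difference(A, w):
    """Sum over undirected edges of a_ij * (w_i - w_j)^2.

    The double sum over (i, j) counts every edge twice, hence the factor 0.5.
    """
    diff = w[:, None] - w[None, :]
    return 0.5 * np.sum(A * diff ** 2)


def laplacian_quadratic_form(A, w):
    """The same quantity written as w^T L w."""
    return float(w @ graph_laplacian(A) @ w)
```

Because the two functions agree for any symmetric A, minimizing the pairwise form over w (subject to whatever constraints the method imposes) is equivalent to minimizing a quadratic form in the Laplacian.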

