NECo: A node embedding algorithm for multiplex heterogeneous networks

2020
Author(s): Cagatay Dursun, Jennifer R. Smith, G. Thomas Hayman, Anne E. Kwitek, Serdar Bozdag

Abstract
Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of the deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Identifying the genetic factors behind complex diseases, in order to understand and reduce their morbidity and mortality, is therefore an important and challenging task. With the generation of an unprecedented amount of multi-omics datasets, network-based methods have become popular for representing multilayered complex molecular interactions. In particular, node embeddings, the low-dimensional representations of nodes in a network, are utilized for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and the lack of context-specific datasets. Most node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat (Rattus norvegicus) disease models to classify hypertension-related genes. Our method significantly outperformed state-of-the-art node embedding methods, with an AUC of 94.97% compared to 85.98% for the second-best performer, and predicted genes not previously implicated in hypertension.

Availability and implementation
The source code is available on GitHub at https://github.com/bozdaglab/NECo.
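
The downstream evaluation described here, classifying disease-related genes from learned node embeddings and reporting AUC, can be illustrated with a minimal, hedged sketch; the embedding matrix, labels, and logistic-regression classifier below are placeholders, not the actual NECo pipeline.

```python
# Hedged sketch: score disease-gene classification from precomputed node
# embeddings. The arrays and the classifier are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_genes, dim = 500, 128
embeddings = rng.normal(size=(n_genes, dim))   # node embeddings (placeholder)
labels = rng.integers(0, 2, size=n_genes)      # 1 = disease-associated gene (placeholder)

# Cross-validated probability estimates, then ROC AUC as the evaluation metric.
clf = LogisticRegression(max_iter=1000)
probs = cross_val_predict(clf, embeddings, labels, cv=5, method="predict_proba")[:, 1]
print(f"ROC AUC: {roc_auc_score(labels, probs):.3f}")
```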

2021
Vol 11 (1)
Author(s): Léo Pio-Lopez, Alberto Valdeolivas, Laurent Tichit, Élisabeth Remy, Anaïs Baudot

Abstract
Network embedding approaches are gaining momentum for analysing a large variety of networks. Indeed, these approaches have demonstrated their effectiveness in tasks such as community detection, node classification, and link prediction. However, very few network embedding methods have been specifically designed to handle multiplex networks, i.e. networks composed of different layers sharing the same set of nodes but having different types of edges. Moreover, to our knowledge, existing approaches cannot embed nodes from multiplex-heterogeneous networks, i.e. networks composed of several multiplex networks containing both different types of nodes and edges. In this study, we propose MultiVERSE, an extension of the VERSE framework using Random Walks with Restart on Multiplex (RWR-M) and Multiplex-Heterogeneous (RWR-MH) networks. MultiVERSE is a fast and scalable method to learn node embeddings from multiplex and multiplex-heterogeneous networks. We evaluate MultiVERSE on several biological and social networks and demonstrate its performance. MultiVERSE outperforms most of the other methods in the tasks of link prediction and network reconstruction for multiplex network embedding, and is also efficient in link prediction for multiplex-heterogeneous network embedding. Finally, we apply MultiVERSE to study rare disease-gene associations using link prediction and clustering. MultiVERSE is freely available on GitHub at https://github.com/Lpiol/MultiVERSE.
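
As a point of reference for the random-walk machinery named above, the following is a minimal sketch of plain Random Walk with Restart on a single network; the RWR-M and RWR-MH variants used by MultiVERSE generalise this iteration across layers and node types, and the toy adjacency matrix is an assumption for illustration only.

```python
# Hedged sketch of single-network Random Walk with Restart (RWR).
import numpy as np

def rwr(adj, seed, restart=0.7, tol=1e-8, max_iter=1000):
    """Return the stationary RWR probability vector for one seed node."""
    # Column-normalise the adjacency matrix to obtain transition probabilities.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    trans = adj / col_sums

    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * trans @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)  # toy undirected network
print(rwr(adj, seed=0))
```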


2020
Vol 34 (03)
pp. 2950-2958
Author(s): Guanglin Niu, Yongfei Zhang, Bo Li, Peng Cui, Si Liu, et al.

Representation learning on a knowledge graph (KG) embeds the entities and relations of the KG into low-dimensional continuous vector spaces. Early KG embedding methods pay attention only to the structured information encoded in triples, which limits their performance due to the structural sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization ability of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, rules of length 2 are applied to compose paths accurately, while rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered during optimization to guarantee that the rule can be reliably applied to representation learning. Extensive experimental results illustrate that RPJE outperforms state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.
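
To make the rule-guided composition concrete, here is a small sketch under the assumption of a translation-style (TransE-like) embedding space, where a path is composed by summing relation vectors and a length-2 rule constrains the composition toward its head relation, weighted by the rule's confidence; RPJE's actual rule encoding and loss may differ.

```python
# Hedged sketch: compose a relation path with a length-2 Horn rule and weight
# the training signal by the rule's confidence. TransE-style addition is an
# assumption for illustration, not RPJE's exact formulation.
import numpy as np

dim = 8
rng = np.random.default_rng(1)
rel_emb = {r: rng.normal(size=dim) for r in ["born_in", "city_of", "nationality"]}

# A length-2 rule mined from the KG (hypothetical example):
# born_in(x, y) AND city_of(y, z) => nationality(x, z), with confidence 0.9.
rule = {"body": ["born_in", "city_of"], "head": "nationality", "confidence": 0.9}

# Compose the path representation from the rule body and measure how well it
# matches the head relation; the confidence scales its contribution.
path_vec = sum(rel_emb[r] for r in rule["body"])
residual = np.linalg.norm(path_vec - rel_emb[rule["head"]])
print(f"rule residual: {residual:.3f}, "
      f"weighted contribution: {rule['confidence'] * residual:.3f}")
```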


2021
Vol 11 (5)
pp. 2371
Author(s): Junjian Zhan, Feng Li, Yang Wang, Daoyu Lin, Guangluan Xu

As most networks come with some content attached to each node, attributed network embedding has attracted much research interest. Most existing attributed network embedding methods aim at learning a fixed representation for each node that encodes its local proximity. However, these methods usually neglect the global information between nodes distant from each other as well as the distribution of the latent codes. We propose Structural Adversarial Variational Graph Auto-Encoder (SAVGAE), a novel framework which encodes the network structure and node content into low-dimensional embeddings. On one hand, our model captures the local proximity and proximities at any distance of a network by exploiting a high-order proximity indicator named Rooted PageRank. On the other hand, our method learns the data distribution of each node representation while circumventing, through adversarial training, the side effect that its sampling process has on learning a robust embedding. On benchmark datasets, we demonstrate that our method performs competitively compared with state-of-the-art models.
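
The Rooted PageRank indicator mentioned above has a standard closed form, sketched below under assumed values for the damping factor and a dense matrix inverse; this illustrates the proximity matrix itself, not SAVGAE's training procedure.

```python
# Hedged sketch of the Rooted PageRank proximity matrix,
# S = (1 - beta) * (I - beta * P)^-1 with P the row-normalised adjacency.
import numpy as np

def rooted_pagerank(adj, beta=0.85):
    """S[i, j] ~ high-order proximity of node j when restarting at root i."""
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0
    trans = adj / deg[:, None]                 # row-normalised transition matrix
    n = adj.shape[0]
    return (1 - beta) * np.linalg.inv(np.eye(n) - beta * trans)

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)    # toy undirected network
print(rooted_pagerank(adj).round(3))
```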


Author(s): Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, et al.

Abstract
Motivation
Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state of the art.

Results
We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction, and protein–protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification and protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI, and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.

Availability and implementation
As part of our contributions, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks.

Supplementary information
Supplementary data are available at Bioinformatics online.
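
The "traditional technique" referred to above, matrix factorization viewed as graph embedding, can be sketched in a few lines as a truncated SVD of the adjacency matrix; the toy edge list is an assumption and this is not the BioNEV package itself.

```python
# Hedged sketch of matrix factorisation as graph embedding: a rank-k SVD of the
# adjacency matrix yields k-dimensional node vectors.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]   # toy graph
n = 5
rows, cols = zip(*edges)
adj = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
adj = adj + adj.T                                          # make it undirected

# Rank-k factorisation: U * diag(s) gives the node embeddings.
k = 2
u, s, vt = svds(adj.astype(float), k=k)
embeddings = u * s                                         # shape (n, k)
print(embeddings.round(3))
```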


2021
Vol 14 (6)
pp. 1102-1110
Author(s): Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Ivan Oseledets, et al.

Low-dimensional representations, or embeddings, of a graph's nodes facilitate several practical data science and data engineering tasks. As such embeddings rely, explicitly or implicitly, on a similarity measure among nodes, they require the computation of a quadratic similarity matrix, inducing a tradeoff between space complexity and embedding quality. To date, no graph embedding work combines (i) linear space complexity, (ii) a nonlinear transform as its basis, and (iii) nontrivial quality guarantees. In this paper we introduce FREDE (FREquent Directions Embedding), a graph embedding based on matrix sketching that combines those three desiderata. Starting from the observation that embedding methods aim to preserve the covariance among the rows of a similarity matrix, FREDE iteratively improves on quality while individually processing rows of a nonlinearly transformed PPR similarity matrix derived from a state-of-the-art graph embedding method, and provides, at any iteration, column-covariance approximation guarantees that are, in due course, almost indistinguishable from those of the optimal approximation by SVD. Our experimental evaluation on networks of varying size shows that FREDE performs almost as well as SVD and competitively against state-of-the-art embedding methods in diverse data science tasks, even when it is based on as little as 10% of node similarities.
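
For orientation, the generic Frequent Directions sketching step that FREDE builds on (Liberty, 2013) is sketched below, processing rows of a similarity matrix one at a time; the sketch size and random input rows are assumptions, and FREDE adds its own nonlinear transform and guarantees on top of this.

```python
# Hedged sketch of the generic Frequent Directions matrix-sketching loop.
import numpy as np

def frequent_directions(rows, sketch_size):
    """Stream rows of a matrix and maintain a sketch_size x d sketch."""
    d = rows.shape[1]
    sketch = np.zeros((sketch_size, d))
    for row in rows:
        zero_slots = np.where(~sketch.any(axis=1))[0]
        if len(zero_slots) == 0:
            # Sketch is full: shrink singular values so that rows free up.
            _, s, vt = np.linalg.svd(sketch, full_matrices=False)
            delta = s[sketch_size // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            sketch = s_shrunk[:, None] * vt
            zero_slots = np.where(~sketch.any(axis=1))[0]
        sketch[zero_slots[0]] = row            # insert the incoming row
    return sketch

similarity_rows = np.random.default_rng(2).normal(size=(100, 16))  # placeholder rows
print(frequent_directions(similarity_rows, sketch_size=8).shape)
```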


Author(s): Junliang Guo, Linli Xu, Jingchang Liu

Recent advances in the field of network embedding have shown that low-dimensional network representations play a critical role in network analysis. Most existing network embedding methods encode the local proximity of a node, such as its first- and second-order proximities. While being efficient, these methods fall short of leveraging the global structural information between nodes distant from each other. In addition, most existing methods learn embeddings on one single fixed network and thus cannot be generalized to unseen nodes or networks without retraining. In this paper we present SPINE, a method that can jointly capture the local proximity and proximities at any distance, while being inductive so as to efficiently deal with unseen nodes or networks. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed framework over the state of the art.


2020
Vol 34 (04)
pp. 4091-4098
Author(s): Tao He, Lianli Gao, Jingkuan Song, Xin Wang, Kejie Huang, et al.

Learning accurate low-dimensional embeddings for a network is a crucial task, as it facilitates many network analytics tasks. Moreover, the trained embeddings often require a significant amount of space to store, making storage and processing a challenge, especially as large-scale networks become more prevalent. In this paper, we present a novel semi-supervised network embedding and compression method, SNEQ, that is competitive with state-of-the-art embedding methods while being far more space- and time-efficient. SNEQ incorporates a novel quantisation method based on a self-attention layer that is trained in an end-to-end fashion and is able to dramatically compress the size of the trained embeddings, thus reducing the storage footprint and accelerating retrieval. Our evaluation on four real-world networks of diverse characteristics shows that SNEQ outperforms a number of state-of-the-art embedding methods in link prediction, node classification and node recommendation. Moreover, the quantised embeddings show a great advantage in terms of storage and time compared with continuous embeddings as well as hashing methods.
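
To illustrate the storage saving that quantised embeddings offer, here is a minimal sketch using off-the-shelf product quantisation with k-means codebooks; this is only a simple stand-in, since SNEQ learns its quantisation end-to-end through a self-attention layer, and the embedding matrix and codebook sizes are assumptions.

```python
# Hedged sketch: compress trained embeddings with product quantisation
# (one k-means codebook per sub-vector), storing 1 byte per sub-vector.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(1000, 128))   # trained node embeddings (placeholder)

n_subvectors, n_codewords = 8, 256
sub_dim = embeddings.shape[1] // n_subvectors
codes = np.empty((embeddings.shape[0], n_subvectors), dtype=np.uint8)
codebooks = []

for m in range(n_subvectors):
    block = embeddings[:, m * sub_dim:(m + 1) * sub_dim]
    km = KMeans(n_clusters=n_codewords, n_init=1, random_state=0).fit(block)
    codes[:, m] = km.labels_                # 1 byte per sub-vector
    codebooks.append(km.cluster_centers_)

# Each node is now 8 bytes of codes instead of 128 float32 values; decode by
# looking up the codewords and concatenating them.
reconstructed = np.hstack([codebooks[m][codes[:, m]] for m in range(n_subvectors)])
print(codes.shape, reconstructed.shape)
```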


Author(s): Xiaobo Shen, Shirui Pan, Weiwei Liu, Yew-Soon Ong, Quan-Sen Sun

Network embedding aims to seek low-dimensional vector representations of network nodes by preserving the network structure. Network embeddings are typically represented as continuous vectors, which impose formidable storage and computation costs, particularly in large-scale applications. To address this issue, this paper proposes a novel discrete network embedding (DNE) for more compact representations. In particular, DNE learns short binary codes to represent each node. The Hamming similarity between two binary embeddings is then employed to well approximate the ground-truth similarity. A novel discrete multi-class classifier is also developed to expedite classification. Moreover, we propose to jointly learn the discrete embedding and the classifier within a unified framework to improve the compactness and discrimination of the network embedding. Extensive experiments on node classification consistently demonstrate that DNE exhibits lower storage and computational complexity than state-of-the-art network embedding methods, while obtaining competitive classification results.
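
The Hamming-similarity idea behind such discrete embeddings is easy to sketch: nodes are stored as short binary codes and compared with cheap bit operations. The codes below are random placeholders, not DNE's output.

```python
# Hedged sketch: Hamming similarity between binary node codes.
import numpy as np

rng = np.random.default_rng(4)
code_len = 64
codes = rng.integers(0, 2, size=(5, code_len), dtype=np.uint8)  # one binary code per node

def hamming_similarity(a, b):
    """Fraction of matching bits between two binary codes."""
    return 1.0 - np.count_nonzero(a != b) / len(a)

print(hamming_similarity(codes[0], codes[1]))
```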


Networks have proved to be very helpful in modelling complex systems with interacting components. Across many domains, such systems can be modelled as a network with links between interacting components. The link prediction problem deals with predicting missing links in a given network. Its applications range across various disciplines, including biological networks, transportation networks, social networks, and telecommunication networks. In this paper, we use node embedding methods to encode the nodes into low-dimensional embeddings and predict links based on edge embeddings computed by taking the Hadamard product of the embeddings of the participating nodes. We further compare the accuracy of models trained on embeddings of different dimensions. We also study how introducing additional features changes the accuracy for various dimensions of node embeddings. The additional features include overlap-based measures such as the Jaccard similarity and the Adamic-Adar score, the dot product between node embeddings, and heuristic features, i.e. Common Neighbors, Resource Allocation, Preferential Attachment, and the FriendTNS score.
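
The edge-feature construction described above can be sketched as follows: the Hadamard product of node embeddings concatenated with a few topological scores, ready to feed to a binary classifier. The graph, embeddings, and candidate edge are placeholders, not the paper's datasets or trained vectors.

```python
# Hedged sketch: build edge features from node embeddings and heuristic scores.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
rng = np.random.default_rng(5)
emb = {v: rng.normal(size=32) for v in G.nodes()}   # node embeddings (placeholder)

def edge_features(u, v):
    hadamard = emb[u] * emb[v]                      # element-wise product
    jaccard = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
    adamic_adar = next(nx.adamic_adar_index(G, [(u, v)]))[2]
    res_alloc = next(nx.resource_allocation_index(G, [(u, v)]))[2]
    pref_attach = next(nx.preferential_attachment(G, [(u, v)]))[2]
    common_neigh = len(list(nx.common_neighbors(G, u, v)))
    dot = float(emb[u] @ emb[v])
    return np.concatenate([hadamard,
                           [jaccard, adamic_adar, res_alloc,
                            pref_attach, common_neigh, dot]])

print(edge_features(0, 33).shape)                   # 32 Hadamard dims + 6 heuristics
```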


2021
Vol 4
Author(s): Linmei Hu, Mengmei Zhang, Shaohua Li, Jinghan Shi, Chuan Shi, et al.

Knowledge Graphs (KGs) such as Freebase and YAGO have been widely adopted in a variety of NLP tasks. Representation learning of KGs aims to map entities and relations into a continuous low-dimensional vector space. Conventional KG embedding methods (such as TransE and ConvE) utilize only KG triplets and thus suffer from structure sparsity. Some recent works address this issue by incorporating auxiliary texts of entities, typically entity descriptions. However, these methods usually focus only on local consecutive word sequences and seldom explicitly use global word co-occurrence information in a corpus. In this paper, we propose to model the whole auxiliary text corpus with a graph and present an end-to-end text-graph enhanced KG embedding model, named Teger. Specifically, we model the auxiliary texts with a heterogeneous entity-word graph (called a text-graph), which captures both local and global semantic relationships among entities and words. We then apply graph convolutional networks to learn informative entity embeddings that aggregate high-order neighborhood information. These embeddings are further integrated with the KG triplet embeddings via a gating mechanism, thus enriching the KG representations and alleviating the inherent structure sparsity. Experiments on benchmark datasets show that our method significantly outperforms several state-of-the-art methods.
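
The gating-based fusion of text-graph and triplet embeddings can be sketched as an element-wise convex combination controlled by a learned sigmoid gate; the exact parameterisation of Teger's gate is an assumption here, and the vectors and weights below are untrained placeholders.

```python
# Hedged sketch: fuse a text-graph entity embedding with a KG-triplet embedding
# through a sigmoid gate (general pattern only, not Teger's exact layer).
import numpy as np

rng = np.random.default_rng(6)
dim = 16
e_text = rng.normal(size=dim)                 # GCN-based text-graph embedding (placeholder)
e_kg = rng.normal(size=dim)                   # triplet-based KG embedding (placeholder)

W = rng.normal(size=(dim, 2 * dim)) * 0.1     # gate parameters (untrained placeholders)
b = np.zeros(dim)

gate = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([e_text, e_kg]) + b)))  # sigmoid gate
fused = gate * e_text + (1.0 - gate) * e_kg   # element-wise convex combination
print(fused.shape)
```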

