How much topological structure is preserved by graph embeddings?

Graph embedding aims at learning representations of nodes in a low dimensional vector space. Good embeddings should preserve the graph topological structure. To study how much such structure can be preserved, we propose evaluation methods from four aspects: 1) How well the graph can be reconstructed based on the embeddings, 2) The divergence of the original link distribution and the embedding-derived distribution, 3) The consistency of communities discovered from the graph and embeddings, and 4) To what extent we can employ embeddings to facilitate link prediction. We find that it is insufficient to rely on the embeddings to reconstruct the original graph, to discover communities, and to predict links at a high precision. Thus, the embeddings by the state-of-the-art approaches can only preserve part of the topological structure.
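The first evaluation aspect, graph reconstruction, can be illustrated with a small sketch. The abstract does not state the exact metric, so the following assumes a common choice: rank all node pairs by Euclidean distance in the embedding space and measure what fraction of the k closest pairs are true edges. The function name, the toy graph, and the toy coordinates are hypothetical.

```python
import math

def reconstruction_precision_at_k(embeddings, edges, k):
    """Fraction of the k closest node pairs (in embedding space) that are true edges.

    Assumption: an undirected graph and Euclidean distance as the proximity measure.
    """
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted(embeddings)
    pairs = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            pairs.append((math.dist(embeddings[u], embeddings[v]), u, v))
    pairs.sort()  # closest pairs first
    hits = sum(1 for _, u, v in pairs[:k] if frozenset((u, v)) in edge_set)
    return hits / k

# Hypothetical toy data: a path graph a-b-c-d with hand-picked 2-D embeddings.
toy_embeddings = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.1), "d": (3.5, 0.0)}
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]

print(reconstruction_precision_at_k(toy_embeddings, toy_edges, k=3))
print(reconstruction_precision_at_k(toy_embeddings, toy_edges, k=4))
```

A precision that drops quickly as k grows would indicate that the embedding recovers only part of the edge set, which is the kind of shortfall the abstract reports.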

TransET: Knowledge Graph Embedding with Entity Types

Electronics, DOI 10.3390/electronics10121407, 2021, Vol 10 (12), pp. 1407

Author(s): Peng Wang, Jing Zhou, Yuzhang Liu, Xingchen Zhou

Keyword(s): Link Prediction, State Of The Art, Score Function, Graph Embedding, Vector Spaces, Knowledge Graph, Semantic Features, Knowledge Graphs, Real World Datasets, Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
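The two ingredients named in the abstract can be sketched as follows. This is not the TransET implementation: it assumes that "circle convolution" means standard circular convolution, and it pairs that with a TransE-style L1 translation score. All function names and vectors are hypothetical.

```python
def circular_convolution(a, b):
    """Circular convolution: out[k] = sum_i a[i] * b[(k - i) mod n]."""
    n = len(a)
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) for k in range(n)]

def translation_score(head, relation, tail):
    """TransE-style score: negative L1 norm of (h + r - t); higher means more plausible."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

def typed_triple_score(head, head_type, relation, tail, tail_type):
    """Assumed pipeline: mix each entity with its type vector, then score by translation."""
    h = circular_convolution(head, head_type)
    t = circular_convolution(tail, tail_type)
    return translation_score(h, relation, t)

# Hypothetical toy vectors; the type vector [1, 0, 0] leaves an entity unchanged.
identity_type = [1, 0, 0]
print(circular_convolution([1, 2, 3], [0, 1, 0]))  # a one-step cyclic shift
print(typed_triple_score([1, 2, 3], identity_type, [1, 0, 0], [2, 2, 3], identity_type))
```

In a trained model the type vectors would be learned, so the convolution would produce a different view of the same entity for each type rather than the identity mapping used here.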

AI-CTO: Knowledge graph for automated and dependable software stack solution

Journal of Intelligent & Fuzzy Systems, DOI 10.3233/jifs-200899, 2021, Vol 40 (1), pp. 799-812

Author(s): Xiaoyun Xu, Jingzheng Wu, Mutian Yang, Tianyue Luo, Qianru Meng, ...

Keyword(s): Current Practice, Graph Embedding, Software Systems, Professional Experience, Knowledge Graph, Dimensional Vector, Dimensional Vector Space, Svm Model, Low Dimensional, Industry Experience

As the scale of software systems continues to expand, software architecture is receiving more and more attention as the blueprint for complex software systems. An outstanding architecture requires a lot of professional experience and expertise. In current practice, architects try to find solutions manually, which is time-consuming and error-prone because of the knowledge barrier between newcomers and experienced architects. The problem can be addressed by making it easier to apply the experience of prominent architects. To this end, this paper proposes a novel graph-embedding-based method, AI-CTO, to automatically suggest software stack solutions according to the knowledge and experience of prominent architects. First, AI-CTO converts existing industry experience to knowledge, i.e., a knowledge graph. Second, the knowledge graph is embedded in a low-dimensional vector space. Then, the entity vectors are used to predict valuable software stack solutions with an SVM model. We evaluate AI-CTO with two case studies and compare its solutions with the software stacks of large companies. The experimental results show that AI-CTO can find effective and correct stack solutions and that it outperforms other baseline methods.

Understanding Negative Sampling in Knowledge Graph Embedding

International Journal of Artificial Intelligence & Applications, DOI 10.5121/ijaia.2021.12105, 2021, Vol 12 (1), pp. 71-81

Author(s): Jing Qian, Gangmin Li, Katie Atkinson, Yong Yue

Keyword(s): Link Prediction, Graph Embedding, Knowledge Graph, Direct Impact, Dimensional Vector Space, Dynamic Distribution, Space Efficiency, Node Classification, Low Dimensional

Knowledge graph embedding (KGE) projects the entities and relations of a knowledge graph (KG) into a low-dimensional vector space, and it has made steady progress in recent years. Conventional KGE methods, especially translational distance-based models, are trained by discriminating positive samples from negative ones. Most KGs store only positive samples for space efficiency. Negative sampling thus plays a crucial role in encoding the triples of a KG. The quality of the generated negative samples has a direct impact on the performance of the learnt knowledge representation in a myriad of downstream tasks, such as recommendation, link prediction and node classification. We summarize current negative sampling approaches in KGE into three categories: static distribution-based, dynamic distribution-based and custom cluster-based. Based on this categorization, we discuss the most prevalent existing approaches and their characteristics. We hope that this review can provide some guidelines for new thoughts about negative sampling in KGE.
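As a concrete illustration of the simplest kind of negative sampling, the sketch below corrupts the head or tail of a positive triple with an entity drawn uniformly at random and rejects any result that is itself a known positive. Treating uniform sampling as an instance of the static distribution-based category is an assumption here, and the function name, entity list, and triples are hypothetical.

```python
import random

def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a negative triple by replacing the head or the tail uniformly at random.

    Candidates that are already known positives are rejected; None is returned
    if no valid corruption is found within max_tries attempts.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)  # corrupt the head
        else:
            negative = (head, relation, candidate)  # corrupt the tail
        if negative not in known_triples:
            return negative
    return None

# Hypothetical toy knowledge graph.
toy_entities = ["paris", "france", "berlin", "germany"]
toy_triples = {("paris", "capital_of", "france"), ("berlin", "capital_of", "germany")}

rng = random.Random(0)  # seeded only to make the demo repeatable
print(corrupt_triple(("paris", "capital_of", "france"), toy_entities, toy_triples, rng))
```

Dynamic approaches would replace the uniform `rng.choice` with a distribution that changes during training, for example one that favours candidates the current model scores as plausible.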

Persona2vec: a flexible multi-role representations learning framework for graphs

PeerJ Computer Science, DOI 10.7717/peerj-cs.439, 2021, Vol 7, pp. e439

Author(s): Jisung Yoon, Kai-Cheng Yang, Woo-Sung Jung, Yong-Yeol Ahn

Keyword(s): Community Structure, Link Prediction, Graph Mining, State Of The Art, Multiple Representations, Graph Embedding, Learning Framework, Overlapping Community, Art Performance, Low Dimensional

Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively overlapping community structure, where most nodes belong to multiple communities, playing different roles depending on the contexts. Here, we propose persona2vec, a graph embedding framework that efficiently learns multiple representations of nodes based on their structural contexts. Using link prediction-based evaluation, we show that our framework is significantly faster than the existing state-of-the-art model while achieving better performance.

A Novel Negative Sampling Based on Frequency of Relational Association Entities for Knowledge Graph Embedding

Journal of Web Engineering, DOI 10.13052/jwe1540-9589.2068, 2021

Author(s): Wanhua Cao, Yi Zhang, Juntao Liu, Ziyun Rao

Keyword(s): Link Prediction, State Of The Art, Evaluation Criteria, Relation Extraction, Graph Embedding, Semantic Space, Knowledge Graph, Knowledge Reasoning, Relation Prediction, Low Dimensional

Knowledge graph embedding improves the performance of relation extraction and knowledge reasoning by encoding entities and relationships in a low-dimensional semantic space. During training, negative samples are usually constructed by replacing the head or tail entity, and different replacement choices lead to different accuracy of the prediction results. This paper develops a negative triplet construction framework based on the frequency of relational association entities. The framework uses the numbers of relations and entities in the dataset to assign the proportion of relation replacement versus entity replacement, and uses the frequency of the entities associated with each relationship to set reasonable proportions for different relations. To verify its validity, the proposed framework is integrated into state-of-the-art knowledge graph embedding models such as TransE, TransH, DistMult, ComplEx, and Analogy. Both relation prediction and entity prediction criteria are used to evaluate link prediction performance more comprehensively. The experimental results on two commonly used datasets, WN18 and FB15K, show that the proposed method improves entity link and triplet classification accuracy, and especially the accuracy of relational link prediction.

Exploiting node metadata to predict interactions in large networks using graph embedding and neural networks

DOI 10.1101/2021.06.10.447991, 2021

Author(s): Rogini Runghen, Daniel B Stouffer, Giulio Valentino Dalla Riva

Keyword(s): Machine Learning, Neural Networks, Link Prediction, Graph Embedding, Feature Space, Machine Learning Techniques, Large Networks, Data Set, Learning Techniques, Low Dimensional

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large-scale data and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes, so the task becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure that combines a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The positions of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping from the nodes' metadata to their positions in a latent feature space corresponds to a classic, low-dimensional machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the positions of nodes in the latent feature space. Flexible machine learning techniques for mapping the nodes onto their latent positions make it possible to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks and to a wide variety of data from different systems. As such, from both a network science and a data science perspective, our work offers a flexible and generalisable procedure for link prediction.
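The prediction step of a Random Dot Product Graph can be sketched in a few lines: once every node has a latent position, the modelled probability of an interaction between two nodes is the dot product of their positions. The sketch assumes an undirected network and that the positions are already available (in the study they are estimated from the observed network or predicted from metadata). The node names, coordinates, threshold, and function names are hypothetical.

```python
def rdpg_link_probability(x_u, x_v):
    """RDPG link probability: the dot product of two latent positions.

    The result is clipped to [0, 1] as a guard, since estimated positions
    are not guaranteed to yield valid probabilities.
    """
    dot = sum(a * b for a, b in zip(x_u, x_v))
    return min(1.0, max(0.0, dot))

def predict_links(positions, threshold=0.5):
    """List node pairs whose modelled link probability reaches the threshold."""
    nodes = sorted(positions)
    return [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if rdpg_link_probability(positions[u], positions[v]) >= threshold
    ]

# Hypothetical latent positions for three nodes.
toy_positions = {"a": (0.9, 0.1), "b": (0.8, 0.2), "c": (0.1, 0.9)}
print(predict_links(toy_positions))
```

Because prediction needs only the positions, a node that was never observed in the network can be scored as soon as some model maps its metadata to a position.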

GLEE: Geometric Laplacian Eigenmap Embedding

Journal of Complex Networks, DOI 10.1093/comnet/cnaa007, 2020, Vol 8 (2)

Author(s): Leo Torres, Kevin S Chan, Tina Eliassi-Rad

Keyword(s): Link Prediction, Graph Embedding, Laplacian Matrix, Dimensional Representation, Laplacian Eigenmaps, New Approach, Graph Reconstruction, Node Similarity, Distance Minimization, Low Dimensional

Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding, and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction.

Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction – two sides of the same coin?

Semantic Web, DOI 10.3233/sw-212892, 2022, pp. 1-24

Author(s): Jan Portisch, Nicolas Heist, Heiko Paulheim

Keyword(s): Data Mining, Link Prediction, Graph Embedding, Knowledge Graph, Graph Embeddings, Similarity Functions, Evaluation Methodologies, Series Of Experiments, Two Sides, Lower Dimensional

Knowledge Graph Embeddings, i.e., projections of entities and relations to lower dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. Both lines of research have been pursued rather in isolation from each other so far, each with their own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done so far. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.

Graph embedding on biomedical networks: methods, applications and evaluations

Bioinformatics, DOI 10.1093/bioinformatics/btz718, 2019, Cited By ~ 14

Author(s): Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, ...

Keyword(s): Protein Function, State Of The Art, Protein Function Prediction, Graph Embedding, Experimental Results, Supplementary Information, Protein Protein Interaction, Biological Features, Low Dimensional, Embedding Methods

Motivation: Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art.
Results: We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction and protein–protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification and protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.
Availability and implementation: As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks.
Supplementary information: Supplementary data are available at Bioinformatics online.

Bootstrapping Entity Alignment with Knowledge Graph Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, DOI 10.24963/ijcai.2018/611, 2018, Cited By ~ 35

Author(s): Zequn Sun, Wei Hu, Qingheng Zhang, Yuzhong Qu

Keyword(s): Performance Improvement, Real World, State Of The Art, Graph Embedding, Training Data, Knowledge Graph, Error Accumulation, Knowledge Graphs, Real World Datasets, Low Dimensional

Embedding-based entity alignment represents different knowledge graphs (KGs) as low-dimensional embeddings and finds entity alignment by measuring the similarities between entity embeddings. Existing approaches have achieved promising results; however, they are still challenged by the lack of sufficient prior alignment as labeled training data. In this paper, we propose a bootstrapping approach to embedding-based entity alignment. It iteratively labels likely entity alignment as training data for learning alignment-oriented KG embeddings. Furthermore, it employs an alignment editing method to reduce error accumulation during iterations. Our experiments on real-world datasets showed that the proposed approach significantly outperformed the state-of-the-art embedding-based ones for entity alignment. The proposed alignment-oriented KG embedding, bootstrapping process and alignment editing method all contributed to the performance improvement.
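The similarity-based matching that underlies embedding-based entity alignment can be sketched as one labeling round. This is not the paper's procedure: it assumes cosine similarity, a fixed confidence threshold, and a greedy one-to-one assignment, and it omits retraining and alignment editing. All names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_entities(kg1, kg2, threshold=0.9):
    """Greedy one-to-one alignment: take the most similar pairs first and
    keep only pairs whose similarity reaches the threshold."""
    scored = sorted(
        ((cosine(v1, v2), e1, e2) for e1, v1 in kg1.items() for e2, v2 in kg2.items()),
        reverse=True,
    )
    used1, used2, alignment = set(), set(), {}
    for score, e1, e2 in scored:
        if score < threshold:
            break  # all remaining pairs are less similar
        if e1 in used1 or e2 in used2:
            continue  # each entity is aligned at most once
        alignment[e1] = e2
        used1.add(e1)
        used2.add(e2)
    return alignment

# Hypothetical entity embeddings from two knowledge graphs.
toy_kg1 = {"en:Paris": (1.0, 0.0), "en:Berlin": (0.0, 1.0)}
toy_kg2 = {"fr:Paris": (0.9, 0.1), "de:Berlin": (0.1, 0.9), "fr:Lyon": (0.7, 0.7)}
print(align_entities(toy_kg1, toy_kg2))
```

In a bootstrapping loop, pairs labeled this way would be added to the training data before the embeddings are updated and the matching is repeated, which is why a conservative threshold helps limit error accumulation.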
