Knowledge Graph Representation via Similarity-Based Embedding

A knowledge graph, a typical multi-relational structure, contains large-scale facts about the world, yet it is still far from complete. Knowledge graph embedding, as a representation method, constructs a low-dimensional, continuous space to describe latent semantic information and predict missing facts. Among the various solutions, almost all embedding models have high time and memory-space complexities and are therefore difficult to apply to large-scale knowledge graphs. Other embedding models, such as TransE and DistMult, have lower complexity but ignore inherent features and use only the correlations between different entities to represent the features of each entity. To overcome these shortcomings, we present a novel low-complexity embedding model, SimE-ER, which calculates the similarity of entities in independent and associated spaces. In SimE-ER, each entity (relation) is described in two parts. In the independent space, the features of an entity (relation) are represented by the features it intrinsically owns; in the associated space, they are expressed by the features of the entities (relations) it connects. The similarity between the embeddings of the same entity in the two representation spaces should be high. In experiments, we evaluate our model on two typical tasks: entity prediction and relation prediction. Compared with state-of-the-art models, our experimental results demonstrate that SimE-ER outperforms existing competitors and has low time and memory-space complexities.
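For readers unfamiliar with the two low-complexity baselines named above, the following is a minimal sketch of their standard scoring functions. It is not the SimE-ER model itself, whose formulas are not given in this abstract. The toy vectors and entity names are made-up values chosen for illustration.

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def distmult_score(head, relation, tail):
    """DistMult plausibility: trilinear product sum_i h_i * r_i * t_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))

# Toy 3-dimensional embeddings (hypothetical values, not learned).
paris = [0.9, 0.1, 0.0]
tokyo = [0.1, 0.9, 0.7]
capital_of = [-0.4, 0.5, 0.2]
france = [0.5, 0.6, 0.2]

# A triple whose head + relation lands near the tail scores higher under TransE.
print(transe_score(paris, capital_of, france))
print(transe_score(tokyo, capital_of, france))
```

Both functions cost O(d) per triple for embedding dimension d, which is the sense in which these models are cheap. Note that the DistMult score is unchanged when head and tail are swapped, so it cannot distinguish the direction of a relation.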

Knowledge Embedding with Geospatial Distance Restriction for Geographic Knowledge Graph Completion

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8060254 ◽

2019 ◽

Vol 8 (6) ◽

pp. 254 ◽

Cited By ~ 1

Author(s):

Peiyuan Qiu ◽

Jialiang Gao ◽

Li Yu ◽

Feng Lu

Keyword(s):

Large Scale ◽

Semantic Network ◽

Average Error ◽

Knowledge Graph ◽

Web Resource ◽

Related Information ◽

Geographic Knowledge ◽

Relation Prediction ◽

Low Dimensional

A Geographic Knowledge Graph (GeoKG) links geographic relation triplets into a large-scale semantic network using the semantics of geo-entities and geo-relations. Unfortunately, geo-related information is sparsely distributed on the web, so information extraction systems can hardly detect enough references to geographic information in massive web resources to build relatively complete GeoKGs. This incompleteness, caused by missing geo-entities or geo-relations in GeoKG fact triplets, seriously degrades the performance of GeoKG applications. In this paper, a method with a geospatial distance restriction is presented to optimize knowledge embedding for GeoKG completion. The method aims to encode both the semantic information and the geospatial distance restriction of geo-entities and geo-relations into a continuous, low-dimensional vector space. The missing facts of the GeoKG can then be supplemented through vector operations. Specifically, the geospatial distance restriction is realized as weights on the objective functions of current translation-based knowledge embedding models. These optimized models output optimized representations of geo-entities and geo-relations for GeoKG completion. The effects of the presented method are validated with a real GeoKG. Compared with the results of the original models, the presented method improves Hits@10 (Filter) by an average of 6.41% for geo-entity prediction and Hits@1 (Filter) by an average of 31.92% for geo-relation prediction. Furthermore, the capacity of the proposed method to predict the locations of unknown entities is validated. The results show that the geospatial distance restriction reduced the average error distance of prediction by between 54.43% and 57.24%. All the results indicate that the geospatial distance restriction implicit in the GeoKG helps refine the embedding representations of geo-entities and geo-relations, which plays a crucial role in improving the quality of GeoKG completion.
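The Hits@k (Filter) metric reported above can be sketched as follows. This is the commonly used "filtered" ranking protocol, written under the assumption that the paper follows it. The function names, place names, and scores are hypothetical.

```python
def filtered_rank(scores, true_entity, known_true):
    """Rank of true_entity by descending score.

    Other entities already known to be correct answers (known_true) are
    skipped, so they do not count as competitors. This is the "Filter" setting.
    """
    target = scores[true_entity]
    rank = 1
    for entity, score in scores.items():
        if entity == true_entity or entity in known_true:
            continue
        if score > target:
            rank += 1
    return rank

def hits_at_k(ranks, k):
    """Fraction of test triples whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical scores for candidate tails of (Hebei, adjacent_to, ?).
scores = {"Beijing": 0.9, "Shanghai": 0.8, "Tianjin": 0.7, "Hainan": 0.6}
raw = filtered_rank(scores, "Tianjin", known_true=set())
filtered = filtered_rank(scores, "Tianjin", known_true={"Beijing"})
print(raw, filtered)
```

In this toy case, Beijing is also a correct tail, so the filtered rank of Tianjin is better than its raw rank. Only Shanghai, a wrong answer, is counted against it.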

Knowledge Graph Completion for the Chinese Text of Cultural Relics Based on Bidirectional Encoder Representations from Transformers with Entity-Type Information

Entropy ◽

10.3390/e22101168 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1168

Author(s):

Min Zhang ◽

Guohua Geng ◽

Sheng Zeng ◽

Huaping Jia

Keyword(s):

Chinese Text ◽

Large Scale ◽

Semantic Information ◽

Training Model ◽

Knowledge Graph ◽

Deep Model ◽

The Rich ◽

Relation Prediction ◽

Type Information ◽

Cultural Relics

Knowledge graph completion makes knowledge graphs more complete and is a meaningful research topic. However, existing methods do not make full use of entity semantic information. Another challenge is that a deep model requires large-scale manually labelled data, which greatly increases manual labour. To alleviate the scarcity of labelled data in the field of cultural relics and to capture the rich semantic information of entities, this paper proposes a model based on Bidirectional Encoder Representations from Transformers (BERT) with entity-type information for knowledge graph completion of Chinese texts on cultural relics. In this work, the knowledge graph completion task is treated as a classification task: the entities, relations and entity-type information are integrated into a textual sequence, and Chinese characters are used as the token unit, with the input representation constructed by summing token, segment and position embeddings. A small number of labelled data are used to pre-train the model, and then a large number of unlabelled data are used to fine-tune the pre-trained model. The experimental results show that the BERT-KGC model with entity-type information can enrich the semantic information of entities, reduce the ambiguity of entities and relations to some degree, and achieve more effective performance than the baselines in triple classification, link prediction and relation prediction tasks using 35% of the labelled data of cultural relics.
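The input construction described above (a character-level sequence whose representation is the sum of token, segment and position embeddings) can be sketched roughly as follows. Where the entity-type text is placed, the segment assignment, the example triple, and the random embedding tables are all assumptions made for illustration. A real BERT model uses learned tables and a fixed vocabulary.

```python
import random

def build_sequence(head, head_type, relation, tail, tail_type):
    """Build a character-level token sequence with segment ids.

    Assumption: each entity's type text is appended directly to the entity,
    and the relation gets segment 1 while both entities get segment 0.
    """
    parts = [head + head_type, relation, tail + tail_type]
    tokens, segments = ["[CLS]"], [0]
    for index, text in enumerate(parts):
        segment = index % 2
        for character in text:  # one token per Chinese character
            tokens.append(character)
            segments.append(segment)
        tokens.append("[SEP]")
        segments.append(segment)
    return tokens, segments

def make_table(keys, dim, rng):
    """Random stand-in for a learned embedding table."""
    return {key: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for key in keys}

def input_representation(tokens, segments, tok_emb, seg_emb, pos_emb):
    """Element-wise sum of token, segment and position embeddings per position."""
    return [
        [t + s + p for t, s, p in zip(tok_emb[tok], seg_emb[seg], pos_emb[pos])]
        for pos, (tok, seg) in enumerate(zip(tokens, segments))
    ]

rng = random.Random(0)
# Hypothetical triple: (Terracotta Army [relic], unearthed at, Xi'an [place]).
tokens, segments = build_sequence("兵马俑", "文物", "出土于", "西安", "地点")
tok_emb = make_table(set(tokens), 4, rng)
seg_emb = make_table([0, 1], 4, rng)
pos_emb = make_table(range(len(tokens)), 4, rng)
vectors = input_representation(tokens, segments, tok_emb, seg_emb, pos_emb)
print(len(tokens), len(vectors[0]))
```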

Caps-OWKG: a capsule network model for open-world knowledge graph

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-020-01259-4 ◽

2021 ◽

Author(s):

Yuhan Wang ◽

Weidong Xiao ◽

Zhen Tan ◽

Xiang Zhao

Keyword(s):

Representation Learning ◽

Graph Representation ◽

Knowledge Graph ◽

World Knowledge ◽

Relational Structures ◽

Open World ◽

Latent Features ◽

Knowledge Graphs ◽

Low Dimensional ◽

Better Than

Knowledge graphs are typical multi-relational structures consisting of many entities and relations. Nonetheless, existing knowledge graphs are still sparse and far from complete. To refine knowledge graphs, representation learning is used to embed entities and relations into low-dimensional spaces. Many existing knowledge graph embedding models focus on learning latent features under the closed-world assumption but overlook the changing nature of each knowledge graph. In this paper, we propose a knowledge graph representation learning model, called Caps-OWKG, which leverages a capsule network to capture the features of both known and unknown triplets in open-world knowledge graphs. It combines descriptive text and the knowledge graph to obtain a descriptive embedding and a structural embedding simultaneously. Both embeddings are then used to calculate the probability that a triplet is authentic. We verify the performance of Caps-OWKG on the link prediction task with two common datasets, FB15k-237-OWE and DBPedia50k. The experimental results are better than those of the other baselines and achieve state-of-the-art performance.

SepNE: Bringing Separability to Network Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014261 ◽

2019 ◽

Vol 33 ◽

pp. 4261-4268 ◽

Cited By ~ 2

Author(s):

Ziyao Li ◽

Liang Zhang ◽

Guojie Song

Keyword(s):

Large Scale ◽

State Of The Art ◽

Dynamic Networks ◽

Distributed Learning ◽

Network Embedding ◽

Large Networks ◽

Comparable Accuracy ◽

Large Scale Networks ◽

Low Dimensional ◽

Almost All

Many successful methods have been proposed for learning low-dimensional representations on large-scale networks, yet almost all existing methods are designed as inseparable processes that learn embeddings for entire networks even when only a small proportion of nodes is of interest. This leads to great inconvenience, especially on super-large or dynamic networks, where these methods become almost impossible to implement. In this paper, we formalize the problem of separated matrix factorization, based on which we elaborate a novel objective function that preserves both local and global information. We further propose SepNE, a simple and flexible network embedding algorithm that independently learns representations for different subsets of nodes in separated processes. By implementing separability, our algorithm reduces the redundant effort of embedding irrelevant nodes, yielding scalability to super-large networks, automatic implementation in distributed learning, and further adaptations. We demonstrate the effectiveness of this approach on several real-world networks of different scales and subjects. With comparable accuracy, our approach significantly outperforms state-of-the-art baselines in running time on large networks.

A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development (Preprint)

10.2196/preprints.17645 ◽

2019 ◽

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than

BACKGROUND: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs. OBJECTIVE: We aimed to address the challenge of how to learn the probability values of triplets into representation vectors by making enhancements to existing TransX (where X is E, H, R, D, or Sparse) algorithms, including the following: (1) constructing a mapping function between the score value and the probability, and (2) introducing a probability-based loss of triplets into the original margin-based loss function. METHODS: We performed the proposed PrTransX algorithm on a medical knowledge graph that we built from large-scale real-world electronic medical records data. We evaluated the embeddings using a link prediction task. RESULTS: Compared with the corresponding TransX algorithms, the proposed PrTransX performed better than the TransX model in all evaluation indicators, achieving a higher proportion of correct entities ranked in the top 10, a higher normalized discounted cumulative gain of the top 10 predicted tail entities, and a lower mean rank. CONCLUSIONS: The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.
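The two enhancements listed in the objective can be illustrated with a rough sketch. The abstract does not give the actual mapping function or loss, so everything below is an assumption: a logistic mapping from TransE distance to probability, a squared-error probability term, and the parameter names scale, offset, margin and weight are all hypothetical stand-ins rather than the PrTransX definitions.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance between (head + relation) and tail, as in TransE."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def score_to_probability(distance, scale=1.0, offset=1.0):
    """Hypothetical mapping: a small distance maps to a probability near 1."""
    return 1.0 / (1.0 + math.exp(scale * (distance - offset)))

def combined_loss(pos_distance, neg_distance, target_prob, margin=1.0, weight=1.0):
    """Margin-based ranking loss plus an assumed squared-error probability term."""
    margin_term = max(0.0, margin + pos_distance - neg_distance)
    prob_term = (score_to_probability(pos_distance) - target_prob) ** 2
    return margin_term + weight * prob_term

# A triplet observed with probability 0.8 in the records (made-up number).
loss = combined_loss(pos_distance=0.5, neg_distance=1.2, target_prob=0.8)
print(loss)
```

The design idea this sketch tries to convey is that the margin term only orders true triplets above corrupted ones, while the added term pulls the mapped score toward the observed probability of each triplet.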

A Framework for Service Semantic Description Based on Knowledge Graph

Electronics ◽

10.3390/electronics10091017 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1017

Author(s):

Qitong Sun ◽

Jun Han ◽

Dianfu Ma

Keyword(s):

Service Discovery ◽

Large Scale ◽

Semantic Information ◽

Knowledge Graph ◽

Data Sets ◽

Accuracy Rate ◽

Data Set ◽

File Storage ◽

Representation Method ◽

The Relationship

Constructing a large-scale service knowledge graph is necessary. We propose a method, namely semantic information extension, for service knowledge graphs. We focus on the service information described by the Web Services Description Language (WSDL), design the ontology layer of a web service knowledge graph, and construct the service graph; using a WSDL document data set, the generated service knowledge graph contains 3738 service entities. In particular, our method can realize its full effect in service discovery. To evaluate our approach, we conducted two sets of experiments: exploring the relationship between services and classifying services by their service descriptions. We constructed two experimental data sets, then designed and trained two different deep neural networks for the two tasks to extract the semantics of the natural language used in the service discovery task. In the task of predicting the relationship between services, the prediction accuracy rate reached 95.1%, and in the service classification experiment, the top-5 accuracy rate reached 60.8%. Our experience shows that the service knowledge graph has advantages over traditional file storage when managing additional semantic information, and that the new service representation method is helpful for service discovery and composition tasks.

An Efficient Knowledge-Graph-Based Web Service Recommendation Algorithm

Symmetry ◽

10.3390/sym11030392 ◽

2019 ◽

Vol 11 (3) ◽

pp. 392 ◽

Cited By ~ 2

Author(s):

Zhiying Cao ◽

Xinghao Qiao ◽

Shuo Jiang ◽

Xiuguo Zhang

Keyword(s):

Web Service ◽

Semantic Information ◽

Dimensional Space ◽

Representation Learning ◽

Recall Rate ◽

Graph Representation ◽

Knowledge Graph ◽

Service Recommendation ◽

Recommendation Algorithm ◽

Low Dimensional

Semantic information can help to accurately find suitable services among a variety of available services with different semantics, and the semantic information of Web services can be described in detail in a Web service knowledge graph. In this paper, a Web service recommendation algorithm based on knowledge graph representation learning (kg-WSR) is proposed. The algorithm embeds the entities and relationships of the knowledge graph into a low-dimensional vector space. By calculating the distance between service entities in this low-dimensional space, the relationship information of services, which is not considered in recommendation approaches based on collaborative filtering, is incorporated into the recommendation algorithm to improve the accuracy of the result. The experimental results show that this algorithm can not only effectively improve the accuracy rate, recall rate, and coverage rate of recommendation but also solve the cold start problem to some extent.

Knowledge Graph Representation with Jointly Structural and Textual Encoding

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/183 ◽

2017 ◽

Cited By ~ 21

Author(s):

Jiacheng Xu ◽

Xipeng Qiu ◽

Kan Chen ◽

Xuanjing Huang

Keyword(s):

Graph Representation ◽

Knowledge Graph ◽

Neural Models ◽

Structure Information ◽

Related Information ◽

Gating Mechanism ◽

Deep Architecture ◽

Classification Tasks ◽

Low Dimensional ◽

Representation Of Knowledge

The objective of knowledge graph embedding is to encode both the entities and the relations of knowledge graphs into continuous low-dimensional vector spaces. Previously, most works focused on symbolic representations of knowledge graphs with structure information, which cannot handle new entities or entities with few facts well. In this paper, we propose a novel deep architecture that utilizes both the structural and the textual information of entities. Specifically, we introduce three neural models to encode the valuable information from the text description of an entity, among which an attentive model can select related information as needed. Then, a gating mechanism is applied to integrate the representations of structure and text into a unified architecture. Experiments show that our models outperform the baselines and obtain state-of-the-art results on link prediction and triplet classification tasks.
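A gating mechanism of the kind mentioned above is commonly written as an element-wise interpolation between the two representations. The sketch below assumes that form, with a sigmoid gate per dimension. The exact parameterization used in the paper is not stated in this abstract, and the vectors and gate logits are made-up values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_combine(structure_vec, text_vec, gate_logits):
    """Per-dimension gate g in (0, 1): output = g * structure + (1 - g) * text."""
    combined = []
    for s, d, z in zip(structure_vec, text_vec, gate_logits):
        g = sigmoid(z)
        combined.append(g * s + (1.0 - g) * d)
    return combined

structure_vec = [1.0, 0.0, 0.4]   # embedding learned from graph structure
text_vec = [0.0, 1.0, 0.8]        # embedding encoded from the text description
gate_logits = [0.0, 0.0, 5.0]     # hypothetical gate parameters

print(gated_combine(structure_vec, text_vec, gate_logits))
```

A gate logit of zero weighs both sources equally, while a large positive logit makes that dimension rely almost entirely on the structural embedding. For an entity with few facts, a model could in principle learn gates that lean on the text side instead.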

A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development

JMIR Medical Informatics ◽

10.2196/17645 ◽

2020 ◽

Vol 8 (5) ◽

pp. e17645

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than

Background: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs. Objective: We aimed to address the challenge of how to learn the probability values of triplets into representation vectors by making enhancements to existing TransX (where X is E, H, R, D, or Sparse) algorithms, including the following: (1) constructing a mapping function between the score value and the probability, and (2) introducing a probability-based loss of triplets into the original margin-based loss function. Methods: We performed the proposed PrTransX algorithm on a medical knowledge graph that we built from large-scale real-world electronic medical records data. We evaluated the embeddings using a link prediction task. Results: Compared with the corresponding TransX algorithms, the proposed PrTransX performed better than the TransX model in all evaluation indicators, achieving a higher proportion of correct entities ranked in the top 10, a higher normalized discounted cumulative gain of the top 10 predicted tail entities, and a lower mean rank. Conclusions: The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.
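Two of the evaluation indicators named in the results, mean rank and normalized discounted cumulative gain (NDCG) over the top 10 predictions, can be sketched with their standard definitions. Using the triplet probability as the gain of each predicted tail entity is an assumption made here for illustration. The abstract does not say how gains are defined.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at position i is divided by log2(i + 1)."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))

def ndcg_at_k(predicted_gains, k=10):
    """NDCG@k, where predicted_gains lists the gain of every candidate in predicted order."""
    ideal = sorted(predicted_gains, reverse=True)
    best = dcg(ideal[:k])
    return dcg(predicted_gains[:k]) / best if best > 0 else 0.0

def mean_rank(ranks):
    """Average rank of the correct entity over all test triplets (lower is better)."""
    return sum(ranks) / len(ranks)

# Hypothetical gains: probability of each predicted tail entity being correct.
print(ndcg_at_k([0.9, 0.5, 0.1]))   # predictions already in the ideal order
print(ndcg_at_k([0.1, 0.5, 0.9]))   # worst ordering of the same candidates
print(mean_rank([1, 3, 8]))
```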

A Novel Negative Sampling Based on Frequency of Relational Association Entities for Knowledge Graph Embedding

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2068 ◽

2021 ◽

Author(s):

Wanhua Cao ◽

Yi Zhang ◽

Juntao Liu ◽

Ziyun Rao

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Evaluation Criteria ◽

Relation Extraction ◽

Graph Embedding ◽

Semantic Space ◽

Knowledge Graph ◽

Knowledge Reasoning ◽

Relation Prediction ◽

Low Dimensional

Knowledge graph embedding improves the performance of relation extraction and knowledge reasoning by encoding entities and relationships in a low-dimensional semantic space. During training, negative samples are usually constructed by replacing the head or tail entity, and different choices of what to replace lead to different prediction accuracy. This paper develops a negative triplet construction framework based on the frequency of relational association entities. The proposed framework considers the quantities of relations and entities in the dataset to assign the proportions of relation replacement and entity replacement, and uses the frequency of the entities associated with each relationship to set reasonable proportions for different relations. To verify its validity, the proposed construction framework is integrated into state-of-the-art knowledge graph embedding models, such as TransE, TransH, DistMult, ComplEx, and Analogy. The evaluation criteria of both relation prediction and entity prediction are used to evaluate link prediction performance more comprehensively. The experimental results on two commonly used datasets, WN18 and FB15K, show that the proposed method improves entity link and triplet classification accuracy, and especially the accuracy of relational link prediction.
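Negative sampling by replacement can be sketched as follows. The fixed relation_prob parameter is a hypothetical stand-in for the frequency-based proportions the paper derives, which the abstract does not spell out. The entities, relations, and triples are made-up examples.

```python
import random

def corrupt(triple, entities, relations, known, relation_prob=0.3, rng=random):
    """Build one negative triple by replacing the relation, the head, or the tail.

    relation_prob is an assumed proportion of relation replacements. The rest
    is split evenly between head and tail replacement. Candidates that are
    known true triples are rejected and resampled.
    """
    head, relation, tail = triple
    for _ in range(100):
        draw = rng.random()
        if draw < relation_prob:
            candidate = (head, rng.choice(relations), tail)
        elif draw < relation_prob + (1.0 - relation_prob) / 2.0:
            candidate = (rng.choice(entities), relation, tail)
        else:
            candidate = (head, relation, rng.choice(entities))
        if candidate not in known:
            return candidate
    raise RuntimeError("could not find a negative triple")

entities = ["dog", "animal", "cat", "tree"]
relations = ["hypernym", "part_of"]
known = {("dog", "hypernym", "animal"), ("cat", "hypernym", "animal")}
triple = ("dog", "hypernym", "animal")

negative = corrupt(triple, entities, relations, known, rng=random.Random(0))
print(negative)
```

Rejecting candidates that already appear in the known set avoids training against false negatives, for example replacing the head "dog" with "cat" here would produce another true triple.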
