An Approach to Knowledge Base Completion by a Committee-Based Knowledge Graph Embedding

Knowledge bases such as Freebase, YAGO, DBpedia, and NELL contain a large number of facts about various entities and relations. Because they store so many facts, they are regarded as core resources for many natural language processing tasks. Nevertheless, they are rarely complete and have many missing facts, which keeps them from being used in diverse applications despite their usefulness. Completing knowledge bases is therefore important. Knowledge graph embedding is one of the promising approaches to completing a knowledge base, and many variants have been proposed. It maps all entities and relations in a knowledge base onto a low-dimensional vector space; candidate facts that are plausible in that space are then identified as missing facts. However, no single knowledge graph embedding is sufficient to complete a knowledge base. To address this problem, this paper defines knowledge base completion as a ranking task and proposes a committee-based knowledge graph embedding model to improve completion performance. Since each knowledge graph embedding has its own idiosyncrasies, we form a committee of various knowledge graph embeddings to reflect various perspectives. After all candidate facts are ranked according to their plausibility as computed by the committee, the top-k facts are chosen as missing facts. Our experimental results on two data sets show that the proposed model achieves higher performance than any single knowledge graph embedding and performs robustly regardless of k. These results suggest that the proposed model considers various perspectives in measuring the plausibility of candidate facts.
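The committee ranking described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the committee combines its members, so the mean-rank aggregation below is an assumption, and the two lookup-table scorers are hypothetical stand-ins for trained embedding models.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Scorer = Callable[[Triple], float]


def rank_positions(candidates: List[Triple], scorer: Scorer) -> Dict[Triple, int]:
    """Map each candidate to its 1-based rank under one scorer (higher score is better)."""
    ordered = sorted(candidates, key=scorer, reverse=True)
    return {triple: pos for pos, triple in enumerate(ordered, start=1)}


def committee_top_k(candidates: List[Triple], committee: List[Scorer], k: int) -> List[Triple]:
    """Rank candidates by their mean rank across committee members; return the top k.

    Mean-rank aggregation is an assumption made for illustration only.
    """
    per_member = [rank_positions(candidates, scorer) for scorer in committee]
    mean_rank = {
        t: sum(ranks[t] for ranks in per_member) / len(per_member) for t in candidates
    }
    return sorted(candidates, key=lambda t: mean_rank[t])[:k]


# Toy committee: two hypothetical scorers backed by lookup tables.
candidates = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d")]
scores_1 = {candidates[0]: 0.9, candidates[1]: 0.2, candidates[2]: 0.5}
scores_2 = {candidates[0]: 0.6, candidates[1]: 0.1, candidates[2]: 0.8}
committee = [scores_1.__getitem__, scores_2.__getitem__]
top = committee_top_k(candidates, committee, k=2)
```

Aggregating ranks rather than raw scores sidesteps the fact that different embedding models produce scores on different scales.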

Entity Alignment between Knowledge Graphs Using Attribute Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.3301297 ◽ 2019 ◽ Vol 33 ◽ pp. 297-304 ◽ Cited By ~ 26

Author(s): Bayu Distiawan Trisedya ◽ Jianzhong Qi ◽ Rui Zhang

Keyword(s): Real World ◽ Graph Embedding ◽ Knowledge Bases ◽ Knowledge Graph ◽ World Knowledge ◽ Large Numbers ◽ Proposed Model ◽ Alignment Task ◽ Transitivity Rule ◽ Knowledge Graphs

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits the large numbers of attribute triples in the knowledge graphs to generate attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the attributes of an entity and thereby enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model consistently improves on the baseline models by over 50% in terms of hits@1 on the entity alignment task.
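For readers unfamiliar with the hits@1 metric reported here, the following sketch shows how hits@k is commonly computed for entity alignment. The entity identifiers and candidate rankings are made up for illustration and do not come from the paper.

```python
from typing import Dict, List


def hits_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of source entities whose gold counterpart is among the top-k candidates."""
    if not gold:
        return 0.0
    hits = sum(1 for src, tgt in gold.items() if tgt in ranked.get(src, [])[:k])
    return hits / len(gold)


# Hypothetical alignment output: candidates in the second graph, best first.
ranked = {
    "kg1:Paris": ["kg2:Paris", "kg2:Paris_Texas"],
    "kg1:Berlin": ["kg2:Bern", "kg2:Berlin"],
}
gold = {"kg1:Paris": "kg2:Paris", "kg1:Berlin": "kg2:Berlin"}

hits1 = hits_at_k(ranked, gold, k=1)  # only Paris is aligned correctly at rank 1
hits2 = hits_at_k(ranked, gold, k=2)  # both gold counterparts appear in the top 2
```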

LENA: Locality-Expanded Neural Embedding for Knowledge Base Completion

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33012895 ◽ 2019 ◽ Vol 33 ◽ pp. 2895-2902

Author(s): Fanshuang Kong ◽ Richong Zhang ◽ Yongyi Mao ◽ Ting Deng

Keyword(s): Knowledge Base ◽ Structural Information ◽ Relevant Information ◽ Knowledge Bases ◽ Loss Functions ◽ Knowledge Graph ◽ Global Information ◽ Sufficient Statistic ◽ Proposed Model ◽ Significant Research

Embedding-based models for knowledge base completion have demonstrated great success and attracted significant research interest. In this work, we observe that existing embedding models all have their loss functions decomposed into atomic loss functions, each on a triple or a postulated edge in the knowledge graph. Such an approach essentially implies that, conditioned on the embeddings of the triple, whether the triple is factual is independent of the structure of the knowledge graph. Although the embeddings of the entities and relation in the triple arguably contain certain structural information about the knowledge base, we believe that the global information contained in the embeddings of the triple can be insufficient and that such an assumption is overly optimistic in heterogeneous knowledge bases. Motivated by this understanding, we propose a new embedding model in which we discard the assumption that the embeddings of the entities and relation in a triple are a sufficient statistic for the triple's factual existence. More specifically, the proposed model assumes that whether a triple is factual depends not only on the embedding of the triple but also on the embeddings of the entities and relations in a larger graph neighbourhood. In this model, attention mechanisms are constructed to select the relevant information in the graph neighbourhood so that irrelevant signals in the neighbourhood are suppressed. Termed locality-expanded neural embedding with attention (LENA), this model is tested on four standard datasets and compared with several state-of-the-art models for knowledge base completion. Extensive experiments suggest that LENA outperforms the existing models in virtually every metric.
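The idea of using attention to keep relevant neighbourhood signals and suppress irrelevant ones can be sketched as a softmax-weighted sum. This is generic dot-product attention, not LENA's actual architecture, which the abstract does not specify. The names `attend`, `query`, and `neighbours` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query: Vector, neighbours: List[Vector]) -> Vector:
    """Softmax-weighted sum of neighbour embeddings, weighted by similarity to the query."""
    logits = [dot(query, n) for n in neighbours]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * n[i] for w, n in zip(weights, neighbours))
        for i in range(len(query))
    ]


# A neighbour aligned with the query dominates; an orthogonal one is suppressed.
context = attend([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
```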

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽ 10.3390/electronics10121407 ◽ 2021 ◽ Vol 10 (12) ◽ pp. 1407

Author(s): Peng Wang ◽ Jing Zhou ◽ Yuzhang Liu ◽ Xingchen Zhou

Keyword(s): Link Prediction ◽ State Of The Art ◽ Score Function ◽ Graph Embedding ◽ Vector Spaces ◽ Knowledge Graph ◽ Semantic Features ◽ Knowledge Graphs ◽ Real World Datasets ◽ Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
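A rough sketch of the two steps named in this abstract follows. It assumes that "circle convolution" means standard circular convolution and that the translation-based score is a negative L1 distance in the style of TransE. Both are assumptions, and `type_aware_score` is a hypothetical name rather than the paper's scoring function.

```python
from typing import List

Vector = List[float]


def circular_convolution(a: Vector, b: Vector) -> Vector:
    """Circular convolution: out[k] = sum_i a[i] * b[(k - i) mod d]."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def translation_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """Negative L1 distance of head + relation from tail; higher means more plausible."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def type_aware_score(head: Vector, head_type: Vector, relation: Vector,
                     tail: Vector, tail_type: Vector) -> float:
    """Map both entities to type-specific vectors, then apply the translation score."""
    h = circular_convolution(head, head_type)
    t = circular_convolution(tail, tail_type)
    return translation_score(h, relation, t)


# Convolving with a shifted unit vector rotates the entity embedding by one position.
shifted = circular_convolution([1.0, 2.0, 3.0], [0.0, 1.0, 0.0])
score = type_aware_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0])
```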

Improving the Quality of Linked Data Using Statistical Distributions

Information Retrieval and Management ◽ 10.4018/978-1-5225-5191-1.ch074 ◽ 2018 ◽ pp. 1638-1664 ◽ Cited By ~ 1

Author(s): Heiko Paulheim ◽ Christian Bizer

Keyword(s): Knowledge Base ◽ Linked Data ◽ Relational Databases ◽ Knowledge Bases ◽ Structured Data ◽ Data Sources ◽ Data Sets ◽ Statistical Distributions ◽ The Web

Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
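The general idea of inferring missing types from the statistical distribution of properties can be sketched as follows. This is a simplified, unweighted illustration and not the published SDType algorithm. The function names, the threshold, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def property_type_distributions(
    statements: Iterable[Tuple[str, str]], types: Dict[str, Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(type | subject uses property) from subjects whose types are known."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for subject, prop in statements:
        if subject in types:
            totals[prop] += 1
            counts[prop].update(types[subject])
    return {p: {t: c / totals[p] for t, c in counts[p].items()} for p in totals}


def predict_types(props: List[str], dists: Dict[str, Dict[str, float]],
                  threshold: float = 0.5) -> Set[str]:
    """Average the per-property type distributions; keep types at or above the threshold."""
    known = [dists[p] for p in props if p in dists]
    if not known:
        return set()
    candidates = {t for d in known for t in d}
    return {
        t for t in candidates
        if sum(d.get(t, 0.0) for d in known) / len(known) >= threshold
    }


# Toy data: (subject, property) pairs plus the types already asserted.
statements = [("Berlin", "population"), ("Paris", "population"),
              ("Paris", "mayor"), ("Einstein", "birthPlace")]
types = {"Berlin": {"City"}, "Paris": {"City"}, "Einstein": {"Person"}}
dists = property_type_distributions(statements, types)
inferred = predict_types(["population", "mayor"], dists)
```

Because the estimate uses only statements already in the data set, it matches the abstract's point that no external knowledge is required.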

Graph representation learning: a survey

APSIPA Transactions on Signal and Information Processing ◽ 10.1017/atsip.2020.13 ◽ 2020 ◽ Vol 9

Author(s): Fenxiao Chen ◽ Yun-Cheng Wang ◽ Bin Wang ◽ C.-C. Jay Kuo

Keyword(s): Graph Embedding ◽ Large Data ◽ Representation Learning ◽ Graph Representation ◽ Data Sets ◽ Graph Data ◽ Graph Properties ◽ Wide Range ◽ Regular Lattices ◽ Low Dimensional

Research on graph representation learning has received great attention in recent years since most data in real-world applications come in the form of graphs. High-dimensional graph data are often in irregular forms and are more difficult to analyze than image, video, or audio data defined on regular lattices. Various graph embedding techniques have been developed to convert the raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods against small and large data sets and compare their performance. Finally, potential applications and future directions are presented.

AI-CTO: Knowledge graph for automated and dependable software stack solution

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-200899 ◽ 2021 ◽ Vol 40 (1) ◽ pp. 799-812

Author(s): Xiaoyun Xu ◽ Jingzheng Wu ◽ Mutian Yang ◽ Tianyue Luo ◽ Qianru Meng ◽ ...

Keyword(s): Current Practice ◽ Graph Embedding ◽ Software Systems ◽ Professional Experience ◽ Knowledge Graph ◽ Dimensional Vector ◽ Dimensional Vector Space ◽ Svm Model ◽ Low Dimensional ◽ Industry Experience

As the scale of software systems continues to expand, software architecture is receiving more and more attention as the blueprint for a complex software system. An outstanding architecture requires a lot of professional experience and expertise. In current practice, architects try to find solutions manually, which is time-consuming and error-prone because of the knowledge barrier between newcomers and experienced architects. The problem can be addressed by easing the process of applying experience from prominent architects. To this end, this paper proposes a novel graph-embedding-based method, AI-CTO, to automatically suggest software stack solutions according to the knowledge and experience of prominent architects. First, AI-CTO converts existing industry experience into knowledge in the form of a knowledge graph. Second, the knowledge graph is embedded in a low-dimensional vector space. Then, the entity vectors are used by an SVM model to predict valuable software stack solutions. We evaluate AI-CTO with two case studies and compare its solutions with the software stacks of large companies. The experimental results show that AI-CTO can find effective and correct stack solutions and that it outperforms other baseline methods.

A Survey on Knowledge Graph Embeddings for Link Prediction

Symmetry ◽ 10.3390/sym13030485 ◽ 2021 ◽ Vol 13 (3) ◽ pp. 485

Author(s): Meihong Wang ◽ Linling Qiu ◽ Xiaoli Wang

Keyword(s): Language Processing ◽ Link Prediction ◽ Knowledge Graph ◽ Practical Utilization ◽ Complete Knowledge ◽ Benchmark Datasets ◽ Comprehensive Survey ◽ Knowledge Graphs ◽ Representation Techniques ◽ Open Nature

Knowledge graphs (KGs) have been widely used in artificial intelligence, for example in information retrieval, natural language processing, and recommendation systems. However, the open nature of KGs often implies that they are incomplete and contain inherent defects. This creates the need to build more complete knowledge graphs to enhance the practical utilization of KGs. Link prediction is a fundamental task in knowledge graph completion that uses existing relations to infer new relations so as to build a more complete knowledge graph. Numerous methods have been proposed to perform the link-prediction task based on various representation techniques. Among them, KG-embedding models have significantly advanced the state of the art in the past few years. In this paper, we provide a comprehensive survey of KG-embedding models for link prediction in knowledge graphs. We first provide a theoretical analysis and comparison of the existing methods proposed to date for generating KG embeddings. Then, we investigate several representative models that are classified into five categories. Finally, we conduct experiments on two benchmark datasets to report comprehensive findings and provide some new insights into the strengths and weaknesses of existing models.
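Link prediction with an embedding model usually amounts to scoring every candidate entity for a query and sorting the results. The sketch below uses the DistMult trilinear score purely as one well-known example; the abstract does not single out any model. The embeddings are toy values rather than trained parameters.

```python
from typing import Dict, List

Vector = List[float]


def distmult_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """DistMult trilinear score: sum_i h_i * r_i * t_i (higher means more plausible)."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_tails(head: Vector, relation: Vector, entities: Dict[str, Vector]) -> List[str]:
    """Order all candidate tail entities for the query (head, relation, ?), best first."""
    return sorted(
        entities,
        key=lambda name: distmult_score(head, relation, entities[name]),
        reverse=True,
    )


# Toy embeddings, not trained values.
entities = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
ranking = rank_tails([2.0, 1.0], [1.0, 1.0], entities)
```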

Entity-Centric Fully Connected GCN for Relation Classification

Applied Sciences ◽ 10.3390/app11041377 ◽ 2021 ◽ Vol 11 (4) ◽ pp. 1377

Author(s): Jun Long ◽ Ye Wang ◽ Xiangxiang Wei ◽ Zhen Ding ◽ Qianqian Qi ◽ ...

Keyword(s): Language Processing ◽ Knowledge Graph ◽ Data Sets ◽ Semantic Features ◽ Convolutional Network ◽ Aggregate Information ◽ Dependency Tree ◽ Relation Classification ◽ The Cost ◽ Fully Connected

Relation classification is an important task in natural language processing and one of the key steps in constructing a knowledge graph, where it can greatly reduce the construction cost. The Graph Convolutional Network (GCN) is an effective model for accurate relation classification; it models the dependency tree of textual instances to extract the semantic features of relation mentions. Previous GCN-based methods treat each node equally. However, different words contribute differently to expressing a given relation, especially the entity mentions in the sentence. In this paper, a novel GCN-based relation classifier is proposed that treats the entity nodes as two global nodes in the dependency tree. These two global nodes connect directly with all other nodes and can therefore aggregate information from the whole tree with only one convolutional layer. In this way, the method not only reduces the complexity of the model but also generates expressive relation representations. Experimental results on two widely used data sets, SemEval-2010 Task 8 and TACRED, show that our model outperforms all the baselines compared in this paper, which illustrates that the model can effectively use the dependencies between nodes and improve the performance of relation classification.
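The effect of turning the two entity nodes into global nodes can be seen in a small sketch: once they are connected to every token, a single graph-convolution step lets them aggregate the whole sentence. The layer below uses simple row normalisation and a ReLU, which is an assumption about the general GCN form rather than the paper's exact layer. The chain-shaped "dependency tree" and entity positions are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_adjacency(n: int, tree_edges: Sequence[Tuple[int, int]],
                    entity_nodes: Sequence[int]) -> Matrix:
    """Dependency-tree adjacency with self-loops, plus entity nodes linked to every node."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in tree_edges:
        adj[i][j] = adj[j][i] = 1.0
    for e in entity_nodes:
        for j in range(n):
            adj[e][j] = adj[j][e] = 1.0
    return adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One layer: ReLU(D^-1 A H W), where D^-1 A is the row-normalised adjacency."""
    norm = [[v / sum(row) for v in row] for row in adj]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Six tokens in a chain; tokens 0 and 5 are the (hypothetical) entity mentions.
n = 6
adj = build_adjacency(n, [(0, 1), (1, 2), (2, 3), (3, 4)], entity_nodes=[0, 5])
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
out = gcn_layer(adj, identity, identity)
```

With one-hot features and an identity weight matrix, row 0 of `out` is non-zero in every column, so the entity node has received a signal from all tokens after one layer, while an ordinary token such as token 2 still sees only its tree neighbours and the two entity nodes.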

Understanding Negative Sampling in Knowledge Graph Embedding

International Journal of Artificial Intelligence & Applications ◽ 10.5121/ijaia.2021.12105 ◽ 2021 ◽ Vol 12 (1) ◽ pp. 71-81

Author(s): Jing Qian ◽ Gangmin Li ◽ Katie Atkinson ◽ Yong Yue

Keyword(s): Link Prediction ◽ Graph Embedding ◽ Knowledge Graph ◽ Direct Impact ◽ Dimensional Vector Space ◽ Dynamic Distribution ◽ Space Efficiency ◽ Node Classification ◽ Low Dimensional

Knowledge graph embedding (KGE) projects the entities and relations of a knowledge graph (KG) into a low-dimensional vector space and has made steady progress in recent years. Conventional KGE methods, especially translational distance-based models, are trained by discriminating positive samples from negative ones. Most KGs store only positive samples for space efficiency, so negative sampling plays a crucial role in encoding the triples of a KG. The quality of the generated negative samples has a direct impact on the performance of the learnt knowledge representation in a myriad of downstream tasks, such as recommendation, link prediction, and node classification. We summarize current negative sampling approaches in KGE into three categories: static distribution-based, dynamic distribution-based, and custom cluster-based. Based on this categorization, we discuss the most prevalent existing approaches and their characteristics. We hope that this review can provide some guidelines for new ideas about negative sampling in KGE.
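The simplest static distribution-based strategy, uniform corruption of the head or tail, can be sketched as follows. Filtering out known positives is a common variant and is included here as an assumption. The function name `corrupt` and the retry limit are illustrative choices.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def corrupt(triple: Triple, entities: List[str], known: Set[Triple],
            rng: random.Random, max_tries: int = 100) -> Triple:
    """Replace the head or the tail with a uniformly sampled entity.

    Candidates that are already known positives are rejected and resampled.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (replacement, relation, tail)  # corrupt the head
        else:
            candidate = (head, relation, replacement)  # corrupt the tail
        if candidate not in known:
            return candidate
    raise RuntimeError("no negative sample found; the entity set may be too small")


# Toy knowledge graph with two positive triples.
known = {("a", "r", "b"), ("a", "r", "c")}
entities = ["a", "b", "c", "d"]
negative = corrupt(("a", "r", "b"), entities, known, random.Random(0))
```

Dynamic and cluster-based strategies replace the uniform `rng.choice` with a sampler that adapts to the model or to entity groupings.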

End-to-end Relation-Enhanced Learnable Graph Self-attention Network for Knowledge Graphs Embedding

10.21203/rs.3.rs-396932/v1 ◽ 2021

Author(s): Shengchen Jiang ◽ Hongbin Wang ◽ Xiang Hou

Keyword(s): Large Scale ◽ Structural Characteristics ◽ Graph Embedding ◽ Knowledge Graph ◽ Data Sets ◽ Relevance Ranking ◽ Convolutional Network ◽ Attention Network ◽ Knowledge Graphs ◽ End To End

Existing methods ignore the adverse effect of knowledge graph incompleteness on knowledge graph embedding. In addition, the complexity and large scale of knowledge information hinder the knowledge graph embedding performance of the classic graph convolutional network. In this paper, we analyze the structural characteristics of knowledge graphs and the imbalance of knowledge information. Complex knowledge information requires a model with better learnability rather than linearly weighted qualitative constraints, so we propose an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. First, we construct a relation-enhanced adjacency matrix to account for the incompleteness of the knowledge graph. Second, a graph self-attention network is employed to obtain the global encoding and relevance ranking of entity node information. Third, we propose the concept of a convolutional knowledge subgraph, which is constructed according to the entity relevance ranking. Finally, we improve the training of the ConvKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. Experimental results on the FB15k-237 and WN18RR data sets show that the proposed method represents knowledge information more comprehensively than existing methods in terms of Hits@10 and MRR.
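MRR, one of the two metrics cited here, is the mean of the reciprocal ranks that a model assigns to the correct entities. A minimal sketch with made-up ranks (not results from the paper):

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all test queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 20, 4]
mrr = mean_reciprocal_rank(ranks)
```

Hits@10 follows from the same list of ranks: it is the share of ranks that are at most 10.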
