Representation Learning of Knowledge Graphs with Embedding Subspaces

Most of the existing knowledge graph embedding models are supervised methods and largely relying on the quality and quantity of obtainable labelled training data. The cost of obtaining high quality triples is high and the data sources are facing a serious problem of data sparsity, which may result in insufficient training of long-tail entities. However, unstructured text encoding entities and relational knowledge can be obtained anywhere in large quantities. Word vectors of entity names estimated from the unlabelled raw text using natural language model encode syntax and semantic properties of entities. Yet since these feature vectors are estimated through minimizing prediction error on unsupervised entity names, they may not be the best for knowledge graphs. We propose a two-phase approach to adapt unsupervised entity name embeddings to a knowledge graph subspace and jointly learn the adaptive matrix and knowledge representation. Experiments on Freebase show that our method can rely less on the labelled data and outperforms the baselines when the labelled data is relatively less. Especially, it is applicable to zero-shot scenario.

Download Full-text

One-Shot Relation Learning for Knowledge Graphs via Neighborhood Aggregation and Paths Encoding

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3484729 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-19

Author(s):

Jian Sun ◽

Yu Zhou ◽

Chengqing Zong

Keyword(s):

Link Prediction ◽

Auxiliary Information ◽

Training Data ◽

Knowledge Graph ◽

Long Tail ◽

Neighborhood Information ◽

Knowledge Graphs ◽

Relation Prediction ◽

Relation Learning ◽

The Impact

The relation learning between two entities is an essential task in knowledge graph (KG) completion that has received much attention recently. Previous work almost exclusively focused on relations widely seen in the original KGs, which means that enough training data are available for modeling. However, long-tail relations that only show in a few triples are actually much more common in practical KGs. Without sufficiently large training data, the performance of existing models on predicting long-tail relations drops impressively. This work aims to predict the relation under a challenging setting where only one instance is available for training. We propose a path-based one-shot relation prediction framework, which can extract neighborhood information of an entity based on the relation query attention mechanism to learn transferable knowledge among the same relation. Simultaneously, to reduce the impact of long-tail entities on relation prediction, we selectively fuse path information between entity pairs as auxiliary information of relation features. Experiments in three one-shot relation learning datasets show that our proposed framework substantially outperforms existing models on one-shot link prediction and relation prediction.

Download Full-text

SPADE: A Semi-supervised Probabilistic Approach for Detecting Errors in Tables

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/488 ◽

2021 ◽

Author(s):

Minh Pham ◽

Craig A. Knoblock ◽

Muhao Chen ◽

Binh Vu ◽

Jay Pujara

Keyword(s):

Error Detection ◽

Data Augmentation ◽

State Of The Art ◽

Probabilistic Approach ◽

Human Interaction ◽

Training Data ◽

Two Phase ◽

Learning Classifier ◽

Supervised Methods ◽

Phase Data

Error detection is one of the most important steps in data cleaning and usually requires extensive human interaction to ensure quality. Existing supervised methods in error detection require a significant amount of training data while unsupervised methods rely on fixed inductive biases, which are usually hard to generalize, to solve the problem. In this paper, we present SPADE, a novel semi-supervised probabilistic approach for error detection. SPADE introduces a novel probabilistic active learning model, where the system suggests examples to be labeled based on the agreements between user labels and indicative signals, which are designed to capture potential errors. SPADE uses a two-phase data augmentation process to enrich a dataset before training a deep learning classifier to detect unlabeled errors. In our evaluation, SPADE achieves an average F1-score of 0.91 over five datasets and yields a 10% improvement compared with the state-of-the-art systems.

Download Full-text

Caps-OWKG: a capsule network model for open-world knowledge graph

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-020-01259-4 ◽

2021 ◽

Author(s):

Yuhan Wang ◽

Weidong Xiao ◽

Zhen Tan ◽

Xiang Zhao

Keyword(s):

Representation Learning ◽

Graph Representation ◽

Knowledge Graph ◽

World Knowledge ◽

Relational Structures ◽

Open World ◽

Latent Features ◽

Knowledge Graphs ◽

Low Dimensional ◽

Better Than

AbstractKnowledge graphs are typical multi-relational structures, which is consisted of many entities and relations. Nonetheless, existing knowledge graphs are still sparse and far from being complete. To refine the knowledge graphs, representation learning is utilized to embed entities and relations into low-dimensional spaces. Many existing knowledge graphs embedding models focus on learning latent features in close-world assumption but omit the changeable of each knowledge graph.In this paper, we propose a knowledge graph representation learning model, called Caps-OWKG, which leverages the capsule network to capture the both known and unknown triplets features in open-world knowledge graph. It combines the descriptive text and knowledge graph to get descriptive embedding and structural embedding, simultaneously. Then, the both above embeddings are used to calculate the probability of triplet authenticity. We verify the performance of Caps-OWKG on link prediction task with two common datasets FB15k-237-OWE and DBPedia50k. The experimental results are better than other baselines, and achieve the state-of-the-art performance.

Download Full-text

A Joint Model for Representation Learning of Tibetan Knowledge Graph Based on Encyclopedia

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3447248 ◽

2021 ◽

Vol 20 (2) ◽

pp. 1-17

Author(s):

Yuan Sun ◽

Andong Chen ◽

Chaofan Chen ◽

Tianci Xia ◽

Xiaobing Zhao

Keyword(s):

Neural Networks ◽

Language Processing ◽

Representation Learning ◽

Joint Model ◽

Graph Representation ◽

Knowledge Graph ◽

High Resource ◽

Complex Information ◽

Language Knowledge ◽

Knowledge Graphs

Learning the representation of a knowledge graph is critical to the field of natural language processing. There is a lot of research for English knowledge graph representation. However, for the low-resource languages, such as Tibetan, how to represent sparse knowledge graphs is a key problem. In this article, aiming at scarcity of Tibetan knowledge graphs, we extend the Tibetan knowledge graph by using the triples of the high-resource language knowledge graphs and Point of Information map information. To improve the representation learning of the Tibetan knowledge graph, we propose a joint model to merge structure and entity description information based on the Translating Embeddings and Convolution Neural Networks models. In addition, to solve the segmentation errors, we use character and word embedding to learn more complex information in Tibetan. Finally, the experimental results show that our model can make a better representation of the Tibetan knowledge graph than the baseline.

Download Full-text

Bootstrapping Entity Alignment with Knowledge Graph Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/611 ◽

2018 ◽

Cited By ~ 35

Author(s):

Zequn Sun ◽

Wei Hu ◽

Qingheng Zhang ◽

Yuzhong Qu

Keyword(s):

Performance Improvement ◽

Real World ◽

State Of The Art ◽

Graph Embedding ◽

Training Data ◽

Knowledge Graph ◽

Error Accumulation ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Embedding-based entity alignment represents different knowledge graphs (KGs) as low-dimensional embeddings and finds entity alignment by measuring the similarities between entity embeddings. Existing approaches have achieved promising results, however, they are still challenged by the lack of enough prior alignment as labeled training data. In this paper, we propose a bootstrapping approach to embedding-based entity alignment. It iteratively labels likely entity alignment as training data for learning alignment-oriented KG embeddings. Furthermore, it employs an alignment editing method to reduce error accumulation during iterations. Our experiments on real-world datasets showed that the proposed approach significantly outperformed the state-of-the-art embedding-based ones for entity alignment. The proposed alignment-oriented KG embedding, bootstrapping process and alignment editing method all contributed to the performance improvement.

Download Full-text

A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/843 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sebastijan Dumancic ◽

Alberto Garcia-Duran ◽

Mathias Niepert

Keyword(s):

Euclidean Space ◽

Knowledge Base ◽

Relational Learning ◽

Statistical Relational Learning ◽

Representation Learning ◽

The Other ◽

Learning Approaches ◽

Relational Knowledge ◽

Knowledge Graphs ◽

Relational Classification

Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches such as Statistical relational learning, recent methods in (deep) representation learning have shown promising results for specialised tasks such as knowledge base completion. These approaches, also known as distributional, abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare distributional and symbolic relational learning approaches on various standard relational classification and knowledge base completion tasks. Furthermore, we analyse the properties of the datasets and relate them to the performance of the methods in the comparison. The results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.

Download Full-text

Mobile Software Assurance Informed through Knowledge Graph Construction: The OWASP Threat of Insecure Data Storage

Journal of Computer Science Research ◽

10.30564/jcsr.v2i2.1765 ◽

2020 ◽

Vol 2 (2) ◽

Author(s):

Suzanna Schmeelk ◽

Lixin Tao

Keyword(s):

Data Storage ◽

Program Analysis ◽

Web Application ◽

Security Analysis ◽

Knowledge Graph ◽

Healthcare Applications ◽

Sensitive Data ◽

Knowledge Graphs ◽

Mobile Malware Detection ◽

Software Assurance

Many organizations, to save costs, are movinheg to t Bring Your Own Mobile Device (BYOD) model and adopting applications built by third-parties at an unprecedented rate. Our research examines software assurance methodologies specifically focusing on security analysis coverage of the program analysis for mobile malware detection, mitigation, and prevention. This research focuses on secure software development of Android applications by developing knowledge graphs for threats reported by the Open Web Application Security Project (OWASP). OWASP maintains lists of the top ten security threats to web and mobile applications. We develop knowledge graphs based on the two most recent top ten threat years and show how the knowledge graph relationships can be discovered in mobile application source code. We analyze 200+ healthcare applications from GitHub to gain an understanding of their software assurance of their developed software for one of the OWASP top ten moble threats, the threat of “Insecure Data Storage.” We find that many of the applications are storing personally identifying information (PII) in potentially vulnerable places leaving users exposed to higher risks for the loss of their sensitive data.

Download Full-text

Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition

10.21437/interspeech.2017-1790 ◽

2017 ◽

Author(s):

Weiwu Zhu

Keyword(s):

Speech Recognition ◽

Language Model ◽

Knowledge Graph ◽

Search Query ◽

Statistical Language Model

Download Full-text

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

Proceedings of the Web Conference 2021 ◽

10.1145/3442381.3450043 ◽

2021 ◽

Author(s):

Bo Wang ◽

Tao Shen ◽

Guodong Long ◽

Tianyi Zhou ◽

Ying Wang ◽

...

Keyword(s):

Representation Learning ◽

Knowledge Graph ◽

Text Representation

Download Full-text

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entity and entity types is utilized to map head entity and tail entity to type-specific representations, then translation-based score function is used to learn the presentation triples. We evaluated our model on real-world datasets with two benchmark tasks of link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.

Download Full-text