Text-Graph Enhanced Knowledge Graph Representation Learning

Rule-Guided Compositional Representation Learning on Knowledge Graphs

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5687 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2950-2958

Author(s):

Guanglin Niu ◽

Yongfei Zhang ◽

Bo Li ◽

Peng Cui ◽

Si Liu ◽

...

Keyword(s):

State Of The Art ◽

Representation Learning ◽

Vector Spaces ◽

Semantic Structure ◽

Completion Task ◽

Joint Embedding ◽

Semantic Associations ◽

Structured Information ◽

Low Dimensional ◽

Embedding Methods

Representation learning on a knowledge graph (KG) aims to embed the entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods focus only on the structured information encoded in triples, which limits their performance because KGs are structurally sparse. Some recent attempts use path information to expand the structure of KGs but lack explainability in how the path representations are obtained. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization ability of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and carefully encoded for representation learning. Then, rules of length 2 are applied to compose paths accurately, while rules of length 1 are explicitly employed to create semantic associations among relations and to constrain relation embeddings. Moreover, the confidence level of each rule is considered during optimization to ensure the validity of applying each rule to representation learning. Extensive experimental results show that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the benefit of using logic rules together with paths to improve the accuracy and explainability of representation learning.
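
The abstract says that rules of length 2 are used to compose paths. A minimal sketch of that idea, assuming a hand-written rule table: the relation names, confidences, the `min_conf` threshold and the helper names `compose_path` and `path_embedding` are all hypothetical, and the additive fallback for relation pairs that no rule covers is a PTransE-style assumption, not something the abstract states.

```python
# Hypothetical mined length-2 Horn rules: r1(x, y) AND r2(y, z) => head(x, z).
# Each entry maps the rule body (r1, r2) to (head, confidence); values are made up.
RULES_LEN2 = {
    ("born_in", "city_of"): ("nationality", 0.9),
    ("works_for", "located_in"): ("works_in", 0.7),
}


def compose_path(path, rules=RULES_LEN2, min_conf=0.5):
    """Rewrite a relation path with length-2 rules until no rule applies.

    Returns the reduced path and the product of the confidences of the rules
    that were used. Pairs with no sufficiently confident rule stay as-is.
    """
    path = list(path)
    conf = 1.0
    i = 0
    while i < len(path) - 1:
        match = rules.get((path[i], path[i + 1]))
        if match is not None and match[1] >= min_conf:
            head, c = match
            path[i:i + 2] = [head]  # replace the rule body by the rule head
            conf *= c
            i = max(i - 1, 0)       # the new head may combine with its left neighbor
        else:
            i += 1
    return path, conf


def path_embedding(path, rel_vec):
    """Assumed fallback: compose the remaining relations by vector addition."""
    out = [0.0] * len(rel_vec[path[0]])
    for r in path:
        for k, v in enumerate(rel_vec[r]):
            out[k] += v
    return out


print(compose_path(["works_for", "located_in", "born_in"]))
# (['works_in', 'born_in'], 0.7)
```

Multiplying confidences is just one simple way to carry rule reliability along a path; how RPJE actually weights rule confidence in its loss may differ.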

A Co-Embedding Model with Variational Auto-Encoder for Knowledge Graphs

Applied Sciences ◽

10.3390/app12020715 ◽

2022 ◽

Vol 12 (2) ◽

pp. 715

Author(s):

Luodi Xie ◽

Huimin Huang ◽

Qing Du

Keyword(s):

State Of The Art ◽

Relation Extraction ◽

Semantic Space ◽

Knowledge Graph ◽

High Quality ◽

Gaussian Distributions ◽

Benchmark Datasets ◽

Semantic Spaces ◽

Knowledge Graphs ◽

Low Dimensional

Knowledge graph (KG) embedding has been widely studied to obtain low-dimensional representations for entities and relations. It serves as the basis for downstream tasks such as KG completion and relation extraction. Traditional KG embedding techniques usually represent entities and relations as vectors or tensors, mapping them into different semantic spaces and ignoring uncertainty. When entities and relations are not embedded in the same latent space, the affinities between them are ambiguous. In this paper, we introduce a co-embedding model for KG embedding, which learns low-dimensional representations of both entities and relations in the same semantic space. To address the neglect of uncertainty in KG components, we propose a variational auto-encoder that represents KG components as Gaussian distributions. In addition, compared with previous methods, our method offers the advantages of high quality and interpretability. Our experimental results on several benchmark datasets demonstrate our model’s superiority over state-of-the-art baselines.
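
The abstract represents KG components as Gaussian distributions learned by a variational auto-encoder. The two standard VAE ingredients behind that, the reparameterization trick and the closed-form KL term against a standard normal prior, can be sketched in plain Python as below. The encoder and decoder are omitted, and the example `entity` and `relation` parameters are hypothetical; placing both in one shared 2-d space only mimics the co-embedding idea.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this form lets gradients flow through mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical co-embedding: one entity and one relation as Gaussians in the SAME 2-d space.
entity = {"mu": [0.2, -0.1], "log_var": [-1.0, -1.0]}
relation = {"mu": [0.0, 0.3], "log_var": [-2.0, -0.5]}

rng = random.Random(0)
z_e = reparameterize(entity["mu"], entity["log_var"], rng)
z_r = reparameterize(relation["mu"], relation["log_var"], rng)
kl = kl_to_standard_normal(entity["mu"], entity["log_var"]) + kl_to_standard_normal(
    relation["mu"], relation["log_var"]
)
```

The variance (`exp(log_var)`) is what carries the uncertainty the abstract refers to; a point-vector model has no such term.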

Star Topology Convolution for Graph Representation Learning

10.36227/techrxiv.12805799.v2 ◽

2020 ◽

Author(s):

Chong Wu ◽

Zhenan Feng ◽

Jiangbin Zheng ◽

Houwang Zhang ◽

Jiawang Cao ◽

...

Keyword(s):

Protein Identification ◽

State Of The Art ◽

Feature Space ◽

Representation Learning ◽

Graph Representation ◽

Global Features ◽

Star Topology ◽

Identification Methods ◽

Benchmark Datasets ◽

Deep Layers

We present a novel graph convolutional method called star topology convolution (STC). This method makes graph convolution more similar to conventional convolutional neural networks (CNNs) in Euclidean feature space. Unlike most existing spectral convolutional methods, this method learns subgraphs that have a star topology rather than a fixed graph. It has fewer parameters in its convolutional filter and is inductive, so it is more flexible and can be applied to large and evolving graphs. As with CNNs in Euclidean feature space, the convolutional filter is localized and maintains a good weight-sharing property. By introducing deep layers, the method can learn global features as a CNN does. To validate the method, STC was compared with state-of-the-art spectral convolutional and spatial convolutional methods in a supervised learning setting on three benchmark datasets: Cora, Citeseer and Pubmed. The experimental results show that STC outperforms the other methods. STC was also applied to protein identification tasks and outperformed traditional and advanced protein identification methods.
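
The abstract does not give STC's filter, but the idea of a localized, weight-shared filter over a star-shaped subgraph (a center node plus its neighbors) can be illustrated with a simplified sketch. Here one weight matrix (`W_center`) is applied to the center and a second (`W_leaf`) is shared by all leaves through mean pooling; this parameterization is an assumption, closer to a GraphSAGE-style mean aggregator than to the paper's exact method. Because the weights do not depend on the graph, the same layer applies to nodes added later, which is what makes such a filter inductive.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def star_conv(adj, feats, W_center, W_leaf):
    """One simplified star-topology layer (hypothetical parameterization).

    For each node v, the star subgraph is v plus its neighbors. The center uses
    W_center; all leaves share W_leaf via mean pooling. ReLU is applied at the end.
    """
    out = {}
    for v, x_v in feats.items():
        nbrs = adj.get(v, [])
        if nbrs:
            leaf_mean = [sum(feats[u][k] for u in nbrs) / len(nbrs) for k in range(len(x_v))]
        else:
            leaf_mean = [0.0] * len(x_v)
        h = [a + b for a, b in zip(matvec(W_center, x_v), matvec(W_leaf, leaf_mean))]
        out[v] = [max(0.0, z) for z in h]
    return out


# Toy graph: node 0 is the hub of a star with leaves 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 3.0]}
I = [[1.0, 0.0], [0.0, 1.0]]

layer1 = star_conv(adj, feats, I, I)
layer2 = star_conv(adj, layer1, I, I)  # stacking layers widens the receptive field to 2 hops

# Inductive use: a new node 3 joins; the same weights apply without retraining.
adj[3] = [0]
adj[0] = adj[0] + [3]
feats[3] = [2.0, 2.0]
layer1_new = star_conv(adj, feats, I, I)
```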

Adverse Drug Event Prediction using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Evaluation (Preprint)

10.2196/preprints.32730 ◽

2021 ◽

Author(s):

Soham Dasgupta ◽

Aishwarya Jayagopal ◽

Abel Lim Jun Hong ◽

Ragunathan Mariappan ◽

Vaibhav Rajan

Keyword(s):

Predictive Models ◽

State Of The Art ◽

Representation Learning ◽

Biomedical Literature ◽

Drug Event ◽

Classification Models ◽

Learning Methods ◽

Post Marketing Surveillance ◽

Benchmark Datasets ◽

Knowledge Graphs

BACKGROUND Adverse drug events (ADEs) are unintended side effects of drugs that cause a substantial clinical and economic burden globally. Not all ADEs are discovered during clinical trials, so post-marketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs. A wealth of information that facilitates ADE discovery lies in the enormous and continuously growing body of biomedical literature. Knowledge graphs (KGs) encode information from the literature, where vertices and edges represent clinical concepts and their relations, respectively. The scale and unstructured form of the literature necessitate the use of natural language processing (NLP) to create such KGs automatically. Previous studies have demonstrated the utility of such literature-derived KGs in ADE prediction. Through unsupervised learning of representations (features) of clinical concepts from the KG, which are then used in machine learning models, state-of-the-art results for ADE prediction were obtained on benchmark datasets.

OBJECTIVE In literature-derived KGs there is 'noise' in the form of false positive (erroneous) and false negative (absent) nodes and edges, due to limitations of the NLP techniques used to infer the KGs. Previous representation learning methods do not account for such inaccuracies in the graph. NLP algorithms can quantify the confidence in their inference of extracted concepts and relations from the literature. The hypothesis that motivates this work is that by using such confidence scores during representation learning, the learnt embeddings would yield better features for ADE prediction models.

METHODS We develop methods to use these confidence scores in two well-known representation learning methods, DeepWalk and TransE, to obtain their 'weighted' versions: Weighted DeepWalk and Weighted TransE. These methods are used to learn representations from a large literature-derived KG, SemMedDB, containing more than 93 million clinical relations. They are compared with Embedding of Semantic Predications (ESP), which, to our knowledge, is the best reported representation learning method on SemMedDB, with state-of-the-art results for ADE prediction. Representations learnt by each method are used (separately) as features of drugs and diseases to build classification models for ADE prediction on benchmark datasets. The classification performance of all the methods is compared rigorously over multiple cross-validation settings.

RESULTS In our experiments, the 'weighted' versions we design learn representations that yield more accurate predictive models than both the corresponding unweighted versions of DeepWalk and TransE and ESP. Performance improvements are up to 5.75% in F1 score and 8.4% in AUC, advancing the state of the art in ADE prediction from literature-derived KGs. The implementation of our new methods and all experiments are available at https://bitbucket.org/cdal/kb_embeddings.

CONCLUSIONS Our classification models can be used to help pharmacovigilance teams detect potentially new ADEs. Our experiments demonstrate the importance of modelling inaccuracies in the inferred KGs for representation learning, which may also be useful in other predictive models that use literature-derived KGs.
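
The abstract does not spell out how the confidence scores enter DeepWalk and TransE. One plausible reading, sketched below as an assumption: Weighted DeepWalk samples the next hop with probability proportional to each edge's NLP confidence, and Weighted TransE scales each triple's margin loss by its confidence. The function names, the L1 distance and the `margin` default are choices made for this illustration; the authors' actual implementation is in the Bitbucket repository cited in the abstract.

```python
import random


def weighted_walk(adj, start, length, rng=random):
    """Random walk where the next hop is drawn in proportion to edge confidence.

    adj maps a node to a list of (neighbor, confidence) pairs.
    The walk stops early at a node with no outgoing edges.
    """
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break
        nodes, weights = zip(*nbrs)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def l1_distance(h, r, t):
    """TransE dissimilarity ||h + r - t||_1."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def weighted_transe_loss(pos, neg, confidence, margin=1.0):
    """Margin ranking loss for one (positive, corrupted) triple pair,
    scaled by the extraction confidence of the positive triple (assumed scheme)."""
    return confidence * max(0.0, margin + l1_distance(*pos) - l1_distance(*neg))


# Usage: a low-confidence triple contributes proportionally less to the loss.
pos = ([0.0], [1.0], [1.0])   # h + r == t, so the positive distance is 0
neg = ([0.0], [1.0], [1.5])   # corrupted tail at distance 0.5
loss = weighted_transe_loss(pos, neg, confidence=0.4)
```

The resulting walks would then be fed to a skip-gram model exactly as in unweighted DeepWalk; only the sampling distribution changes.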

Structural Adversarial Variational Auto-Encoder for Attributed Network Embedding

Applied Sciences ◽

10.3390/app11052371 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2371

Author(s):

Junjian Zhan ◽

Feng Li ◽

Yang Wang ◽

Daoyu Lin ◽

Guangluan Xu

Keyword(s):

State Of The Art ◽

Global Information ◽

Network Embedding ◽

Sampling Process ◽

Attributed Network ◽

Benchmark Datasets ◽

Adversarial Training ◽

Low Dimensional ◽

Embedding Methods ◽

Local Proximity

As most networks come with some content in each node, attributed network embedding has attracted much research interest. Most existing attributed network embedding methods aim to learn a fixed representation for each node that encodes its local proximity. However, those methods usually neglect the global information between nodes distant from each other, as well as the distribution of the latent codes. We propose Structural Adversarial Variational Graph Auto-Encoder (SAVGAE), a novel framework which encodes the network structure and node content into low-dimensional embeddings. On the one hand, our model captures the local proximity and proximities at any distance of a network by exploiting a high-order proximity indicator named Rooted PageRank. On the other hand, our method learns the data distribution of each node representation while using adversarial training to circumvent the side effect that its sampling process has on learning a robust embedding. On benchmark datasets, we demonstrate that our method performs competitively compared with state-of-the-art models.
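
Rooted PageRank, the high-order proximity indicator named in the abstract, scores every node by how often a random walker that repeatedly restarts at a root node visits it. A minimal power-iteration sketch of one row of that matrix follows; the restart probability `alpha`, the dangling-node rule (jump back to the root) and the stopping tolerance are assumptions for illustration, not values taken from the paper.

```python
import math


def rooted_pagerank(adj, root, alpha=0.15, iters=200, tol=1e-12):
    """Rooted PageRank scores of all nodes with respect to `root`.

    adj maps every node to a list of neighbors (undirected graphs list both
    directions). At each step the walker restarts at `root` with probability
    alpha, otherwise moves to a uniformly chosen neighbor.
    """
    nodes = list(adj)
    pi = {v: 0.0 for v in nodes}
    pi[root] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[root] += alpha
        for u in nodes:
            if not adj[u]:                      # dangling node: send its mass back to the root
                nxt[root] += (1.0 - alpha) * pi[u]
                continue
            share = (1.0 - alpha) * pi[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        diff = sum(abs(nxt[v] - pi[v]) for v in nodes)
        pi = nxt
        if diff < tol:
            break
    return pi


# Path graph 0 - 1 - 2 - 3, rooted at node 0: proximity decays with distance.
scores = rooted_pagerank({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, root=0)
```

Unlike first- or second-order proximity, this score is non-zero for any node reachable from the root, which is why it can serve as a proximity "at any distance".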

Caps-OWKG: a capsule network model for open-world knowledge graph

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-020-01259-4 ◽

2021 ◽

Author(s):

Yuhan Wang ◽

Weidong Xiao ◽

Zhen Tan ◽

Xiang Zhao

Keyword(s):

Representation Learning ◽

Graph Representation ◽

Knowledge Graph ◽

World Knowledge ◽

Relational Structures ◽

Open World ◽

Latent Features ◽

Knowledge Graphs ◽

Low Dimensional ◽

Better Than

Knowledge graphs are typical multi-relational structures consisting of many entities and relations. Nonetheless, existing knowledge graphs are still sparse and far from complete. To refine knowledge graphs, representation learning is used to embed entities and relations into low-dimensional spaces. Many existing knowledge graph embedding models focus on learning latent features under the closed-world assumption but ignore the changeable nature of each knowledge graph. In this paper, we propose a knowledge graph representation learning model, called Caps-OWKG, which leverages the capsule network to capture the features of both known and unknown triplets in an open-world knowledge graph. It combines the descriptive text and the knowledge graph to obtain a descriptive embedding and a structural embedding simultaneously. Both embeddings are then used to calculate the probability that a triplet is true. We evaluate Caps-OWKG on the link prediction task with two common datasets, FB15k-237-OWE and DBPedia50k. The experimental results are better than those of other baselines and achieve state-of-the-art performance.
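
Capsule networks use the squash non-linearity, which keeps a capsule's direction and maps its length into [0, 1); CapsE-style KG models read the length of the final capsule as a plausibility score, which is one way a capsule network can output a probability of triplet authenticity. The sketch below shows only that piece. Concatenating the descriptive and structural embeddings with the hypothetical `fuse` helper is an assumption about how the two views might be combined, and the convolution and dynamic-routing layers of a real capsule network are omitted, so this is not Caps-OWKG itself.

```python
import math


def squash(s):
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def fuse(descriptive, structural):
    """Hypothetical fusion of the two embedding views by concatenation."""
    return list(descriptive) + list(structural)


def capsule_score(s):
    """Length of the squashed capsule, always in [0, 1)."""
    return math.sqrt(sum(x * x for x in squash(s)))


# Toy vectors standing in for a triplet's descriptive and structural embeddings.
score = capsule_score(fuse([0.3, -0.2], [1.1, 0.4]))
```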

Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning

Journal of Information Science ◽

10.1177/0165551521990616 ◽

2021 ◽

pp. 016555152199061

Author(s):

Salima Lamsiyah ◽

Abdelkader El Mahdaouy ◽

Saïd El Alaoui Ouatik ◽

Bernard Espinasse

Keyword(s):

Transfer Learning ◽

State Of The Art ◽

Representation Learning ◽

Fine Tuning ◽

Text Representation ◽

Document Summarization ◽

Semantic Relationships ◽

Benchmark Datasets ◽

Tuning Methods ◽

Fine Tune

Text representation is a fundamental cornerstone that impacts the effectiveness of several text summarization methods. Transfer learning using pre-trained word embedding models has shown promising results. However, most of these representations do not consider the order of, and the semantic relationships between, words in a sentence, and thus they do not carry the meaning of a full sentence. To overcome this issue, the current study proposes an unsupervised method for extractive multi-document summarization based on transfer learning from a BERT sentence embedding model. Moreover, to improve sentence representation learning, we fine-tune the BERT model on supervised intermediate tasks from the GLUE benchmark datasets using single-task and multi-task fine-tuning methods. Experiments are performed on the standard DUC’2002–2004 datasets. The results show that our method significantly outperforms several baseline methods and achieves comparable, and sometimes better, performance than recent state-of-the-art deep learning–based methods. Furthermore, the results show that fine-tuning BERT with multi-task learning considerably improves performance.
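
Once each sentence has a BERT sentence embedding, an unsupervised extractive summarizer still needs a selection rule. One common recipe, sketched below under stated assumptions, ranks sentences by cosine similarity to the centroid of all sentence embeddings and skips any sentence that is too similar to one already chosen. The toy 2-d vectors stand in for real BERT embeddings, and `budget` and `redundancy` are hypothetical parameters; the paper's own scoring may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, embeddings, budget=2, redundancy=0.9):
    """Centroid-based extractive selection with a redundancy filter."""
    dim = len(embeddings[0])
    centroid = [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(embeddings[i], centroid), reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        # Skip near-duplicates of sentences already in the summary.
        if all(cosine(embeddings[i], embeddings[j]) < redundancy for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]  # keep original document order


# Sentences A and B are near-duplicates; only one of them should be selected.
summary = summarize(["A", "B", "C"], [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
```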

Star Topology Convolution for Graph Representation Learning

10.36227/techrxiv.12805799.v1 ◽

2020 ◽

Author(s):

Chong Wu ◽

Zhenan Feng ◽

Jiangbin Zheng ◽

Houwang Zhang ◽

Jiawang Cao ◽

...

Keyword(s):

Protein Identification ◽

State Of The Art ◽

Feature Space ◽

Representation Learning ◽

Graph Representation ◽

Convolution Kernel ◽

Star Topology ◽

Identification Methods ◽

Feature Spaces ◽

Benchmark Datasets

We present a novel graph convolutional method called star topology convolution (STC). This method makes graph convolution more similar to conventional convolution in neural networks (CNNs) in Euclidean feature space. Unlike most existing spectral convolution methods, this method learns subgraphs that have a star topology rather than a fixed graph. It has fewer parameters in its convolution kernel and is inductive, so it is more flexible and can be applied to large and evolving graphs. As with CNNs in Euclidean feature spaces, the convolution kernel is localized and maintains good weight sharing. By increasing the depth of the network, the method can learn global features as a CNN does. To validate the method, STC was compared with state-of-the-art spectral convolution and spatial convolution methods in a supervised learning setting on three benchmark datasets: Cora, Citeseer and Pubmed. The experimental results show that STC outperforms the other methods. STC was also applied to protein identification tasks and outperformed traditional and advanced protein identification methods.

SPINE: Structural Identity Preserved Inductive Network Embedding

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/333 ◽

2019 ◽

Cited By ~ 3

Author(s):

Junliang Guo ◽

Linli Xu ◽

Jingchang Liu

Keyword(s):

State Of The Art ◽

Structural Information ◽

Critical Role ◽

Network Embedding ◽

Dimensional Network ◽

Structural Identity ◽

Benchmark Datasets ◽

Low Dimensional ◽

Embedding Methods ◽

Local Proximity

Recent advances in the field of network embedding have shown that low-dimensional network representations play a critical role in network analysis. Most existing network embedding methods encode the local proximity of a node, such as the first- and second-order proximities. While efficient, these methods fail to leverage the global structural information between nodes distant from each other. In addition, most existing methods learn embeddings on a single fixed network, and thus cannot generalize to unseen nodes or networks without retraining. In this paper, we present SPINE, a method that jointly captures the local proximity and proximities at any distance, while being inductive so that it can efficiently handle unseen nodes or networks. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed framework over the state of the art.
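
The first- and second-order proximities mentioned in the abstract can be made concrete. First-order proximity is the weight of the edge between two nodes; second-order proximity compares their neighborhoods, here instantiated (one common choice, assumed for this sketch) as the cosine similarity of their weighted adjacency rows. The toy graph shows two nodes with zero first-order but maximal second-order proximity, and a distant pair that both measures miss, which is the gap that proximities "at any distance" are meant to fill.

```python
import math


def first_order(adj, u, v):
    """Weight of the edge (u, v), or 0.0 if the nodes are not linked."""
    return adj[u].get(v, 0.0)


def second_order(adj, u, v):
    """Cosine similarity of u's and v's weighted neighborhoods."""
    keys = sorted(set(adj[u]) | set(adj[v]))
    a = [adj[u].get(k, 0.0) for k in keys]
    b = [adj[v].get(k, 0.0) for k in keys]
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


# Nodes 0 and 1 are not linked but share neighbor 2; node 4 is three hops from node 0.
adj = {
    0: {2: 1.0},
    1: {2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
```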

Star Topology Convolution for Graph Representation Learning

10.36227/techrxiv.12805799 ◽

2020 ◽

Author(s):

Chong Wu ◽

Zhenan Feng ◽

Jiangbin Zheng ◽

Houwang Zhang ◽

Jiawang Cao ◽

...

Keyword(s):

Protein Identification ◽

State Of The Art ◽

Feature Space ◽

Representation Learning ◽

Graph Representation ◽

Global Features ◽

Star Topology ◽

Identification Methods ◽

Benchmark Datasets ◽

Deep Layers

We present a novel graph convolutional method called star topology convolution (STC). This method makes graph convolution more similar to conventional convolutional neural networks (CNNs) in Euclidean feature space. Unlike most existing spectral convolutional methods, this method learns subgraphs that have a star topology rather than a fixed graph. It has fewer parameters in its convolutional filter and is inductive, so it is more flexible and can be applied to large and evolving graphs. As with CNNs in Euclidean feature space, the convolutional filter is localized and maintains a good weight-sharing property. By introducing deep layers, the method can learn global features as a CNN does. To validate the method, STC was compared with state-of-the-art spectral convolutional and spatial convolutional methods in a supervised learning setting on three benchmark datasets: Cora, Citeseer and Pubmed. The experimental results show that STC outperforms the other methods. STC was also applied to protein identification tasks and outperformed traditional and advanced protein identification methods.
