A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development

Linfeng Li; Peng Wang; Yao Wang; Shenghui Wang; Jun Yan; Jinpeng Jiang; Buzhou Tang; Chengliang Wang; Yuting Liu

doi:10.2196/17645

JMIR Medical Informatics ◽

10.2196/17645 ◽

2020 ◽

Vol 8 (5) ◽

pp. e17645

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than

Background: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs. Objective: We aimed to address the challenge of encoding the probability values of triplets in representation vectors by enhancing the existing TransX algorithms (where X is E, H, R, D, or Sparse) in two ways: (1) constructing a mapping function between the score value and the probability, and (2) introducing a probability-based loss of triplets into the original margin-based loss function. Methods: We applied the proposed PrTransX algorithm to a medical knowledge graph that we built from large-scale real-world electronic medical records data. We evaluated the embeddings using a link prediction task. Results: The proposed PrTransX performed better than the corresponding TransX algorithms on all evaluation indicators, achieving a higher proportion of correct entities ranked in the top 10, a higher normalized discounted cumulative gain of the top 10 predicted tail entities, and a lower mean rank. Conclusions: The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.
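The two enhancements named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual mapping function or loss weighting, so everything specific here is an assumption made for illustration: the exponential mapping `score_to_probability`, the squared-error probability term, and the `weight` and `scale` parameters are hypothetical, and a TransE-style distance stands in for the generic TransX score.

```python
import math

def transe_score(head, relation, tail):
    # TransE-style distance ||h + r - t||_2; a lower score means a more plausible triplet.
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def score_to_probability(score, scale=1.0):
    # Hypothetical mapping (an assumption, not the paper's function):
    # distance 0 maps to probability 1, and large distances approach 0.
    return math.exp(-scale * score)

def margin_loss(pos_score, neg_score, margin=1.0):
    # Standard margin-based ranking loss used by translation-based models.
    return max(0.0, margin + pos_score - neg_score)

def probabilistic_loss(pos_triplet, neg_triplet, target_prob, margin=1.0, weight=1.0):
    # Margin loss plus an assumed squared-error term that pulls the mapped
    # probability of the positive triplet toward its observed probability.
    pos_score = transe_score(*pos_triplet)
    neg_score = transe_score(*neg_triplet)
    prob_error = (score_to_probability(pos_score) - target_prob) ** 2
    return margin_loss(pos_score, neg_score, margin) + weight * prob_error

# Toy 2-dimensional embeddings (made-up values).
head, relation = [0.0, 0.0], [1.0, 0.0]
positive = (head, relation, [1.0, 0.0])   # h + r equals t, so the distance is 0
negative = (head, relation, [4.0, 4.0])   # corrupted tail, distance is 5
loss = probabilistic_loss(positive, negative, target_prob=0.8)
print(round(loss, 4))  # 0.04
```

In this toy case the margin term is already zero, so the remaining loss comes only from the gap between the mapped probability (1.0) and the target probability (0.8). That gap is the kind of signal a purely margin-based loss would not see.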

A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development (Preprint)

10.2196/preprints.17645 ◽

2019 ◽

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...


Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01622-7 ◽

2021 ◽

Vol 21 (S9) ◽

Author(s):

Yinyu Lan ◽

Shizhu He ◽

Kang Liu ◽

Xiangrong Zeng ◽

Shengping Liu ◽

...

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Semantic Representation ◽

Medical Knowledge ◽

The State ◽

Language Models ◽

Knowledge Graph ◽

Knowledge Reasoning ◽

Numerical Computing ◽

Knowledge Graphs

Background: Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, which creates a demand for medical knowledge graph completion (MedKGC). MedKGC can find new facts based on the existing knowledge in the KGs. The path-based knowledge reasoning algorithm is one of the most important approaches to this task. This type of method has received great attention in recent years because of its high performance and interpretability. Traditional methods such as the path ranking algorithm take the paths between an entity pair as atomic features. However, medical KGs are very sparse, which makes it difficult to model effective semantic representations for extremely sparse path features. The sparsity in medical KGs is mainly reflected in the long-tailed distribution of entities and paths. Previous methods consider only the context structure in the paths of the knowledge graph and ignore the textual semantics of the symbols in the path. Consequently, their performance is limited by both entity sparseness and path sparseness. Methods: To address these issues, this paper proposes two novel path-based reasoning methods that address the sparsity of entities and of paths, respectively, by adopting the textual semantic information of entities and paths for MedKGC. By using the pre-trained model BERT and combining the textual semantic representations of the entities and the relationships, we model the task of symbolic reasoning in the medical KG as a numerical computing problem in textual semantic representation. Results: Experimental results on a public, authoritative Chinese symptom knowledge graph demonstrate that the proposed method is significantly better than state-of-the-art path-based knowledge graph reasoning methods, with average performance improved by 5.83% across all relations.
Conclusions: In this paper, we propose two new knowledge graph reasoning algorithms, which adopt the textual semantic information of entities and paths and can effectively alleviate the sparsity problem of entities and paths in MedKGC. As far as we know, this is the first method to use pre-trained language models and text path representations for medical knowledge reasoning. Our method can complete the impaired symptom knowledge graph in an interpretable way, and it outperforms state-of-the-art path-based reasoning methods.

Demographic Aware Probabilistic Medical Knowledge Graph Embeddings of Electronic Medical Records

Artificial Intelligence in Medicine - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77211-6_48 ◽

2021 ◽

pp. 408-417

Author(s):

Aynur Guluzade ◽

Endri Kacupaj ◽

Maria Maleshkova

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Medical Knowledge ◽

Knowledge Graph ◽

Graph Embeddings

Knowledge Graphs

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73796 ◽

2021 ◽

Vol 5 ◽

Author(s):

Roderic Page

Keyword(s):

Knowledge Management ◽

Large Scale ◽

Personal Knowledge ◽

Knowledge Graph ◽

Specific Knowledge ◽

Management Tools ◽

Global Projects ◽

Knowledge Graphs ◽

Constructing Knowledge ◽

Knowledge Management Tools

Knowledge graphs embody the idea of "everything connected to everything else." As attractive as this seems, there is a substantial gap between the dream of fully interconnected knowledge and the reality of data that is still mostly siloed, or weakly connected by shared strings such as taxonomic names. How do we move forward? Do we focus on building our own domain- or project-specific knowledge graphs, or do we engage with global projects such as Wikidata? Do we construct knowledge graphs, or focus on making our data "knowledge graph ready" by adopting structured markup in the hope that knowledge graphs will spontaneously self-assemble from that data? Do we focus on large-scale, database-driven projects (e.g., triple stores in the cloud), or do we rely on more localised and distributed approaches, such as annotations (e.g., hypothes.is), "content-hash" systems where a cryptographic hash of the data is also its identifier (Elliott et al. 2020), or the growing number of personal knowledge management tools (e.g., Roam, Obsidian, LogSeq)? This talk will share experiences (the good, bad, and the ugly) as I have tried to transition from naïve advocacy to constructing knowledge graphs (Page 2019), or participating in their construction (Page 2021).
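The "content-hash" idea mentioned above, where a cryptographic hash of the data doubles as its identifier, can be sketched in a few lines. This is only an illustration: the function name `content_id` and the `sha256:` prefix are hypothetical choices, not the scheme used by Elliott et al. (2020).

```python
import hashlib

def content_id(data: bytes) -> str:
    # Identifier derived purely from the bytes: identical content always
    # yields the same identifier, and any change yields a different one.
    # The "sha256:" prefix is an illustrative convention (an assumption).
    return "sha256:" + hashlib.sha256(data).hexdigest()

record = b"abc"
print(content_id(record))
# sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Because the identifier depends only on the content, two parties holding the same bytes can agree on its name without consulting a central registry.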

End-to-End Argumentation Knowledge Graph Construction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6231 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7367-7374

Author(s):

Khalid Al-Khatib ◽

Yufang Hou ◽

Henning Wachsmuth ◽

Charles Jochim ◽

Francesca Bonin ◽

...

Keyword(s):

Large Scale ◽

Question Answering ◽

Knowledge Graph ◽

Exploratory Search ◽

Text Generation ◽

Fake News ◽

High Quality ◽

Web Based ◽

Knowledge Graphs ◽

End To End

This paper studies the end-to-end construction of an argumentation knowledge graph that is intended to support argument synthesis, argumentative question answering, or fake news detection, among others. The study is motivated by the proven effectiveness of knowledge graphs for interpretable and controllable text generation and exploratory search. Original in our work is that we propose a model of the knowledge encapsulated in arguments. Based on this model, we build a new corpus that comprises about 16k manual annotations of 4740 claims with instances of the model's elements, and we develop an end-to-end framework that automatically identifies all modeled types of instances. The results of experiments show the potential of the framework for building a web-based argumentation graph that is of high quality and large scale.

Caps-OWKG: a capsule network model for open-world knowledge graph

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-020-01259-4 ◽

2021 ◽

Author(s):

Yuhan Wang ◽

Weidong Xiao ◽

Zhen Tan ◽

Xiang Zhao

Keyword(s):

Representation Learning ◽

Graph Representation ◽

Knowledge Graph ◽

World Knowledge ◽

Relational Structures ◽

Open World ◽

Latent Features ◽

Knowledge Graphs ◽

Low Dimensional ◽

Better Than

Abstract: Knowledge graphs are typical multi-relational structures consisting of many entities and relations. Nonetheless, existing knowledge graphs are still sparse and far from complete. To refine knowledge graphs, representation learning is used to embed entities and relations into low-dimensional spaces. Many existing knowledge graph embedding models focus on learning latent features under the closed-world assumption but neglect the changeable nature of each knowledge graph. In this paper, we propose a knowledge graph representation learning model, called Caps-OWKG, which leverages a capsule network to capture the features of both known and unknown triplets in an open-world knowledge graph. It combines descriptive text with the knowledge graph to obtain descriptive and structural embeddings simultaneously. Both embeddings are then used to calculate the probability that a triplet is authentic. We verify the performance of Caps-OWKG on the link prediction task with two common datasets, FB15k-237-OWE and DBPedia50k. The experimental results are better than those of other baselines and achieve state-of-the-art performance.

Separating Wheat from Chaff: Joining Biomedical Knowledge and Patient Data for Repurposing Medications

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019565 ◽

2019 ◽

Vol 33 ◽

pp. 9565-9572 ◽

Cited By ~ 3

Author(s):

Galia Nordon ◽

Gideon Koren ◽

Varda Shalev ◽

Eric Horvitz ◽

Kira Radinsky

Keyword(s):

Clinical Trials ◽

Medical Records ◽

Large Scale ◽

Medical Literature ◽

Medical Knowledge ◽

Drug Repurposing ◽

Biomedical Literature ◽

Biomedical Knowledge ◽

Health Records ◽

Clinical Expert

We present a system that jointly harnesses large-scale electronic health records data and a concept graph mined from the medical literature to guide drug repurposing—the process of applying known drugs in new ways to treat diseases. Our study is unique in methods and scope, per the scale of the concept graph and the quantity of data. We harness 10 years of nation-wide medical records of more than 1.5 million people and extract medical knowledge from all of PubMed, the world’s largest corpus of online biomedical literature. We employ links on the concept graph to provide causal signals to prioritize candidate influences between medications and target diseases. We show results of the system on studies of drug repurposing for hypertension and diabetes. In both cases, we present drug families identified by the algorithm which were previously unknown. We verify the results via clinical expert opinion and by prospective clinical trials on hypertension.

Research on Medical Knowledge Graph for Stroke

Journal of Healthcare Engineering ◽

10.1155/2021/5531327 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Binjie Cheng ◽

Jin Zhang ◽

Hong Liu ◽

Meiling Cai ◽

Ying Wang

Keyword(s):

Medical Knowledge ◽

Similarity Measures ◽

Graph Model ◽

Knowledge Graph ◽

Medical Field ◽

Graph Models ◽

Experimental Part ◽

External Data ◽

Embedded Model ◽

Knowledge Graphs

Knowledge graphs can effectively analyze and construct the essential characteristics of data. Scholars have proposed many knowledge graph models from different perspectives, especially in the medical field, but there are still relatively few studies on stroke using medical knowledge graphs. This paper therefore builds a medical knowledge graph model for stroke. First, a stroke disease dictionary and an ontology database are built from international standard medical term sets and from crowdsourcing website data obtained by semiautomatic extraction. Second, external data are linked to the nodes of the existing knowledge graph via entity similarity measures, and knowledge representation is performed by a knowledge graph embedding model. Third, the structure of the established knowledge graph is modified continuously through iterative updating. Finally, in the experimental part, the proposed stroke medical knowledge graph is applied to real stroke data, and the performance of the proposed knowledge graph approach is compared across the series of Trans* models.
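Linking external data to existing nodes through an entity similarity measure can be sketched as follows. The abstract does not say which measures the authors use, so this example swaps in a plain character-level string similarity from Python's standard `difflib` module. The function names, the node list, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive character-level similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_entity(mention, node_names, threshold=0.8):
    # Return the most similar existing node, or None when nothing is
    # similar enough (the mention would then become a new node candidate).
    if not node_names:
        return None
    best = max(node_names, key=lambda name: similarity(mention, name))
    return best if similarity(mention, best) >= threshold else None

nodes = ["Ischemic Stroke", "Hemorrhagic Stroke", "Hypertension"]
print(link_entity("ischemic stroke", nodes))
print(link_entity("asthma", nodes))
```

A real pipeline would likely combine several signals (synonym dictionaries, embeddings, ontology codes). The threshold here only shows where an accept-or-reject decision would sit.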

End-to-end Relation-Enhanced Learnable Graph Self-attention Network for Knowledge Graphs Embedding

10.21203/rs.3.rs-396932/v1 ◽

2021 ◽

Author(s):

Shengchen Jiang ◽

Hongbin Wang ◽

Xiang Hou

Keyword(s):

Large Scale ◽

Structural Characteristics ◽

Graph Embedding ◽

Knowledge Graph ◽

Data Sets ◽

Relevance Ranking ◽

Convolutional Network ◽

Attention Network ◽

Knowledge Graphs ◽

End To End

Abstract: Existing methods ignore the adverse effect of knowledge graph incompleteness on knowledge graph embedding. In addition, the complexity and large scale of knowledge information hinder the knowledge graph embedding performance of the classic graph convolutional network. In this paper, we analyze the structural characteristics of knowledge graphs and the imbalance of knowledge information. Complex knowledge information requires a model with better learnability rather than linearly weighted qualitative constraints, so we propose an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. First, we construct a relation-enhanced adjacency matrix to account for the incompleteness of the knowledge graph. Second, the graph self-attention network is employed to obtain the global encoding and relevance ranking of entity node information. Third, we propose the concept of a convolutional knowledge subgraph, which is constructed according to the entity relevance ranking. Finally, we improve the training of the ConvKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. Experimental results on the FB15k-237 and WN18RR data sets show that the proposed method represents knowledge information more comprehensively than existing methods in terms of Hits@10 and MRR.
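Hits@10 and MRR are both computed from the rank that a model assigns to the correct entity for each test triplet. The sketch below assumes each query yields a dictionary of candidate scores where a higher score means more plausible. The helper names and toy values are illustrative and not taken from the paper.

```python
def rank_of(true_entity, candidate_scores):
    # 1-based position of the correct entity when candidates are sorted
    # from most to least plausible (higher score = more plausible).
    ordered = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return ordered.index(true_entity) + 1

def hits_at_k(ranks, k=10):
    # Fraction of test queries whose correct entity is ranked in the top k.
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    # Average of 1/rank; dominated by how often the correct entity is near the top.
    return sum(1.0 / r for r in ranks) / len(ranks)

scores = {"aspirin": 0.9, "ibuprofen": 0.5, "insulin": 0.1}
print(rank_of("ibuprofen", scores))  # 2

ranks = [1, 2, 20]  # made-up ranks for three test queries
print(hits_at_k(ranks, k=10))
print(mean_reciprocal_rank(ranks))
```

Hits@10 ignores how far outside the top 10 a miss falls, while MRR rewards every improvement in rank, which is why the two are usually reported together.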

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6392 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8673-8680

Author(s):

Pengda Qin ◽

Xin Wang ◽

Wenhu Chen ◽

Chunyun Zhang ◽

Weiran Xu ◽

...

Keyword(s):

Supervised Classification ◽

Large Scale ◽

Relational Learning ◽

Generative Adversarial Networks ◽

Knowledge Graph ◽

Semantic Features ◽

Performance Improvements ◽

Adversarial Networks ◽

Knowledge Graphs ◽

Noisy Text

Large-scale knowledge graphs (KGs) are becoming increasingly important in current information systems. To expand the coverage of KGs, previous studies on knowledge graph completion need to collect adequate training instances for newly added relations. In this paper, we consider a novel formulation, zero-shot learning, to remove the need for this cumbersome curation. For newly added relations, we attempt to learn their semantic features from their text descriptions and hence recognize the facts of unseen relations without having seen any examples. For this purpose, we leverage Generative Adversarial Networks (GANs) to establish the connection between the text and knowledge graph domains: the generator learns to generate reasonable relation embeddings from noisy text descriptions alone. Under this setting, zero-shot learning is naturally converted to a traditional supervised classification task. Empirically, our method is model-agnostic and could potentially be applied to any version of KG embeddings, and it consistently yields performance improvements on the NELL and Wiki datasets.