Representing a Heterogeneous Pharmaceutical Knowledge-Graph with Textual Information

Frontiers in Research Metrics and Analytics ◽ 10.3389/frma.2021.670206 ◽ 2021 ◽ Vol 6

Author(s): Masaki Asada ◽ Nallappan Gunasekaran ◽ Makoto Miwa ◽ Yutaka Sasaki

Keyword(s): Drug Discovery ◽ Link Prediction ◽ Knowledge Graph ◽ Graph Embeddings ◽ Textual Information ◽ Prediction Task

We address a heterogeneous pharmaceutical knowledge graph, built from several databases, that contains textual information. The graph includes a wide variety of concepts and attributes, some of which are provided as pieces of text that conventional graph completion tasks have not targeted. To investigate the utility of textual information for knowledge graph completion, we generate embeddings from the textual descriptions attached to heterogeneous items, such as drugs and proteins, while learning knowledge graph embeddings. We evaluate the resulting graph embeddings on the link prediction task for knowledge graph completion, which can be used for drug discovery and repurposing. We also compare the results with existing methods and discuss the utility of the textual information.

Shall I Work with Them? A Knowledge Graph-Based Approach for Predicting Future Research Collaborations

Entropy ◽ 10.3390/e23060664 ◽ 2021 ◽ Vol 23 (6) ◽ pp. 664

Author(s): Nikos Kanakaris ◽ Nikolaos Giarelis ◽ Ilias Siachos ◽ Nikos Karacapilidis

Keyword(s): Language Processing ◽ Scientific Knowledge ◽ Link Prediction ◽ Performance Metrics ◽ Future Research ◽ Knowledge Graph ◽ Prediction Problem ◽ Textual Information ◽ Research Collaborations ◽ Processing Techniques

We consider the prediction of future research collaborations as a link prediction problem applied to a scientific knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph-kernel-based approaches on the performance of an ML model for the link prediction problem, and (iii) proposes a three-phase pipeline that enables the exploitation of structural and textual information, as well as of pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms using accuracy, recall, and precision as our performance metrics. Finally, we empirically test our approach with various feature combinations on the link prediction problem. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in these performance metrics for the prediction of future research collaborations.
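
As a rough illustration of what benchmarking against a classical link prediction algorithm with accuracy, recall, and precision can look like, here is a minimal sketch. The abstract does not name the baselines used, so the Jaccard coefficient, the toy co-authorship graph, the candidate pairs, the ground-truth labels, and the 0.3 threshold are all assumptions made for this example.

```python
def jaccard(neighbors, u, v):
    """Jaccard coefficient of the neighbor sets of u and v (0.0 if both are empty)."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall, and accuracy from boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical co-authorship graph: author -> set of current co-authors.
coauthors = {"a": {"c", "d"}, "b": {"c"}, "c": {"a", "b"}, "d": {"a"}}

# Hypothetical candidate pairs and whether they collaborate later (made-up labels).
candidates = [("a", "b"), ("c", "d"), ("b", "d")]
future_links = [True, False, False]

# Predict a future collaboration when the Jaccard score reaches an assumed threshold.
threshold = 0.3
predicted = [jaccard(coauthors, u, v) >= threshold for u, v in candidates]

precision, recall, accuracy = precision_recall_accuracy(future_links, predicted)
print(precision, recall, accuracy)
```

A purely structural score like this uses no text at all, which is the kind of baseline a combined structural and textual model would be compared with.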

Unsupervised Embedding Enhancements of Knowledge Graphs using Textual Associations

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/725 ◽ 2019 ◽ Cited By ~ 1

Author(s): Neil Veira ◽ Brian Keng ◽ Kanchana Padmanabhan ◽ Andreas Veneris

Keyword(s): Link Prediction ◽ Graph Embedding ◽ Structured Data ◽ Knowledge Graph ◽ Sources Of Information ◽ Textual Information ◽ Text Document ◽ First Case ◽ Textual Data ◽ Unsupervised Approach

Knowledge graph embeddings are instrumental for representing and learning from multi-relational data, with recent embedding models showing high effectiveness for inferring new facts from existing databases. However, such precisely structured data is usually limited in quantity and in scope. Therefore, to fully optimize the embeddings it is important to also consider more widely available sources of information such as text. This paper describes an unsupervised approach to incorporate textual information by augmenting entity embeddings with embeddings of associated words. The approach does not modify the optimization objective for the knowledge graph embedding, which allows it to be integrated with existing embedding models. Two distinct forms of textual data are considered, with different embedding enhancements proposed for each case. In the first case, each entity has an associated text document that describes it. In the second case, a text document is not available, and instead entities occur as words or phrases in an unstructured corpus of text fragments. Experiments show that both methods can offer improvement on the link prediction task when applied to many different knowledge graph embedding models.
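
The idea of augmenting an entity embedding with embeddings of its associated words can be sketched as follows. The abstract does not give the combination rule, so the averaging of word vectors, the additive mix, the `alpha` weight, and the toy vectors are all assumptions for illustration and should not be read as the paper's formulation.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment_entity(entity_vec, word_vecs, alpha=0.5):
    """Hypothetical enhancement: entity vector plus alpha times the mean word vector.

    The knowledge graph embedding itself is left untouched; the text only
    contributes an additive term, so no training objective is modified here.
    """
    if not word_vecs:
        # No associated text: fall back to the plain entity embedding.
        return list(entity_vec)
    text_vec = mean_vector(word_vecs)
    return [e + alpha * w for e, w in zip(entity_vec, text_vec)]


# Toy 2-d vectors (made-up values).
entity = [1.0, 0.0]
words = [[0.0, 2.0], [2.0, 2.0]]  # embeddings of words from the entity's description

augmented = augment_entity(entity, words, alpha=0.5)
print(augmented)
```

Because the enhancement is applied on top of an existing entity vector, a sketch like this could in principle be paired with any embedding model that produces such vectors.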

Learning Translation-Based Knowledge Graph Embeddings by N-Pair Translation Loss

Applied Sciences ◽ 10.3390/app10113964 ◽ 2020 ◽ Vol 10 (11) ◽ pp. 3964

Author(s): Hyun-Je Song ◽ A-Yeong Kim ◽ Seong-Bae Park

Keyword(s): Link Prediction ◽ Early Stage ◽ Score Function ◽ Knowledge Graph ◽ Graph Embeddings ◽ Local Optima ◽ Slow Convergence ◽ Translation Operators ◽ Vector Representations ◽ Ranking Loss

Translation-based knowledge graph embeddings learn vector representations of entities and relations by treating relations as translation operators over the entities in an embedding space. Since the translation is represented through a score function, translation-based embeddings are generally trained by minimizing a margin-based ranking loss, which assigns a low score to positive triples and a high score to negative triples. However, this type of embedding suffers from slow convergence and poor local optima because the loss uses only one pair of a positive and a negative triple per parameter update. Therefore, this paper proposes the N-pair translation loss, which considers multiple negative triples in one update. The N-pair translation loss employs multiple negative triples together with one positive triple, so that the positive triple is compared against all of them at each parameter update. As a result, better vector representations can be obtained more quickly. The experimental results on link prediction show that the proposed loss helps training converge quickly toward good optima at an early stage.
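
The contrast between a one-negative margin loss and a loss that sees several negatives per update can be sketched as below. The distance score follows the usual translation idea (h + r should land near t). The multi-negative loss uses a softmax-style form, log(1 + Σ exp(pos − neg_j)), which is an assumed stand-in, since the abstract does not state the exact N-pair translation loss. The embeddings are made-up toy values.

```python
import math


def transe_score(h, r, t):
    """L2 distance ||h + r - t||; a lower score means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Pairwise loss: one positive triple compared against one negative triple."""
    return max(0.0, margin + pos_score - neg_score)


def multi_negative_loss(pos_score, neg_scores):
    """Assumed N-pair-style loss: the positive is compared against all negatives at once."""
    return math.log1p(sum(math.exp(pos_score - n) for n in neg_scores))


# Toy 2-d embeddings (made-up values).
h = [0.0, 0.0]
r = [1.0, 0.0]
t_true = [1.0, 0.0]
t_corrupt = [[0.0, 1.0], [3.0, 0.0], [1.0, 2.0]]  # corrupted tails

pos = transe_score(h, r, t_true)
negs = [transe_score(h, r, t) for t in t_corrupt]

pairwise = margin_ranking_loss(pos, negs[0])  # uses only the first negative
n_pair = multi_negative_loss(pos, negs)       # uses all three negatives

print(pos, negs, pairwise, n_pair)
```

In this toy case the sampled negative already satisfies the margin, so the pairwise loss is zero and gives no training signal, while the multi-negative loss stays positive. This is one intuition for why comparing against several negatives per update may speed up convergence.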

On the Utilization of Structural and Textual Information of a Scientific Knowledge Graph to Discover Future Research Collaborations: A Link Prediction Perspective

Discovery Science - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-61527-7_29 ◽ 2020 ◽ pp. 437-450

Author(s): Nikolaos Giarelis ◽ Nikos Kanakaris ◽ Nikos Karacapilidis

Keyword(s): Scientific Knowledge ◽ Link Prediction ◽ Future Research ◽ Knowledge Graph ◽ Textual Information ◽ Research Collaborations

Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction – two sides of the same coin?

Semantic Web ◽ 10.3233/sw-212892 ◽ 2022 ◽ pp. 1-24

Author(s): Jan Portisch ◽ Nicolas Heist ◽ Heiko Paulheim

Keyword(s): Data Mining ◽ Link Prediction ◽ Graph Embedding ◽ Knowledge Graph ◽ Graph Embeddings ◽ Similarity Functions ◽ Evaluation Methodologies ◽ Series Of Experiments ◽ Two Sides ◽ Lower Dimensional

Knowledge graph embeddings, i.e., projections of entities and relations to lower-dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. So far, the two lines of research have been pursued largely in isolation from each other, each with its own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done so far. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.

Biomedical Text Link Prediction for Drug Discovery: A Case Study with COVID-19

Pharmaceutics ◽ 10.3390/pharmaceutics13060794 ◽ 2021 ◽ Vol 13 (6) ◽ pp. 794

Author(s): Kevin McCoy ◽ Sateesh Gudapati ◽ Lawrence He ◽ Elaina Horlander ◽ David Kartchner ◽ ...

Keyword(s): Drug Discovery ◽ Prediction Model ◽ Link Prediction ◽ Nucleoside Analogs ◽ Biomedical Literature ◽ Prediction Algorithm ◽ Knowledge Graph ◽ Recombinant Interferon ◽ Repurposed Drugs

Link prediction in artificial intelligence is used to identify missing links or derive future relationships that can occur in complex networks. A link prediction model was developed using the complex heterogeneous biomedical knowledge graph, SemNet, to predict missing links in biomedical literature for drug discovery. A web application visualized knowledge graph embeddings and link prediction results using TransE, ComplEx, and RotatE based methods. The link prediction model achieved up to 0.44 hits@10 on the entity prediction tasks. The recent outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, served as a case study to demonstrate the efficacy of link prediction modeling for drug discovery. The link prediction algorithm guided identification and ranking of repurposed drug candidates for SARS-CoV-2 primarily by text mining biomedical literature from previous coronaviruses, including SARS and Middle East respiratory syndrome (MERS). Repurposed drugs included potential primary SARS-CoV-2 treatment, adjunctive therapies, or therapeutics to treat side effects. The link prediction accuracy for nodes ranked highly for SARS coronavirus was 0.875, as calculated by human-in-the-loop validation on existing COVID-19-specific data sets. Drug classes predicted as highly ranked include anti-inflammatory, nucleoside analogs, protease inhibitors, antimalarials, envelope proteins, and glycoproteins. Examples of highly ranked predicted links to SARS-CoV-2: human leukocyte interferon, recombinant interferon-gamma, cyclosporine, antiviral therapy, zidovudine, chloroquine, vaccination, methotrexate, artemisinin, alkaloids, glycyrrhizic acid, quinine, flavonoids, amprenavir, suramin, complement system proteins, fluoroquinolones, bone marrow transplantation, albuterol, ciprofloxacin, quinolone antibacterial agents, and hydroxymethylglutaryl-CoA reductase inhibitors.
Approximately 40% of identified drugs were not previously connected to SARS, such as edetic acid or biotin. In summary, link prediction can effectively suggest repurposed drugs for emergent diseases.
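
The hits@10 figure reported above is a ranking metric: for each test query, candidate entities are ranked by model score, and the metric is the fraction of queries whose correct entity appears in the top 10. A minimal sketch follows. The scores, entity names, and rank list are hypothetical placeholders and are not taken from SemNet or the paper.

```python
def rank_of_true(scores, true_entity):
    """1-based rank of true_entity when candidates are sorted by descending score."""
    true_score = scores[true_entity]
    return 1 + sum(1 for s in scores.values() if s > true_score)


def hits_at_k(ranks, k=10):
    """Fraction of queries whose true entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical candidate scores for one query (higher means more plausible).
scores = {"drug_a": 0.9, "drug_b": 0.4, "drug_c": 0.7}
example_rank = rank_of_true(scores, "drug_c")

# Hypothetical ranks of the true entity across five test queries.
example_ranks = [1, 3, 12, 50, 7]
example_hits = hits_at_k(example_ranks, k=10)

print(example_rank, example_hits)
```

Under this definition, a hits@10 of 0.44 would mean the correct entity was ranked in the top 10 for 44% of the evaluated queries.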
