KnowlyBERT - Hybrid Query Answering over Language Models and Knowledge Graphs

Author(s):  
Jan-Christoph Kalo ◽  
Leandra Fichtel ◽  
Philipp Ehler ◽  
Wolf-Tilo Balke
2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Yinyu Lan ◽  
Shizhu He ◽  
Kang Liu ◽  
Xiangrong Zeng ◽  
Shengping Liu ◽  
...  

Abstract
Background: Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, which creates a demand for medical knowledge graph completion (MedKGC). MedKGC infers new facts from the existing knowledge in a KG. Path-based knowledge reasoning is one of the most important approaches to this task and has received great attention in recent years because of its high performance and interpretability. Traditional methods such as the path ranking algorithm treat the paths between an entity pair as atomic features. However, medical KGs are very sparse, which makes it difficult to learn effective semantic representations for extremely sparse path features. The sparsity of medical KGs is mainly reflected in the long-tailed distribution of entities and paths. Previous methods consider only the contextual structure of paths in the knowledge graph and ignore the textual semantics of the symbols along the path. Their performance therefore cannot be further improved because of entity sparseness and path sparseness.
Methods: To address these issues, this paper proposes two novel path-based reasoning methods that tackle the entity and path sparsity problems respectively by exploiting the textual semantic information of entities and paths for MedKGC. Using the pre-trained model BERT to combine the textual semantic representations of entities and relations, we cast symbolic reasoning in the medical KG as a numerical computation over textual semantic representations.
Results: Experimental results on a publicly available, authoritative Chinese symptom knowledge graph show that the proposed method is significantly better than state-of-the-art path-based knowledge graph reasoning methods, improving average performance by 5.83% across all relations.
Conclusions: We propose two new knowledge graph reasoning algorithms that adopt the textual semantic information of entities and paths and effectively alleviate the entity and path sparsity problems in MedKGC. To the best of our knowledge, this is the first method to use pre-trained language models and textual path representations for medical knowledge reasoning. Our method can complete the incomplete symptom knowledge graph in an interpretable way, and it outperforms state-of-the-art path-based reasoning methods.
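The core idea of the Methods paragraph above (verbalizing a relation path and comparing it with a candidate relation in BERT's embedding space) can be sketched as follows. This is not the authors' implementation; the model name, the path verbalization, mean pooling, and cosine scoring are all illustrative assumptions.

```python
# Hedged sketch: scoring a textual relation path against a candidate relation
# with a pre-trained BERT encoder. Model choice, verbalization, and cosine
# scoring are illustrative assumptions, not the paper's exact method.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states as a simple text embedding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)              # (768,)

def score_path(path_tokens: list, candidate_relation: str) -> float:
    """Verbalize an entity/relation path as text and compare it with a candidate relation."""
    path_text = " ".join(path_tokens)
    return torch.cosine_similarity(embed(path_text), embed(candidate_relation), dim=0).item()

# Hypothetical symptom-KG path: does it support "influenza has symptom fever"?
print(score_path(["influenza", "related disease", "cold", "has symptom", "fever"],
                 "influenza has symptom fever"))
```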


2021 ◽  
Author(s):  
Michihiro Yasunaga ◽  
Hongyu Ren ◽  
Antoine Bosselut ◽  
Percy Liang ◽  
Jure Leskovec

2021 ◽  
Author(s):  
Alessandro Oltramari ◽  
Jonathan Francis ◽  
Filip Ilievski ◽  
Kaixin Ma ◽  
Roshanak Mirzaee

This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks. Different methods for integrating neural language models and knowledge graphs are discussed, and the situations in which this combination is most appropriate are characterized through quantitative evaluation and qualitative error analysis on a variety of commonsense question answering benchmark datasets.


2021 ◽  
Vol 14 (6) ◽  
pp. 943-956
Author(s):  
Efthymia Tsamoura ◽  
David Carral ◽  
Enrico Malizia ◽  
Jacopo Urbani

The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs) for tasks like query answering under dependencies or data cleaning. A general problem of chase algorithms is that they might perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the execution of the rules, avoiding redundant computations. We present the results of an extensive theoretical and empirical study that seeks to answer when and how TGs can be computed and what the benefits of TGs are when applied over real-world KBs. Our results include algorithms that compute (minimal) TGs. We implemented our approach in a new engine, called GLog, and our experiments show that it can be significantly more efficient than the chase, enabling us to materialize Knowledge Graphs with 17B facts in less than 40 minutes on a single machine with commodity hardware.
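To make concrete what "redundant computations" means for the chase, here is a minimal, purely illustrative sketch of naive Datalog-style materialization in Python: every fixpoint round re-evaluates the rule over all facts and re-derives conclusions that are already known, which is the kind of wasted work Trigger Graphs are designed to avoid. The rule, data, and representation are assumptions for illustration, not GLog's API.

```python
# Hedged sketch of naive Datalog-style materialization (a restricted chase).
# Rule: ancestor(x, z) <- parent(x, y), ancestor(y, z)
facts = {("parent", "a", "b"), ("parent", "b", "c"), ("ancestor", "b", "c")}

def apply_rule(facts):
    """One round of rule application over the *entire* fact set."""
    derived = set()
    for p, x, y in facts:
        if p != "parent":
            continue
        for q, y2, z in facts:
            if q == "ancestor" and y2 == y:
                derived.add(("ancestor", x, z))
    return derived

# Naive fixpoint: each round re-derives facts that are already present --
# exactly the redundancy that Trigger Graphs aim to eliminate.
while True:
    new = apply_rule(facts) - facts
    if not new:
        break
    facts |= new

print(sorted(facts))  # includes the derived fact ('ancestor', 'a', 'c')
```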


Author(s):  
Tiange Xu ◽  
Fu Zhang

Relation extraction extracts the semantic relation between entity pairs in text, and it is a key step in building Knowledge Graphs and in information extraction. The rapid development of deep learning in recent years has produced a rich body of research on relation extraction. At present, relation extraction based on pre-trained language models such as BERT achieves higher accuracy than methods based on Convolutional or Recurrent Neural Networks. This review summarizes the research progress of pre-trained language models such as BERT in supervised and distantly supervised relation extraction. In addition, comparisons, analyses, and directions for future research are discussed throughout the survey. The survey may help readers understand key techniques in this area and identify future research directions.
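As a concrete illustration of the BERT-based setup this survey covers, the sketch below frames sentence-level relation extraction as sequence classification with entity markers. The model name, marker tokens, and relation inventory are assumptions for illustration, not any specific surveyed system.

```python
# Hedged sketch: relation classification with a BERT encoder and entity markers.
# Model, marker tokens, and label set are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RELATIONS = ["founded_by", "born_in", "no_relation"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=len(RELATIONS)
)

def classify(sentence: str, head: str, tail: str) -> str:
    """Wrap the entity mentions in marker tokens and predict a relation label."""
    marked = sentence.replace(head, f"[E1] {head} [/E1]").replace(tail, f"[E2] {tail} [/E2]")
    inputs = tokenizer(marked, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits            # (1, num_labels)
    return RELATIONS[int(logits.argmax(dim=-1))]

# The classification head is untrained here, so the output is arbitrary; after
# fine-tuning on labeled pairs this returns the predicted relation.
print(classify("Apple was founded by Steve Jobs.", "Apple", "Steve Jobs"))
```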


2020 ◽  
Vol 34 (03) ◽  
pp. 2925-2933 ◽  
Author(s):  
Chaitanya Malaviya ◽  
Chandra Bhagavatula ◽  
Antoine Bosselut ◽  
Yejin Choi

Automatic KB completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared to the much-studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes than conventional KBs (~18x more nodes in ATOMIC than in Freebase (FB15K-237)). Importantly, this implies significantly sparser graph structures, a major challenge for existing KB completion methods that assume densely connected graphs over a relatively smaller set of nodes. In this paper, we present novel KB completion models that address these challenges by exploiting the structural and semantic context of nodes. Specifically, we investigate two key ideas: (1) learning from local graph structure, using graph convolutional networks and automatic graph densification, and (2) transfer learning from pre-trained language models to knowledge graphs for enhanced contextual representation of knowledge. We describe our method to incorporate information from both these sources in a joint model and provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis of model predictions sheds light on the types of commonsense knowledge that language models capture well.
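A minimal sketch of the general recipe described above: fuse a node representation obtained from local graph structure (one GCN-style propagation step) with a text embedding of the node's free-form phrase, then score triples with a DistMult-style decoder. The dimensions, fusion layer, and decoder are illustrative assumptions, and random tensors stand in for real BERT embeddings; this is not the paper's exact architecture.

```python
# Hedged sketch: combining local graph structure with text embeddings for
# link prediction. All shapes and components are illustrative assumptions.
import torch
import torch.nn as nn

num_nodes, text_dim, graph_dim, num_rels = 5, 16, 16, 3

adj = torch.eye(num_nodes)                     # placeholder adjacency (self-loops only)
text_emb = torch.randn(num_nodes, text_dim)    # stand-in for BERT phrase embeddings
node_feat = torch.randn(num_nodes, graph_dim)  # initial node features

gcn_weight = nn.Linear(graph_dim, graph_dim, bias=False)
fuse = nn.Linear(graph_dim + text_dim, graph_dim)
rel_emb = nn.Embedding(num_rels, graph_dim)

def node_repr() -> torch.Tensor:
    """One GCN propagation step, then concatenate with the text embedding."""
    graph_repr = torch.relu(gcn_weight(adj @ node_feat))    # (num_nodes, graph_dim)
    return fuse(torch.cat([graph_repr, text_emb], dim=-1))  # (num_nodes, graph_dim)

def score(head: int, rel: int, tail: int) -> float:
    """DistMult-style triple score: higher means more plausible."""
    h = node_repr()
    return float((h[head] * rel_emb(torch.tensor(rel)) * h[tail]).sum())

print(score(0, 1, 2))
```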


Author(s):  
Ruud van Bakel ◽  
Teodor Aleksiev ◽  
Daniel Daza ◽  
Dimitrios Alivanistos ◽  
Michael Cochez

Abstract
Large, heterogeneous datasets are characterized by missing or even erroneous information. This is more evident when they are the product of community effort or of automatic fact extraction methods from external sources, such as text. A special case of this phenomenon can be seen in knowledge graphs, where it mostly appears in the form of missing or incorrect edges and nodes.
Structured querying on such incomplete graphs will result in incomplete sets of answers, even if the correct entities exist in the graph, since one or more edges needed to match the pattern are missing. To overcome this problem, several algorithms for approximate structured query answering have been proposed. Inspired by modern Information Retrieval metrics, these algorithms produce a ranking of all entities in the graph, and their performance is evaluated based on how high in this ranking the correct answers appear.
In this work we take a critical look at this way of evaluation. We argue that a ranking-based evaluation is not sufficient to assess methods for complex query answering. To address this, we introduce Message Passing Query Boxes (MPQB), which brings binary classification metrics back into use, and we show the effect this has on the recently proposed query embedding method MPQE.
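The contrast drawn above between ranking-based and binary-classification-based evaluation can be illustrated with a small sketch: given per-entity scores for one query, ranking metrics (MRR, Hits@k) reward only the positions of correct answers, while classification metrics require committing to a decision threshold. The scores, answer set, and threshold below are made-up values for illustration.

```python
# Hedged sketch: ranking metrics vs. binary classification metrics for one query.

def ranking_metrics(scores: dict, answers: set, k: int = 3):
    """MRR and Hits@k over the ranking induced by the scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    ranks = [ranked.index(a) + 1 for a in answers]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_at_k

def classification_metrics(scores: dict, answers: set, threshold: float):
    """Precision, recall, and F1 after thresholding the scores."""
    predicted = {e for e, s in scores.items() if s >= threshold}
    tp = len(predicted & answers)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(answers)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores = {"e1": 0.9, "e2": 0.8, "e3": 0.4, "e4": 0.1}  # model scores for one query
answers = {"e1", "e3"}                                  # ground-truth answer set

print(ranking_metrics(scores, answers))                       # approx (0.667, 1.0)
print(classification_metrics(scores, answers, threshold=0.5)) # (0.5, 0.5, 0.5)
```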

