Using Semantics and Statistics to Turn Data into Knowledge

Many information extraction and knowledge base construction systems are addressing the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification, collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions simultaneously, posing a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification and present state-of-the-art results for knowledge graph construction while performing an order of magnitude faster than competing methods.
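
As a rough illustration of how an ontological constraint can penalize conflicting candidate facts, the sketch below uses the Lukasiewicz relaxation of logic that PSL builds on. The rule, the confidence values, the rule weight, and all function names are hypothetical and are not taken from the article.

```python
def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Distance of the rule body -> head from being satisfied (0.0 = satisfied)."""
    return max(0.0, body - head)


def mutex_penalty(weight: float, label_a: float, label_b: float,
                  mutex: float = 1.0) -> float:
    """Weighted penalty for the hypothetical rule
    label(E, A) AND mutex(A, B) -> NOT label(E, B)."""
    body = soft_and(label_a, mutex)
    head = 1.0 - label_b  # Lukasiewicz negation
    return weight * distance_to_satisfaction(body, head)


# Hypothetical extractor confidences for two mutually exclusive labels
# assigned to the same entity.
conflicting = mutex_penalty(weight=2.0, label_a=0.9, label_b=0.6)
compatible = mutex_penalty(weight=2.0, label_a=0.9, label_b=0.1)
print(conflicting, compatible)
```

The conflicting pair incurs a clearly positive penalty while the compatible pair incurs essentially none. Inference in this style of model looks for truth values that keep the total weighted penalty small across all rules.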


A Knowledge Base Completion Model Based on Path Feature Learning

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.1.3104 ◽

2018 ◽

Vol 13 (1) ◽

pp. 71 ◽

Cited By ~ 2

Author(s):

Xixun Lin ◽

Yanchun Liang ◽

Limin Wang ◽

Xu Wang ◽

Mary Qu Yang ◽

...

Keyword(s):

Knowledge Base ◽

Large Scale ◽

Relational Learning ◽

Feature Learning ◽

Statistical Relational Learning ◽

Knowledge Bases ◽

Ranking Algorithm ◽

Stage Model ◽

Second Stage ◽

Urgent Task

Large-scale knowledge bases, as the foundations for promoting the development of artificial intelligence, have attracted increasing attention in recent years. These knowledge bases contain billions of facts in triple format; yet, they suffer from sparse relations between entities. Researchers proposed the path ranking algorithm (PRA) to address this sparsity problem. To improve the scalability of knowledge inference, PRA exploits random walks to find Horn clauses with chain structures to predict new relations given existing facts. This method can be regarded as a classification problem within statistical relational learning (SRL). However, large-scale knowledge base completion demands superior accuracy and scalability. In this paper, we propose the path feature learning model (PFLM) to accomplish this urgent task. More precisely, we define a two-stage model: the first stage aims to learn path features from the existing knowledge base and an extra parsed corpus; the second stage uses these path features to predict new relations. The experimental results demonstrate that the PFLM can learn meaningful features and can achieve significant and consistent improvements compared with previous work.
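
To make the random-walk idea concrete, here is a minimal sketch of a PRA-style path feature: the probability that a walk starting at a source entity and following a fixed sequence of relations ends at a target entity. The toy triples and the function name are hypothetical, and this is not the PFLM model itself.

```python
from collections import defaultdict


def path_feature(edges, source, target, relation_path):
    """Probability that a random walk from `source` that follows
    `relation_path` step by step ends at `target`."""
    out = defaultdict(list)
    for head, rel, tail in edges:
        out[(head, rel)].append(tail)

    dist = {source: 1.0}  # probability mass over current nodes
    for rel in relation_path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            tails = out.get((node, rel), [])
            for tail in tails:
                # Spread the mass uniformly over outgoing edges of type `rel`.
                nxt[tail] += prob / len(tails)
        dist = dict(nxt)
    return dist.get(target, 0.0)


# Hypothetical toy knowledge base in (head, relation, tail) form.
edges = [
    ("alice", "works_for", "acme"),
    ("alice", "works_for", "globex"),
    ("acme", "based_in", "paris"),
    ("globex", "based_in", "berlin"),
]

feature = path_feature(edges, "alice", "paris", ["works_for", "based_in"])
print(feature)
```

One such value per relation path gives a feature vector for an entity pair, which a classifier can then use to predict whether a new relation holds between the pair.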


Elementary

International Journal on Semantic Web and Information Systems ◽

10.4018/jswis.2012070103 ◽

2012 ◽

Vol 8 (3) ◽

pp. 42-73 ◽

Cited By ~ 47

Author(s):

Feng Niu ◽

Ce Zhang ◽

Christopher Ré ◽

Jude Shavlik

Keyword(s):

Machine Learning ◽

Conceptual Framework ◽

Knowledge Base ◽

Statistical Inference ◽

State Of The Art ◽

Daily Basis ◽

Knowledge Bases ◽

Structured Data ◽

Wide Range ◽

Knowledge Base Construction

Researchers have approached knowledge-base construction (KBC) with a wide range of data resources and techniques. The authors present Elementary, a prototype KBC system that is able to combine diverse resources and different KBC techniques via machine learning and statistical inference to construct knowledge bases. Using Elementary, they have implemented a solution to the TAC-KBP challenge with quality comparable to the state of the art, as well as an end-to-end online demonstration that automatically and continuously enriches Wikipedia with structured data by reading millions of webpages on a daily basis. The authors describe several challenges and their solutions in designing, implementing, and deploying Elementary. In particular, the authors first describe the conceptual framework and architecture of Elementary to integrate different data resources and KBC techniques in a principled manner. They then discuss how they address scalability challenges to enable Web-scale deployment. The authors empirically show that this decomposition-based inference approach achieves higher performance than prior inference approaches. To validate the effectiveness of Elementary’s approach to KBC, they experimentally show that its ability to incorporate diverse signals has positive impacts on KBC quality.


Cautious Rule-Based Collective Inference

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/922 ◽

2019 ◽

Author(s):

Martin Svatos

Keyword(s):

Relational Learning ◽

Statistical Relational Learning ◽

Knowledge Graph ◽

Inference Process ◽

Rule Based ◽

Test Error ◽

Popular Approach ◽

First Order ◽

Theoretical Test ◽

Collective Inference

Collective inference is a popular approach for solving tasks such as knowledge graph completion within the statistical relational learning field. There are many existing solutions for this task; however, each of them is subject to some limitation: restriction to only some learning settings, a lack of model interpretability, or a lack of theoretical test error bounds. We propose an approach based on a cautious inference process that uses first-order rules and provides PAC-style bounds.


Data-Driven Metaphor Recognition and Explanation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00235 ◽

2013 ◽

Vol 1 ◽

pp. 379-390 ◽

Cited By ~ 9

Author(s):

Hongsong Li ◽

Kenny Q. Zhu ◽

Haixun Wang

Keyword(s):

Knowledge Base ◽

State Of The Art ◽

Knowledge Bases ◽

Important Task ◽

Data Driven ◽

Web Pages ◽

Inference Mechanism ◽

Art Methods ◽

Data Driven Approach ◽

Machine Reading

Recognizing metaphors and identifying the source-target mappings is an important task as metaphorical text poses a big challenge for machine reading. To address this problem, we automatically acquire a metaphor knowledge base and an isA knowledge base from billions of web pages. Using the knowledge bases, we develop an inference mechanism to recognize and explain the metaphors in the text. To our knowledge, this is the first purely data-driven approach to probabilistic metaphor acquisition, recognition, and explanation. Our results show that it significantly outperforms other state-of-the-art methods in recognizing and explaining metaphors.


An Approach to Knowledge Base Completion by a Committee-Based Knowledge Graph Embedding

Applied Sciences ◽

10.3390/app10082651 ◽

2020 ◽

Vol 10 (8) ◽

pp. 2651

Author(s):

Su Jeong Choi ◽

Hyun-Je Song ◽

Seong-Bae Park

Keyword(s):

Knowledge Base ◽

Language Processing ◽

Graph Embedding ◽

Knowledge Bases ◽

Knowledge Graph ◽

Data Sets ◽

Complete Knowledge ◽

Proposed Model ◽

Ranking Task ◽

Low Dimensional

Knowledge bases such as Freebase, YAGO, DBpedia, and NELL contain a number of facts with various entities and relations. Since they store many facts, they are regarded as core resources for many natural language processing tasks. Nevertheless, they are not normally complete and have many missing facts. Such missing facts keep them from being used in diverse applications in spite of their usefulness. Therefore, it is important to complete knowledge bases. Knowledge graph embedding is one of the promising approaches to completing a knowledge base, and thus many variants of knowledge graph embedding have been proposed. It maps all entities and relations in a knowledge base onto a low-dimensional vector space. Then, candidate facts that are plausible in the space are determined as missing facts. However, any single knowledge graph embedding is insufficient to complete a knowledge base. As a solution to this problem, this paper defines knowledge base completion as a ranking task and proposes a committee-based knowledge graph embedding model for improving the performance of knowledge base completion. Since each knowledge graph embedding has its own idiosyncrasy, we make up a committee of various knowledge graph embeddings to reflect various perspectives. After ranking all candidate facts according to their plausibility computed by the committee, the top-k facts are chosen as missing facts. Our experimental results on two data sets show that the proposed model achieves higher performance than any single knowledge graph embedding and shows robust performance regardless of k. These results prove that the proposed model considers various perspectives in measuring the plausibility of candidate facts.
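
The ranking step can be sketched as follows. The abstract does not say how the committee combines its members, so this example assumes a simple sum of per-member rank positions; the candidate facts, the scores, and the function name are all hypothetical.

```python
def committee_top_k(candidates, scorers, k):
    """Return the k candidates with the best (lowest) summed rank across
    all committee members. Rank aggregation is an assumption made here."""
    total_rank = {cand: 0 for cand in candidates}
    for score in scorers:
        ordered = sorted(candidates, key=score, reverse=True)
        for rank, cand in enumerate(ordered, start=1):
            total_rank[cand] += rank
    return sorted(candidates, key=lambda cand: total_rank[cand])[:k]


# Hypothetical candidate facts and plausibility scores from three
# stand-in "embedding models" (plain lookup tables in this sketch).
c0 = ("paris", "capital_of", "france")
c1 = ("paris", "capital_of", "spain")
c2 = ("lyon", "capital_of", "france")
candidates = [c0, c1, c2]

member_a = {c0: 0.9, c1: 0.2, c2: 0.4}
member_b = {c0: 0.7, c1: 0.1, c2: 0.8}
member_c = {c0: 0.6, c1: 0.3, c2: 0.5}
scorers = [member_a.get, member_b.get, member_c.get]

top_two = committee_top_k(candidates, scorers, k=2)
print(top_two)
```

Aggregating ranks rather than raw scores avoids having to calibrate score scales across different embedding models, which is one reason this simple scheme was chosen for the sketch.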


Principles and practice in verifying rule-based systems

The Knowledge Engineering Review ◽

10.1017/s026988890000624x ◽

1992 ◽

Vol 7 (2) ◽

pp. 115-141 ◽

Cited By ~ 40

Author(s):

Alun D. Preece ◽

Rajjan Shinghal ◽

Aïda Batarekh

Keyword(s):

Expert System ◽

Knowledge Base ◽

State Of The Art ◽

Knowledge Bases ◽

Order Logic ◽

First Order Logic ◽

Rule Based ◽

First Order ◽

Rule Bases ◽

System Knowledge

This paper surveys the verification of expert system knowledge bases by detecting anomalies. Such anomalies are highly indicative of errors in the knowledge base. The paper is in two parts. The first part describes four types of anomaly: redundancy, ambivalence, circularity, and deficiency. We consider rule bases which are based on first-order logic, and explain the anomalies in terms of the syntax and semantics of logic. The second part presents a review of five programs which have been built to detect various subsets of the anomalies. The four anomalies provide a framework for comparing the capabilities of the five tools, and we highlight the strengths and weaknesses of each approach. This paper therefore provides not only a set of underlying principles for performing knowledge base verification through anomaly detection, but also a survey of the state-of-the-art in building practical tools for carrying out such verification. The reader of this paper is expected to be familiar with first-order logic.
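
Two of the four anomaly types can be illustrated with a small sketch. It assumes a propositional simplification in which each rule is a set of premises and a single conclusion, which is much weaker than the first-order setting the paper covers; the rules and function names are hypothetical, and ambivalence and deficiency are not handled.

```python
def subsumed_rules(rules):
    """Redundancy check: return rules whose conclusion already follows
    from another rule with a strict subset of the same premises."""
    redundant = []
    for i, (prem_i, concl_i) in enumerate(rules):
        for j, (prem_j, concl_j) in enumerate(rules):
            if i != j and concl_i == concl_j and prem_j < prem_i:
                redundant.append((prem_i, concl_i))
                break
    return redundant


def has_cycle(rules):
    """Circularity check: depth-first search for a cycle in the
    premise -> conclusion dependency graph."""
    graph = {}
    for premises, conclusion in rules:
        for premise in premises:
            graph.setdefault(premise, set()).add(conclusion)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: the node depends on itself
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


# Hypothetical rule base: (premises, conclusion).
rules = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"fever"}), "flu"),  # makes the first rule redundant
    (frozenset({"flu"}), "rest"),
]
circular_rules = rules + [(frozenset({"rest"}), "fever")]

print(subsumed_rules(rules), has_cycle(rules), has_cycle(circular_rules))
```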


Refining Automatically Extracted Knowledge Bases Using Crowdsourcing

Computational Intelligence and Neuroscience ◽

10.1155/2017/4092135 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17

Author(s):

Chunhua Li ◽

Pengpeng Zhao ◽

Victor S. Sheng ◽

Xuefeng Xian ◽

Jian Wu ◽

...

Keyword(s):

Knowledge Base ◽

State Of The Art ◽

Knowledge Bases ◽

Important Research ◽

Semantic Constraints ◽

Knowledge Base Refinement ◽

Automatic Methods ◽

Automated Algorithms ◽

Research Challenge

Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and do inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to conduct crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods under a reasonable crowdsourcing cost.


End-to-End Structure-Aware Convolutional Networks for Knowledge Base Completion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013060 ◽

2019 ◽

Vol 33 ◽

pp. 3060-3067 ◽

Cited By ~ 20

Author(s):

Chao Shang ◽

Yun Tang ◽

Jing Huang ◽

Jinbo Bi ◽

Xiaodong He ◽

...

Keyword(s):

Knowledge Base ◽

State Of The Art ◽

The State ◽

Graph Connectivity ◽

Knowledge Graph ◽

Graph Node ◽

Convolutional Network ◽

Node Attributes ◽

Knowledge Graphs ◽

End To End

Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult, and others to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be trained efficiently and scales to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embeddings by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Network (SACN) that combines the benefits of GCN and ConvE. SACN consists of an encoder, a weighted graph convolutional network (WGCN), and a decoder, a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes, and edge relation types. It has learnable weights that adapt the amount of information from neighbors used in local aggregation, leading to more accurate embeddings of graph nodes. Node attributes in the graph are represented as additional nodes in the WGCN. The decoder Conv-TransE enables the state-of-the-art ConvE to be translational between entities and relations while keeping the same link prediction performance as ConvE. We demonstrate the effectiveness of the proposed SACN on the standard FB15k-237 and WN18RR datasets, and it gives about 10% relative improvement over the state-of-the-art ConvE in terms of HITS@1, HITS@3, and HITS@10.
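
For readers unfamiliar with the translational property and the HITS@k metric mentioned above, the sketch below scores triples in the TransE style (head plus relation should land near tail) and computes HITS@k from ranks. It is not SACN or Conv-TransE; the two-dimensional embeddings, entity names, and function names are hypothetical, and the L2 norm is one of the usual choices.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail.
    A higher score means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


def rank_of_tail(true_tail, head, relation, entities):
    """1-based rank of the true tail among all candidate tail entities."""
    ordered = sorted(entities,
                     key=lambda name: transe_score(head, relation, entities[name]),
                     reverse=True)
    return ordered.index(true_tail) + 1


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical toy embeddings.
entities = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "spain": [0.0, 2.0],
}
capital_of = [0.0, 1.0]

rank = rank_of_tail("france", entities["paris"], capital_of, entities)
print(rank, hits_at_k([1, 3, 2, 7], k=3))
```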


Explainable Reasoning over Knowledge Graphs for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015329 ◽

2019 ◽

Vol 33 ◽

pp. 5329-5336 ◽

Cited By ~ 63

Author(s):

Xiang Wang ◽

Dingxian Wang ◽

Canran Xu ◽

Xiangnan He ◽

Yixin Cao ◽

...

Keyword(s):

Knowledge Base ◽

State Of The Art ◽

Recurrent Network ◽

User Preferences ◽

Knowledge Graph ◽

Complementary Information ◽

Sequential Dependencies ◽

Factorization Machine ◽

Knowledge Graphs ◽

Collaborative Knowledge

Incorporating knowledge graphs into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user’s interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graphs for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movies and music, demonstrating significant improvements over the state-of-the-art solutions Collaborative Knowledge Base Embedding and Neural Factorization Machine.
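
The abstract does not give the pooling formula, so the sketch below assumes one plausible form: a temperature-scaled log-sum-exp over per-path scores, with softmax weights showing how much each path contributes. The path scores, the temperature parameter `gamma`, and the function names are hypothetical.

```python
import math


def weighted_pooling(path_scores, gamma=1.0):
    """Temperature-scaled log-sum-exp over path scores (assumed form).
    A small gamma approaches max pooling of the scaled scores."""
    scaled = [score / gamma for score in path_scores]
    peak = max(scaled)  # subtract the max for numerical stability
    return peak + math.log(sum(math.exp(s - peak) for s in scaled))


def path_weights(path_scores, gamma=1.0):
    """Softmax weights indicating each path's share of the pooled score."""
    scaled = [score / gamma for score in path_scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three paths connecting one user to one item.
scores = [2.0, 0.5, -1.0]
pooled = weighted_pooling(scores, gamma=1.0)
weights = path_weights(scores, gamma=1.0)
print(pooled, weights)
```

Inspecting the weights is what gives this kind of pooling its explanatory value: the path with the largest weight is the one that most strongly supports the predicted user-item interaction.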


LENA: Locality-Expanded Neural Embedding for Knowledge Base Completion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33012895 ◽

2019 ◽

Vol 33 ◽

pp. 2895-2902

Author(s):

Fanshuang Kong ◽

Richong Zhang ◽

Yongyi Mao ◽

Ting Deng

Keyword(s):

Knowledge Base ◽

Structural Information ◽

Relevant Information ◽

Knowledge Bases ◽

Loss Functions ◽

Knowledge Graph ◽

Global Information ◽

Sufficient Statistic ◽

Proposed Model ◽

Significant Research

Embedding-based models for knowledge base completion have demonstrated great success and attracted significant research interest. In this work, we observe that existing embedding models all have their loss functions decomposed into atomic loss functions, each on a triple or a postulated edge in the knowledge graph. Such an approach essentially implies that, conditioned on the embeddings of the triple, whether the triple is factual is independent of the structure of the knowledge graph. Although arguably the embeddings of the entities and relation in the triple contain certain structural information of the knowledge base, we believe that the global information contained in the embeddings of the triple can be insufficient and such an assumption is overly optimistic in heterogeneous knowledge bases. Motivated by this understanding, in this work we propose a new embedding model in which we discard the assumption that the embeddings of the entities and relation in a triple are a sufficient statistic for the triple’s factual existence. More specifically, the proposed model assumes that whether a triple is factual depends not only on the embedding of the triple but also on the embeddings of the entities and relations in a larger graph neighbourhood. In this model, attention mechanisms are constructed to select the relevant information in the graph neighbourhood so that irrelevant signals in the neighbourhood are suppressed. Termed locality-expanded neural embedding with attention (LENA), this model is tested on four standard datasets and compared with several state-of-the-art models for knowledge base completion. Extensive experiments suggest that LENA outperforms the existing models in virtually every metric.
