Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases

The nearest neighbor random walk on subspaces of a vector space and rate of convergence

Journal of Theoretical Probability ◽

10.1007/bf02212882 ◽

1995 ◽

Vol 8 (2) ◽

pp. 321-346 ◽

Cited By ~ 1

Author(s):

Anthony J. D'Aristotile

Keyword(s):

Random Walk ◽

Vector Space ◽

Rate Of Convergence ◽

Nearest Neighbor


Combining Fact Extraction and Verification with Neural Semantic Matching Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016859 ◽

2019 ◽

Vol 33 ◽

pp. 6859-6866 ◽

Cited By ~ 7

Author(s):

Yixin Nie ◽

Haonan Chen ◽

Mohit Bansal

Keyword(s):

Vector Space ◽

Semantic Relatedness ◽

Document Retrieval ◽

Knowledge Bases ◽

Semantic Matching ◽

Matching Problem ◽

Matching Networks ◽

Three Stages ◽

Matching Models ◽

Semantic Awareness

The increasing concern with misinformation has stimulated research efforts on automatic fact checking. The recently released FEVER dataset introduced a benchmark fact-verification task in which a system is asked to verify a claim using evidential sentences from Wikipedia documents. In this paper, we present a connected system consisting of three homogeneous neural semantic matching models that conduct document retrieval, sentence selection, and claim verification jointly for fact extraction and verification. For evidence retrieval (document retrieval and sentence selection), unlike traditional vector space IR models in which queries and sources are matched in some pre-designed term vector space, we develop neural models to perform deep semantic matching from raw textual input, assuming no intermediate term representation and no access to structured external knowledge bases. We also show that Pageview frequency can help improve the performance of evidence retrieval results, which can later be matched using our neural semantic matching network. For claim verification, unlike previous approaches that simply feed upstream retrieved evidence and the claim to a natural language inference (NLI) model, we further enhance the NLI model by providing it with internal semantic relatedness scores (hence integrating it with the evidence retrieval modules) and ontological WordNet features. Experiments on the FEVER dataset indicate that (1) our neural semantic matching method outperforms popular TF-IDF and encoder models by significant margins on all evidence retrieval metrics, (2) the additional relatedness score and WordNet features improve the NLI model via better semantic awareness, and (3) by formalizing all three subtasks as a similar semantic matching problem and improving on all three stages, the complete model is able to achieve state-of-the-art results on the FEVER test set (two times greater than baseline results).
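For readers unfamiliar with the "traditional vector space IR models" that this abstract contrasts against, the following is a minimal sketch of TF-IDF matching with cosine similarity. It is not taken from the paper: the function names (`tfidf_vectors`, `cosine`, `rank`), the smoothed IDF formula, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for tokenized docs, plus the IDF table."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): every known term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


# Toy collection (hypothetical data, not from FEVER).
docs = [
    ["barack", "obama", "president"],
    ["paris", "capital", "france"],
    ["obama", "born", "hawaii"],
]
ranking = rank(["obama", "hawaii"], docs)
print(ranking)
```

Matching here depends entirely on shared surface terms, which is the limitation the abstract's neural semantic matching models are meant to address.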


Automated Rule Base Completion as Bayesian Concept Induction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016228 ◽

2019 ◽

Vol 33 ◽

pp. 6228-6235 ◽

Cited By ~ 2

Author(s):

Zied Bouraoui ◽

Steven Schockaert

Keyword(s):

Vector Space ◽

Inductive Reasoning ◽

Knowledge Bases ◽

Space Representation ◽

Rule Base ◽

Gaussian Distributions ◽

Concept Induction ◽

Model Rule ◽

The Given ◽

Vector Space Representation

Considerable attention has recently been devoted to the problem of automatically extending knowledge bases by applying some form of inductive reasoning. While the vast majority of existing work is centred around so-called knowledge graphs, in this paper we consider a setting where the input consists of a set of (existential) rules. To this end, we exploit a vector space representation of the considered concepts, which is partly induced from the rule base itself and partly from a pre-trained word embedding. Inspired by recent approaches to concept induction, we then model rule templates in this vector space embedding using Gaussian distributions. Unlike many existing approaches, we learn rules by directly exploiting regularities in the given rule base, and do not require that a database with concept and relation instances is given. As a result, our method can be applied to a wide variety of ontologies. We present experimental results that demonstrate the effectiveness of our method.
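As a rough illustration of what modelling a set of concepts with a Gaussian distribution in an embedding space can look like, the sketch below fits a diagonal Gaussian to a few concept vectors and scores candidates by log-density. This is a simplified assumption, not the authors' Bayesian model: the diagonal covariance, the variance floor `min_var`, the function names, and the two-dimensional toy vectors are all hypothetical.

```python
import math


def fit_diag_gaussian(vectors, min_var=1e-3):
    """Fit a per-dimension mean and variance to a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    # Floor the variance so that a few near-identical examples do not
    # produce a degenerate (zero-width) distribution.
    var = [
        max(sum((v[j] - mean[j]) ** 2 for v in vectors) / n, min_var)
        for j in range(dim)
    ]
    return mean, var


def log_density(x, mean, var):
    """Log-density of x under the diagonal Gaussian (mean, var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


# Hypothetical embeddings of concepts known to fit one slot of a rule template.
known = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]]
mean, var = fit_diag_gaussian(known)

near = [1.0, 0.05]   # candidate close to the known concepts
far = [-1.0, 1.0]    # candidate far from them
print(log_density(near, mean, var) > log_density(far, mean, var))
```

Under this reading, a candidate concept is a plausible filler for a template slot when its vector has high density under the distribution fitted to the known fillers.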


Tracking random walk systems with vector space adaptive filters

IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing ◽

10.1109/82.404086 ◽

1995 ◽

Vol 42 (8) ◽

pp. 543-547 ◽

Cited By ~ 8

Author(s):

G.A. Williamson

Keyword(s):

Random Walk ◽

Vector Space ◽

Adaptive Filters


Graph-based entity-oriented search

ACM SIGIR Forum ◽

10.1145/3476415.3476430 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

José Devezas

Keyword(s):

Random Walk ◽

Ad Hoc ◽

Evaluation Process ◽

Document Retrieval ◽

Knowledge Bases ◽

Ranking Function ◽

Test Collection ◽

Information Need ◽

Keyword Query ◽

Walk Score

Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless process of search. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need for the combination of corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes indirectly accessed through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion. At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes, or based on the expansion to adjacent entity nodes. 
The score is computed based on a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesigned this model in a way that reduced the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we called hypergraph-of-entity, is the main contribution of this thesis. The obtained reduction was achieved by replacing binary edges with n-ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, either based on cooccurrence or a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes, similar to the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, which we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity. For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of the TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in the TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. 
Our main experiments were based on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments solely focused on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval, and proposing a new path towards generality in this area.
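The idea of a random walk score computed from seed nodes, with walk length and number of repeats as parameters, can be sketched as follows. This is only an illustrative approximation under stated assumptions, not the thesis's implementation: it uses undirected hyperedges only, uniform choices at each step, and omits seed expansion, weights, direction, and fatigue. The function name, the node labels, and the toy hypergraph are hypothetical.

```python
import random
from collections import Counter


def random_walk_scores(hyperedges, seeds, length=3, repeats=1000, rng=None):
    """Count node visits over repeated fixed-length walks started at each seed.

    hyperedges maps a hyperedge name to the set of nodes it connects.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    # Sorted member lists give a stable order for rng.choice.
    members = {name: sorted(nodes) for name, nodes in hyperedges.items()}
    incident = {}
    for name, nodes in members.items():
        for node in nodes:
            incident.setdefault(node, []).append(name)

    visits = Counter()
    for seed in seeds:
        if seed not in incident:
            continue  # a seed that is not in the hypergraph contributes nothing
        for _ in range(repeats):
            node = seed
            for _ in range(length):
                edge = rng.choice(incident[node])  # step to an incident hyperedge
                node = rng.choice(members[edge])   # then to one of its nodes
                visits[node] += 1
    return visits


# Toy hypergraph: term nodes ("t:") and entity nodes ("e:") grouped per document.
hyperedges = {
    "d1": {"t:walk", "e:Graph"},
    "d2": {"t:walk", "e:Graph", "e:Hyper"},
    "d3": {"e:Other", "t:unrelated"},
}
scores = random_walk_scores(hyperedges, seeds=["t:walk"])
print(scores.most_common())
```

Nodes that share more hyperedges with the seed accumulate more visits, while nodes unreachable from the seed are never visited, which is one way to see why the abstract describes the scoring as heavily dependent on the structure of the hypergraph.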
