Graph-based entity-oriented search

2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
José Devezas

Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless search process. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need to combine corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes accessed indirectly through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion.

At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes or through expansion to adjacent entity nodes. The score is computed from a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesign this model in a way that reduces the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we call the hypergraph-of-entity, is the main contribution of this thesis. This reduction is achieved by replacing binary edges with n-ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, based either on co-occurrence or on a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes as the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, an approach we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity.
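To make the joint representation concrete, the following is a minimal Python sketch of a hypergraph-of-entity-style index with the three hyperedge types named in the abstract (document hyperedges, entity co-occurrence hyperedges, and directed term-to-entity hyperedges). The class and method names are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of a hypergraph-of-entity-style index (illustrative only;
# names and structure are assumptions, not the thesis implementation).
from collections import defaultdict

class HypergraphOfEntity:
    """Toy joint index of term and entity nodes connected by n-ary hyperedges."""

    def __init__(self):
        self.nodes = set()                      # term and entity node ids
        self.hyperedges = []                    # (type, members, directed_targets)
        self.node_to_edges = defaultdict(set)   # incidence index: node -> edge ids

    def _add_edge(self, edge_type, members, directed_to=()):
        edge_id = len(self.hyperedges)
        self.hyperedges.append((edge_type, frozenset(members), frozenset(directed_to)))
        for node in set(members) | set(directed_to):
            self.nodes.add(node)
            self.node_to_edges[node].add(edge_id)

    def index_document(self, terms, entities):
        """terms: list of term ids; entities: dict of entity id -> list of name terms."""
        # Undirected document hyperedge over all terms and entities of the document.
        self._add_edge("document", list(terms) + list(entities))
        # Undirected hyperedge grouping entities that co-occur in the document.
        if len(entities) > 1:
            self._add_edge("related_to", list(entities))
        # Directed hyperedges mapping the terms of an entity's name to that entity.
        for entity, name_terms in entities.items():
            self._add_edge("contained_in", name_terms, directed_to=[entity])
```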
For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of the TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in the TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. Our main experiments were based on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments focused solely on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval and proposing a new path towards generality in this area.
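In the same spirit, here is a minimal sketch of a random-walk score over the hypergraph sketched above. The length and repeats parameters mirror those named in the abstract; the sampling details (uniform choice of incident hyperedges and of nodes within them) are assumptions. Seed nodes would correspond to the query's term nodes, optionally expanded to adjacent entity nodes.

```python
import random
from collections import Counter

def random_walk_score(hg, seed_nodes, length=2, repeats=100, rng=None):
    """Score nodes by how often short random walks started from the seed nodes
    visit them. 'hg' is a HypergraphOfEntity as sketched above; only 'length'
    and 'repeats' come from the abstract, the rest is assumed."""
    rng = rng or random.Random(0)
    visits = Counter()
    for seed in seed_nodes:
        for _ in range(repeats):
            node = seed
            for _ in range(length):
                edges = hg.node_to_edges.get(node)
                if not edges:
                    break
                # Pick an incident hyperedge uniformly, then hop to another node in it.
                _, members, targets = hg.hyperedges[rng.choice(sorted(edges))]
                candidates = list((members | targets) - {node})
                if not candidates:
                    break
                node = rng.choice(candidates)
                visits[node] += 1
    total = sum(visits.values()) or 1
    return {n: c / total for n, c in visits.items()}
```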

2017 ◽  
Vol 60 ◽  
pp. 1127-1164 ◽  
Author(s):  
Ran Ben Basat ◽  
Moshe Tennenholtz ◽  
Oren Kurland

The main goal of search engines is ad hoc retrieval: ranking documents in a corpus by their relevance to the information need expressed by a query. The Probability Ranking Principle (PRP), which ranks the documents by their relevance probabilities, is the theoretical foundation of most existing ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects; specifically, changes to documents that result from a given ranking. Yet, in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that, indeed, the PRP can be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game-theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single-topic and multi-topic) and is based on different assumptions about the writing qualities of documents' authors. We show that, in some cases, introducing randomization into the document ranking function yields an overall user utility that exceeds that of applying the PRP.
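To illustrate the contrast between deterministic PRP ranking and a randomized ranking function, here is a small hedged sketch; the Plackett-Luce-style sampling is only one example of introducing randomization and is not the paper's game-theoretic construction.

```python
import math
import random

def prp_ranking(docs_with_scores):
    """Deterministic PRP: order documents by estimated relevance probability."""
    return [doc for doc, _ in sorted(docs_with_scores, key=lambda d: d[1], reverse=True)]

def randomized_ranking(docs_with_scores, temperature=0.5, rng=None):
    """Sample a ranking in which higher-probability documents tend to be placed
    first (Plackett-Luce-style sampling). Illustrative only."""
    rng = rng or random.Random(42)
    remaining = list(docs_with_scores)
    ranking = []
    while remaining:
        weights = [math.exp(score / temperature) for _, score in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        ranking.append(remaining.pop(pick)[0])
    return ranking

# Example: PRP always returns the same order for the same estimates, while the
# randomized function occasionally promotes lower-probability documents.
docs = [("d1", 0.9), ("d2", 0.7), ("d3", 0.4)]
print(prp_ranking(docs))
print(randomized_ranking(docs))
```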


2022 ◽  
Vol 40 (3) ◽  
pp. 1-30
Author(s):  
Procheta Sen ◽  
Debasis Ganguly ◽  
Gareth J. F. Jones

Reducing user effort in finding relevant information is one of the key objectives of search systems. Existing approaches have been shown to effectively exploit the context from the current search session of users for automatically suggesting queries to reduce their search effort. However, these approaches do not accomplish the end goal of a search system, that of retrieving a set of potentially relevant documents for the evolving information need during a search session. This article takes the problem of query prediction one step further by investigating the problem of contextual recommendation within a search session. More specifically, given the partial context information of a session in the form of a small number of queries, we investigate how a search system can effectively predict the documents that a user would have been presented with had they continued the search session by submitting subsequent queries. To address the problem, we propose a model of contextual recommendation that seeks to capture the underlying semantics of information need transitions of a current user's search context. This model leverages information from similar past interactions of other users recorded in an existing search log. To identify similar interactions, as a novel contribution, we propose an embedding approach that jointly learns representations of both individual query terms and of queries in their entirety from search log data by leveraging session-level containment relationships. Our experiments, conducted on a large query log (the AOL log), demonstrate that using a joint embedding of queries and their terms within our proposed document retrieval framework outperforms a number of text-only and sequence-modeling-based baselines.
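As a rough illustration of jointly embedding whole queries and their terms from session data, the sketch below builds one token sequence per session containing both query-level tokens and term-level tokens, and then derives embeddings from PPMI co-occurrence statistics. The tokenisation, the PPMI/SVD training method, and all names are assumptions; the paper's own model is not reproduced here.

```python
import numpy as np

def session_sequences(sessions):
    """Turn each search session (an ordered list of query strings) into one token
    sequence containing both whole-query tokens and their individual terms, so
    that queries and terms share one embedding space. Tokenisation is illustrative."""
    sequences = []
    for queries in sessions:
        seq = []
        for q in queries:
            seq.append("Q::" + q)          # whole-query token
            seq.extend(q.lower().split())  # its individual terms
        sequences.append(seq)
    return sequences

def ppmi_embeddings(sequences, dim=50, window=4):
    """Co-occurrence counts -> PPMI matrix -> truncated SVD embeddings. A simple
    stand-in for any word-embedding trainer; the joint vocabulary of query tokens
    and term tokens is the point, not the training method."""
    vocab = {t: i for i, t in enumerate(sorted({t for s in sequences for t in s}))}
    counts = np.zeros((len(vocab), len(vocab)))
    for seq in sequences:
        for i, t in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if i != j:
                    counts[vocab[t], vocab[seq[j]]] += 1.0
    total = counts.sum() or 1.0
    row = counts.sum(axis=1, keepdims=True) + 1e-12
    col = counts.sum(axis=0, keepdims=True) + 1e-12
    ppmi = np.maximum(np.log((counts * total) / (row * col) + 1e-12), 0.0)
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return vocab, u[:, :dim] * s[:dim]
```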


2012 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Pretorius ◽  
C. Le Roux

Purpose: To determine the level of sustainability embeddedness in strategising by investigating the public and external communication of companies. Problem investigated: The extent to which sustainability is embedded in the elements of strategy formulation and implementation, and not merely in surface-level statements and claims. Design: The researchers designed a measurement tool and scale, the Strategising for Sustainability Index (SSI), based on researched elements of strategising and recent literature on the topic of sustainability and strategy integration. Merit for strategising for sustainability was given to a company on the basis of its fulfilling the relevant criteria. The JSE Top 40 listed companies on the All Share Index as of March 2011 were selected as a purposive sample. Each company's data and each element of the scorecard were judged on a Likert-type five-point scale, with higher scores indicating higher levels of embeddedness in the strategy. A comprehensive evaluation sheet was used to judge the presented data individually and independently for each element of the scorecard instrument. Findings: Ten elements were found relevant, representing compliance (2 elements), strategy formulation (4 elements) and strategy implementation (4 elements). The findings show wide variation in overall scores. Almost all companies satisfied the compliance requirements, but variations were observed in both formulation and implementation embeddedness. The SSI tool has discrimination value despite a relatively complex judging process. The proposed SSI measurement challenges other determinants of sustainability performance, as it incorporates the embeddedness of sustainability in strategising. Knowing this level could guide management towards directing resources away from 'over-invested' strengths related to sustainability. Considering the scores for the different elements of the instrument would help to prioritise the 'sustainability spend'. Furthermore, the SSI tool directs attention to how sustainability is incorporated in the strategising process. Originality and value: The level of embeddedness of sustainability in strategising has not been measured before. This study addresses the possible 'window-dressing' claims surrounding sustainability and highlights those companies that have successfully demonstrated that sustainability is not just for reputation purposes but is, in fact, part of their operating as a listed company. Conclusion: Firstly, it was possible to use the SSI framework and the evaluation process and apply them to the sustainability reporting and claims of each firm. Secondly, each element could be judged on the unique scale for the specific element. The SSI measurement tool can be used to describe the level of strategising embeddedness. The SSI tool's framework is based on input from the literature and on the foundation of strategic principles. The five-point scale on the SSI tool serves to describe a company's achievement for each element. The sustainability claims of these companies varied in their embeddedness in the process of strategising. Scores were lower for the formulation elements, raising the question of whether some projects are implemented ad hoc to score points, without necessarily being formulated as part of strategy.
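A minimal sketch of how the scorecard aggregation described above might look, assuming the ten elements and three categories named in the abstract; the equal weighting and the 0-100 normalisation are assumptions rather than the published instrument.

```python
# Illustrative aggregation for a Strategising for Sustainability Index (SSI).
# The three categories and ten elements follow the abstract; equal weighting
# and the 0-100 normalisation are assumptions, not the published instrument.
SSI_ELEMENTS = {
    "compliance": ["compliance_1", "compliance_2"],
    "formulation": ["formulation_1", "formulation_2", "formulation_3", "formulation_4"],
    "implementation": ["implementation_1", "implementation_2",
                       "implementation_3", "implementation_4"],
}

def ssi_score(ratings):
    """ratings: dict of element -> Likert score in 1..5. Returns per-category
    means and an overall score normalised to 0-100."""
    per_category = {}
    for category, elements in SSI_ELEMENTS.items():
        scores = [ratings[e] for e in elements]
        per_category[category] = sum(scores) / len(scores)
    overall = sum(ratings[e] for es in SSI_ELEMENTS.values() for e in es)
    max_possible = 5 * sum(len(es) for es in SSI_ELEMENTS.values())
    return per_category, 100.0 * overall / max_possible
```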


2020 ◽  
Vol 10 (12) ◽  
pp. 4316 ◽  
Author(s):  
Ivan Boban ◽  
Alen Doko ◽  
Sven Gotovac

Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, methods developed for document retrieval, such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods, are also used for sentence retrieval. The effect of partial word matching on sentence retrieval is an issue that has not been analyzed. We believe there is substantial potential for improving sentence retrieval methods if this approach is considered. We adapted TF-ISF, BM25, and language modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval. However, we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
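As a hedged illustration of combining sentence retrieval with sequence similarity, the sketch below adapts a TF-ISF-style score so that a query term can be satisfied by a sufficiently similar, not necessarily identical, sentence word. The similarity threshold and the exact scoring formula are assumptions, not the authors' adaptation.

```python
import math
from difflib import SequenceMatcher

def partial_match(term_a, term_b, threshold=0.8):
    """Treat two words as matching if their character-sequence similarity is at
    least 'threshold' (the threshold value is an assumption)."""
    return SequenceMatcher(None, term_a, term_b).ratio() >= threshold

def tf_isf_partial(query_terms, sentence, all_sentences, threshold=0.8):
    """TF-ISF-style score in which a query term may be matched by a similar,
    not necessarily identical, sentence word. Illustrative adaptation only;
    'sentence' and 'all_sentences' are lists of lowercased words."""
    score = 0.0
    n = len(all_sentences)
    for q in query_terms:
        tf = sum(1 for w in sentence if partial_match(q, w, threshold))
        if tf == 0:
            continue
        sf = sum(1 for s in all_sentences
                 if any(partial_match(q, w, threshold) for w in s))
        isf = math.log((n + 1) / (sf + 1)) + 1.0
        score += math.log(1 + tf) * isf
    return score
```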


2018 ◽  
Vol 14 (3) ◽  
pp. 299-316 ◽  
Author(s):  
Chang-Sup Park

Purpose: This paper aims to propose a new keyword search method on graph data to improve the relevance of search results and reduce duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to finding answer trees with different root nodes and thus often generate results consisting of answer trees with low relevance to the query or with duplicate content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results. Design/methodology/approach: A measure of redundancy in a set of answer trees with regard to their root nodes is defined, and according to this metric, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, using inverted lists and a hash map, is suggested. Then, based on the path index, a top-k query processing algorithm is presented to find the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees. Findings: The results of experiments using real graph datasets show that the proposed approach can produce effective query answers which are more diverse in their content nodes and more relevant to the query than those of the previous approach based on distinct root semantics. Originality/value: This paper is the first to take redundancy in the root nodes of answer trees into account, improving the relevance of query results and reducing content-node duplication compared with the previous distinct root semantics. It can satisfy users' various information needs on large and complex graph data using keyword-based queries.
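A minimal sketch of the selection step under limited root redundancy: given candidate answer trees with relevance scores, pick the top-k while capping how many selected trees may share the same root node. The greedy strategy and data layout are assumptions; the paper's path index and query processing algorithm are not reproduced.

```python
from collections import Counter

def top_k_with_limited_root_redundancy(candidates, k, max_per_root):
    """Greedily select up to k answer trees, allowing at most 'max_per_root'
    trees with the same root node. 'candidates' is a list of
    (root_node, relevance_score, answer_tree) tuples; illustrative only."""
    selected = []
    root_counts = Counter()
    for root, score, tree in sorted(candidates, key=lambda c: c[1], reverse=True):
        if root_counts[root] < max_per_root:
            selected.append((root, score, tree))
            root_counts[root] += 1
        if len(selected) == k:
            break
    return selected
```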


1997 ◽  
Vol 1588 (1) ◽  
pp. 104-109 ◽  
Author(s):  
Gary S. Spring

Expert system validation—that is, testing systems to ascertain whether they achieve acceptable performance levels—has with few exceptions been ad hoc, informal, and of dubious value. Very few efforts have been made in this regard in the transportation area. A discussion of the major issues involved in validating expert systems is provided, as is a review of the work that has been done in this area. The review includes a definition of validation within the context of the overall evaluation process, descriptions and critiques of several approaches to validation, and descriptions of guidelines that have been developed for this purpose.

