Graph-based entity-oriented search

2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
José Devezas

Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless search process. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need to combine corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes accessed indirectly through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion.

At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes or through expansion to adjacent entity nodes. The score is computed from a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesign this model in a way that reduces the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we call the hypergraph-of-entity, is the main contribution of this thesis. This reduction is achieved by replacing binary edges with n-ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, based either on co-occurrence or on a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes as the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, an approach we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity.
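To make the joint representation concrete, the following is a minimal Python sketch of a hypergraph-of-entity-style index with the three hyperedge types named in the abstract (document hyperedges, entity co-occurrence hyperedges, and directed term-to-entity hyperedges). The class and method names are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of a hypergraph-of-entity-style index (illustrative only;
# names and structure are assumptions, not the thesis implementation).
from collections import defaultdict

class HypergraphOfEntity:
    """Toy joint index of term and entity nodes connected by n-ary hyperedges."""

    def __init__(self):
        self.nodes = set()                      # term and entity node ids
        self.hyperedges = []                    # (type, members, directed_targets)
        self.node_to_edges = defaultdict(set)   # incidence index: node -> edge ids

    def _add_edge(self, edge_type, members, directed_to=()):
        edge_id = len(self.hyperedges)
        self.hyperedges.append((edge_type, frozenset(members), frozenset(directed_to)))
        for node in set(members) | set(directed_to):
            self.nodes.add(node)
            self.node_to_edges[node].add(edge_id)

    def index_document(self, terms, entities):
        """terms: list of term ids; entities: dict of entity id -> list of name terms."""
        # Undirected document hyperedge over all terms and entities of the document.
        self._add_edge("document", list(terms) + list(entities))
        # Undirected hyperedge grouping entities that co-occur in the document.
        if len(entities) > 1:
            self._add_edge("related_to", list(entities))
        # Directed hyperedges mapping the terms of an entity's name to that entity.
        for entity, name_terms in entities.items():
            self._add_edge("contained_in", name_terms, directed_to=[entity])
```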
For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of the TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in the TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. Our main experiments were based on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments focused solely on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval and proposing a new path towards generality in this area.
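In the same spirit, here is a minimal sketch of a random-walk score over the hypergraph sketched above. The length and repeats parameters mirror those named in the abstract; the sampling details (uniform choice of incident hyperedges and of nodes within them) are assumptions. Seed nodes would correspond to the query's term nodes, optionally expanded to adjacent entity nodes.

```python
import random
from collections import Counter

def random_walk_score(hg, seed_nodes, length=2, repeats=100, rng=None):
    """Score nodes by how often short random walks started from the seed nodes
    visit them. 'hg' is a HypergraphOfEntity as sketched above; only 'length'
    and 'repeats' come from the abstract, the rest is assumed."""
    rng = rng or random.Random(0)
    visits = Counter()
    for seed in seed_nodes:
        for _ in range(repeats):
            node = seed
            for _ in range(length):
                edges = hg.node_to_edges.get(node)
                if not edges:
                    break
                # Pick an incident hyperedge uniformly, then hop to another node in it.
                _, members, targets = hg.hyperedges[rng.choice(sorted(edges))]
                candidates = list((members | targets) - {node})
                if not candidates:
                    break
                node = rng.choice(candidates)
                visits[node] += 1
    total = sum(visits.values()) or 1
    return {n: c / total for n, c in visits.items()}
```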

2017 ◽  
Vol 60 ◽  
pp. 1127-1164 ◽  
Author(s):  
Ran Ben Basat ◽  
Moshe Tennenholtz ◽  
Oren Kurland

The main goal of search engines is ad hoc retrieval: ranking documents in a corpus by their relevance to the information need expressed by a query. The Probability Ranking Principle (PRP), which ranks the documents by their relevance probabilities, is the theoretical foundation of most existing ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects; specifically, changes to documents that result from a given ranking. Yet, in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that, indeed, the PRP can be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game-theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single-topic and multi-topic) and is based on different assumptions about the writing qualities of documents' authors. We show that, in some cases, introducing randomization into the document ranking function yields an overall user utility that exceeds that of applying the PRP.
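To illustrate the contrast between deterministic PRP ranking and a randomized ranking function, here is a small hedged sketch; the Plackett-Luce-style sampling is only one example of introducing randomization and is not the paper's game-theoretic construction.

```python
import math
import random

def prp_ranking(docs_with_scores):
    """Deterministic PRP: order documents by estimated relevance probability."""
    return [doc for doc, _ in sorted(docs_with_scores, key=lambda d: d[1], reverse=True)]

def randomized_ranking(docs_with_scores, temperature=0.5, rng=None):
    """Sample a ranking in which higher-probability documents tend to be placed
    first (Plackett-Luce-style sampling). Illustrative only."""
    rng = rng or random.Random(42)
    remaining = list(docs_with_scores)
    ranking = []
    while remaining:
        weights = [math.exp(score / temperature) for _, score in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        ranking.append(remaining.pop(pick)[0])
    return ranking

# Example: PRP always returns the same order for the same estimates, while the
# randomized function occasionally promotes lower-probability documents.
docs = [("d1", 0.9), ("d2", 0.7), ("d3", 0.4)]
print(prp_ranking(docs))
print(randomized_ranking(docs))
```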


2022 ◽  
Vol 40 (3) ◽  
pp. 1-30
Author(s):  
Procheta Sen ◽  
Debasis Ganguly ◽  
Gareth J. F. Jones

Reducing user effort in finding relevant information is one of the key objectives of search systems. Existing approaches have been shown to effectively exploit the context from the current search session of users for automatically suggesting queries to reduce their search effort. However, these approaches do not accomplish the end goal of a search system, that of retrieving a set of potentially relevant documents for the evolving information need during a search session. This article takes the problem of query prediction one step further by investigating the problem of contextual recommendation within a search session. More specifically, given the partial context information of a session in the form of a small number of queries, we investigate how a search system can effectively predict the documents that a user would have been presented with had they continued the search session by submitting subsequent queries. To address the problem, we propose a model of contextual recommendation that seeks to capture the underlying semantics of information need transitions of a current user's search context. This model leverages information from similar past interactions of other users recorded in an existing search log. To identify similar interactions, as a novel contribution, we propose an embedding approach that jointly learns representations of both individual query terms and of queries in their entirety from search log data by leveraging session-level containment relationships. Our experiments, conducted on a large query log (the AOL log), demonstrate that using a joint embedding of queries and their terms within our proposed document retrieval framework outperforms a number of text-only and sequence-modeling-based baselines.
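As a rough illustration of jointly embedding whole queries and their terms from session data, the sketch below builds one token sequence per session containing both query-level tokens and term-level tokens, and then derives embeddings from PPMI co-occurrence statistics. The tokenisation, the PPMI/SVD training method, and all names are assumptions; the paper's own model is not reproduced here.

```python
import numpy as np

def session_sequences(sessions):
    """Turn each search session (an ordered list of query strings) into one token
    sequence containing both whole-query tokens and their individual terms, so
    that queries and terms share one embedding space. Tokenisation is illustrative."""
    sequences = []
    for queries in sessions:
        seq = []
        for q in queries:
            seq.append("Q::" + q)          # whole-query token
            seq.extend(q.lower().split())  # its individual terms
        sequences.append(seq)
    return sequences

def ppmi_embeddings(sequences, dim=50, window=4):
    """Co-occurrence counts -> PPMI matrix -> truncated SVD embeddings. A simple
    stand-in for any word-embedding trainer; the joint vocabulary of query tokens
    and term tokens is the point, not the training method."""
    vocab = {t: i for i, t in enumerate(sorted({t for s in sequences for t in s}))}
    counts = np.zeros((len(vocab), len(vocab)))
    for seq in sequences:
        for i, t in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if i != j:
                    counts[vocab[t], vocab[seq[j]]] += 1.0
    total = counts.sum() or 1.0
    row = counts.sum(axis=1, keepdims=True) + 1e-12
    col = counts.sum(axis=0, keepdims=True) + 1e-12
    ppmi = np.maximum(np.log((counts * total) / (row * col) + 1e-12), 0.0)
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return vocab, u[:, :dim] * s[:dim]
```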


2012 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Pretorius ◽  
C. Le Roux

Purpose: To determine the level of sustainability embeddedness in strategising by investigating the public and external communication of companies. Problem investigated: The extent to which sustainability is embedded in the elements of strategy formulation and implementation, and not merely in surface-level statements and claims. Design: The researchers designed a measurement tool and scale, the Strategising for Sustainability Index (SSI), based on researched elements of strategising and recent literature on the topic of sustainability and strategy integration. Merit for strategising for sustainability was given to a company on the basis of its fulfilling the relevant criteria. The JSE Top 40 listed companies on the All Share Index as of March 2011 were selected as a purposive sample. Each company's data and each element of the scorecard were judged on a Likert-type five-point scale, with higher scores indicating higher levels of embeddedness in the strategy. A comprehensive evaluation sheet was used to judge the presented data individually and independently for each element of the scorecard instrument. Findings: Ten elements were found relevant, representing compliance (2 elements), strategy formulation (4 elements) and strategy implementation (4 elements). The findings show wide variation in overall scores. Almost all companies satisfied the compliance requirements, but variations were observed in both formulation and implementation embeddedness. The SSI tool has discrimination value despite a relatively complex judging process. The proposed SSI measurement challenges other determinants of sustainability performance, as it incorporates the embeddedness of sustainability in strategising. Knowing this level could guide management towards directing resources away from 'over-invested' strengths related to sustainability. Considering the scores for the different elements of the instrument would help to prioritise the 'sustainability spend'. Furthermore, the SSI tool directs attention to how sustainability is incorporated in the strategising process. Originality and value: The level of embeddedness of sustainability in strategising has not been measured before. This study addresses the possible 'window-dressing' claims surrounding sustainability and highlights those companies that have successfully demonstrated that sustainability is not just for reputation purposes but is, in fact, part of their operating as a listed company. Conclusion: Firstly, it was possible to use the SSI framework and the evaluation process and apply them to the sustainability reporting and claims of each firm. Secondly, each element could be judged on the unique scale for the specific element. The SSI measurement tool can be used to describe the level of strategising embeddedness. The SSI tool's framework is based on input from the literature and on the foundation of strategic principles. The five-point scale on the SSI tool serves to describe a company's achievement for each element. The sustainability claims of these companies varied in their embeddedness in the process of strategising. Scores were lower for the formulation elements, raising the question of whether some projects are implemented ad hoc to score points, without necessarily being formulated as part of strategy.
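A minimal sketch of how the scorecard aggregation described above might look, assuming the ten elements and three categories named in the abstract; the equal weighting and the 0-100 normalisation are assumptions rather than the published instrument.

```python
# Illustrative aggregation for a Strategising for Sustainability Index (SSI).
# The three categories and ten elements follow the abstract; equal weighting
# and the 0-100 normalisation are assumptions, not the published instrument.
SSI_ELEMENTS = {
    "compliance": ["compliance_1", "compliance_2"],
    "formulation": ["formulation_1", "formulation_2", "formulation_3", "formulation_4"],
    "implementation": ["implementation_1", "implementation_2",
                       "implementation_3", "implementation_4"],
}

def ssi_score(ratings):
    """ratings: dict of element -> Likert score in 1..5. Returns per-category
    means and an overall score normalised to 0-100."""
    per_category = {}
    for category, elements in SSI_ELEMENTS.items():
        scores = [ratings[e] for e in elements]
        per_category[category] = sum(scores) / len(scores)
    overall = sum(ratings[e] for es in SSI_ELEMENTS.values() for e in es)
    max_possible = 5 * sum(len(es) for es in SSI_ELEMENTS.values())
    return per_category, 100.0 * overall / max_possible
```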


2020 ◽  
Vol 10 (12) ◽  
pp. 4316 ◽  
Author(s):  
Ivan Boban ◽  
Alen Doko ◽  
Sven Gotovac

Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, methods developed for document retrieval, such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods, are also used for sentence retrieval. The effect of partial word matching on sentence retrieval is an issue that has not been analyzed. We believe there is substantial potential for improving sentence retrieval methods if this approach is considered. We adapted TF-ISF, BM25, and language modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval. However, we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
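As a hedged illustration of combining sentence retrieval with sequence similarity, the sketch below adapts a TF-ISF-style score so that a query term can be satisfied by a sufficiently similar, not necessarily identical, sentence word. The similarity threshold and the exact scoring formula are assumptions, not the authors' adaptation.

```python
import math
from difflib import SequenceMatcher

def partial_match(term_a, term_b, threshold=0.8):
    """Treat two words as matching if their character-sequence similarity is at
    least 'threshold' (the threshold value is an assumption)."""
    return SequenceMatcher(None, term_a, term_b).ratio() >= threshold

def tf_isf_partial(query_terms, sentence, all_sentences, threshold=0.8):
    """TF-ISF-style score in which a query term may be matched by a similar,
    not necessarily identical, sentence word. Illustrative adaptation only;
    'sentence' and 'all_sentences' are lists of lowercased words."""
    score = 0.0
    n = len(all_sentences)
    for q in query_terms:
        tf = sum(1 for w in sentence if partial_match(q, w, threshold))
        if tf == 0:
            continue
        sf = sum(1 for s in all_sentences
                 if any(partial_match(q, w, threshold) for w in s))
        isf = math.log((n + 1) / (sf + 1)) + 1.0
        score += math.log(1 + tf) * isf
    return score
```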


2018 ◽  
Vol 14 (3) ◽  
pp. 299-316 ◽  
Author(s):  
Chang-Sup Park

Purpose: This paper aims to propose a new keyword search method on graph data to improve the relevance of search results and reduce duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to finding answer trees with different root nodes and thus often generate results consisting of answer trees with low relevance to the query or with duplicate content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results. Design/methodology/approach: A measure of redundancy in a set of answer trees with regard to their root nodes is defined, and according to this metric, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, using inverted lists and a hash map, is suggested. Then, based on the path index, a top-k query processing algorithm is presented to find the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees. Findings: The results of experiments using real graph datasets show that the proposed approach can produce effective query answers which are more diverse in their content nodes and more relevant to the query than those of the previous approach based on distinct root semantics. Originality/value: This paper is the first to take redundancy in the root nodes of answer trees into account, improving the relevance of query results and reducing content-node duplication compared with the previous distinct root semantics. It can satisfy users' various information needs on large and complex graph data using keyword-based queries.
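A minimal sketch of the selection step under limited root redundancy: given candidate answer trees with relevance scores, pick the top-k while capping how many selected trees may share the same root node. The greedy strategy and data layout are assumptions; the paper's path index and query processing algorithm are not reproduced.

```python
from collections import Counter

def top_k_with_limited_root_redundancy(candidates, k, max_per_root):
    """Greedily select up to k answer trees, allowing at most 'max_per_root'
    trees with the same root node. 'candidates' is a list of
    (root_node, relevance_score, answer_tree) tuples; illustrative only."""
    selected = []
    root_counts = Counter()
    for root, score, tree in sorted(candidates, key=lambda c: c[1], reverse=True):
        if root_counts[root] < max_per_root:
            selected.append((root, score, tree))
            root_counts[root] += 1
        if len(selected) == k:
            break
    return selected
```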


1997 ◽  
Vol 1588 (1) ◽  
pp. 104-109 ◽  
Author(s):  
Gary S. Spring

Expert system validation—that is, testing systems to ascertain whether they achieve acceptable performance levels—has with few exceptions been ad hoc, informal, and of dubious value. Very few efforts have been made in this regard in the transportation area. A discussion of the major issues involved in validating expert systems is provided, as is a review of the work that has been done in this area. The review includes a definition of validation within the context of the overall evaluation process, descriptions and critiques of several approaches to validation, and descriptions of guidelines that have been developed for this purpose.

