inverted list
Recently Published Documents


Total documents: 19 (five years: 4)
H-index: 3 (five years: 1)

2020 ◽  
Vol 8 (6) ◽  
pp. 4419-4428

Advances in geographic information technologies have led to rapid growth in geo-textual data. Many indexing and searching algorithms have been developed to handle this data, which contains spatial, textual and temporal information. Earlier indexing and searching algorithms targeted applications in which the trajectory or velocity vector of each object is known in advance, so its future position can be predicted. In real-time applications such as emergency management systems and traffic monitoring, object movements are unpredictable and future positions cannot be predicted. Techniques are therefore required to answer the geo-textual kNN query when the velocity vectors or trajectories of moving objects and moving queries are not known. For moving objects, capturing the current position of each object and maintaining the spatial index efficiently are essential. The hybrid indexing techniques used earlier are based on the R-tree spatial index. The nodes of the R-tree are split or merged to track the locations of continuously moving objects, which increases the maintenance cost compared with a grid index. This paper proposes a solution for creating and maintaining a hybrid index for moving objects and queries that combines a grid with inverted lists. A method is also proposed for finding geo-textual nearest neighbours for static and moving queries using the hybrid index and conceptual partitioning of the grid. In the reported experiments, the overall gain of the hybrid index over the non-hybrid index is 30 to 40 percent, depending on the grid size chosen for mapping the data space and on the query parameters.
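The grid-plus-inverted-list idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `GridInvertedIndex`, its methods, and the ring-by-ring kNN search are assumptions made for illustration, and the ring search is a simple stand-in for the conceptual partitioning the abstract mentions. Each grid cell holds a small inverted list from keyword to object ids, so moving an object only touches two cells.

```python
import math
from collections import defaultdict


class GridInvertedIndex:
    """Hypothetical hybrid index: uniform grid, one inverted list per cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # cell coordinates -> keyword -> set of object ids
        self.cells = defaultdict(lambda: defaultdict(set))
        # object id -> (x, y, keywords), the last reported position
        self.objects = {}

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def remove(self, oid):
        old = self.objects.pop(oid, None)
        if old is None:
            return
        ox, oy, keywords = old
        key = self._cell(ox, oy)
        cell = self.cells[key]
        for kw in keywords:
            cell[kw].discard(oid)
            if not cell[kw]:
                del cell[kw]
        if not cell:
            del self.cells[key]

    def upsert(self, oid, x, y, keywords):
        """Insert an object or record its new position (no trajectory needed)."""
        self.remove(oid)
        kws = frozenset(keywords)
        self.objects[oid] = (x, y, kws)
        cell = self.cells[self._cell(x, y)]
        for kw in kws:
            cell[kw].add(oid)

    def knn(self, qx, qy, keyword, k):
        """Return ids of the k nearest objects that carry `keyword`."""
        if not self.objects or k <= 0:
            return []
        cx, cy = self._cell(qx, qy)
        max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in self.cells)
        found = []
        for ring in range(max_ring + 1):
            # Visit only the cells whose Chebyshev distance to the query cell is `ring`.
            for ix in range(cx - ring, cx + ring + 1):
                for iy in range(cy - ring, cy + ring + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != ring:
                        continue
                    cell = self.cells.get((ix, iy))
                    if not cell:
                        continue
                    for oid in cell.get(keyword, ()):
                        ox, oy, _ = self.objects[oid]
                        found.append((math.hypot(ox - qx, oy - qy), oid))
            # Objects in rings beyond `ring` are at least ring * cell_size away,
            # so the search can stop once the k-th candidate is within that bound.
            if len(found) >= k:
                found.sort()
                if found[k - 1][0] <= ring * self.cell_size:
                    break
        found.sort()
        return [oid for _, oid in found[:k]]
```

A moving query can simply call `knn` again with its new coordinates. Choosing `cell_size` trades the number of cells visited against the number of candidates per cell, which is consistent with the abstract's remark that the gain depends on the grid size.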


2019 ◽  
Vol 4 (3) ◽  
pp. 254-268 ◽  
Author(s):  
Yang Yang ◽  
Wenjie Zhang ◽  
Ying Zhang ◽  
Xuemin Lin ◽  
Liping Wang

In this paper, we study the problem of selectivity estimation on set containment search. Given a query record Q and a record dataset $\mathcal{S}$, we aim to accurately and efficiently estimate the selectivity of the set containment search of Q over $\mathcal{S}$. We first extend existing distinct value estimation techniques to solve this problem and develop IL-GKMV, an approach based on inverted lists and the G-KMV sketch. Our analysis shows that the performance of IL-GKMV degrades as the vocabulary size increases. Motivated by the limitations of existing techniques and the inherent challenges of the problem, we develop effective and efficient sampling approaches and propose OT-Sampling, a sampling approach based on an ordered trie structure. OT-Sampling partitions records based on element frequency and occurrence patterns and is significantly more accurate than simple random sampling and IL-GKMV. To further enhance performance, we present DC-Sampling, a divide-and-conquer-based sampling approach that uses an inclusion/exclusion prefix to explore pruning opportunities. We also consider weighted set containment selectivity estimation and devise a stratified random sampling approach named StrRS. We theoretically analyze the proposed techniques with respect to various accuracy estimators. Comprehensive experiments on nine real datasets verify the effectiveness and efficiency of the proposed techniques.
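As a point of reference, the simple random sampling baseline that the abstract compares against can be sketched in a few lines of Python. This is not OT-Sampling, DC-Sampling or IL-GKMV. The function names are hypothetical, and the sketch assumes that a record matches when it is a subset of Q; if the opposite direction of containment is intended, the subset test should be reversed.

```python
import random


def exact_selectivity(query, records):
    """Count records contained in the query (assumed meaning of containment)."""
    q = set(query)
    return sum(1 for r in records if set(r) <= q)


def sampled_selectivity(query, records, sample_size, seed=None):
    """Estimate the same count from a uniform sample drawn without replacement."""
    rng = random.Random(seed)
    n = len(records)
    m = min(sample_size, n)
    if m == 0:
        return 0.0
    sample = rng.sample(records, m)
    q = set(query)
    hits = sum(1 for r in sample if set(r) <= q)
    # Scale the sample hit count up to the full dataset size.
    return hits * n / m
```

With `sample_size` at least `len(records)` the estimate equals the exact count; smaller samples trade accuracy for speed, which is the gap the more structured sampling schemes in the paper aim to close.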


Author(s):  
Yangjun Chen

In this chapter, the authors discuss an efficient and effective index mechanism for search engines that supports both conjunctive and disjunctive queries. The main idea is to decompose an inverted list into a collection of disjoint sub-lists. The authors associate each word with an interval sequence, which is created by applying a kind of tree coding to a trie structure constructed over all the word sequences in a database. Each interval, rather than each word, is then associated with an inverted sub-list. In this way, both set intersection and union can be carried out by a series of simple interval containment checks. Experiments show that the new index is promising. The maintenance of indices when documents are inserted or deleted is also discussed in detail.
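For context, the conventional baseline that this chapter improves on can be sketched as follows: a plain inverted index in which a conjunctive query is a merge-based intersection of sorted posting lists and a disjunctive query is their union. This is not the interval-based decomposition described above, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to a sorted list of the ids of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for word in set(docs[doc_id].lower().split()):
            index[word].append(doc_id)
    return index


def intersect(a, b):
    """Merge-intersect two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """Merge-union two sorted posting lists without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def conjunctive(index, words):
    """Documents containing every word; shortest lists are intersected first."""
    lists = sorted((index.get(w, []) for w in words), key=len)
    if not lists:
        return []
    result = list(lists[0])
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


def disjunctive(index, words):
    """Documents containing at least one of the words."""
    result = []
    for w in words:
        result = union(result, index.get(w, []))
    return result
```

Both operations scan whole posting lists here; replacing those scans with interval containment checks over disjoint sub-lists is the change the chapter proposes.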


Author(s):  
Yangjun Chen

In this chapter, we discuss an efficient and effective index mechanism for search engines that supports both conjunctive and disjunctive queries. The main idea is to decompose an inverted list into a collection of disjoint sub-lists. We associate each word with an interval sequence, which is created by applying a kind of tree coding to a trie structure constructed over all the word sequences in a database. Each interval, rather than each word, is then associated with an inverted sub-list. In this way, both set intersection and union can be carried out by a series of simple interval containment checks. Experiments show that the new index is promising. The maintenance of indexes when documents are inserted or deleted is also discussed in detail.


2014 ◽  
Vol 10 (1) ◽  
pp. 65-84 ◽  
Author(s):  
Chang-Sup Park ◽  
Sungchae Lim

Purpose – The paper aims to propose an effective method for processing keyword-based queries over graph-structured databases, which are widely used in applications such as XML, the semantic web, and social network services. To satisfy users' information needs, it proposes an extended answer structure for keyword queries, inverted list indexes on keywords and nodes, and query processing algorithms that exploit the inverted lists. The study aims to provide more effective and relevant answers to a given query than previous approaches, and to do so efficiently. Design/methodology/approach – A new measure of the relevance of nodes to a given keyword query is defined, and based on this measure a new answer tree structure is proposed that places no constraint on the number of keyword nodes chosen for each query keyword. For efficient query processing, an inverted list-style index is suggested that pre-computes connectivity and relevance information for the nodes in the graph. A query processing algorithm based on the pre-constructed inverted lists is then designed; it aggregates list entries for each graph node relevant to the given keywords and identifies the top-k root nodes of the answer trees most relevant to the query. The basic search method is further enhanced with extended inverted lists, which store additional relevance information about related entries so that the relevance score of a node can be estimated more closely and the top-k answers found more efficiently. Findings – Experiments with real datasets and various test queries were conducted to evaluate the effectiveness and performance of the proposed methods in comparison with one of the previous approaches. The results show that the proposed methods with an extended answer structure produce more effective top-k results than the compared method for most of the queries, especially those with OR semantics. The extended inverted list and enhanced search algorithm are shown to improve execution performance considerably over the basic search method. Originality/value – This paper proposes a new extended answer structure and query processing scheme for keyword queries on graph databases that can satisfy users' information needs expressed as a keyword set with various semantics.


2013 ◽  
Vol 462-463 ◽  
pp. 1106-1109
Author(s):  
Hong Yuan Ma

Web search engines cache the results of queries that users issue frequently, which is an effective way to improve their efficiency. In this paper, we share experience from the design and implementation of a Web search engine cache system. We present three design principles: logical layer processing, an event-based communication architecture, and avoidance of frequent data copying. We also introduce the architecture used in practice, which comprises a connection processor, an application processor, a query results caching processor, an inverted list caching processor and a list intersection caching processor. Experiments are conducted on our cache system using a real Web search engine query log.
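The query results caching component can be illustrated with a small Python sketch. The abstract does not state the eviction policy or interfaces, so the least-recently-used policy, the class name `QueryResultCache` and the `search_fn` callback are all assumptions made for illustration rather than details of the described system.

```python
from collections import OrderedDict


class QueryResultCache:
    """Hypothetical LRU cache in front of a search backend."""

    def __init__(self, capacity, search_fn):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._search_fn = search_fn  # called only on a cache miss
        self._entries = OrderedDict()  # normalized query -> results
        self.hits = 0
        self.misses = 0

    def get(self, query):
        # Normalize case and whitespace so equivalent queries share one entry.
        key = " ".join(query.lower().split())
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        results = self._search_fn(key)
        self._entries[key] = results
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return results
```

Replaying a query log through `get` and comparing `hits` with `misses` gives a hit ratio for a chosen capacity. The same pattern could in principle sit in front of inverted list or list intersection lookups, with the posting list or intersection as the cached value.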

