Re-Ordered FEGC and Block Based FEGC for Inverted File Compression

2013 ◽  
Vol 3 (1) ◽  
pp. 71-88 ◽  
Author(s):  
V. Glory ◽  
S. Domnic

Data compression has been widely used in many information retrieval (IR) based applications, such as web search engines and digital libraries, to make data retrieval faster. In these applications, universal codes (Elias codes (EC), Fibonacci code (FC), Rice code (RC), Extended Golomb code (EGC), Fast Extended Golomb code (FEGC), etc.) have been preferred over statistical codes (Huffman codes, Arithmetic codes, etc.), because universal codes are easier to construct and decode. In this paper, the authors propose two methods for constructing universal codes based on the ideas used in Rice code and Fast Extended Golomb Code. The first, Re-ordered FEGC (RFEGC), is suitable for representing small, middle and large range integers, whereas Rice code works well only for small and middle range integers. It is also competitive with FC, EGC and FEGC in representing small, middle and large range integers, yet can be faster to decode than all three. The second coder, Block based RFEGC, uses a local divisor rather than a global divisor to improve both the compression and decompression performance of RFEGC. To evaluate the performance of these coders, the authors apply them to compress the integer values of inverted files constructed from TREC, Wikipedia and FIRE collections. Experimental results show that their coders achieve better compression and decompression performance on files that contain a significant proportion of middle and large range integers.
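
As background for the abstract above, here is a minimal sketch of classic Rice coding of posting-list d-gaps, the baseline the authors build on (not their RFEGC or Block based RFEGC variants); the parameter k and the example gaps are illustrative assumptions:

```python
def rice_encode(n: int, k: int) -> str:
    """Rice-encode a non-negative integer n with divisor 2**k (k >= 1).

    The quotient n >> k is written in unary ('1' * q followed by a '0'),
    then the remainder in exactly k binary bits.
    """
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")


def rice_decode(bits: str, k: int) -> tuple[int, int]:
    """Decode one Rice-coded value; return (value, number of bits consumed)."""
    q = bits.index("0")                      # length of the unary part
    r = int(bits[q + 1:q + 1 + k], 2)
    return (q << k) | r, q + 1 + k


# Inverted-file d-gaps are typically coded back to back in one bit stream.
gaps = [1, 3, 7, 130]
stream = "".join(rice_encode(g, 3) for g in gaps)
pos, decoded = 0, []
while pos < len(stream):
    value, used = rice_decode(stream[pos:], 3)
    decoded.append(value)
    pos += used
assert decoded == gaps
```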

2017 ◽  
Vol 73 (3) ◽  
pp. 509-527 ◽  
Author(s):  
Christiane Behnert ◽  
Dirk Lewandowski

Purpose: The purpose of this paper is to demonstrate how to apply traditional information retrieval (IR) evaluation methods, based on standards from the Text REtrieval Conference and web search evaluation, to all types of modern library information systems (LISs), including online public access catalogues, discovery systems, and digital libraries that provide web search features to gather information from heterogeneous sources.
Design/methodology/approach: The authors apply conventional procedures from IR evaluation to the LIS context, considering the specific characteristics of modern library materials.
Findings: The authors introduce a framework consisting of five parts: search queries, search results, assessors, testing, and data analysis. The authors show how to deal with comparability problems resulting from diverse document types (e.g. electronic articles vs printed monographs) and what issues need to be considered for retrieval tests in the library context.
Practical implications: The framework can be used as a guideline for conducting retrieval effectiveness studies in the library context.
Originality/value: Although a considerable amount of research has been done on IR evaluation, and standards for conducting retrieval effectiveness studies do exist, to the authors' knowledge this is the first attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century LISs. The authors demonstrate which issues must be considered and what decisions must be made by researchers prior to a retrieval test.


Author(s):  
Qiaozhu Mei ◽  
Dragomir Radev

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activities of obtaining, from a much larger collection, information resources (usually in the form of textual documents) that are relevant to a user's information need (usually expressed as a query). Practical instances of IR systems include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods underlying the design and implementation of each major component of such a system, a discussion of evaluation methods for IR systems, and finally a summary of recent developments and research trends in the field of information retrieval.
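
To make the "typical architecture" concrete, here is a toy sketch of the core IR data structure, an inverted index with Boolean AND retrieval; the documents and query are invented examples, and real systems add ranking, stemming and index compression on top:

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build a toy inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Boolean AND retrieval: documents containing every query term."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {1: "digital libraries and web search",
        2: "web search engines index documents",
        3: "evaluation of retrieval systems"}
idx = build_index(docs)
print(search(idx, "web search"))   # {1, 2}
```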


Author(s):  
Iris Xie

The emergence of the Internet has allowed millions of people to use a variety of electronic information retrieval (IR) systems, such as digital libraries, Web search engines, online databases, and Online Public Access Catalogues (OPACs). The nature of IR is interaction. Interactive information retrieval is defined as the communication process between the users and the IR systems. However, the dynamics of interactive IR is not yet fully understood. Moreover, most of the existing IR systems do not support the full range of users’ interactions with IR systems. Instead, they only support one type of information-seeking strategy: how to specify queries by using terms to select relevant information. However, new digital environments require users to apply multiple information-seeking strategies and shift from one information-seeking strategy to another in the information retrieval process.


2011 ◽  
Vol 8 (3) ◽  
pp. 711-737 ◽  
Author(s):  
Peiquan Jin ◽  
Hong Chen ◽  
Xujian Zhao ◽  
Xiaowen Li ◽  
Lihua Yue

Temporal information plays an important role in Web search, as Web pages intrinsically carry a crawled time and most Web pages contain time keywords in their content. How to integrate temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction have to be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose to integrate a B+-tree, an inverted file and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing a hybrid index structure for temporal-textual queries, which organize the inverted file, B+-tree and MAP21-tree in different ways. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree index structure has the best query performance and is thus an acceptable choice as the temporal-textual index for future time-aware search engines.
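
A hedged illustration of the "inverted-file-first, temporal-index-second" idea described above; for brevity the MAP21-tree is replaced by a linear interval check, and the mini-corpus, terms and time intervals are made up:

```python
from collections import defaultdict

# Hypothetical mini-corpus: doc id -> (text, (start_year, end_year))
DOCS = {
    1: ("olympic games beijing", (2008, 2008)),
    2: ("olympic games london", (2012, 2012)),
    3: ("web search engines survey", (2005, 2015)),
}

# Textual part of the hybrid index: term -> posting list of doc ids.
inverted = defaultdict(list)
for doc_id, (text, _) in DOCS.items():
    for term in set(text.split()):
        inverted[term].append(doc_id)


def temporal_textual_query(terms: list[str], start: int, end: int) -> list[int]:
    """First use the inverted file to find textual candidates, then keep only
    documents whose time interval overlaps [start, end] (stand-in for the
    temporal index lookup)."""
    candidates = set(inverted.get(terms[0], []))
    for t in terms[1:]:
        candidates &= set(inverted.get(t, []))
    return [d for d in sorted(candidates)
            if DOCS[d][1][0] <= end and DOCS[d][1][1] >= start]


print(temporal_textual_query(["olympic", "games"], 2010, 2014))  # [2]
```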


Author(s):  
Mathijs E. Fix ◽  
Dannis M. Brouwer ◽  
Ronald G. K. M. Aarts

Abstract Flexure-based compliant mechanisms suited for a large range of motion can be designed by handling the challenges that arise from combining low compliance in the desired directions, high support stiffness, low stresses and high unwanted natural frequencies. Current topology optimization tools typically cannot model large deflections of flexures, are too conceptual or are case-specific. In this research, a new spatial topological synthesis algorithm based on building blocks is proposed to optimize the performance of an initial design. The algorithm consists of successive shape optimizations and layout syntheses. In each shape optimization, the dimensions of a given layout are optimized. The layout synthesis then strategically replaces the most “critical” building block with a better option. To maximize the first unwanted natural frequency, the replacement strategy depends on the strain energy distribution of the accompanying mode shape. The algorithm is tested on the design of a 1-DOF flexure hinge. The obtained final layout agrees with results known from the literature.
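
A structural sketch of the alternating loop the abstract describes (shape optimization followed by a layout synthesis that replaces the most critical building block); the block names, strain energies and selection rule below are placeholders, not the paper's mechanics models:

```python
# Placeholder models only: a real implementation would evaluate compliance,
# support stiffness, stress and natural frequencies with flexible multibody models.
BLOCK_ENERGY = {"leaf_flexure": 5.0, "folded_flexure": 2.0,
                "cross_flexure": 1.0, "torsion_reinforced": 3.0}  # invented values


def shape_optimize(layout):
    """Stand-in for dimension optimization: return per-block strain energy
    in the limiting (first unwanted) mode shape for the given layout."""
    return {i: BLOCK_ENERGY[b] for i, b in enumerate(layout)}


def layout_synthesis(layout, energy):
    """Replace the block storing the most strain energy with the candidate
    block type that stores the least (the 'better option')."""
    critical = max(energy, key=energy.get)
    best = min(BLOCK_ENERGY, key=BLOCK_ENERGY.get)
    new_layout = list(layout)
    new_layout[critical] = best
    return new_layout


layout = ["leaf_flexure", "torsion_reinforced", "leaf_flexure"]
for _ in range(3):              # successive shape optimizations and layout syntheses
    energy = shape_optimize(layout)
    layout = layout_synthesis(layout, energy)
print(layout)
```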


2020 ◽  
Vol 38 (4) ◽  
pp. 725-744
Author(s):  
Xiaojuan Zhang ◽  
Xixi Jiang ◽  
Jiewen Qin

Purpose: The purpose of this study is to generate diversified suggestions for temporally ambiguous queries, ensuring that the candidate queries have high coverage of subtopics derived from different temporal periods.
Design/methodology/approach: Two novel time-aware query suggestion diversification models are developed by integrating the semantic and temporal information carried by queries into two state-of-the-art explicit diversification algorithms (IA-Select and xQuAD), and by specifying the components on which these two models rely. Most importantly, the authors first explore how to explicitly determine query subtopics for each unique query from the query log or clicked documents, and then model these subtopics in query suggestion diversification. How to mine the temporal intent behind a query from the query log is also discussed. Finally, experiments on a real-world query log are conducted to verify the effectiveness of the proposal.
Findings: Preliminary experiments demonstrate that the proposed method can significantly outperform existing state-of-the-art methods in producing candidate query suggestions for temporally ambiguous queries.
Originality/value: This study reports the first attempt to generate query suggestions that indicate diverse time points of interest for temporally ambiguous (input) queries. The research will be useful in enhancing users’ search experience by helping them formulate accurate queries for their search tasks. In addition, the approaches investigated in the paper are general enough to be used in many settings, such as experimental information retrieval systems, Web search engines, document archives and digital libraries.
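
For reference, a compact sketch of the general xQuAD greedy re-ranking scheme that the time-aware models extend (this is the published baseline algorithm, not the authors' temporal variants); the candidate suggestions, subtopic weights and probabilities are toy values:

```python
def xquad(relevance, coverage, subtopic_weight, k, lam=0.5):
    """Greedy xQuAD selection of k diverse candidates.

    relevance: {candidate: P(d|q)}
    coverage: {candidate: {subtopic: P(d|q,s)}}
    subtopic_weight: {subtopic: P(s|q)}
    """
    selected, remaining = [], set(relevance)
    not_covered = {s: 1.0 for s in subtopic_weight}   # product of (1 - P(d|q,s))

    for _ in range(min(k, len(remaining))):
        def score(d):
            div = sum(subtopic_weight[s] * coverage.get(d, {}).get(s, 0.0) * not_covered[s]
                      for s in subtopic_weight)
            return (1 - lam) * relevance[d] + lam * div

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        for s in subtopic_weight:
            not_covered[s] *= 1 - coverage.get(best, {}).get(s, 0.0)
    return selected


# Toy candidate suggestions with two temporal subtopics ("2012" and "2008").
relevance = {"q1": 0.9, "q2": 0.8, "q3": 0.7}
coverage = {"q1": {"2012": 0.9}, "q2": {"2012": 0.8}, "q3": {"2008": 0.7}}
weights = {"2012": 0.6, "2008": 0.4}
print(xquad(relevance, coverage, weights, k=2))   # ['q1', 'q3']
```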


2019 ◽  
Vol 37 (3) ◽  
pp. 401-418 ◽  
Author(s):  
Jingye Qu ◽  
Jiangping Chen

Purpose: This paper aims to introduce the construction methods, image organization, collection use and access of benchmark image collections to the digital library (DL) community. It aims to connect two distinct communities, DL practitioners and image processing researchers, so that future image collections can be better constructed, organized and managed for both human and computer use.
Design/methodology/approach: Image collections are first identified through an extensive literature review of published journal articles and a web search. Then, a coding scheme focusing on image collections’ creation, organization, access and use is developed. Next, three major benchmark image collections are analysed based on the proposed coding scheme. Finally, the characteristics of benchmark image collections are summarized and compared to DLs.
Findings: Although most image collections in DLs are carefully curated and organized using various metadata schemas based on an image’s external features to facilitate human use, the benchmark image collections created to support the development of image processing algorithms are annotated on image content down to the pixel level, which makes each collection a fine-grained, organized database appropriate for developing automatic techniques for classification, summarization, visualization and content-based retrieval.
Research limitations/implications: This paper overviews image collections by their application fields. The three most representative natural image collections in general areas are analysed in detail based on a homemade coding scheme, which could be further extended. Domain-specific image collections, such as medical image collections or collections for scientific purposes, are not covered.
Practical implications: This paper helps DLs with image collections to understand how the benchmark image collections used in current image processing research are created, organized and managed. It informs the multiple parties pertinent to image collections on how to collaborate in building, sustaining, enriching and providing access to image collections.
Originality/value: This paper is the first attempt to review and summarize benchmark image collections for DL managers and developers. The collection creation process and image organization used in these benchmark image collections open a new perspective for digital librarians in their future DL collection development.


The goal of search engines is to return accurate and complete results. Satisfying concrete user information needs becomes more and more difficult, both because such needs cannot be specified completely and explicitly and because of the shortcomings of keyword-based searching and indexing. General search engines have indexed millions of web resources and often return thousands of results to a user query, most of them inadequate. To increase result precision, users sometimes turn to search engines specialized in a concrete domain, or to personalized or semantic search. A wide variety of specialized search engines can be found (and used) on the internet, but none can guarantee finding the resources that exist on the web and are needed by a concrete user. In this paper we present our research on building a meta-search engine that uses domain and user profile ontologies, as well as information (or metadata) extracted directly from web sites, to improve search result quality. We state the main requirements for a search engine for students, PhD students and scientists, propose a conceptual model, and discuss approaches to its practical realization. Our prototype metasearch engine first performs interactive semantic query refinement; then, using the refined query, it automatically generates several search queries, sends them to different digital libraries and web search engines, and augments and ranks the returned results using ontologically represented domain and user metadata. For testing our model, we develop domain ontologies in the electronics domain. We use the ontological representation of terminology to propose recommendations for query disambiguation and to provide knowledge for re-ranking the returned results. We also present some initial partial implementations of query disambiguation strategies and testing results.
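
A minimal fan-out-and-merge sketch of the metasearch flow described above (refined query sent to several engines in parallel, results merged and re-ranked); the engine adapters, scores and re-ranking hook are hypothetical stand-ins for the ontology-based components:

```python
import concurrent.futures


# Hypothetical engine adapters: a real adapter would call the API of a digital
# library or web search engine and normalise its result format and scores.
def make_engine(name):
    def search(query):
        return [(f"{name}/doc{i}", 1.0 / i) for i in range(1, 4)]  # (doc id, score)
    return search


ENGINES = [make_engine("dl_a"), make_engine("web_b")]


def metasearch(refined_query, rerank=None):
    """Send the ontology-refined query to every engine in parallel, merge the
    result lists by best score, then optionally re-rank the merged list
    (e.g. using ontologically represented domain and user metadata)."""
    merged = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        for results in pool.map(lambda engine: engine(refined_query), ENGINES):
            for doc, score in results:
                merged[doc] = max(score, merged.get(doc, 0.0))
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return rerank(ranked) if rerank else ranked


print(metasearch("low-noise amplifier design")[:3])
```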


Author(s):  
Ioannis N. Kouris ◽  
Christos Makris ◽  
Evangelos Theodoridis ◽  
Athanasios Tsakalidis

Information retrieval is the computational discipline that deals with the efficient representation, organization, and access to information objects that represent natural language texts (Baeza-Yates & Ribeiro-Neto, 1999; Salton & McGill, 1983; Witten, Moffat, & Bell, 1999). A crucial subproblem in the information retrieval area is the design and implementation of efficient data structures and algorithms for indexing and searching information objects that are vaguely described. In this article, we present the latest developments in the indexing area, giving special emphasis to: data structures and algorithmic techniques for string manipulation, space-efficient implementations, and compression techniques for efficient storage of information objects. The aforementioned problems appear in a series of applications such as digital libraries, molecular sequence databases (DNA sequences, protein databases; Gusfield, 1997), the implementation of Web search engines, web mining and information filtering.
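
As one concrete example of the string-indexing structures this article surveys, here is a naively built suffix array with binary-search pattern lookup; the text and pattern are illustrative, and practical systems use linear-time construction and compressed (space-efficient) representations:

```python
from bisect import bisect_left, bisect_right


def suffix_array(text: str) -> list[int]:
    """Positions of all suffixes of text in lexicographic order
    (naive O(n^2 log n) construction, for illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All starting positions of pattern in text, via binary search on the suffix array."""
    suffixes = [text[i:] for i in sa]                # explicit list for clarity only
    lo = bisect_left(suffixes, pattern)
    hi = bisect_right(suffixes, pattern + "\uffff")  # sentinel above any character used
    return sorted(sa[lo:hi])


text = "banana"
sa = suffix_array(text)
print(occurrences(text, sa, "ana"))   # [1, 3]
```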

