Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques

Author(s):  
Young-Ho Park ◽  
Kyu-Young Whang ◽  
Byung Suk Lee ◽  
Wook-Shin Han

Author(s):  
Daniela Morais Fonte ◽  
Daniela da Cruz ◽  
Pedro Rangel Henriques ◽  
Alda Lopes Gancarski

XML is a widely used general-purpose annotation formalism for creating custom markup languages. XML annotations give structure to plain documents so that their content can be interpreted. To extract information from XML documents, the XPath and XQuery languages can be used. However, learning these languages requires considerable effort. In this context, the traditional Query-By-Example methodology (for relational databases) can make an important contribution to easing this learning process, freeing the user from knowing the specific query language details or even the document structure. This chapter describes how to apply the Query-By-Example concept in a Web application for information retrieval from XML documents, the GuessXQ system. This engine is capable of deducing, from an example, the corresponding XQuery statement. The example consists of marking the desired components directly on a sample document picked from a collection. After inferring the corresponding query, GuessXQ applies it to the collection to obtain the desired result.
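
The query-by-example idea in this abstract can be illustrated with a minimal sketch. It is not GuessXQ's actual inference algorithm, which the abstract does not detail. It assumes the user's mark is a single element of the sample document, and it infers a simple tag path (in the XPath subset supported by Python's `xml.etree.ElementTree`) rather than a full XQuery statement. The function name `infer_path` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def infer_path(root, marked):
    """Return the tag path from root down to the marked element (hypothetical helper)."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}
    tags = []
    node = marked
    while node is not root:
        tags.append(node.tag)
        node = parent[node]
    # "." selects the root itself when the mark is the root element.
    return "./" + "/".join(reversed(tags)) if tags else "."


# Made-up sample document and collection, for illustration only.
sample = ET.fromstring(
    "<library><book><title>XML Basics</title><year>2006</year></book></library>"
)
other = ET.fromstring(
    "<library><book><title>IR Models</title></book>"
    "<book><title>XQuery</title></book></library>"
)
collection = [sample, other]

# Stand-in for the user marking a component on the sample document.
marked = sample.find("./book/title")

# Infer the query from the example, then apply it to the whole collection.
path = infer_path(sample, marked)
results = [el.text for doc in collection for el in doc.findall(path)]
```

Here `path` generalizes the single marked title to every element reachable by the same sequence of tags, which is the simplest possible inference; a real system would also need to handle predicates, attributes, and multiple marks.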


Author(s):  
Thomas Mandl

In the 1960s, automatic indexing methods for texts were developed. They already implemented the “bag-of-words” approach, which still prevails. Although automatic indexing is widely used today, many information providers and even Internet services still rely on human information work. In the 1970s, research shifted its interest to partial-match retrieval models and proved their superiority over Boolean retrieval models. Vector-space and later probabilistic retrieval models were developed. However, it took until the 1990s for partial-match models to succeed in the market. The Internet played a great role in this success. All Web search engines were based on partial-match models and provided ranked lists as results rather than unordered sets of documents. Consumers got used to this kind of search system, and all big search engines included partial-match functionality. However, there are many niches in which Boolean methods still dominate, for example, patent retrieval. The basis for information retrieval systems may be pictures, graphics, videos, music objects, structured documents, or combinations thereof. This article is mainly concerned with information retrieval for text documents.
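
The contrast between Boolean and partial-match retrieval described above can be sketched in a few lines. This is an illustrative toy, not a model taken from the article: it assumes whitespace tokenization, raw term counts as bag-of-words weights (no idf weighting), and cosine similarity as the vector-space score. The document texts and function names are made up.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as term counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def boolean_and(query, docs):
    """Boolean retrieval: an unordered set of documents containing every query term."""
    terms = set(query.lower().split())
    return {name for name, text in docs.items() if terms <= set(text.lower().split())}


def ranked(query, docs):
    """Partial-match retrieval: documents sharing any query term, best match first."""
    q = bag_of_words(query)
    scores = {name: cosine(q, bag_of_words(text)) for name, text in docs.items()}
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: -scores[n])


# Made-up three-document collection.
docs = {
    "d1": "xml query processing",
    "d2": "xml retrieval with ranked retrieval models",
    "d3": "patent law",
}
query = "xml retrieval"

exact = boolean_and(query, docs)
ordering = ranked(query, docs)
```

With this collection the Boolean query returns only `d2`, while the ranked variant also returns `d1`, which matches one of the two query terms, placed below `d2`.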


Author(s):  
Kenji Hatano ◽  
Hiroko Kinutani ◽  
Masatoshi Yoshikawa ◽  
Shunsuke Uemura

2006 ◽  
Vol 79 (2) ◽  
pp. 180-190 ◽  
Author(s):  
Young-Ho Park ◽  
Kyu-Young Whang ◽  
Byung Suk Lee ◽  
Wook-Shin Han

2009 ◽  
Author(s):  
Bozhidar Georgiev ◽  
Adriana Georgieva ◽  
George Venkov ◽  
Ralitza Kovacheva ◽  
Vesela Pasheva
