Foundation of Keyword Search in XML

Advances in Data Mining and Database Management - Design, Performance, and Analysis of Innovative Information Retrieval ◽

10.4018/978-1-4666-1975-3.ch001 ◽

2013 ◽

pp. 1-16

Author(s):

Weidong Yang ◽

Hao Zhu

Keyword(s):

Common Ancestor ◽

Keyword Search ◽

Query Languages ◽

Search Algorithms ◽

The Other ◽

Inverted Index ◽

Xml Database ◽

Xml Keyword Search ◽

Lowest Common Ancestor ◽

Structured Information

It has become desirable to let users query structured information in an XML database (data-centric retrieval) by keyword search, combining database and information retrieval techniques. The key challenges of keyword search in an XML database are therefore how to define result models that match users' search intents, how to compute the results with efficient algorithms, and how to rank the results. This chapter first presents the foundations of XML keyword search: XML data models, XML query languages, the inverted index, and Dewey encoding. It then surveys representative existing work on keyword search in XML, including result models such as Smallest Lowest Common Ancestor (SLCA), Exclusive Lowest Common Ancestor (ELCA), and Meaningful Lowest Common Ancestor (MLCA), the related search algorithms, and the ranking approaches.
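As a rough illustration of how the pieces named in this abstract fit together, the sketch below builds an inverted index keyed by Dewey labels and computes SLCA results by brute force. It is not taken from the chapter: the sample document, the function names (`build_index`, `lca`, `slca`), and the choice to number labels from 0 are all assumptions made for the example, and the brute-force enumeration stands in for the efficient algorithms the chapter surveys.

```python
from itertools import product
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = """<bib>
  <book><title>XML search</title><author>Yang</author></book>
  <book><title>XML streams</title><author>Zhu</author></book>
</bib>"""


def build_index(xml_text):
    """Inverted index: lowercase word -> Dewey labels of elements whose text holds it."""
    index = {}

    def visit(elem, label):
        for word in (elem.text or "").lower().split():
            index.setdefault(word, []).append(label)
        # A child's Dewey label is the parent's label plus the child's position.
        for pos, child in enumerate(elem):
            visit(child, label + (pos,))

    visit(ET.fromstring(xml_text), (0,))
    return index


def lca(a, b):
    """The LCA of two Dewey labels is their longest common prefix."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def slca(index, keywords):
    """Brute-force SLCA: LCAs of every keyword-match combination,
    keeping only those with no other candidate below them."""
    lists = [index.get(k.lower(), []) for k in keywords]
    if not lists or not all(lists):
        return []
    candidates = set()
    for combo in product(*lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    return sorted(
        c for c in candidates
        if not any(d != c and d[:len(c)] == c for d in candidates)
    )


print(slca(build_index(DOC), ["xml", "zhu"]))
```

For the query `xml zhu`, the root is an LCA of one match combination, but the second `book` element also covers both keywords, so only the label of that `book` survives the SLCA filter.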


Research and Implementation of XML Keyword Search Algorithm Based on Semantic Relatives

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.267.811 ◽

2011 ◽

Vol 267 ◽

pp. 811-815

Author(s):

Ming Yan Shen ◽

Xin Li ◽

Xiang Fu Meng

Keyword(s):

Common Ancestor ◽

Keyword Search ◽

Search Algorithm ◽

Search Method ◽

Document Structure ◽

Xml Documents ◽

Xml Keyword Search ◽

Lowest Common Ancestor ◽

Xml Document

XML keyword search is widely used in applications that work with XML documents. Most XML keyword search approaches are based on the LCA (lowest common ancestor) or its variants, which usually leads to unsatisfactory recall and precision. This paper presents a novel XML keyword search method based on semantic relatives. The method takes full account of the semantic characteristics of the XML document structure. A stack-based algorithm is also presented that merges the semantic relative nodes containing the keywords to form the results of an XML keyword search. Experimental results demonstrate the effectiveness and efficiency of the method.


XKFitler

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2011010101 ◽

2011 ◽

Vol 1 (1) ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

Weidong Yang ◽

Fei Fang ◽

Nan Li ◽

Zhongyu (Joan) Lu

Keyword(s):

Common Ancestor ◽

Keyword Search ◽

Stream Processing ◽

Query Languages ◽

Information Discovery ◽

Text Documents ◽

Filter System ◽

Lowest Common Ancestor ◽

Xml Stream ◽

User Friendly

Most existing XML stream processing systems adopt fully structured query languages, such as XPath or XQuery, which are difficult for ordinary users to learn and use. Keyword search is a user-friendly information discovery technique that has been extensively studied for text documents. This paper presents an XML stream filter system called XKFilter, which is the first system to support keyword search over XML streams. XKFilter uses the concepts of XLCA (eXclusive Lowest Common Ancestor) and XLCA Connecting Tree (XLCACT) to define the search semantics and the results of keyword queries, and the paper presents an approach to filtering XML streams according to keywords. A prototype of XKFilter is implemented for the experiments.


XKFilter

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch001 ◽

2013 ◽

pp. 1-18

Author(s):

Weidong Yang ◽

Fei Fang ◽

Nan Li ◽

Zhongyu (Joan) Lu

Keyword(s):

Common Ancestor ◽

Keyword Search ◽

Stream Processing ◽

Query Languages ◽

Information Discovery ◽

Text Documents ◽

Filter System ◽

Lowest Common Ancestor ◽

Xml Stream ◽

User Friendly

Most existing XML stream processing systems adopt fully structured query languages, such as XPath or XQuery, which are difficult for ordinary users to learn and use. Keyword search is a user-friendly information discovery technique that has been extensively studied for text documents. This paper presents an XML stream filter system called XKFilter, which is the first system to support keyword search over XML streams. XKFilter uses the concepts of XLCA (eXclusive Lowest Common Ancestor) and XLCA Connecting Tree (XLCACT) to define the search semantics and the results of keyword queries, and the paper presents an approach to filtering XML streams according to keywords. A prototype of XKFilter is implemented for the experiments.


XML Keyword Search Algorithm Based on Level-Traverse Encoding

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1553 ◽

2012 ◽

Vol 263-266 ◽

pp. 1553-1558

Author(s):

Quan Zhu Yao ◽

Bing Tian ◽

Wang Yun He

Keyword(s):

Common Ancestor ◽

Keyword Search ◽

Search Algorithm ◽

Bottom Up ◽

Xml Documents ◽

Xml Keyword Search ◽

Lowest Common Ancestor ◽

Definition Of ◽

Labeling Method

For XML documents, existing keyword retrieval methods encode each node with Dewey encoding, so LCA computation requires comparing Dewey codes component by component. When the XML document is deep, the large number of LCA computations degrades the performance of keyword search. This paper proposes a novel labeling method called Level-TRaverse (LTR) encoding and, combining it with a definition of the result set based on Exclusive Lowest Common Ancestor (ELCA), designs a query algorithm called the Bottom-Up Level Algorithm (BULA). Experiments demonstrate that this method improves the efficiency and accuracy of XML keyword retrieval.
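To make the ELCA result set and the cost of Dewey comparison concrete, here is a minimal brute-force sketch over Dewey labels. It does not reproduce the paper's LTR encoding or BULA, whose details are not given in this abstract. It assumes the usual ELCA definition: a node qualifies if it still covers every keyword after ignoring matches that lie inside any child subtree that already covers all keywords. The names `is_prefix`, `contains_all`, and `elca` are hypothetical.

```python
def is_prefix(a, b):
    """True if Dewey label a is an ancestor-or-self of label b.
    The slice comparison walks the labels component by component."""
    return b[:len(a)] == a


def contains_all(node, lists):
    """True if the subtree rooted at node holds a match for every keyword."""
    return all(any(is_prefix(node, m) for m in matches) for matches in lists)


def has_witness(v, matches, lists):
    """True if v has a match for this keyword that is either v itself or lies
    in a child subtree of v that does not already cover all keywords."""
    for m in matches:
        if not is_prefix(v, m):
            continue
        if m == v or not contains_all(m[:len(v) + 1], lists):
            return True
    return False


def elca(lists):
    """lists holds one list of Dewey labels (tuples) per keyword."""
    if not lists or not all(lists):
        return []
    # Every ancestor-or-self of a match is a candidate.
    candidates = {
        m[:i] for matches in lists for m in matches for i in range(1, len(m) + 1)
    }
    return sorted(
        v for v in candidates
        if all(has_witness(v, matches, lists) for matches in lists)
    )


# Keyword 1 occurs at 0.0.0 and 0.1; keyword 2 occurs at 0.0.1 and 0.2.
k1 = [(0, 0, 0), (0, 1)]
k2 = [(0, 0, 1), (0, 2)]
print(elca([k1, k2]))
```

In this example both the root and node `0.0` qualify: `0.0` covers both keywords by itself, and the root still sees one match of each keyword outside `0.0`. An SLCA computation on the same input would return only `0.0`.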


An Efficient and Flexible Approach of Keyword Search in XML

Advances in Data Mining and Database Management - Design, Performance, and Analysis of Innovative Information Retrieval ◽

10.4018/978-1-4666-1975-3.ch002 ◽

2013 ◽

pp. 17-30

Author(s):

Weidong Yang ◽

Hao Zhu

Keyword(s):

Keyword Search ◽

Scoring Function ◽

Search Algorithms ◽

System Implementation ◽

Flexible Approach ◽

New Model ◽

Search Results ◽

Xml Keyword Search ◽

System Administrator

This chapter first analyzes the LCA-based approaches to XML keyword search and compares them with each other. It explores several fundamental flaws of LCA-based models, the most important being that the search results are fixed once determined and cannot be adjusted. The chapter then presents a system for adaptive keyword search in XML, called AdaptiveXKS, which employs a novel and flexible result model to avoid these defects. Within the new model, a scoring function judges the quality of each result; the metrics used to evaluate results are weighted and can be updated as needed. Through the interface, the system administrator or the users can adjust some parameters according to their search intentions, and can freely choose one of three search algorithms to meet specific querying requirements. Section 1 gives the introduction and motivation. Section 2 defines the result model. Section 3 discusses the scoring function in depth. Section 4 presents the system implementation and details the keyword search algorithms. Section 5 presents the experiments. Section 6 covers related work. Section 7 concludes the chapter.
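The idea of a scoring function whose metrics are weighted and adjustable can be sketched as a normalized weighted sum. This is only an illustration of the general technique, not the AdaptiveXKS scoring function: the metric names (`compactness`, `coverage`), their values, and the `Candidate`, `score`, and `rank` names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A search result with hypothetical quality metrics, each in [0, 1]."""
    label: str
    metrics: dict = field(default_factory=dict)


def score(candidate, weights):
    """Weighted average of the candidate's metrics; missing metrics count as 0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * candidate.metrics.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def rank(candidates, weights):
    """Order candidates from highest to lowest score under the given weights."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


results = [
    Candidate("A", {"compactness": 0.9, "coverage": 0.2}),
    Candidate("B", {"compactness": 0.4, "coverage": 0.8}),
]

# Changing the weights changes the ranking without recomputing the candidates.
print([c.label for c in rank(results, {"compactness": 3, "coverage": 1})])
print([c.label for c in rank(results, {"compactness": 1, "coverage": 3})])
```

Keeping the weights outside the candidates is what makes the ranking adjustable: the same result set can be reordered when an administrator or user updates the weights.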


Keyword Search in XML Streams

Advances in Data Mining and Database Management - Design, Performance, and Analysis of Innovative Information Retrieval ◽

10.4018/978-1-4666-1975-3.ch006 ◽

2013 ◽

pp. 73-89

Author(s):

Weidong Yang ◽

Hao Zhu

Keyword(s):

System Architecture ◽

Keyword Search ◽

Query Languages ◽

Filter System ◽

Section 8 ◽

Lowest Common Ancestor ◽

Xml Stream ◽

Keyword Searching ◽

Processing Techniques ◽

Xml Streams

Most existing XML stream processing techniques adopt fully structured query languages such as XPath or XQuery, which are difficult for ordinary users to learn and use. This chapter presents an XML stream filter system called XKFilter, which uses keywords to filter XML streams. XKFilter uses the concepts of XLCA (eXclusive Lowest Common Ancestor) and XLCA Connecting Tree (XLCACT) to define the search semantics and the results of keyword queries, and the chapter presents an approach to filtering XML streams according to keywords. Section 1 introduces the background of keyword search in XML streams. Section 2 explains the search results. Section 3 presents in depth a stack-based keyword search algorithm for filtering XML streams without schemas. Section 4 presents keyword search over XML streams using schema information. Section 5 describes the system architecture of XKFilter. Section 6 reports experiments that show the performance. Section 7 discusses related work. Section 8 summarizes the chapter.
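A stack-based, single-pass keyword filter over an XML stream might look like the sketch below, written against Python's standard `xml.sax` event API. It is not the XKFilter algorithm and does not implement XLCA or XLCACT, which this abstract does not define. As a stated simplification, it reports the deepest elements whose subtrees contain every keyword (an SLCA-style result). The class name `KeywordFilter`, the helper `filter_stream`, and the sample feed are hypothetical.

```python
import xml.sax


class KeywordFilter(xml.sax.ContentHandler):
    """Keeps one stack frame per open element and reports, on each end tag,
    elements that cover all keywords and have no such descendant."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = {k.lower() for k in keywords}
        if not self.keywords:
            raise ValueError("at least one keyword is required")
        self.stack = []
        self.results = []

    def _flush_text(self):
        # Tokenize the text buffered for the innermost open element.
        if self.stack:
            frame = self.stack[-1]
            words = "".join(frame["text"]).lower().split()
            frame["seen"].update(self.keywords.intersection(words))
            frame["text"] = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.stack.append({"tag": name, "text": [], "seen": set(), "reported": False})

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer it.
        if self.stack:
            self.stack[-1]["text"].append(content)

    def endElement(self, name):
        self._flush_text()
        path = "/" + "/".join(f["tag"] for f in self.stack)
        frame = self.stack.pop()
        complete = frame["seen"] == self.keywords
        if complete and not frame["reported"]:
            self.results.append(path)
        if self.stack:
            # Propagate keyword coverage and the "already reported" flag upward.
            parent = self.stack[-1]
            parent["seen"] |= frame["seen"]
            parent["reported"] = parent["reported"] or frame["reported"] or complete


def filter_stream(xml_text, keywords):
    handler = KeywordFilter(keywords)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.results


FEED = """<feed>
  <item><title>XML stream</title><body>keyword filter</body></item>
  <item><title>keyword index</title><body>relational data</body></item>
</feed>"""

print(filter_stream(FEED, ["xml", "keyword"]))
```

Because each element is decided when its end tag arrives, memory use is bounded by the document depth rather than its size, which is the main reason a stack suits stream filtering.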


Review on Keyword Search and Ranking Techniques for Semi-Structured Data

Knowledge-Intensive Economies and Opportunities for Social, Organizational, and Technological Growth - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-7347-0.ch013 ◽

2019 ◽

pp. 248-270

Author(s):

Dayananda P. ◽

Sowmyarani C. N.

Keyword(s):

Data Storage ◽

Common Ancestor ◽

Keyword Search ◽

Structured Data ◽

Important Task ◽

Xml Data ◽

Lowest Common Ancestor ◽

Storage Hierarchy

The volume of semi-structured data is increasing continuously, and handling it efficiently is a challenging task. Keyword search is an important task because it lets users retrieve the required information without knowing the data storage hierarchy. There are several challenges in handling XML data. This chapter discusses these challenges in terms of lowest common ancestor (LCA) semantics, efficient query processing, and retrieval of the top-k results for the data a user needs. Existing approaches are grouped into classes according to how they frame the problem and its solution. Keyword search and ranking techniques for retrieving the desired information are analyzed in detail.


Efficient XML Keyword Search Using H-Reduction Factor and Interactive Algorithm

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v9i12.2966 ◽

2014 ◽

Vol 9 (12) ◽

pp. 2022 ◽

Cited By ~ 3

Author(s):

A. Mary Posonia ◽

V. L. Jyothi

Keyword(s):

Keyword Search ◽

Reduction Factor ◽

Xml Keyword Search ◽

Interactive Algorithm


XML keyword search algorithm based on smallest lowest entity sub-tree interrelated

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.01090 ◽

2013 ◽

Vol 32 (4) ◽

pp. 1090-1093

Author(s):

Quan-zhu YAO ◽

Xun-bin YU

Keyword(s):

Keyword Search ◽

Search Algorithm ◽

Xml Keyword Search


Determination of the Number of Conserved Chromosomal Segments Between Species

Genetics ◽

10.1093/genetics/157.3.1387 ◽

2001 ◽

Vol 157 (3) ◽

pp. 1387-1395 ◽

Cited By ~ 2

Author(s):

Sudhir Kumar ◽

Sudhindra R Gadagkar ◽

Alan Filipski ◽

Xun Gu

Keyword(s):

Statistical Approach ◽

Common Ancestor ◽

Chromosomal Rearrangements ◽

Structural Similarity ◽

The Other ◽

Segment Length ◽

Human Genomes ◽

Genomic Divergence ◽

Human And Mouse

Genomic divergence between species can be quantified in terms of the number of chromosomal rearrangements that have occurred in the respective genomes following their divergence from a common ancestor. These rearrangements disrupt the structural similarity between genomes, with each rearrangement producing additional, albeit shorter, conserved segments. Here we propose a simple statistical approach on the basis of the distribution of the number of markers in contiguous sets of autosomal markers (CSAMs) to estimate the number of conserved segments. CSAM identification requires information on the relative locations of orthologous markers in one genome and only the chromosome number on which each marker resides in the other genome. We propose a simple mathematical model that can account for the effect of the nonuniformity of the breakpoints and markers on the observed distribution of the number of markers in different conserved segments. Computer simulations show that the number of CSAMs increases linearly with the number of chromosomal rearrangements under a variety of conditions. Using the CSAM approach, the estimate of the number of conserved segments between human and mouse genomes is 529 ± 84, with a mean conserved segment length of 2.8 cM. This length is <40% of that currently accepted for human and mouse genomes. This means that the mouse and human genomes have diverged at a rate of ∼1.15 rearrangements per million years. By contrast, mouse and rat are diverging at a rate of only ∼0.74 rearrangements per million years.
