Review on Keyword Search and Ranking Techniques for Semi-Structured Data

Author(s):  
Dayananda P. ◽  
Sowmyarani C. N.

The size of semi-structured data is increasing continuously, and handling it efficiently is a challenging task. Keyword search is important because the required information can be retrieved without knowledge of the data storage hierarchy. There are several challenges in handling XML data. This chapter discusses these challenges in terms of lowest common ancestor (LCA) semantics, efficient query processing, and retrieving top-k results for the data users need. Existing approaches are classified into several categories based on how the problem and its solution are tackled. An analysis of keyword search and ranking techniques for retrieving the desired information is presented in detail.
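To make the LCA semantics concrete, the following is a minimal sketch (a generic illustration with a made-up document, not any specific technique surveyed in the chapter): two keyword matches are located in a small XML tree and their lowest common ancestor is returned as the answer subtree, without the user knowing the document's structure.

```python
# A minimal illustration of LCA-based keyword search semantics (generic sketch,
# toy document): find the nodes matching each keyword, then return their LCA.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib><book><title>XML keyword search</title><author>Smith</author></book>"
    "<book><title>Databases</title><author>Jones</author></book></bib>"
)
parent = {child: p for p in doc.iter() for child in p}   # child -> parent map

def ancestors(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]                       # root ... node

def lca(a, b):
    common = None
    for x, y in zip(ancestors(a), ancestors(b)):
        if x is y:
            common = x
    return common

def matches(keyword):
    return [e for e in doc.iter() if keyword.lower() in (e.text or "").lower()]

# "XML" matches the first title, "Smith" matches the first author;
# their LCA is the first <book>, which is returned as the answer subtree.
result = lca(matches("XML")[0], matches("Smith")[0])
print(result.tag)                           # -> book
```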

2011 ◽  
Vol 267 ◽  
pp. 811-815
Author(s):  
Ming Yan Shen ◽  
Xin Li ◽  
Xiang Fu Meng

XML keyword search is widely used in applications over XML documents. Most XML keyword search approaches are based on the LCA (lowest common ancestor) or its variants, which usually leads to poor recall and precision. This paper presents a novel XML keyword search method based on semantic relatives. The method fully considers the semantic characteristics of the XML document structure. A stack-based algorithm is also presented to merge the semantically related nodes containing the keywords into the results of XML keyword search. Experimental results demonstrate the effectiveness and efficiency of the method.
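As a rough illustration of stack-based merging (a generic sketch over Dewey-labeled matches in document order, not the authors' semantic-relative algorithm), the following groups keyword matches and emits ancestors whose subtrees cover every query keyword.

```python
# A generic stack-based grouping sketch (not the authors' semantic-relative
# algorithm): scan keyword matches in document order by Dewey label and emit
# ancestors whose subtrees cover all keywords (ELCA-style: matches already
# credited to an emitted descendant are not propagated to its ancestors).

def stack_merge(matches, num_keywords):
    """matches: (dewey_label_tuple, keyword_id) pairs sorted in document order."""
    results, path = [], []      # path entries: [label_component, keyword_id_set]

    def pop_entry():
        comp, kws = path.pop()
        label = tuple(e[0] for e in path) + (comp,)
        if len(kws) == num_keywords:
            results.append(label)           # this subtree covers every keyword
        elif path:
            path[-1][1] |= kws              # otherwise pass coverage to parent

    for label, kw in matches:
        p = 0                               # common prefix with the current path
        while p < min(len(path), len(label)) and path[p][0] == label[p]:
            p += 1
        while len(path) > p:                # leave subtrees that are now complete
            pop_entry()
        for comp in label[p:]:              # descend to the new match
            path.append([comp, set()])
        path[-1][1].add(kw)

    while path:                             # flush the remaining open path
        pop_entry()
    return results

# e.g. keyword 0 matches nodes 1.1.1 and 1.2.1, keyword 1 matches node 1.1.2:
print(stack_merge([((1, 1, 1), 0), ((1, 1, 2), 1), ((1, 2, 1), 0)], 2))  # [(1, 1)]
```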


2011 ◽  
Vol 1 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Weidong Yang ◽  
Fei Fang ◽  
Nan Li ◽  
Zhongyu (Joan) Lu

Most existing XML stream processing systems adopt fully structured query languages, such as XPath or XQuery, but these are difficult for ordinary users to learn and use. Keyword search is a user-friendly information discovery technique that has been extensively studied for text documents. This paper presents an XML stream filtering system called XKFilter, which is the first system supporting keyword search over XML streams. In XKFilter, the concepts of XLCA (eXclusive Lowest Common Ancestor) and XLCA Connecting Tree (XLCACT) are used to define the search semantics and results of keyword queries, and an approach is presented to filter XML streams according to keywords. A prototype of XKFilter is implemented and evaluated in experiments.
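The following is a minimal sketch of keyword filtering over an XML stream (an assumed toy illustration using iterparse, not XKFilter's XLCA/XLCACT algorithm): it reports the lowest elements whose subtrees contain every query keyword.

```python
# A minimal keyword-filtering sketch over an XML stream (illustration only):
# report the lowest elements whose subtrees contain all query keywords.
import io
import xml.etree.ElementTree as ET

def filter_stream(xml_bytes, keywords):
    keywords = [k.lower() for k in keywords]
    stack, results = [], []      # one "descendant reported?" flag per open element
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            stack.append(False)
        else:                               # "end": the element's subtree is complete
            reported_below = stack.pop()
            text = " ".join(elem.itertext()).lower()
            if not reported_below and all(k in text for k in keywords):
                results.append(elem.tag)
                reported_below = True
            if stack:                       # tell the parent a result was emitted
                stack[-1] = stack[-1] or reported_below
    return results

stream = b"<bib><book><title>XML streams</title><author>Yang</author></book></bib>"
print(filter_stream(stream, ["xml", "yang"]))   # -> ['book']
```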


Author(s):  
Weidong Yang ◽  
Hao Zhu

It has become desirable to provide keyword search for users to query structured information in an XML database (data-centric retrieval) by combining database and information retrieval techniques. The key challenges of keyword search in XML databases are therefore how to define appropriate result models that meet users' search intents, how to compute the results with efficient algorithms, and how to rank the results. In this chapter, on the one hand, the authors present foundational knowledge of XML keyword search such as XML data models, XML query languages, inverted indexes, and Dewey encoding. On the other hand, representative existing research on keyword search in XML is presented, including result models such as Smallest Lowest Common Ancestor (SLCA), Exclusive Lowest Common Ancestor (ELCA), and Meaningful Lowest Common Ancestor (MLCA), the related search algorithms, and the ranking approaches.
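As a toy illustration of two of the foundations named above, Dewey encoding and the inverted index (an assumed example, not code from the chapter), the following labels a small XML tree with Dewey paths and indexes its keywords against those labels.

```python
# Assign Dewey labels to an XML tree and build a keyword inverted index over
# those labels (toy sketch of the foundations discussed in the chapter).
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<dblp><article><title>XML keyword search</title>"
    "<year>2011</year></article></dblp>"
)

def dewey_labels(root):
    """Label each element with its Dewey path (child positions from the root)."""
    labels = {root: (1,)}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node, start=1):
            labels[child] = labels[node] + (i,)
            stack.append(child)
    return labels

labels = dewey_labels(doc)
index = defaultdict(list)                  # keyword -> list of Dewey labels
for elem, label in labels.items():
    for word in (elem.text or "").lower().split():
        index[word].append(label)

print(index["keyword"])                    # -> [(1, 1, 1)]
```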


2012 ◽  
Vol 263-266 ◽  
pp. 1553-1558
Author(s):  
Quan Zhu Yao ◽  
Bing Tian ◽  
Wang Yun He

For XML documents, existing keyword retrieval methods encode each node with a Dewey label, so Dewey labels must be compared part by part when computing LCAs. When an XML document is deep, the large number of LCA computations degrades the performance of keyword search. In this paper we propose a novel labeling method called Level-TRaverse (LTR) encoding and, combined with a result-set definition based on the Exclusive Lowest Common Ancestor (ELCA), design a Bottom-Up Level Algorithm (BULA) for query processing. Experiments demonstrate that this method improves both the efficiency and the accuracy of XML keyword retrieval.
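For reference, the baseline the paper targets looks roughly like the following sketch: the LCA of two Dewey labels is found by comparing components part by part, which becomes costly when documents are deep and many LCAs must be computed (the LTR encoding itself is not reproduced here).

```python
# Baseline part-by-part LCA of two Dewey labels: for deep documents the labels
# are long, so each of the many LCA computations walks a long common prefix.
def dewey_lca(a, b):
    prefix = []
    for x, y in zip(a, b):                 # component-by-component comparison
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

print(dewey_lca((1, 2, 3, 1), (1, 2, 4)))  # -> (1, 2)
```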


2011 ◽  
Vol 23 (12) ◽  
pp. 1761-1762
Author(s):  
Surajit Chaudhuri ◽  
Yi Chen ◽  
Jeffrey Xu Yu

2014 ◽  
Vol 288 ◽  
pp. 135-152 ◽  
Author(s):  
Jaime I. Lopez-Veyna ◽  
Victor J. Sosa-Sosa ◽  
Ivan Lopez-Arevalo

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yue Zhao ◽  
Ye Yuan ◽  
Guoren Wang

This paper describes a keyword search method for probabilistic XML data based on ELM (extreme learning machine). We use this method to carry out keyword search on probabilistic XML data. A probabilistic XML document differs from a traditional XML document in that keyword search must take possible-world semantics into account. A probabilistic XML document can be seen as a set of nodes consisting of ordinary nodes and distributional nodes. ELM has good performance in text classification applications. As typical semi-structured data, the labels of XML data are self-describing, so the label and context of a node can be treated as that node's text data. ELM offers significant advantages such as fast learning speed, ease of implementation, and effective node classification. Set intersection can then compute SLCAs quickly over the node sets classified by ELM. In this paper, we adopt ELM to classify nodes and compute probabilities, and we propose two algorithms based on ELM and a probability threshold to improve overall performance. The experimental results verify the benefits of our methods according to various evaluation metrics.
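For readers unfamiliar with ELM, the following is a minimal generic ELM classifier sketch in NumPy (not the paper's node-classification model): hidden weights are drawn at random and the output weights are solved in closed form with a pseudo-inverse.

```python
# A minimal extreme learning machine (ELM) classifier: random hidden layer,
# closed-form output weights via a pseudo-inverse.
import numpy as np

class ELM:
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden layer
        T = np.eye(n_classes)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T           # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

# toy usage: separate two clusters of "node feature vectors"
X = np.vstack([np.random.randn(50, 4) - 2, np.random.randn(50, 4) + 2])
y = np.array([0] * 50 + [1] * 50)
print((ELM().fit(X, y).predict(X) == y).mean())     # close to 1.0
```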


10.37236/409 ◽  
2010 ◽  
Vol 17 (1) ◽  
Author(s):  
Markus Kuba ◽  
Stephan Wagner

By a theorem of Dobrow and Smythe, the depth of the $k$th node in very simple families of increasing trees (which includes, among others, binary increasing trees, recursive trees and plane ordered recursive trees) follows the same distribution as the number of edges of the form $j-(j+1)$ with $j < k$. In this short note, we present a simple bijective proof of this fact, which also shows that the result actually holds within a wider class of increasing trees. We also discuss some related results that follow from the bijection as well as a possible generalization. Finally, we use another similar bijection to determine the distribution of the depth of the lowest common ancestor of two nodes.
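As a quick sanity check of the stated identity in the special case of random recursive trees (a Monte Carlo sketch, not part of the paper), the following compares the empirical distribution of the depth of node $k$ with that of the number of edges $j-(j+1)$, $j < k$.

```python
# Monte Carlo check of the Dobrow-Smythe identity for random recursive trees
# (node i attaches to a uniformly random earlier node): the depth of node k
# and the number of edges j-(j+1) with j < k share the same distribution.
import random
from collections import Counter

def sample(k):
    parent = {i: random.randint(1, i - 1) for i in range(2, k + 1)}
    depth, v = 0, k
    while v != 1:                          # walk up to the root
        v = parent[v]
        depth += 1
    consecutive = sum(1 for j in range(1, k) if parent[j + 1] == j)
    return depth, consecutive

k, trials = 8, 200_000
depths, consecs = zip(*(sample(k) for _ in range(trials)))
print(Counter(depths))     # the two empirical distributions should
print(Counter(consecs))    # agree up to Monte Carlo noise
```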

