Review on Keyword Search and Ranking Techniques for Semi-Structured Data

Author(s):  
Dayananda P. ◽  
Sowmyarani C. N.

The size of semi-structured data is increasing continuously, and handling it efficiently is a challenging task. Keyword search is important because the required information can be retrieved without knowledge of the data storage hierarchy. There are several challenges in handling XML data. This chapter discusses these challenges in terms of lowest common ancestor (LCA) semantics, efficient query processing, and retrieving top-k results for the data users need. Existing approaches are classified into several categories based on how the problem and its solution are tackled. An analysis of keyword search and ranking techniques for retrieving the desired information is presented in detail.
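To make the LCA semantics concrete, the following is a minimal sketch (a generic illustration with a made-up document, not any specific technique surveyed in the chapter): two keyword matches are located in a small XML tree and their lowest common ancestor is returned as the answer subtree, without the user knowing the document's structure.

```python
# A minimal illustration of LCA-based keyword search semantics (generic sketch,
# toy document): find the nodes matching each keyword, then return their LCA.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib><book><title>XML keyword search</title><author>Smith</author></book>"
    "<book><title>Databases</title><author>Jones</author></book></bib>"
)
parent = {child: p for p in doc.iter() for child in p}   # child -> parent map

def ancestors(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]                       # root ... node

def lca(a, b):
    common = None
    for x, y in zip(ancestors(a), ancestors(b)):
        if x is y:
            common = x
    return common

def matches(keyword):
    return [e for e in doc.iter() if keyword.lower() in (e.text or "").lower()]

# "XML" matches the first title, "Smith" matches the first author;
# their LCA is the first <book>, which is returned as the answer subtree.
result = lca(matches("XML")[0], matches("Smith")[0])
print(result.tag)                           # -> book
```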

2011 ◽  
Vol 267 ◽  
pp. 811-815
Author(s):  
Ming Yan Shen ◽  
Xin Li ◽  
Xiang Fu Meng

XML keyword search is widely used in applications over XML documents. Most XML keyword search approaches are based on the LCA (lowest common ancestor) or its variants, which usually leads to poor recall and precision. This paper presents a novel XML keyword search method based on semantic relatives. The method fully considers the semantic characteristics of the XML document structure. A stack-based algorithm is also presented to merge the semantically related nodes containing the keywords into the results of XML keyword search. Experimental results demonstrate the effectiveness and efficiency of the method.
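As a rough illustration of stack-based merging (a generic sketch over Dewey-labeled matches in document order, not the authors' semantic-relative algorithm), the following groups keyword matches and emits ancestors whose subtrees cover every query keyword.

```python
# A generic stack-based grouping sketch (not the authors' semantic-relative
# algorithm): scan keyword matches in document order by Dewey label and emit
# ancestors whose subtrees cover all keywords (ELCA-style: matches already
# credited to an emitted descendant are not propagated to its ancestors).

def stack_merge(matches, num_keywords):
    """matches: (dewey_label_tuple, keyword_id) pairs sorted in document order."""
    results, path = [], []      # path entries: [label_component, keyword_id_set]

    def pop_entry():
        comp, kws = path.pop()
        label = tuple(e[0] for e in path) + (comp,)
        if len(kws) == num_keywords:
            results.append(label)           # this subtree covers every keyword
        elif path:
            path[-1][1] |= kws              # otherwise pass coverage to parent

    for label, kw in matches:
        p = 0                               # common prefix with the current path
        while p < min(len(path), len(label)) and path[p][0] == label[p]:
            p += 1
        while len(path) > p:                # leave subtrees that are now complete
            pop_entry()
        for comp in label[p:]:              # descend to the new match
            path.append([comp, set()])
        path[-1][1].add(kw)

    while path:                             # flush the remaining open path
        pop_entry()
    return results

# e.g. keyword 0 matches nodes 1.1.1 and 1.2.1, keyword 1 matches node 1.1.2:
print(stack_merge([((1, 1, 1), 0), ((1, 1, 2), 1), ((1, 2, 1), 0)], 2))  # [(1, 1)]
```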


2011 ◽  
Vol 1 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Weidong Yang ◽  
Fei Fang ◽  
Nan Li ◽  
Zhongyu (Joan) Lu

Most existing XML stream processing systems adopt fully structured query languages, such as XPath or XQuery, but these are difficult for ordinary users to learn and use. Keyword search is a user-friendly information discovery technique that has been extensively studied for text documents. This paper presents an XML stream filtering system called XKFilter, which is the first system supporting keyword search over XML streams. In XKFilter, the concepts of XLCA (eXclusive Lowest Common Ancestor) and XLCA Connecting Tree (XLCACT) are used to define the search semantics and results of keyword queries, and an approach is presented to filter XML streams according to keywords. A prototype of XKFilter is implemented and evaluated in experiments.
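The following is a minimal sketch of keyword filtering over an XML stream (an assumed toy illustration using iterparse, not XKFilter's XLCA/XLCACT algorithm): it reports the lowest elements whose subtrees contain every query keyword.

```python
# A minimal keyword-filtering sketch over an XML stream (illustration only):
# report the lowest elements whose subtrees contain all query keywords.
import io
import xml.etree.ElementTree as ET

def filter_stream(xml_bytes, keywords):
    keywords = [k.lower() for k in keywords]
    stack, results = [], []      # one "descendant reported?" flag per open element
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            stack.append(False)
        else:                               # "end": the element's subtree is complete
            reported_below = stack.pop()
            text = " ".join(elem.itertext()).lower()
            if not reported_below and all(k in text for k in keywords):
                results.append(elem.tag)
                reported_below = True
            if stack:                       # tell the parent a result was emitted
                stack[-1] = stack[-1] or reported_below
    return results

stream = b"<bib><book><title>XML streams</title><author>Yang</author></book></bib>"
print(filter_stream(stream, ["xml", "yang"]))   # -> ['book']
```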


Author(s):  
Weidong Yang ◽  
Hao Zhu

It has become desirable to provide keyword search for users to query structured information in an XML database (data-centric retrieval) by combining database and information retrieval techniques. The key challenges of keyword search in XML databases are therefore how to define appropriate result models that meet users' search intents, how to compute the results with efficient algorithms, and how to rank the results. In this chapter, on the one hand, the authors present foundational knowledge of XML keyword search such as XML data models, XML query languages, inverted indexes, and Dewey encoding. On the other hand, representative existing research on keyword search in XML is presented, including result models such as Smallest Lowest Common Ancestor (SLCA), Exclusive Lowest Common Ancestor (ELCA), and Meaningful Lowest Common Ancestor (MLCA), the related search algorithms, and the ranking approaches.
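As a toy illustration of two of the foundations named above, Dewey encoding and the inverted index (an assumed example, not code from the chapter), the following labels a small XML tree with Dewey paths and indexes its keywords against those labels.

```python
# Assign Dewey labels to an XML tree and build a keyword inverted index over
# those labels (toy sketch of the foundations discussed in the chapter).
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<dblp><article><title>XML keyword search</title>"
    "<year>2011</year></article></dblp>"
)

def dewey_labels(root):
    """Label each element with its Dewey path (child positions from the root)."""
    labels = {root: (1,)}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node, start=1):
            labels[child] = labels[node] + (i,)
            stack.append(child)
    return labels

labels = dewey_labels(doc)
index = defaultdict(list)                  # keyword -> list of Dewey labels
for elem, label in labels.items():
    for word in (elem.text or "").lower().split():
        index[word].append(label)

print(index["keyword"])                    # -> [(1, 1, 1)]
```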


2012 ◽  
Vol 263-266 ◽  
pp. 1553-1558
Author(s):  
Quan Zhu Yao ◽  
Bing Tian ◽  
Wang Yun He

For XML documents, existing keyword retrieval methods encode each node with a Dewey label, so Dewey labels must be compared part by part when computing LCAs. When an XML document is deep, the large number of LCA computations degrades the performance of keyword search. In this paper we propose a novel labeling method called Level-TRaverse (LTR) encoding and, combined with a result-set definition based on the Exclusive Lowest Common Ancestor (ELCA), design a Bottom-Up Level Algorithm (BULA) for query processing. Experiments demonstrate that this method improves both the efficiency and the accuracy of XML keyword retrieval.
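For reference, the baseline the paper targets looks roughly like the following sketch: the LCA of two Dewey labels is found by comparing components part by part, which becomes costly when documents are deep and many LCAs must be computed (the LTR encoding itself is not reproduced here).

```python
# Baseline part-by-part LCA of two Dewey labels: for deep documents the labels
# are long, so each of the many LCA computations walks a long common prefix.
def dewey_lca(a, b):
    prefix = []
    for x, y in zip(a, b):                 # component-by-component comparison
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

print(dewey_lca((1, 2, 3, 1), (1, 2, 4)))  # -> (1, 2)
```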


2011 ◽  
Vol 23 (12) ◽  
pp. 1761-1762
Author(s):  
Surajit Chaudhuri ◽  
Yi Chen ◽  
Jeffrey Xu Yu

2014 ◽  
Vol 288 ◽  
pp. 135-152 ◽  
Author(s):  
Jaime I. Lopez-Veyna ◽  
Victor J. Sosa-Sosa ◽  
Ivan Lopez-Arevalo

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yue Zhao ◽  
Ye Yuan ◽  
Guoren Wang

This paper describes a keyword search method for probabilistic XML data based on ELM (extreme learning machine). We use this method to carry out keyword search on probabilistic XML data. A probabilistic XML document differs from a traditional XML document in that keyword search must take possible-world semantics into account. A probabilistic XML document can be seen as a set of nodes consisting of ordinary nodes and distributional nodes. ELM has good performance in text classification applications. As typical semi-structured data, the labels of XML data are self-describing, so the label and context of a node can be treated as that node's text data. ELM offers significant advantages such as fast learning speed, ease of implementation, and effective node classification. Set intersection can then compute SLCAs quickly over the node sets classified by ELM. In this paper, we adopt ELM to classify nodes and compute probabilities, and we propose two algorithms based on ELM and a probability threshold to improve overall performance. The experimental results verify the benefits of our methods according to various evaluation metrics.
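For readers unfamiliar with ELM, the following is a minimal generic ELM classifier sketch in NumPy (not the paper's node-classification model): hidden weights are drawn at random and the output weights are solved in closed form with a pseudo-inverse.

```python
# A minimal extreme learning machine (ELM) classifier: random hidden layer,
# closed-form output weights via a pseudo-inverse.
import numpy as np

class ELM:
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden layer
        T = np.eye(n_classes)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T           # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

# toy usage: separate two clusters of "node feature vectors"
X = np.vstack([np.random.randn(50, 4) - 2, np.random.randn(50, 4) + 2])
y = np.array([0] * 50 + [1] * 50)
print((ELM().fit(X, y).predict(X) == y).mean())     # close to 1.0
```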


10.37236/409 ◽  
2010 ◽  
Vol 17 (1) ◽  
Author(s):  
Markus Kuba ◽  
Stephan Wagner

By a theorem of Dobrow and Smythe, the depth of the $k$th node in very simple families of increasing trees (which includes, among others, binary increasing trees, recursive trees and plane ordered recursive trees) follows the same distribution as the number of edges of the form $j-(j+1)$ with $j < k$. In this short note, we present a simple bijective proof of this fact, which also shows that the result actually holds within a wider class of increasing trees. We also discuss some related results that follow from the bijection as well as a possible generalization. Finally, we use another similar bijection to determine the distribution of the depth of the lowest common ancestor of two nodes.
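As a quick sanity check of the stated identity in the special case of random recursive trees (a Monte Carlo sketch, not part of the paper), the following compares the empirical distribution of the depth of node $k$ with that of the number of edges $j-(j+1)$, $j < k$.

```python
# Monte Carlo check of the Dobrow-Smythe identity for random recursive trees
# (node i attaches to a uniformly random earlier node): the depth of node k
# and the number of edges j-(j+1) with j < k share the same distribution.
import random
from collections import Counter

def sample(k):
    parent = {i: random.randint(1, i - 1) for i in range(2, k + 1)}
    depth, v = 0, k
    while v != 1:                          # walk up to the root
        v = parent[v]
        depth += 1
    consecutive = sum(1 for j in range(1, k) if parent[j + 1] == j)
    return depth, consecutive

k, trials = 8, 200_000
depths, consecs = zip(*(sample(k) for _ in range(trials)))
print(Counter(depths))     # the two empirical distributions should
print(Counter(consecs))    # agree up to Monte Carlo noise
```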

