XML SCHEMA MATCHING

Author(s):  
JIANGUO LU ◽  
JU WANG ◽  
SHENGRUI WANG

The XML Schema matching problem can be formulated as follows: given two XML Schemas, find the best mapping between the elements and attributes of the schemas, and the overall similarity between them. XML Schema matching is an important problem in data integration, schema evolution, and software reuse. This paper describes a matching system that finds accurate matches and scales to large XML Schemas with hundreds of nodes. In our system, XML Schemas are modeled as labeled, unordered trees, and the schema matching problem is turned into a tree matching problem. We propose the concept of Approximate Common Structures in trees and develop a tree matching algorithm based on it. Compared with the traditional tree edit-distance algorithm and other schema matching systems, our algorithm is faster and more suitable for matching large XML Schemas.

Author(s):  
Hongzhi Wang ◽  
Jianzhong Li ◽  
Fei Li

Similarity detection between large XML fragment sets is widely used in applications such as data integration and XML de-duplication. Many methods have been used to find similar XML fragments, including the state-of-the-art pq-gram method, which offers relatively high join quality and efficiency. In this chapter, we propose pq-hash as an improvement on pq-grams. As the basis of pq-hash, a randomized data structure, pq-array, is developed. With pq-array, large trees are represented as small fixed-size arrays. To perform similarity joins on XML fragment sets efficiently, we also propose a cluster-based partition strategy and a sort-merge and hash join strategy that avoid a nested loop join. Both our theoretical analysis and our experimental results confirm that pq-hash is much more efficient than pq-grams while retaining high join quality, and that our strategies for approximate join are effective.
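For orientation, the baseline that this abstract builds on can be sketched in a few lines. The code below is a minimal illustration of the pq-gram profile and pq-gram distance as they are commonly defined for ordered labeled trees; it does not implement pq-hash or pq-array, whose details the abstract does not give. The tree encoding `(label, [children])`, the function names, and the dummy label `"*"` are assumptions made for this sketch.

```python
from collections import Counter

def pq_gram_profile(tree, p=2, q=3):
    """Return the bag of pq-grams of an ordered labeled tree.

    A tree is encoded as (label, [children]); this encoding is an
    assumption of the sketch. "*" is the dummy label used for padding.
    """
    profile = Counter()

    def visit(node, stem):
        label, children = node
        # The stem holds the anchor node and its p-1 nearest ancestors.
        stem = stem[1:] + (label,)
        base = ("*",) * q
        if not children:
            # A leaf contributes one pq-gram whose base is all dummies.
            profile[stem + base] += 1
            return
        for child in children:
            # Slide a window of q consecutive children over the child list.
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):
            # Pad on the right with q-1 dummy children.
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(tree, ("*",) * p)
    return profile

def pq_gram_distance(t1, t2, p=2, q=3):
    """Normalized bag distance between two profiles (0 means identical)."""
    a, b = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((a & b).values())           # size of the bag intersection
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

t1 = ("a", [("b", [])])
t2 = ("a", [("b", []), ("c", [])])
print(pq_gram_distance(t1, t2, p=2, q=2))
```

Because every tree is reduced to a bag of fixed-length label tuples, two profiles can be compared without aligning the trees node by node, which is what makes the approach attractive for joins over large fragment sets.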


2020 ◽  
Vol 18 (4) ◽  
pp. 31-50
Author(s):  
Vinay Vachharajani ◽  
Jyoti Pareek

The demand for higher education keeps increasing. Information technology and e-learning have, to a large extent, eased the shortage of skilled and qualified teachers, but they do not by themselves guarantee high-quality learning. Although delivering learning materials and tests to large numbers of students has become easy by uploading them to the web, assessment can still be tedious. There is a need for tools and technologies that support fully automated assessment. In this paper, an algorithm is proposed that matches the structures of two use-case diagrams, one drawn by a student and one by an expert, in order to assess the student's diagram automatically. Zhang and Shasha's tree edit distance algorithm has been extended for assessing use-case diagrams. Results from 445 students' answers based on 14 different scenarios are analyzed to evaluate the performance of the proposed algorithm. No comparable study has been reported for any other diagram-assessing algorithm in the research literature.


2011 ◽  
Vol 412 (4-5) ◽  
pp. 352-364 ◽  
Author(s):  
Tatsuya Akutsu ◽  
Daiji Fukagawa ◽  
Atsuhiro Takasu ◽  
Takeyuki Tamura

2014 ◽  
Vol 571-572 ◽  
pp. 575-579
Author(s):  
Hai Tao Ma ◽  
Chang Yong Yu ◽  
Chang Ming Xu ◽  
Miao Fang

We explored the subtree matching problem for probabilistic XML documents: finding the matches of an XML query tree in a probabilistic XML document, using the canonical tree edit distance as the similarity measure between subtrees. Probabilistic XML is a probability distribution model that captures uncertainty in both value and structure. Querying probabilistic XML documents is difficult: a naive algorithm that directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document has exponential complexity. Based on the method of computing tree edit distance over certain XML subtrees, we defined a minimum solution to the edit distance computation, that is, the minimum cost of transforming the query tree into the probabilistic XML tree. Furthermore, we developed an algorithm, ASM (Algorithm of Subtree Matching), to compute the minimum solution. Finally, we proved that the complexity of ASM is linear in the size of the probabilistic XML document.
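The tree edit distance used as the similarity measure here can be illustrated with a short sketch for ordinary (certain) ordered trees. The code below follows the textbook recursive definition over forests with unit costs and memoization; it is not ASM and not the optimized Zhang and Shasha algorithm, and it does not handle probabilistic nodes. The tree encoding `(label, (children...))` with children stored as tuples, the unit costs, and the function name are assumptions of this sketch.

```python
from functools import lru_cache

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered labeled trees.

    A tree is (label, children), where children is a tuple of trees, so
    that forests are hashable and can be memoized. This encoding is an
    assumption of the sketch.
    """
    @lru_cache(maxsize=None)
    def dist(f1, f2):
        # f1 and f2 are forests: tuples of trees.
        if not f1 and not f2:
            return 0
        if not f1:
            _, kids = f2[-1]
            # Insert the rightmost root of f2; its children take its place.
            return dist(f1, f2[:-1] + kids) + 1
        if not f2:
            _, kids = f1[-1]
            # Delete the rightmost root of f1; its children take its place.
            return dist(f1[:-1] + kids, f2) + 1
        (l1, k1), (l2, k2) = f1[-1], f2[-1]
        return min(
            dist(f1[:-1] + k1, f2) + 1,   # delete rightmost root of f1
            dist(f1, f2[:-1] + k2) + 1,   # insert rightmost root of f2
            # Match the two rightmost roots (rename if labels differ),
            # then solve their subforests and the remaining forests.
            dist(k1, k2) + dist(f1[:-1], f2[:-1]) + (l1 != l2),
        )

    return dist((t1,), (t2,))

query = ("a", (("b", (("c", ()),)),))
doc = ("a", (("c", ()),))
print(tree_edit_distance(query, doc))
```

Running this computation once per certain tree represented by a probabilistic document is the naive strategy whose exponential cost the abstract describes.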


2020 ◽  
Vol 16 (4) ◽  
pp. 1-22
Author(s):  
Karl Bringmann ◽  
Paweł Gawrychowski ◽  
Shay Mozes ◽  
Oren Weimann

2009 ◽  
Vol 6 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Erik D. Demaine ◽  
Shay Mozes ◽  
Benjamin Rossman ◽  
Oren Weimann

2013 ◽  
Vol 48 ◽  
pp. 1-22 ◽  
Author(s):  
M. Alabbas ◽  
A. Ramsay

Many natural language processing (NLP) applications require computing the similarity between pairs of syntactic or semantic trees. Many researchers have used tree edit distance for this task, but this technique has the drawback that it handles single-node operations only. We have extended the standard tree edit distance algorithm to handle subtree transformation operations as well as single-node operations. The extended algorithm with subtree operations, TED+ST, is more effective and flexible than the standard algorithm, especially for applications that pay attention to relations among nodes (e.g. in linguistic trees, deleting a modifier subtree should be cheaper than the sum of deleting its components individually). We describe the use of TED+ST for checking entailment between two Arabic text snippets. The preliminary results of using TED+ST were encouraging when compared with two string-based approaches and with the standard algorithm.


Author(s):  
Joshua Amavi ◽  
Jacques Chabin ◽  
Mirian Halfeld-Ferrari ◽  
Pierre Réty