XML SCHEMA MATCHING

The XML Schema matching problem can be formulated as follows: given two XML Schemas, find the best mapping between their elements and attributes, together with the overall similarity between the schemas. XML Schema matching is an important problem in data integration, schema evolution, and software reuse. This paper describes a matching system that finds accurate matches and scales to large XML Schemas with hundreds of nodes. In our system, XML Schemas are modeled as labeled, unordered trees, which turns the schema matching problem into a tree matching problem. We propose the concept of Approximate Common Structures in trees and develop a tree matching algorithm based on it. Compared with the traditional tree edit-distance algorithm and other schema matching systems, our algorithm is faster and better suited to matching large XML Schemas.
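
As a minimal sketch of the "labeled, unordered tree" model mentioned in the abstract (not the authors' system and not their Approximate Common Structures algorithm), the following Python snippet represents a schema as a tree and builds a canonical form that ignores child order. The `SchemaNode` class, the `canonical` helper, and the element names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class SchemaNode:
    """One element or attribute of a schema, identified by its label."""
    label: str
    children: list[SchemaNode] = field(default_factory=list)


def canonical(node: SchemaNode) -> tuple:
    """Return a nested tuple that is the same for any ordering of children.

    Sorting the canonical forms of the children removes sibling order,
    which is what makes the tree 'unordered'.
    """
    return (node.label, tuple(sorted(canonical(c) for c in node.children)))


# Two hypothetical schemas that differ only in the order of child elements.
a = SchemaNode("order", [SchemaNode("id"), SchemaNode("item")])
b = SchemaNode("order", [SchemaNode("item"), SchemaNode("id")])
same_structure = canonical(a) == canonical(b)
```

Under this model the two schemas compare equal, whereas an ordered-tree comparison would treat them as different.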

Efficient Identification of Similar XML Fragments Based on Tree Edit Distance

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch004 ◽

2011 ◽

pp. 78-97

Author(s):

Hongzhi Wang ◽

Jianzhong Li ◽

Fei Li

Keyword(s):

Data Structure ◽

Theoretical Analysis ◽

Data Integration ◽

Edit Distance ◽

Experimental Results ◽

Tree Edit Distance ◽

Similarity Join ◽

Similarity Detection ◽

Large Trees ◽

Nested Loop

Similarity detection between large XML fragment sets is widely used in applications such as data integration and XML de-duplication. Many methods have been proposed for finding similar XML fragments, including the state-of-the-art pq-gram method, which offers relatively high join quality and efficiency. In this chapter, we propose pq-hash as an improvement on pq-grams. As the basis of pq-hash, we develop a randomized data structure, the pq-array, with which large trees are represented as small fixed-size arrays. To perform similarity joins on XML fragment sets efficiently, we propose a cluster-based partition strategy as well as a sort-merge and hash join strategy that avoid nested-loop joins. Both our theoretical analysis and our experimental results confirm that pq-hash is much more efficient than pq-grams while retaining high join quality, and that our strategies for approximate joins are effective.
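
To make the pq-gram baseline mentioned in the abstract concrete, here is a minimal sketch of a pq-gram profile and the normalized pq-gram distance as they are commonly defined (stems of `p` ancestors, bases of `q` consecutive children, `*` as the dummy label). It does not implement the chapter's pq-hash or pq-array. The tree encoding as `(label, children)` tuples and the function names are assumptions made for this example.

```python
from collections import Counter


def pq_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of a tree given as (label, children)."""
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)      # shift the current node into the stem
        base = ("*",) * q               # window over children, padded with dummies
        if not children:
            profile[stem + base] += 1   # a leaf yields one all-dummy base
            return
        for child in children:
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):          # slide the window out past the last child
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(tree, ("*",) * p)
    return profile


def pq_distance(t1, t2, p=2, q=3):
    """Normalized pq-gram distance: 1 - 2 * |bag intersection| / |bag union|."""
    p1, p2 = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((p1 & p2).values())
    total = sum(p1.values()) + sum(p2.values())
    return 1 - 2 * shared / total


t = ("a", (("b", ()), ("c", ())))
u = ("x", (("y", ()), ("z", ())))
```

Identical trees have distance 0, and trees with no pq-gram in common have distance 1. Because the profile is a bag of small tuples, it can be compared without running a full tree edit distance computation.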

Effective Structure Matching Algorithm for Automatic Assessment of Use-Case Diagram

International Journal of Distance Education Technologies ◽

10.4018/ijdet.2020100103 ◽

2020 ◽

Vol 18 (4) ◽

pp. 31-50

Author(s):

Vinay Vachharajani ◽

Jyoti Pareek

Keyword(s):

Edit Distance ◽

Research Literature ◽

Use Case ◽

Automatic Assessment ◽

Tree Edit Distance ◽

Matching Algorithm ◽

Qualified Teachers ◽

E-Learning ◽

Use Case Diagram

The demand for higher education keeps increasing. Information technology and e-learning have, to a large extent, solved the problem of the shortage of skilled and qualified teachers, but they do not by themselves guarantee high-quality learning. Although delivering learning materials and tests to a large number of students has become easy by uploading them to the web, assessment can still be tedious. There is a need to develop tools and technologies for fully automated assessment. In this paper, an algorithm is proposed for matching the structures of two use-case diagrams, one drawn by a student and one by an expert, in order to assess the student's diagram automatically. Zhang and Shasha's tree edit distance algorithm has been extended for assessing use-case diagrams. Results from 445 students' answers based on 14 different scenarios are analyzed to evaluate the performance of the proposed algorithm. No comparable study has been reported for any other diagram-assessment algorithm in the research literature.
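
For readers unfamiliar with tree edit distance, the following is a minimal sketch of the standard unit-cost recurrence for ordered trees, written as a memoized recursion over forests. It computes the same quantity as Zhang and Shasha's algorithm but without their keyroot-based dynamic programming, and it is not the extension proposed in the paper. The tuple encoding of trees and the use-case labels are hypothetical.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)                 # insert every remaining node
    if not f2:
        return size(f1)                 # delete every remaining node
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + c1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + c2) + 1
    # Map the two rightmost roots onto each other (rename if labels differ).
    rename = (forest_distance(f1[:-1], f2[:-1])
              + forest_distance(c1, c2)
              + (l1 != l2))
    return min(delete, insert, rename)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Hypothetical expert and student diagrams encoded as trees.
expert = ("System", (("Login", ()), ("Checkout", (("Pay", ()),))))
student = ("System", (("Login", ()), ("Checkout", ())))
```

Here `tree_edit_distance(expert, student)` is 1, because the student's tree is missing the single node `Pay`. This plain recursion is only practical for small trees.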

Fast Computation of the Tree Edit Distance between Unordered Trees Using IP Solvers

Discovery Science - Lecture Notes in Computer Science ◽

10.1007/978-3-319-11812-3_14 ◽

2014 ◽

pp. 156-167 ◽

Cited By ~ 1

Author(s):

Seiichi Kondo ◽

Keisuke Otaki ◽

Madori Ikeda ◽

Akihiro Yamamoto

Keyword(s):

Edit Distance ◽

Fast Computation ◽

Tree Edit Distance ◽

Unordered Trees

Formalizing the XML Schema Matching Problem as a Constraint Optimization Problem

Lecture Notes in Computer Science - Database and Expert Systems Applications ◽

10.1007/11546924_33 ◽

2005 ◽

pp. 333-342 ◽

Cited By ~ 6

Author(s):

Marko Smiljanić ◽

Maurice van Keulen ◽

Willem Jonker

Keyword(s):

Optimization Problem ◽

XML Schema ◽

Schema Matching ◽

Constraint Optimization ◽

Matching Problem

Exact algorithms for computing the tree edit distance between unordered trees

Theoretical Computer Science ◽

10.1016/j.tcs.2010.10.002 ◽

2011 ◽

Vol 412 (4-5) ◽

pp. 352-364 ◽

Cited By ~ 15

Author(s):

Tatsuya Akutsu ◽

Daiji Fukagawa ◽

Atsuhiro Takasu ◽

Takeyuki Tamura

Keyword(s):

Edit Distance ◽

Exact Algorithms ◽

Tree Edit Distance ◽

Unordered Trees

Efficiently Subtree Matching between XML and Probabilistic XML Documents

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.575 ◽

2014 ◽

Vol 571-572 ◽

pp. 575-579

Author(s):

Hai Tao Ma ◽

Chang Yong Yu ◽

Chang Ming Xu ◽

Miao Fang

Keyword(s):

Edit Distance ◽

Distribution Model ◽

Tree Edit Distance ◽

Matching Problem ◽

Distance Computation ◽

XML Documents ◽

Probabilistic XML ◽

XML Document ◽

Query Tree ◽

Minimum Solution

We explore the subtree matching problem for probabilistic XML documents: finding the matches of an XML query tree over a probabilistic XML document, using the canonical tree edit distance as the similarity measure between subtrees. Probabilistic XML is a probability distribution model that captures uncertainty in both value and structure. Querying probabilistic XML documents is difficult: a naive algorithm that directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document has exponential complexity. Based on the method of tree edit distance computation over certain XML subtrees, we define a minimum solution to the edit distance computation, namely the minimum cost of transforming the query tree into the probabilistic XML tree. Furthermore, we develop an algorithm, ASM (Algorithm of Subtree Matching), to compute the minimum solution. Finally, we prove that the complexity of ASM is linear in the size of the probabilistic XML document.

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (Unless APSP Can)

ACM Transactions on Algorithms ◽

10.1145/3381878 ◽

2020 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Karl Bringmann ◽

Paweł Gawrychowski ◽

Shay Mozes ◽

Oren Weimann

Keyword(s):

Edit Distance ◽

Tree Edit Distance

An optimal decomposition algorithm for tree edit distance

ACM Transactions on Algorithms ◽

10.1145/1644015.1644017 ◽

2009 ◽

Vol 6 (1) ◽

pp. 1-19 ◽

Cited By ~ 64

Author(s):

Erik D. Demaine ◽

Shay Mozes ◽

Benjamin Rossman ◽

Oren Weimann

Keyword(s):

Edit Distance ◽

Decomposition Algorithm ◽

Tree Edit Distance

Natural Language Inference for Arabic Using Extended Tree Edit Distance with Subtrees

Journal of Artificial Intelligence Research ◽

10.1613/jair.3892 ◽

2013 ◽

Vol 48 ◽

pp. 1-22 ◽

Cited By ~ 10

Author(s):

M. Alabbas ◽

A. Ramsay

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Edit Distance ◽

Arabic Text ◽

Tree Edit Distance ◽

Single Node ◽

Preliminary Results ◽

Standard Algorithm ◽

Standard Tree

Many natural language processing (NLP) applications require the computation of similarities between pairs of syntactic or semantic trees. Many researchers have used tree edit distance for this task, but this technique suffers from the drawback that it deals with single node operations only. We have extended the standard tree edit distance algorithm to deal with subtree transformation operations as well as single nodes. The extended algorithm with subtree operations, TED+ST, is more effective and flexible than the standard algorithm, especially for applications that pay attention to relations among nodes (e.g. in linguistic trees, deleting a modifier subtree should be cheaper than the sum of deleting its components individually). We describe the use of TED+ST for checking entailment between two Arabic text snippets. The preliminary results of using TED+ST were encouraging when compared with two string-based approaches and with the standard algorithm.

A ToolBox for Conservative XML Schema Evolution and Document Adaptation

Lecture Notes in Computer Science - Database and Expert Systems Applications ◽

10.1007/978-3-319-10073-9_24 ◽

2014 ◽

pp. 299-307 ◽

Cited By ~ 6

Author(s):

Joshua Amavi ◽

Jacques Chabin ◽

Mirian Halfeld-Ferrari ◽

Pierre Réty

Keyword(s):

XML Schema ◽

Schema Evolution
