Minimum tree edit distance between XML and Probabilistic XML documents

Author(s):  
Haitao Ma ◽  
Changming Xu ◽  
Miao Fang ◽  
Changyong Yu
2014 ◽  
Vol 571-572 ◽  
pp. 575-579

We explored the subtree matching problem for probabilistic XML documents: finding the matches of an XML query tree over a probabilistic XML document, using the canonical tree edit distance as the similarity measure between subtrees. Probabilistic XML is a probability distribution model that captures uncertainty in both values and structure. Querying probabilistic XML documents is difficult: a naive algorithm has exponential complexity, because it directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document. Building on tree edit distance computation over certain XML subtrees, we defined a minimum solution to the edit distance computation, i.e., the minimum cost of transforming the query tree into the probabilistic XML tree. We then developed ASM (Algorithm of Subtree Matching) to compute the minimum solution. Finally, we proved that the complexity of ASM is linear in the size of the probabilistic XML document.
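To make the exponential baseline concrete, here is a minimal sketch of the naive approach the abstract describes: enumerate every certain tree (possible world) and compute one tree edit distance per world. It is not ASM. It assumes unit costs, a plain ordered-forest recursion for tree edit distance, and a simplified probabilistic model in which each child node is independently present with a given probability. The names `forest_dist`, `worlds` and `naive_min_distance`, and the tuple encoding of trees, are hypothetical.

```python
from functools import lru_cache
from itertools import product

# Certain XML tree: (label, (child, ...)); a forest is a tuple of trees.
# Probabilistic node (hypothetical, simplified "independent children" model):
# (label, prob, (pchild, ...)) where prob is the chance the node is present.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(kids) for _, kids in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost ordered forest edit distance via the rightmost-root recursion."""
    if not f and not g:
        return 0
    if not f:
        return size(g)           # insert everything in g
    if not g:
        return size(f)           # delete everything in f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + v_kids, g) + 1,   # delete v, promoting its children
        forest_dist(f, g[:-1] + w_kids) + 1,   # insert w
        forest_dist(f[:-1], g[:-1]) + forest_dist(v_kids, w_kids)
        + (v_label != w_label),                # map v to w, relabel if needed
    )

def tree_dist(t1, t2):
    return forest_dist((t1,), (t2,))

def worlds(pnode):
    """Yield (certain_tree, probability) for each possible world rooted at pnode."""
    label, _, pkids = pnode
    options = []
    for kid in pkids:
        p = kid[1]
        opts = [((), 1.0 - p)] if p < 1.0 else []           # child absent
        opts += [((t,), p * q) for t, q in worlds(kid)]     # child present
        options.append(opts)
    for combo in product(*options):
        kids = tuple(t for frag, _ in combo for t in frag)
        prob = 1.0
        for _, q in combo:
            prob *= q
        yield (label, kids), prob

def naive_min_distance(query, proot):
    """Brute force: one TED per possible world, so exponential in uncertain nodes."""
    return min(tree_dist(query, t) for t, p in worlds(proot) if p > 0)

query = ("a", (("b", ()), ("c", ())))
proot = ("a", 1.0, (("b", 0.5, ()), ("d", 0.7, ()), ("c", 0.9, ())))
print(naive_min_distance(query, proot))  # 0: the world a(b, c) matches exactly
```

With three uncertain children the document already has 2^3 = 8 possible worlds, and each additional independent node doubles that count, which is the blow-up a linear-time algorithm such as ASM is meant to avoid.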


2020 ◽  
Vol 16 (4) ◽  
pp. 1-22
Author(s):  
Karl Bringmann ◽  
Paweł Gawrychowski ◽  
Shay Mozes ◽  
Oren Weimann

2009 ◽  
Vol 6 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Erik D. Demaine ◽  
Shay Mozes ◽  
Benjamin Rossman ◽  
Oren Weimann

2013 ◽  
Vol 48 ◽  
pp. 1-22 ◽  
Author(s):  
M. Alabbas ◽  
A. Ramsay

Many natural language processing (NLP) applications require computing similarities between pairs of syntactic or semantic trees. Many researchers have used tree edit distance for this task, but this technique has the drawback of handling single-node operations only. We have extended the standard tree edit distance algorithm to handle subtree transformation operations as well as single nodes. The extended algorithm with subtree operations, TED+ST, is more effective and flexible than the standard algorithm, especially for applications that pay attention to relations among nodes (e.g. in linguistic trees, deleting a modifier subtree should be cheaper than the sum of deleting its components individually). We describe the use of TED+ST for checking entailment between two Arabic text snippets. The preliminary results of using TED+ST were encouraging when compared with two string-based approaches and with the standard algorithm.
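The idea that deleting a modifier subtree should cost less than deleting its nodes one by one can be illustrated with a toy extension of the ordinary forest edit distance recursion. This is a minimal sketch, not the authors' TED+ST: it simply adds whole-subtree delete and insert operations alongside the single-node ones, and the cost function `subtree_cost` (1 for the root plus 0.5 per descendant), the function name `dist_st`, and the example trees are all hypothetical choices for illustration.

```python
from functools import lru_cache

# Trees are (label, (child, ...)); a forest is a tuple of trees.

def size(forest):
    return sum(1 + size(kids) for _, kids in forest)

def subtree_cost(tree):
    """Hypothetical cost of deleting/inserting a whole subtree in one operation:
    1 for the root plus 0.5 per descendant, so it undercuts node-by-node deletion."""
    return 1 + 0.5 * (size((tree,)) - 1)

@lru_cache(maxsize=None)
def dist_st(f, g):
    """Ordered forest edit distance with single-node AND whole-subtree operations."""
    if not f and not g:
        return 0.0
    best = float("inf")
    if f:
        v_label, v_kids = f[-1]
        best = min(best,
                   dist_st(f[:-1] + v_kids, g) + 1,           # delete node v only
                   dist_st(f[:-1], g) + subtree_cost(f[-1]))  # delete v's whole subtree
    if g:
        w_label, w_kids = g[-1]
        best = min(best,
                   dist_st(f, g[:-1] + w_kids) + 1,           # insert node w only
                   dist_st(f, g[:-1]) + subtree_cost(g[-1]))  # insert w's whole subtree
    if f and g:
        best = min(best,
                   dist_st(f[:-1], g[:-1]) + dist_st(v_kids, w_kids)
                   + (v_label != w_label))                    # map v to w
    return best

# "saw man dog" with and without the modifier subtree big(very) under "dog".
t1 = ("saw", (("man", ()), ("dog", (("big", (("very", ()),)),))))
t2 = ("saw", (("man", ()), ("dog", ())))
print(dist_st((t1,), (t2,)))  # 1.5, versus 2 with single-node deletions only
```

Because the subtree operations are just extra candidates in the `min`, the standard distance is always an upper bound on this one; how much cheaper a subtree operation should be is a modelling decision, and the 0.5 discount here is arbitrary.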

