Efficient Entity Resolution on XML Data Based on Entity-Describe-Attribute

2011 ◽  
Vol 34 (11) ◽  
pp. 2131-2141 ◽  
Author(s):  
Ya-Kun LI ◽  
Hong-Zhi WANG ◽  
Hong GAO ◽  
Jian-Zhong LI

Because it can represent data from a wide variety of sources, XML is rapidly emerging as the new standard for data representation and exchange on the Web and in e-government. To use XML data effectively in practice, entity resolution, which has proven extremely useful in data fusion, inconsistency detection, and data repairing, must be in place to improve the quality of the XML data. In this chapter, the authors deal specifically with object identification on XML data. Its applications include XML document management in highly dynamic settings such as the Web and peer-to-peer systems, detection of duplicate elements in nested XML data, and finding similar identities among objects from multiple Web sources. The authors survey techniques for pairwise and groupwise entity resolution on XML data. These techniques use structural information to measure the similarity or distance between XML data items, such as XML documents and the elements within a document. They then either find the matching pairs that describe the same object or partition the items into groups, with each group corresponding to one real-world object. XML structure and content can be represented in many ways, such as a tree, a Bayesian network, or a set. The authors introduce some well-known algorithms based on these representations for solving XML data matching problems. Finally, the authors discuss directions for future research.
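To make the pairwise and groupwise formulations concrete, here is a minimal sketch of the set-based view. It is not one of the surveyed algorithms. It assumes each XML element is flattened into a set of (tag path, word) tokens, that similarity is plain Jaccard overlap, and that a fixed threshold of 0.5 decides a match. The function names, the threshold, and the sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def element_tokens(elem):
    """Flatten an element into a set of (tag path, lowercased word) tokens."""
    tokens = set()

    def walk(node, path):
        here = f"{path}/{node.tag}"
        for word in (node.text or "").lower().split():
            tokens.add((here, word))
        for child in node:
            walk(child, here)

    walk(elem, "")
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def matching_pairs(elements, threshold=0.5):
    """Pairwise resolution: index pairs whose similarity reaches the threshold."""
    token_sets = [element_tokens(e) for e in elements]
    return [
        (i, j)
        for i, j in combinations(range(len(elements)), 2)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    ]


def group_matches(n, pairs):
    """Groupwise resolution: merge matched pairs into groups with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sample data: the first two records describe the same person.
DOC = """
<people>
  <person><name>Ada Lovelace</name><city>London</city></person>
  <person><name>Ada Lovelace</name><city>London</city><email>ada@example.org</email></person>
  <person><name>Alan Turing</name><city>London</city></person>
</people>
"""

people = list(ET.fromstring(DOC))
pairs = matching_pairs(people)
groups = group_matches(len(people), pairs)
print(pairs)
print(groups)
```

Comparing every pair is quadratic in the number of elements, so a sketch like this only suits small inputs. Tree-based or Bayesian-network representations would replace `element_tokens` and `jaccard` with a structure-aware similarity while leaving the pairing and grouping steps unchanged.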


2014 ◽  
Vol 36 (8) ◽  
pp. 1714-1728
Author(s):  
Jun-Feng ZHOU ◽  
Bo WANG ◽  
Shan-Shan TIAN ◽  
Zi-Yang CHEN ◽  
Jing-Feng GUO
