A Data Type-Driven Property Alignment Framework for Product Duplicate Detection on the Web

Author(s):  
Gijs van Rooij ◽  
Ravi Sewnarain ◽  
Martin Skogholt ◽  
Tim van der Zaan ◽  
Flavius Frasincar ◽  
...  
Author(s):  
Anil Ahlawat ◽  
Kalpna Sagar

Introduction: The need for efficient search engines has been identified with the ever-increasing technological advancement and huge growing demand of data on the web. Method: Automating duplicate detection over query results in identifying the records from multiple web databases that point to the similar real-world entity and returns non-matching records to the end-users. The proposed algorithm in this paper is based upon an unsupervised approach with classifiers over heterogeneous web databases that return more accurate results with high precision, F-measure, and recall. Different assessments are also executed to analyze the efficacy of the proposed algorithm for identification of the duplicates. Result: Results show that the proposed algorithm has greater precision, F-score measure, and the same recall values as compared to standard UDD. Conclusion: This paper concludes that the proposed algorithm outperforms standard UDD. Discussion: This paper aims to introduce an algorithm that automates the process of duplicate detection for lexical heterogeneous web databases.


2008 ◽  
Vol 11 (2) ◽  
pp. 83-85
Author(s):  
Howard Wilson
Keyword(s):  

2005 ◽  
Vol 8 (1) ◽  
pp. 16-18
Author(s):  
Howard F. Wilson
Keyword(s):  

1999 ◽  
Vol 3 (2) ◽  
pp. 6-6
Author(s):  
Barbara Shadden
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document