Odess: Speeding up Resemblance Detection for Redundancy Elimination by Fast Content-Defined Sampling

Author(s):  
Xiangyu Zou ◽  
Cai Deng ◽  
Wen Xia ◽  
Philip Shilane ◽  
Haoliang Tan ◽  
...  

1994 ◽  
Vol 29 (6) ◽  
pp. 159-170 ◽  
Author(s):  
Preston Briggs ◽  
Keith D. Cooper

2016 ◽  
Vol 51 (8) ◽  
pp. 1-13 ◽  
Author(s):  
Lei Wang ◽  
Fan Yang ◽  
Liangji Zhuang ◽  
Huimin Cui ◽  
Fang Lv ◽  
...  

2021 ◽  
Vol 129 ◽  
pp. 103460
Author(s):  
Carmen González-Lluch ◽  
Raquel Plumed ◽  
David Pérez-López ◽  
Pedro Company ◽  
Manuel Contero ◽  
...  

2017 ◽  
Vol 4 (1) ◽  
pp. 95-110 ◽  
Author(s):  
Deepika Punj ◽  
Ashutosh Dixit

Crawlers play a significant role in managing the vast amount of information available on the web. A crawler's operation should be optimized to retrieve as much unique information as possible from the World Wide Web. This paper proposes an architecture for a migrating crawler based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to that end, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed so that the database holds only unique documents. To reduce matching time, documents are matched on the basis of their meta information only. By ordering and scheduling URLs, the agents of the proposed migrating crawler work more efficiently than a traditional single crawler.
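The abstract does not say which meta fields are compared or how the match is computed. As a rough illustration only, matching on meta information could be sketched as below, assuming an exact match on a hash of a few normalized fields. The field names (`title`, `description`, `keywords`) and the helper names `meta_fingerprint` and `deduplicate` are hypothetical and are not taken from the paper.

```python
import hashlib


def meta_fingerprint(meta: dict) -> str:
    """Hash a normalized view of selected meta fields.

    Assumption: the compared fields are title, description, and keywords.
    The paper's actual choice of fields is not stated in the abstract.
    """
    fields = ("title", "description", "keywords")
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not produce different fingerprints.
    normalized = "\x1f".join(
        " ".join(str(meta.get(field, "")).lower().split()) for field in fields
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents: list) -> list:
    """Keep the first document seen for each distinct meta fingerprint."""
    seen = set()
    unique = []
    for doc in documents:
        fingerprint = meta_fingerprint(doc["meta"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


docs = [
    {"url": "http://example.com/a", "meta": {"title": "Web Crawlers", "keywords": "crawler, web"}},
    {"url": "http://example.com/b", "meta": {"title": "web   crawlers", "keywords": "Crawler, Web"}},
    {"url": "http://example.com/c", "meta": {"title": "URL Scheduling", "keywords": "scheduling"}},
]
unique_docs = deduplicate(docs)
```

Comparing only a short fingerprint keeps each lookup cheap, which is in the spirit of the stated goal of reducing matching time. The trade-off is that documents with identical meta fields but different bodies would be treated as duplicates.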


2017 ◽  
Vol 12 (03) ◽  
pp. 72-79
Author(s):  
Prof. Dola Sanjay S. ◽  
G. Pallavi ◽  
B. Tarun Kumar ◽  
M. Tejaswini ◽  
M. Ratna Kireeti
