Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval

Author(s): Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, Jimmy Lin

1994, Vol 29 (6), pp. 159-170
Author(s): Preston Briggs, Keith D. Cooper

2016, Vol 51 (8), pp. 1-13
Author(s): Lei Wang, Fan Yang, Liangji Zhuang, Huimin Cui, Fang Lv, ...

2021, Vol 129, pp. 103460
Author(s): Carmen González-Lluch, Raquel Plumed, David Pérez-López, Pedro Company, Manuel Contero, ...

2017, Vol 4 (1), pp. 95-110
Author(s): Deepika Punj, Ashutosh Dixit

Crawlers play a significant role in managing the vast amount of information available on the web. A crawler's operation should be optimized to retrieve as much unique information from the World Wide Web as possible. In this paper, an architecture for a migrating crawler is proposed, based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in using the web efficiently. Scheduling ensures that each URL is assigned to the most suitable agent for downloading; to achieve this, the characteristics of both the agents and the URLs are taken into account. Duplicate documents are also removed so that the database contains only unique documents. To reduce matching time, documents are compared on the basis of their meta information only. By ordering and scheduling URLs, the agents of the proposed migrating crawler work more efficiently than a traditional single crawler.
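The abstract says only that duplicate documents are detected by comparing their meta information rather than their full content. The Python sketch below illustrates that idea under stated assumptions. It hypothetically treats the HTML title, description, and keywords as the meta information, and it counts two documents as duplicates when the normalized values of those fields hash to the same SHA-256 digest. `Document`, `META_FIELDS`, `meta_fingerprint`, and `UniqueStore` are illustrative names, not taken from the paper.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical choice of meta fields; the abstract does not say which are used.
META_FIELDS = ("title", "description", "keywords")


@dataclass
class Document:
    url: str
    meta: dict  # e.g. {"title": ..., "description": ..., "keywords": ...}
    body: str = ""  # never inspected during duplicate matching


def meta_fingerprint(doc: Document) -> str:
    """Hash only the normalized meta fields, never the document body."""
    parts = []
    for name in META_FIELDS:
        value = doc.meta.get(name, "")
        # Lower-case and collapse whitespace so trivial variations still match.
        parts.append(" ".join(value.lower().split()))
    # A unit-separator character keeps field boundaries unambiguous.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class UniqueStore:
    """Keeps at most one document per meta fingerprint."""

    def __init__(self):
        self._seen = {}  # fingerprint -> first Document stored with it

    def add(self, doc: Document) -> bool:
        key = meta_fingerprint(doc)
        if key in self._seen:
            return False  # duplicate by meta information; discarded
        self._seen[key] = doc
        return True

    def __len__(self) -> int:
        return len(self._seen)


if __name__ == "__main__":
    store = UniqueStore()
    a = Document("http://example.com/a",
                 {"title": "Migrating Crawlers", "description": "URL ordering"})
    b = Document("http://mirror.example.org/a",
                 {"title": "migrating  crawlers", "description": "URL ordering"})
    print(store.add(a), store.add(b), len(store))
```

Comparing fixed-size digests of a few short fields is much cheaper than comparing full page bodies, which is the motivation the abstract gives. The trade-off is that this approach misses duplicates whose meta fields differ. It can also merge distinct pages that share identical or empty meta information, such as templated titles.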