Optimal Query Generation for Hidden Web Extraction through Response Analysis

2014 ◽  
Vol 4 (2) ◽  
pp. 1-18
Author(s):  
Sonali Gupta ◽  
Komal Kumar Bhatia

A huge number of Hidden Web databases exist on the WWW, forming a massive source of high-quality information. Retrieving this information to enrich the search engine's repository is the prime target of a Hidden Web crawler. The crawler should also perform this task at an affordable cost and level of resource utilization. This paper proposes a Random ranking mechanism for ranking the queries to be raised by the Hidden Web crawler. By ranking the queries according to the proposed mechanism, the Hidden Web crawler is able to make an optimal choice among the candidate queries and efficiently retrieve the contents of Hidden Web databases. The Hidden Web crawler proposed here also has an extensible and scalable framework to improve the efficiency of crawling. The proposed approach has also been compared with other Hidden Web crawling methods in the literature.

The Dark Web ◽  
2018 ◽  
pp. 65-83
Author(s):  
Sonali Gupta ◽  
Komal Kumar Bhatia


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The WWW contains a huge amount of information from different areas. This information may be present in the form of web pages, media, articles (research journals/magazines), blogs, etc. A major portion of the information resides in web databases whose contents can be retrieved by raising queries at the interface offered by the specific database; this portion is called the Hidden Web. An important issue is to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases not only effectively but also efficiently.


2011 ◽  
Vol 15 (4) ◽  
pp. 45-48
Author(s):  
Anuradha ◽  
A.K. Sharma

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 117582-117592
Author(s):  
Sawroop Kaur ◽  
G. Geetha