AKSHR: A novel framework for a Domain-specific Hidden Web Crawler

Author(s):  
Komal Kumar Bhatia ◽  
A.K. Sharma ◽  
Rosy Madaan
Author(s):  
Rosy Madaan ◽  
Ashutosh Dixit ◽  
A. K. Sharma ◽  
Komal Kumar Bhatia

The Dark Web ◽  
2018 ◽  
pp. 319-333
Author(s):  
Sudhakar Ranjan ◽  
Komal Kumar Bhatia

Nowadays, with the advent of internet technologies and e-commerce, the need for smart search engines is rising. Traditional search engines are not intelligent enough, which raises search costs. In this paper, the architecture of a vertical search engine based on a domain-specific hidden web crawler is proposed. To build a least-cost vertical search engine, improvements to the following techniques are suggested: searching, indexing, ranking, transaction, and query interface. The domain term analyzer filters out useless information to the maximum extent and provides users with high-precision information. Experimental results show that the system reduces access, computation, storage, and communication time while increasing efficiency.


2017 ◽  
Vol 7 (2) ◽  
pp. 19-33
Author(s):  
Sudhakar Ranjan ◽  
Komal Kumar Bhatia



2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The WWW contains a huge amount of information from different areas. This information may be present in the form of web pages, media, articles (research journals / magazines), blogs, etc. A major portion of the information resides in web databases and can be retrieved by issuing queries at the interface offered by the specific database; this portion is called the Hidden Web. An important issue is to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
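The abstract does not describe how PSHWC assigns work, so the following is only a minimal sketch of one way a domain-specific split could avoid download overlaps: each search form is classified into a single domain and queued for that domain's crawler alone. The domain vocabularies, the `classify_form` and `partition` helpers, and the example URLs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical domain vocabularies; a real system would use a richer term base.
DOMAIN_TERMS = {
    "books": {"isbn", "author", "publisher", "title"},
    "flights": {"departure", "arrival", "airline", "passengers"},
}


def classify_form(labels):
    """Return the domain whose vocabulary overlaps most with the form labels."""
    words = {label.lower() for label in labels}
    best, best_score = None, 0
    for domain, terms in DOMAIN_TERMS.items():
        score = len(words & terms)
        if score > best_score:
            best, best_score = domain, score
    return best  # None when no domain term matches


def partition(forms):
    """Assign each form URL to at most one domain queue, skipping repeats."""
    queues = defaultdict(list)
    seen = set()
    for url, labels in forms:
        if url in seen:
            continue  # already queued once, so no second download
        seen.add(url)
        domain = classify_form(labels)
        if domain is not None:
            queues[domain].append(url)
    return dict(queues)


forms = [
    ("http://a.example/search", ["Author", "ISBN"]),
    ("http://b.example/book", ["Departure", "Arrival", "Passengers"]),
    ("http://a.example/search", ["Author", "ISBN"]),  # duplicate URL
    ("http://c.example/q", ["Zip"]),  # matches no domain
]
queues = partition(forms)
```

Because every URL lands in at most one queue, crawler processes that each consume a single domain queue would not fetch the same form twice under these assumptions.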


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 117582-117592
Author(s):  
Sawroop Kaur ◽  
G. Geetha

Author(s):  
Tao Peng ◽  
Lu Liu

Today, the growing volume of information on the Web makes it difficult to obtain domain-specific information, owing to the huge number of data sources and to keywords that carry few features. Anchor texts, which contain a few features of a specific topic, play an important role in domain-specific information retrieval, especially in Web page classification. However, the features contained in anchor texts are not informative enough. This paper presents a novel incremental method for Web page classification enhanced by link-contexts and clustering. Directly applying the anchor-text vector to a classifier might not give a good result because of the limited number of features. Link-context is used first to obtain the contextual information of the anchor text. Then, a hierarchical clustering method is introduced to cluster feature vectors and content units, which increases the length of a feature vector belonging to one specific class. Finally, an incremental SVM is proposed to obtain the final classifier and increase its accuracy and efficiency. Experimental results show that the proposed method outperforms the conventional topical Web crawler in harvest rate and target recall.
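As a rough illustration of the link-context step only, the sketch below widens a short anchor text with a fixed window of neighbouring words from the page text. The abstract does not say how link-contexts are delimited, so the word-window approach, the `link_context` helper, and the `window` parameter are assumptions made for this example, not the authors' method.

```python
import re


def link_context(text, anchor, window=5):
    """Return the anchor tokens plus up to `window` tokens on each side."""
    tokens = re.findall(r"\w+", text.lower())
    anchor_tokens = re.findall(r"\w+", anchor.lower())
    n = len(anchor_tokens)
    if n == 0:
        return []
    # Find the first occurrence of the anchor token sequence in the text.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == anchor_tokens:
            start = max(0, i - window)
            return tokens[start:i + n + window]
    # Anchor not found in the surrounding text: fall back to the anchor alone.
    return anchor_tokens


page_text = "Read our guide to cheap flights and hotel deals in Paris today"
context = link_context(page_text, "hotel deals", window=2)
```

The resulting token list has more features than the two-word anchor by itself, which is the motivation the abstract gives for using link-contexts before clustering and classification.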

