AKSHR: A novel framework for a Domain-specific Hidden Web Crawler

Design of a Least Cost (LC) Vertical Search Engine based on Domain Specific Hidden Web Crawler

International Journal of Information Retrieval Research ◽ 10.4018/ijirr.2017040102 ◽ 2017 ◽ Vol 7 (2) ◽ pp. 19-33

Author(s): Sudhakar Ranjan ◽ Komal Kumar Bhatia

Keyword(s): Search Engine ◽ Human Life ◽ Experimental Result ◽ Web Crawler ◽ Domain Specific ◽ Hidden Web ◽ Vertical Search ◽ Communication Time ◽ Vertical Search Engine ◽ Domain Term

Nowadays, with the advent of internet technologies and e-commerce, the need for smart search engines is rising. Traditional search engines are not intelligent, which leads to a rise in searching costs. In this paper, the architecture of a vertical search engine based on a domain-specific hidden web crawler is proposed. To build a least-cost vertical search engine, improvements to the following techniques are suggested: searching, indexing, ranking, transaction, and query interface. The domain term analyzer filters out useless information to the maximum extent and provides users with high-precision information. The experimental results show that the system reduces access, computation, storage, and communication time and increases efficiency.

A QIIIEP based domain specific hidden web crawler

Proceedings of the International Conference & Workshop on Emerging Trends in Technology - ICWET '11 ◽ 10.1145/1980022.1980073 ◽ 2011 ◽ Cited By ~ 1

Author(s): D. K. Sharma ◽ A. K. Sharma

Keyword(s): Web Crawler ◽ Domain Specific ◽ Hidden Web

Design of a Parallel and Scalable Crawler for the Hidden Web

International Journal of Information Retrieval Research ◽ 10.4018/ijirr.289612 ◽ 2022 ◽ Vol 12 (1) ◽ pp. 0-0

Keyword(s): Experimental Results ◽ Web Pages ◽ Web Crawler ◽ Huge Amount ◽ Web Databases ◽ Specific Approach ◽ Amount Of Information ◽ Domain Specific ◽ Hidden Web ◽ Enormous Amount

The WWW contains a huge amount of information from different areas. This information may be present in the form of web pages, media, articles (research journals and magazines), blogs, etc. A major portion of the information resides in web databases and can be retrieved only by issuing queries at the interface offered by each database; this portion is called the Hidden Web. An important issue is how to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
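
One way to picture how a domain-specific approach can avoid download overlaps is to give each domain to exactly one crawler worker, so that no search interface is ever handled twice. The sketch below is only an illustration under that assumption and is not the PSHWC design from the paper; the function name `partition_by_domain`, the worker names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


def partition_by_domain(interfaces, workers):
    """Assign every search interface of a domain to exactly one worker.

    interfaces: list of (url, domain) pairs (hypothetical input format).
    workers: list of worker names.
    Returns a dict mapping each worker to the URLs it should crawl.
    """
    # Sort the domains so the assignment is deterministic across runs.
    domains = sorted({domain for _, domain in interfaces})
    # Round-robin the domains over the workers: one owner per domain.
    owner = {d: workers[i % len(workers)] for i, d in enumerate(domains)}
    plan = defaultdict(list)
    for url, domain in interfaces:
        plan[owner[domain]].append(url)
    return dict(plan)


interfaces = [
    ("http://books-a.example/search", "books"),
    ("http://books-b.example/search", "books"),
    ("http://cars.example/search", "automobiles"),
    ("http://jobs.example/search", "jobs"),
]
plan = partition_by_domain(interfaces, ["worker-0", "worker-1"])
for worker, urls in sorted(plan.items()):
    print(worker, urls)
```

Because each domain has a single owner, the URL lists of different workers are disjoint by construction, which is the property "no download overlap" refers to in this simplified picture.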

GENERATION OF CLASSIFIER FOR DOMAIN-SPECIFIC HIDDEN WEB SEARCH INTERFACE

Proceedings of the 11th Joint International Computer Conference ◽ 10.1142/9789812701534_0148 ◽ 2005

Author(s): Wencui YUAN ◽ Wanli ZUO ◽ Qingyang XU

Keyword(s): Web Search ◽ Search Interface ◽ Domain Specific ◽ Hidden Web

SIMHAR - Smart Distributed Web Crawler for the Hidden Web Using SIM+Hash and Redis Server

IEEE Access ◽ 10.1109/access.2020.3004756 ◽ 2020 ◽ Vol 8 ◽ pp. 117582-117592

Author(s): Sawroop Kaur ◽ G. Geetha

Keyword(s): Web Crawler ◽ Hidden Web

Comparison of Scheduling Algorithms for Domain Specific Web Crawler

2014 European Network Intelligence Conference ◽ 10.1109/enic.2014.14 ◽ 2014 ◽ Cited By ~ 3

Author(s): Krzysztof Filipowski

Keyword(s): Scheduling Algorithms ◽ Web Crawler ◽ Domain Specific

Clustering-Based Topical Web Crawling for Topic-Specific Information Retrieval Guided by Incremental Classifier

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194015500011 ◽ 2015 ◽ Vol 25 (01) ◽ pp. 147-168 ◽ Cited By ~ 4

Author(s): Tao Peng ◽ Lu Liu

Keyword(s): Information Retrieval ◽ Contextual Information ◽ Specific Information ◽ Specific Class ◽ Web Page ◽ Web Crawler ◽ Web Page Classification ◽ Domain Specific ◽ Anchor Text ◽ Page Classification

Today, the growing amount of information on the Web makes it difficult to obtain domain-specific information because of the huge number of data sources and keywords that carry few features. Anchor texts, which contain a few features of a specific topic, play an important role in domain-specific information retrieval, especially in Web page classification. However, the features contained in anchor texts are not informative enough. This paper presents a novel incremental method for Web page classification enhanced by link-contexts and clustering. Directly applying the vector of anchor text to a classifier might not give a good result because of the limited number of features. Link-context is used first to obtain the contextual information of the anchor text. Then, a hierarchical clustering method is introduced to cluster feature vectors and content units, which increases the length of a feature vector belonging to one specific class. Finally, incremental SVM is proposed to obtain the final classifier and increase its accuracy and efficiency. Experimental results show that the proposed method outperforms the conventional topical Web crawler in harvest rate and target recall.
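
The abstract reports results in terms of harvest rate and target recall but does not define them. The sketch below uses the definitions commonly found in the focused-crawling literature, which may differ in detail from the ones used in the paper: harvest rate as the fraction of crawled pages that are relevant, and target recall as the fraction of a known set of target pages that the crawl reached. The function names and the sample page identifiers are hypothetical.

```python
def harvest_rate(crawled, relevant):
    """Fraction of crawled pages that are relevant (assumed definition)."""
    if not crawled:
        return 0.0
    hits = sum(1 for page in crawled if page in relevant)
    return hits / len(crawled)


def target_recall(crawled, targets):
    """Fraction of known target pages that were crawled (assumed definition)."""
    if not targets:
        return 0.0
    return len(set(crawled) & set(targets)) / len(targets)


crawled = ["a", "b", "c", "d"]   # pages fetched by the crawler
relevant = {"a", "c", "e"}       # pages judged on-topic
targets = {"a", "e"}             # held-out set of known relevant pages

print(harvest_rate(crawled, relevant))   # 2 of 4 crawled pages are relevant
print(target_recall(crawled, targets))   # 1 of 2 target pages was reached
```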
