hidden web
Recently Published Documents

TOTAL DOCUMENTS: 133 (five years: 14)
H-INDEX: 12 (five years: 1)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The WWW contains a huge amount of information from different areas. This information may be present virtually in the form of web pages, media, articles (research journals/magazines), blogs, etc. A major portion of the information resides in web databases and can be retrieved only by issuing queries at the interface offered by the specific database; it is therefore called the Hidden Web. An important issue is how to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
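The overlap-avoidance idea in this abstract can be sketched as follows. The paper itself does not publish PSHWC's partitioning function, so this is a minimal illustrative assumption: each URL is routed to exactly one crawler worker by hashing its domain, so no two workers ever fetch the same site and downloads never overlap.

```python
import hashlib
from urllib.parse import urlparse

def assign_worker(url: str, n_workers: int) -> int:
    """Assign a URL to one crawler worker by hashing its domain.

    Every URL from the same domain maps to the same worker, so no two
    workers ever download the same page (no overlap), and per-domain
    crawling policies are easy to enforce in one place."""
    domain = urlparse(url).netloc.lower()
    digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers

# All pages of one site land on the same worker (domains are hypothetical):
urls = ["http://books.example.com/search?q=a",
        "http://books.example.com/search?q=b",
        "http://cars.example.org/query"]
workers = [assign_worker(u, 4) for u in urls]
assert workers[0] == workers[1]  # same domain -> same worker
```

Hash-based partitioning is deterministic, so workers need no coordination at fetch time; whether PSHWC uses this exact scheme or a static domain-to-worker table is not stated in the abstract.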


2021 ◽  
Vol 1 (71) ◽  
pp. 21-27
Author(s):  
O. Dvoryankin

This article studies the Deep Web: how this network appeared and spread, its positive and negative sides, and its concept, types, forms, and characteristics. It also compares the Deep Web with the hidden Internet and the "black Internet" (DarkNet), and suggests methods of personal information security.


Author(s):  
Sawroop Kaur ◽  
Aman Singh ◽  
G. Geetha ◽  
Xiaochun Cheng

Abstract: Due to the massive size of the hidden web, searching, retrieving and mining rich, high-quality data can be a daunting task. Moreover, with the presence of forms, data cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data. Effective techniques, as well as applications in special cases, need to be explored to achieve an effective harvest rate. One such special area is atmospheric science, where hidden web crawling is least implemented and the crawler must narrow the search down from the huge web to specific data. In this study, an intelligent hidden web crawler for harvesting data in urban domains (IHWC) is implemented to address the related problems of domain classification, prevention of exhaustive searching, and prioritization of URLs. The crawler also performs well in curating pollution-related data. The crawler targets relevant web pages and discards irrelevant ones by applying rejection rules. To achieve more accurate results for a focused crawl, IHWC crawls websites in priority order for a given topic. The crawler fulfils the dual objective of developing an effective hidden web crawler that can focus on diverse domains and of checking its integration in searching pollution data in smart cities. One of the objectives of smart cities is to reduce pollution, and the resulting crawled data can be used to find the causes of pollution. The crawler can help the user search the level of pollution in a specific area. The harvest rate of the crawler is compared with pioneering existing work. As the size of the dataset grows, the presented crawler can add significant value to emission accuracy.
Our results demonstrate the accuracy and harvest rate of the proposed framework; it efficiently collects hidden web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
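The rejection rules and URL prioritization described in this abstract can be sketched as below. The actual rules, keywords, weights, and class names used by IHWC are not given in the abstract, so everything here is an illustrative assumption: URLs matching rejection patterns are discarded, and the rest are popped from the frontier in order of a keyword-based topic score.

```python
import heapq
import re

# Hypothetical rejection rules and topic keywords -- illustrative only.
REJECT_PATTERNS = [re.compile(p) for p in (r"\.(jpg|png|css|js)$", r"/login", r"/ads/")]
TOPIC_KEYWORDS = {"pollution": 3, "emission": 3, "air": 2, "urban": 1}

def rejected(url: str) -> bool:
    """Rejection rules: discard URLs that cannot lead to topical content."""
    return any(p.search(url) for p in REJECT_PATTERNS)

def priority(url: str) -> int:
    """Score a URL by topic keywords; higher score = crawled earlier."""
    return sum(w for kw, w in TOPIC_KEYWORDS.items() if kw in url)

class Frontier:
    """Best-first URL frontier: pop the highest-priority URL first."""
    def __init__(self):
        self._heap, self._seen = [], set()

    def push(self, url: str):
        # Deduplicate and apply rejection rules before queueing.
        if url not in self._seen and not rejected(url):
            self._seen.add(url)
            heapq.heappush(self._heap, (-priority(url), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]

f = Frontier()
f.push("http://example.com/air-pollution-data")
f.push("http://example.com/login")     # discarded by a rejection rule
f.push("http://example.com/contact")   # kept, but low priority
print(f.pop())  # the pollution page comes out first
```

Scoring on the URL string alone keeps the sketch short; a real focused crawler would also score anchor text and page content, and the abstract does not say which signals IHWC uses.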


2021 ◽  
pp. 50-71
Author(s):  
Shakeel Ahmed ◽  
Shubham Sharma ◽  
Saneh Lata Yadav

Information retrieval is the task of finding material of an unstructured nature within large collections stored on computers. The surface web consists of indexed content accessible by traditional browsers, whereas deep or hidden web content cannot be found with traditional search engines and requires a password or network permissions. Within the deep web, the dark web is also growing, as new tools make it easier to navigate hidden content, accessible only with special software such as Tor. According to a study in Nature, Google indexes no more than 16% of the surface web and misses all of the deep web; any given search turns up just 0.03% of the information that exists online. So the key part of the hidden web remains inaccessible to users. This chapter poses some questions about this research. Detailed definitions and analogies are explained, and the chapter discusses related work and puts forward the advantages and limitations of the existing work proposed by researchers. The chapter identifies the need for a system that will process both surface and hidden web data and return integrated results to the users.


2021 ◽  
Vol 190 ◽  
pp. 324-331
Author(s):  
Larisa Ismailova ◽  
Viacheslav Wolfengagen ◽  
Sergey Kosikov
Keyword(s):  

2021 ◽  
Vol 69 (3) ◽  
pp. 2933-2948
Author(s):  
Sawroop Kaur ◽  
Aman Singh ◽  
G. Geetha ◽  
Mehedi Masud ◽  
Mohammed A. Alzain
Keyword(s):  

Science ◽  
2020 ◽  
Vol 369 (6507) ◽  
pp. 1042-1043 ◽  
Author(s):  
Elizabeth Pennisi ◽  
Warren Cornwall
Keyword(s):  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 117582-117592
Author(s):  
Sawroop Kaur ◽  
G. Geetha
Keyword(s):  
