hidden web
Recently Published Documents

TOTAL DOCUMENTS: 133 (five years: 14)
H-INDEX: 12 (five years: 1)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The WWW contains a huge amount of information from different areas. This information may be present virtually in the form of web pages, media, articles (research journals/magazines), blogs, etc. A major portion of the information resides in web databases and can be retrieved only by issuing queries at the interface offered by the specific database; it is therefore called the Hidden Web. An important issue is how to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
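The overlap-avoidance idea in this abstract can be sketched as follows. The paper itself does not publish PSHWC's partitioning function, so this is a minimal illustrative assumption: each URL is routed to exactly one crawler worker by hashing its domain, so no two workers ever fetch the same site and downloads never overlap.

```python
import hashlib
from urllib.parse import urlparse

def assign_worker(url: str, n_workers: int) -> int:
    """Assign a URL to one crawler worker by hashing its domain.

    Every URL from the same domain maps to the same worker, so no two
    workers ever download the same page (no overlap), and per-domain
    crawling policies are easy to enforce in one place."""
    domain = urlparse(url).netloc.lower()
    digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers

# All pages of one site land on the same worker (domains are hypothetical):
urls = ["http://books.example.com/search?q=a",
        "http://books.example.com/search?q=b",
        "http://cars.example.org/query"]
workers = [assign_worker(u, 4) for u in urls]
assert workers[0] == workers[1]  # same domain -> same worker
```

Hash-based partitioning is deterministic, so workers need no coordination at fetch time; whether PSHWC uses this exact scheme or a static domain-to-worker table is not stated in the abstract.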


2021 ◽  
Vol 1 (71) ◽  
pp. 21-27
Author(s):  
O. Dvoryankin

This article studies the Deep Web: how this network appeared and spread, its positive and negative sides, and its concept, types, forms, and characteristics. It also compares the Deep Web with the hidden Internet and the "black Internet" (DarkNet), and suggests methods of personal information security.


Author(s):  
Sawroop Kaur ◽  
Aman Singh ◽  
G. Geetha ◽  
Xiaochun Cheng

Abstract: Due to the massive size of the hidden web, searching, retrieving and mining rich, high-quality data can be a daunting task. Moreover, with the presence of forms, data cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data. Effective techniques, as well as applications in special cases, need to be explored to achieve an effective harvest rate. One such special area is atmospheric science, where hidden web crawling is least implemented and the crawler must narrow the search down from the huge web to specific data. In this study, an intelligent hidden web crawler for harvesting data in urban domains (IHWC) is implemented to address the related problems of domain classification, prevention of exhaustive searching, and prioritization of URLs. The crawler also performs well in curating pollution-related data. The crawler targets relevant web pages and discards irrelevant ones by applying rejection rules. To achieve more accurate results for a focused crawl, IHWC crawls websites in priority order for a given topic. The crawler fulfils the dual objective of developing an effective hidden web crawler that can focus on diverse domains and of checking its integration in searching pollution data in smart cities. One of the objectives of smart cities is to reduce pollution, and the resulting crawled data can be used to find the causes of pollution. The crawler can help the user search the level of pollution in a specific area. The harvest rate of the crawler is compared with pioneering existing work. As the size of the dataset grows, the presented crawler can add significant value to emission accuracy.
Our results demonstrate the accuracy and harvest rate of the proposed framework; it efficiently collects hidden web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
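The rejection rules and URL prioritization described in this abstract can be sketched as below. The actual rules, keywords, weights, and class names used by IHWC are not given in the abstract, so everything here is an illustrative assumption: URLs matching rejection patterns are discarded, and the rest are popped from the frontier in order of a keyword-based topic score.

```python
import heapq
import re

# Hypothetical rejection rules and topic keywords -- illustrative only.
REJECT_PATTERNS = [re.compile(p) for p in (r"\.(jpg|png|css|js)$", r"/login", r"/ads/")]
TOPIC_KEYWORDS = {"pollution": 3, "emission": 3, "air": 2, "urban": 1}

def rejected(url: str) -> bool:
    """Rejection rules: discard URLs that cannot lead to topical content."""
    return any(p.search(url) for p in REJECT_PATTERNS)

def priority(url: str) -> int:
    """Score a URL by topic keywords; higher score = crawled earlier."""
    return sum(w for kw, w in TOPIC_KEYWORDS.items() if kw in url)

class Frontier:
    """Best-first URL frontier: pop the highest-priority URL first."""
    def __init__(self):
        self._heap, self._seen = [], set()

    def push(self, url: str):
        # Deduplicate and apply rejection rules before queueing.
        if url not in self._seen and not rejected(url):
            self._seen.add(url)
            heapq.heappush(self._heap, (-priority(url), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]

f = Frontier()
f.push("http://example.com/air-pollution-data")
f.push("http://example.com/login")     # discarded by a rejection rule
f.push("http://example.com/contact")   # kept, but low priority
print(f.pop())  # the pollution page comes out first
```

Scoring on the URL string alone keeps the sketch short; a real focused crawler would also score anchor text and page content, and the abstract does not say which signals IHWC uses.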


2021 ◽  
pp. 50-71
Author(s):  
Shakeel Ahmed ◽  
Shubham Sharma ◽  
Saneh Lata Yadav

Information retrieval is the task of finding material of an unstructured nature within large collections stored on computers. The surface web consists of indexed content accessible by traditional browsers, whereas deep or hidden web content cannot be found with traditional search engines and requires a password or network permissions. Within the deep web, the dark web is also growing, as new tools make it easier to navigate hidden content, accessible only with special software such as Tor. According to a study in Nature, Google indexes no more than 16% of the surface web and misses all of the deep web; any given search turns up just 0.03% of the information that exists online. So the key part of the hidden web remains inaccessible to users. This chapter poses some questions about this research. Detailed definitions and analogies are explained, and the chapter discusses related work and puts forward the advantages and limitations of the existing work proposed by researchers. The chapter identifies the need for a system that will process both surface and hidden web data and return integrated results to the users.


2021 ◽  
Vol 190 ◽  
pp. 324-331
Author(s):  
Larisa Ismailova ◽  
Viacheslav Wolfengagen ◽  
Sergey Kosikov
Keyword(s):  

2021 ◽  
Vol 69 (3) ◽  
pp. 2933-2948
Author(s):  
Sawroop Kaur ◽  
Aman Singh ◽  
G. Geetha ◽  
Mehedi Masud ◽  
Mohammed A. Alzain
Keyword(s):  

Science ◽  
2020 ◽  
Vol 369 (6507) ◽  
pp. 1042-1043 ◽  
Author(s):  
Elizabeth Pennisi ◽  
Warren Cornwall
Keyword(s):  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 117582-117592
Author(s):  
Sawroop Kaur ◽  
G. Geetha
Keyword(s):  
