Exploring `hidden' parts of the web: the hidden web

Author(s):  
S. Gupta ◽  
K.K. Bhatia

The Dark Web ◽  
2018 ◽  
pp. 84-113
Author(s):  
Manuel Álvarez Díaz ◽  
Víctor Manuel Prieto Álvarez ◽  
Fidel Cacheda Seijo

This paper presents an analysis of the most important features of the Web, its evolution, and the implications for the tools that traverse it to index its content for later searching. Notably, some of these features cause a fairly large subset of the Web to remain “hidden”. The analysis focuses on a snapshot of the Global Web for each of six years, 2009 to 2014. The results for each year are analyzed both independently and together, to characterize the features at any given time and the changes between the analyzed years. The objectives of the analysis are twofold: to characterize the Web and, more importantly, its evolution over time.


2012 ◽  
Vol 9 (2) ◽  
pp. 561-583 ◽  
Author(s):  
Víctor Prieto ◽  
Manuel Álvarez ◽  
Rafael López-García ◽  
Fidel Cacheda

The main goal of this study is to present a scale that classifies crawling systems according to their effectiveness in traversing the “client-side” Hidden Web. First, we perform a thorough analysis of the different client-side technologies and the main features of the web pages in order to determine the basic steps of the aforementioned scale. Then, we define the scale by grouping basic scenarios in terms of several common features, and we propose some methods to evaluate the effectiveness of the crawlers according to the levels of the scale. Finally, we present a testing web site and we show the results of applying the aforementioned methods to the results obtained by some open-source and commercial crawlers that tried to traverse the pages. Only a few crawlers achieve good results in treating client-side technologies. Regarding standalone crawlers, we highlight the open-source crawlers Heritrix and Nutch and the commercial crawler WebCopierPro, which is able to process very complex scenarios. With regard to the crawlers of the main search engines, only Google processes most of the scenarios we have proposed, while Yahoo! and Bing just deal with the basic ones. There are not many studies that assess the capacity of the crawlers to deal with client-side technologies. Also, these studies consider fewer technologies, fewer crawlers and fewer combinations. Furthermore, to the best of our knowledge, our article provides the first scale for classifying crawlers from the point of view of the most important client-side technologies.


Author(s):  
Otto Hans-Martin Lutz ◽  
Jacob Leon Kröger ◽  
Manuel Schneiderbauer ◽  
Manfred Hauswirth

Web tracking is found on 90% of common websites. It allows online behavioral analysis, which can reveal insights into an individual's sensitive personal data. Most users are not aware of the amount of web tracking happening in the background. This paper contributes a sonification-based approach to raise user awareness by conveying information on web tracking through sound while the user is browsing the web. We present a framework for live web tracking analysis, conversion to Open Sound Control events and sonification. The amount of web tracking is disclosed by sound each time data is exchanged with a web tracking host. When a connection to one of the most prevalent tracking companies is established, this is additionally indicated by a voice whispering the company name. Compared to existing approaches to web tracking sonification, we add the capability to monitor any network connection, including all browsers, applications and devices. An initial user study with 12 participants showed empirical support for our main hypothesis: exposure to our sonification significantly raises web tracking awareness.


Lámpsakos ◽  
2015 ◽  
pp. 39
Author(s):  
Fernando Pech-May ◽  
Alicia Martínez-Rebollar ◽  
Hugo Estrada-Esquivel ◽  
Eduardo Pedroza-Landa

The web is the most used information source in academic, scientific, and industry forums. Its explosive growth has generated billions of pages of information, which may be categorized as the surface web, composed of static pages that are indexed, and the hidden web, accessible through search templates. This paper presents the development of a crawler that supports searching, querying, and analysis of information in the surface web and the hidden web within specific domains.


2008 ◽  
Vol 11 (2) ◽  
pp. 83-85
Author(s):  
Howard Wilson

2005 ◽  
Vol 8 (1) ◽  
pp. 16-18
Author(s):  
Howard F. Wilson
