Automatic Generation of Ontology for Extracting Hidden Web Pages

Automatic generation of agents for collecting hidden Web pages for data extraction

Data & Knowledge Engineering ◽

10.1016/j.datak.2003.10.003 ◽

2004 ◽

Vol 49 (2) ◽

pp. 177-196 ◽

Cited By ~ 41

Author(s):

Juliano Palmieri Lage ◽

Altigran S. da Silva ◽

Paulo B. Golgher ◽

Alberto H.F. Laender

Keyword(s):

Data Extraction ◽

Automatic Generation ◽

Web Pages ◽

Hidden Web

Download Full-text

Automatic Generation and Use of Negative Terms to Evaluate Topic-Related Web Pages

Lecture Notes in Computer Science - Web and Communication Technologies and Internet-Related Social Issues - HSI 2005 ◽

10.1007/11527725_23 ◽

2005 ◽

pp. 218-227

Author(s):

Young-Tae Byun ◽

Yong-Ho Choi ◽

Kee-Cheol Lee

Keyword(s):

Automatic Generation ◽

Web Pages

Download Full-text

OntoIFML: Automatic Generation of Annotated Web Pages from IFML and Ontologies using the MDA Approach: A Case Study of an EMR Management Application

Proceedings of the 7th International Conference on Model-Driven Engineering and Software Development ◽

10.5220/0007402203530361 ◽

2019 ◽

Cited By ~ 1

Author(s):

Naziha Laaz ◽

Samir Mbarki

Keyword(s):

Automatic Generation ◽

Web Pages ◽

Management Application

Download Full-text

Optimization of Automatic Navigation to Hidden Web Pages by Ranking-Based Browser Preloading

Data Engineering Issues in E-Commerce and Services - Lecture Notes in Computer Science ◽

10.1007/11780397_4 ◽

2006 ◽

pp. 40-49

Author(s):

Justo Hidalgo ◽

José Losada ◽

Manuel Álvarez ◽

Alberto Pan

Keyword(s):

Web Pages ◽

Automatic Navigation ◽

Hidden Web

Download Full-text

OntoIFML: Automatic Generation of Annotated Web Pages from IFML and Ontologies using the MDA Approach: A Case Study of an EMR Management Application

Proceedings of the 7th International Conference on Model-Driven Engineering and Software Development ◽

10.5220/0007402203550363 ◽

2019 ◽

Author(s):

Naziha Laaz ◽

Samir Mbarki

Keyword(s):

Automatic Generation ◽

Web Pages ◽

Management Application

Download Full-text

Automated Web Application ...

10.29007/vs62 ◽

2018 ◽

Author(s):

Priyank Bhojak ◽

Vatsal Shah ◽

Kanu Patel ◽

Deven Gol

Keyword(s):

Dynamic Analysis ◽

Open Source ◽

Web Application ◽

Web Applications ◽

Black Box ◽

Web Pages ◽

Software Bugs ◽

Hidden Web ◽

A New Technique ◽

Input Validation

The rate of web application threats is growing more and more now in days. The most of software bugs are result from inappropriate input validation. It should lead to attack of confidential information, breaking of knowledge integrity. We develop a scanner for detecting SQ injection and XSS type software-bugs which is based on hidden web crawl and make open source scanner with the aim of hidden web crawling which may be require authentication. In this research paper we presents a new technique to find vulnerability which include advantages of black-box analysis of different web pages. And at the end we shows evaluation table which mention comparison of our scanner with two other web scanner tool. So finally this paper additionally shows how easy it is to scan web application bugs with dynamic analysis and retrieve hidden web pages from web applications.

Download Full-text

A scale for crawler effectiveness on the client-side hidden web

Computer Science and Information Systems ◽

10.2298/csis111215015p ◽

2012 ◽

Vol 9 (2) ◽

pp. 561-583 ◽

Cited By ~ 2

Author(s):

Víctor Prieto ◽

Manuel Álvarez ◽

Rafael López-García ◽

Fidel Cacheda

Keyword(s):

Open Source ◽

Search Engines ◽

Web Site ◽

Point Of View ◽

Web Pages ◽

Common Features ◽

Hidden Web ◽

Client Side ◽

The Web

The main goal of this study is to present a scale that classifies crawling systems according to their effectiveness in traversing the ?clientside? Hidden Web. First, we perform a thorough analysis of the different client-side technologies and the main features of the web pages in order to determine the basic steps of the aforementioned scale. Then, we define the scale by grouping basic scenarios in terms of several common features, and we propose some methods to evaluate the effectiveness of the crawlers according to the levels of the scale. Finally, we present a testing web site and we show the results of applying the aforementioned methods to the results obtained by some open-source and commercial crawlers that tried to traverse the pages. Only a few crawlers achieve good results in treating client-side technologies. Regarding standalone crawlers, we highlight the open-source crawlers Heritrix and Nutch and the commercial crawler WebCopierPro, which is able to process very complex scenarios. With regard to the crawlers of the main search engines, only Google processes most of the scenarios we have proposed, while Yahoo! and Bing just deal with the basic ones. There are not many studies that assess the capacity of the crawlers to deal with client-side technologies. Also, these studies consider fewer technologies, fewer crawlers and fewer combinations. Furthermore, to the best of our knowledge, our article provides the first scale for classifying crawlers from the point of view of the most important client-side technologies.

Download Full-text

Real-time Monitoring of Greenhouse Climate Control Using the Internet

HortTechnology ◽

10.21273/horttech.11.4.639 ◽

2001 ◽

Vol 11 (4) ◽

pp. 639-643 ◽

Cited By ~ 1

Author(s):

Niels Ehler ◽

Jesper M. Aaslyng

Keyword(s):

Real Time ◽

Automatic Generation ◽

Web Pages ◽

Climatic Data ◽

Climate Control ◽

Dynamic Information ◽

Ongoing Research ◽

Greenhouse Climate ◽

Internet Application ◽

And Control

The possibility of constructing an Internet application that would enable greenhouse users to track climate and control parameters from any Internet-connected computer was investigated. By constructing a set of HTML-templates, dynamic information from the control-system databases was integrated in real-time, and was uploaded to a common net-server by automatic generation of web pages using software developed during the project. Good performance, reliability and security were obtained and the technology proved to be an efficient way of supplying a broad range of users not only with climatic data but also with results from ongoing research.

Download Full-text

IHWC: intelligent hidden web crawler for harvesting data in urban domains

Complex & Intelligent Systems ◽

10.1007/s40747-021-00471-1 ◽

2021 ◽

Author(s):

Sawroop Kaur ◽

Aman Singh ◽

G. Geetha ◽

Xiaochun Cheng

Keyword(s):

Large Scale ◽

Smart Cities ◽

Quality Data ◽

Web Pages ◽

Web Crawler ◽

Harvest Rate ◽

Web Interfaces ◽

Hidden Web ◽

Special Cases ◽

Daunting Task

AbstractDue to the massive size of the hidden web, searching, retrieving and mining rich and high-quality data can be a daunting task. Moreover, with the presence of forms, data cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data. Effective techniques, as well as application in special cases, are required to be explored to achieve an effective harvest rate. One such special area is atmospheric science, where hidden web crawling is least implemented, and crawler is required to crawl through the huge web to narrow down the search to specific data. In this study, an intelligent hidden web crawler for harvesting data in urban domains (IHWC) is implemented to address the relative problems such as classification of domains, prevention of exhaustive searching, and prioritizing the URLs. The crawler also performs well in curating pollution-related data. The crawler targets the relevant web pages and discards the irrelevant by implementing rejection rules. To achieve more accurate results for a focused crawl, ICHW crawls the websites on priority for a given topic. The crawler has fulfilled the dual objective of developing an effective hidden web crawler that can focus on diverse domains and to check its integration in searching pollution data in smart cities. One of the objectives of smart cities is to reduce pollution. Resultant crawled data can be used for finding the reason for pollution. The crawler can help the user to search the level of pollution in a specific area. The harvest rate of the crawler is compared with pioneer existing work. With an increase in the size of a dataset, the presented crawler can add significant value to emission accuracy. Our results are demonstrating the accuracy and harvest rate of the proposed framework, and it efficiently collect hidden web interfaces from large-scale sites and achieve higher rates than other crawlers.

Download Full-text

A look at hidden web pages in Italian public administrations

2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN) ◽

10.1109/cason.2012.6412417 ◽

2012 ◽

Cited By ~ 1

Author(s):

Enrico Sorio ◽

Alberto Bartoli ◽

Eric Medvet

Keyword(s):

Web Pages ◽

Hidden Web

Download Full-text