DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining

Author(s):  
Aysha Banu ◽  
M. Chitra
Author(s):  
Ily Amalina Ahmad Sabri ◽  
Mustafa Man

The World Wide Web has become a large pool of information, and extracting structured data from published web pages has drawn attention over the last decade. Web data extraction (WDE) faces many challenges because of the variety of web data and the unstructured nature of hypertext markup language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. The paper studies data extraction using wrapper approaches and compares them to identify the best approach for extracting data from online sites. To assess the efficiency of the proposed model, we compare the performance of single web page extraction across different models: document object model (DOM), wrapper using hybrid DOM and JSON (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ), and WEIDJ (no-rules). The experiments show that WEIDJ extracts data fastest, taking less time than the other methods compared.
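As a rough illustration of the idea of pulling image data out of a page and emitting it as JSON, the sketch below uses only Python's standard library. It is not the WEIDJ wrapper described in the abstract: it uses a streaming HTML parser rather than a full DOM tree, and the names `ImageCollector` and `extract_images_as_json` are hypothetical.

```python
import json
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects src/alt attributes of <img> elements while the HTML is parsed.

    Hypothetical helper for illustration; not part of any published wrapper.
    """

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            attr_map = dict(attrs)
            # Skip images without a usable source.
            if attr_map.get("src"):
                self.images.append(
                    {"src": attr_map["src"], "alt": attr_map.get("alt") or ""}
                )


def extract_images_as_json(html):
    """Return a JSON string listing the images found in a single page."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return json.dumps(collector.images)
```

Calling `extract_images_as_json` on the HTML of one page yields a JSON array with one object per image, which a downstream application can store or index without re-parsing the page.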


2013 ◽  
Vol 756-759 ◽  
pp. 2583-2587 ◽  
Author(s):  
Zi Yang Han ◽  
Feng Ying Wang ◽  
Ping Sun ◽  
Zheng Yu Li

The Internet contains many Deep Web sources that hold a large amount of valuable data. This paper proposes a Deep Web data extraction and service system based on the principles of cloud technology. We adopt a multi-node parallel computing architecture and design a task scheduling algorithm for the data extraction process; on this foundation, the task load is balanced among nodes so that data extraction completes rapidly. The experimental results show that using cloud parallel computing and dispersed network resources to extract data from the Deep Web is valid and improves both the efficiency of Deep Web data extraction and the service quality.
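The abstract does not specify its scheduling algorithm. One common way to balance extraction tasks across nodes is a greedy "largest task to least-loaded node" rule, sketched below under the assumption that each task has a known cost estimate. The function name `assign_tasks` and the cost model are hypothetical.

```python
import heapq


def assign_tasks(task_costs, node_count):
    """Greedy balancing: give each task (largest first) to the least-loaded node.

    task_costs maps a task id to an assumed cost estimate (e.g. pages to fetch).
    Returns (assignment, loads), both keyed by node index.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Heap entries are (current_load, node_index), so the lightest node pops first.
    heap = [(0, node) for node in range(node_count)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(node_count)}
    # Placing expensive tasks first tends to leave small tasks to even out the loads.
    for task_id, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[node].append(task_id)
        heapq.heappush(heap, (load + cost, node))
    loads = {
        node: sum(task_costs[t] for t in tasks)
        for node, tasks in assignment.items()
    }
    return assignment, loads
```

This heuristic does not guarantee an optimal split, but it is simple and keeps the gap between the busiest and idlest node small when task costs are reasonably accurate.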


Author(s):  
B. Umamageswari ◽  
R. Kalpana

Web mining is done on huge amounts of data extracted from the WWW. Researchers have developed several state-of-the-art approaches for web data extraction. So far, the literature has focused mainly on techniques for data region extraction. Applications that are fed with the extracted data require fetching data spread across multiple web pages, which should be crawled automatically. For this to happen, we need to extract not only data regions but also the navigation links. Data extraction techniques are designed for specific HTML tags, which calls into question their universal applicability for information extraction from differently formatted web pages. This chapter focuses on the web data extraction techniques available for different kinds of data-rich pages, the classification of those techniques, and their comparison across many useful dimensions.
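To make the point about navigation links concrete, here is a minimal sketch that collects the links a crawler could follow from one page. It assumes that any same-host `<a href>` counts as a navigation link, which is a simplification and not the chapter's method. The names `LinkCollector` and `navigation_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute, same-host URLs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page being parsed.
        absolute = urljoin(self.base_url, href)
        same_host = urlparse(absolute).netloc == urlparse(self.base_url).netloc
        # Keep first-seen order and drop duplicates and off-site links.
        if same_host and absolute not in self.links:
            self.links.append(absolute)


def navigation_links(html, base_url):
    """Return the same-host links found in html, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

A crawler would queue the returned URLs, fetch each one, and run data region extraction on the result. Real pages usually need a further filter, for example on pagination patterns, to separate navigation links from other same-site links.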


Author(s):  
Shenglin Li ◽  
Chen Chen ◽  
Kaiwen Luo ◽  
Bo Song
