Extracting Data Records Based on Global Schema

With the rapid increasing of web data, deep web is the fastest growing web data carrier. Therefore, the research of deep web, especially on extracting data records from Result pages, has already become an urgent task. We present a data records extraction based on Global Schema method, which automatically extracts the query result records from web pages. This method first analyzes the Query interface and result records instances to build a Global Schema by ontology. Then, the Global Schema is used in the process of extracting data records from result pages and storing these data in a table. Experimental results indicate that this method is accurate to extract data records, as well as to save in a table with a Global Schema.

Download Full-text

A Semantic Query Method for Deep Web

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2559 ◽

2013 ◽

Vol 347-350 ◽

pp. 2559-2563

Author(s):

Hao Jiang ◽

Wen Ju Liu ◽

Li Li Lu

Keyword(s):

Semantic Web ◽

Software Architecture ◽

Deep Web ◽

New Method ◽

Web Data ◽

Semantic Query ◽

Web Environment ◽

Query Result ◽

Complete Set ◽

Sql Query

Based on the idea of "functionality-centric", this paper proposes a complete set of oriented semantic query methods for Deep Web, builds up the relevant software architecture, provides a new method for full use of Deep Web data resources in semantic web environment through describing the establishment of the semantic environment, re-writing the SPARQL-to-SQL query, semantic packaging of semantic query result, and the architecture of semantic query services.

Download Full-text

Web Form Entrance Detection and Automatic Form Filling

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.2712 ◽

2013 ◽

Vol 380-384 ◽

pp. 2712-2715

Author(s):

Wen Qian Shang

Keyword(s):

Research Area ◽

Deep Web ◽

Experimental Results ◽

Web Pages ◽

Information Mining ◽

Form Analysis ◽

Entire Process ◽

Web Information

At present, deep web information mining is a considerably potential research area. How to get this massive and valuable information hidden after the database is need to be studied further. So this paper presents an approach that includes web pages analysis, getting forms, form analysis, automatic form filling, automatic form submission and acquiring returning pages. The aim is to let the computer automatically complete this process. The experimental results show the feasibility of this method. It can automatically complete the entire process.

Download Full-text

Efficient Methodology for Deep Web Data Extraction

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i1s.1769 ◽

2021 ◽

Vol 12 (1S) ◽

pp. 286-293

Author(s):

Shilpa Deshmukh, Et. al.

Keyword(s):

Information Extraction ◽

Data Extraction ◽

Deep Web ◽

Web Pages ◽

Web Data ◽

Web Information Extraction ◽

Web Data Extraction ◽

Web Information ◽

Assessment Measure ◽

Enormous Number

Deep Web substance are gotten to by inquiries submitted to Web information bases and the returned information records are enwrapped in progressively created Web pages (they will be called profound Web pages in this paper). Removing organized information from profound Web pages is a difficult issue because of the fundamental mind boggling structures of such pages. As of not long ago, an enormous number of strategies have been proposed to address this issue, however every one of them have characteristic impediments since they are Web-page-programming-language subordinate. As the mainstream two-dimensional media, the substance on Web pages are constantly shown routinely for clients to peruse. This inspires us to look for an alternate path for profound Web information extraction to beat the constraints of past works by using some fascinating normal visual highlights on the profound Web pages. In this paper, a novel vision-based methodology that is Visual Based Deep Web Data Extraction (VBDWDE) Algorithm is proposed. This methodology basically uses the visual highlights on the profound Web pages to execute profound Web information extraction, including information record extraction and information thing extraction. We additionally propose another assessment measure amendment to catch the measure of human exertion expected to create wonderful extraction. Our investigations on a huge arrangement of Web information bases show that the proposed vision-based methodology is exceptionally viable for profound Web information extraction.

Download Full-text

Data extraction and annotation based on domain-specific ontology evolution for deep web

Computer Science and Information Systems ◽

10.2298/csis101011023k ◽

2011 ◽

Vol 8 (3) ◽

pp. 673-692 ◽

Cited By ~ 4

Author(s):

Chen Kerui ◽

Wanli Zuo ◽

Fengling He ◽

Yongheng Chen ◽

Ying Wang

Keyword(s):

Data Extraction ◽

Deep Web ◽

Query Interface ◽

Mapping Data ◽

Specific Domain ◽

Data Annotation ◽

Domain Specific ◽

Query Result ◽

User Query ◽

Sample Set

Deep web respond to a user query result records encoded in HTML files. Data extraction and data annotation, which are important for many applications, extracts and annotates the record from the HTML pages. We proposed an domain-specific ontology based data extraction and annotation technique; we first construct mini-ontology for specific domain according to information of query interface and query result pages; then, use constructed mini-ontology for identifying data areas and mapping data annotations in data extraction; in order to adapt to new sample set, mini-ontology will evolve dynamically based on data extraction and data annotation. Experimental results demonstrate that this method has higher precision and recall in data extraction and data annotation.

Download Full-text

Deep Web Data Source Classification Based on Query Interface Context

2012 Fourth International Conference on Computational and Information Sciences ◽

10.1109/iccis.2012.117 ◽

2012 ◽

Author(s):

Zilu Cui ◽

Yuchen Fu

Keyword(s):

Deep Web ◽

Query Interface ◽

Web Data ◽

Data Source

Download Full-text

A deep web data extraction model for web mining: a review

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i1.pp519-528 ◽

2021 ◽

Vol 23 (1) ◽

pp. 519

Author(s):

Ily Amalina Ahmad Sabri ◽

Mustafa Man

Keyword(s):

Web Mining ◽

Data Extraction ◽

Deep Web ◽

Quality Data ◽

Web Pages ◽

Web Data ◽

Comprehensive Overview ◽

Web Data Extraction ◽

Proposed Model ◽

Extraction Model

The World Wide Web has become a large pool of information. Extracting structured data from a published web pages has drawn attention in the last decade. The process of web data extraction (WDE) has many challenges, dueto variety of web data and the unstructured data from hypertext mark up language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques, in termsof extracted quality data. This paper focuses on study for data extraction using wrapper approaches and compares each other to identify the best approach to extract data from online sites. To observe the efficiency of the proposed model, we compare the performance of data extraction by single web page extraction with different models such as document object model (DOM), wrapper using hybrid dom and json (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ) and WEIDJ (no-rules). Finally, the experimentations proved that WEIDJ can extract data fastest and low time consuming compared to other proposed method.<br /><div> </div>

Download Full-text

Deep Web Data Extraction Based on Regular Expression

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.718-720.2242 ◽

2013 ◽

Vol 718-720 ◽

pp. 2242-2247 ◽

Cited By ~ 3

Author(s):

Tao Lin ◽

Bao Hua Qiang ◽

Shi Long ◽

He Qian

Keyword(s):

Regular Expression ◽

Data Extraction ◽

Source Code ◽

Deep Web ◽

Web Pages ◽

Web Data ◽

Common Path ◽

Web Data Integration ◽

The Common ◽

Target Data

Data extraction is an important issue in Deep web data integration. In order to extract the query results of the Deep Web, it is firstly required to locate the target data block correctly. Due to the html source code of web pages can be parsed as well structured DOM, we proposed an effective algorithm for discerning the common path based on hierarchical DOM. Based on the common path and our predefined regular expression, the target data of the Deep Web can be extracted effectively. The experimental results on real websites show that our proposed algorithm is highly effective.

Download Full-text

Deep Web query interface schema matching based on matching degree and semantic similarity

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.01688 ◽

2013 ◽

Vol 32 (6) ◽

pp. 1688-1691

Author(s):

Yong FENG ◽

Yang ZHANG

Keyword(s):

Semantic Similarity ◽

Deep Web ◽

Schema Matching ◽

Query Interface ◽

Matching Degree

Download Full-text

A Novel Architecture for Deep Web Crawler

International Journal of Information Technology and Web Engineering ◽

10.4018/jitwe.2011010103 ◽

2011 ◽

Vol 6 (1) ◽

pp. 25-48 ◽

Cited By ~ 7

Author(s):

Dilip Kumar Sharma ◽

A. K. Sharma

Keyword(s):

Cost Effective ◽

Deep Web ◽

Web Data ◽

Web Crawler ◽

Web Information ◽

General Search ◽

Web Crawlers

A traditional crawler picks up a URL, retrieves the corresponding page and extracts various links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much remains undiscovered. In this paper, the authors analyze and compare important deep Web information crawling techniques to find their relative limitations and advantages. To minimize limitations of existing deep Web crawlers, a novel architecture is proposed based on QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost effective and has features of privatized search and general search for deep Web data hidden behind html forms.

Download Full-text

Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

10.31219/osf.io/b3fvz ◽

2017 ◽

Author(s):

Marilena Oita ◽

Antoine Amarilli ◽

Pierre Senellart

Keyword(s):

Domain Knowledge ◽

Deep Web ◽

Web Pages ◽

Complete Understanding ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Web Crawlers ◽

New Perspective ◽

The Impact

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.

Download Full-text