Site-Wide Wrapper Induction for Life Science Deep Web Databases

Author(s):  
Saqib Mir ◽  
Steffen Staab ◽  
Isabel Rojas
Author(s):  
Ling Song ◽  
Jun Ma ◽  
Po Yan ◽  
Li Lian ◽  
Dongmei Zhang
Keyword(s):  
Deep Web

Author(s):  
Zina Ben Miled ◽  
Nianhua Li ◽  
Yang Liu ◽  
Yue He ◽  
Eric Lynch ◽  
...  
2011 ◽  
Vol 8 (3) ◽  
pp. 779-799 ◽  
Author(s):  
Ying Wang ◽  
Huilai Li ◽  
Wanli Zuo ◽  
Fengling He ◽  
Xin Wang ◽  
...  

Ontology plays an important role in locating domain-specific Deep Web content. This paper therefore presents WFF, a novel framework for efficiently locating domain-specific Deep Web databases based on focused crawling and ontology, built from a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC) arranged in a hierarchical fashion. First, the WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, the FSC analyzes the interesting pages and determines, from structural characteristics, whether they contain searchable forms. Finally, the FCC identifies, at the semantic level, the searchable forms that belong to a given domain and stores the URLs of these domain-specific forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies domain-specific databases.
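The three-stage pipeline described in this abstract can be sketched as a cascade of increasingly specific filters. The sketch below is purely illustrative; the classifier interfaces, page fields, and thresholds are assumptions, not details from the paper.

```python
# Hypothetical sketch of a WPC -> FSC -> FCC filtering cascade: each page
# must pass all three checks before its URL is recorded as a
# domain-specific Deep Web entry point.

def find_domain_specific_forms(pages, wpc, fsc, fcc):
    """Return URLs of pages whose searchable forms match the target domain."""
    matches = []
    for page in pages:
        if not wpc(page):   # Web Page Classifier: topically relevant page?
            continue
        if not fsc(page):   # Form Structure Classifier: contains a searchable form?
            continue
        if fcc(page):       # Form Content Classifier: form fits the domain semantically?
            matches.append(page["url"])
    return matches

# Toy stand-ins for the three classifiers, keyed on simple page features.
pages = [
    {"url": "http://example.org/gene-search", "topic": "biology", "has_form": True,  "domain_terms": 3},
    {"url": "http://example.org/news",        "topic": "news",    "has_form": True,  "domain_terms": 0},
    {"url": "http://example.org/bio-article", "topic": "biology", "has_form": False, "domain_terms": 2},
]
urls = find_domain_specific_forms(
    pages,
    wpc=lambda p: p["topic"] == "biology",
    fsc=lambda p: p["has_form"],
    fcc=lambda p: p["domain_terms"] >= 2,
)
print(urls)  # only the gene-search page survives all three filters
```

The point of the hierarchy is cost ordering: cheap topical filtering runs on every crawled page, while the more expensive semantic form analysis runs only on the small fraction that survives the first two stages.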


2012 ◽  
Vol 40 (1) ◽  
pp. 159-184 ◽  
Author(s):  
Yanni Li ◽  
Yuping Wang ◽  
Jintao Du

2021 ◽  
Author(s):  
Chia-Hui Chang

Web data extraction is a key component of many business intelligence tasks, such as data transformation, exchange, and analysis. Many approaches have been proposed, trained with either labeled examples (supervised) or annotation-free pages (unsupervised). However, most research focuses on extraction effectiveness; little attention has been paid to extraction efficiency. In fact, most unsupervised web data extraction systems skip wrapper generation entirely, since they can operate without any supervision.

In this paper, we argue that wrapper generation for unsupervised web data extraction is as important as supervised wrapper induction, because the generated wrappers can work more efficiently, without sophisticated analysis at extraction time. We consider two approaches to wrapper generation: schema-guided finite-state machine (FSM) approaches and data-driven machine learning (ML) approaches. We exploit unique mandatory templates to improve the FSM-based wrapper and propose two convolutional neural network (CNN)-based models for sequence labeling. The experimental results show that the FSM wrapper performs well even with little training data, while the CNN-based models require more training pages to reach the same effectiveness but are more efficient with GPU support. Furthermore, FSM wrappers can serve as a filter to reduce the number of training pages and advance the learning curve for wrapper generation.
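The efficiency argument in this abstract rests on the idea that once a wrapper is generated, extraction is a cheap pattern match rather than a fresh page analysis. A minimal sketch, assuming the "unique mandatory templates" act as fixed HTML fragments anchoring each data slot; the template strings and field layout are illustrative, not taken from the paper:

```python
import re

def compile_wrapper(template_parts):
    """Build a regex-based wrapper from alternating template/slot parts.

    Fixed template fragments are matched literally; each slot becomes a
    non-greedy capture group, so extraction at test time is a single scan.
    """
    pattern = "".join(
        re.escape(part) if kind == "template" else "(.*?)"
        for kind, part in template_parts
    )
    return re.compile(pattern, re.S)

# Hypothetical wrapper for a record layout <li><b>TITLE</b> by AUTHOR</li>.
wrapper = compile_wrapper([
    ("template", "<li><b>"), ("slot", "title"),
    ("template", "</b> by "), ("slot", "author"),
    ("template", "</li>"),
])

page = ("<ul><li><b>Wrapper Induction</b> by Kushmerick</li>"
        "<li><b>RoadRunner</b> by Crescenzi</li></ul>")
records = wrapper.findall(page)
print(records)  # [('Wrapper Induction', 'Kushmerick'), ('RoadRunner', 'Crescenzi')]
```

A learned FSM wrapper generalizes this idea: states correspond to template positions and transitions consume either mandatory template tokens or variable data, but the runtime cost profile is the same — no per-page structural analysis is needed once the wrapper exists.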



