A Survey on Data Annotation for the Web Databases

Introduction: Ever-advancing technology and the rapidly growing demand for data on the web have created a need for efficient search engines. Method: Automated duplicate detection over query results identifies records from multiple web databases that refer to the same real-world entity and returns only the non-matching records to end users. The algorithm proposed in this paper takes an unsupervised approach, using classifiers over heterogeneous web databases, and aims for high precision, recall, and F-measure. Several assessments are carried out to analyze how effectively the proposed algorithm identifies duplicates. Result: The proposed algorithm achieves higher precision and F-measure than standard UDD, with the same recall. Conclusion: The proposed algorithm outperforms standard UDD. Discussion: This paper introduces an algorithm that automates duplicate detection for lexically heterogeneous web databases.
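
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of duplicate detection across query results from two web databases. It swaps in a plain weighted string-similarity threshold for the paper's unsupervised classifiers; the function names, field weights, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (weights are hypothetical)."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def find_duplicates(source_a, source_b, weights, threshold=0.85):
    """Return index pairs (i, j) whose similarity reaches the assumed threshold."""
    pairs = []
    for i, ra in enumerate(source_a):
        for j, rb in enumerate(source_b):
            if record_similarity(ra, rb, weights) >= threshold:
                pairs.append((i, j))
    return pairs


# Example records, invented for this sketch.
db_a = [{"title": "Data Mining Concepts", "author": "J. Han"}]
db_b = [
    {"title": "data mining concepts", "author": "J. Han"},
    {"title": "Organic Chemistry", "author": "P. Bruice"},
]
duplicates = find_duplicates(db_a, db_b, weights={"title": 2.0, "author": 1.0})
```

A fixed threshold is the simplest possible decision rule. An unsupervised approach such as the one the abstract describes would presumably learn the decision boundary from the data rather than hard-code it.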

Social mobility and smoking: a systematic review

Ciência & Saúde Coletiva, 2015, Vol 20 (5), pp. 1515-1520
DOI: 10.1590/1413-81232015205.01642014
Cited By ~ 2

Author(s): Janaína Vieira dos Santos Motta, Natália Peixoto Lima, Maria Teresa Anselmo Olinto, Denise Petrucci Gigante

Keyword(s): Systematic Review, Social Mobility, Web Of Science, Socioeconomic Level, Downward Mobility, Web Databases, Occupational Classification, Lower Socioeconomic, The Web

This study reviews the literature on longitudinal studies that have evaluated the effect of social mobility on the occurrence of smoking in various populations. Articles were selected from the web databases PubMed and Web of Science using the words: follow up, cohort longitudinal prospective, social mobility, social change life, course socioeconomic, smoking, and tobacco. Of the six studies identified, four used occupational classification to measure social mobility, and all six were carried out in Europe. The results indicate higher proportions of tobacco users among those with a lower socioeconomic level throughout the observation period (for all variables analyzed). People who experienced downward mobility, that is, those classified as having a higher socioeconomic level at the beginning of life, tended to adopt the habits of the new group when they moved to a lower social group.

Ubiquitous Access to Web Databases

Web-Powered Databases, 2003, pp. 246-265
DOI: 10.4018/978-1-59140-035-6.ch009

Author(s): Athman Bouguettaya, Brahim Medjahed, Mourad Ouzzani, Yao Meng

Keyword(s): Dynamic Environments, Information Discovery, Web Databases, Communication Middleware, Content Description, Space Organization, Ubiquitous Access, Locating Information, Different Levels, The Web

With the emergence of the Web, there is a need to provide across-the-board transparency for accessing and manipulating data irrespective of platforms, locations, and systems. The challenge is to build an infrastructure to support flexible tools for information space organization, communication facilities, information discovery, content description, and assembly of data from heterogeneous sources. In this chapter, we describe the WebFINDIT system. WebFINDIT builds a scalable and uniform infrastructure for locating and accessing heterogeneous and autonomous databases in large and dynamic environments. One key feature of WebFINDIT is the clustering of Web databases into distributed ontologies. The main advantage of this ontological organization is filtering interactions and reducing the overhead of locating information. Another important feature is the large spectrum of heterogeneity being supported. Heterogeneity appears at different levels, including hardware (Sun and NT), operating system (UNIX and NT), database (Oracle, Informix, DB2, ObjectStore), and communication middleware (CORBA, DCOM, EJB, and RMI).

Incorrect Use of The Web Databases Can Result in Incorrect Information for the Readers: The Case of Lactate-Mitochondria Affair

Journal of Tumor Medicine & Prevention, 2018, Vol 3 (3)
DOI: 10.19080/jtmp.2018.03.555612

Author(s): Salvatore Passarella

Keyword(s): Web Databases, Incorrect Information, The Web

WDB's Query Interface Extraction Method Based on Watir & Regular Expression

Key Engineering Materials, 2011, Vol 467-469, pp. 1764-1769
DOI: 10.4028/www.scientific.net/kem.467-469.1764

Author(s): Lin Zhao, Pei Guang Lin, Pei Yao Nie

Keyword(s): Extraction Method, Regular Expression, Extraction Methods, Query Interface, Web Databases, Use Of Data, The Relationship, The Web, Form Information

With the wide application of Web databases (WDBs), making full use of their data has become a hot research topic. The WDB query interface is an important way to obtain WDB data, so fully representing and extracting the query interface is a prerequisite for obtaining that data efficiently. This paper presents a new OWL-based representation for WDB query interfaces, together with extraction methods based on regular expressions and Watir for the context of each query interface, its form information, and the relationships between form fields. This work provides a foundation for the further classification and integration of query interfaces.
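
To make the regular-expression side of this idea concrete, here is a small sketch that pulls form and field information out of a static HTML string. It is not the paper's method: Watir is a Ruby browser-automation library, whereas this example is in Python and works on already-fetched markup. The patterns and the function name `extract_query_interfaces` are assumptions for illustration, and regular expressions over HTML are fragile on real pages.

```python
import re

# Hypothetical patterns; they assume double-quoted attributes and no nested forms.
FORM_RE = re.compile(r"<form\b([^>]*)>(.*?)</form>", re.IGNORECASE | re.DOTALL)
FIELD_RE = re.compile(r"<(input|select|textarea)\b([^>]*)>", re.IGNORECASE)
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_attrs(text: str) -> dict:
    """Turn the raw attribute text of a tag into a dict with lowercase keys."""
    return {name.lower(): value for name, value in ATTR_RE.findall(text)}


def extract_query_interfaces(html: str) -> list:
    """Return one entry per <form>, listing its action and its input fields."""
    interfaces = []
    for form_attrs, body in FORM_RE.findall(html):
        fields = []
        for tag, attrs in FIELD_RE.findall(body):
            parsed = parse_attrs(attrs)
            fields.append(
                {"tag": tag.lower(), "name": parsed.get("name"), "type": parsed.get("type")}
            )
        interfaces.append({"action": parse_attrs(form_attrs).get("action"), "fields": fields})
    return interfaces


# Example page, invented for this sketch.
sample_html = """
<form action="/search" method="get">
  <input type="text" name="title">
  <select name="category"><option value="1">Books</option></select>
  <input type="submit" value="Go">
</form>
"""
interfaces = extract_query_interfaces(sample_html)
```

The resulting list of dictionaries is a flat stand-in for the richer OWL representation the abstract mentions. Relationships between fields, such as labels or grouping, would need additional patterns or a real HTML parser.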

Deep Web Database Selection with Classification and Rich Features

Advanced Materials Research, 2013, Vol 850-851, pp. 720-723
DOI: 10.4028/www.scientific.net/amr.850-851.720

Author(s): Yong Quan Dong, Ping Ling

Keyword(s): High Performance, Large Scale, Classification Problem, Classification Model, Web Database, Specific Domain, Web Databases, User Query, Database Selection, The Web

The Web has been rapidly deepened by the many searchable databases online, whose data are hidden behind query interfaces. Hundreds or thousands of Web databases may provide data relevant to a specific domain. Given so many Web databases, the core problem is to select the ones most appropriate to a user's query. Although this problem has received more attention recently, current approaches remain simplified and empirical. In this paper, we propose a Web database selection approach based on classification. We cast Web database selection as a classification problem and combine multiple kinds of features describing the query and the Web databases. We use the classification model to estimate the relevance of each Web database to a user query and select the top-K databases to provide the query results. Experiments show that our approach yields high performance.
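
The selection step described above can be sketched as scoring each database with a classifier and keeping the K highest scores. The abstract does not name the features or the model, so everything below is assumed for illustration: two toy features (term overlap and a domain match), a logistic scoring function, and hand-set weights standing in for a trained model.

```python
import math


def features(query: str, db: dict) -> list:
    """Two hypothetical features relating a query to a database description."""
    q_terms = set(query.lower().split())
    d_terms = set(db["description"].lower().split())
    overlap = len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0
    domain_match = 1.0 if db["domain"] in q_terms else 0.0
    return [overlap, domain_match]


def relevance(x: list, weights: list, bias: float) -> float:
    """Logistic score in (0, 1); weights and bias would normally be learned."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def select_top_k(query: str, databases: list, weights: list, bias: float, k: int) -> list:
    """Rank databases by estimated relevance and return the names of the top k."""
    scored = [(relevance(features(query, db), weights, bias), db["name"]) for db in databases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Example databases, invented for this sketch.
databases = [
    {"name": "flights", "domain": "flights", "description": "flights airline tickets"},
    {"name": "bookstore", "domain": "books", "description": "books novels textbooks"},
    {"name": "usedbooks", "domain": "books", "description": "used books marketplace"},
]
top = select_top_k("cheap books textbooks", databases, weights=[2.0, 1.0], bias=-1.0, k=2)
```

In practice the weights would come from training on labeled query and database pairs, and the feature set would be much richer than the two used here.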
