A Survey on Data Annotation for the Web Databases

2014 ◽  
Vol 16 (2) ◽  
pp. 68-70
Author(s):  
Miss. Priyanka P. Boraste ◽  
Author(s):  
Anil Ahlawat ◽  
Kalpna Sagar

Introduction: Ever-increasing technological advancement and rapidly growing demand for data on the web have created a need for efficient search engines. Method: Automated duplicate detection over query results identifies the records from multiple web databases that refer to the same real-world entity and returns only the non-matching records to end users. The proposed algorithm takes an unsupervised approach, applying classifiers over heterogeneous web databases to return more accurate results with high precision, recall, and F-measure. Several evaluations are also carried out to analyze the efficacy of the proposed algorithm in identifying duplicates. Result: The proposed algorithm achieves higher precision and F-score than standard UDD, with the same recall. Conclusion: This paper concludes that the proposed algorithm outperforms standard UDD. Discussion: This paper aims to introduce an algorithm that automates the process of duplicate detection for lexically heterogeneous web databases.
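
The abstract above compares algorithms by precision, recall, and F-measure. As a minimal sketch of how these measures are commonly computed for duplicate detection, the following scores a set of predicted duplicate pairs against a known set of true pairs. The function name `pair_metrics` and the record identifiers are illustrative assumptions; the paper's own evaluation procedure is not described here.

```python
def pair_metrics(predicted, actual):
    """Return (precision, recall, f_measure) for duplicate-record pairs.

    `predicted` and `actual` are iterables of 2-tuples of record ids.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Treat (a, b) and (b, a) as the same pair.
    pred = {frozenset(p) for p in predicted}
    gold = {frozenset(p) for p in actual}

    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0

    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Made-up example data: three predicted pairs, four true pairs, two in common.
predicted = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
actual = [("r2", "r1"), ("r3", "r4"), ("r7", "r8"), ("r9", "r10")]
precision, recall, f_measure = pair_metrics(predicted, actual)
```

Two algorithms with the same recall can still differ in F-measure when one of them reports fewer false duplicate pairs, which is the pattern the abstract reports.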


2015 ◽  
Vol 20 (5) ◽  
pp. 1515-1520 ◽  
Author(s):  
Janaína Vieira dos Santos Motta ◽  
Natália Peixoto Lima ◽  
Maria Teresa Anselmo Olinto ◽  
Denise Petrucci Gigante

The purpose of this study is to review the literature on longitudinal studies that have evaluated the effect of social mobility on smoking in various populations. Articles were selected from the web databases PubMed and Web of Science using the search terms follow up, cohort, longitudinal, prospective, social mobility, social change, life course, socioeconomic, smoking, and tobacco. Of the six studies identified in this review, four used occupational classification to measure social mobility. All six were carried out in Europe. The results indicate higher proportions of tobacco users among those with a lower socioeconomic level throughout the observation period (for all variables analyzed), and that people who experienced downward mobility, that is, people classified as having a higher socioeconomic level at the beginning of life, tended to adopt the habits of the new group when they moved to a lower social group.


2003 ◽  
pp. 246-265
Author(s):  
Athman Bouguettaya ◽  
Brahim Medjahed ◽  
Mourad Ouzzani ◽  
Yao Meng

With the emergence of the Web, there is a need to provide across-the-board transparency for accessing and manipulating data irrespective of platforms, locations, and systems. The challenge is to build an infrastructure to support flexible tools for information space organization, communication facilities, information discovery, content description, and assembly of data from heterogeneous sources. In this chapter, we describe the WebFINDIT system. WebFINDIT builds a scalable and uniform infrastructure for locating and accessing heterogeneous and autonomous databases in large and dynamic environments. One key feature of WebFINDIT is the clustering of Web databases into distributed ontologies. The main advantage of this ontological organization is filtering interactions and reducing the overhead of locating information. Another important feature is the large spectrum of heterogeneity being supported. Heterogeneity appears at different levels, including hardware (Sun and NT), operating system (UNIX and NT), database (Oracle, Informix, DB2, ObjectStore), and communication middleware (CORBA, DCOM, EJB, and RMI).


2011 ◽  
Vol 467-469 ◽  
pp. 1764-1769
Author(s):  
Lin Zhao ◽  
Pei Guang Lin ◽  
Pei Yao Nie

With the wide application of Web databases (WDB), making full use of their data has become a hot research topic. The WDB query interface is an important way to access WDB data, so fully representing and extracting the query interface is a significant prerequisite for obtaining the data efficiently. This paper presents a new OWL-based representation for WDB query interfaces, and gives extraction methods based on regular expressions and Watir for the context of each query interface, the form information, and the relationships between form fields. This work provides an important foundation for the further classification and integration of query interfaces.
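
To make the idea of regular-expression-based form extraction concrete, here is a rough sketch that pulls form fields out of a query-interface HTML fragment. It is not the authors' method: the patterns, the function name `extract_fields`, and the sample HTML are assumptions for illustration, and the sketch handles only double-quoted attributes. It does not cover the OWL representation or the Watir-driven extraction mentioned in the abstract.

```python
import re

# Matches the opening tag of common form controls and captures its attributes.
FIELD_RE = re.compile(r"<(input|select|textarea)\b([^>]*)>", re.IGNORECASE)
# Matches attribute="value" pairs (double-quoted values only, by assumption).
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def extract_fields(html):
    """Return a list of dicts describing the form controls found in `html`."""
    fields = []
    for tag, attr_text in FIELD_RE.findall(html):
        attrs = {key.lower(): value for key, value in ATTR_RE.findall(attr_text)}
        fields.append({
            "tag": tag.lower(),
            "name": attrs.get("name"),
            "type": attrs.get("type"),
        })
    return fields


# Hypothetical query interface used only as sample input.
sample = (
    '<form action="/search">'
    '<input type="text" name="title">'
    '<select name="genre"><option>a</option></select>'
    '<input type="submit" value="Go">'
    "</form>"
)
fields = extract_fields(sample)
```

Regular expressions are brittle on real-world HTML, so a sketch like this is best read as an illustration of the extraction step, with a proper HTML parser or browser automation doing the work in practice.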


2013 ◽  
Vol 850-851 ◽  
pp. 720-723
Author(s):  
Yong Quan Dong ◽  
Ping Ling

The Web has been rapidly deepened by many searchable databases online, where data are hidden behind query interfaces. There may be hundreds or thousands of Web databases providing data relevant to a specific domain on the Web. Given these large-scale Web databases, the core problem is to select the ones most appropriate to a user's query. While this problem has received more attention recently, current approaches remain simplified and largely empirical. In this paper, we propose a Web database selection approach based on classification. We cast Web database selection as a classification problem and combine multiple kinds of features describing the query and the Web databases. We use the classification model to obtain the relevance of each individual Web database to a user query and select the top-K databases to provide the query results. Experiments show that our approach yields high performance.
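
The selection step described above, scoring every database for a query and keeping the top-K, can be sketched as follows. The scoring function here is a simple keyword-overlap stand-in, assumed purely for illustration; the paper uses a trained classification model over multiple feature types, which is not reproduced. The names `select_top_k`, `overlap_score`, and the sample databases are hypothetical.

```python
import heapq


def overlap_score(query, keywords):
    """Stand-in relevance score: fraction of query terms found in `keywords`.

    In the approach described above, this would be the relevance estimate
    produced by the classification model instead.
    """
    terms = set(query.lower().split())
    return len(terms & keywords) / len(terms) if terms else 0.0


def select_top_k(databases, query, score, k):
    """Return the names of the k highest-scoring databases for `query`."""
    scored = [(score(query, description), name)
              for name, description in databases.items()]
    # nlargest returns the k best entries, highest score first.
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical databases, each described by a small keyword set.
databases = {
    "books": {"novel", "author", "isbn"},
    "movies": {"film", "director", "actor"},
    "library": {"author", "catalog"},
}
top = select_top_k(databases, "novel author", overlap_score, k=2)
```

Keeping the scorer as a parameter means the ranking logic stays the same if the overlap heuristic is swapped for a classifier's predicted relevance.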


2008 ◽  
Vol 11 (2) ◽  
pp. 83-85
Author(s):  
Howard Wilson

2005 ◽  
Vol 8 (1) ◽  
pp. 16-18
Author(s):  
Howard F. Wilson
