Web Retrieval of XML Documents

Web-Enabled Systems Integration ◽

10.4018/978-1-59140-041-7.ch009 ◽

2011 ◽

pp. 170-199

Author(s):

Barbara Catania ◽

Elena Ferrari

Keyword(s):

Expressive Power ◽

Data Representation ◽

Query Languages ◽

Heterogeneous Data ◽

Data Sources ◽

Xml Data ◽

Web Documents ◽

Web Retrieval ◽

Heterogeneous Data Sources ◽

The Web

The Web is characterized by a huge number of very heterogeneous data sources that differ in both media support and format representation. In this scenario, an integrated approach to querying heterogeneous Web documents is needed. To this end, XML can play an important role, since it is becoming a standard for data representation and exchange over the Web. Due to its flexibility, XML is currently being used as an interface language over the Web, by which (part of) document sources are represented and exported. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. In this chapter, we first survey the most relevant query languages for XML data proposed by both the scientific community and standardization committees (e.g., the W3C), focusing mainly on their expressive power. Then, we investigate how typical Information Retrieval concepts, such as ranking, similarity-based search, and profile-based search, can be applied to XML query languages. Commercial products based on the considered approaches are then briefly surveyed. Finally, we conclude the chapter with an overview of the most promising research trends in the field.

XML data mediation and collaboration: a proposed comprehensive architecture and query requirements for using XML to mediate heterogeneous data sources and targets

Proceedings of the 34th Annual Hawaii International Conference on System Sciences ◽

10.1109/hicss.2001.927076 ◽

2005 ◽

Cited By ~ 3

Author(s):

P.B. Lowry

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Xml Data ◽

Heterogeneous Data Sources

Integrating Heterogeneous Data Sources in the Web

Database Technologies ◽

10.4018/978-1-60566-058-5.ch150 ◽

2009 ◽

pp. 2472-2488

Author(s):

Angelo Brayner ◽

Marcelo Meirelles ◽

José de Aguiar Moraes Filho

Keyword(s):

Query Language ◽

Heterogeneous Data ◽

Data Sources ◽

Distributed Data ◽

Local Data ◽

Multidatabase System ◽

Integration Strategy ◽

Heterogeneous Data Sources ◽

Integration Problems ◽

The Web

Integrating data sources published on the Web requires an integration strategy that guarantees the local data sources’ autonomy. A multidatabase system (MDBS) has been consolidated as an approach to integrate multiple heterogeneous and distributed data sources in flexible and dynamic environments such as the Web. A key property of MDBSs is to guarantee a higher degree of local autonomy. In order to adopt the MDBS strategy, it is necessary to use a query language, called the MultiDatabase Language (MDL), which provides the necessary constructs for jointly manipulating and accessing data in heterogeneous data sources. In other words, the MDL is responsible for solving integration conflicts. This chapter describes an extension to the XQuery Language, called MXQuery, which supports queries over several data sources and solves such integration problems as semantic heterogeneity and incomplete information.

A relational data harmonization approach to XML

Journal of Information Science ◽

10.1177/0165551509104231 ◽

2009 ◽

Vol 35 (5) ◽

pp. 571-601 ◽

Cited By ~ 13

Author(s):

Timo Niemi ◽

Turkka Näppilä ◽

Kalervo Järvelin

Keyword(s):

Information Needs ◽

Ad Hoc ◽

Heterogeneous Data ◽

Data Sources ◽

Similar Data ◽

Xml Data ◽

Processing Style ◽

Heterogeneous Data Sources ◽

Autonomous Data Sources ◽

Data Source

There are numerous approaches for integrating data from heterogeneous data sources. A common background assumption is that the data sources remain quite stable and are known in advance. Hence an integration system can be built to manipulate them. In practice there is, however, often a demand for supporting ad hoc information needs concerning unexpected autonomous data sources containing volatile data. A different approach is therefore needed. We propose that semantically similar data are harmonized when extracting data from XML-based data sources. We introduce a constructor algebra, which is a powerful tool in the harmonization of XML data. This algebra is able to form for any XML data source a unique relational representation, called an XML relation. We demonstrate that the XML relation representation supports grouping and aggregation of data needed, for example, in OLAP (online analytical processing) -style applications.

Integrating Heterogeneous Data Sources in the Web

Web Data Management Practices ◽

10.4018/978-1-59904-228-2.ch009 ◽

2007 ◽

pp. 199-219

Author(s):

Angelo Brayner ◽

Marcelo Meirelles ◽

José de Aguiar Moraes Filho

Keyword(s):

Query Language ◽

Heterogeneous Data ◽

Data Sources ◽

Distributed Data ◽

Local Data ◽

Multidatabase System ◽

Integration Strategy ◽

Heterogeneous Data Sources ◽

Integration Problems ◽

The Web

Integrating data sources published on the Web requires an integration strategy that guarantees the local data sources' autonomy. A multidatabase system (MDBS) has been consolidated as an approach to integrate multiple heterogeneous and distributed data sources in flexible and dynamic environments such as the Web. A key property of MDBSs is to guarantee a higher degree of local autonomy. In order to adopt the MDBS strategy, it is necessary to use a query language, called a multidatabase language (MDL), which provides the necessary constructs for jointly manipulating and accessing data in heterogeneous data sources. In other words, the MDL is responsible for solving integration conflicts. This chapter describes an extension to the XQuery language, called MXQuery, which supports queries over several data sources and solves such integration problems as semantic heterogeneity and incomplete information.

XML Data Mediation and Collaboration: A Proposed Comprehensive Architecture and Query Requirements for Using XML to Mediate Heterogeneous Data Sources and Targets

SSRN Electronic Journal ◽

10.2139/ssrn.666172 ◽

2005 ◽

Author(s):

Paul Benjamin Lowry

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Xml Data ◽

Heterogeneous Data Sources

An Eligibility Criteria Query Language for Heterogeneous Data Warehouses

Methods of Information in Medicine ◽

10.3414/me13-02-0027 ◽

2015 ◽

Vol 54 (01) ◽

pp. 41-44 ◽

Cited By ~ 11

Author(s):

A. Taweel ◽

S. Miles ◽

B. C. Delaney ◽

R. Bache

Keyword(s):

Clinical Data ◽

Query Language ◽

Data Representation ◽

Query Languages ◽

Heterogeneous Data ◽

Data Sources ◽

Data Warehouses ◽

Eligibility Criteria ◽

Strong Basis ◽

Temporal Semantics

Summary. Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Objectives: The increasing availability of electronic clinical data provides great potential for finding eligible patients for clinical research. However, data heterogeneity makes it difficult for clinical researchers to interrogate sources consistently. Existing standard query languages are often not sufficient to query across diverse representations. Thus, a higher-level domain language is needed so that queries become data-representation agnostic. To this end, we define a clinician-readable computational language for querying whether patients meet eligibility criteria (ECs) from clinical trials. This language is capable of implementing the temporal semantics required by many ECs, and can be automatically evaluated on heterogeneous data sources. Methods: By reference to standards and examples of existing ECs, a clinician-readable query language was developed. Using a model-based approach, it was implemented to transform captured ECs into queries that interrogate heterogeneous data warehouses. The query language was evaluated on two types of data sources, each different in structure and content. Results: The query language abstracts the level of expressivity so that researchers construct their ECs with no prior knowledge of the data sources. It was evaluated on two types of semantically and structurally diverse data warehouses. This query language is now used to express ECs in the EHR4CR project. A survey shows that it was perceived by the majority of users to be useful, easy to understand and unambiguous. Discussion: An EC-specific language enables clinical researchers to express their ECs as a query such that the user is isolated from complexities of different heterogeneous clinical data sets. More generally, the approach demonstrates that a domain query language has potential for overcoming the problems of semantic interoperability and is applicable where the nature of the queries is well understood and the data is conceptually similar but in different representations. Conclusions: Our language provides a strong basis for use across different clinical domains for expressing ECs by overcoming the heterogeneous nature of electronic clinical data whilst maintaining semantic consistency. It is readily comprehensible by target users. This demonstrates that a domain query language can be both usable and interoperable.

Data Schema Integration in Web-Enabled Systems

Web-Enabled Systems Integration ◽

10.4018/978-1-59140-041-7.ch003 ◽

2011 ◽

pp. 41-65

Author(s):

Silvana Castano ◽

Valeria De Antonellis ◽

Sabrina De Capitani di Vimercati ◽

Michele Melchiori

Keyword(s):

Data Representation ◽

Heterogeneous Data ◽

Data Sources ◽

Integration Scheme ◽

Schema Integration ◽

Enterprise Information ◽

Heterogeneous Information ◽

Data Schema ◽

Information Interchange ◽

The Web

In recent years, most enterprises have started to use the Web for cooperative work to improve efficiency and information interchange. As a consequence, enterprise information systems are being migrated onto the Web, and methods and tools are required to effectively access data provided on the Web in different formats by autonomous heterogeneous data sources. In particular, integration tools are required to obtain a uniform data representation by abstracting from the formats of the original data sources, and thus to build a global information space suitable for query and access interfaces. This chapter discusses the characteristics of data schema integration in web-enabled systems and describes a comprehensive integration scheme for organizing heterogeneous information sources over the Web, to enhance the capability of information interchange and interoperation among web-enabled systems.

A Declarative Approach for Designing Web Portals

Encyclopedia of Portal Technologies and Applications ◽

10.4018/978-1-59140-989-2.ch035 ◽

2011 ◽

pp. 197-203

Author(s):

William Gardner ◽

R. Rajugan

Keyword(s):

Data Exchange ◽

Data Representation ◽

Content Management ◽

Heterogeneous Data ◽

Data Sources ◽

Distributed Model ◽

Heterogeneous Data Sources ◽

Extensible Markup ◽

Management Techniques ◽

Exchange Medium

As many enterprise and industrial content management techniques move towards a distributed model, the need to exchange data between heterogeneous data sources in a seamless fashion is constantly increasing. These heterogeneous data sources could arise from server groups from different manufacturers or from databases at different sites with their own schemas. Since its introduction in 1996, eXtensible Markup Language (XML) (W3C-XML, 2004) has established itself as the open, presentation-independent data representation and exchange medium. XML provides a mechanism for seamless data exchange in many industrial informatics settings. In addition, XML is emerging as the dominant standard for storing, describing, representing, and interchanging data among various enterprise systems and databases in the context of complex Web enterprise information systems (EIS).

Database Technologies on the Web

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch130 ◽

2005 ◽

pp. 745-749

Author(s):

J. F. Aldana Montes ◽

A. C. Gómez Lora ◽

N. Moreno Vergara ◽

I. Navas Delgado ◽

M. M. Roldán Garcia

Keyword(s):

Heterogeneous Data ◽

Web Technology ◽

Data Sources ◽

Web Technologies ◽

Heterogeneous Data Sources ◽

Functional Components ◽

The Web

The database community has been seriously disturbed by the expansion of Web technologies. In particular, two reports have caused a special commotion in the database field. The first, the Asilomar report (Bernstein et al., 1998), postulates new directions for database trends, anticipating the impact of the Web on the field. The second, Breaking out the Box (Silberschatz & Zdonik, 1996), proposes how the database community must transfer its technology into Web technology. In this sense, the database box must be broken up into its autonomous functional components, which must then be used to solve the problem of integrating heterogeneous data sources.

Information Credibility Assessment and Meta Data Modeling in Integrating Heterogeneous Data Sources

10.21236/ada409695 ◽

2002 ◽

Cited By ~ 1

Author(s):

Peter P. Chen

Keyword(s):

Data Modeling ◽

Heterogeneous Data ◽

Data Sources ◽

Credibility Assessment ◽

Meta Data ◽

Heterogeneous Data Sources ◽

Information Credibility
