Data Schema Integration in Web-Enabled Systems

In recent years, most enterprises have begun using the Web for cooperative work to improve efficiency and information interchange. As a consequence, enterprise information systems are being migrated onto the Web, and methods and tools are required to effectively access data provided on the Web in different formats by autonomous, heterogeneous data sources. In particular, integration tools are required to obtain a uniform data representation by abstracting from the formats of the original data sources, and thus to build a global information space suitable for a query and access interface. This chapter discusses the characteristics of data schema integration in web-enabled systems and describes a comprehensive integration scheme for organizing heterogeneous information sources over the Web, in order to enhance information interchange and interoperation among web-enabled systems.

Web Retrieval of XML Documents

Web-Enabled Systems Integration ◽

10.4018/978-1-59140-041-7.ch009 ◽

2011 ◽

pp. 170-199

Author(s):

Barbara Catania ◽

Elena Ferrari

Keyword(s):

Expressive Power ◽

Data Representation ◽

Query Languages ◽

Heterogeneous Data ◽

Data Sources ◽

Xml Data ◽

Web Documents ◽

Web Retrieval ◽

Heterogeneous Data Sources ◽

The Web

The Web is characterized by a huge number of very heterogeneous data sources that differ both in media support and in format representation. In this scenario, an integrating approach is needed for querying heterogeneous Web documents. XML can play an important role here, since it is becoming a standard for data representation and exchange over the Web. Due to its flexibility, XML is currently being used as an interface language over the Web, by which (parts of) document sources are represented and exported. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. In this chapter, we first survey the most relevant query languages for XML data proposed both by the scientific community and by standardization committees, e.g., W3C, focusing mainly on their expressive power. Then, we investigate how typical Information Retrieval concepts, such as ranking, similarity-based search, and profile-based search, can be applied to XML query languages. Commercial products based on the considered approaches are then briefly surveyed. Finally, we conclude the chapter with an overview of the most promising research trends in the field.
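
The reduction described above, from querying heterogeneous sources to querying their XML export, can be illustrated with a minimal sketch. It uses the limited XPath subset of Python's standard `xml.etree.ElementTree` module rather than any of the query languages the chapter surveys; the `<catalog>` document, its `source` attribute, and the `titles_from` helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of two heterogeneous sources (illustrative data only).
CATALOG_XML = """
<catalog>
  <doc source="library" year="2001"><title>XML Query Languages</title></doc>
  <doc source="webpage" year="1999"><title>Indexing Web Documents</title></doc>
  <doc source="library" year="2003"><title>Ranking XML Results</title></doc>
</catalog>
"""

def titles_from(xml_text, source):
    """Return the titles of all <doc> elements exported by the given source."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including attribute predicates.
    docs = root.findall(f"./doc[@source='{source}']")
    return [doc.findtext("title") for doc in docs]

library_titles = titles_from(CATALOG_XML, "library")
print(library_titles)
```

Once every source is exported in the same XML vocabulary, a single path expression can address all of them, which is the point the abstract makes; ranking and similarity search are not covered by this sketch.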

Web Services as XML Data Sources in Enterprise Information Integration

Enterprise Information Systems ◽

10.4018/978-1-61692-852-0.ch405 ◽

2011 ◽

pp. 972-985

Author(s):

Ákos Hajnal ◽

Tamás Kifor ◽

Gergely Lukácsy ◽

László Z. Varga

Keyword(s):

Web Services ◽

Web Service ◽

Information Integration ◽

Digital Libraries ◽

Relational Databases ◽

Query Language ◽

Data Sources ◽

Enterprise Information ◽

Xml Data ◽

The Web

More and more systems provide data through Web service interfaces, and these data have to be integrated with the legacy relational databases of the enterprise. The integration is usually done with enterprise information integration systems, which provide a uniform query language for all information sources. The XML data sources of Web services, which have a procedural access interface, therefore have to be matched with relational data sources, which have a database interface. In this chapter the authors provide a solution to this problem by describing the Web service wrapper component of the SINTAGMA Enterprise Information Integration system. They demonstrate Web services as XML data sources in enterprise information integration by showing how the Web service wrapper component integrates the XML data of Web services in the application domain of digital libraries.
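
The mismatch between a procedural Web service interface and a relational interface can be sketched as follows. This is not the SINTAGMA wrapper, whose design the abstract does not detail; it is a minimal illustration under assumed names: `search_books_service` is a hypothetical stand-in for a remote operation, and the `holdings` and `ws_books` tables are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def search_books_service(author):
    """Hypothetical stand-in for a remote Web service operation returning XML."""
    responses = {
        "Smith": "<books><book><isbn>111</isbn><title>Example Title</title></book></books>",
    }
    return responses.get(author, "<books/>")

def wrap_as_rows(author):
    """Wrapper: call the procedural interface and flatten its XML into tuples."""
    root = ET.fromstring(search_books_service(author))
    return [(b.findtext("isbn"), b.findtext("title")) for b in root.findall("book")]

def integrate(author):
    """Place wrapped service rows next to a legacy table and join them in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE holdings (isbn TEXT, shelf TEXT)")  # legacy relational data
    conn.execute("INSERT INTO holdings VALUES ('111', 'A3')")
    conn.execute("CREATE TABLE ws_books (isbn TEXT, title TEXT)")  # wrapped service data
    conn.executemany("INSERT INTO ws_books VALUES (?, ?)", wrap_as_rows(author))
    rows = conn.execute(
        "SELECT w.title, h.shelf FROM ws_books w JOIN holdings h ON w.isbn = h.isbn"
    ).fetchall()
    conn.close()
    return rows

print(integrate("Smith"))
```

The wrapper turns a call with input parameters into a set of tuples, so the integration layer can treat the service like any other table; a real system would also have to decide when the call is made and how its parameters are bound.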

A Framework for Ontology-Based Heterogeneous Data Integration for Cost Management in Product Family Design

Volume 3: 28th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2008-49803 ◽

2008 ◽

Author(s):

Xiaomeng Chang ◽

Janis Terpenny

Keyword(s):

Product Family ◽

Heterogeneous Data ◽

Schema Integration ◽

Product Family Design ◽

Related Factors ◽

Product Families ◽

Heterogeneous Data Integration ◽

Related Information ◽

Semantic Data ◽

Data Schema

High-quality, high-impact, and economical products and systems are important goals for an enterprise. Product families can be strategic to achieving these goals, yet defining them is challenging because numerous cost factors must be considered. Doing so requires bringing together many heterogeneous data sources of varying formats in a way that allows the product development team to easily locate and reuse information collaboratively across time and space. To date, our work has focused on the development and use of an Activity-Based Cost ontology (ABC ontology) to guide designers as they drill down to the information needed for product family design. However, this ontology is built in such a way that it only supports information retrieval from the ontology itself; it does not bring together and connect heterogeneous data resources, and it does not address the problem of designers who struggle to obtain relevant details from different departments in an enterprise. While several semantic data schema integration tools exist for integrating heterogeneous data resources, these tools cannot guide users to the related information that would lead to the root cause of a high cost. In this paper, in order to better manage cost in product family design, an ontology-based framework is put forward that builds on our prior work and combines the advantages of the ABC ontology and data schema integration tools. The framework guides users to the proper information aspects by querying the central ontology, and gives them detailed information about these aspects from heterogeneous data resources with the support of local ontologies. Ultimately, this framework will help designers make better use of cost-related factors for product family design from a whole-enterprise perspective.

Integrating Heterogeneous Data Sources in the Web

Database Technologies ◽

10.4018/978-1-60566-058-5.ch150 ◽

2009 ◽

pp. 2472-2488

Author(s):

Angelo Brayner ◽

Marcelo Meirelles ◽

José de Aguiar Moraes Filho

Keyword(s):

Query Language ◽

Heterogeneous Data ◽

Data Sources ◽

Distributed Data ◽

Local Data ◽

Multidatabase System ◽

Integration Strategy ◽

Heterogeneous Data Sources ◽

Integration Problems ◽

The Web

Integrating data sources published on the Web requires an integration strategy that guarantees the local data sources’ autonomy. A multidatabase system (MDBS) has been consolidated as an approach to integrate multiple heterogeneous and distributed data sources in flexible and dynamic environments such as the Web. A key property of MDBSs is to guarantee a higher degree of local autonomy. In order to adopt the MDBS strategy, it is necessary to use a query language, called the MultiDatabase Language (MDL), which provides the necessary constructs for jointly manipulating and accessing data in heterogeneous data sources. In other words, the MDL is responsible for solving integration conflicts. This chapter describes an extension to the XQuery Language, called MXQuery, which supports queries over several data sources and solves such integration problems as semantic heterogeneity and incomplete information.
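
The two integration problems named above, semantic heterogeneity and incomplete information, can be illustrated with a minimal Python sketch. It is not MXQuery and does not reproduce its syntax; the sources, the `MAPPINGS` table, and the helper names are assumptions invented for the example.

```python
# Two hypothetical local sources that name the same concepts differently.
SOURCE_A = [{"isbn": "1", "title": "Databases", "price": 30.0}]
SOURCE_B = [{"code": "2", "name": "Web Data"}]  # this source stores no price

# Per-source mapping from global attribute names to local ones (an assumption).
MAPPINGS = {
    "A": {"id": "isbn", "title": "title", "price": "price"},
    "B": {"id": "code", "title": "name", "price": None},  # None: attribute missing
}

def to_global(record, mapping):
    """Rename local attributes to global ones; missing attributes become None."""
    return {
        global_name: (record.get(local_name) if local_name is not None else None)
        for global_name, local_name in mapping.items()
    }

def query_all(predicate):
    """Evaluate one predicate over every source through the global names."""
    unified = [to_global(r, MAPPINGS["A"]) for r in SOURCE_A]
    unified += [to_global(r, MAPPINGS["B"]) for r in SOURCE_B]
    return [r for r in unified if predicate(r)]

# Keep records that are cheap, or whose price is unknown in their source.
result = query_all(lambda r: r["price"] is None or r["price"] < 50)
print(result)
```

The mapping table resolves the naming conflict without touching the local sources, which keeps them autonomous, and the explicit `None` lets the query decide how to treat a source that cannot supply an attribute.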

Knowledge Acquisition from Semantically Heterogeneous Data

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch172 ◽

2011 ◽

pp. 1110-1116

Author(s):

Doina Caragea ◽

Vasant Honavar

Keyword(s):

Machine Learning ◽

Semantic Web ◽

Knowledge Acquisition ◽

Predictive Models ◽

Heterogeneous Data ◽

Data Sources ◽

Meta Data ◽

Ontological Commitments ◽

Semantically Heterogeneous ◽

The Web

Recent advances in sensors, digital storage, computing and communications technologies have led to a proliferation of autonomously operated, geographically distributed data repositories in virtually every area of human endeavor, including e-business and e-commerce, e-science, e-government, security informatics, etc. Effective use of such data in practice (e.g., building useful predictive models of consumer behavior, discovery of factors that contribute to large climatic changes, analysis of demographic factors that contribute to global poverty, analysis of social networks, or even finding out what makes a book a bestseller) requires accessing and analyzing data from multiple heterogeneous sources. The Semantic Web enterprise (Berners-Lee et al., 2001) is aimed at making the contents of the Web machine interpretable, so that heterogeneous data sources can be used together. Thus, data and resources on the Web are annotated and linked by associating meta data that make explicit the ontological commitments of the data source providers or, in some cases, the shared ontological commitments of a small community of users. Given the autonomous nature of the data sources on the Web and the diverse purposes for which the data are gathered, in the absence of a universal ontology it is inevitable that there is no unique global interpretation of the data that serves the needs of all users under all scenarios. Many groups have attempted to develop, with varying degrees of success, tools for flexible integration and querying of data from semantically disparate sources (Levy, 2000; Noy, 2004; Doan & Halevy, 2005), as well as techniques for discovering semantic correspondences between ontologies to assist in this process (Kalfoglou & Schorlemmer, 2005; Noy & Stuckenschmidt, 2005).
These and related advances in Semantic Web technologies present unprecedented opportunities for exploiting multiple related data sources, each annotated with its own meta data, in discovering useful knowledge in many application domains. While there has been significant work on applying machine learning to ontology construction, information extraction from text, and discovery of mappings between ontologies (Kushmerick et al., 2005), there has been relatively little work on machine learning approaches to knowledge acquisition from data sources annotated with meta data that expose the structure (schema) and semantics (in reference to a particular ontology). However, there is a large body of literature on distributed learning (see Kargupta & Chan, 1999, for a survey). Furthermore, recent work (Zhang et al., 2005; Hotho et al., 2003) has shown that in addition to data, the use of meta data in the form of ontologies (class hierarchies, attribute value hierarchies) can improve the quality (accuracy, interpretability) of the learned predictive models. The purpose of this chapter is to precisely define the problem of knowledge acquisition from semantically heterogeneous data and summarize recent advances that have led to a solution to this problem (Caragea et al., 2005).

An approach for semantic integration of heterogeneous data sources

PeerJ Computer Science ◽

10.7717/peerj-cs.254 ◽

2020 ◽

Vol 6 ◽

pp. e254

Author(s):

Giuseppe Fusco ◽

Lerina Aversano

Keyword(s):

Data Integration ◽

Heterogeneous Data ◽

Semantic Integration ◽

Data Sources ◽

Complex Data ◽

Semantic Heterogeneity ◽

Heterogeneous Information ◽

Heterogeneous Data Sources ◽

Autonomous Data Sources ◽

Unified View

Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. In general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to most users. Moreover, the stored data are often redundant, mixed with information needed only to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both the Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both the data sources to be integrated and the global view.
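
For readers unfamiliar with the Global-as-View idea mentioned above, in which each element of the global schema is defined as a view over the sources, a minimal sketch follows. It illustrates plain Global-as-View only, not the ontology-based generalization that DIF proposes; the `CRM` and `BILLING` sources and all function names are assumptions invented for the example.

```python
# Hypothetical autonomous sources with their own local schemas.
CRM = [{"cust_id": 1, "full_name": "Ada"}]
BILLING = [{"customer": 1, "amount": 120.0}, {"customer": 2, "amount": 15.0}]

def global_customer():
    """Global relation Customer(id, name), defined as a view over the CRM source."""
    return [{"id": r["cust_id"], "name": r["full_name"]} for r in CRM]

def global_invoice():
    """Global relation Invoice(customer_id, total), a view over the BILLING source."""
    return [{"customer_id": r["customer"], "total": r["amount"]} for r in BILLING]

def customers_with_invoices():
    """A query over the global schema, answered by unfolding the views above."""
    names = {c["id"]: c["name"] for c in global_customer()}
    return [
        (names[i["customer_id"]], i["total"])
        for i in global_invoice()
        if i["customer_id"] in names
    ]

print(customers_with_invoices())
```

In Local-as-View the direction is reversed: each source is described as a view over the global schema, and answering a query requires rewriting it in terms of those descriptions, which this sketch does not attempt.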

INTEGRATION OF MOBILE GIS AND LINKED DATA TECHNOLOGY FOR SPATIO-TEMPORAL TRANSPORT DATA MODEL

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w18-721-2019 ◽

2019 ◽

Vol XLII-4/W18 ◽

pp. 721-724

Author(s):

B. Margan ◽

F. Hakimpour

Keyword(s):

Linked Data ◽

Access Network ◽

Heterogeneous Data ◽

Data Sources ◽

Traffic Information ◽

Standard Format ◽

Mobile Gis ◽

Spatio Temporal ◽

User Friendly ◽

The Web

Linked Data is data made available on the web in a standard format, which is useful for inspecting content and deriving insights from data through semantic queries. Using Linked Data facilitates querying and exploring the spatial and temporal features of various data sources. In this paper, an application is presented for linking transport data on the web. Data from the Google Maps API and OpenStreetMap were linked and published on the web. Spatio-temporal queries were executed over the linked transport data and returned network and traffic information according to the user’s position. The client side of this application consists of a web application and a mobile application, which present a user interface for accessing network and traffic information according to the user’s position. The results of the experiment show that, by using the intrinsic potential of Linked Data, we have tackled the challenges of using heterogeneous data sources and have provided desirable information that could be used for discovering new patterns. The mobile GIS application makes it possible to assess the benefits of the mentioned technologies in an easy and user-friendly way.

Integrating Heterogeneous Data Sources in the Web

Web Data Management Practices ◽

10.4018/978-1-59904-228-2.ch009 ◽

2007 ◽

pp. 199-219

Author(s):

Angelo Brayner ◽

Marcelo Meirelles ◽

José de Aguiar Moraes Filho

Keyword(s):

Query Language ◽

Heterogeneous Data ◽

Data Sources ◽

Distributed Data ◽

Local Data ◽

Multidatabase System ◽

Integration Strategy ◽

Heterogeneous Data Sources ◽

Integration Problems ◽

The Web

Integrating data sources published on the Web requires an integration strategy that guarantees the autonomy of local data sources. The multidatabase system (MDBS) has been consolidated as an approach to integrate multiple heterogeneous and distributed data sources in flexible and dynamic environments such as the Web. A key property of MDBSs is to guarantee a higher degree of local autonomy. In order to adopt the MDBS strategy, it is necessary to use a query language, called a multidatabase language (MDL), which provides the necessary constructs for jointly manipulating and accessing data in heterogeneous data sources. In other words, the MDL is responsible for solving integration conflicts. This chapter describes an extension to the XQuery language, called MXQuery, which supports queries over several data sources and solves integration problems such as semantic heterogeneity and incomplete information.

An Eligibility Criteria Query Language for Heterogeneous Data Warehouses

Methods of Information in Medicine ◽

10.3414/me13-02-0027 ◽

2015 ◽

Vol 54 (01) ◽

pp. 41-44 ◽

Cited By ~ 11

Author(s):

A. Taweel ◽

S. Miles ◽

B. C. Delaney ◽

R. Bache

Keyword(s):

Clinical Data ◽

Query Language ◽

Data Representation ◽

Query Languages ◽

Heterogeneous Data ◽

Data Sources ◽

Data Warehouses ◽

Eligibility Criteria ◽

Strong Basis ◽

Temporal Semantics

Summary. Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Objectives: The increasing availability of electronic clinical data provides great potential for finding eligible patients for clinical research. However, data heterogeneity makes it difficult for clinical researchers to interrogate sources consistently. Existing standard query languages are often not sufficient to query across diverse representations. Thus, a higher-level domain language is needed so that queries become data-representation agnostic. To this end, we define a clinician-readable computational language for querying whether patients meet eligibility criteria (ECs) from clinical trials. This language is capable of implementing the temporal semantics required by many ECs, and can be automatically evaluated on heterogeneous data sources. Methods: By reference to standards and examples of existing ECs, a clinician-readable query language was developed. Using a model-based approach, it was implemented to transform captured ECs into queries that interrogate heterogeneous data warehouses. The query language was evaluated on two types of data sources, each different in structure and content. Results: The query language abstracts the level of expressivity so that researchers construct their ECs with no prior knowledge of the data sources. It was evaluated on two types of semantically and structurally diverse data warehouses. This query language is now used to express ECs in the EHR4CR project. A survey shows that it was perceived by the majority of users to be useful, easy to understand and unambiguous. Discussion: An EC-specific language enables clinical researchers to express their ECs as a query such that the user is isolated from complexities of different heterogeneous clinical data sets. More generally, the approach demonstrates that a domain query language has potential for overcoming the problems of semantic interoperability and is applicable where the nature of the queries is well understood and the data is conceptually similar but in different representations.
Conclusions: Our language provides a strong basis for use across different clinical domains for expressing ECs by overcoming the heterogeneous nature of electronic clinical data whilst maintaining semantic consistency. It is readily comprehensible by target users. This demonstrates that a domain query language can be both usable and interoperable.
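
The idea of a data-representation-agnostic criterion with temporal semantics can be sketched in a few lines. This is not the language defined in the article, whose syntax the abstract does not show; the two warehouses, their record layouts, the patient identifiers, and the `diagnosed_within` criterion are all made-up assumptions for illustration.

```python
from datetime import date, timedelta

# Warehouse 1 stores events as tuples; warehouse 2 as dicts with ISO date strings.
WAREHOUSE_1 = {"p1": [("diabetes", date(2014, 3, 1))]}
WAREHOUSE_2 = {"p2": [{"dx": "diabetes", "on": "2010-01-15"}]}

def events_w1(patient):
    """Adapter for warehouse 1: records are already (condition, date) pairs."""
    return list(WAREHOUSE_1.get(patient, []))

def events_w2(patient):
    """Adapter for warehouse 2: convert each dict into a (condition, date) pair."""
    return [(e["dx"], date.fromisoformat(e["on"])) for e in WAREHOUSE_2.get(patient, [])]

def diagnosed_within(events, condition, days, reference):
    """Criterion: `condition` was recorded at most `days` days before `reference`."""
    earliest = reference - timedelta(days=days)
    return any(c == condition and earliest <= d <= reference for c, d in events)

REFERENCE = date(2014, 6, 1)
eligible_p1 = diagnosed_within(events_w1("p1"), "diabetes", 365, REFERENCE)
eligible_p2 = diagnosed_within(events_w2("p2"), "diabetes", 365, REFERENCE)
print(eligible_p1, eligible_p2)
```

The criterion is written once against a common event shape, and only the adapters know how each warehouse stores its data, which mirrors the separation between the domain language and the underlying representations that the abstract argues for.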
