Data Warehousing Design and Advanced Engineering Applications
Latest Publications


TOTAL DOCUMENTS: 15 (FIVE YEARS: 0)

H-INDEX: 3 (FIVE YEARS: 0)

Published By IGI Global

ISBN: 9781605667560, 9781605667577

Author(s): Johann Eder, Karl Wiggisser

Data warehouses are typically building blocks of decision support systems in companies and public administration. The data contained in a data warehouse is analyzed by means of OnLine Analytical Processing (OLAP) tools, which provide sophisticated features for aggregating and comparing data. Decision support applications depend on the reliability and accuracy of the contained data. Typically, a data warehouse comprises not only current snapshot data but also historical data, to enable, for instance, analysis over several years. And, as we live in a changing world, one criterion for the reliability and accuracy of the results of such long-period queries is their comparability. Whereas data warehouse systems are well prepared for changes in the transactional data, they are, surprisingly, not able to deal with changes in the master data. Nonetheless, such changes do frequently occur. The crucial point for supporting changes is, first of all, being aware of their existence. Second, once you know that a change took place, it is important to know which change occurred (i.e., to know the differences between versions and the relations between the elements of different versions). For data warehouses this means that changes are identified and represented, the validity of data and structures is recorded, and this knowledge is used for computing correct results for OLAP queries. This chapter is intended to motivate the need for powerful maintenance mechanisms for data warehouse cubes. It presents some basic terms and definitions for a common understanding and introduces the different aspects of data warehouse maintenance. Furthermore, several approaches addressing the problem are presented and classified by their capabilities.
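
To make the idea of "recording validity" concrete, the following minimal Python sketch (illustrative only, not taken from the chapter; all class and method names are hypothetical) shows how a master-data element can carry versions with validity intervals, so that a long-period query can map each point in time to the master-data version that was valid then:

from dataclasses import dataclass
from datetime import date
from bisect import bisect_right

# Hypothetical versioned dimension member: each version of a master-data
# element records when its validity started.
@dataclass
class DimensionVersion:
    valid_from: date
    attributes: dict  # e.g. {"region": "Europe"}

class VersionedDimension:
    """Keeps versions of one dimension member sorted by validity start."""
    def __init__(self):
        self._starts = []
        self._versions = []

    def add_version(self, v):
        i = bisect_right(self._starts, v.valid_from)
        self._starts.insert(i, v.valid_from)
        self._versions.insert(i, v)

    def version_at(self, d):
        # Pick the latest version whose validity started on or before d,
        # so facts are aggregated against comparable master data.
        i = bisect_right(self._starts, d) - 1
        if i < 0:
            raise KeyError("no version valid at %s" % d)
        return self._versions[i]

store = VersionedDimension()
store.add_version(DimensionVersion(date(2000, 1, 1), {"region": "Europe"}))
store.add_version(DimensionVersion(date(2004, 1, 1), {"region": "EMEA"}))
print(store.version_at(date(2002, 6, 1)).attributes)  # {'region': 'Europe'}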


Author(s): Wojciech Leja, Robert Wrembel, Robert Ziembicki

Methods of designing a data warehouse (DW) usually assume that its structure is static. In practice, however, a DW structure changes, among other reasons, as a result of the evolution of external data sources, changes in the real world represented in the DW, and new user requirements. The most advanced research approaches to managing the evolution of DWs are based on temporal extensions and versioning techniques. An important feature of a DW system supporting evolution is its ability to query different DW states. Such querying is challenging, since different DW states may differ with respect to their schemas; as a consequence, a system may not be able to execute a query for some DW states. Our approach to managing the evolution of DWs is based on the so-called Multiversion Data Warehouse (MVDW), which is composed of a sequence of DW versions. In this chapter, we contribute a query language called MVDWQL for querying the MVDW. The MVDWQL supports two types of queries, namely content queries and metadata queries. A content query is used for analyzing the content (i.e., data) of multiple DW versions. A metadata query is used for analyzing the evolution history of the MVDW. The results of both types of queries are graphically visualized in a user interface.
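
The MVDWQL syntax itself is defined in the chapter; the sketch below only illustrates, in hypothetical Python structures, the core difficulty a cross-version content query must handle: the same logical query is dispatched to each DW version, and versions whose schema cannot answer it are reported instead of failing the whole query.

from dataclasses import dataclass, field

@dataclass
class DWVersion:
    name: str
    schema: set            # attribute names available in this version
    rows: list = field(default_factory=list)

def content_query(versions, attrs, predicate=lambda r: True):
    results, not_answerable = {}, []
    for v in versions:
        if not set(attrs) <= v.schema:   # schema changed across versions
            not_answerable.append(v.name)
            continue
        results[v.name] = [
            {a: r[a] for a in attrs} for r in v.rows if predicate(r)
        ]
    return results, not_answerable

v1 = DWVersion("V1", {"product", "amount"},
               [{"product": "p1", "amount": 10}])
v2 = DWVersion("V2", {"product", "amount", "vat"},
               [{"product": "p1", "amount": 12, "vat": 2}])
res, skipped = content_query([v1, v2], ["product", "vat"])
print(res)      # only V2 can answer the query
print(skipped)  # ['V1'] is reported rather than causing a failure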


Author(s): Dilek Tapucu, Gayo Diallo, Yamine Ait Ameur, Murat Osman Ünalir

Information systems now manage huge amounts of data, and users are overwhelmed by the numerous results provided in response to their requests. These results must often be sorted and filtered in order to be usable. Moreover, the “one size fits all” approach has shown its limitations for information searching in many applications, particularly in the e-commerce domain. The capture and exploitation of user preferences have been proposed as a solution to this problem. However, existing approaches usually define preferences for a particular application, which makes it difficult to share and reuse the handled preferences in other contexts. In this chapter, we propose a sharable, formal and generic model for representing user preferences. The model gathers several preference models proposed in the Database and Semantic Web communities. The novelty of our approach is that the defined preferences are attached to the ontologies which describe the semantics of the data manipulated by the applications. Moreover, the proposed model offers a persistence mechanism and a dedicated language; it is implemented using an Ontology-Based Database (OBDB) system extended to take preferences into account. An OBDB manages both the ontologies and their data instances. The preference model is formally defined using the EXPRESS data modelling language, which ensures an ambiguity-free definition, and the approach is illustrated through a case study in the tourism domain.
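
A hedged illustration of the key design point (not the authors' OBDB implementation; the class, URIs and values below are invented): because a preference references an ontology property by its URI rather than an application-specific field, any application built on that ontology can reuse it to rank its results.

from dataclasses import dataclass

@dataclass
class ScorePreference:
    property_uri: str   # ontology property the preference is attached to
    scores: dict        # preferred values -> degree in [0, 1]

    def score(self, value):
        return self.scores.get(value, 0.0)

# Hypothetical tourism-domain ontology property, echoing the case study.
PREF_SEA_VIEW = ScorePreference(
    "http://example.org/tourism#roomView",
    {"sea": 1.0, "garden": 0.6, "street": 0.1},
)

hotels = [
    {"name": "A", "http://example.org/tourism#roomView": "sea"},
    {"name": "B", "http://example.org/tourism#roomView": "street"},
]
# Rank query results by the user's preference instead of "one size fits all".
hotels.sort(key=lambda h: PREF_SEA_VIEW.score(h[PREF_SEA_VIEW.property_uri]),
            reverse=True)
print([h["name"] for h in hotels])  # ['A', 'B']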


Author(s): M. Badri, F. Boufarès, S. Hamdoun, V. Heiwy, K. Lellahi

The data needed for decision-making purposes are increasingly complex. They have heterogeneous formats and come from distributed sources. They can be classified into three categories: structured data, semi-structured data and unstructured data. In this work, we are interested in the field of data integration, with the aim of constructing and maintaining warehouses whose sources are completely heterogeneous and belong to these various categories. We propose a formal framework based on the definition of an integration environment. A set of “integration relationships” between the components of the sources is thus defined: an equivalence relation and a strict order relation. These relationships are independent of any data source modelling; the sources can therefore be heterogeneous, with different models and/or categories. Two different physical architectures, to create and maintain the warehouses and the materialized views, are given.
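
The sketch below (a simplified illustration under assumed structures, not the chapter's formalism) shows the flavor of the two integration relationships: an equivalence relation grouping source components that denote the same concept, and a strict order expressing that one component subsumes another. The relations refer to components only by identifier, independently of the sources' data models.

# Components drawn from heterogeneous sources (structured, semi-structured,
# unstructured); only their identifiers matter to the relations.
components = ["db1.customer", "xml.client", "csv.buyer", "db1.person"]

equivalent = {("db1.customer", "xml.client"), ("xml.client", "csv.buyer")}
subsumes = {("db1.person", "db1.customer")}  # strict order: person > customer

def equivalence_classes(items, pairs):
    # Union-find over the declared equivalences (closure of the relation).
    parent = {i: i for i in items}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = {}
    for i in items:
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())

print(equivalence_classes(components, equivalent))
# [['db1.customer', 'xml.client', 'csv.buyer'], ['db1.person']]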


Author(s): Rogério Luís de Carvalho Costa, Pedro Furtado

Globally accessible data warehouses are useful in many commercial and scientific organizations. For instance, research centers can be put together through a grid infrastructure in order to form a large virtual organization with a huge virtual data warehouse, which should be transparently and efficiently queried by grid participants. As is frequent in grid environments, in the grid-based data warehouse one can both have resource constraints and establish Service Level Objectives (SLOs), providing some Quality of Service (QoS) differentiation for each group of users, participant organizations or requested operations. In this work, we discuss query scheduling and data placement in the grid-based data warehouse, proposing the use of QoS-aware strategies. There is previous work on parallel and distributed data warehouses, but most of it does not concern the grid environment, and the works that do use best-effort strategies. Our experimental results show the importance and effectiveness of the proposed strategies.
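
A minimal sketch of what "QoS-aware" means in contrast to best-effort scheduling (hypothetical groups, SLO values and node names; not the authors' algorithm): queries from groups with tighter response-time objectives are dispatched first, to the least-loaded grid node, instead of first-come-first-served.

import heapq

SLO_SECONDS = {"gold": 5.0, "silver": 30.0, "best_effort": float("inf")}

class Scheduler:
    def __init__(self, nodes):
        # (current load, node name) kept as a min-heap.
        self.nodes = [(0.0, n) for n in nodes]
        heapq.heapify(self.nodes)
        self.queue = []   # (SLO, arrival order, query)
        self._seq = 0

    def submit(self, query, group):
        heapq.heappush(self.queue, (SLO_SECONDS[group], self._seq, query))
        self._seq += 1

    def dispatch(self, est_cost):
        _slo, _, query = heapq.heappop(self.queue)  # tightest SLO first
        load, node = heapq.heappop(self.nodes)      # least-loaded node
        heapq.heappush(self.nodes, (load + est_cost, node))
        return query, node

sched = Scheduler(["site-a", "site-b"])
sched.submit("Q1 monthly rollup", "best_effort")
sched.submit("Q2 dashboard", "gold")
print(sched.dispatch(est_cost=2.0))  # ('Q2 dashboard', 'site-a')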


Author(s): Pedro Furtado

Self-tuning physical database organization involves tools that automatically determine the best solution concerning partitioning, placement, and the creation and tuning of auxiliary structures (e.g., indexes), based on the workload. To the best of our knowledge, no tool has focused on a relevant issue in parallel databases, and in particular in data warehouses running on common off-the-shelf hardware in a shared-nothing configuration: determining the adequate tradeoff for balancing load and availability against costs (storage and loading costs). In previous work, we argued that effective load and availability balancing over partitioned datasets can be obtained through chunk-wise placement and replication, together with on-demand processing. In this work, we propose ChunkSim, a simulator for system size planning, performance analysis against replication degree, and availability analysis. We apply the tool to illustrate the kind of results that can be obtained with it. The discussion in the chapter provides important insight into data allocation and query processing over shared-nothing data warehouses, and into how a good simulation analysis tool can be built to predict and analyze actual systems and intended deployments.
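
To illustrate the chunk-wise placement and availability analysis a simulator like ChunkSim reasons about (this is an invented toy, not ChunkSim itself), the sketch below spreads chunks round-robin over shared-nothing nodes with a configurable replication degree and checks whether a single node failure leaves every chunk reachable:

def place_chunks(num_chunks, nodes, replicas):
    assert replicas <= len(nodes), "cannot replicate beyond the node count"
    placement = {}
    for c in range(num_chunks):
        # Round-robin: each chunk gets `replicas` copies on distinct nodes.
        placement[c] = [nodes[(c + r) % len(nodes)] for r in range(replicas)]
    return placement

def survives_failure(placement, failed_node):
    # Availability check: every chunk must keep at least one live copy.
    return all(any(n != failed_node for n in copies)
               for copies in placement.values())

p = place_chunks(num_chunks=6, nodes=["n0", "n1", "n2"], replicas=2)
print(p[0], p[1])                 # ['n0', 'n1'] ['n1', 'n2']
print(survives_failure(p, "n1"))  # True: replication degree 2 tolerates it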


Author(s): Yasser Hachaichi, Jamel Feki, Hanene Ben-Abdallah

Due to international economic competition, enterprises are constantly looking for efficient methods to build data marts/warehouses to analyze the large data volumes involved in their decision-making process. On the other hand, even though the relational data model is the most commonly used model, any data mart/warehouse construction method must now deal with other data types and, in particular, XML documents, which represent the dominant type of data exchanged between partners and retrieved from the Web. This chapter presents a data mart design method that starts from both a relational database source and XML documents compliant with a given DTD. Besides considering these two types of data structures, the originality of our method lies in its being decision-maker centered, in its automatic extraction of loadable data mart schemas, and in its genericity.
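
One step such an automatic extraction plausibly needs, sketched in Python under an invented schema encoding (the chapter's actual rules also cover XML documents via their DTD): scanning a relational schema for fact-table candidates, i.e., tables holding numeric measures plus foreign keys to potential dimension tables.

schema = {
    "sales":    {"measures": ["amount", "qty"], "fks": ["customer", "product"]},
    "customer": {"measures": [], "fks": []},
    "product":  {"measures": ["weight"], "fks": []},
}

def fact_candidates(schema, min_measures=1, min_dims=2):
    # Heuristic: a table is a fact candidate if it carries measures and
    # references at least `min_dims` other tables usable as dimensions.
    return [
        (name, meta["measures"], meta["fks"])
        for name, meta in schema.items()
        if len(meta["measures"]) >= min_measures
        and len(meta["fks"]) >= min_dims
    ]

for name, measures, dims in fact_candidates(schema):
    print(f"fact: {name}  measures: {measures}  dimensions: {dims}")
# fact: sales  measures: ['amount', 'qty']  dimensions: ['customer', 'product']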


Author(s): Matteo Golfarelli

Conceptual design and requirement analysis are two of the key steps within the data warehouse design process. They are to a great extent responsible for the success of a data warehouse project since, during these two phases, the expressivity of the multidimensional schemata is completely defined. This chapter proposes a survey of the literature related to these design steps and points out the pros and cons of the different techniques, in order to help the reader identify crucial choices and possible solutions more consciously. Particular attention is devoted to emphasizing the relationships between the two steps and describing how they can be fruitfully used together.


Author(s): Chantal Reynaud, Nathalie Pernelle, Marie-Christine Rousset

This chapter deals with the integration of heterogeneous XML information sources into a data warehouse whose data are defined in terms of a global abstract schema, or ontology. The authors present an approach supporting the acquisition of data from a set of external sources available for an application of interest, including data extraction, data transformation, and data integration or reconciliation. The integration middleware that the authors propose extracts data from external XML sources that are relevant according to an RDFS+ ontology, transforms the returned XML data into RDF facts conforming to the ontology, and reconciles the RDF data in order to resolve possible redundancies.
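
A minimal, standard-library sketch of the transformation step (the actual middleware works against an RDFS+ ontology; the XML snippet, tag-to-property mapping, and namespace below are invented): relevant XML elements are mapped to RDF triples whose predicates come from the ontology's vocabulary.

import xml.etree.ElementTree as ET

ONT = "http://example.org/onto#"   # hypothetical ontology namespace
TAG_TO_PROPERTY = {"title": ONT + "title", "year": ONT + "year"}

def xml_to_rdf(xml_text):
    triples = []
    root = ET.fromstring(xml_text)
    for i, item in enumerate(root.findall("item")):
        subject = f"_:item{i}"     # one blank node per source element
        for child in item:
            prop = TAG_TO_PROPERTY.get(child.tag)
            if prop:               # keep only ontology-relevant data
                triples.append((subject, prop, child.text))
    return triples

doc = "<items><item><title>DW Design</title><year>2009</year></item></items>"
for t in xml_to_rdf(doc):
    print(t)
# ('_:item0', 'http://example.org/onto#title', 'DW Design') ...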


Author(s): Sandro Bimonte

Data warehouse and OLAP systems are tools to support decision-making. Geographic information systems (GISs) allow storing, analyzing and visualizing geographic data. In order to exploit the complex nature of geographic data, a new kind of decision support system has been developed: spatial OLAP (SOLAP). Spatial OLAP redefines the main OLAP concepts: dimension, measure and multidimensional operators. SOLAP systems integrate OLAP and GIS functionalities into a unique interactive and flexible framework. Several research tools have been proposed to explore and analyze spatio-multidimensional databases. This chapter presents a panorama of SOLAP models and an analytical review of research SOLAP tools. Moreover, the authors describe their Web-based system, GeWOlap: an OLAP-GIS integrated solution implementing drill and cut spatio-multidimensional operators, which also supports some new spatio-multidimensional operators that dynamically change the structure of the spatial hypercube thanks to spatial analysis operators.
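
To give a feel for one spatio-multidimensional operation (illustrative only, not GeWOlap's implementation): the sketch below performs a spatial roll-up, aggregating a measure from point-located facts up to the region containing them, with a trivial bounding-box containment test standing in for a real spatial engine.

facts = [
    {"xy": (1, 1), "sales": 10},
    {"xy": (2, 3), "sales": 5},
    {"xy": (8, 8), "sales": 7},
]
# Regions as axis-aligned boxes: (xmin, ymin, xmax, ymax).
regions = {"west": (0, 0, 5, 5), "east": (5, 5, 10, 10)}

def contains(box, xy):
    xmin, ymin, xmax, ymax = box
    return xmin <= xy[0] <= xmax and ymin <= xy[1] <= ymax

def spatial_rollup(facts, regions, measure):
    # Roll each fact up to the first region that spatially contains it.
    totals = {name: 0 for name in regions}
    for f in facts:
        for name, box in regions.items():
            if contains(box, f["xy"]):
                totals[name] += f[measure]
                break
    return totals

print(spatial_rollup(facts, regions, "sales"))  # {'west': 15, 'east': 7}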

