Data Warehousing Design and Advanced Engineering Applications
Latest Publications


TOTAL DOCUMENTS: 15 (FIVE YEARS: 0)

H-INDEX: 3 (FIVE YEARS: 0)

Published By IGI Global

ISBN: 9781605667560, 9781605667577

Author(s): Johann Eder, Karl Wiggisser

Data warehouses are typically building blocks of decision support systems in companies and public administration. The data contained in a data warehouse is analyzed by means of OnLine Analytical Processing (OLAP) tools, which provide sophisticated features for aggregating and comparing data. Decision support applications depend on the reliability and accuracy of the contained data. Typically, a data warehouse comprises not only current snapshot data but also historical data, to enable, for instance, analysis over several years. And, as we live in a changing world, one criterion for the reliability and accuracy of the results of such long-period queries is their comparability. Whereas data warehouse systems are well prepared for changes in the transactional data, they are, surprisingly, not able to deal with changes in the master data. Nonetheless, such changes do frequently occur. The crucial point for supporting changes is, first of all, being aware of their existence. Second, once you know that a change took place, it is important to know which change occurred (i.e., to know the differences between versions and the relations between the elements of different versions). For data warehouses this means that changes are identified and represented, the validity of data and structures is recorded, and this knowledge is used for computing correct results for OLAP queries. This chapter is intended to motivate the need for powerful maintenance mechanisms for data warehouse cubes. It presents some basic terms and definitions for a common understanding and introduces the different aspects of data warehouse maintenance. Furthermore, several approaches addressing the problem are presented and classified by their capabilities.
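
To make the idea of "recording validity" concrete, the following minimal Python sketch (illustrative only, not taken from the chapter; all class and method names are hypothetical) shows how a master-data element can carry versions with validity intervals, so that a long-period query can map each point in time to the master-data version that was valid then:

from dataclasses import dataclass
from datetime import date
from bisect import bisect_right

# Hypothetical versioned dimension member: each version of a master-data
# element records when its validity started.
@dataclass
class DimensionVersion:
    valid_from: date
    attributes: dict  # e.g. {"region": "Europe"}

class VersionedDimension:
    """Keeps versions of one dimension member sorted by validity start."""
    def __init__(self):
        self._starts = []
        self._versions = []

    def add_version(self, v):
        i = bisect_right(self._starts, v.valid_from)
        self._starts.insert(i, v.valid_from)
        self._versions.insert(i, v)

    def version_at(self, d):
        # Pick the latest version whose validity started on or before d,
        # so facts are aggregated against comparable master data.
        i = bisect_right(self._starts, d) - 1
        if i < 0:
            raise KeyError("no version valid at %s" % d)
        return self._versions[i]

store = VersionedDimension()
store.add_version(DimensionVersion(date(2000, 1, 1), {"region": "Europe"}))
store.add_version(DimensionVersion(date(2004, 1, 1), {"region": "EMEA"}))
print(store.version_at(date(2002, 6, 1)).attributes)  # {'region': 'Europe'}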


Author(s): Wojciech Leja, Robert Wrembel, Robert Ziembicki

Methods of designing a data warehouse (DW) usually assume that its structure is static. In practice, however, a DW structure changes, among other reasons, as a result of the evolution of external data sources, changes in the real world represented in the DW, and new user requirements. The most advanced research approaches to managing the evolution of DWs are based on temporal extensions and versioning techniques. An important feature of a DW system supporting evolution is its ability to query different DW states. Such querying is challenging, since different DW states may differ with respect to their schemas; as a consequence, a system may not be able to execute a query for some DW states. Our approach to managing the evolution of DWs is based on the so-called Multiversion Data Warehouse (MVDW), which is composed of a sequence of DW versions. In this chapter, we contribute a query language called MVDWQL for querying the MVDW. The MVDWQL supports two types of queries, namely content queries and metadata queries. A content query is used for analyzing the content (i.e., data) of multiple DW versions. A metadata query is used for analyzing the evolution history of the MVDW. The results of both types of queries are graphically visualized in a user interface.
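
The MVDWQL syntax itself is defined in the chapter; the sketch below only illustrates, in hypothetical Python structures, the core difficulty a cross-version content query must handle: the same logical query is dispatched to each DW version, and versions whose schema cannot answer it are reported instead of failing the whole query.

from dataclasses import dataclass, field

@dataclass
class DWVersion:
    name: str
    schema: set            # attribute names available in this version
    rows: list = field(default_factory=list)

def content_query(versions, attrs, predicate=lambda r: True):
    results, not_answerable = {}, []
    for v in versions:
        if not set(attrs) <= v.schema:   # schema changed across versions
            not_answerable.append(v.name)
            continue
        results[v.name] = [
            {a: r[a] for a in attrs} for r in v.rows if predicate(r)
        ]
    return results, not_answerable

v1 = DWVersion("V1", {"product", "amount"},
               [{"product": "p1", "amount": 10}])
v2 = DWVersion("V2", {"product", "amount", "vat"},
               [{"product": "p1", "amount": 12, "vat": 2}])
res, skipped = content_query([v1, v2], ["product", "vat"])
print(res)      # only V2 can answer the query
print(skipped)  # ['V1'] is reported rather than causing a failure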


Author(s): Dilek Tapucu, Gayo Diallo, Yamine Ait Ameur, Murat Osman Ünalir

Information systems now manage huge amounts of data, and users are overwhelmed by the numerous results provided in response to their requests. These results must often be sorted and filtered in order to be usable. Moreover, the “one size fits all” approach has shown its limitations for information searching in many applications, particularly in the e-commerce domain. The capture and exploitation of user preferences have been proposed as a solution to this problem. However, existing approaches usually define preferences for a particular application, which makes it difficult to share and reuse the handled preferences in other contexts. In this chapter, we propose a sharable, formal and generic model for representing user preferences. The model gathers several preference models proposed in the Database and Semantic Web communities. The novelty of our approach is that the defined preferences are attached to the ontologies which describe the semantics of the data manipulated by the applications. Moreover, the proposed model offers a persistence mechanism and a dedicated language; it is implemented using an Ontology-Based Database (OBDB) system extended to take preferences into account. An OBDB manages both the ontologies and their data instances. The preference model is formally defined using the EXPRESS data modelling language, which ensures an ambiguity-free definition, and the approach is illustrated through a case study in the tourism domain.
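
A hedged illustration of the key design point (not the authors' OBDB implementation; the class, URIs and values below are invented): because a preference references an ontology property by its URI rather than an application-specific field, any application built on that ontology can reuse it to rank its results.

from dataclasses import dataclass

@dataclass
class ScorePreference:
    property_uri: str   # ontology property the preference is attached to
    scores: dict        # preferred values -> degree in [0, 1]

    def score(self, value):
        return self.scores.get(value, 0.0)

# Hypothetical tourism-domain ontology property, echoing the case study.
PREF_SEA_VIEW = ScorePreference(
    "http://example.org/tourism#roomView",
    {"sea": 1.0, "garden": 0.6, "street": 0.1},
)

hotels = [
    {"name": "A", "http://example.org/tourism#roomView": "sea"},
    {"name": "B", "http://example.org/tourism#roomView": "street"},
]
# Rank query results by the user's preference instead of "one size fits all".
hotels.sort(key=lambda h: PREF_SEA_VIEW.score(h[PREF_SEA_VIEW.property_uri]),
            reverse=True)
print([h["name"] for h in hotels])  # ['A', 'B']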


Author(s): M. Badri, F. Boufarès, S. Hamdoun, V. Heiwy, K. Lellahi

The data needed for decision-making purposes are increasingly complex. They have heterogeneous formats and come from distributed sources. They can be classified into three categories: structured data, semi-structured data and unstructured data. In this work, we are interested in the field of data integration, with the aim of constructing and maintaining warehouses whose sources are completely heterogeneous and belong to these various categories. We propose a formal framework based on the definition of an integration environment. A set of “integration relationships” between the components of the sources is thus defined: an equivalence relation and a strict order relation. These relationships are independent of any data source modelling; the sources can therefore be heterogeneous, with different models and/or categories. Two different physical architectures, to create and maintain the warehouses and the materialized views, are given.
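
The sketch below (a simplified illustration under assumed structures, not the chapter's formalism) shows the flavor of the two integration relationships: an equivalence relation grouping source components that denote the same concept, and a strict order expressing that one component subsumes another. The relations refer to components only by identifier, independently of the sources' data models.

# Components drawn from heterogeneous sources (structured, semi-structured,
# unstructured); only their identifiers matter to the relations.
components = ["db1.customer", "xml.client", "csv.buyer", "db1.person"]

equivalent = {("db1.customer", "xml.client"), ("xml.client", "csv.buyer")}
subsumes = {("db1.person", "db1.customer")}  # strict order: person > customer

def equivalence_classes(items, pairs):
    # Union-find over the declared equivalences (closure of the relation).
    parent = {i: i for i in items}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = {}
    for i in items:
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())

print(equivalence_classes(components, equivalent))
# [['db1.customer', 'xml.client', 'csv.buyer'], ['db1.person']]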


Author(s): Rogério Luís de Carvalho Costa, Pedro Furtado

Globally accessible data warehouses are useful in many commercial and scientific organizations. For instance, research centers can be put together through a grid infrastructure in order to form a large virtual organization with a huge virtual data warehouse, which should be transparently and efficiently queried by grid participants. As is frequent in grid environments, in the grid-based data warehouse one can both have resource constraints and establish Service Level Objectives (SLOs), providing some Quality of Service (QoS) differentiation for each group of users, participant organizations or requested operations. In this work, we discuss query scheduling and data placement in the grid-based data warehouse, proposing the use of QoS-aware strategies. There is previous work on parallel and distributed data warehouses, but most of it does not concern the grid environment, and the works that do use best-effort strategies. Our experimental results show the importance and effectiveness of the proposed strategies.
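
A minimal sketch of what "QoS-aware" means in contrast to best-effort scheduling (hypothetical groups, SLO values and node names; not the authors' algorithm): queries from groups with tighter response-time objectives are dispatched first, to the least-loaded grid node, instead of first-come-first-served.

import heapq

SLO_SECONDS = {"gold": 5.0, "silver": 30.0, "best_effort": float("inf")}

class Scheduler:
    def __init__(self, nodes):
        # (current load, node name) kept as a min-heap.
        self.nodes = [(0.0, n) for n in nodes]
        heapq.heapify(self.nodes)
        self.queue = []   # (SLO, arrival order, query)
        self._seq = 0

    def submit(self, query, group):
        heapq.heappush(self.queue, (SLO_SECONDS[group], self._seq, query))
        self._seq += 1

    def dispatch(self, est_cost):
        _slo, _, query = heapq.heappop(self.queue)  # tightest SLO first
        load, node = heapq.heappop(self.nodes)      # least-loaded node
        heapq.heappush(self.nodes, (load + est_cost, node))
        return query, node

sched = Scheduler(["site-a", "site-b"])
sched.submit("Q1 monthly rollup", "best_effort")
sched.submit("Q2 dashboard", "gold")
print(sched.dispatch(est_cost=2.0))  # ('Q2 dashboard', 'site-a')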


Author(s): Pedro Furtado

Self-tuning physical database organization involves tools that automatically determine the best solution concerning partitioning, placement, and the creation and tuning of auxiliary structures (e.g., indexes), based on the workload. To the best of our knowledge, no tool has focused on a relevant issue in parallel databases, and in particular in data warehouses running on common off-the-shelf hardware in a shared-nothing configuration: determining the adequate tradeoff for balancing load and availability against costs (storage and loading costs). In previous work, we argued that effective load and availability balancing over partitioned datasets can be obtained through chunk-wise placement and replication, together with on-demand processing. In this work, we propose ChunkSim, a simulator for system size planning, performance analysis against replication degree, and availability analysis. We apply the tool to illustrate the kind of results that can be obtained with it. The discussion in the chapter provides important insight into data allocation and query processing over shared-nothing data warehouses, and into how a good simulation analysis tool can be built to predict and analyze actual systems and intended deployments.
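
To illustrate the chunk-wise placement and availability analysis a simulator like ChunkSim reasons about (this is an invented toy, not ChunkSim itself), the sketch below spreads chunks round-robin over shared-nothing nodes with a configurable replication degree and checks whether a single node failure leaves every chunk reachable:

def place_chunks(num_chunks, nodes, replicas):
    assert replicas <= len(nodes), "cannot replicate beyond the node count"
    placement = {}
    for c in range(num_chunks):
        # Round-robin: each chunk gets `replicas` copies on distinct nodes.
        placement[c] = [nodes[(c + r) % len(nodes)] for r in range(replicas)]
    return placement

def survives_failure(placement, failed_node):
    # Availability check: every chunk must keep at least one live copy.
    return all(any(n != failed_node for n in copies)
               for copies in placement.values())

p = place_chunks(num_chunks=6, nodes=["n0", "n1", "n2"], replicas=2)
print(p[0], p[1])                 # ['n0', 'n1'] ['n1', 'n2']
print(survives_failure(p, "n1"))  # True: replication degree 2 tolerates it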


Author(s): Yasser Hachaichi, Jamel Feki, Hanene Ben-Abdallah

Due to international economic competition, enterprises are constantly looking for efficient methods to build data marts/warehouses to analyze the large data volumes involved in their decision-making process. On the other hand, even though the relational data model is the most commonly used model, any data mart/warehouse construction method must now deal with other data types and, in particular, XML documents, which represent the dominant type of data exchanged between partners and retrieved from the Web. This chapter presents a data mart design method that starts from both a relational database source and XML documents compliant with a given DTD. Besides considering these two types of data structures, the originality of our method lies in its being decision-maker centered, in its automatic extraction of loadable data mart schemas, and in its genericity.
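
One step such an automatic extraction plausibly needs, sketched in Python under an invented schema encoding (the chapter's actual rules also cover XML documents via their DTD): scanning a relational schema for fact-table candidates, i.e., tables holding numeric measures plus foreign keys to potential dimension tables.

schema = {
    "sales":    {"measures": ["amount", "qty"], "fks": ["customer", "product"]},
    "customer": {"measures": [], "fks": []},
    "product":  {"measures": ["weight"], "fks": []},
}

def fact_candidates(schema, min_measures=1, min_dims=2):
    # Heuristic: a table is a fact candidate if it carries measures and
    # references at least `min_dims` other tables usable as dimensions.
    return [
        (name, meta["measures"], meta["fks"])
        for name, meta in schema.items()
        if len(meta["measures"]) >= min_measures
        and len(meta["fks"]) >= min_dims
    ]

for name, measures, dims in fact_candidates(schema):
    print(f"fact: {name}  measures: {measures}  dimensions: {dims}")
# fact: sales  measures: ['amount', 'qty']  dimensions: ['customer', 'product']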


Author(s): Matteo Golfarelli

Conceptual design and requirement analysis are two of the key steps within the data warehouse design process. They are to a great extent responsible for the success of a data warehouse project since, during these two phases, the expressivity of the multidimensional schemata is completely defined. This chapter proposes a survey of the literature related to these design steps and points out the pros and cons of the different techniques, in order to help the reader identify crucial choices and possible solutions more consciously. Particular attention is devoted to emphasizing the relationships between the two steps and describing how they can be fruitfully used together.


Author(s): Chantal Reynaud, Nathalie Pernelle, Marie-Christine Rousset

This chapter deals with the integration of heterogeneous XML information sources into a data warehouse whose data are defined in terms of a global abstract schema, or ontology. The authors present an approach supporting the acquisition of data from a set of external sources available for an application of interest, including data extraction, data transformation, and data integration or reconciliation. The integration middleware that the authors propose extracts data from external XML sources that are relevant according to an RDFS+ ontology, transforms the returned XML data into RDF facts conforming to the ontology, and reconciles the RDF data in order to resolve possible redundancies.
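
A minimal, standard-library sketch of the transformation step (the actual middleware works against an RDFS+ ontology; the XML snippet, tag-to-property mapping, and namespace below are invented): relevant XML elements are mapped to RDF triples whose predicates come from the ontology's vocabulary.

import xml.etree.ElementTree as ET

ONT = "http://example.org/onto#"   # hypothetical ontology namespace
TAG_TO_PROPERTY = {"title": ONT + "title", "year": ONT + "year"}

def xml_to_rdf(xml_text):
    triples = []
    root = ET.fromstring(xml_text)
    for i, item in enumerate(root.findall("item")):
        subject = f"_:item{i}"     # one blank node per source element
        for child in item:
            prop = TAG_TO_PROPERTY.get(child.tag)
            if prop:               # keep only ontology-relevant data
                triples.append((subject, prop, child.text))
    return triples

doc = "<items><item><title>DW Design</title><year>2009</year></item></items>"
for t in xml_to_rdf(doc):
    print(t)
# ('_:item0', 'http://example.org/onto#title', 'DW Design') ...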


Author(s): Sandro Bimonte

Data warehouse and OLAP systems are tools to support decision-making. Geographic information systems (GISs) allow storing, analyzing and visualizing geographic data. In order to exploit the complex nature of geographic data, a new kind of decision support system has been developed: spatial OLAP (SOLAP). Spatial OLAP redefines the main OLAP concepts: dimension, measure and multidimensional operators. SOLAP systems integrate OLAP and GIS functionalities into a unique interactive and flexible framework. Several research tools have been proposed to explore and analyze spatio-multidimensional databases. This chapter presents a panorama of SOLAP models and an analytical review of research SOLAP tools. Moreover, the authors describe their Web-based system, GeWOlap: an OLAP-GIS integrated solution implementing drill and cut spatio-multidimensional operators, which also supports some new spatio-multidimensional operators that dynamically change the structure of the spatial hypercube thanks to spatial analysis operators.
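
To give a feel for one spatio-multidimensional operation (illustrative only, not GeWOlap's implementation): the sketch below performs a spatial roll-up, aggregating a measure from point-located facts up to the region containing them, with a trivial bounding-box containment test standing in for a real spatial engine.

facts = [
    {"xy": (1, 1), "sales": 10},
    {"xy": (2, 3), "sales": 5},
    {"xy": (8, 8), "sales": 7},
]
# Regions as axis-aligned boxes: (xmin, ymin, xmax, ymax).
regions = {"west": (0, 0, 5, 5), "east": (5, 5, 10, 10)}

def contains(box, xy):
    xmin, ymin, xmax, ymax = box
    return xmin <= xy[0] <= xmax and ymin <= xy[1] <= ymax

def spatial_rollup(facts, regions, measure):
    # Roll each fact up to the first region that spatially contains it.
    totals = {name: 0 for name in regions}
    for f in facts:
        for name, box in regions.items():
            if contains(box, f["xy"]):
                totals[name] += f[measure]
                break
    return totals

print(spatial_rollup(facts, regions, "sales"))  # {'west': 15, 'east': 7}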

