Data Warehouses and OLAP
Latest Publications

Total documents: 15 (five years: 0)
H-index: 5 (five years: 0)

Published by IGI Global
ISBN: 9781599043647, 9781599043661

2011 ◽  
pp. 203-229 ◽  
Author(s):  
Pedro Furtado

Running large data warehouses (DW) efficiently over low-cost platforms places special requirements on the design of the system architecture. The idea is to have the DW on a set of low-cost nodes in a non-dedicated local-area network (LAN). Nodes can run any relational database engine, and the system relies on a partitioning strategy and a query processing middle layer. These characteristics contrast with typical parallel database systems, which rely on fast dedicated interconnects and hardware, as well as a specialized parallel query optimizer for a specific database engine. This chapter describes the architecture of the Node-Partitioned Data Warehouse (NPDW), designed to run in this low-cost environment, focusing on the design for partitioning, efficient parallel joins and query transformations. Given the low reliability of the target environment, we also show how replicas are incorporated in the design of a robust NPDW strategy with availability guarantees, and how the replicas are used for always-on, always-efficient behavior in the presence of periodic load and maintenance tasks.
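The partitioned-join idea behind the NPDW can be illustrated with a minimal sketch (the node count, table contents and key names below are invented for illustration, not taken from the chapter): fact and dimension rows are hash-partitioned on the join key so each node can join its own partitions locally.

```python
# Hypothetical sketch of hash partitioning for a node-partitioned warehouse:
# co-partitioning both tables on the join key lets each node run the
# equi-join locally, with no data exchange between nodes.

def partition(rows, key, n_nodes):
    """Distribute rows across n_nodes by hashing the partitioning key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes

def local_join(fact_part, dim_part, key):
    """Hash join executed independently on one node's partitions."""
    index = {}
    for d in dim_part:
        index.setdefault(d[key], []).append(d)
    return [{**f, **d} for f in fact_part for d in index.get(f[key], [])]

sales = [{"cust": c, "amount": a} for c, a in [(1, 10), (2, 20), (1, 5), (3, 7)]]
customers = [{"cust": 1, "name": "ann"}, {"cust": 2, "name": "bo"},
             {"cust": 3, "name": "cy"}]

N = 3
fact_parts = partition(sales, "cust", N)
dim_parts = partition(customers, "cust", N)  # co-partitioned on the same key
result = [r for i in range(N)
          for r in local_join(fact_parts[i], dim_parts[i], "cust")]
```

Because both tables are partitioned with the same hash function on the same key, matching rows always land on the same node, which is the property the partitioning strategy must preserve.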


2011 ◽  
pp. 179-202 ◽  
Author(s):  
Karen C. Davis ◽  
Ashima Gupta

Bitmap Indexes (BIs) allow fast access to individual attribute values needed to answer a query by storing a bit for each distinct value and tuple. A BI is defined for a single attribute and its encodings are based solely on data values; the Property Map (PMap) is a multidimensional indexing technique that precomputes attribute expressions for each tuple and stores the results as bit strings. In order to determine whether the PMap is competitive with BIs, we conduct a performance study of the PMap against the Range Encoded Bit Sliced Index (REBSI), using cost models to simulate storage and query processing costs for different query types. We identify parameters that have a significant effect on index performance and determine the situations in which either index is more suitable. These results could be useful for improving the performance of an analytical decision-making system.
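The basic BI scheme described above can be sketched in a few lines (the column data is illustrative): one bit string per distinct value, one bit per tuple, with queries answered by bitwise operations.

```python
# Minimal equality-encoded bitmap index: bit i of index[value] is set
# iff column[i] == value. Bitmaps are held as Python ints, so AND/OR
# queries are single bitwise operations.

def build_bitmap_index(column):
    """Return {value: bitmap} for an equality-encoded bitmap index."""
    index = {}
    for i, v in enumerate(column):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def rows_matching(bitmap, n_rows):
    """Decode a bitmap back into matching row positions."""
    return [i for i in range(n_rows) if bitmap >> i & 1]

region = ["east", "west", "east", "north", "west", "east"]
idx = build_bitmap_index(region)

# Point query: region == "east"
east_rows = rows_matching(idx["east"], len(region))
# Disjunctive query: region in {"east", "north"}, answered by bitwise OR
either = rows_matching(idx["east"] | idx["north"], len(region))
```

Encodings such as the REBSI studied in the chapter refine this layout to answer range predicates with fewer bitmap scans; the sketch shows only the simple equality encoding.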


2011 ◽  
pp. 298-319 ◽  
Author(s):  
Yvan Bedard ◽  
Sonia Rivest ◽  
Marie-Josée Proulx

It is recognized that 80% of data have a spatial component (e.g., street address, place name, geographic coordinates, map coordinates). Having the possibility to display data on maps, to compare maps of different phenomena or epochs, and to combine maps with tables and statistical charts allows one to get more insight into spatial datasets. Furthermore, performing fast spatio-temporal analysis, interactively exploring the data by drilling on maps just as one drills on tables and charts, and easily synchronizing such operations among these views is nowadays required by more and more users. This can be done by combining Geographical Information Systems (GIS) with On-Line Analytical Processing (OLAP), paving the way to "SOLAP" (Spatial OLAP). The present chapter focuses on the spatial characteristics of SOLAP from a geomatics engineering point of view: concepts, architectures, tools and remaining challenges.
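The drill operation central to SOLAP can be sketched as aggregation along a spatial hierarchy (the hierarchy and sales figures below are invented for illustration): drilling down simply moves the grouping key one level finer.

```python
# Illustrative SOLAP-style roll-up/drill-down: measures are aggregated
# along a spatial hierarchy city -> region -> country, and changing the
# level re-groups the same facts at a finer or coarser granularity.
from collections import defaultdict

hierarchy = {  # city -> (country, region)
    "Quebec City": ("Canada", "Quebec"),
    "Montreal": ("Canada", "Quebec"),
    "Toronto": ("Canada", "Ontario"),
}
facts = [("Quebec City", 10), ("Montreal", 25), ("Toronto", 40)]

def rollup(level):
    """Aggregate sales at 'country', 'region' or 'city' level."""
    totals = defaultdict(int)
    for city, amount in facts:
        country, region = hierarchy[city]
        key = {"country": country, "region": region, "city": city}[level]
        totals[key] += amount
    return dict(totals)
```

In a SOLAP tool the same operation is triggered by clicking a map feature, with the map, table and chart views kept synchronized on the selected level.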


2011 ◽  
pp. 230-252
Author(s):  
Uwe Rohm

This chapter presents a new approach to on-line decision support systems that is scalable, fast, and capable of analysing even up-to-date data. It is based on a database cluster: a cluster of commercial off-the-shelf computers as hardware infrastructure and off-the-shelf database management systems as transactional storage managers. We focus on central architectural issues and on the performance implications of such a cluster-based decision support system. In the first half, we present a scalable infrastructure and discuss physical data design alternatives for cluster-based on-line decision support systems. In the second half of the chapter, we discuss query routing algorithms and freshness-aware scheduling. The latter protocol enables users to seamlessly decide how fresh the analysed data should be by allowing different degrees of freshness across the OLAP nodes. In particular, it then becomes possible to trade freshness of data for query performance.
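The freshness/performance trade-off can be sketched as a routing rule (node names, staleness values and the load metric below are invented, not taken from the chapter): a query carrying a freshness bound is sent to the least-loaded replica that is fresh enough.

```python
# Hedged sketch of freshness-aware query routing: each OLAP node lags
# the transactional master by some staleness; among the nodes that
# satisfy the query's freshness bound, pick the least-loaded one.

def route(nodes, max_staleness_s):
    """Pick the least-loaded replica whose data is fresh enough."""
    fresh = [n for n in nodes if n["staleness_s"] <= max_staleness_s]
    if not fresh:
        raise RuntimeError("no replica satisfies the freshness bound")
    return min(fresh, key=lambda n: n["load"])

nodes = [
    {"name": "olap1", "staleness_s": 5,   "load": 0.9},
    {"name": "olap2", "staleness_s": 60,  "load": 0.2},
    {"name": "olap3", "staleness_s": 300, "load": 0.1},
]

strict = route(nodes, max_staleness_s=10)    # only olap1 is fresh enough
relaxed = route(nodes, max_staleness_s=600)  # olap3: stalest, least loaded
```

Relaxing the freshness bound widens the candidate set, which is exactly how stale-but-idle replicas buy query performance.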


2011 ◽  
pp. 111-135 ◽  
Author(s):  
Alkis Simitsis ◽  
Panos Vassiliadis ◽  
Spiros Skiadopoulos ◽  
Timos Sellis

In the early stages of a data warehouse project, the designers/administrators have to make a decision concerning the design and deployment of the back-stage architecture. The possible options are (a) the use of a commercial ETL tool, or (b) the development of an in-house ETL prototype. Both cases have advantages and disadvantages; however, in both cases the design and modeling of the ETL workflows have the same characteristics. The scope of this chapter is to indicate the main challenges, issues, and problems concerning the construction of ETL workflows, in order to help the designers/administrators decide which solution better suits their data warehouse project, and to help them construct an efficient, robust and evolvable ETL workflow that implements the refreshment of their warehouse.
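A toy version of the in-house option (b) can be sketched as composed extract, transform and load steps (the source rows and cleaning rules below are invented for illustration):

```python
# Minimal in-house ETL workflow sketch: extract rows from a source,
# clean and cast them, and load them into a warehouse table, composed
# into a single refresh pipeline.

def extract():
    # Stand-in for rows pulled from an operational source.
    return [{"sku": "A1", "qty": "3"},
            {"sku": None, "qty": "2"},   # dirty row: missing key
            {"sku": "B2", "qty": "5"}]

def transform(rows):
    # Cleaning: drop rows with a missing key, cast measures to int.
    return [{"sku": r["sku"], "qty": int(r["qty"])} for r in rows if r["sku"]]

def load(rows, warehouse):
    # Accumulate quantities per SKU in the warehouse table.
    for r in rows:
        warehouse[r["sku"]] = warehouse.get(r["sku"], 0) + r["qty"]

warehouse = {}
load(transform(extract()), warehouse)
```

Real ETL workflows chain many such activities with dependencies, error handling and restartability, which is where the design challenges the chapter discusses arise.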


2011 ◽  
pp. 277-297 ◽  
Author(s):  
Carlo Combi ◽  
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse, by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses, and discuss the set of constraints needed to correctly manage the warehouse time, i.e., the time dimension considered when storing data in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different and heterogeneous data sources. This means that data stored in a data warehouse are semistructured in nature, i.e., in different documents the same information can be represented in different ways, and the document schemata may or may not be available. Furthermore, information stored in a data warehouse is often time-varying; thus, as for semistructured data, it can be useful to consider time in the data warehouse context as well.
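A temporal semistructured model of this kind can be sketched as a labeled graph whose edges carry valid-time intervals (the graph contents and the single-address constraint below are invented for illustration):

```python
# Illustrative temporal semistructured graph: edges are labeled and carry
# a valid-time interval [t_start, t_end), and a snapshot query keeps only
# the edges valid at a given instant of warehouse time.

INF = float("inf")

class TemporalGraph:
    def __init__(self):
        self.edges = []  # (src, label, dst, t_start, t_end)

    def add_edge(self, src, label, dst, t_start, t_end=INF):
        self.edges.append((src, label, dst, t_start, t_end))

    def snapshot(self, t):
        """Edges whose valid-time interval contains t."""
        return [(s, l, d) for s, l, d, ts, te in self.edges if ts <= t < te]

g = TemporalGraph()
g.add_edge("cust1", "address", "Rome", 2000, 2005)  # former address
g.add_edge("cust1", "address", "Milan", 2005)       # current address

# Example of a warehouse-time constraint: at any instant the customer
# node has exactly one outgoing "address" edge.
assert all(len(g.snapshot(t)) == 1 for t in (2001, 2004, 2010))
```

Constraints like the one asserted above are exactly the kind the chapter argues must hold for warehouse time to be managed correctly, independently of whether a schema is available.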


2011 ◽  
pp. 253-276 ◽  
Author(s):  
Rokia Missaoui ◽  
Ganaël Jatteau ◽  
Ameur Boujenoui ◽  
Sami Naouali

In this chapter, we present alternatives for coupling data warehousing and data mining techniques so that they can benefit from each other's advances, with the ultimate objective of efficiently providing a flexible answer to data mining queries addressed either to a bidimensional (relational) or a multidimensional database. In particular, we investigate two techniques: (i) the first exploits concept lattices for generating frequent closed itemsets, clusters and association rules from multidimensional data, and (ii) the second defines new operators, similar in spirit to online analytical processing (OLAP) techniques, to allow "data mining on demand" (i.e., data mining according to the user's needs and perspectives). The implementation of OLAP-like techniques relies on three operations on lattices, namely selection, projection and assembly. A detailed running example serves to illustrate the scope and benefits of the proposed techniques.
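The closure operator underlying concept lattices and frequent closed itemsets can be sketched directly (the transactions below are invented): the closure of an itemset is the intersection of all transactions containing it, and an itemset is closed iff it equals its closure.

```python
# Minimal sketch of the Galois closure behind frequent closed itemsets:
# closure(X) = intersection of all transactions containing X; the closed
# itemsets are the fixed points of this operator.
from itertools import combinations

transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("abc"),
]

def closure(itemset):
    containing = [t for t in transactions if itemset <= t]
    if not containing:
        return frozenset()
    return frozenset.intersection(*containing)

items = frozenset().union(*transactions)
candidates = [frozenset(c)
              for r in range(1, len(items) + 1)
              for c in combinations(sorted(items), r)]
closed = {c for c in candidates if closure(c) == c}
```

Here {b} is not closed (every transaction containing b also contains a, so its closure is {a, b}), which is how closed itemsets compress the set of all frequent itemsets without losing support information.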


2011 ◽  
pp. 1-26 ◽  
Author(s):  
Stefano Rizzi

In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence from implementation issues. This chapter focuses on a conceptual model called the DFM, which suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.
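The core multidimensional concepts the DFM formalizes (a fact with measures, dimensions, attribute hierarchies, additivity) can be sketched as plain data structures; the schema names and the representation below are invented for illustration and are not the DFM's actual notation.

```python
# Hypothetical encoding of a DFM-style fact schema: a fact carries
# measures tagged with their additivity, and each dimension has a
# hierarchy of attributes ordered from finest to coarsest.
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    levels: list  # finest to coarsest, e.g. ["day", "month", "year"]

@dataclass
class FactSchema:
    name: str
    measures: dict                                   # name -> additivity
    dimensions: dict = field(default_factory=dict)   # name -> Hierarchy

sales = FactSchema(
    name="SALES",
    measures={"quantity": "SUM", "unit_price": "AVG"},  # non-additive price
    dimensions={
        "date": Hierarchy(["day", "month", "year"]),
        "store": Hierarchy(["store", "city", "country"]),
    },
)
```

Features such as convergences, optional arcs or incomplete hierarchies would require enriching this structure, which is precisely the modeling ground the chapter covers.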


2011 ◽  
pp. 157-178 ◽  
Author(s):  
Kurt Stockinger ◽  
Kesheng Wu

In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing literature and organize the technology into three categories, namely bitmap encoding, compression and binning. We introduce an efficient bitmap compression algorithm and examine the space and time complexity of the compressed bitmap index on large data sets from real applications. According to the conventional wisdom, bitmap indices are only efficient for low-cardinality attributes. However, we show that the compressed bitmap indices are also efficient for high-cardinality attributes. Timing results demonstrate that the bitmap indices significantly outperform the projection index, which is often considered to be the most efficient access method for multi-dimensional queries. Finally, we review the bitmap index technology currently supported by commonly used commercial database systems and discuss open issues for future research and development.
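The compression idea can be illustrated with plain run-length encoding (a simplification: the word-aligned schemes the authors work with, such as WAH, align runs to machine words so bitmaps can be combined without decompression; the exact encoding is not reproduced here):

```python
# Hedged sketch of bitmap compression in the run-length spirit: long runs
# of identical bits, common in sorted or low-cardinality bitmaps, collapse
# to (bit, run_length) pairs. Plain RLE, not the word-aligned encoding.

def compress(bits):
    """Run-length encode a string of '0'/'1' characters."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

def decompress(runs):
    return "".join(b * n for b, n in runs)

bitmap = "0" * 1000 + "1" * 3 + "0" * 500
runs = compress(bitmap)  # three runs instead of 1503 bits
assert decompress(runs) == bitmap
```

For high-cardinality attributes each value's bitmap is sparse, so its runs are long and compress well, which is why compressed bitmap indices remain efficient beyond the low-cardinality case.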

