Data Warehouses and OLAP
Latest Publications

Total documents: 15 (five years: 0)
H-index: 5 (five years: 0)

Published by IGI Global
ISBN: 9781599043647, 9781599043661

2011 ◽  
pp. 203-229 ◽  
Author(s):  
Pedro Furtado

Running large data warehouses (DW) efficiently over low-cost platforms places special requirements on the design of the system architecture. The idea is to have the DW on a set of low-cost nodes in a non-dedicated local-area network (LAN). Nodes can run any relational database engine, and the system relies on a partitioning strategy and a query processing middle layer. These characteristics contrast with typical parallel database systems, which rely on fast dedicated interconnects and hardware, as well as a specialized parallel query optimizer for a specific database engine. This chapter describes the architecture of the Node-Partitioned Data Warehouse (NPDW), designed to run in this low-cost environment, focusing on the design for partitioning, efficient parallel joins and query transformations. Given the low reliability of the target environment, we also show how replicas are incorporated in the design of a robust NPDW strategy with availability guarantees, and how the replicas are used for always-on, always-efficient behavior in the presence of periodic load and maintenance tasks.
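The partitioned-join idea behind the NPDW can be illustrated with a minimal sketch (the node count, table contents and key names below are invented for illustration, not taken from the chapter): fact and dimension rows are hash-partitioned on the join key so each node can join its own partitions locally.

```python
# Hypothetical sketch of hash partitioning for a node-partitioned warehouse:
# co-partitioning both tables on the join key lets each node run the
# equi-join locally, with no data exchange between nodes.

def partition(rows, key, n_nodes):
    """Distribute rows across n_nodes by hashing the partitioning key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes

def local_join(fact_part, dim_part, key):
    """Hash join executed independently on one node's partitions."""
    index = {}
    for d in dim_part:
        index.setdefault(d[key], []).append(d)
    return [{**f, **d} for f in fact_part for d in index.get(f[key], [])]

sales = [{"cust": c, "amount": a} for c, a in [(1, 10), (2, 20), (1, 5), (3, 7)]]
customers = [{"cust": 1, "name": "ann"}, {"cust": 2, "name": "bo"},
             {"cust": 3, "name": "cy"}]

N = 3
fact_parts = partition(sales, "cust", N)
dim_parts = partition(customers, "cust", N)  # co-partitioned on the same key
result = [r for i in range(N)
          for r in local_join(fact_parts[i], dim_parts[i], "cust")]
```

Because both tables are partitioned with the same hash function on the same key, matching rows always land on the same node, which is the property the partitioning strategy must preserve.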


2011 ◽  
pp. 179-202 ◽  
Author(s):  
Karen C. Davis ◽  
Ashima Gupta

Bitmap Indexes (BIs) allow fast access to individual attribute values needed to answer a query by storing a bit for each distinct value and tuple. A BI is defined for a single attribute and its encodings are based solely on data values; the Property Map (PMap) is a multidimensional indexing technique that precomputes attribute expressions for each tuple and stores the results as bit strings. In order to determine whether the PMap is competitive with BIs, we conduct a performance study of the PMap against the Range Encoded Bit Sliced Index (REBSI), using cost models to simulate storage and query processing costs for different query types. We identify parameters that have a significant effect on index performance and determine the situations in which either index is more suitable. These results could be useful for improving the performance of an analytical decision-making system.
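The basic BI scheme described above can be sketched in a few lines (the column data is illustrative): one bit string per distinct value, one bit per tuple, with queries answered by bitwise operations.

```python
# Minimal equality-encoded bitmap index: bit i of index[value] is set
# iff column[i] == value. Bitmaps are held as Python ints, so AND/OR
# queries are single bitwise operations.

def build_bitmap_index(column):
    """Return {value: bitmap} for an equality-encoded bitmap index."""
    index = {}
    for i, v in enumerate(column):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def rows_matching(bitmap, n_rows):
    """Decode a bitmap back into matching row positions."""
    return [i for i in range(n_rows) if bitmap >> i & 1]

region = ["east", "west", "east", "north", "west", "east"]
idx = build_bitmap_index(region)

# Point query: region == "east"
east_rows = rows_matching(idx["east"], len(region))
# Disjunctive query: region in {"east", "north"}, answered by bitwise OR
either = rows_matching(idx["east"] | idx["north"], len(region))
```

Encodings such as the REBSI studied in the chapter refine this layout to answer range predicates with fewer bitmap scans; the sketch shows only the simple equality encoding.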


2011 ◽  
pp. 298-319 ◽  
Author(s):  
Yvan Bedard ◽  
Sonia Rivest ◽  
Marie-Josée Proulx

It is recognized that 80% of data have a spatial component (e.g., street address, place name, geographic coordinates, map coordinates). Having the possibility to display data on maps, to compare maps of different phenomena or epochs, and to combine maps with tables and statistical charts allows one to get more insight into spatial datasets. Furthermore, performing fast spatio-temporal analysis, interactively exploring the data by drilling on maps just as one drills on tables and charts, and easily synchronizing such operations among these views is nowadays required by more and more users. This can be done by combining Geographical Information Systems (GIS) with On-Line Analytical Processing (OLAP), paving the way to "SOLAP" (Spatial OLAP). The present chapter focuses on the spatial characteristics of SOLAP from a geomatics engineering point of view: concepts, architectures, tools and remaining challenges.
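The drill operation central to SOLAP can be sketched as aggregation along a spatial hierarchy (the hierarchy and sales figures below are invented for illustration): drilling down simply moves the grouping key one level finer.

```python
# Illustrative SOLAP-style roll-up/drill-down: measures are aggregated
# along a spatial hierarchy city -> region -> country, and changing the
# level re-groups the same facts at a finer or coarser granularity.
from collections import defaultdict

hierarchy = {  # city -> (country, region)
    "Quebec City": ("Canada", "Quebec"),
    "Montreal": ("Canada", "Quebec"),
    "Toronto": ("Canada", "Ontario"),
}
facts = [("Quebec City", 10), ("Montreal", 25), ("Toronto", 40)]

def rollup(level):
    """Aggregate sales at 'country', 'region' or 'city' level."""
    totals = defaultdict(int)
    for city, amount in facts:
        country, region = hierarchy[city]
        key = {"country": country, "region": region, "city": city}[level]
        totals[key] += amount
    return dict(totals)
```

In a SOLAP tool the same operation is triggered by clicking a map feature, with the map, table and chart views kept synchronized on the selected level.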


2011 ◽  
pp. 230-252
Author(s):  
Uwe Rohm

This chapter presents a new approach to on-line decision support systems that is scalable, fast, and capable of analysing even up-to-date data. It is based on a database cluster: a cluster of commercial off-the-shelf computers as hardware infrastructure and off-the-shelf database management systems as transactional storage managers. We focus on central architectural issues and on the performance implications of such a cluster-based decision support system. In the first half, we present a scalable infrastructure and discuss physical data design alternatives for cluster-based on-line decision support systems. In the second half of the chapter, we discuss query routing algorithms and freshness-aware scheduling. The latter protocol enables users to seamlessly decide how fresh the analysed data should be by allowing different degrees of freshness across the OLAP nodes. In particular, it then becomes possible to trade freshness of data for query performance.
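The freshness/performance trade-off can be sketched as a routing rule (node names, staleness values and the load metric below are invented, not taken from the chapter): a query carrying a freshness bound is sent to the least-loaded replica that is fresh enough.

```python
# Hedged sketch of freshness-aware query routing: each OLAP node lags
# the transactional master by some staleness; among the nodes that
# satisfy the query's freshness bound, pick the least-loaded one.

def route(nodes, max_staleness_s):
    """Pick the least-loaded replica whose data is fresh enough."""
    fresh = [n for n in nodes if n["staleness_s"] <= max_staleness_s]
    if not fresh:
        raise RuntimeError("no replica satisfies the freshness bound")
    return min(fresh, key=lambda n: n["load"])

nodes = [
    {"name": "olap1", "staleness_s": 5,   "load": 0.9},
    {"name": "olap2", "staleness_s": 60,  "load": 0.2},
    {"name": "olap3", "staleness_s": 300, "load": 0.1},
]

strict = route(nodes, max_staleness_s=10)    # only olap1 is fresh enough
relaxed = route(nodes, max_staleness_s=600)  # olap3: stalest, least loaded
```

Relaxing the freshness bound widens the candidate set, which is exactly how stale-but-idle replicas buy query performance.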


2011 ◽  
pp. 111-135 ◽  
Author(s):  
Alkis Simitsis ◽  
Panos Vassiliadis ◽  
Spiros Skiadopoulos ◽  
Timos Sellis

In the early stages of a data warehouse project, the designers/administrators have to make a decision concerning the design and deployment of the back-stage architecture. The possible options are (a) the use of a commercial ETL tool, or (b) the development of an in-house ETL prototype. Both cases have advantages and disadvantages; however, in both cases the design and modeling of the ETL workflows have the same characteristics. The scope of this chapter is to indicate the main challenges, issues, and problems concerning the construction of ETL workflows, in order to help the designers/administrators decide which solution better suits their data warehouse project, and to help them construct an efficient, robust and evolvable ETL workflow that implements the refreshment of their warehouse.
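A toy version of the in-house option (b) can be sketched as composed extract, transform and load steps (the source rows and cleaning rules below are invented for illustration):

```python
# Minimal in-house ETL workflow sketch: extract rows from a source,
# clean and cast them, and load them into a warehouse table, composed
# into a single refresh pipeline.

def extract():
    # Stand-in for rows pulled from an operational source.
    return [{"sku": "A1", "qty": "3"},
            {"sku": None, "qty": "2"},   # dirty row: missing key
            {"sku": "B2", "qty": "5"}]

def transform(rows):
    # Cleaning: drop rows with a missing key, cast measures to int.
    return [{"sku": r["sku"], "qty": int(r["qty"])} for r in rows if r["sku"]]

def load(rows, warehouse):
    # Accumulate quantities per SKU in the warehouse table.
    for r in rows:
        warehouse[r["sku"]] = warehouse.get(r["sku"], 0) + r["qty"]

warehouse = {}
load(transform(extract()), warehouse)
```

Real ETL workflows chain many such activities with dependencies, error handling and restartability, which is where the design challenges the chapter discusses arise.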


2011 ◽  
pp. 277-297 ◽  
Author(s):  
Carlo Combi ◽  
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse, by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses, and discuss the set of constraints needed to correctly manage the warehouse time, i.e., the time dimension considered when storing data in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different and heterogeneous data sources. This means that data stored in a data warehouse are semistructured in nature, i.e., in different documents the same information can be represented in different ways, and the document schemata may or may not be available. Furthermore, information stored in a data warehouse is often time-varying; thus, as for semistructured data, it can be useful to consider time in the data warehouse context as well.
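A temporal semistructured model of this kind can be sketched as a labeled graph whose edges carry valid-time intervals (the graph contents and the single-address constraint below are invented for illustration):

```python
# Illustrative temporal semistructured graph: edges are labeled and carry
# a valid-time interval [t_start, t_end), and a snapshot query keeps only
# the edges valid at a given instant of warehouse time.

INF = float("inf")

class TemporalGraph:
    def __init__(self):
        self.edges = []  # (src, label, dst, t_start, t_end)

    def add_edge(self, src, label, dst, t_start, t_end=INF):
        self.edges.append((src, label, dst, t_start, t_end))

    def snapshot(self, t):
        """Edges whose valid-time interval contains t."""
        return [(s, l, d) for s, l, d, ts, te in self.edges if ts <= t < te]

g = TemporalGraph()
g.add_edge("cust1", "address", "Rome", 2000, 2005)  # former address
g.add_edge("cust1", "address", "Milan", 2005)       # current address

# Example of a warehouse-time constraint: at any instant the customer
# node has exactly one outgoing "address" edge.
assert all(len(g.snapshot(t)) == 1 for t in (2001, 2004, 2010))
```

Constraints like the one asserted above are exactly the kind the chapter argues must hold for warehouse time to be managed correctly, independently of whether a schema is available.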


2011 ◽  
pp. 253-276 ◽  
Author(s):  
Rokia Missaoui ◽  
Ganaël Jatteau ◽  
Ameur Boujenoui ◽  
Sami Naouali

In this chapter, we present alternatives for coupling data warehousing and data mining techniques so that they can benefit from each other's advances, with the ultimate objective of efficiently providing a flexible answer to data mining queries addressed either to a bidimensional (relational) or a multidimensional database. In particular, we investigate two techniques: (i) the first exploits concept lattices for generating frequent closed itemsets, clusters and association rules from multidimensional data, and (ii) the second defines new operators, similar in spirit to online analytical processing (OLAP) techniques, to allow "data mining on demand" (i.e., data mining according to the user's needs and perspectives). The implementation of OLAP-like techniques relies on three operations on lattices, namely selection, projection and assembly. A detailed running example serves to illustrate the scope and benefits of the proposed techniques.
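The closure operator underlying concept lattices and frequent closed itemsets can be sketched directly (the transactions below are invented): the closure of an itemset is the intersection of all transactions containing it, and an itemset is closed iff it equals its closure.

```python
# Minimal sketch of the Galois closure behind frequent closed itemsets:
# closure(X) = intersection of all transactions containing X; the closed
# itemsets are the fixed points of this operator.
from itertools import combinations

transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("abc"),
]

def closure(itemset):
    containing = [t for t in transactions if itemset <= t]
    if not containing:
        return frozenset()
    return frozenset.intersection(*containing)

items = frozenset().union(*transactions)
candidates = [frozenset(c)
              for r in range(1, len(items) + 1)
              for c in combinations(sorted(items), r)]
closed = {c for c in candidates if closure(c) == c}
```

Here {b} is not closed (every transaction containing b also contains a, so its closure is {a, b}), which is how closed itemsets compress the set of all frequent itemsets without losing support information.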


2011 ◽  
pp. 1-26 ◽  
Author(s):  
Stefano Rizzi

In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence from implementation issues. This chapter focuses on a conceptual model called the DFM, which suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.
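The core multidimensional concepts the DFM formalizes (a fact with measures, dimensions, attribute hierarchies, additivity) can be sketched as plain data structures; the schema names and the representation below are invented for illustration and are not the DFM's actual notation.

```python
# Hypothetical encoding of a DFM-style fact schema: a fact carries
# measures tagged with their additivity, and each dimension has a
# hierarchy of attributes ordered from finest to coarsest.
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    levels: list  # finest to coarsest, e.g. ["day", "month", "year"]

@dataclass
class FactSchema:
    name: str
    measures: dict                                   # name -> additivity
    dimensions: dict = field(default_factory=dict)   # name -> Hierarchy

sales = FactSchema(
    name="SALES",
    measures={"quantity": "SUM", "unit_price": "AVG"},  # non-additive price
    dimensions={
        "date": Hierarchy(["day", "month", "year"]),
        "store": Hierarchy(["store", "city", "country"]),
    },
)
```

Features such as convergences, optional arcs or incomplete hierarchies would require enriching this structure, which is precisely the modeling ground the chapter covers.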


2011 ◽  
pp. 157-178 ◽  
Author(s):  
Kurt Stockinger ◽  
Kesheng Wu

In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing literature and organize the technology into three categories, namely bitmap encoding, compression and binning. We introduce an efficient bitmap compression algorithm and examine the space and time complexity of the compressed bitmap index on large data sets from real applications. According to the conventional wisdom, bitmap indices are only efficient for low-cardinality attributes. However, we show that the compressed bitmap indices are also efficient for high-cardinality attributes. Timing results demonstrate that the bitmap indices significantly outperform the projection index, which is often considered to be the most efficient access method for multi-dimensional queries. Finally, we review the bitmap index technology currently supported by commonly used commercial database systems and discuss open issues for future research and development.
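The compression idea can be illustrated with plain run-length encoding (a simplification: the word-aligned schemes the authors work with, such as WAH, align runs to machine words so bitmaps can be combined without decompression; the exact encoding is not reproduced here):

```python
# Hedged sketch of bitmap compression in the run-length spirit: long runs
# of identical bits, common in sorted or low-cardinality bitmaps, collapse
# to (bit, run_length) pairs. Plain RLE, not the word-aligned encoding.

def compress(bits):
    """Run-length encode a string of '0'/'1' characters."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

def decompress(runs):
    return "".join(b * n for b, n in runs)

bitmap = "0" * 1000 + "1" * 3 + "0" * 500
runs = compress(bitmap)  # three runs instead of 1503 bits
assert decompress(runs) == bitmap
```

For high-cardinality attributes each value's bitmap is sparse, so its runs are long and compress well, which is why compressed bitmap indices remain efficient beyond the low-cardinality case.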

