An Automatic Data Warehouse Conceptual Design Approach

Author(s):  
Jamel Feki

Within today’s competitive economic context, information acquisition, analysis, and exploitation have become strategic and unavoidable requirements for every enterprise. Moreover, in order to guarantee their persistence and growth, enterprises are now compelled to build up expertise in this domain. Data warehouses (DW) emerged as a potential solution to the need to store and analyze large data volumes. In fact, a DW is a database system specialized in the storage of data used for decision-making purposes. This type of system was proposed to overcome the inability of OLTP (On-Line Transaction Processing) systems to offer analysis functionality. It provides integrated, consolidated, and temporal data for decisional analyses. However, the differing objectives and functionalities of OLTP and DW systems created a need for a development method appropriate for DWs. Indeed, data warehouses still attract considerable effort and interest from a large community of both vendors of decision support systems (DSS) and researchers (Kimball, 1996; Inmon, 2002). Current software tools for DW focus on meeting end-user needs. OLAP (On-Line Analytical Processing) tools are dedicated to multidimensional analyses and graphical visualization of results (e.g., Oracle Discoverer); some products permit the description of DW and Data Mart (DM) schemas (e.g., Oracle Warehouse Builder). One major limitation of these tools is that the schemas must be built beforehand and, in most cases, manually. However, such a task can be tedious, error-prone and time-consuming, especially with heterogeneous data sources. On the other hand, the majority of research efforts focus on particular aspects of DW development, such as multidimensional modeling and physical design (materialized views (Moody & Kortnik, 2000), index selection (Golfarelli, Rizzi, & Saltarelli, 2002), schema partitioning (Bellatreche & Boukhalfa, 2005)) and, more recently, applying data mining for better data interpretation (Mikolaj, 2006; Zubcoff, Pardillo & Trujillo, 2007). While these practical issues determine the performance of a DW, other, equally important conceptual issues (e.g., requirements specification and DW schema design) still require further investigation. In fact, few proposals have been put forward to assist in and/or to automate the DW design process (Bonifati, Cattaneo, Ceri, Fuggetta & Paraboschi, 2001; Hahn, Sapia & Blaschka, 2000; Phipps & Davis, 2002; Peralta, Marotta & Ruggia, 2003).
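For readers unfamiliar with what such a DW or DM schema looks like, the following minimal Python sketch models a star-style multidimensional schema (one fact with measures, linked to dimensions with hierarchies). The representation and all names are illustrative assumptions, not part of the approach described in this article.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal representation of a multidimensional (star) schema:
# one fact table with measures, linked to dimensions carrying hierarchies.

@dataclass
class Dimension:
    name: str
    hierarchy: list            # ordered levels, e.g. day < month < year

@dataclass
class Fact:
    name: str
    measures: list
    dimensions: list = field(default_factory=list)

time_dim  = Dimension("Time",  ["day", "month", "year"])
store_dim = Dimension("Store", ["store", "city", "region"])
sales     = Fact("Sales", measures=["amount", "quantity"],
                 dimensions=[time_dim, store_dim])

print(sales.name, [d.name for d in sales.dimensions])
```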

Author(s):  
Shi-Ming Huang ◽  
David C. Yen ◽  
Hsiang-Yuan Hsueh

The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency purposes. In terms of constructing a materialized data warehouse system, several management problems still confront developers and users, particularly in the area of view resource maintenance. Resource redundancy and data inconsistency among materialized views in a data warehouse system are problems that many developers and users struggle with. In this article, a space-efficient protocol for materialized view maintenance with a global data view is proposed for data warehouses with embedded proxies. In the protocol set, multilevel proxy-based protocols with a data compensating mechanism are provided to ensure the consistency and uniqueness of materialized data among data resources and materialized views. The authors also provide a set of evaluation experiments and derivations to verify the feasibility of the proposed protocols and mechanisms. With such protocols serving as proxy services, the performance and space utilization of the materialized view approach are improved. Furthermore, consistency between materialized data warehouses and heterogeneous data sources can be properly maintained by applying a dynamic compensating and synchronization mechanism. The trade-off between efficiency, storage consumption, and data validity for view maintenance tasks can be properly balanced.
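The article’s proxy-based protocols are not spelled out in this abstract; as a rough, hypothetical illustration of the underlying idea of compensated view maintenance, the Python sketch below rebuilds a simple SUM view from a possibly stale source snapshot and then applies a compensation pass for deltas that arrived concurrently. Class and method names are invented for the example.

```python
from collections import defaultdict

class MaterializedSumView:
    """Toy materialized view: SUM(amount) grouped by a key.

    Illustrates incremental maintenance plus a compensation step for
    changes that reach the source while a refresh is in flight.
    """

    def __init__(self):
        self.totals = defaultdict(float)   # the materialized aggregate

    def apply_delta(self, key, amount):
        """Incrementally fold one source change into the view."""
        self.totals[key] += amount

    def refresh_from_source(self, source_rows, concurrent_deltas):
        """Rebuild the view from a (possibly stale) snapshot, then
        compensate with deltas that arrived during the rebuild."""
        self.totals = defaultdict(float)
        for key, amount in source_rows:
            self.totals[key] += amount
        for key, amount in concurrent_deltas:      # compensation pass
            self.totals[key] += amount

# Example: the snapshot misses one sale that a concurrent delta supplies.
view = MaterializedSumView()
view.refresh_from_source(
    source_rows=[("books", 100.0), ("music", 40.0)],
    concurrent_deltas=[("books", 25.0)],
)
print(dict(view.totals))   # {'books': 125.0, 'music': 40.0}
```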


This chapter provides an overview of the proposed model for pattern extraction and pattern prediction over data warehouses. As discussed before, the main objective of the research is to provide a single model for pattern extraction and prediction. The objectives include an automated way to select variables for the mining process, automated schema design, advanced evaluation of extracted patterns, and visualization of extracted patterns.


Author(s):  
Xinjian Lu

A data warehouse stores and manages historical data for on-line analytical processing, rather than for on-line transactional processing. Data warehouses with sizes ranging from gigabytes to terabytes are common, and they are much larger than operational databases. Data warehouse users tend to be more interested in identifying business trends than in individual values. Queries for identifying business trends are called analytical queries. These queries invariably require data aggregation, usually according to many different groupings. Analytical queries are thus much more complex than transactional ones. The complexity of analytical queries, combined with the immense size of the data, can easily result in unacceptably long response times. Effective approaches to improving query performance are crucial to a proper physical design of data warehouses.
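To make “aggregation according to many different groupings” concrete, the hedged sketch below answers one analytical question at several grouping levels (a miniature roll-up) with pandas; the table and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical sales fact table (column names are illustrative only).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120.0, 135.0, 90.0, 110.0],
})

# One measure, aggregated at several grouping levels:
by_region         = sales.groupby("region")["revenue"].sum()
by_quarter        = sales.groupby("quarter")["revenue"].sum()
by_region_quarter = sales.groupby(["region", "quarter"])["revenue"].sum()
grand_total       = sales["revenue"].sum()

print(by_region_quarter)
print("grand total:", grand_total)
```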


Author(s):  
Jérôme Darmont ◽  
Emerson Olivier

In this context, the warehouse measures, though not necessarily numerical, remain the indicators for analysis, and analysis is still performed from different perspectives represented by dimensions. Large data volumes and their temporal (dated) nature are further arguments in favor of this approach (Darmont et al., 2003). Data warehousing can also support various types of analysis, such as statistical reporting, on-line analysis (OLAP) and data mining. The aim of this article is to present an overview of existing data warehouses for biomedical data and to discuss the issues and future trends in biomedical data warehousing. We illustrate this topic by presenting the design of an innovative, complex data warehouse for personal, anticipative medicine.


Author(s):  
Salman Ahmed Shaikh ◽  
Kousuke Nakabasami ◽  
Toshiyuki Amagasa ◽  
Hiroyuki Kitagawa

Data warehousing and multidimensional analysis go hand in hand. Data warehouses provide clean and partially normalized data for fast, consistent, and interactive multidimensional analysis. With the advancement in data generation and collection technologies, businesses and organizations are now generating big data (defined by the 3Vs: volume, variety, and velocity). Since big data differs from traditional data, it requires a different set of tools and techniques for processing and analysis. This chapter discusses multidimensional analysis (also known as on-line analytical processing, or OLAP) of big data, focusing particularly on data streams, which are characterized by huge volume and high velocity. OLAP requires maintaining a number of materialized views corresponding to user queries for interactive analysis. Specifically, this chapter discusses the issues in maintaining materialized views over data streams, the use of a special window for the maintenance of materialized views, and the issues in coupling a stream processing engine (SPE) with an OLAP engine.
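As a hedged illustration of window-based view maintenance over a stream (not the chapter’s actual SPE/OLAP coupling), the Python sketch below keeps a SUM aggregate materialized over the last N stream tuples and updates it incrementally as tuples arrive and expire; all names are assumptions.

```python
from collections import deque

class WindowedSumView:
    """Toy materialized view over a data stream: maintains SUM(value)
    for the last `window_size` tuples, updated incrementally."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.buffer = deque()
        self.total = 0.0                  # the materialized aggregate

    def insert(self, value):
        self.buffer.append(value)
        self.total += value
        if len(self.buffer) > self.window_size:   # expire the oldest tuple
            self.total -= self.buffer.popleft()
        return self.total

view = WindowedSumView(window_size=3)
for v in [10.0, 20.0, 30.0, 40.0]:
    print(view.insert(v))   # 10.0, 30.0, 60.0, 90.0
```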


Author(s):  
Dimitri Theodoratos ◽  
Wugang Xu ◽  
Alkis Simitsis

A Data Warehouse (DW) is a repository of information retrieved from multiple, possibly heterogeneous, autonomous, distributed databases and other information sources for the purpose of complex querying, analysis and decision support. Data in the DW are selectively collected from the sources, processed in order to resolve inconsistencies, and integrated in advance (at design time) before data loading. DW data are usually organized multidimensionally to support On-Line Analytical Processing (OLAP). A DW can be abstractly seen as a set of materialized views defined over the source relations. During the initial design of a DW, the DW designer faces the problem of deciding which views to materialize in the DW. This problem has been addressed in the literature for different classes of queries and views and with different design goals.
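One well-known family of heuristics for the view selection problem is greedy, benefit-driven selection; the sketch below is a simplified, hypothetical instance in Python, where the benefit of a candidate view is simply the number of workload queries it newly covers. The data and function names are invented and do not come from this article.

```python
def greedy_view_selection(candidate_views, benefit, k):
    """Greedy heuristic: repeatedly materialize the view with the largest
    additional benefit given the views already selected."""
    selected = set()
    for _ in range(k):
        remaining = [v for v in candidate_views if v not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda v: benefit(v, selected))
        if benefit(best, selected) <= 0:
            break
        selected.add(best)
    return selected

# Made-up example: each candidate view speeds up a set of query ids;
# the benefit of adding a view is the number of not-yet-covered queries.
QUERIES_HELPED = {
    "v_sales_by_day":    {1, 2, 3},
    "v_sales_by_region": {3, 4},
    "v_sales_total":     {1},
}

def benefit(view, selected):
    covered = set.union(set(), *(QUERIES_HELPED[v] for v in selected))
    return len(QUERIES_HELPED[view] - covered)

print(greedy_view_selection(QUERIES_HELPED, benefit, k=2))
# e.g. {'v_sales_by_day', 'v_sales_by_region'}
```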


Author(s):  
Fadila Bentayeb ◽  
Cécile Favre ◽  
Omar Boussaid

A data warehouse allows the integration of heterogeneous data sources for identified analysis purposes. The data warehouse schema is designed according to the available data sources and the users’ analysis requirements. In order to answer new individual analysis needs, the authors previously proposed, in recent work, a solution for on-line analysis personalization. They based their solution on a user-driven approach for data warehouse schema evolution, which consists of creating new hierarchy levels in OLAP (on-line analytical processing) dimensions. One of the main objectives of OLAP, as the acronym itself indicates, is performance during the analysis process. Since data warehouses contain a large volume of data, answering decision queries efficiently requires particular access methods. The main approach is to use redundant optimization structures such as views and indices. This implies selecting an appropriate set of materialized views and indices that minimizes total query response time, given a limited storage space. A judicious choice in this selection must be cost-driven and based on a workload that represents a set of users’ queries on the data warehouse. In this chapter, the authors address the issues related to workload evolution and maintenance in data warehouse systems in response to new requirements resulting from users’ personalized analysis needs. The main issue is to avoid regenerating the workload from scratch. Hence, they propose a workload management system that helps the administrator maintain and dynamically adapt the workload according to changes arising in the data warehouse schema. To achieve this maintenance, the authors propose two types of workload updates: (1) maintaining existing queries consistent with respect to the new data warehouse schema, and (2) creating new queries based on the new dimension hierarchy levels. Their system helps the administrator adopt a proactive approach to managing data warehouse performance. In order to validate their workload management system, the authors address the implementation issues of their proposed prototype, which has been developed within a client/server architecture with a Web client interfaced with the Oracle 10g database management system.
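As a hedged, simplified illustration of the two workload update types (not the authors’ prototype, which is built on Oracle 10g), the Python sketch below (1) propagates a schema change into stored query strings and (2) generates a new query over a newly added hierarchy level. All table, column, and level names are hypothetical.

```python
# A stored workload of SQL-like query strings (names are illustrative only).
workload = [
    "SELECT city, SUM(amount) FROM sales JOIN dim_geo USING (geo_id) GROUP BY city",
]

def maintain_existing(queries, renamed_columns):
    """Type-1 update: keep stored queries consistent with schema changes
    by propagating column renamings (naive textual rewrite)."""
    updated = []
    for q in queries:
        for old, new in renamed_columns.items():
            q = q.replace(old, new)
        updated.append(q)
    return updated

def add_query_for_new_level(fact, dimension, new_level, measure):
    """Type-2 update: create a new analysis query over a newly added
    dimension hierarchy level."""
    return (f"SELECT {new_level}, SUM({measure}) "
            f"FROM {fact} JOIN {dimension} USING (geo_id) "
            f"GROUP BY {new_level}")

workload = maintain_existing(workload, {"city": "city_name"})
workload.append(add_query_for_new_level("sales", "dim_geo", "sales_zone", "amount"))
for q in workload:
    print(q)
```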

