Storage Strategies in Data Warehouses

Author(s):  
Xinjian Lu

A data warehouse stores and manages historical data for on-line analytical processing rather than for on-line transactional processing. Data warehouses with sizes ranging from gigabytes to terabytes are common, and they are much larger than operational databases. Data warehouse users tend to be more interested in identifying business trends than in individual values. Queries for identifying business trends are called analytical queries. These queries invariably require data aggregation, usually according to many different groupings, and are thus much more complex than transactional ones. The complexity of analytical queries combined with the immense size of the data can easily result in unacceptably long response times. Effective approaches to improving query performance are therefore crucial to a proper physical design of data warehouses.
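
To make the contrast concrete, the following minimal Python/pandas sketch (with made-up table and column names) places a transactional-style lookup of an individual record next to an analytical-style query that aggregates a measure under several different groupings; it is the latter kind of query whose cost grows quickly with warehouse size.

```python
# A minimal sketch (hypothetical column names) contrasting a transactional
# lookup with an analytical, grouping-heavy aggregation over the same data.
import pandas as pd

sales = pd.DataFrame({
    "store":   ["S1", "S1", "S2", "S2"],
    "product": ["P1", "P2", "P1", "P2"],
    "year":    [2009, 2009, 2010, 2010],
    "amount":  [120.0, 80.0, 150.0, 95.0],
})

# Transactional-style access: fetch one individual record.
single_row = sales[(sales["store"] == "S1") & (sales["product"] == "P2")]

# Analytical-style access: aggregate the measure under several groupings,
# the kind of query whose cost grows with the size of the warehouse.
trend_by_store_year = sales.groupby(["store", "year"])["amount"].sum()
trend_by_product    = sales.groupby("product")["amount"].sum()
print(trend_by_store_year, trend_by_product, sep="\n")
```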

Author(s):  
Edgard Benítez-Guerrero, Ericka-Janet Rechy-Ramírez

A Data Warehouse (DW) is a collection of historical data, built by gathering and integrating data from several sources, which supports decision-making processes (Inmon, 1992). On-Line Analytical Processing (OLAP) applications provide users with a multidimensional view of the DW and the tools to manipulate it (Codd, 1993). In this view, a DW is seen as a set of dimensions and cubes (Torlone, 2003). A dimension represents a business perspective under which data analysis is performed and is organized in a hierarchy of levels that correspond to different ways of grouping its elements (e.g., the Time dimension is organized as a hierarchy involving days at the lower level and months and years at higher levels). A cube represents the factual data on which the analysis is focused and associates measures (e.g., in a store chain, a measure is the quantity of products sold) with coordinates defined over a set of dimension levels (e.g., product, store, and day of sale). Querying is then aimed at aggregating measures at various levels. DWs are often implemented using multidimensional or relational DBMSs. Multidimensional systems directly support the multidimensional data model, while a relational implementation typically employs star schemas (or variations thereof), where a fact table containing the measures references a set of dimension tables.
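
As an illustration of the relational (star schema) implementation described above, the following Python/pandas sketch, with assumed table and column names, builds a tiny fact table that references two dimension tables and rolls the measure up from the day level to the month level of the Time hierarchy.

```python
# A minimal relational-style sketch (assumed table and column names) of a
# star schema: a fact table of measures referencing small dimension tables,
# with a roll-up from the day level to the month level of the Time dimension.
import pandas as pd

dim_time = pd.DataFrame({
    "day_id": [1, 2, 3],
    "day":    ["2003-01-01", "2003-01-02", "2003-02-01"],
    "month":  ["2003-01", "2003-01", "2003-02"],
    "year":   [2003, 2003, 2003],
})
dim_product = pd.DataFrame({"product_id": [10, 20], "product": ["Soap", "Tea"]})

fact_sales = pd.DataFrame({          # measures keyed by dimension coordinates
    "day_id":     [1, 1, 2, 3],
    "product_id": [10, 20, 10, 20],
    "qty_sold":   [5, 3, 7, 4],
})

# Join the fact table to its dimensions, then aggregate the measure
# at a coarser level of the Time hierarchy (month instead of day).
cube = fact_sales.merge(dim_time, on="day_id").merge(dim_product, on="product_id")
by_month = cube.groupby(["product", "month"])["qty_sold"].sum()
print(by_month)
```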


Author(s):  
Kornelije Rabuzin

This chapter presents the concept of “deductive data warehouses.” Deductive data warehouses rely on deductive databases but use a data warehouse in the background instead of a database. The authors show how Datalog, as a logic programming language, can be used to perform on-line analytical processing (OLAP) analysis on data. For that purpose, a small data warehouse has been implemented. Furthermore, they propose and briefly discuss “Datalog by example” as a visual front-end tool for posing Datalog queries to deductive data warehouses.
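
Purely as an illustration (this is not the authors' implementation or their "Datalog by example" tool), the short Python sketch below emulates what a Datalog-style aggregation rule over ground sales facts might compute, e.g. a rule deriving the total quantity sold per store from individual sales facts.

```python
# An illustrative sketch only: emulating, in Python, what a Datalog-style
# aggregation rule such as
#   total(Store, Sum) :- sales(Store, _, Qty), Sum = sum(Qty)
# would compute over a small set of ground facts.
from collections import defaultdict

# Ground facts: sales(store, product, qty); the values are made up.
sales_facts = [("S1", "Soap", 5), ("S1", "Tea", 3), ("S2", "Soap", 7)]

def total_by_store(facts):
    """Derive total(Store, Sum) by aggregating the qty argument per store."""
    totals = defaultdict(int)
    for store, _product, qty in facts:
        totals[store] += qty
    return dict(totals)

print(total_by_store(sales_facts))   # {'S1': 8, 'S2': 7}
```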


Author(s):  
Jérôme Darmont

Performance evaluation is a key issue for designers and users of Database Management Systems (DBMSs). Performance is generally assessed with software benchmarks that help, for example, test architectural choices, compare different technologies, or tune a system. In the particular context of data warehousing and On-Line Analytical Processing (OLAP), although the Transaction Processing Performance Council (TPC) aims at issuing standard decision-support benchmarks, few benchmarks actually exist. We present in this chapter the Data Warehouse Engineering Benchmark (DWEB), which allows generating various ad hoc synthetic data warehouses and workloads. DWEB is fully parameterized to fulfill various data warehouse design needs. However, two levels of parameterization keep it relatively easy to tune. We also expand on our previous work on DWEB by presenting its new Extract, Transform, and Load (ETL) feature, as well as its new execution protocol. A Java implementation of DWEB is freely available online and can be interfaced with most existing relational DBMSs. To the best of our knowledge, DWEB is the only easily available, up-to-date benchmark for data warehouses.
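
The sketch below illustrates, in Python and with invented parameter names (not DWEB's actual parameters), the general idea behind a two-level parameterization: a handful of high-level knobs are expanded into the many low-level values that drive the generation of a synthetic schema and workload.

```python
# A minimal sketch (parameter names are illustrative, not DWEB's actual ones)
# of a two-level parameterization: a few high-level knobs derive the many
# low-level values used to generate a synthetic schema and workload.
import random

HIGH_LEVEL = {"average_size": "medium", "workload_intensity": "light"}

def derive_low_level(high):
    """Expand coarse choices into concrete generation parameters."""
    size_map = {"small": (3, 10_000), "medium": (5, 100_000), "large": (8, 1_000_000)}
    dims, fact_rows = size_map[high["average_size"]]
    queries = 20 if high["workload_intensity"] == "light" else 200
    return {"n_dimensions": dims, "fact_rows": fact_rows, "n_queries": queries}

def generate_warehouse(params, seed=42):
    """Produce a toy star-schema description and a random aggregation workload."""
    random.seed(seed)
    schema = {f"dim_{i}": random.randint(10, 1000) for i in range(params["n_dimensions"])}
    workload = [f"aggregate fact by dim_{random.randrange(params['n_dimensions'])}"
                for _ in range(params["n_queries"])]
    return schema, workload

schema, workload = generate_warehouse(derive_low_level(HIGH_LEVEL))
print(schema, workload[:3])
```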


Author(s):  
Harkiran Kaur, Kawaljeet Singh, Tejinder Kaur

Background: Numerous E-Migrants databases assist migrants in locating their peers in various countries, thereby contributing largely to the communication of migrants staying overseas. Presently, these traditional E-Migrants databases face the issues of non-scalability, difficult search mechanisms, and burdensome information-update routines. Furthermore, analysis of migrants' profiles in these databases has remained unaddressed to date, and hence they do not generate any knowledge. Objective: To design and develop an efficient and multidimensional knowledge discovery framework for E-Migrants databases. Method: In the proposed technique, the results of complex calculations related to the most probable On-Line Analytical Processing operations required by end users are stored in the form of Decision Trees at the pre-processing stage of data analysis. While browsing the Cube, these pre-computed results are retrieved, thus offering a Dynamic Cubing feature to end users at runtime. This data-tuning step reduces query processing time and increases the efficiency of the required data warehouse operations. Results: Experiments conducted with a Data Warehouse of around 1000 migrants' profiles confirm the knowledge discovery power of this proposal. Using the proposed methodology, the authors have designed a framework efficient enough to incorporate the amendments made to the E-Migrants Data Warehouse systems at regular intervals, which was totally missing in the traditional E-Migrants databases. Conclusion: The proposed methodology facilitates migrants in generating dynamic knowledge and visualizing it in the form of dynamic cubes. Applying Business Intelligence mechanisms and blending them with tuned OLAP operations, the authors have managed to transform traditional datasets into an intelligent migrants Data Warehouse.
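
The following simplified Python/pandas sketch illustrates only the pre-computation idea: the most probable OLAP group-bys are materialized at the pre-processing stage and merely looked up while the cube is browsed. The paper stores such results as Decision Trees; a plain dictionary and invented data are used here purely for illustration.

```python
# A simplified sketch of pre-computing likely OLAP aggregates ahead of time
# and retrieving them at cube-browsing time instead of recomputing them.
# (The paper's actual storage form is decision trees; a dict is used here.)
import pandas as pd

profiles = pd.DataFrame({
    "country":  ["UK", "UK", "USA", "USA"],
    "year":     [2018, 2019, 2018, 2019],
    "migrants": [120, 140, 200, 230],
})

# Pre-processing stage: compute the most probable OLAP groupings once.
precomputed = {
    ("country",):        profiles.groupby("country")["migrants"].sum(),
    ("country", "year"): profiles.groupby(["country", "year"])["migrants"].sum(),
}

def browse(groupby_keys):
    """At runtime, return a cached aggregate instead of recomputing it."""
    return precomputed[tuple(groupby_keys)]

print(browse(["country"]))
```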


2010, Vol. 2 (1), pp. 99-116
Author(s):  
Katarzyna Rostek

Data Analytical Processing in Data Warehouses

The article presents issues connected with processing information from data warehouses (the analytical enterprise databases) and the two basic types of analytical data processing in a data warehouse. The genesis, main definitions, scope of application, and real examples from business implementations will be described for each type of analysis. The author's proprietary method of knowledge discovery in databases will also be presented, together with practical guidelines for its proper and effective use in the enterprise.


Author(s):  
Alysson Bolognesi Prado, Carmen Freitas, Thiago Ricardo Sbrici

In the growing challenge of managing people, Human Resources departments need effective artifacts to support decision making. On-Line Analytical Processing is intended to make business information available to managers, and HR departments can now embrace this technology. This paper describes a project in which the authors built a Data Warehouse containing actual Human Resource data. It provides data models and shows their use through OLAP software and their presentation to end users via a web portal. The authors also discuss the progress, and some obstacles, of the project from the IT staff's viewpoint.


Author(s):  
Johann Eder, Karl Wiggisser

Data Warehouses are typically building blocks of decision support systems in companies and public administration. The data contained in a data warehouse is analyzed by means of On-Line Analytical Processing tools, which provide sophisticated features for aggregating and comparing data. Decision support applications depend on the reliability and accuracy of the contained data. Typically, a data warehouse comprises not only the current snapshot data but also historical data, to enable, for instance, analysis over several years. And, as we live in a changing world, one criterion for the reliability and accuracy of the results of such long-period queries is their comparability. Whereas data warehouse systems are well prepared for changes in the transactional data, they are, surprisingly, not able to deal with changes in the master data. Nonetheless, such changes do frequently occur. The crucial point for supporting changes is, first of all, being aware of their existence. Second, once it is known that a change took place, it is important to know which change occurred (i.e., to know the differences between versions and the relations between the elements of different versions). For data warehouses this means that changes are identified and represented, the validity of data and structures is recorded, and this knowledge is used for computing correct results for OLAP queries. This chapter is intended to motivate the need for powerful maintenance mechanisms for data warehouse cubes. It presents some basic terms and definitions for a common understanding and introduces the different aspects of data warehouse maintenance. Furthermore, several approaches addressing the problem are presented and classified by their capabilities.
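
As a minimal illustration of recording validity, the Python sketch below (an assumed, simplified scheme, not any specific approach from the chapter) keeps versions of a dimension element together with validity intervals, so that a long-period query can resolve the version that was valid at a given point in time.

```python
# A minimal sketch of recording the validity of dimension master data, so a
# long-period query can pick the element version valid at the time of a fact.
from dataclasses import dataclass
from datetime import date

@dataclass
class DimensionVersion:
    key: str            # stable business key, e.g. a department code
    name: str           # descriptive master data that may change over time
    valid_from: date
    valid_to: date      # exclusive upper bound of the validity interval

history = [
    DimensionVersion("D01", "Sales EMEA",   date(2000, 1, 1), date(2005, 1, 1)),
    DimensionVersion("D01", "Sales Europe", date(2005, 1, 1), date(9999, 1, 1)),
]

def version_at(key, when):
    """Return the dimension element version valid at a given date."""
    for v in history:
        if v.key == key and v.valid_from <= when < v.valid_to:
            return v
    return None

print(version_at("D01", date(2003, 6, 1)).name)   # Sales EMEA
print(version_at("D01", date(2010, 6, 1)).name)   # Sales Europe
```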


Author(s):  
Kheri Arionadi Shobirin, Adi Panca Saputra Iskandar, Ida Bagus Alit Swamardika

A data warehouse is a central repository of integrated data from one or more disparate sources, built from operational data in On-Line Transaction Processing (OLTP) systems for use in strategic decision making and business intelligence through On-Line Analytical Processing (OLAP) techniques. Data warehouses support OLAP applications by storing and maintaining data in multidimensional format. Multidimensional data models, as an integral part of OLAP, are designed to support complex analytical queries in real time.
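
The following Python/NumPy sketch, using made-up dimension values, illustrates the multidimensional format: the measure is stored in an array indexed by dimension coordinates, so OLAP slicing and roll-up reduce to simple array operations.

```python
# A minimal sketch of the multidimensional format: measures stored in an
# array indexed by dimension coordinates, so OLAP slicing and roll-ups
# become simple array operations. All dimension values are invented.
import numpy as np

products = ["Soap", "Tea"]
stores   = ["S1", "S2", "S3"]
months   = ["Jan", "Feb"]

# cube[product, store, month] holds the 'quantity sold' measure.
cube = np.array([[[5, 7], [3, 2], [6, 4]],
                 [[1, 0], [4, 5], [2, 3]]])

slice_jan   = cube[:, :, months.index("Jan")]   # slice: fix one dimension
rollup_prod = cube.sum(axis=(1, 2))             # roll-up: aggregate dimensions away
print(slice_jan, dict(zip(products, rollup_prod)), sep="\n")
```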


2015, Vol. 5 (3), pp. 1-25
Author(s):  
Biri Arun, T.V. Vijay Kumar

Data warehouses were designed to cater to the strategic decision-making needs of an organization. Most queries posed against them are on-line analytical queries, which are complex and computation intensive in nature and have high query response times when processed against a large data warehouse. This time can be substantially reduced by materializing pre-computed summarized views and storing them in the data warehouse. Not all possible views can be materialized, due to storage space constraints, and the optimal selection of a subset of views has been shown to be an NP-Complete problem. This paper addresses the view-selection problem by selecting a beneficial set of views, from amongst all possible views, using the swarm intelligence technique Marriage in Honey Bees Optimization (MBO). An MBO-based view selection algorithm (MBOVSA), which aims to select views that incur the minimum total cost of evaluating all the views (TVEC), is proposed. In MBOVSA, the search is intensified by incorporating the royal jelly feeding phase into MBO. When compared with the most fundamental greedy view selection algorithm, HRUA, MBOVSA is able to select comparatively better-quality views.
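
To give a feel for the underlying optimization problem, the Python sketch below implements a simple greedy selection over a tiny, invented view lattice (in the spirit of greedy algorithms such as HRUA, not MBOVSA itself): at each step it materializes the view that most reduces the total cost of evaluating all views (TVEC).

```python
# An illustrative greedy sketch of view selection over a made-up lattice:
# each view is answered from its smallest materialized ancestor, and views
# are chosen one at a time to minimize the total view evaluation cost (TVEC).
view_size = {"all": 1_000_000, "product_store": 100_000,
             "product": 10_000, "store": 1_000, "none": 1}

# For each view, the views (itself and coarser-to-finer ancestors) it can be
# computed from in this tiny lattice; "all" is the base cuboid.
sources = {
    "all":           {"all"},
    "product_store": {"product_store", "all"},
    "product":       {"product", "product_store", "all"},
    "store":         {"store", "product_store", "all"},
    "none":          {"none", "product", "store", "product_store", "all"},
}

def tvec(materialized):
    """Total cost of evaluating all views given the materialized set."""
    return sum(min(view_size[s] for s in sources[v] & materialized)
               for v in view_size)

def greedy_select(k):
    """Greedily materialize k views beyond the always-available base cuboid."""
    chosen = {"all"}
    for _ in range(k):
        best = min((v for v in view_size if v not in chosen),
                   key=lambda v: tvec(chosen | {v}))
        chosen.add(best)
    return chosen

selected = greedy_select(2)
print(selected, tvec(selected))
```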

