A Survey of Parallel and Distributed Data Warehouses

2010 ◽  
pp. 865-886
Author(s):  
Pedro Furtado

Data warehouses are a crucial technology for competitive organizations in a globalized world. Size, speed and distributed operation are the major challenges for these systems. Many data warehouses are very large and must still answer queries quickly and efficiently, so parallel solutions are deployed to deliver the necessary performance. Distributed operation, on the other hand, concerns global commercial and scientific organizations that need to share their data in a coherent distributed data warehouse. In this article we review the major concepts, systems and research results behind parallel and distributed data warehouses.


Author(s):  
Lars Frank ◽  
Christian Frank

A Star Schema Data Warehouse looks like a star: a central, so-called fact table sits in the middle, surrounded by so-called dimension tables that have one-to-many relationships to the fact table. Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations of fact data to the level of the related dynamic dimensions may be misleading if the fact data are aggregated without considering the changes to the dimensions. In this chapter, we first prove that the problems of SCD (Slowly Changing Dimensions) in a data warehouse may be viewed as a special case of the read skew anomaly that may occur when different transactions access and update records without concurrency control. That is, we prove that aggregating fact data to the levels of a dynamic dimension should not, in principle, make sense. On the other hand, we also illustrate, by examples, that in some situations it does make sense to aggregate fact data to the levels of a dynamic dimension. That is, it is the semantics of the data that determine whether historical dimension data should be preserved or destroyed. Even worse, we also illustrate that some applications need a history-preserving response while other applications, at the same time, need a history-destroying response. Kimball et al. (2002) have described three classic solutions/responses for handling the aggregation problems caused by slowly changing dimensions. In this chapter, we describe and evaluate four more responses, of which one is new. This is important because all the responses have very different properties, and it is not possible to select a best solution without knowing the semantics of the data.
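The difference between a history-preserving and a history-destroying response can be illustrated with a small Python sketch. It is not taken from the chapter: the salesperson dimension, the region attribute, the dates and the amounts are all hypothetical, and the two aggregation modes only approximate the general idea of keeping versioned dimension rows versus using the latest dimension value.

```python
from collections import defaultdict
from datetime import date

# Hypothetical versioned dimension: each salesperson has a list of
# (valid_from, valid_to, region) rows; valid_to is exclusive, None = current.
salesperson_versions = {
    "ann": [
        (date(2001, 1, 1), date(2002, 1, 1), "North"),
        (date(2002, 1, 1), None, "South"),  # ann moved to South in 2002
    ],
    "bob": [(date(2001, 1, 1), None, "North")],
}

# Hypothetical fact rows: (salesperson, sale_date, amount).
sales = [
    ("ann", date(2001, 6, 1), 100),
    ("ann", date(2002, 6, 1), 50),
    ("bob", date(2001, 7, 1), 30),
]


def region_at(person, day):
    """Return the region that was valid for `person` on `day`."""
    for valid_from, valid_to, region in salesperson_versions[person]:
        if valid_from <= day and (valid_to is None or day < valid_to):
            return region
    raise KeyError(f"no dimension version for {person} on {day}")


def current_region(person):
    """Return the most recent region, ignoring history."""
    return salesperson_versions[person][-1][2]


def totals_by_region(history_preserving):
    """Aggregate sales amounts to the region level of the dimension."""
    totals = defaultdict(int)
    for person, day, amount in sales:
        if history_preserving:
            region = region_at(person, day)  # region at the time of the sale
        else:
            region = current_region(person)  # latest region for all facts
        totals[region] += amount
    return dict(totals)


preserving = totals_by_region(history_preserving=True)
destroying = totals_by_region(history_preserving=False)
print(preserving)
print(destroying)
```

With this toy data the two modes disagree: the history-preserving totals credit ann's 2001 sale to North, while the history-destroying totals move all of her sales to South. Which answer is right depends on the question being asked, which is the point the abstract makes about data semantics.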


Author(s):  
Michel Schneider

Basically, the schema of a data warehouse rests on two kinds of elements: facts and dimensions. Facts record measures about situations or events. Dimensions are used to analyse these measures, particularly through aggregation operations (counting, summation, average, etc.). To fix ideas, consider the analysis of the sales in a shop according to the product type and the month of the year. Each sale of a product is a fact. It can be characterized by a quantity, and an aggregation function can be computed over the quantities of several facts. For example, one can sum the quantities sold for the product type “mineral water” during January in 2001, 2002 and 2003. Product type is a criterion of the dimension Product. Month and Year are criteria of the dimension Time. A quantity is thus connected both with a type of product and with a month of a given year. This type of connection concerns the organization of facts with regard to dimensions. On the other hand, a month is connected to one year. This type of connection concerns the organization of criteria within a dimension. The possibilities of fact analysis depend on these two forms of connection and on the schema of the warehouse. This schema is chosen by the designer in accordance with the users' needs. Determining the schema of a data warehouse cannot be achieved without adequate modelling of dimensions and facts. In this article we present a general model for dimensions and facts and their relationships. This model will greatly facilitate the choice of the schema and its manipulation by the users.
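The sales example above can be sketched in a few lines of Python. This is only an illustration under assumed data: the product names, identifiers and quantities are hypothetical, and the flat tuples stand in for a fact table linked to a Product dimension and a Time dimension.

```python
from collections import defaultdict

# Hypothetical Product dimension: product id -> attributes (criteria).
products = {
    1: {"name": "Still water 1L", "type": "mineral water"},
    2: {"name": "Orange juice 1L", "type": "juice"},
}

# Hypothetical fact rows: (product_id, year, month, quantity).
sales = [
    (1, 2001, 1, 10),
    (1, 2002, 1, 12),
    (2, 2002, 1, 7),
    (1, 2003, 1, 15),
    (1, 2003, 2, 9),
]


def total_quantity(product_type, month, years):
    """Sum quantities for one product type in a given month of several years."""
    return sum(
        quantity
        for product_id, year, sale_month, quantity in sales
        if products[product_id]["type"] == product_type
        and sale_month == month
        and year in years
    )


def quantity_by_type_and_year():
    """Roll months up to years: aggregate along the Month -> Year connection."""
    totals = defaultdict(int)
    for product_id, year, _month, quantity in sales:
        totals[(products[product_id]["type"], year)] += quantity
    return dict(totals)


january_water = total_quantity("mineral water", 1, {2001, 2002, 2003})
yearly = quantity_by_type_and_year()
print(january_water)
print(yearly)
```

The first function uses the connection between facts and dimensions (each quantity is tied to a product type and a month), while the second relies on the connection inside the Time dimension (each month belongs to one year).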


2011 ◽  
pp. 901-920
Author(s):  
Rogério Luís de Carvalho Costa ◽  
Pedro Furtado

Globally accessible data warehouses are useful in many commercial and scientific organizations. For instance, research centers can be connected through a grid infrastructure to form a large virtual organization with a huge virtual data warehouse, which should be transparently and efficiently queried by grid participants. As is common in grid environments, a Grid-based Data Warehouse may be subject to resource constraints and may also have Service Level Objectives (SLOs) that provide some Quality of Service (QoS) differentiation for each group of users, participant organization or requested operation. In this work, we discuss query scheduling and data placement in the grid-based data warehouse and propose the use of QoS-aware strategies. There are some works on parallel and distributed data warehouses, but most do not address the grid environment, and those that do use best-effort oriented strategies. Our experimental results show the importance and effectiveness of the proposed strategies.
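To make the contrast between best-effort and deadline-aware scheduling concrete, here is a minimal Python sketch. It does not reproduce the authors' strategies: it assumes a single node, hypothetical queries with known runtimes and deadlines, and uses plain earliest-deadline-first ordering as a stand-in for a QoS-aware policy.

```python
# Hypothetical queries, all submitted at time 0 on one node:
# (name, estimated_runtime_seconds, deadline_seconds).
queries = [("q1", 5, 20), ("q2", 2, 3), ("q3", 3, 6)]


def deadlines_met(order):
    """Run queries back to back in `order` and count those finishing on time."""
    clock = 0
    met = 0
    for _name, runtime, deadline in order:
        clock += runtime  # the query finishes when the clock reaches this value
        if clock <= deadline:
            met += 1
    return met


# Best effort: serve queries in arrival order, ignoring their objectives.
fifo_met = deadlines_met(queries)

# Deadline-aware: serve the query with the earliest deadline first.
edf_met = deadlines_met(sorted(queries, key=lambda q: q[2]))

print(fifo_met, edf_met)
```

In this toy case, arrival order meets one deadline and deadline-first ordering meets all three. A real grid scheduler would also have to account for data placement, multiple sites and runtime estimation errors.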


2020 ◽  
Vol 21 (1) ◽  
pp. 49-55
Author(s):  
Nuryoto Nuryoto ◽  
Teguh Kurniawan ◽  
Indar Kustiningsih

Abstract: Indonesia has abundant natural zeolites that have not yet been fully utilized. At the same time, fishpond farmers face the problem of ammonium in fishpond water, which harms the survival of fish, especially small fish. This research used natural zeolite to reduce the ammonium content of fishpond water. Its aim was to test mordenite natural zeolite from Bayah as an adsorbent, both with and without stirring, and to combine several influential variables to maximize adsorption. The variables observed were: mordenite natural zeolite from Bayah activated with 1-7 N H2SO4 or without activation; ammonium concentration of 80-800 ppm; adsorbent particle size of 80 and 150 mesh; stirring speed of 600 and 800 rpm, or no stirring; and an adsorption time of 60 minutes. The results were very good: in general, activated mordenite natural zeolite from Bayah adsorbed 100% of the ammonium, while mordenite natural zeolite from Bayah without stirring adsorbed 80% in the same adsorption time. These results should benefit fishpond farmers by increasing productivity through improved fish survival.
Keywords: adsorption, adsorbent, zeolite, ammonium

