The Star Schema Benchmark and Augmented Fact Table Indexing

Author(s):  
Patrick O’Neil ◽  
Elizabeth O’Neil ◽  
Xuedong Chen ◽  
Stephen Revilak
2021 ◽  
pp. 115226
Author(s):  
Non Sanprasit ◽  
Katechan Jampachaisri ◽  
Taravichet Titijaroonroj ◽  
Kraisak Kesorn

2003 ◽  
Vol 12 (03) ◽  
pp. 325-363 ◽  
Author(s):  
Joseph Fong ◽  
Qing Li ◽  
Shi-Ming Huang

A data warehouse contains a vast amount of data to support the complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for universal data warehousing with different data models. The frame metadata model represents the metadata of a data warehouse, which structures an application domain into classes and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema and catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between the RDB and the OODB is achieved by unloading the source database into a sequential file and reloading it into the target database, through which an object-relational view can be defined so that users can obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls, derived from the integrated schema. To validate our work, an application prototype system has been developed in a product sales data warehousing domain based on this approach.


Author(s):  
Andreia Silva ◽  
Cláudia Antunes

Traditional data mining approaches look for patterns in a single table, while multi-relational data mining aims to identify patterns that involve multiple tables. In recent years, the most common mining techniques have been extended to the multi-relational context, but few are dedicated to dealing with data stored according to the multi-dimensional model, in particular the star schema. These schemas are composed of a huge central fact table linking a set of small dimension tables. Joining all the tables before mining may not be a feasible solution because of the usually massive number of records. This work proposes a method for mining frequent patterns on data following a star schema that does not materialize the join between the tables. As it extends the FP-Growth algorithm, it constructs an FP-Tree for each dimension and then combines them through the records in the fact table to form a super FP-Tree. This tree is then mined with FP-Growth to find all frequent patterns. The paper presents a case study on bibliographic data, comparing the efficiency and scalability of our algorithm against FP-Growth.
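
The idea of mining without materializing the join can be illustrated with a much simpler computation than the one the abstract describes. The sketch below is not FP-Growth and does not build a super FP-Tree; it only shows, under assumed toy data, how the support of items stored in dimension tables can be counted through the foreign keys of the fact table. All table contents and the function names `item_support` and `pair_support` are hypothetical.

```python
from collections import Counter

# Hypothetical toy star schema: each dimension maps a key to a set of items.
authors = {1: {"country=PT"}, 2: {"country=BR"}}
venues = {10: {"type=journal"}, 11: {"type=conference"}}

# Fact rows hold only foreign keys: (author_key, venue_key).
facts = [(1, 10), (1, 10), (2, 10), (1, 11)]


def item_support(dimension, fact_keys):
    """Count, for each dimension item, the fact rows that reference it."""
    key_counts = Counter(fact_keys)  # how often each dimension key is used
    support = Counter()
    for key, count in key_counts.items():
        for item in dimension[key]:
            support[item] += count
    return support


def pair_support(fact_rows, dim_a, dim_b):
    """Count cross-dimension item pairs per distinct key combination."""
    combo_counts = Counter(fact_rows)  # distinct (key_a, key_b) combinations
    support = Counter()
    for (key_a, key_b), count in combo_counts.items():
        for item_a in dim_a[key_a]:
            for item_b in dim_b[key_b]:
                support[(item_a, item_b)] += count
    return support


author_support = item_support(authors, [row[0] for row in facts])
venue_support = item_support(venues, [row[1] for row in facts])
cross_support = pair_support(facts, authors, venues)

min_support = 2  # assumed threshold for this example
frequent_pairs = {p: c for p, c in cross_support.items() if c >= min_support}
```

The dimension rows are visited once per distinct key (or key combination) rather than once per fact row, which is the saving that avoiding the joined table is meant to give.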


2020 ◽  
Vol 20 (2) ◽  
pp. 129-132
Author(s):  
Vugar Abdullayev ◽  
N.A. Ragimova ◽  
V.H. Abdullayev ◽  
T.K. Askerov

The objects of the research are tools that support the description and analytical processing of environmental data requests. These tools are used for environmental monitoring. Analytical processing of environmental data is necessary for this monitoring by the persons concerned. Here, a star schema is used to describe the data. Analytical data processing tools are required for the analysis and study of environmental data. The results of analytical processing of environmental data are used to speed up decision-making. This article also describes the structure of the analytical data processing tool. One of the problems is therefore how to describe the data. For this purpose, an environmental data relay scheme is defined, and the data description is implemented in multidimensional cubes. Because of the growth in data volume, data processing is carried out using multi-dimensional visualization methods. In addition, a visual user interface has been created for analytically processing queries based on scale data. The result of this research is a method for describing environmental data. At the end of the research, a hypercube was obtained, with the help of which it was possible to structure environmental data and carry out analytical processing of it. To this end, environmental data have been described using a multi-dimensional visualization method, and OLAP technologies were used to carry out analytical processing of this data. OLAP technologies allow aggregate data to be used and presented as a hypercube. The results of the research can be used as a basis for an environmental information system that is used for environmental monitoring.


2017 ◽  
Vol 10 (04) ◽  
pp. 745-754
Author(s):  
Mudasir M Kirmani

Data warehouse design requires a radical restructuring of very large amounts of information, frequently of questionable or conflicting quality, drawn from various heterogeneous sources. Data warehouse design brings together business knowledge and technology know-how. The design of the data warehouse requires a detailed understanding of the business processes. The main aim of this research paper is to study and explore the transformation model for converting E-R diagrams to a star schema for developing data warehouses. Dimensional modelling is a logical design technique used for data warehouses. This research paper addresses various potential differences between the two techniques and highlights both the advantages and the disadvantages of using dimensional modelling. Dimensional modelling is one of the popular techniques for databases that are designed with the queries of data warehouse end users in mind. In this paper the focus is on the star schema, which comprises a fact table and dimension tables. Each fact table in turn comprises the foreign keys of the various dimensions, measures, and degenerate dimensions, if any. We also discuss the possibilities of deployment and acceptance of the Conversion Model (CM) to provide the details of the fact table and dimension tables according to local needs. The paper also highlights why dimensional modelling is preferred over E-R modelling when creating a data warehouse.
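
The structure described above (a fact table holding dimension foreign keys, measures, and a degenerate dimension) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the paper's Conversion Model; the table names, column names, and rows (`fact_sales`, `dim_product`, `dim_date`, `invoice_no`) are assumptions made up for the example.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),  -- foreign key
    date_key    INTEGER REFERENCES dim_date(date_key),        -- foreign key
    invoice_no  TEXT,  -- degenerate dimension: no dimension table of its own
    amount      REAL   -- measure
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "books"), (2, "music")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(20200101, 2020), (20210101, 2021)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 20200101, "INV-1", 10.0),
    (1, 20210101, "INV-2", 15.0),
    (2, 20210101, "INV-2", 5.0),
])

# A typical star join: aggregate the measure by attributes of two dimensions.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

Every analytical query follows the same shape: the fact table is joined to each dimension it needs, and the measures are aggregated by dimension attributes.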


Author(s):  
Claudivan Cruz Lopes ◽  
Valéria Cesário-Times ◽  
Stan Matwin ◽  
Cristina Dutra de Aguiar Ciferri ◽  
Ricardo Rodrigues Ciferri

A cloud data warehouse (cloud DW) is a subject-oriented, integrated, time-variant, voluminous, nonvolatile and multidimensional distributed database that is hosted in a cloud. One solution for ensuring data confidentiality in a cloud DW is cryptography. In this article, the authors propose an encryption methodology for a cloud DW stored according to the star schema, considering both the maintenance of the DW's data confidentiality and the capability of processing analytical queries directly over the encrypted DW. The proposed encryption methodology comprises an encryption strategy for DWs called MV-HO (MultiValued and HOmomorphic), which defines how the different types of DW attributes must be encrypted. The proposed MV-HO encryption strategy was compared with encryption strategies based on symmetric encryption, order-preserving symmetric encryption and homomorphic encryption. Results indicated that MV-HO is the best solution found, as MV-HO is Pareto-optimal with respect to the other strategies investigated.


Author(s):  
Lars Frank ◽  
Christian Frank

A Star Schema Data Warehouse looks like a star with a central, so-called fact table in the middle, surrounded by so-called dimension tables with one-to-many relationships to the central fact table. Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations of fact data to the level of the related dynamic dimensions might be misleading if the fact data are aggregated without considering the changes of the dimensions. In this chapter, we will first prove that the problems of SCD (Slowly Changing Dimensions) in a data warehouse may be viewed as a special case of the read skew anomaly that may occur when different transactions access and update records without concurrency control. That is, we prove that aggregating fact data to the levels of a dynamic dimension should not make sense. On the other hand, we will also illustrate, by examples, that in some situations it does make sense that fact data are aggregated to the levels of a dynamic dimension. That is, it is the semantics of the data that determine whether historical dimension data should be preserved or destroyed. Even worse, we also illustrate that some applications need a history-preserving response while, at the same time, other applications need a history-destroying response. Kimball et al. (2002) have described three classic solutions/responses to handling the aggregation problems caused by slowly changing dimensions. In this chapter, we will describe and evaluate four more responses, one of which is new. This is important because all the responses have very different properties, and it is not possible to select a best solution without knowing the semantics of the data.
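
The difference between a history-preserving and a history-destroying response can be made concrete with a small sketch. It does not reproduce any of the responses evaluated in the chapter; it only assumes a toy customer dimension whose `region` attribute changes once, and compares aggregating sales by the region that was valid at sale time with aggregating by the current region. All data and the function names `region_at`, `current_region`, and `totals_by_region` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: customer 1 moved from "North" to "South" on day 5.
# The dimension keeps one versioned row per validity interval.
customer_versions = [
    {"customer": 1, "region": "North", "valid_from": 1, "valid_to": 4},
    {"customer": 1, "region": "South", "valid_from": 5, "valid_to": 9999},
]

sales = [
    {"customer": 1, "day": 2, "amount": 100},
    {"customer": 1, "day": 7, "amount": 40},
]


def region_at(customer, day):
    """History-preserving lookup: the region valid on the day of the sale."""
    for version in customer_versions:
        if (version["customer"] == customer
                and version["valid_from"] <= day <= version["valid_to"]):
            return version["region"]
    raise KeyError((customer, day))


def current_region(customer):
    """History-destroying lookup: only the latest version counts."""
    versions = [v for v in customer_versions if v["customer"] == customer]
    return max(versions, key=lambda v: v["valid_from"])["region"]


def totals_by_region(preserve_history):
    """Aggregate sales amounts to the region level under either response."""
    totals = defaultdict(int)
    for sale in sales:
        if preserve_history:
            region = region_at(sale["customer"], sale["day"])
        else:
            region = current_region(sale["customer"])
        totals[region] += sale["amount"]
    return dict(totals)


preserved = totals_by_region(preserve_history=True)
destroyed = totals_by_region(preserve_history=False)
```

Both results are computed from the same fact data; which one is correct depends on the question being asked, which is the point the abstract makes about the semantics of the data.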

