Computation of OLAP Data Cubes

Author(s):  
Amin A. Abdulghani

The focus of online analytical processing (OLAP) is to provide a platform for analyzing data (e.g., sales data) with multiple dimensions (e.g., product, location, time) and multiple measures (e.g., total sales or total cost). OLAP operations then allow viewing of this data from a number of perspectives. For analysis, the object or data structure of primary interest in OLAP is a data cube. A detailed introduction to OLAP is presented in Han & Kamber (2006).
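As a rough illustration of what a data cube holds, the sketch below aggregates one measure over every subset of the dimensions, which is the naive way to materialize a full cube. The function name `compute_cube`, the `ALL` placeholder, and the sample rows are assumptions made for this example, not something taken from the chapter.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical placeholder for a dimension that has been aggregated away


def compute_cube(rows, dimensions, measure):
    """Return {group-by tuple: {cell coordinates: summed measure}}.

    Naive approach: one pass over the rows for each of the 2^n group-bys.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            cells = defaultdict(float)
            for row in rows:
                # Keep the value for grouped dimensions, collapse the others to ALL.
                key = tuple(row[d] if d in group_by else ALL for d in dimensions)
                cells[key] += row[measure]
            cube[group_by] = dict(cells)
    return cube


# Made-up sales records with three dimensions and one measure.
sales = [
    {"product": "pen", "location": "Oslo", "time": "Q1", "total_sales": 10.0},
    {"product": "pen", "location": "Rome", "time": "Q1", "total_sales": 5.0},
    {"product": "ink", "location": "Oslo", "time": "Q2", "total_sales": 7.0},
]
cube = compute_cube(sales, ["product", "location", "time"], "total_sales")
```

With three dimensions the cube contains 2^3 = 8 group-bys; viewing the data "from a number of perspectives" then amounts to looking up cells in different group-bys, for example `cube[("product",)]` for sales per product.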

Author(s):  
Amin A. Abdulghani

The focus of Online Analytical Processing (OLAP) is to provide a platform for analyzing data (e.g., sales data) with multiple dimensions (e.g., product, location, time) and multiple measures (e.g., total sales or total cost). OLAP operations then allow viewing of this data from a number of perspectives. For analysis, the object or data structure of primary interest in OLAP is a cube.


2003 ◽  
pp. 200-221 ◽  
Author(s):  
Mirek Riedewald ◽  
Divyakant Agrawal ◽  
Amr El Abbadi

Data cubes are ubiquitous tools in data warehousing, online analytical processing, and decision support applications. Based on a selection of pre-computed and materialized aggregate values, they can dramatically speed up aggregation and summarization over large data collections. Traditionally, the emphasis has been on lowering query costs with little regard to maintenance, i.e., update cost issues. We argue that current trends require data cubes to be not only query-efficient, but also dynamic at the same time, and we also show how this can be achieved. Several array-based techniques with different tradeoffs between query and update cost are discussed in detail. We also survey selected approaches for sparse data and the popular data cube operator, CUBE. Moreover, this work includes an overview of future trends and their impact on data cubes.
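The query/update tradeoff mentioned above can be illustrated with a prefix-sum array, one well-known array-based technique. Whether the chapter covers this exact variant is an assumption, and the class name `PrefixSumCube` is hypothetical. In this sketch a range-sum query touches four pre-computed values, while a single-cell update may have to touch a large part of the array.

```python
class PrefixSumCube:
    """Two-dimensional prefix-sum array: constant-time range sums, costly updates."""

    def __init__(self, data):
        self.rows = len(data)
        self.cols = len(data[0])
        # P[i][j] holds the sum of data[0..i-1][0..j-1]; row 0 and column 0 are zero padding.
        self.P = [[0] * (self.cols + 1) for _ in range(self.rows + 1)]
        for i in range(self.rows):
            for j in range(self.cols):
                self.P[i + 1][j + 1] = (
                    data[i][j] + self.P[i][j + 1] + self.P[i + 1][j] - self.P[i][j]
                )

    def range_sum(self, r1, c1, r2, c2):
        """Sum of the cells in the inclusive rectangle (r1, c1)..(r2, c2)."""
        P = self.P
        return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]

    def update(self, r, c, delta):
        """Add delta to cell (r, c); every prefix sum that covers the cell changes."""
        for i in range(r + 1, self.rows + 1):
            for j in range(c + 1, self.cols + 1):
                self.P[i][j] += delta


grid = PrefixSumCube([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Here `grid.range_sum(1, 1, 2, 2)` reads four entries regardless of the rectangle size, whereas `grid.update(0, 0, 10)` rewrites every entry of the array, which is the kind of imbalance that dynamic data cube techniques try to reduce.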


Web Mining ◽  
2011 ◽  
pp. 189-207
Author(s):  
Lixin Fu

Currently, data classification is either performed on data stored in relational databases or performed on data stored in flat files. The problem with these approaches is that for large data sets, they often need multiple scans of the original data and thus are often infeasible in many applications. In this chapter we propose to deploy classification on top of OLAP (online analytical processing) and data cube systems. First, we compute the statistics in various combinations of the attributes known as data cubes. The statistics are then used to derive classification models. In this way, we only scan the original data once, which improves the performance of classification significantly. Furthermore, our new classifier will provide “free” classification by eliminating the dominating I/O overhead of scanning the massive original data. An architecture that integrates database, data cube, and data mining is given and three new cube-based classifiers are presented and evaluated.
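The idea of deriving a classifier from aggregated statistics after a single scan can be sketched as follows. This is not one of the three cube-based classifiers from the chapter, which the abstract does not specify; it is a plain naive Bayes model built from count aggregates, and the function names and the toy data are assumptions for illustration only.

```python
from collections import Counter


def scan_counts(rows, attributes, label):
    """Single pass over the data: collect the count aggregates the classifier needs."""
    class_counts = Counter()
    pair_counts = Counter()  # (attribute, value, class) -> count
    values = {a: set() for a in attributes}
    for row in rows:
        c = row[label]
        class_counts[c] += 1
        for a in attributes:
            pair_counts[(a, row[a], c)] += 1
            values[a].add(row[a])
    return class_counts, pair_counts, values


def classify(sample, attributes, class_counts, pair_counts, values):
    """Naive Bayes prediction using only the counts (add-one smoothing)."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total
        for a in attributes:
            score *= (pair_counts[(a, sample[a], c)] + 1) / (n_c + len(values[a]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Made-up training data, scanned exactly once.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
]
attrs = ["outlook", "windy"]
stats = scan_counts(rows, attrs, "play")
prediction = classify({"outlook": "sunny", "windy": "no"}, attrs, *stats)
```

After `scan_counts` returns, `classify` never touches the original rows again, so further predictions incur no additional I/O over the raw data.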


2018 ◽  
Vol 28 (3) ◽  
pp. 346-349 ◽  
Author(s):  
Doris G Gammon ◽  
Todd Rogers ◽  
Ellen M Coats ◽  
James M Nonnemaker ◽  
Lisa Henriksen

Objective: At least four varieties of little filtered cigars (LFCs) violate the US prohibition on flavoured cigarettes other than menthol. This study characterises the sales of prohibited products and other LFCs by flavour category and pack size, as well as the price of LFCs relative to cigarettes. Methods: Using retail sales data for 2016, we computed the sales volume in dollars and equivalent units and the percentage of total sales by flavour and pack size for the USA by region and state. Paired t-tests compared the prices for LFCs and cigarettes sold in same-sized packs and cartons. Results: LFC sales totalled 24 033 equivalent units per 100 000 persons in 2016. Flavoured LFC varieties accounted for almost half (47.5%) of the total sales. LFCs were sold in 12 different pack sizes, but 79.7% of sales were packs of 20. The price of 20-packs averaged $2.41 (SD=$1.49), which was significantly less than cigarettes (M=$5.90, SD=$0.85). Regional differences suggest a greater proportion of menthol/mint LFCs and lower prices in the South than in other regions. Conclusion: Classifying all LFCs as cigarettes would require that they be offered in a minimum package of 20, eliminate flavoured varieties other than menthol and increase prices through applicable state and local cigarette taxes.


2021 ◽  
Vol 13 (23) ◽  
pp. 4807
Author(s):  
Martin Sudmanns ◽  
Hannah Augustin ◽  
Lucas van der Meer ◽  
Andrea Baraldi ◽  
Dirk Tiede

Big optical Earth observation (EO) data analytics usually start from numerical, sub-symbolic reflectance values that lack inherent semantic information (meaning) and require interpretation. However, interpretation is an ill-posed problem that is difficult for many users to solve. Our semantic EO data cube architecture aims to implement computer vision in EO data cubes as an explainable artificial intelligence approach. Automatic semantic enrichment provides semi-symbolic spectral categories for all observations as an initial interpretation of color information. Users graphically create knowledge-based semantic models in a convergence-of-evidence approach, where color information is modelled a-priori as one property of semantic concepts, such as land cover entities. This differs from other approaches that do not use a-priori knowledge and assume a direct 1:1 relationship between reflectance values and land cover. The semantic models are explainable, transferable, reusable, and users can share them in a knowledgebase. We provide insights into our web-based architecture, called Sen2Cube.at, including semantic enrichment, data models, knowledge engineering, semantic querying, and the graphical user interface. Our implemented prototype uses all Sentinel-2 MSI images covering Austria; however, the approach is transferable to other geographical regions and sensors. We demonstrate that explainable, knowledge-based big EO data analysis is possible via graphical semantic querying in EO data cubes.


Author(s):  
Rosine Cicchetti ◽  
Lotfi Lakhal ◽  
Sébastien Nedjar ◽  
Noël Novelli ◽  
Alain Casali

Datacubes are especially useful for efficiently answering queries on data warehouses. Nevertheless, the amount of aggregated data generated is huge compared with the initial data, which is itself very large. Recent research has addressed the issue of summarizing Datacubes in order to reduce their size. In this chapter, we present three different approaches, each proposing a structure that reduces the size of the data cube representation. The first two, the closed cube and the quotient cube, are said to be semantic and discard the redundancies captured within data cubes. The size of the underlying representations is greatly reduced, but at the cost of additional response time when answering OLAP queries. The third approach is more syntactic, since it enforces an optimization at the logical level. It is called the Partition Cube and is based on the concept of partition. We also give an algorithm to compute it. We propose the Relational Partition Cube, a novel ROLAP cubing solution for managing Partition Cubes using relational technology. An analytical evaluation shows that Partition Cubes require less storage space than Datacubes. To confirm this analytical comparison, experiments compare our approach with Datacubes and with two of the best reduction methods, the Quotient Cube and the Closed Cube.


Author(s):  
Alfredo Cuzzocrea ◽  
Vincenzo Russo

The problem of ensuring the privacy and security of OLAP data cubes (Gray et al., 1997) arises in several fields, ranging from advanced Data Warehousing (DW) and Business Intelligence (BI) systems to sophisticated Data Mining (DM) tools. In DW and BI systems, decision-making analysts aim to prevent malicious users from accessing sensitive ranges of multidimensional data in order to infer sensitive knowledge, or from attacking corporate data cubes by violating user rules, grants and revokes. In DM tools, domain experts aim to prevent malicious users from inferring critical-for-the-task knowledge from authoritative DM results such as frequent item sets, patterns and regularities, clusters, and discovered association rules. In more detail, the former application scenario (i.e., DW and BI systems) deals with both the privacy preservation and the security of data cubes, whereas the latter (i.e., DM tools) deals solely with privacy-preserving OLAP issues. With respect to security, although the security aspects of information systems include a plethora of topics ranging from cryptography to access control and secure digital signatures, in our work we focus on access control techniques for data cubes, and refer the reader to the active literature for the other, orthogonal matters. Specifically, privacy preservation of data cubes refers to the problem of ensuring the privacy of data cube cells (and, in turn, that of queries defined over collections of data cube cells), i.e., hiding sensitive information and knowledge during data management activities, according to the general guidelines drawn by Sweeney in her seminal paper (Sweeney, 2002), whereas access control refers to the problem of ensuring the security of data cube cells, i.e., restricting the access of unauthorized users to specific sub-domains of the target data cube, according to well-known concepts studied and assessed in the context of DBMS security.
Nonetheless, it is easy to foresee that these two aspects, although distinct, should be meaningfully integrated in order to ensure both the privacy and security of complex data cubes, i.e., data cubes built on top of complex data/knowledge bases. In recent years, these topics have become of great interest to the Data Warehousing and Databases research communities, due to their exciting theoretical challenges as well as their relevance and practical impact in modern real-life OLAP systems and applications. On a more conceptual plane, theoretical work is mainly devoted to studying how probability and statistics schemes, as well as rule-based models, can be applied to efficiently solve the problems introduced above. On a more practical plane, researchers and practitioners aim to integrate convenient privacy-preserving and security solutions within the core layers of commercial OLAP server platforms. To tackle the resulting privacy preservation challenges in OLAP, researchers have proposed models and algorithms that can be roughly classified into two main classes: restriction-based techniques and data perturbation techniques. The former limit the number of query kinds that can be posed against the target OLAP server. The latter perturb data cells by means of random noise at various levels, ranging from schemas to queries. On the other hand, access control solutions in OLAP are mainly inspired by the wide literature on controlling access to DBMSs, and try to adapt such schemes to control access to OLAP systems.
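The two classes of privacy-preserving techniques can be contrasted with a toy sketch. The restriction shown here (refusing aggregate queries that cover too few cells) and the Gaussian noise model are assumptions chosen for illustration, and the function names are hypothetical; the techniques surveyed in the text may differ substantially.

```python
import random


def restricted_sum(cells, min_cells=3):
    """Restriction-based sketch: refuse aggregate queries over too few cells."""
    if len(cells) < min_cells:
        raise PermissionError("query covers too few cells to be answered safely")
    return sum(cells)


def perturbed_sum(cells, scale=1.0, rng=None):
    """Perturbation-based sketch: add zero-mean Gaussian noise to each cell before summing."""
    if rng is None:
        rng = random.Random()
    return sum(value + rng.gauss(0.0, scale) for value in cells)


# Made-up cell values of one data cube region.
region = [120.0, 80.0, 95.0]
exact = restricted_sum(region)
noisy = perturbed_sum(region, scale=5.0, rng=random.Random(0))
```

The restriction-based function either answers exactly or not at all, whereas the perturbation-based function always answers but only approximately; a larger `scale` hides individual cells better at the price of less accurate aggregates.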


2008 ◽  
pp. 3176-3193
Author(s):  
Ying Chen ◽  
Frank Dehne ◽  
Todd Eavis ◽  
A. Rau-Chaplin

This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor based on a novel optimized data partitioning technique. Since no shared disk is required, our method can be used for highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. Experiments show that our improved parallel method provides optimal, linear, speedup for at least 32 processors. The approach taken, which uses a ROLAP representation of the data cube, is well suited for large data warehouses and high dimensional data, and supports the generation of both fully materialized and partially materialized data cubes.


Data ◽  
2019 ◽  
Vol 4 (3) ◽  
pp. 94 ◽  
Author(s):  
Steve Kopp ◽  
Peter Becker ◽  
Abhijit Doshi ◽  
Dawn J. Wright ◽  
Kaixi Zhang ◽  
...  

Earth observation imagery has traditionally been expensive and difficult to find and access, and has required specialized skills and software to transform into actionable information. This has limited adoption by the broader science community. Changes in the cost of imagery and in computing technology over the last decade have enabled a new approach to organizing, analyzing, and sharing Earth observation imagery, broadly referred to as a data cube. The vision and promise of image data cubes is to lower these hurdles and expand the user community by making analysis-ready data readily accessible and providing modern approaches to more easily analyze and visualize the data, empowering a larger community of users to improve their knowledge of place and make better-informed decisions. Image data cubes are large collections of temporal, multivariate datasets typically consisting of analysis-ready multispectral Earth observation data. Several flavors and variations of data cubes have emerged. To simplify access for end users, we developed a flexible approach supporting multiple data cube styles, referencing images in their existing structure and storage location, enabling fast access, visualization, and analysis from a wide variety of web and desktop applications. We provide here an overview of that approach and three case studies.


Author(s):  
Maurizio Rafanelli

The term multidimensional aggregate data (MAD; see Rafanelli, 2003) generally refers to data in which a given fact is quantified by a set of measures obtained by applying a more or less complex aggregative function (count, sum, average, percent, etc.) to raw data; these measures are characterized by a set of variables, called dimensions. MAD can be modeled by different representations, depending on the application field that uses them. For example, some years ago this term referred essentially to statistical data, that is, data used essentially for socio-economic analysis. Recently, the metaphor of the data cube was taken up again and used for new applications, such as On-Line Analytical Processing (OLAP), which refers to aggregate and non-aggregate data for business analysis.

