A Data Cube Metamodel for Geographic Analysis Involving Heterogeneous Dimensions

2021 ◽  
Vol 10 (2) ◽  
pp. 87
Author(s):  
Jean-Paul Kasprzyk ◽  
Guénaël Devillet

Due to their multiple sources and structures, big spatial data require adapted tools to be efficiently collected, summarized and analyzed. For this purpose, data are archived in data warehouses and explored by spatial online analytical processing (SOLAP) through dynamic maps, charts and tables. Data are thus converted into data cubes characterized by a multidimensional structure on which exploration is based. However, multiple sources often lead to several data cubes defined by heterogeneous dimensions. In particular, dimension definitions can change depending on the analyzed scale, territory and time. In order to address these three issues specific to geographic analysis, this research proposes an original data cube metamodel defined in the unified modeling language (UML). Based on concepts such as common dimension levels and metadimensions, the metamodel can instantiate constellations of heterogeneous data cubes, allowing SOLAP to perform multiscale, multi-territory and time analysis. The metamodel is then implemented in a relational data warehouse and validated by an operational tool designed for a social economy case study. This tool, called “Racines”, gathers and compares multidimensional data about social economy businesses in Belgium and France through interactive cross-border maps, charts and reports. Thanks to the metamodel, users remain independent of IT specialists regarding data exploration and integration.
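
A minimal sketch of the idea of cubes sharing common dimension levels (all class and level names below are hypothetical placeholders, not the paper's UML metamodel):

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DimensionLevel:
    """One level of a dimension hierarchy (e.g. municipality < province < country)."""
    name: str

@dataclass
class Cube:
    """A fact table described by a measure and the dimension levels it is defined on."""
    name: str
    measure: str
    levels: List[DimensionLevel]

# Two heterogeneous cubes (different territories, different detailed levels) that
# share a common level; the shared level is what lets a SOLAP client compare them
# in a single cross-border analysis.
country = DimensionLevel("country")
be_municipality = DimensionLevel("belgian_municipality")
fr_commune = DimensionLevel("french_commune")

cube_be = Cube("social_economy_BE", "num_enterprises", [be_municipality, country])
cube_fr = Cube("social_economy_FR", "num_enterprises", [fr_commune, country])

common_levels = {l.name for l in cube_be.levels} & {l.name for l in cube_fr.levels}
print(common_levels)  # {'country'} -> the two cubes are comparable at that level
```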

2021 ◽  
Vol 73 (4) ◽  
pp. 1036-1047
Author(s):  
Felipe Menino Carlos ◽  
Vitor Conrado Faria Gomes ◽  
Gilberto Ribeiro de Queiroz ◽  
Felipe Carvalho de Souza ◽  
Karine Reis Ferreira ◽  
...  

The potential to perform spatiotemporal analysis of the Earth's surface, fostered by the large amount of Earth Observation (EO) open data provided by space agencies, brings new perspectives for creating innovative applications. Nevertheless, these big datasets pose challenges regarding storage and analytical processing capabilities. The organization of these datasets as multidimensional data cubes represents the state of the art in analysis-ready data for information extraction. EO data cubes can be defined as a set of time-series images associated with spatially aligned pixels along the temporal dimension. Some key technologies have been developed to take advantage of the data cube power. The Open Data Cube (ODC) framework and the Brazil Data Cube (BDC) platform provide capabilities to access and analyze EO data cubes. This paper introduces two new tools to facilitate the creation of land use and land cover (LULC) maps using EO data cubes and Machine Learning techniques, both built on top of ODC and BDC technologies. The first tool is a module that extends the ODC framework to lower the barriers to using Machine Learning (ML) algorithms with EO data. The second tool integrates the R package Satellite Image Time Series (sits) with ODC to enable the use of the data managed by the framework. Finally, water mask classification and LULC mapping applications are presented to demonstrate the processing capabilities of the tools.
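
A hedged sketch of the kind of workflow such tools streamline: load an EO data cube with the Open Data Cube Python API and fit a scikit-learn classifier on the pixels. It assumes a configured ODC instance; the product id, band names and training labels are hypothetical placeholders, and this is not the paper's module or the sits integration.

```python
import datacube
import numpy as np
from sklearn.ensemble import RandomForestClassifier

dc = datacube.Datacube(app="lulc-sketch")
ds = dc.load(
    product="landsat8_sr",                     # hypothetical product id
    x=(-46.8, -46.5), y=(-23.7, -23.4),
    time=("2020-01-01", "2020-12-31"),
    measurements=["red", "nir"],
)

# Flatten the (time, y, x) pixels into a feature matrix: one row per observation.
X = np.column_stack([ds["red"].values.ravel(), ds["nir"].values.ravel()])

y = np.load("training_labels.npy")             # hypothetical labels aligned with X

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
lulc = clf.predict(X).reshape(ds["red"].shape)  # back to a (time, y, x) class map
```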


Author(s):  
Alfredo Cuzzocrea

OnLine Analytical Processing (OLAP) research issues (Gray, Chaudhuri, Bosworth, Layman, Reichart & Venkatrao, 1997) such as data cube modeling, representation, indexing and management have traditionally attracted a lot of attention from the Data Warehousing research community. In this respect, the problem of efficiently compressing the data cube plays a leading role and has driven vibrant research activity in both the academic and industrial worlds over the last fifteen years. Basically, this problem consists in dealing with massive data cubes whose access and query evaluation costs are prohibitive. The widely accepted solution is to generate a compressed representation of the target data cube, with the goal of reducing its size (given an input space bound) while admitting a loss of information and an approximation that are considered irrelevant for OLAP analysis goals (e.g., see (Cuzzocrea, 2005a)). Compressed representations are also referred to in the literature as “synopsis data structures”, i.e., succinct representations of the original data cubes introducing a limited loss of information. The benefits of data cube compression can be summarized as a significant reduction of the computational overhead needed both to represent the data cube and to evaluate resource-intensive OLAP queries, a wide query class that extracts useful knowledge from huge multidimensional data repositories in the form of summarized information (e.g., aggregate information based on popular SQL aggregate operators such as SUM, COUNT and AVG), which would otherwise be infeasible by means of traditional OLTP approaches. Among such queries, we recall: (i) range queries, which extract a sub-cube bounded by a given range; (ii) top-k queries, which extract the k data cells (where k is an input parameter) having the highest aggregate values; and (iii) iceberg queries, which extract the data cells having aggregate values above a given threshold. This evidence has given rise to the proliferation of a number of Approximate Query Answering (AQA) techniques, which, based on data cube compression, aim at providing approximate answers to resource-intensive OLAP queries instead of computing exact answers, as decimal precision is usually negligible in OLAP query and report activities (e.g., see (Cuzzocrea, 2005a)). Starting from these considerations, in this article we first provide a comprehensive, rigorous survey of the main data cube compression techniques, i.e., histograms, wavelets, and sampling. Then, we complete our analytical contribution with a detailed theoretical review of the spatio-temporal complexities of these techniques.
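
Toy illustrations (pandas, synthetic data) of the three query classes listed above; the dimension names, measure and thresholds are arbitrary:

```python
import pandas as pd

cube = pd.DataFrame({
    "region": ["N", "N", "S", "S", "E"],
    "year":   [2020, 2021, 2020, 2021, 2021],
    "sales":  [120, 340, 80, 510, 260],
})

# (i) range query: a sub-cube bounded by a given range on the dimensions
range_q = cube[(cube["year"].between(2020, 2021)) & (cube["region"] == "S")]

# (ii) top-k query: the k cells with the highest aggregate value (here k = 2)
top_k = cube.groupby("region")["sales"].sum().nlargest(2)

# (iii) iceberg query: cells whose aggregate value exceeds a threshold
agg = cube.groupby(["region", "year"])["sales"].sum()
iceberg = agg[agg > 200]
```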


2003 ◽  
pp. 200-221 ◽  
Author(s):  
Mirek Riedewald ◽  
Divyakant Agrawal ◽  
Amr El Abbadi

Data cubes are ubiquitous tools in data warehousing, online analytical processing, and decision support applications. Based on a selection of pre-computed and materialized aggregate values, they can dramatically speed up aggregation and summarization over large data collections. Traditionally, the emphasis has been on lowering query costs with little regard to maintenance, i.e., update cost issues. We argue that current trends require data cubes to be not only query-efficient, but also dynamic at the same time, and we also show how this can be achieved. Several array-based techniques with different tradeoffs between query and update cost are discussed in detail. We also survey selected approaches for sparse data and the popular data cube operator, CUBE. Moreover, this work includes an overview of future trends and their impact on data cubes.
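
A small sketch of the query/update tradeoff discussed above, using one classic array-based technique (prefix sums): range-SUM queries become O(1), but a single cell update invalidates a large part of the materialized array, so maintenance is expensive. This is an illustrative example, not one of the chapter's specific methods.

```python
import numpy as np

cube = np.random.randint(0, 100, size=(6, 6))        # a tiny 2-D data cube
prefix = cube.cumsum(axis=0).cumsum(axis=1)           # materialized prefix sums

def range_sum(prefix, r1, c1, r2, c2):
    """SUM over cells [r1..r2] x [c1..c2] via inclusion-exclusion, in O(1)."""
    total = prefix[r2, c2]
    if r1 > 0:
        total -= prefix[r1 - 1, c2]
    if c1 > 0:
        total -= prefix[r2, c1 - 1]
    if r1 > 0 and c1 > 0:
        total += prefix[r1 - 1, c1 - 1]
    return total

assert range_sum(prefix, 1, 1, 4, 4) == cube[1:5, 1:5].sum()

# Updating one cell is cheap on the raw cube but costly on the prefix array:
cube[2, 3] += 7
prefix = cube.cumsum(axis=0).cumsum(axis=1)            # naive full rebuild
```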


2021 ◽  
Author(s):  
Mihal Miu ◽  
Xiaokun Zhang ◽  
M. Ali Akber Dewan ◽  
Junye Wang

Geospatial information plays an important role in environmental modelling, resource management, business operations, and government policy. However, very little or no commonality between the formats of various geospatial data has led to difficulties in utilizing the available geospatial information. These disparate data sources must be aggregated before further extraction and analysis can be performed. The objective of this paper is to develop a framework called PlaniSphere, which aggregates various geospatial datasets, synthesizes raw data, and allows for third-party customizations of the software. PlaniSphere uses NASA World Wind to access remote data and map servers using Web Map Service (WMS) as the underlying protocol, supporting a service-oriented architecture (SOA). The results show that PlaniSphere can aggregate and parse files that reside in local storage and conform to the following formats: GeoTIFF, ESRI shapefiles, and KML. Spatial data retrieved from the Internet using WMS can be combined into geospatial data sets (map data) from multiple sources, regardless of who the data providers are. The plug-in function of this framework can be extended for wider uses, such as aggregating and fusing geospatial data from different data sources, by providing customizations to serve future needs; by contrast, the capacity of the commercial ESRI ArcGIS software to add libraries and tools is limited by its closed-source architecture and proprietary data structures. The analysis and increasing availability of geo-referenced data may provide an effective way to manage spatial information by combining large-scale storage, multidimensional data management, and Online Analytical Processing (OLAP) capabilities in one system.
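
A minimal sketch of fetching a map image through WMS, the protocol PlaniSphere relies on; the endpoint URL and layer name are hypothetical placeholders, and this uses a plain HTTP request rather than the NASA World Wind client.

```python
import requests

WMS_ENDPOINT = "https://example.org/geoserver/wms"     # hypothetical server
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "demo:land_cover",                        # hypothetical layer
    "styles": "",
    "crs": "EPSG:4326",
    "bbox": "50.0,4.0,51.0,5.0",                        # lat/lon order for EPSG:4326 in WMS 1.3.0
    "width": 512,
    "height": 512,
    "format": "image/png",
}

resp = requests.get(WMS_ENDPOINT, params=params, timeout=30)
resp.raise_for_status()
with open("tile.png", "wb") as f:
    f.write(resp.content)                               # map image ready to overlay
```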


Web Mining ◽  
2011 ◽  
pp. 189-207
Author(s):  
Lixin Fu

Currently, data classification is performed either on data stored in relational databases or on data stored in flat files. The problem with these approaches is that, for large data sets, they often need multiple scans of the original data and are thus infeasible in many applications. In this chapter we propose to deploy classification on top of OLAP (online analytical processing) and data cube systems. First, we compute the statistics over various combinations of the attributes, known as data cubes. The statistics are then used to derive classification models. In this way, we scan the original data only once, which improves the performance of classification significantly. Furthermore, our new classifier provides “free” classification by eliminating the dominating I/O overhead of scanning the massive original data. An architecture that integrates database, data cube, and data mining is given, and three new cube-based classifiers are presented and evaluated.
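
A simplified sketch of the general idea: derive a classifier from pre-aggregated counts (a COUNT cube over attribute/class combinations filled in a single scan) instead of rescanning the raw records. This is a naive-Bayes-style estimate for illustration, not one of the chapter's three cube-based classifiers.

```python
import math
from collections import defaultdict

counts = defaultdict(int)        # (attribute, value, class) -> count
class_totals = defaultdict(int)  # class -> count

def absorb(record, label):
    """One pass over the data fills the count cube; no further scans are needed."""
    class_totals[label] += 1
    for attr, val in record.items():
        counts[(attr, val, label)] += 1

def predict(record, alpha=1.0):
    """Choose the class maximizing prior x smoothed conditional counts (log space)."""
    total = sum(class_totals.values())
    best, best_score = None, float("-inf")
    for c in class_totals:
        score = math.log(class_totals[c] / total)
        for attr, val in record.items():
            score += math.log((counts[(attr, val, c)] + alpha) /
                              (class_totals[c] + alpha))
        if score > best_score:
            best, best_score = c, score
    return best

absorb({"age": "young", "income": "low"}, "no")
absorb({"age": "young", "income": "high"}, "yes")
print(predict({"age": "young", "income": "high"}))   # -> "yes"
```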


Author(s):  
Alfredo Cuzzocrea

Data Warehousing (DW) systems store materialized views, data marts and data cubes, and provide convenient data exploration and analysis interfaces via OnLine Analytical Processing (OLAP) (Gray et al., 1997) and Data Mining (DM) tools and algorithms. In addition, OnLine Analytical Mining (OLAM) (Han, 1997) integrates these knowledge discovery methodologies and offers a meaningful convergence between OLAP and DM, thus significantly augmenting the data exploration and analysis capabilities of knowledge workers. At the storage layer, these knowledge discovery methodologies share the problem of efficiently accessing, querying and processing multidimensional data, which in turn heavily affects the performance of knowledge discovery processes at the application layer. Because OLAP and OLAM directly process data cubes/marts, and DM increasingly encompasses methodologies that target multidimensional data, the problem of efficiently representing data cubes by means of a carefully selected view set has become of particular interest for the Data Warehousing and OLAP research community. This problem is directly related to the analogous problem of efficiently computing the data cube from a given relational data source (Harinarayan et al., 1996; Agarwal et al., 1996; Sarawagi et al., 1996; Zhao et al., 1997). Given a relational data source R and a target data cube schema W, the view selection problem in OLAP deals with how to select and materialize views from R in order to compute the data cube A defined by the schema W, optimizing both the query processing time, denoted by TQ, which models the amount of time required to answer a reference query workload on the materialized view set, and the view maintenance time, denoted by TM, which models the amount of time required to maintain the materialized view set when updates occur, under a given set of constraints I that, without loss of generality, can be represented by a space bound constraint B limiting the overall occupancy of the views to be materialized (i.e., I = {B}). It has been demonstrated (Gupta, 1997; Gupta & Mumick, 2005) that this problem is NP-hard, so heuristic schemes are necessary. Heuristics are, in turn, implemented in the form of greedy algorithms (Yang et al., 1997; Kalnis et al., 2002). In this article, we focus on state-of-the-art methods for the view selection problem in Data Warehousing and OLAP, and complete our analytical contribution with a theoretical analysis of these proposals under different selected properties that model the spatial and temporal complexity aspects of the investigated problem.
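
A toy sketch of the greedy heuristic typically used for this NP-hard problem, in the spirit of Harinarayan et al. (1996): repeatedly materialize the view with the best benefit per unit of space until the space bound B is exhausted. View sizes and query-time savings are made-up numbers, and for simplicity the benefit of a view is treated as fixed (the full algorithm recomputes benefits as views are materialized).

```python
views = {
    # view name: (size, estimated saving in query time if materialized)
    "by_product":        (400, 900),
    "by_region":         (300, 700),
    "by_product_region": (900, 1500),
    "by_month":          (200, 350),
}

def greedy_select(views, space_bound):
    """Pick views by descending saving-per-unit-space until the bound B is hit."""
    selected, used = [], 0
    remaining = dict(views)
    while remaining:
        name, (size, saving) = max(remaining.items(),
                                   key=lambda kv: kv[1][1] / kv[1][0])
        del remaining[name]
        if used + size <= space_bound:
            selected.append(name)
            used += size
    return selected, used

print(greedy_select(views, space_bound=1000))
# (['by_region', 'by_product', 'by_month'], 900)
```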


Semantic Web ◽  
2021 ◽  
pp. 1-35
Author(s):  
Nurefşan Gür ◽  
Torben Bach Pedersen ◽  
Katja Hose ◽  
Mikael Midtgaard

Large volumes of spatial data and multidimensional data are being published on the Semantic Web, which has led to new opportunities for advanced analysis, such as Spatial Online Analytical Processing (SOLAP). The RDF Data Cube (QB) and QB4OLAP vocabularies have been widely used for annotating and publishing statistical and multidimensional RDF data. Although such statistical data sets might have spatial information, such as coordinates, the lack of spatial semantics and spatial multidimensional concepts in QB4OLAP and QB prevents users from employing SOLAP queries over spatial data using SPARQL. The QB4SOLAP vocabulary, on the other hand, fully supports annotating spatial and multidimensional data on the Semantic Web and enables users to query endpoints with SOLAP operators in SPARQL. To bridge the gap between QB/QB4OLAP and QB4SOLAP, we propose an RDF2SOLAP enrichment model that automatically annotates spatial multidimensional concepts with QB4SOLAP and in doing so enables SOLAP on existing QB and QB4OLAP data on the Semantic Web. Furthermore, we present and evaluate a wide range of enrichment algorithms and apply them on a non-trivial real-world use case involving governmental open data with complex geometry types.
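
A hedged sketch of what a SOLAP-style query over QB4SOLAP-annotated data might look like when sent from Python: an aggregation over observations restricted by a spatial predicate. The endpoint URL and the ex: property IRIs are hypothetical, and support for GeoSPARQL functions such as geof:sfWithin depends on the triple store.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/sparql")   # hypothetical endpoint
sparql.setQuery("""
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example.org/schema#>

SELECT ?region (SUM(?amount) AS ?total)
WHERE {
  ?obs a qb:Observation ;
       ex:amount   ?amount ;
       ex:region   ?region ;
       ex:geometry ?geom .
  FILTER(geof:sfWithin(?geom,
    "POLYGON((4 50, 5 50, 5 51, 4 51, 4 50))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>))
}
GROUP BY ?region
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["region"]["value"], row["total"]["value"])
```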


2016 ◽  
Vol 15 (02) ◽  
pp. 1650022
Author(s):  
Fatima Zahra Salmam ◽  
Mohamed Fakir ◽  
Rahhal Errattahi

Online analytical processing (OLAP) provides tools to explore data cubes in order to extract interesting information; it refers to techniques used to query, visualise and synthesise multidimensional data. Nevertheless, OLAP is limited to visualising, structuring and manually exploring data cubes. On the other side, data mining offers algorithms for automatic knowledge extraction, such as classification, explanation and prediction algorithms. However, OLAP is not capable of explaining and predicting events from existing data; a more efficient online analysis is therefore possible by coupling data mining and OLAP, so that the user is assisted in this new task of knowledge extraction. In this paper, we build on previous work in this area and propose extending the abilities of OLAP to prediction (enhancing OLAP abilities and techniques by introducing a predictive model based on data mining algorithms). The model is computed on the aggregated data, and prediction is done on detailed missing data. Our approach is based on regression trees and neural networks; it consists in predicting facts with missing measure values in the data cubes. The user will have at their disposal a new platform called PredCube, which offers the possibility to query, visualise and synthesise multidimensional data, and also to predict missing values in the data cube using three data mining methods, evaluating the quality of the prediction by comparing the average error and the execution time of each one.
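
A simplified sketch of the underlying idea (not PredCube itself): train a regression tree on facts whose measure is known and use it to fill in facts whose measure is missing. Column names and the synthetic data are placeholders.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

facts = pd.DataFrame({
    "store": [1, 1, 2, 2, 3, 3],
    "month": [1, 2, 1, 2, 1, 2],
    "sales": [100.0, 120.0, 80.0, None, 60.0, None],   # None = missing measure
})

known   = facts[facts["sales"].notna()]
missing = facts[facts["sales"].isna()]

model = DecisionTreeRegressor(max_depth=3).fit(known[["store", "month"]],
                                               known["sales"])
facts.loc[facts["sales"].isna(), "sales"] = model.predict(missing[["store", "month"]])
print(facts)   # the two missing measure values are now predicted
```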


2011 ◽  
pp. 124-148
Author(s):  
Yeow Wei Choong ◽  
Anne Laurent ◽  
Dominique Laurent

In the context of multidimensional data, OLAP tools are appropriate for navigating the data, aiming at discovering pertinent and abstract knowledge. However, due to the size of the data set, a systematic and exhaustive exploration is not feasible. Therefore, the problem is to design automatic tools that ease the navigation in the data and their visualization. In this chapter, we present a novel approach for automatically building blocks of similar values in a given data cube that are meant to summarize the content of the cube. Our method is based on a levelwise algorithm (à la Apriori) whose complexity is shown to be polynomial in the number of scans of the data cube. The experiments reported in the chapter show that our approach is scalable, in particular in the case where the measure values present in the data cube are discretized using crisp or fuzzy partitions.
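
A naive illustration of the goal (not the chapter's levelwise algorithm): discretize the measure values of a small 2-D cube into a crisp partition and greedily grow a rectangular block of cells sharing the same label from a seed cell.

```python
import numpy as np
import pandas as pd

measures = np.array([[12, 14, 47, 51],
                     [11, 13, 49, 52],
                     [78, 80, 81, 79]])
# Crisp partition of the measure values into three labels.
labels = np.asarray(
    pd.cut(measures.ravel(), bins=3, labels=["low", "mid", "high"])
).reshape(measures.shape)

def grow_block(labels, r, c):
    """Greedily extend a block right/down while all cells keep the seed's label."""
    target = labels[r, c]
    r2, c2 = r, c
    while c2 + 1 < labels.shape[1] and (labels[r:r2 + 1, c2 + 1] == target).all():
        c2 += 1
    while r2 + 1 < labels.shape[0] and (labels[r2 + 1, c:c2 + 1] == target).all():
        r2 += 1
    return (r, c, r2, c2), target

print(grow_block(labels, 0, 0))   # ((0, 0, 1, 1), 'low'): the upper-left block
```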

