A Methodology for Building XML Data Warehouses

2008 ◽  
pp. 530-555
Author(s):  
Laura Irina Rusu ◽  
J. Wenny Rahayu ◽  
David Taniar

Developing a data warehouse for XML documents involves two major processes: creating it, by processing raw XML documents into a specified data warehouse repository; and querying it, by applying techniques that better answer users' queries. This paper focuses on the first part, that is, identifying a systematic approach for building a data warehouse of XML documents, specifically for transferring data from an underlying XML database into a defined XML data warehouse. The proposed methodology for building XML data warehouses covers processes including data cleaning and integration, summarization, creating intermediate XML documents, updating/linking existing documents, and creating fact tables. In this paper, we also present a case study on how to put this methodology into practice. We utilise XQuery technology in all of the above processes.
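The abstract's transformations are expressed in XQuery; as a language-neutral illustration only, the summarization step can be sketched in Python with the standard library's xml.etree, turning raw sale documents into a small fact document. All element names here (sale, product, qty) are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the summarization step: aggregate raw XML
# sale records into a fact document. Element names are invented.
import xml.etree.ElementTree as ET
from collections import defaultdict

RAW = """
<sales>
  <sale><product>water</product><qty>3</qty></sale>
  <sale><product>water</product><qty>2</qty></sale>
  <sale><product>juice</product><qty>5</qty></sale>
</sales>
"""

def summarise(raw_xml: str) -> ET.Element:
    """Sum quantities per product and emit a fact document."""
    totals = defaultdict(int)
    for sale in ET.fromstring(raw_xml).iter("sale"):
        totals[sale.findtext("product")] += int(sale.findtext("qty"))
    facts = ET.Element("facts")
    for product, qty in sorted(totals.items()):
        fact = ET.SubElement(facts, "fact", product=product)
        fact.text = str(qty)
    return facts

facts = summarise(RAW)
print(ET.tostring(facts, encoding="unicode"))
```

A real pipeline following the paper would express the same grouping and aggregation as a FLWOR expression in XQuery rather than Python.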

Author(s):  
Hadj Mahboubi ◽  
Jérôme Darmont

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems (DBMSs) currently offer limited performance, so it is necessary to find ways to optimize them. In this chapter, the authors present two such techniques. First, they propose an XML join index that is specifically adapted to the multidimensional architecture of XML warehouses; it eliminates join operations while preserving the information contained in the original warehouse. Second, they present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, the authors measure the response time of a set of decision-support XQueries over an XML data warehouse, with and without their optimization techniques. The experimental results demonstrate the efficiency of these techniques, even when queries are complex and data are voluminous.
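The join-index idea can be illustrated independently of XML: fact-to-dimension joins are precomputed once, so later queries scan the index instead of re-joining. This is a generic sketch with invented data, not the authors' implementation.

```python
# Illustrative join index: precompute the fact-dimension join so that
# decision-support queries avoid join operations entirely.
facts = [
    {"id": 1, "prod_id": 10, "qty": 3},
    {"id": 2, "prod_id": 11, "qty": 5},
    {"id": 3, "prod_id": 10, "qty": 2},
]
products = {10: {"type": "water"}, 11: {"type": "juice"}}

# Build the index once: each fact id maps to its fully joined row.
join_index = {f["id"]: {**f, **products[f["prod_id"]]} for f in facts}

# A query now filters and aggregates over the index, no join needed.
water_total = sum(r["qty"] for r in join_index.values() if r["type"] == "water")
print(water_total)  # 5
```

The trade-off, as with any materialized structure, is index maintenance when facts or dimensions change.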


Author(s):  
Lars Frank ◽  
Christian Frank

A star schema data warehouse looks like a star, with a central, so-called fact table in the middle, surrounded by so-called dimension tables that have one-to-many relationships to the central fact table. Dimensions are called dynamic, or slowly changing, if the attributes or relationships of a dimension can be updated. Aggregations of fact data to the level of the related dynamic dimensions may be misleading if the fact data are aggregated without considering the changes of the dimensions. In this chapter, we first prove that the problems of slowly changing dimensions (SCD) in a data warehouse may be viewed as a special case of the read-skew anomaly that can occur when different transactions access and update records without concurrency control. That is, we prove that aggregating fact data to the levels of a dynamic dimension may not make sense. On the other hand, we also illustrate, by examples, that in some situations it does make sense to aggregate fact data to the levels of a dynamic dimension. That is, it is the semantics of the data that determine whether historical dimension data should be preserved or destroyed. Worse still, we illustrate that some applications need a history-preserving response while, at the same time, other applications need a history-destroying response. Kimball et al. (2002) described three classic solutions/responses for handling the aggregation problems caused by slowly changing dimensions. In this chapter, we describe and evaluate four more responses, of which one is new. This is important because the responses have very different properties, and it is not possible to select a best solution without knowing the semantics of the data.
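The contrast between history-destroying and history-preserving responses can be sketched with the two best-known Kimball responses: overwriting an attribute in place ("Type 1") versus appending a versioned row ("Type 2"). The data and field layout below are invented for illustration.

```python
# Sketch of two classic SCD responses on a toy dimension table.
# Rows are (business_key, attribute, valid_from) tuples; data invented.
from dataclasses import dataclass, field

@dataclass
class Dimension:
    rows: list = field(default_factory=list)

    def type1_update(self, key, attr):
        """Overwrite the attribute everywhere: history is destroyed."""
        self.rows = [(k, attr if k == key else a, v) for k, a, v in self.rows]

    def type2_update(self, key, attr, valid_from):
        """Append a new version: history is preserved."""
        self.rows.append((key, attr, valid_from))

dim = Dimension([("c1", "region-A", "2001")])
dim.type2_update("c1", "region-B", "2003")
# Both versions survive, so facts can be aggregated per valid period;
# a type1_update would instead rewrite history for every old fact.
print(dim.rows)
```

Which response is right depends, as the chapter argues, on the semantics of the data rather than on the mechanism itself.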


2017 ◽  
Vol 2 (1) ◽  
pp. 15
Author(s):  
Becky Yoose

The rise of evidence-based practices and assessment in libraries in recent years, combined with the tying of outcomes to future funding and resource allotments, has made libraries more reliant on patron data to determine how to allocate limited resources and funding. Libraries that want to use data for research and analysis, while also wanting to protect patron privacy, find themselves wondering how to balance these two priorities. This article explores The Seattle Public Library's attempt to strike a balance between patron privacy and data analysis through a data warehouse of de-identified patron data, as well as the implications of data warehouses and de-identification as an option for other libraries.
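A common de-identification pattern in this spirit replaces direct identifiers with salted hashes and generalises quasi-identifiers before loading the warehouse. The field names and steps below are a generic sketch, not The Seattle Public Library's actual pipeline.

```python
# Minimal de-identification sketch: pseudonymize the patron ID and
# generalise age into a band, keeping only the analytic value.
# Field names are hypothetical.
import hashlib

SALT = b"local-secret-salt"  # kept outside the warehouse

def deidentify(record: dict) -> dict:
    pseudonym = hashlib.sha256(SALT + record["patron_id"].encode()).hexdigest()[:12]
    return {
        "patron": pseudonym,                         # no raw identifier
        "age_band": f"{record['age'] // 10 * 10}s",  # generalised value
        "checkouts": record["checkouts"],            # analytic value kept
    }

row = deidentify({"patron_id": "P-0042", "age": 37, "checkouts": 12})
print(row)
```

Hashing alone is not sufficient against re-identification; real deployments also consider generalisation levels, suppression, and who holds the salt.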


2010 ◽  
pp. 865-886
Author(s):  
Pedro Furtado

Data warehouses are a crucial technology for competitive organizations in today's globalized world. Size, speed, and distributed operation are the major challenges facing those systems. Many data warehouses are huge yet must process queries quickly and efficiently, so parallel solutions are deployed to deliver the necessary performance. Distributed operation, on the other hand, concerns global commercial and scientific organizations that need to share their data in a coherent distributed data warehouse. In this article we review the major concepts, systems, and research results behind parallel and distributed data warehouses.
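The core mechanism behind parallel warehouse processing can be sketched in a few lines: hash-partition the fact table across nodes, aggregate each partition locally, then merge the cheap partial results. The "nodes" below are plain lists; a real system would distribute them across machines.

```python
# Sketch of partitioned parallel aggregation over a toy fact table.
from collections import Counter

facts = [("water", 3), ("juice", 5), ("water", 2), ("juice", 1)]
N_NODES = 2

# Partition step: route each fact to a node by hashing its key.
partitions = [[] for _ in range(N_NODES)]
for product, qty in facts:
    partitions[hash(product) % N_NODES].append((product, qty))

# Local aggregation on each node, then a global merge of partials.
partials = [Counter() for _ in range(N_NODES)]
for node, part in enumerate(partitions):
    for product, qty in part:
        partials[node][product] += qty
result = sum(partials, Counter())
print(dict(result))
```

Hash partitioning on the grouping key guarantees that each group lands on one node, which is what makes the final merge a simple union of disjoint partial sums.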


Author(s):  
Michael Aram ◽  
Felix Mödritscher ◽  
Gustaf Neumann ◽  
Monika Andergassen

E-assessment comprises a variety of activities in and beyond the classroom. However, traditional e-learning platforms support only part of assessment (e.g., individual and group assignments, the grading of such activities, and student record management). Typically, such platforms lack competency orientation or face performance issues due to increasing application complexity and usage intensity. To overcome these technical limitations and provide a basis for competency-based assessment, the authors present an analytics component inspired by data warehouses. The potential of this artifact is elaborated, and the improvements are evaluated through a case study of Learn@WU, the learning management system of WU Vienna. Although the focus was competency-based aggregation of learning results, early experiences show performance increases of 45% to 98% for retrieving simple grades. Sample scenarios demonstrate how to define and calculate indicators along activity hierarchies and competency graphs, enabling the measurement of learning performance along both generic indicators and competency-oriented assessment.
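Rolling an indicator up an activity hierarchy, as described above, amounts to a recursive aggregation over a tree. The hierarchy, scores, and averaging rule below are invented for illustration and are not taken from Learn@WU.

```python
# Hypothetical roll-up of a score indicator along an activity hierarchy.
hierarchy = {
    "course": ["unit1", "unit2"],
    "unit1": ["quiz1"],
    "unit2": ["quiz2", "quiz3"],
}
scores = {"quiz1": 80, "quiz2": 60, "quiz3": 100}  # leaf activities

def rollup(node):
    """Average the rolled-up scores of a node's children; leaves return
    their own score."""
    if node in scores:
        return scores[node]
    children = [rollup(c) for c in hierarchy[node]]
    return sum(children) / len(children)

print(rollup("course"))  # 80.0
```

In a warehouse-style analytics component, such roll-ups would be precomputed or cached per level rather than recomputed on every request, which is where the reported retrieval speedups come from.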


2012 ◽  
Vol 2 (1) ◽  
pp. 21-64
Author(s):  
Zurinahni Zainol ◽  
Bing Wang

Designing “good” XML documents is a difficult task for a database designer. Although many theories for XML database design have been proposed, no commercial design tool has been developed to assist the XML document designer. In this paper, the authors present a formal framework for XML document design that incorporates a conceptual model of XML schema, called Graph-Document Type Definition (G-DTD), with a theory of database normalization. This framework is designed as a blueprint to help XML database designers perform XML document schema design quickly and accurately. The G-DTD is used to describe the structure of XML documents at the schema level. A set of normal forms for G-DTD, based on rules proposed by Arenas and Libkin and by Lv et al., provides a guideline for a well-designed XML document schema. The authors develop a prototype of XML document schema design using the Z formal specification language. Finally, using a case study, the formal specification is validated to check the correctness and consistency of the specification. This gives confidence that the authors' prototype can be implemented successfully to generate an automatic XML document design.
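The kind of redundancy that XML normal forms rule out can be shown with a toy check: if one value functionally determines another, repeating the dependent value under every element invites inconsistency. This is only an illustration of the underlying idea; it is not G-DTD, and the element names are invented.

```python
# Toy detector for functional-dependency violations in an XML document:
# a determinant value mapped to more than one dependent value signals
# the redundancy/inconsistency that normalization is meant to prevent.
import xml.etree.ElementTree as ET

DOC = """
<enrolments>
  <enrolment><course>CS1</course><title>Databases</title></enrolment>
  <enrolment><course>CS1</course><title>Data bases</title></enrolment>
</enrolments>
"""

def fd_violations(xml_text, det, dep):
    """Return determinant values associated with >1 dependent value."""
    seen = {}
    for e in ET.fromstring(xml_text).iter("enrolment"):
        seen.setdefault(e.findtext(det), set()).add(e.findtext(dep))
    return {k: v for k, v in seen.items() if len(v) > 1}

print(fd_violations(DOC, "course", "title"))
```

A normalized design would store the course title once, under a course element, instead of repeating it per enrolment.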


Author(s):  
Michel Schneider

Basically, the schema of a data warehouse rests on two kinds of elements: facts and dimensions. Facts are used to record measures about situations or events. Dimensions are used to analyse these measures, particularly through aggregation operations (counting, summation, average, etc.). To fix ideas, consider the analysis of the sales in a shop according to product type and month of the year. Each sale of a product is a fact, which can be characterized by a quantity. An aggregation function can be calculated over the quantities of several facts; for example, one can sum the quantities sold for the product type “mineral water” during January in 2001, 2002, and 2003. Product type is a criterion of the Product dimension; Month and Year are criteria of the Time dimension. A quantity is thus connected both with a type of product and with a month of one year. This type of connection concerns the organization of facts with regard to dimensions. On the other hand, a month is connected to one year; this type of connection concerns the organization of criteria within a dimension. The possibilities of fact analysis depend on these two forms of connection and on the schema of the warehouse. This schema is chosen by the designer in accordance with the users' needs. Determining the schema of a data warehouse cannot be achieved without adequate modelling of dimensions and facts. In this article we present a general model for dimensions and facts and their relationships. This model greatly facilitates the choice of the schema and its manipulation by the users.
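The abstract's running example can be sketched directly: facts carry a quantity plus criteria from the Product and Time dimensions, and the aggregation sums quantities for product type "mineral water" in January of 2001 to 2003. The data values below are invented.

```python
# The mineral-water example as a tiny flat star schema (invented data).
facts = [
    {"ptype": "mineral water", "month": 1, "year": 2001, "qty": 10},
    {"ptype": "mineral water", "month": 1, "year": 2002, "qty": 7},
    {"ptype": "mineral water", "month": 2, "year": 2002, "qty": 4},
    {"ptype": "soda",          "month": 1, "year": 2003, "qty": 9},
]

# Sum of quantities for "mineral water" in January 2001-2003.
total = sum(
    f["qty"]
    for f in facts
    if f["ptype"] == "mineral water"
    and f["month"] == 1
    and f["year"] in (2001, 2002, 2003)
)
print(total)  # 17
```

The filter on ptype uses the fact-to-dimension connection, while the month/year pair reflects the criterion hierarchy inside the Time dimension.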


2019 ◽  
Vol 15 (2) ◽  
pp. 1-21 ◽  
Author(s):  
Sandro Bimonte ◽  
Omar Boussaid ◽  
Michel Schneider ◽  
Fabien Ruelle

In the era of Big Data, more and more stream data is available. At the same time, Decision Support System (DSS) tools, such as data warehouses and alert systems, have become more and more sophisticated, and conceptual modeling tools are consequently mandatory for successful DSS projects. Formalisms such as UML and ER have been widely used for classical information systems and data warehouse systems, but they have not yet been investigated for stream data warehouses that must deal with alert systems. Therefore, in this article, the authors introduce the notion of an Active Stream Data Warehouse (ASDW) and propose a UML profile for designing such warehouses. In particular, they extend the ICSOLAP profile to take into account continuous and window OLAP queries. Moreover, they study the duality of the stream and OLAP decision-making processes and propose a set of ECA rules to automatically trigger OLAP operators. The UML profile is implemented in a new OLAP architecture and validated using an environmental case study concerning wind monitoring.
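The Event-Condition-Action pattern over a stream window can be sketched as follows: each new reading is the event, a windowed aggregate forms the condition, and crossing a threshold fires the action (here, recording an alert that could stand in for triggering an OLAP operator). The threshold and wind-speed readings are invented for this wind-monitoring flavour.

```python
# ECA sketch over a sliding window of wind-speed readings (invented data).
from collections import deque

WINDOW, THRESHOLD = 3, 20.0
window = deque(maxlen=WINDOW)
alerts = []

def on_reading(speed):
    """Event: a new stream tuple arrives."""
    window.append(speed)
    avg = sum(window) / len(window)
    # Condition: the window is full and its average exceeds the threshold.
    if len(window) == WINDOW and avg > THRESHOLD:
        # Action: fire an alert (a real ASDW might trigger a drill-down).
        alerts.append(round(avg, 1))

for s in [12.0, 18.0, 25.0, 31.0]:
    on_reading(s)
print(alerts)
```

In the article's setting the action side would invoke OLAP operators over the warehouse rather than simply appending to a list.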

