Progressive Methods in Data Warehousing and Business Intelligence
Latest Publications


Total Documents: 16 (five years: 0)
H-Index: 2 (five years: 0)
Published by: IGI Global
ISBN: 9781605662329, 9781605662336

Author(s):  
Lars Frank ◽  
Christian Frank

A star schema data warehouse looks like a star, with a central, so-called fact table in the middle surrounded by so-called dimension tables that have one-to-many relationships to the central fact table. Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations of fact data to the level of the related dynamic dimensions may be misleading if the fact data are aggregated without considering the changes of the dimensions. In this chapter, we first prove that the problems of SCD (slowly changing dimensions) in a data warehouse may be viewed as a special case of the read skew anomaly that may occur when different transactions access and update records without concurrency control. That is, we prove that aggregating fact data to the levels of a dynamic dimension should not make sense. On the other hand, we also illustrate by examples that in some situations it does make sense to aggregate fact data to the levels of a dynamic dimension. That is, it is the semantics of the data that determine whether historical dimension data should be preserved or destroyed. Even worse, we also illustrate that some applications need a history-preserving response, while other applications at the same time need a history-destroying response. Kimball et al. (2002) have described three classic solutions/responses for handling the aggregation problems caused by slowly changing dimensions. In this chapter, we describe and evaluate four more responses, of which one is new. This is important because the responses have very different properties, and it is not possible to select the best solution without knowing the semantics of the data.
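For readers unfamiliar with the classic responses, the sketch below contrasts a history-destroying (Type 1 overwrite) and a history-preserving (Type 2 versioning) update of a dimension row. It is a minimal Python illustration, not code from the chapter; the table layout and column names are assumptions.

```python
from datetime import date

# Illustrative customer dimension; column names are assumptions, not from the chapter.
dim_customer = [
    {"customer_key": 1, "customer_id": "C42", "region": "North",
     "valid_from": date(2020, 1, 1), "valid_to": None, "current": True},
]

def scd_type1_update(rows, customer_id, new_region):
    """History-destroying response: overwrite the attribute in place."""
    for row in rows:
        if row["customer_id"] == customer_id and row["current"]:
            row["region"] = new_region

def scd_type2_update(rows, customer_id, new_region, change_date):
    """History-preserving response: close the current row and add a new version."""
    for row in rows:
        if row["customer_id"] == customer_id and row["current"]:
            row["valid_to"] = change_date
            row["current"] = False
            rows.append({
                "customer_key": max(r["customer_key"] for r in rows) + 1,
                "customer_id": customer_id, "region": new_region,
                "valid_from": change_date, "valid_to": None, "current": True,
            })
            break

scd_type2_update(dim_customer, "C42", "South", date(2021, 6, 1))
# Facts recorded before the change keep joining to the "North" version, so an
# aggregation by region reflects the region that was valid at transaction time;
# after scd_type1_update the old region would be overwritten and history destroyed.
```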


Author(s):  
Jorge Loureiro ◽  
Orlando Belo

OLAP queries are expected to have short answering times. Materialized cube views, a pre-aggregation and storage of group-by values, are one possible answer to that requirement. However, if all possible views were computed and stored, the necessary materialization time and storage space would be huge. Selecting the most beneficial set, based on the profile of the queries and observing constraints such as materialization space and maintenance time, a problem denoted the cube views selection problem, is the condition for an effective OLAP system, and it has a variety of solutions for centralized approaches. When a distributed OLAP architecture is considered, the problem gets bigger, as we must deal with another dimension: space. Besides the problem of selecting multidimensional structures, there is now also a node allocation problem; both are a condition for performance. This chapter focuses on recently introduced distributed OLAP systems, proposing evolutionary algorithms for the selection and allocation of the distributed OLAP cube, using a distributed linear cost model. This model uses an extended aggregation lattice as a framework to capture the distributed semantics, and introduces processing nodes' power and real communication cost parameters, allowing the estimation of query and maintenance costs in time units. Moreover, as we have an OLAP environment with several nodes, we will have parallel processing, and so the evaluation of the fitness of evolutionary solutions is based on cost estimation algorithms that simulate the execution of parallel tasks, using time units as the cost metric.
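To make the centralized version of the cube views selection problem concrete, here is a minimal greedy sketch over a tiny aggregation lattice with a linear cost model. The view sizes, the lattice, and the greedy benefit-per-space heuristic are illustrative assumptions; the chapter itself proposes evolutionary algorithms over a distributed cost model, which this sketch does not attempt to reproduce.

```python
# Tiny aggregation lattice: each view maps to its row count and to its ancestors
# (coarser views can be answered from finer ones). Numbers are illustrative.
SIZES = {"base": 1_000_000, "product": 50_000, "store": 10_000, "all": 1}
ANCESTORS = {"base": [], "product": ["base"], "store": ["base"],
             "all": ["base", "product", "store"]}

def query_cost(view, materialized):
    """Linear cost model: a query on `view` is answered from the smallest
    materialized view able to compute it (the view itself or an ancestor)."""
    candidates = [v for v in [view] + ANCESTORS[view] if v in materialized]
    return min(SIZES[v] for v in candidates)

def greedy_select(space_limit):
    """Materialize the views with the best benefit per unit of space."""
    materialized = {"base"}                  # the base cube is always kept
    space_used = SIZES["base"]
    while True:
        best, best_ratio = None, 0.0
        for v in SIZES:
            if v in materialized or space_used + SIZES[v] > space_limit:
                continue
            before = sum(query_cost(q, materialized) for q in SIZES)
            after = sum(query_cost(q, materialized | {v}) for q in SIZES)
            ratio = (before - after) / SIZES[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return materialized
        materialized.add(best)
        space_used += SIZES[best]

print(greedy_select(space_limit=1_100_000))
```

With a limit large enough to hold every view the greedy pass materializes them all; tighter limits force exactly the benefit-per-space trade-off that the chapter optimizes with evolutionary search, extended with node allocation and communication costs.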


Author(s):  
Dirk Draheim ◽  
Oscar Mangisengi

Nowadays, tracking data from activity checkpoints of unit transactions within an organization's business processes is becoming an important data resource for business analysts and decision-makers, providing essential strategic and tactical business information. In the context of business process-oriented solutions, business-activity monitoring (BAM) architecture has been predicted to be a major issue in the near future of the business-intelligence area. On the other hand, there is a huge potential for optimization of processes in today's industrial manufacturing. Important targets of improvement are production efficiency and product quality. Optimization is a complex task. A plethora of data that stems from numerical control and monitoring systems must be accessed, correlations in the information must be recognized, and rules that lead to improvement must be identified. In this chapter we envision the vertical integration of technical processes and control data with business processes and enterprise resource data. As concrete steps, we derive an activity warehouse model based on BAM requirements. We analyze different perspectives based on these requirements, such as business process management, key performance indicators, process- and state-based workflow management, and macro- and micro-level data. As a concrete outcome we define a meta-model for business processes with respect to monitoring. The implementation shows that data stored in an activity warehouse makes it possible to monitor business processes efficiently in real time and provides better real-time visibility of business processes.
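As a minimal illustration of the kind of checkpoint data an activity warehouse would store, the sketch below defines an activity event record and computes one micro-level indicator (the cycle time of a process instance). The field names and the indicator are assumptions for illustration, not the meta-model defined in the chapter.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative activity-checkpoint record; field names are assumptions,
# not the meta-model defined in the chapter.
@dataclass
class ActivityEvent:
    process_id: str      # business process instance
    activity: str        # checkpoint / activity name
    state: str           # e.g. "started" or "completed"
    timestamp: datetime

def cycle_time(events, process_id):
    """Example micro-level indicator: elapsed time of one process instance."""
    ts = [e.timestamp for e in events if e.process_id == process_id]
    return max(ts) - min(ts)

events = [
    ActivityEvent("P-1", "receive_order", "started",   datetime(2009, 1, 5, 9, 0)),
    ActivityEvent("P-1", "ship_order",    "completed", datetime(2009, 1, 5, 17, 30)),
]
print(cycle_time(events, "P-1"))   # 8:30:00
```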


Author(s):  
Maurizio Pighin ◽  
Lucio Ieronutti

Data warehouses are increasingly used by commercial organizations to extract, from a huge amount of transactional data, concise information useful for supporting decision processes. However, the task of designing a data warehouse and evaluating its effectiveness is not trivial, especially in the case of large databases and in the presence of redundant information. The meaning and the quality of the selected attributes heavily influence the data warehouse's effectiveness and the quality of the derived decisions. Our research is focused on interactive methodologies and techniques targeted at supporting data warehouse design and evaluation by taking into account the quality of the initial data. In this chapter we propose an approach for supporting data warehouse development and refinement, providing practical examples and demonstrating the effectiveness of our solution. Our approach is mainly based on two phases: the first is targeted at interactively guiding the attribute selection by providing quantitative information measuring different statistical and syntactical aspects of the data, while the second, based on a set of 3D visualizations, gives the opportunity to refine the design choices at run time according to data examination and analysis. To experiment with the proposed solutions on real data, we have developed a tool, called ELDA (EvaLuation DAta warehouse quality), that has been used to support data warehouse design and evaluation.
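The following sketch shows the flavor of the quantitative indicators that can guide attribute selection in the first phase, computing a null ratio and a distinct-value ratio per attribute. The two measures and the sample data are illustrative assumptions, not the actual metrics implemented in ELDA.

```python
# Minimal sketch of attribute-level indicators of the kind that can guide
# attribute selection; the two measures below are illustrative assumptions.
def attribute_indicators(rows, attribute):
    values = [r.get(attribute) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_ratio": 1 - len(non_null) / len(values),                 # completeness
        "distinct_ratio": len(set(non_null)) / max(len(non_null), 1),  # selectivity
    }

rows = [
    {"product": "A", "color": "red"},
    {"product": "B", "color": None},
    {"product": "C", "color": "red"},
]
print(attribute_indicators(rows, "color"))
# Attributes with high null ratios or near-constant values are weak candidates
# for dimensions or measures.
```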


Author(s):  
Laila Niedrite ◽  
Maris Solodovnikova Treimanis ◽  
Liga Grundmane

There are many methods in the area of data warehousing for defining requirements for the development of the most appropriate conceptual model of a data warehouse. There is no universal consensus about the best method, nor are there accepted standards for the conceptual modeling of data warehouses. Only a few conceptual models have formally described methods for obtaining these models. Therefore, problems arise when, in a particular data warehousing project, an appropriate development approach and a corresponding method for requirements elicitation should be chosen and applied. Sometimes it is also necessary not only to use the existing methods, but also to provide new methods that are usable in particular development situations. It is necessary to represent these new methods formally, to ensure their appropriate usage in similar situations in the future. It is also necessary to define the contingency factors that describe the situation where a method is usable. This chapter presents the usage of the method engineering approach for the development of conceptual models of data warehouses. A set of contingency factors that determine the choice between using an existing method and developing a new one is defined. Three case studies are presented. Three new methods, namely user-driven, data-driven, and goal-driven, are developed according to the situation in the particular projects and using the method engineering approach.


Author(s):  
Hamid Haidarian Shahri

Entity resolution (also known as duplicate elimination) is an important part of the data cleaning process, especially in data integration and warehousing, where data are gathered from distributed and inconsistent sources. Learnable string similarity measures are an active area of research for the entity resolution problem. Our proposed framework builds upon our earlier work on entity resolution, in which fuzzy rules and membership functions are defined by the user. Here, we exploit neuro-fuzzy modeling for the first time to produce a unique adaptive framework for entity resolution, which automatically learns and adapts to the specific notion of similarity at a meta-level. This framework encompasses much of the previous work on trainable and domain-specific similarity measures. Employing fuzzy inference, it removes the repetitive task of hard-coding a program based on a schema, which is usually required in previous approaches. In addition, our extensible framework is very flexible for the end user. Hence, it can be utilized in the production of an intelligent tool to increase the quality and accuracy of data.
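As a rough point of reference for the record matching discussed above, the following Python sketch combines per-field string similarities with hand-set weights and a threshold. The fields, weights, and threshold are illustrative assumptions; this hard-coded notion of similarity is exactly what the chapter's neuro-fuzzy framework is meant to learn and adapt instead.

```python
from difflib import SequenceMatcher

# Minimal duplicate-detection sketch: weighted per-field string similarities
# with a decision threshold. Fields, weights, and threshold are illustrative.
def field_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_similarity(r1, r2, weights):
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())

def is_duplicate(r1, r2, weights, threshold=0.8):
    return record_similarity(r1, r2, weights) >= threshold

r1 = {"name": "Jon Smith",  "city": "New York"}
r2 = {"name": "John Smith", "city": "NYC"}
print(is_duplicate(r1, r2, weights={"name": 0.8, "city": 0.2}))
# True with these illustrative weights and threshold.
```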


Author(s):  
Jorge Loureiro ◽  
Orlando Belo

Globalization and market deregulation have increased business competition, which has established OLAP data and technologies as one of the enterprise's great assets. Their growing use and size have stressed the underlying servers and forced new solutions. The distribution of multidimensional data over a number of servers allows storage and processing power to increase without an exponential increase in financial costs. However, this solution adds another dimension to the problem: space. Even in centralized OLAP, efficient cube selection is complex, but now we must also know where to materialize subcubes. We have to select and also allocate the most beneficial subcubes, attending to an expected (changing) user profile and constraints. We now have to deal with materialization space, processing power distribution, and communication costs. This chapter proposes new distributed cube selection algorithms based on discrete particle swarm optimizers; these algorithms solve the distributed OLAP selection problem for a query profile under space constraints, using discrete particle swarm optimization in its normal (Di-PSO), cooperative (Di-CPSO), and multi-phase (Di-MPSO) variants, and also applying hybrid genetic operators.
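For orientation, the sketch below shows a bare-bones binary (discrete) particle swarm optimizer applied to a toy "which subcube on which node" selection under a per-node space limit. The subcube sizes, the fitness function, and the sigmoid position update are illustrative assumptions; the chapter's Di-PSO, Di-CPSO, and Di-MPSO variants and their distributed cost model are considerably richer.

```python
import math
import random

# Toy instance: 4 candidate subcubes, 2 nodes, 500 storage units per node.
SIZES = [400, 250, 120, 60]
NODES, SPACE = 2, 500
DIM = NODES * len(SIZES)   # bit n*len(SIZES)+j == 1: subcube j materialized on node n

def fitness(bits):
    """Toy benefit: count materialized subcubes; infeasible layouts score -1."""
    score = 0
    for n in range(NODES):
        chosen = [SIZES[j] for j in range(len(SIZES)) if bits[n * len(SIZES) + j]]
        if sum(chosen) > SPACE:
            return -1
        score += len(chosen)
    return score

def discrete_pso(particles=10, iters=50):
    random.seed(0)
    pos = [[random.randint(0, 1) for _ in range(DIM)] for _ in range(particles)]
    vel = [[0.0] * DIM for _ in range(particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pos, key=fitness)[:]
    for _ in range(iters):
        for p in range(particles):
            for d in range(DIM):
                vel[p][d] += (random.random() * (pbest[p][d] - pos[p][d])
                              + random.random() * (gbest[d] - pos[p][d]))
                # sigmoid of the velocity = probability of setting the bit to 1
                pos[p][d] = 1 if random.random() < 1 / (1 + math.exp(-vel[p][d])) else 0
            if fitness(pos[p]) > fitness(pbest[p]):
                pbest[p] = pos[p][:]
            if fitness(pos[p]) > fitness(gbest):
                gbest = pos[p][:]
    return gbest, fitness(gbest)

print(discrete_pso())
```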


Author(s):  
Stefano Rizzi

In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence from implementation issues. This chapter focuses on a conceptual model called the DFM that suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.
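As a small, informal illustration of the multidimensional concepts such a conceptual model captures (facts, measures, dimension hierarchies, additivity), here is a Python sketch of a sales fact schema. The class and attribute names are assumptions for illustration only and do not reproduce the DFM notation.

```python
from dataclasses import dataclass
from typing import List

# Rough sketch of a few multidimensional concepts; names are illustrative.
@dataclass
class Measure:
    name: str
    additive_along: List[str]   # dimensions over which summing is meaningful

@dataclass
class Hierarchy:
    dimension: str
    levels: List[str]           # from finest to coarsest

@dataclass
class FactSchema:
    name: str
    measures: List[Measure]
    hierarchies: List[Hierarchy]

sales = FactSchema(
    name="SALES",
    measures=[Measure("quantity", additive_along=["date", "store", "product"]),
              Measure("unit_price", additive_along=[])],   # non-additive measure
    hierarchies=[Hierarchy("date", ["day", "month", "year"]),
                 Hierarchy("product", ["product", "category"])],
)
print(sales.name, [m.name for m in sales.measures])
```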


Author(s):  
Shi-Ming Huang ◽  
John Tait ◽  
Chun-Hao Su ◽  
Chih-Fong Tsai

Data warehousing is a popular technology that aims at improving decision-making ability. As a result of an increasingly competitive environment, many companies are adopting a "bottom-up" approach to constructing a data warehouse, since it is more likely to be on time and within budget. However, multiple independent data marts/cubes can easily cause problematic data inconsistency under anomalous update transactions, which leads to biased decision-making. This research focuses on solving the data inconsistency problem, proposing a temporal-based data consistency mechanism (TDCM) to maintain data consistency. From a relative time perspective, we use an active rule (a standard ECA rule) to monitor the user query event and use a metadata approach to record the related information. This builds relationships between the different data cubes and allows a user to define a VIT (valid interval temporal) threshold that identifies the interval of validity used to maintain data consistency. Moreover, we propose a consistency update method to update inconsistent data cubes, which ensures that all pieces of information are temporally consistent.
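The sketch below illustrates the basic idea behind a VIT threshold: a query spanning several cubes is considered consistent only if their last refresh times fall within a user-defined interval. The timestamps and the helper function are illustrative assumptions; in the chapter this logic is driven by ECA rules and metadata rather than a standalone function.

```python
from datetime import datetime, timedelta

# Minimal sketch of a valid-interval check between data cubes; values are illustrative.
def cubes_consistent(refresh_times, vit_threshold):
    """True if the refresh timestamps of all involved cubes fall within the VIT."""
    return max(refresh_times) - min(refresh_times) <= vit_threshold

refreshed_at = [datetime(2009, 3, 1, 2, 0),    # e.g. a sales data mart
                datetime(2009, 3, 1, 2, 45)]   # e.g. an inventory data mart
print(cubes_consistent(refreshed_at, vit_threshold=timedelta(hours=1)))   # True
# If the check fails, a consistency update would refresh the stale cube(s)
# before the query is answered.
```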


Author(s):  
Hanene Ben-Abdallah ◽  
Jamel Feki ◽  
Mounira Ben Abdallah

Despite their strategic importance, the widespread usage of decision support systems remains limited by both the complexity of their design and the lack of commercial design tools. This chapter addresses the design complexity of these systems. It proposes an approach for data mart design that is practical and that endorses the decision maker's involvement in the design process. This approach adapts a development technique well established in the design of various complex systems to the design of data marts (DM): pattern-based design. In the case of DM, a multidimensional pattern (MP) is a generic specification of analytical requirements within one domain. It is constructed and documented with standard, real-world entities (RWE) that describe information artifacts used or produced by the operational information systems (IS) of several enterprises. This documentation assists a decision maker in understanding the generic analytical solution; in addition, it guides the DM developer during the implementation phase. After overviewing our notion of MP and their construction method, this chapter details a reuse method composed of two adaptation levels: one logical and one physical. The logical level, which is independent of any data source model, allows a decision maker to adapt a given MP to their analytical requirements and to the RWE of their particular enterprise; this produces a DM schema. The physical level projects the RWE of the DM over the data source model. That is, the projection identifies the data source elements necessary to define the ETL procedures. We illustrate our approaches to the construction and reuse of MP with examples in the medical domain.
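As a rough illustration of the physical adaptation level, the sketch below projects the real-world entities of a pattern over a hypothetical data source to locate the columns needed by the ETL procedures. All table, entity, and attribute names are assumptions; they are not the patterns or medical examples used in the chapter.

```python
# Illustrative pattern RWEs, a hypothetical source schema, and a designer-chosen mapping.
pattern_rwe = {"PATIENT": ["id", "birth_date"], "STAY": ["admission_date", "ward"]}
source_tables = {"T_PATIENT": ["patient_id", "birth_date", "name"],
                 "T_ADMISSION": ["admission_date", "ward", "patient_id"]}
rwe_to_table = {"PATIENT": "T_PATIENT", "STAY": "T_ADMISSION"}

def project(pattern, sources, mapping):
    """For each RWE, keep the source columns matching its attributes (exactly or by suffix)."""
    return {rwe: [c for c in sources[mapping[rwe]]
                  if c in attrs or c.endswith(tuple(attrs))]
            for rwe, attrs in pattern.items()}

print(project(pattern_rwe, source_tables, rwe_to_table))
# The selected columns are the source elements the ETL procedures would draw from.
```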

