Incremental Load in a Data Warehousing Environment

Incremental load is an important factor in successful data warehousing. A lack of standardized incremental refresh methodologies can lead to poor analytical results, which can be unacceptable to an organization’s analytical community. Successful data warehouse implementation depends on consistent metadata as well as incremental data load techniques. If consistent load timestamps are maintained and efficient transformation algorithms are used, it is possible to refresh databases with complete accuracy and with little or no manual checking. This paper proposes an Extract-Transform-Load (ETL) metadata model that archives load observation timestamps and other useful load parameters. The author also recommends algorithms and techniques for incremental refreshes that enable table loading while ensuring data consistency and integrity and improving load performance. In addition to significantly improving the quality of incremental load techniques, these methods save a substantial amount of data warehouse system resources.

Insights into Advancements in Intelligent Information Technologies ◽

10.4018/978-1-4666-0158-1.ch009 ◽

2012 ◽

pp. 161-177

Author(s):

Nayem Rahman

Keyword(s):

Data Warehouse ◽

Data Warehousing ◽

Data Consistency ◽

Metadata Model ◽

Efficient Transformation ◽

Analytical Results ◽

Transformation Algorithms ◽

Complete Accuracy

Universal Data Warehousing Based on a Meta-Data Modeling Approach

International Journal of Cooperative Information Systems ◽

10.1142/s0218843003000772 ◽

2003 ◽

Vol 12 (03) ◽

pp. 325-363 ◽

Cited By ~ 2

Author(s):

Joseph Fong ◽

Qing Li ◽

Shi-Ming Huang

Keyword(s):

Data Warehouse ◽

Data Warehousing ◽

Object Oriented ◽

Data Models ◽

Heterogeneous Databases ◽

Materialized Views ◽

Prototype System ◽

Relational View ◽

Metadata Model ◽

Star Schema

A data warehouse contains vast amounts of data to support the complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for universal data warehousing with different data models. The frame metadata model represents the metadata of a data warehouse; it structures an application domain into classes and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema, catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between RDB and OODB is achieved by unloading the source database into a sequential file and reloading it into the target database, through which an object-relational view can be defined so that users can obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls, derived from the integrated schema. To validate our work, an application prototype system has been developed in a product sales data warehousing domain based on this approach.

O conceito de datawarehousing aplicado à gestão de informações em bibliotecas (The data warehousing concept applied to information management in libraries)

RDBCI Revista Digital de Biblioteconomia e Ciência da Informação ◽

10.20396/rdbci.v8i2.1936 ◽

2011 ◽

Vol 8 (2) ◽

pp. 114

Author(s):

Maurício Ferreira Santana

Keyword(s):

Decision Making ◽

Literature Review ◽

Data Warehouse ◽

Information Management ◽

Data Warehousing ◽

Historical Information ◽

Amount Of Information ◽

Analytical Results ◽

Decision Making Processes

This article proposes data warehouse architecture as a reference model for implementation in libraries. The proposal arises from concern about the large volume of information in these institutions at the operational, managerial, and strategic levels, and about finding an effective way to generate historical information on collections, customers (users), and costs for decision-making processes. Through a literature review of data warehouse architecture, the article presents the architecture proposed by Ralph Kimball, in a dimensional schema, taking the "acquisition" process as an example. Libraries are expected to be able to use this architecture to obtain analytical results similar to those of companies that already make use of this technology.

Designing a Framework to Standardize Data Warehouse Development Process for Effective Data Warehousing Practices

International Journal of Database Management Systems ◽

10.5121/ijdms.2016.8402 ◽

2016 ◽

Vol 8 (4) ◽

pp. 15-32

Author(s):

Deepak Asrani ◽

Renu Jain

Keyword(s):

Data Warehouse ◽

Development Process ◽

Data Warehousing

Weather Data Warehouse: An Agent-Based Data Warehousing System

Proceedings of the 38th Annual Hawaii International Conference on System Sciences ◽

10.1109/hicss.2005.681 ◽

2005 ◽

Cited By ~ 2

Author(s):

G. Kalra ◽

D. Steiner

Keyword(s):

Data Warehouse ◽

Data Warehousing ◽

Weather Data ◽

Agent Based

Optimizing ETL by a Two-Level Data Staging Method

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2016070103 ◽

2016 ◽

Vol 12 (3) ◽

pp. 32-50

Author(s):

Xiufeng Liu ◽

Nadeem Iftikhar ◽

Huan Huo ◽

Per Sieverts Nielsen

Keyword(s):

Data Warehouse ◽

Data Warehousing ◽

Data Extraction ◽

Data Staging ◽

Staging Area ◽

Level Data ◽

Different Types ◽

Operational Systems ◽

Early Late ◽

Central Data

In data warehousing, data from source systems are populated into a central data warehouse (DW) through extraction, transformation and loading (ETL). The standard ETL approach usually uses sequential jobs to process data with dependencies, such as dimension and fact data. Processing so-called early-/late-arriving data, which arrive out of order, is a non-trivial task. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data and fast-/slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention in the data warehouse. An empirical evaluation shows that the proposed method is more efficient and less intrusive than the standard ETL method.
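
One way to picture the handling of out-of-order data is to park fact rows whose dimension row has not yet arrived in a second staging area and retry them on the next run. The sketch below is a simplified illustration of that general idea, not the two-level method evaluated in the paper; the function `load_batch`, the `customer_id` key, and the in-memory lists standing in for staging areas and the warehouse are all hypothetical.

```python
def load_batch(new_facts, dimension_keys, pending, warehouse):
    """Load facts whose dimension key is known; return the facts that must wait."""
    still_pending = []
    # Retry previously parked facts first, then process the new batch.
    for fact in pending + new_facts:
        if fact["customer_id"] in dimension_keys:
            warehouse.append(fact)       # dimension row exists: safe to load
        else:
            still_pending.append(fact)   # early-arriving fact: keep it staged
    return still_pending


warehouse = []
dimension_keys = {"c1"}

# First run: the fact for "c2" arrives before its dimension row.
pending = load_batch(
    [{"customer_id": "c1", "amount": 5}, {"customer_id": "c2", "amount": 7}],
    dimension_keys,
    [],
    warehouse,
)
parked_after_first_run = len(pending)

# Second run: the "c2" dimension row has arrived, so the parked fact is loaded.
dimension_keys.add("c2")
pending = load_batch([], dimension_keys, pending, warehouse)
```

Because parked rows never touch the warehouse until their dependencies are satisfied, the loading step stays separate from extraction and transformation, which is the kind of decoupling the abstract describes.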

AN ARCHITECTURE FOR DATA WAREHOUSING SUPPORTING DATA INDEPENDENCE AND INTEROPERABILITY

International Journal of Cooperative Information Systems ◽

10.1142/s0218843001000394 ◽

2001 ◽

Vol 10 (03) ◽

pp. 377-397 ◽

Cited By ~ 8

Author(s):

LUCA CABIBBO ◽

RICCARDO TORLONE

Keyword(s):

Data Warehouse ◽

Data Model ◽

Data Warehousing ◽

Heterogeneous Data ◽

Multidimensional Data ◽

Multidimensional Databases ◽

Level Of Aggregation ◽

High Level ◽

Data Independence ◽

Logical Architecture

We report on the design of a novel architecture for data warehousing based on the introduction of an explicit "logical" layer to the traditional data warehousing framework. This layer serves to guarantee a complete independence of OLAP applications from the physical storage structure of the data warehouse and thus allows users and applications to manipulate multidimensional data ignoring implementation details. For example, it makes possible the modification of the data warehouse organization (e.g. MOLAP or ROLAP implementation, star scheme or snowflake scheme structure) without influencing the high level description of multidimensional data and programs that use the data. Also, it supports the integration of multidimensional data stored in heterogeneous OLAP servers. We propose [Formula: see text], a simple data model for multidimensional databases, as the reference for the logical layer. [Formula: see text] provides an abstract formalism to describe the basic concepts that can be found in any OLAP system (fact, dimension, level of aggregation, and measure). We show that [Formula: see text] databases can be implemented in both relational and multidimensional storage systems. We also show that [Formula: see text] can be profitably used in OLAP applications as front-end. We finally describe the design of a practical system that supports the above logical architecture; this system is used to show in practice how the architecture we propose can hide implementation details and provides a support for interoperability between different and possibly heterogeneous data warehouse applications.

Data warehousing organization: Infrastructural experimentation with educational governance

Organization ◽

10.1177/1350508418808233 ◽

2018 ◽

Vol 26 (4) ◽

pp. 537-552 ◽

Cited By ~ 3

Author(s):

Helene Ratner ◽

Christopher Gad

Keyword(s):

Data Warehouse ◽

Science And Technology Studies ◽

Data Warehousing ◽

Ethnographic Study ◽

Educational Governance ◽

Unfinished Business ◽

Organizational Effects ◽

Organizational Relations ◽

Partial Connection ◽

Governance Infrastructure

Organization is increasingly entwined with databased governance infrastructures. Developing the idea of ‘infrastructure as partial connection’ with inspiration from Marilyn Strathern and Science and Technology Studies, this article proposes that database infrastructures are intrinsic to processes of organizing intra- and inter-organizational relations. Seeing infrastructure as partial connection brings our attention to the ontological experimentation with knowing organizations through work of establishing and cutting relations. We illustrate this claim through a multi-sited ethnographic study of ‘The Data Warehouse’. ‘The Data Warehouse’ is an important infrastructural component in the current reorganization of Danish educational governance which makes schools’ performance public and comparable. We suggest that ‘The Data Warehouse’ materializes different, but overlapping, infrastructural experiments with governing education at different organizational sites enacting a governmental hierarchy. Each site can be seen as belonging to the same governance infrastructure but also as constituting ‘centres’ in its own right. ‘The Data Warehouse’ participates in the always-unfinished business of organizational world making and is made to (partially) relate to different organizational concerns and practices. This argument has implications for how we analyze the organizational effects of pervasive databased governance infrastructures and invites exploring their multiple organizing effects.

Issues and Handy Solutions Addressed at Every Stage in Real Time Data Warehousing, I.E. ETL (Extraction, Transformation & Loading)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e1100.0785s319 ◽

2019 ◽

Vol 8 (5S3) ◽

pp. 344-348

Keyword(s):

Real Time ◽

Data Warehouse ◽

Data Warehousing ◽

Time Data ◽

Processing Load ◽

Real Time Data ◽

Data Source

In standard ETL (Extract, Transform, Load), the data warehouse refresh must be performed outside of peak hours, which implies that operations and analysis are halted while it runs. As a result, the data in the data warehouse does not reflect the latest operational transactions. This issue is known as data latency. Near real-time data warehousing is employed as a remedy for this issue: it updates the data warehouse in a near real-time fashion, immediately after data is found in the data source, so data latency can be reduced. However, near real-time data warehousing has issues that were not identified in traditional ETL. This paper communicates the issues and available solutions at every stage of near real-time data warehousing, i.e. extraction, transformation, and loading. The issues and available alternatives are based on a literature review of additional studies that focus on near real-time data warehousing issues.

Data Warehousing Requirements Collection and Definition

Organizational Applications of Business Intelligence Management ◽

10.4018/978-1-4666-0279-3.ch018 ◽

2012 ◽

pp. 261-271

Author(s):

Nenad Jukic ◽

Miguel Velasco

Keyword(s):

Data Warehouse ◽

Large Scale ◽

Data Warehousing ◽

System Development ◽

Large Scale Data ◽

Typical Data ◽

Potential Risks ◽

Real Scenario ◽

The Impact ◽

Scale Data

Defining data warehouse requirements is widely recognized as one of the most important steps in the larger data warehouse system development process. This paper examines the potential risks and pitfalls within the data warehouse requirement collection and definition process. A real scenario of a large-scale data warehouse implementation is given, and details of this project, which ultimately failed due to an inadequate requirement collection and definition process, are described. The presented case underscores and illustrates the impact of the requirement collection and definition process on the data warehouse implementation. The case is analyzed within the context of the existing approaches, methodologies, and best practices for preventing and avoiding typical data warehouse requirement errors and oversights.
