Optimizing ETL by a Two-Level Data Staging Method

2016 ◽  
Vol 12 (3) ◽  
pp. 32-50
Author(s):  
Xiufeng Liu ◽  
Nadeem Iftikhar ◽  
Huan Huo ◽  
Per Sieverts Nielsen

In data warehousing, data from source systems are populated into a central data warehouse (DW) through extraction, transformation, and loading (ETL). The standard ETL approach usually uses sequential jobs to process data with dependencies, such as dimension and fact data. It is a non-trivial task to process so-called early-/late-arriving data, which arrive out of order. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data and fast-/slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention in the data warehouse. The paper evaluates the proposed method empirically, showing that it is more efficient and less intrusive than the standard ETL method.
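
To make the staging idea concrete, here is a minimal sketch (Python with in-memory SQLite; all table and column names are invented for illustration, not taken from the paper) of how an intermediate staging table decouples loading from extraction and transformation:

```python
# A minimal sketch of the two-level staging idea: transformed rows land in a
# staging table first, and an independent loader step moves them into the
# warehouse. Table names here are illustrative only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE staging_sales(order_id INT, product TEXT, amount REAL);
CREATE TABLE dw_fact_sales(order_id INT, product TEXT, amount REAL);
""")

def extract_transform(rows):
    # First level: extraction/transformation writes only to the staging area,
    # never directly to the warehouse tables.
    con.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", rows)

def load():
    # Second level: a separate loader drains the staging area into the
    # warehouse, so loading is decoupled from extraction and transformation.
    con.execute("INSERT INTO dw_fact_sales SELECT * FROM staging_sales")
    con.execute("DELETE FROM staging_sales")

extract_transform([(1, "widget", 9.5), (2, "gadget", 12.0)])
load()
print(con.execute("SELECT COUNT(*) FROM dw_fact_sales").fetchone()[0])  # 2
```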

Author(s):  
Xiufeng Liu ◽  
Huan Huo ◽  
Nadeem Iftikhar ◽  
Per Sieverts Nielsen

Data warehousing populates data from different source systems into a central data warehouse (DW) through extraction, transformation, and loading (ETL). Massive volumes of transaction data are routinely recorded in applications such as retail commerce, banking systems, and website management. Each transaction record carries a timestamp and the reference data relevant to that transaction. Processing transaction data with dependencies and at high velocity is a non-trivial task for a standard ETL process. This chapter presents a two-tiered segmentation approach for transaction data warehousing. The approach uses a so-called two-staging ETL method to process detailed records from operational systems, followed by a dimensional data process that populates a data store with a star or snowflake schema. The proposed approach is an all-in-one solution capable of processing fast-/slowly-changing data and early-/late-arriving data. The chapter evaluates the proposed method, and the results validate its effectiveness for processing transaction data.
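
As a rough illustration of one problem such an approach must handle, the following sketch shows a common way of dealing with early-arriving facts: rows whose dimension has not arrived yet are parked and retried later. The data structures and names are hypothetical, not taken from the chapter:

```python
# Hypothetical sketch: a fact that arrives before its dimension row is parked
# in a pending list (a stand-in for a staging area) and loaded once the
# dimension shows up.
dims = {}       # dimension business key -> surrogate key
facts = []      # loaded fact rows
pending = []    # early-arriving facts waiting for their dimension

def arrive_dimension(business_key, surrogate_key):
    dims[business_key] = surrogate_key
    retry_pending()

def arrive_fact(business_key, measure):
    if business_key in dims:
        facts.append((dims[business_key], measure))
    else:
        pending.append((business_key, measure))  # park until the dim arrives

def retry_pending():
    still_waiting = []
    for business_key, measure in pending:
        if business_key in dims:
            facts.append((dims[business_key], measure))
        else:
            still_waiting.append((business_key, measure))
    pending[:] = still_waiting

arrive_fact("P42", 10.0)    # fact arrives before its dimension
arrive_dimension("P42", 1)  # dimension arrives; the parked fact is loaded
print(facts)                # [(1, 10.0)]
```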


Author(s):  
Huanyu Ouyang ◽  
John Wang

A data warehouse (DW) is a complete intelligent data storage and information delivery or distribution solution that enables users to customize the flow of information through their organization (Inmon & Hackathorn, 2002). It provides all authorized members of an organization with flexible, secure, and rapid access to critical information and intelligent reporting. A DW can extract information from sources anywhere in the world and then deliver intelligence anywhere in the world. It connects to any platform, database, or data source, and scales to businesses and applications of any size. Data warehousing software (DWS) dates back to the 1970s, when the earliest systems were developed because the database designs of operational systems were not effective for information analysis and reporting (The Data Warehousing Information Center, 2006).


Author(s):  
Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part of building one is the Extract-Transform-Load (ETL) process. In the case of an academic data warehouse, the data sources come from the faculties' distributed databases; although these share a typical schema, they are not easy to integrate. This paper presents the ETL process in detail, following the data flow thread in the data staging area: identifying and profiling the data sources, analyzing the content of all tables in them, and then cleaning, conforming dimensions, and delivering data to the data warehouse. These processes run incrementally over each distributed data source until the data are merged. Dimension and fact tables are generated in a multidimensional model. The ETL tool used is Pentaho Data Integration 6.1. ETL testing is done by comparing the data source with the data target, and DW testing by comparing the analysis results of SQL queries with those of the Saiku Analytics plugin in the Pentaho Business Analytics Server.
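
The source-versus-target comparison used for ETL testing can be as simple as checking row counts between corresponding tables. The sketch below illustrates the idea, with invented table names and in-memory SQLite standing in for the real source and target databases:

```python
# Minimal ETL test sketch: compare row counts between a source table and the
# corresponding warehouse table. Table names and data are placeholders.
import sqlite3

src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.executescript(
    "CREATE TABLE student(id INT); INSERT INTO student VALUES (1),(2);")
tgt.executescript(
    "CREATE TABLE dim_student(id INT); INSERT INTO dim_student VALUES (1),(2);")

def row_count(con, table):
    # table names are trusted constants here, not user input
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

assert row_count(src, "student") == row_count(tgt, "dim_student"), \
    "ETL row-count mismatch between source and target"
print("ETL row-count check passed")
```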


Author(s):  
Jose Maria Cavero ◽  
Carmen Costilla ◽  
Esperanza Marcos ◽  
Mario G. Piattini ◽  
Adolfo Sanchez

Data warehousing and online analytical processing (OLAP) technologies have attracted growing interest in recent years. Specific issues such as conceptual modeling, schema translation from operational systems, and physical design have been widely treated. A few methodologies covering the entire development cycle have also been proposed, but there is still no generally accepted, complete methodology for data warehouse design. In this work we present a multidimensional data warehouse development methodology integrated within a traditional software development methodology.


Author(s):  
Evgenia R. Muntyan

The article analyzes a number of methods of knowledge formation using various graph models, including directed and undirected graphs with a single type of edge, and graphs with multiple, heterogeneous edge types. It shows how graphs can represent a three-level structure of knowledge in the field of complex technical systems modeling. In such a model, the first level forms data as unconnected graph vertices; the second level presents information as a connected undirected graph; and the third level captures knowledge as a set of graph paths. The proposed interpretation of the structure of knowledge opens new opportunities for the analytical study of knowledge and information, their properties, and their relationships.
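
A small sketch may help fix the idea: vertices alone model data, an undirected edge set models information, and the simple paths through the graph model knowledge. The example graph below is invented for illustration:

```python
# Three-level sketch: vertices = data, undirected edges = information,
# simple paths between vertices = knowledge. The example graph is made up.
vertices = {"sensor", "controller", "actuator", "log"}        # level 1: data

edges = {frozenset(e) for e in [("sensor", "controller"),     # level 2:
                                ("controller", "actuator")]}  # information

def neighbors(v):
    return {w for e in edges for w in e if v in e and w != v}

def paths(start, goal, seen=()):
    # level 3: knowledge as the set of simple paths through the graph
    if start == goal:
        yield seen + (goal,)
        return
    for n in neighbors(start):
        if n not in seen:
            yield from paths(n, goal, seen + (start,))

print(list(paths("sensor", "actuator")))
# [('sensor', 'controller', 'actuator')]
```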


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 510
Author(s):  
Taiyong Li ◽  
Duzhong Zhang

Image security is a hot topic in the era of the Internet and big data. Hyperchaotic image encryption, which can effectively prevent unauthorized users from accessing image content, has become increasingly popular in the image security community. In general, such approaches conduct encryption on pixel-level, bit-level, or DNA-level data, or their combinations, which limits the diversity of processed data levels and thus the achievable security. This paper proposes a novel hyperchaotic image encryption scheme via multiple bit permutation and diffusion, namely MBPD, to cope with this issue. Specifically, a four-dimensional hyperchaotic system with three positive Lyapunov exponents is first proposed. Second, a hyperchaotic sequence is generated from the proposed system for the subsequent encryption operations. Third, multiple bit permutation and diffusion (permutation and/or diffusion can be conducted with 1–8 or more bits), determined by the hyperchaotic sequence, is designed. Finally, the proposed MBPD is applied to image encryption. We conduct extensive experiments on a number of public test images to validate the proposed MBPD. The results verify that MBPD can effectively resist different types of attacks and performs better than the popular encryption methods used for comparison.
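
For intuition only, the following sketch shows chaos-driven bit permutation and diffusion on a byte string. It substitutes a one-dimensional logistic map for the paper's four-dimensional hyperchaotic system and works on a flat byte array; it is an illustrative stand-in, not the MBPD scheme itself:

```python
# Illustrative chaos-driven bit permutation + diffusion. The logistic map is a
# simple stand-in keystream source, NOT the paper's 4-D hyperchaotic system.
import numpy as np

def logistic_stream(x0, n, r=3.99):
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)   # chaotic iteration in (0, 1)
        xs[i] = x
    return xs

def encrypt(data, x0=0.3141592):
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    ks = logistic_stream(x0, bits.size)
    perm = np.argsort(ks)                    # permutation: chaos-driven shuffle
    key_bits = (ks > 0.5).astype(np.uint8)   # diffusion: XOR with chaotic bits
    return np.packbits(bits[perm] ^ key_bits).tobytes()

def decrypt(cipher, x0=0.3141592):
    bits = np.unpackbits(np.frombuffer(cipher, dtype=np.uint8))
    ks = logistic_stream(x0, bits.size)      # regenerate the same keystream
    perm = np.argsort(ks)
    key_bits = (ks > 0.5).astype(np.uint8)
    restored = np.empty_like(bits)
    restored[perm] = bits ^ key_bits         # undo diffusion, then permutation
    return np.packbits(restored).tobytes()

c = encrypt(b"secret image data")
assert decrypt(c) == b"secret image data"
```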


2003 ◽  
Vol 12 (03) ◽  
pp. 325-363 ◽  
Author(s):  
Joseph Fong ◽  
Qing Li ◽  
Shi-Ming Huang

A data warehouse contains vast amounts of data to support complex queries from various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for universal data warehousing across different data models. The frame metadata model represents the metadata of a data warehouse, structures an application domain into classes, and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema and catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between the RDB and the OODB is achieved by unloading the source database into a sequential file and reloading it into the target database; through this, an object-relational view can be defined that allows users to obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL queries, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls derived from the integrated schema. To validate our work, an application prototype system has been developed in a product sales data warehousing domain based on this approach.
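
The unload/reload step can be pictured with a small sketch: dump a source table to a sequential (CSV) file and reload it into the target database. Table, column, and file names here are made up for the example:

```python
# Sketch of unload/reload materialization: source table -> sequential file ->
# target database. An in-memory CSV buffer stands in for the sequential file.
import csv, io, sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript(
    "CREATE TABLE sales(product TEXT, qty INT);"
    "INSERT INTO sales VALUES ('widget', 3), ('gadget', 5);")
target.execute("CREATE TABLE sales(product TEXT, qty INT)")

buf = io.StringIO()  # stands in for the sequential file
csv.writer(buf).writerows(source.execute("SELECT product, qty FROM sales"))

buf.seek(0)
target.executemany("INSERT INTO sales VALUES (?, ?)",
                   ((p, int(q)) for p, q in csv.reader(buf)))
print(target.execute("SELECT * FROM sales").fetchall())
# [('widget', 3), ('gadget', 5)]
```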


2001 ◽  
Vol 10 (03) ◽  
pp. 377-397 ◽  
Author(s):  
Luca Cabibbo ◽  
Riccardo Torlone

We report on the design of a novel architecture for data warehousing based on the introduction of an explicit "logical" layer into the traditional data warehousing framework. This layer guarantees complete independence of OLAP applications from the physical storage structure of the data warehouse, and thus allows users and applications to manipulate multidimensional data while ignoring implementation details. For example, it makes it possible to modify the data warehouse organization (e.g., a MOLAP or ROLAP implementation, a star or snowflake schema structure) without affecting the high-level description of the multidimensional data or the programs that use them. It also supports the integration of multidimensional data stored in heterogeneous OLAP servers. We propose MD, a simple data model for multidimensional databases, as the reference model for the logical layer. MD provides an abstract formalism to describe the basic concepts found in any OLAP system (fact, dimension, level of aggregation, and measure). We show that MD databases can be implemented in both relational and multidimensional storage systems, and that MD can be profitably used as a front end in OLAP applications. We finally describe the design of a practical system that supports this logical architecture; the system is used to show in practice how the proposed architecture can hide implementation details and provide support for interoperability between different, possibly heterogeneous, data warehouse applications.
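
As a rough illustration, the basic concepts of such a logical model (dimensions with aggregation levels, facts with measures) might be represented as plain data structures. The layout below is an illustrative sketch, not the paper's formalism:

```python
# Illustrative data structures for the core OLAP concepts named in the
# abstract: dimension, level of aggregation, fact, and measure.
from dataclasses import dataclass

@dataclass
class Dimension:
    name: str
    levels: list[str]  # aggregation levels, ordered from finest to coarsest

@dataclass
class Fact:
    name: str
    dimensions: list[Dimension]
    measures: list[str]

time = Dimension("time", ["day", "month", "year"])
store = Dimension("store", ["store", "city", "country"])
sales = Fact("sales", [time, store], ["quantity", "revenue"])
print(sales)
```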

