Extract Transform Load (ETL) Process in Distributed Database Academic Data Warehouse

Author(s):  
Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part of building one is the Extract Transform Load (ETL) process. In the case of an academic data warehouse whose data sources are the faculties' distributed databases, integration is not straightforward even though the databases share a typical structure. This paper presents the ETL process in detail, following the Data Flow Thread in the data staging area: identifying and profiling the data sources, analyzing the content of all tables, and then cleaning, conforming dimensions, and delivering the data to the data warehouse. These steps run gradually over each distributed-database data source until the data are merged. Dimension tables and fact tables are generated in a multidimensional model. The ETL tool is Pentaho Data Integration 6.1. ETL testing is done by comparing the data source with the data target, and DW testing is conducted by comparing the analysis results of SQL queries with those of the Saiku Analytics plugin in the Pentaho Business Analytic Server.

Author(s):  
Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part is the Extract Transform Load (ETL) process. In the case of an academic data warehouse whose data come from the faculties' distributed databases, integration is not straightforward even though the databases share a typical structure. This paper presents an ETL process for a distributed-database academic data warehouse. Following the Data Flow Thread process in the data staging area, a deep analysis is performed to identify all tables in each data source, including content profiling. The cleaning, conforming, and data delivery steps then pour the different data sources into the data warehouse (DW). Since the DW is developed using Kimball's bottom-up multidimensional approach, we identify three types of extraction activities from the data source tables: merge, merge-union, and union. The cleaning and conforming step results in conformed dimensions created from data source analysis, refinement, and hierarchy structuring. The final ETL step loads the data into integrated dimension and fact tables with generated surrogate keys. These steps run gradually over each distributed-database data source until the data are incorporated. The technical activities in this distributed-database ETL process can generally be adopted in other industries, provided the designer has advanced knowledge of the structure and content of the data sources.
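As an illustration of the union-style extraction and surrogate-key generation described above, the following minimal sketch unions student rows from two faculty databases into one conformed dimension with generated surrogate keys. The table and column names (such as nim) are invented for illustration; the paper itself performs these steps in Pentaho Data Integration.

```python
# Minimal sketch: union extraction from distributed faculty databases
# into a conformed dimension with generated surrogate keys.
# Table and column names are hypothetical, not taken from the paper.

faculty_a_students = [
    {"nim": "1101", "name": "Andi",  "dept": "Informatics"},
    {"nim": "1102", "name": "Budi",  "dept": "Informatics"},
]
faculty_b_students = [
    {"nim": "2201", "name": "Citra", "dept": "Civil Engineering"},
]

def build_student_dimension(*sources):
    """Union rows from every source, de-duplicate on the natural key (nim),
    and assign a warehouse surrogate key."""
    dimension, seen = [], set()
    surrogate_key = 1
    for source in sources:
        for row in source:
            if row["nim"] in seen:          # already loaded from another faculty
                continue
            seen.add(row["nim"])
            dimension.append({"student_sk": surrogate_key, **row})
            surrogate_key += 1
    return dimension

for record in build_student_dimension(faculty_a_students, faculty_b_students):
    print(record)
```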


2014 ◽  
Vol 668-669 ◽  
pp. 1374-1377 ◽  
Author(s):  
Wei Jun Wen

ETL refers to the process of data extraction, transformation and loading and is deemed a critical step in ensuring the quality, specification and standardization of marine environmental data. Marine data, due to their complexity, field diversity and huge volume, still remain decentralized, multi-sourced and heterogeneous, with differing semantics, and hence are far from able to provide effective data sources for decision making. ETL enables the construction of a marine environmental data warehouse through the cleaning, transformation, integration, loading and periodic updating of basic marine data. The paper presents research on rules for the cleaning, transformation and integration of marine data, based on which an original ETL system for a marine environmental data warehouse is designed and developed. The system further guarantees data quality and correctness in future analysis and decision making based on marine environmental data.
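The kind of cleaning and transformation rules described above can be pictured with a small, purely illustrative sketch. The field names (station_id, temp, temp_unit, obs_date) and the rules themselves are assumptions, not taken from the paper.

```python
# Illustrative sketch only: example cleaning/transformation rules of the kind
# an ETL step for marine observation data might apply. Field names are hypothetical.
from datetime import datetime

raw_records = [
    {"station_id": "ST-01", "temp": "59.0", "temp_unit": "F", "obs_date": "2013/07/21"},
    {"station_id": "",      "temp": "18.2", "temp_unit": "C", "obs_date": "2013-07-22"},
    {"station_id": "ST-02", "temp": "17.5", "temp_unit": "C", "obs_date": "2013-07-22"},
]

def clean(record):
    """Apply cleaning rules; return None when a record violates a hard rule."""
    if not record["station_id"]:                      # rule: key fields must be present
        return None
    temp = float(record["temp"])
    if record["temp_unit"] == "F":                    # rule: standardise units to Celsius
        temp = (temp - 32) * 5 / 9
    date = record["obs_date"].replace("/", "-")       # rule: one canonical date format
    datetime.strptime(date, "%Y-%m-%d")               # validate the canonical format
    return {"station_id": record["station_id"], "temp_c": round(temp, 1), "obs_date": date}

cleaned = [r for r in (clean(rec) for rec in raw_records) if r is not None]
print(cleaned)
```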


2016 ◽  
Vol 12 (3) ◽  
pp. 32-50
Author(s):  
Xiufeng Liu ◽  
Nadeem Iftikhar ◽  
Huan Huo ◽  
Per Sieverts Nielsen

In data warehousing, the data from source systems are populated into a central data warehouse (DW) through extraction, transformation and loading (ETL). The standard ETL approach usually uses sequential jobs to process data with dependencies, such as dimension and fact data. It is a non-trivial task to process so-called early-/late-arriving data, which arrive out of order. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data and fast-/slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention in the data warehouse. The paper evaluates the proposed method empirically, showing that it is more efficient and less intrusive than the standard ETL method.
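A minimal sketch of the second-level staging idea, assuming a simple late-arriving-dimension scenario (all names and structures are illustrative, not the authors' implementation): early-arriving fact rows are parked in a staging structure and only released once their dimension member exists, so the data warehouse itself is never touched with incomplete keys.

```python
# Sketch: fact rows whose dimension member has not arrived yet are parked in a
# second staging level and released once the dimension row shows up.
# The names (customer_key, amount) are illustrative only.

dimension = {}          # natural key -> surrogate key
fact_table = []         # loaded facts
held_facts = []         # second-level staging area for early-arriving facts
next_sk = 1

def try_load_fact(fact):
    sk = dimension.get(fact["customer_key"])
    if sk is None:
        return False
    fact_table.append({"customer_sk": sk, "amount": fact["amount"]})
    return True

def load_fact_row(fact):
    if not try_load_fact(fact):
        held_facts.append(fact)      # early-arriving fact: park it, leave the DW alone

def load_dimension_row(natural_key):
    global next_sk
    if natural_key not in dimension:
        dimension[natural_key] = next_sk
        next_sk += 1
    held_facts[:] = [f for f in held_facts if not try_load_fact(f)]  # retry parked facts

load_fact_row({"customer_key": "C7", "amount": 120})   # arrives before its dimension row
load_dimension_row("C7")                               # late-arriving dimension member
print(fact_table, held_facts)
```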


Author(s):  
Nouha Arfaoui ◽  
Jalel Akaichi

The healthcare industry generates a huge amount of data that is underused for decision-making needs because of the absence of a specific design mastered by healthcare actors and the lack of collaboration and information exchange between institutions. In this work, a new approach is proposed to design the schema of a Hospital Data Warehouse (HDW). It starts by generating the schemas of the Hospital Data Marts (HDM), one for each department, taking into consideration the requirements of the healthcare staff and the existing data sources. It then merges them to build the schema of the HDW. The bottom-up approach is suitable because the healthcare departments operate separately. To merge the schemas, a new schema integration methodology is used. It starts by extracting the similar elements of the schemas and the conflicts between them and presents them as mapping rules. It then transforms the rules into queries and applies them to merge the schemas.
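A hedged sketch of the merge idea, with invented department schemas and a naive name-based matching rule (the authors' methodology is more elaborate): shared and conflicting elements are collected as mapping rules, and the schemas are then merged.

```python
# Hypothetical sketch: find elements shared by two data-mart schemas, record them
# as mapping rules, then build one merged schema. Names are invented for illustration.

cardiology_dm = {"Patient": {"patient_id", "name", "birth_date"},
                 "Stay":    {"stay_id", "patient_id", "admission_date"}}
radiology_dm  = {"Patient": {"patient_id", "name", "insurance_no"},
                 "Exam":    {"exam_id", "patient_id", "modality"}}

def build_mapping_rules(schema_a, schema_b):
    """One rule per entity name found in both schemas, listing shared and conflicting attributes."""
    return [{"entity": entity,
             "shared":   schema_a[entity] & schema_b[entity],
             "conflict": schema_a[entity] ^ schema_b[entity]}
            for entity in schema_a.keys() & schema_b.keys()]

def merge_schemas(schema_a, schema_b):
    """Union of entities and attributes; conflicts flagged by the rules are left to the designer."""
    merged = {entity: set(attrs) for entity, attrs in schema_a.items()}
    for entity, attrs in schema_b.items():
        merged.setdefault(entity, set()).update(attrs)
    return merged

print(build_mapping_rules(cardiology_dm, radiology_dm))
print(merge_schemas(cardiology_dm, radiology_dm))
```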


Author(s):  
Komang Budiarta ◽  
Putu Agung Ananta Wijaya ◽  
Cokorde Gede Indra Partha

College accreditation by BAN-PT is one of the parameters for determining the quality of universities in Indonesia. To reach the BAN-PT standard, a study program or college carries out its own self-evaluation process against the standards set by BAN-PT. Carrying out the self-evaluation process requires data sources to be used as the basis for assessing each criterion. In most study programs, the data are spread across different information systems and physical documents, which takes more time and effort to integrate and interpret. A data warehouse plays an important role in collecting the scattered data and turning them into information. The data warehouse is populated through an ETL process used to integrate, extract, clean, transform and load the data into the data warehouse. With an academic data warehouse at STIMIK STIKOM Bali, executives can more easily obtain the information needed to support accreditation standard three, and the warehouse can serve as a reference in decision making.


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse to be able to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, and, at the same time, be resilient to constant changes in the structure of the data sources. One of the main problems related to DW development is the absence of a standardized DW data model. In this paper a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, the data vault model, the anchor model and the dimensional model) is given. On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM), which more adequately fulfills the posed requirements, is presented.
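To make the contrast concrete, the following purely illustrative sketch (not drawn from the paper) shows the same Customer concept as a dimensional-model dimension table and as data-vault-style hub and satellite structures; the column names are hypothetical.

```python
# Illustrative contrast only: one business concept in two DW data models.

# Dimensional model: one wide, denormalised dimension row.
dim_customer = [
    {"customer_sk": 1, "customer_no": "C7", "name": "Ana", "city": "Belgrade"},
]

# Data vault style: the business key lives in a hub, descriptive attributes in a
# satellite that is appended to whenever the source changes, so history and
# structural change are absorbed without rewriting existing rows.
hub_customer = [
    {"customer_hk": "h1", "customer_no": "C7", "load_ts": "2020-01-01"},
]
sat_customer = [
    {"customer_hk": "h1", "name": "Ana", "city": "Belgrade", "load_ts": "2020-01-01"},
    {"customer_hk": "h1", "name": "Ana", "city": "Novi Sad", "load_ts": "2021-06-15"},
]

print(dim_customer[0])
print(hub_customer[0], sat_customer[-1])
```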


Author(s):  
Robert Wrembel

A data warehouse architecture (DWA) has been developed for the purpose of integrating data from multiple heterogeneous, distributed, and autonomous external data sources (EDSs), as well as for providing means for advanced analysis of the integrated data. The major components of this architecture include: an external data source (EDS) layer, an extraction-transformation-loading (ETL) layer, a data warehouse (DW) layer, and an on-line analytical processing (OLAP) layer. Methods of designing a DWA, research developments, and most of the commercially available DW technologies tacitly assume that a DWA is static. In practice, however, a DWA requires changes as a result of, among other things, the evolution of EDSs, changes in the real world represented in a DW, and new user requirements. Changes in the structures of EDSs impact the ETL, DW, and OLAP layers. Since such changes are frequent, developing a technology for handling them automatically or semi-automatically in a DWA is of high practical importance. This chapter discusses challenges in designing, building, and managing a DWA that supports the evolution of the structures of EDSs, the evolution of an ETL layer, and the evolution of a DW. The challenges and their solutions presented here are based on the experience of building a prototype Evolving-ETL and a prototype Multiversion Data Warehouse (MVDW). In detail, this chapter presents the following issues: the concept of the MVDW, an approach to querying the MVDW, an approach to handling the evolution of an ETL layer, a technique for sharing data between multiple DW versions, and two index structures for the MVDW.
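A minimal sketch of cross-version querying in the spirit of the MVDW, assuming a hypothetical per-version attribute mapping (this is not the authors' implementation): each DW version keeps its own column names, and a query reconciles them through the mapping before aggregating across versions.

```python
# Hedged sketch: two DW versions whose fact-table column names differ, queried
# together via a per-version attribute mapping. All names and data are invented.

dw_versions = {
    "v1": {"rows": [{"prod": "A", "qty": 10}, {"prod": "B", "qty": 4}],
           "mapping": {"product": "prod", "quantity": "qty"}},
    "v2": {"rows": [{"product_id": "A", "quantity_sold": 7}],
           "mapping": {"product": "product_id", "quantity": "quantity_sold"}},
}

def total_quantity_by_product(versions):
    """Aggregate a logical measure across DW versions using each version's mapping."""
    totals = {}
    for version in versions.values():
        prod_col = version["mapping"]["product"]
        qty_col = version["mapping"]["quantity"]
        for row in version["rows"]:
            totals[row[prod_col]] = totals.get(row[prod_col], 0) + row[qty_col]
    return totals

print(total_quantity_by_product(dw_versions))   # {'A': 17, 'B': 4}
```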


Author(s):  
Ping Yi ◽  
Songling Zhang

This paper introduces applications of the Dempster–Shafer (D-S) data fusion technique in transportation system decision making. D-S inference is a statistics-based data classification technique that can be used when data sources contribute discontinuous and incomplete information and no single data source can produce an overwhelmingly high probability of certainty for identifying the most probable event. The technique captures and combines the information contributed by the data sources by using Dempster's rule to find the conjunction of the events and to determine the highest associated probability. The D-S theory is explained and its implementation described through numerical examples of a ride-hailing service and of crowd management at a subway station. Results from the applications show that the technique is very effective in dealing with incomplete information and multiple data sources in the era of big data.
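Dempster's rule of combination, referred to above, can be implemented in a few lines. The sketch below uses made-up mass assignments for two traffic sensors over a small frame of discernment; it is not the paper's case-study data.

```python
# Minimal sketch of Dempster's rule of combination for two mass functions.
from itertools import product

def combine(m1, m2):
    """Dempster's rule: m(A) = sum over B∩C=A of m1(B)*m2(C), divided by (1 - K),
    where K is the total mass assigned to conflicting (empty) intersections."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mb * mc
        else:
            conflict += mb * mc
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}

# Two sensors reporting belief over the frame {congested, free_flow} (made-up numbers).
m_sensor1 = {frozenset({"congested"}): 0.6,
             frozenset({"congested", "free_flow"}): 0.4}
m_sensor2 = {frozenset({"congested"}): 0.5,
             frozenset({"free_flow"}): 0.3,
             frozenset({"congested", "free_flow"}): 0.2}

print(combine(m_sensor1, m_sensor2))   # most mass ends up on {congested}
```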


2011 ◽  
Vol 1 (1) ◽  
Author(s):  
Payal Pahwa ◽  
Shweta Taneja ◽  
Shalini Jain

A data warehouse is a single repository of data which includes data generated from various operational systems. Conceptual modeling is an important concept in the successful design of a data warehouse. The Unified Modeling Language (UML) has become a standard for object modeling during the analysis and design steps of software system development. The paper proposes an object-oriented approach to model the process of data warehouse design. The hierarchies of each data element can be explicitly defined, thus highlighting the data granularity. We propose a UML multidimensional model using various data sources based on UML schemas. We present a conceptual-level integration framework for diverse UML data sources on which OLAP operations can be performed. Our integration framework takes into account the benefits of UML (its concepts, relationships and extended features), which is closer to the real world and can model even complex problems easily and accurately. Two steps are involved in our integration framework. The first is to convert UML schemas into UML class diagrams. The second is to build a multidimensional model from the UML class diagrams. The paper focuses on the transformations used in the second step. We describe how to represent a multidimensional model using a UML star or snowflake diagram with the help of a case study. To the best of our knowledge, we are the first to represent a UML snowflake diagram that integrates heterogeneous UML data sources.
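As a rough illustration of the second step (building a multidimensional model from class-diagram-style input), the following sketch maps a hypothetical measure-bearing class to a fact table and its associated classes to dimensions. The class names and the mapping rule are simplifications, not the authors' transformations.

```python
# Hypothetical sketch: derive a star schema from a simplified class-diagram
# description. The rule "the measure-bearing class becomes the fact table,
# every associated class becomes a dimension" is an illustrative simplification.

uml_classes = {
    "Sale":    {"attributes": ["amount", "discount"], "associations": ["Product", "Store", "Date"]},
    "Product": {"attributes": ["product_id", "name", "category"], "associations": []},
    "Store":   {"attributes": ["store_id", "city"], "associations": []},
    "Date":    {"attributes": ["date_id", "month", "year"], "associations": []},
}

def to_star_schema(classes, fact_class):
    fact = {"table": f"fact_{fact_class.lower()}",
            "measures": classes[fact_class]["attributes"],
            "foreign_keys": []}
    dimensions = []
    for assoc in classes[fact_class]["associations"]:
        dimensions.append({"table": f"dim_{assoc.lower()}",
                           "attributes": classes[assoc]["attributes"]})
        fact["foreign_keys"].append(f"{assoc.lower()}_key")
    return fact, dimensions

fact, dims = to_star_schema(uml_classes, "Sale")
print(fact)
for d in dims:
    print(d)
```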

