Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines

Author(s): Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae Kim, Ali R. Butt, Min Li, ...
2016 ◽ Vol 12 (3) ◽ pp. 32-50
Author(s): Xiufeng Liu, Nadeem Iftikhar, Huan Huo, Per Sieverts Nielsen

In data warehousing, data from source systems is populated into a central data warehouse (DW) through extraction, transformation, and loading (ETL). The standard ETL approach usually uses sequential jobs to process data with dependencies, such as dimension and fact data. Processing the so-called early-/late-arriving data, which arrives out of order, is a non-trivial task. This paper proposes a two-level data staging area to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data and fast-/slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention in the data warehouse. The paper evaluates the proposed method empirically and shows that it is more efficient and less intrusive than the standard ETL approach.
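A two-level staging flow of the kind the abstract describes might be sketched as follows. This is a minimal, hedged illustration, not the paper's implementation: all table and field names (`dim_customer`, `fact_sales`, `cust`, `amount`) are assumptions introduced here. Level 1 buffers raw extracted rows, possibly out of order; level 2 holds transformed, load-ready rows, so loading is decoupled from extraction and transformation.

```python
# Illustrative sketch of a two-level staging area (names assumed, not
# from the paper). Late-arriving facts wait in level 1 until their
# dimension row shows up, so the loading step never sees unresolved keys.

dim_customer = {}   # dimension: business key -> surrogate key
fact_sales = []     # fact table in the warehouse
stage_level1 = []   # level-1 staging: raw, possibly out-of-order rows
stage_level2 = []   # level-2 staging: transformed, load-ready rows

def extract(rows):
    """Extraction only appends to level 1; it never blocks on loading."""
    stage_level1.extend(rows)

def transform():
    """Resolve dimension rows first, then move resolvable facts to
    level 2; late-arriving facts stay in level 1 for the next pass."""
    facts = []
    for row in stage_level1:
        if row["type"] == "dim":
            # early-arriving dimension: assign a surrogate key now
            dim_customer.setdefault(row["cust"], len(dim_customer) + 1)
        else:
            facts.append(row)
    pending = []
    for row in facts:
        if row["cust"] in dim_customer:
            stage_level2.append({"cust_sk": dim_customer[row["cust"]],
                                 "amount": row["amount"]})
        else:
            pending.append(row)   # late-arriving fact: retry next pass
    stage_level1[:] = pending

def load():
    """Loading is decoupled: it touches only level 2 and the warehouse."""
    fact_sales.extend(stage_level2)
    stage_level2.clear()
```

In this sketch a fact that arrives before its dimension simply stays in level 1 across transform passes; the DW itself is never patched after the fact, which is the "less intrusive" property the abstract claims for the extra staging level.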


Author(s): Ardhian Agung Yulianto

While a data warehouse is designed to support the decision-making function, the most time-consuming part of building one is the Extract Transform Load (ETL) process. In the case of an academic data warehouse, the data sources are the faculties' distributed databases; although these share a typical schema, they are not easy to integrate. This paper presents the detailed ETL process, following the data flow thread through the data staging area: identifying and profiling the data sources, analyzing the content of all source tables, and then cleaning the data, conforming dimensions, and delivering data to the data warehouse. These processes run gradually over each distributed data source until the data are merged. Dimension tables and fact tables are generated in a multidimensional model. The ETL tool used is Pentaho Data Integration 6.1. ETL testing is done by comparing the data source and the data target, and DW testing is conducted by comparing analysis results between SQL queries and the Saiku Analytics plugin in Pentaho Business Analytic Server.
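The source-versus-target comparison style of ETL testing mentioned above can be illustrated with a short sketch. This is not the paper's actual test harness: the grouping key and measure names (`faculty`, `credits`) are assumptions chosen for the academic-data setting. The idea is that after loading, row counts and a summed measure per group must agree between the data source and the data target.

```python
# Illustrative sketch (names assumed, not from the paper) of ETL testing
# by reconciling the data source against the data target after loading.

def summarize(rows, key, measure):
    """Group rows by `key`, returning per-group (row count, measure total)."""
    out = {}
    for r in rows:
        cnt, tot = out.get(r[key], (0, 0))
        out[r[key]] = (cnt + 1, tot + r[measure])
    return out

def reconcile(source_rows, target_rows, key, measure):
    """ETL test passes iff source and target agree on counts and totals."""
    return (summarize(source_rows, key, measure)
            == summarize(target_rows, key, measure))
```

A dropped or duplicated row changes either the per-group count or the total, so the reconciliation fails; this mirrors the abstract's comparison of source and target, while the separate SQL-versus-Saiku comparison validates the analysis layer on top.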


2020 ◽ Vol 19 (2) ◽ pp. 211
Author(s): Yusuke Sawa, Chieko Tamura, Toshio Ikeuchi, Tetsuo Shimada, Kaoru Fujii, ...

2009 ◽ Vol 46 (1) ◽ pp. 71-82
Author(s): Min Zhou, Onkar Sahni, H. Jin Kim, C. Alberto Figueroa, Charles A. Taylor, ...

Author(s): Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, ...
