ETL workflow
Recently Published Documents


TOTAL DOCUMENTS: 14 (FIVE YEARS: 2)

H-INDEX: 2 (FIVE YEARS: 0)

2021 ◽ Vol 17 (4) ◽ pp. 29-47
Author(s): Bruno Oliveira ◽ Óscar Oliveira ◽ Orlando Belo

Because extract-transform-load (ETL) is a complex and evolving process, development teams must carefully and rigorously design logging strategies that extract the most value from the information generated by events in the ETL workflow. Efficient logging strategies must be structured so that metrics, logs, and alerts provide insights about the system beyond their troubleshooting role. This paper presents a configurable and flexible ETL component for creating logging mechanisms in ETL workflows. It follows a pattern-oriented approach to abstract ETL activities and to map them to physical primitives that commercial ETL tools can interpret.
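As a rough illustration of what a configurable logging wrapper around an ETL activity could look like, the following Python sketch emits a log event, an optional row-count metric, and a slow-step alert for each run. It is not the component described in the paper: `LogConfig`, `logged_step`, and all field names are hypothetical and chosen only for this example.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LogConfig:
    """Hypothetical logging settings; none of these names come from the paper."""
    level: int = logging.INFO        # level used for routine step events
    record_row_counts: bool = True   # emit a rows_out metric per step
    slow_step_seconds: float = 5.0   # runtime above this triggers a warning "alert"


def logged_step(name: str,
                func: Callable[[Iterable], Iterable],
                config: LogConfig,
                logger: logging.Logger) -> Callable[[Iterable], List]:
    """Wrap one ETL activity so every run emits log, metric, and alert events."""
    def wrapper(rows: Iterable) -> List:
        start = time.perf_counter()
        try:
            result = list(func(rows))
        except Exception:
            # Log the failure with its traceback, then re-raise for the caller.
            logger.exception("step=%s status=failed", name)
            raise
        elapsed = time.perf_counter() - start
        message = f"step={name} status=ok seconds={elapsed:.3f}"
        if config.record_row_counts:
            message += f" rows_out={len(result)}"
        logger.log(config.level, message)
        if elapsed > config.slow_step_seconds:
            logger.warning("step=%s status=slow seconds=%.3f", name, elapsed)
        return result
    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("etl")
    # A toy transform step: trim whitespace from every incoming value.
    trim = logged_step("trim", lambda rows: (r.strip() for r in rows), LogConfig(), log)
    trim(["  a ", "b  "])
```

Keeping the settings in a single configuration object means the same wrapper can be reused across steps with different verbosity or alert thresholds, which is one plausible reading of "configurable" here.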



Author(s): N Saranya ◽ R Brindha ◽ N Aishwariya ◽ R Kokila ◽ P Matheswaran ◽ ...


Author(s): Syed Muhammad Fawad Ali ◽ Johannes Mey ◽ Maik Thiele

Today’s ETL tools let developers write custom code as user-defined functions (UDFs) to extend the expressiveness of the standard ETL operators. However, while this makes it easy to add new functionality, it carries the risk that the custom code is not written to be optimized (e.g., by parallelism) and therefore performs poorly in data-intensive ETL workflows. In this paper we present a novel framework that lets the ETL developer choose a design pattern for writing parallelizable code and that generates a configuration for executing the UDFs in a distributed environment. This enables ETL developers with minimal expertise in distributed and parallel computing to develop UDFs without handling parallelization configurations and complexities. We perform experiments on large-scale datasets based on TPC-DS and BigBench. The results show that our approach significantly reduces the effort of ETL developers while generating efficient parallel configurations that support complex and data-intensive ETL tasks.
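To make the idea of a parallelizable design pattern concrete, here is a minimal Python sketch of a "map" pattern: the developer writes a row-level UDF, and a small runner partitions the input and applies the UDF to each partition concurrently. This is not the framework from the paper. The names `partition` and `run_map_udf` are hypothetical, and a local thread pool is assumed as a stand-in for a distributed runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(rows: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Split rows into at most n_parts contiguous chunks, preserving order."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def run_map_udf(udf: Callable[[T], R], rows: Sequence[T], n_parts: int = 4) -> List[R]:
    """Apply a row-level UDF to every row, one worker per partition."""
    parts = partition(rows, n_parts)
    # Assumption: a thread pool stands in for the distributed executor.
    with ThreadPoolExecutor(max_workers=len(parts) or 1) as pool:
        # pool.map yields results in input order, so row order is preserved.
        mapped = list(pool.map(lambda part: [udf(row) for row in part], parts))
    return [row for part in mapped for row in part]


if __name__ == "__main__":
    # The UDF author only writes the per-row logic.
    print(run_map_udf(lambda price: round(price * 1.2, 2), [10.0, 20.0, 30.0]))
```

The point of the pattern is the separation of concerns: the UDF holds only per-row logic with no shared state, so the partition count and executor can change without touching the UDF itself.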



2017 ◽ Vol 20 (1) ◽ pp. 21-43
Author(s): Artur Wojciechowski


Author(s): Saifur Rehman Malik ◽ Azra Shamim ◽ Zanib Bibi ◽ Sajid Ullah Khan ◽ Shabir Ahmad Gorsi


2012 ◽ Vol 16 (3) ◽ pp. 453-471
Author(s): Naiqiao Du ◽ Xiaojun Ye ◽ Jianmin Wang


2011 ◽ pp. 111-135
Author(s): Alkis Simitsis ◽ Panos Vassiliadis ◽ Spiros Skiadopoulos ◽ Timos Sellis

In the early stages of a data warehouse project, the designers/administrators must decide on the design and deployment of the back-stage architecture. The options are (a) using a commercial ETL tool or (b) developing an in-house ETL prototype. Both options have advantages and disadvantages, but in both the design and modeling of the ETL workflows have the same characteristics. This chapter outlines the main challenges, issues, and problems in building ETL workflows, both to help designers/administrators decide which solution better suits their data warehouse project and to help them construct an efficient, robust, and evolvable ETL workflow that implements the refreshment of their warehouse.


