Optimization of ETL Process in Data Warehouse Through a Combination of Parallelization and Shared Cache Memory

2016 ◽  
Vol 6 (6) ◽  
pp. 1241-1244 ◽  
Author(s):  
M. Faridi Masouleh ◽  
M. A. Afshar Kazemi ◽  
M. Alborzi ◽  
A. Toloie Eshlaghy

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Creating ETL processes is potentially one of the greatest tasks in building a data warehouse, and it is a time-consuming and complicated procedure. Without optimization of these processes, the implementation of data warehouse projects is costly, complicated and time-consuming. The present paper combines parallelization methods with shared cache memory in distributed systems built on a data warehouse. In the conducted assessment, the proposed method exhibited a 7.1% improvement in ETL execution time over the Kettle optimization tool and 7.9% over the Talend tool. Therefore, parallelization can notably improve the ETL process, ultimately allowing the management and integration of big data to be carried out simply and at an acceptable speed.
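
As a rough illustration of the combination described above (not the authors' implementation), the following sketch parallelizes the transform phase of an ETL job across worker processes while sharing a lookup cache between them; the dimension-lookup logic, field names, and cache contents are hypothetical.

```python
# A minimal sketch (not the paper's implementation): parallelize the
# transform phase of an ETL job and share a lookup cache across workers.
from multiprocessing import Pool, Manager

def transform_row(args):
    row, cache = args
    # Hypothetical dimension lookup: reuse cached surrogate keys so that
    # every worker benefits from lookups already resolved by the others.
    key = row["customer_id"]
    if key not in cache:
        cache[key] = sum(ord(c) for c in key) % 10_000  # stand-in for a DB lookup
    return {**row, "customer_sk": cache[key]}

def run_etl(rows, workers=4):
    with Manager() as manager:
        shared_cache = manager.dict()            # cache shared across processes
        with Pool(workers) as pool:
            transformed = pool.map(transform_row,
                                   [(r, shared_cache) for r in rows])
    return transformed                            # ready for the load phase

if __name__ == "__main__":
    extracted = [{"customer_id": f"C{i % 50}", "amount": i} for i in range(200)]
    print(len(run_etl(extracted)), "rows transformed")
```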

2020 ◽  
Vol 12 (1) ◽  
pp. 1-24
Author(s):  
Khaled Dehdouh ◽  
Omar Boussaid ◽  
Fadila Bentayeb

In the Big Data warehouse context, a column-oriented NoSQL database system is considered a storage model that is highly adapted to data warehouses and online analysis. Indeed, NoSQL models make data scalability easy, and the columnar store is well suited to storing and managing massive data, especially for decisional queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software, such as Hive or Kylin, which provide a CUBE operator to build data cubes. With these tools, however, the cube is built according to a row-oriented approach, which does not fully exploit the benefits of a column-oriented one. In this article, the focus is on defining a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed way in which the data warehouse is stored.
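
The abstract does not give the MC-CUBE algorithm itself, but the general MapReduce cube idea it builds on can be sketched as follows, assuming hypothetical dimensions and a single-machine stand-in for the map and reduce phases.

```python
# A simplified, single-machine sketch of the MapReduce cube idea that
# MC-CUBE builds on: the map step emits every combination of dimension
# values (with "ALL" for rolled-up dimensions), and the reduce step
# aggregates the measure for each combination.
from itertools import combinations
from collections import defaultdict

DIMENSIONS = ("region", "product", "year")   # hypothetical dimensions

def map_cube(row):
    dims = tuple(row[d] for d in DIMENSIONS)
    for r in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), r):
            key = tuple(dims[i] if i in kept else "ALL"
                        for i in range(len(dims)))
            yield key, row["sales"]

def reduce_cube(rows):
    cube = defaultdict(float)
    for row in rows:
        for key, measure in map_cube(row):
            cube[key] += measure
    return cube

facts = [
    {"region": "EU", "product": "laptop", "year": 2015, "sales": 100.0},
    {"region": "US", "product": "laptop", "year": 2015, "sales": 250.0},
]
print(reduce_cube(facts)[("ALL", "laptop", "ALL")])   # 350.0
```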


2017 ◽  
Vol 37 (4) ◽  
Author(s):  
Sarah K. Wooller ◽  
Graeme Benstead-Hume ◽  
Xiangrong Chen ◽  
Yusuf Ali ◽  
Frances M.G. Pearl

Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse ‘big data’ that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications.


Author(s):  
Francesco Di Tria ◽  
Ezio Lefons ◽  
Filippo Tangorra

Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purposes. Examples of this kind of data source are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted to define multidimensional schemas in the presence of fast-changing situations or even undefined business requirements. In this paper, we propose a design methodology that adopts agile and automatic approaches in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analysis, since the underlying architecture is based on a virtual data warehouse that does not require an importing phase. Examples of application of the methodology are presented throughout the paper to show the validity of this approach compared to a traditional one.
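
As a loose illustration of the "virtual data warehouse without an importing phase" idea (the authors' actual architecture is not reproduced here), the sketch below computes aggregations on demand directly against a source-access function; the class, field names, and source are hypothetical.

```python
# A minimal illustration (not the authors' methodology) of the "virtual"
# idea: the fact data are never imported; aggregations are computed on
# demand by reading the source each time they are requested.
class VirtualFactTable:
    def __init__(self, fetch_source):
        self.fetch_source = fetch_source   # callable returning source rows

    def aggregate(self, group_by, measure):
        totals = {}
        for row in self.fetch_source():            # data read in place,
            key = tuple(row[d] for d in group_by)  # no loading phase
            totals[key] = totals.get(key, 0) + row[measure]
        return totals

# The source could stand in for a web feed, a blog crawl, or a sensor stream.
def fetch_from_source():
    return [{"topic": "etl", "mentions": 3},
            {"topic": "olap", "mentions": 5},
            {"topic": "etl", "mentions": 2}]

cube = VirtualFactTable(fetch_from_source)
print(cube.aggregate(group_by=("topic",), measure="mentions"))
```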


1999 ◽  
Vol 13 (1) ◽  
pp. 49-62 ◽  
Author(s):  
Daniel E. O'Leary

This article integrates McCarthy's REA (Resources-Events-Agents) model and the closely related REAL (Resources-Events-Agents-Locations) model with the general capabilities and requirements of data warehouses. REA/REAL contribute a theory for capturing information about events and a focus on control relationships. Data warehouses bring time-period information and a focus on information that facilitates the creation of value. Using aspects from both camps, a hybrid schema called REAL-D (REAL for Data warehouses) is developed. Existing data warehouse approaches lack theory, while REA/REAL are theory based. The unique demands on data warehouses, however, impose additional requirements on REA/REAL, including (1) addition of a time-period dimension to allow rollups from hour to day to week to month to year, (2) addition of location to facilitate rollups from office to city to district, (3) change from a pure location dimension to a nonhomogeneous dimension that allows rollup from person to office, and (4) change of the relationship with agents from one of control to a marketing-oriented one.
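
A minimal sketch of the rollup hierarchies that REAL-D adds, time (hour to day to week to month to year) and location (office to city to district), is shown below; the facts, level names, and function are illustrative rather than taken from the article.

```python
# Illustrative sketch (not from the article) of rolling facts up along the
# time and location hierarchies that REAL-D adds to the REA/REAL models.
from collections import defaultdict

TIME_LEVELS = ("hour", "day", "week", "month", "year")
LOCATION_LEVELS = ("office", "city", "district")

def roll_up(facts, level, measure="amount"):
    """Aggregate the measure at the requested level of a hierarchy."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[level]] += fact[measure]
    return dict(totals)

sales = [
    {"hour": "09", "day": "2023-01-02", "week": "2023-W01", "month": "2023-01",
     "year": "2023", "office": "O1", "city": "Boston", "district": "NE",
     "amount": 120.0},
    {"hour": "14", "day": "2023-01-02", "week": "2023-W01", "month": "2023-01",
     "year": "2023", "office": "O2", "city": "Boston", "district": "NE",
     "amount": 80.0},
]
print(roll_up(sales, "day"))    # rollup from hour to day
print(roll_up(sales, "city"))   # rollup from office to city
```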


2021 ◽  
Vol 3 (2) ◽  
pp. 82
Author(s):  
Mohammed Muddasir ◽  
Raghuveer K ◽  
Dayanand R

Data warehouses are loaded with data from sources such as operational databases. Failure of the loading process, or of any preceding process such as extraction or transformation, is expensive because the data become unavailable for analysis. With the advent of e-commerce and many real-time applications, analysis of data in real time has become the norm, and hence any failure while data are being loaded into the data warehouse needs to be handled in an efficient and optimized way. Techniques for handling the failure of the processes that populate the warehouse are just as important as the loading process itself. Alternative arrangements need to be made so that, in case of failure, the processes populating the data warehouse still complete in time. This paper explores the various ways in which a failed process of populating the data warehouse can be resumed. Various resumption techniques are compared, and a novel block-based technique is proposed to improve one of the existing resumption techniques.
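
A hedged sketch of a block-based resumption scheme, in the spirit of the technique the paper proposes but not its actual algorithm, is given below: rows are loaded in fixed-size blocks and the id of the last committed block is checkpointed, so a failed load restarts at the first unloaded block. The checkpoint file, block size, and load function are hypothetical.

```python
# Illustrative block-based resumable load: checkpoint after each block so a
# rerun after failure skips the blocks that were already committed.
import json
import os

CHECKPOINT = "load_checkpoint.json"     # hypothetical checkpoint file
BLOCK_SIZE = 1000                       # hypothetical block size

def last_committed_block():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["block"]
    return -1

def load_block(block_id, rows):
    # Stand-in for the real load into the warehouse (e.g. a bulk insert).
    print(f"loaded block {block_id}: {len(rows)} rows")

def resumable_load(rows):
    start = last_committed_block() + 1
    blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
    for block_id in range(start, len(blocks)):
        load_block(block_id, blocks[block_id])
        with open(CHECKPOINT, "w") as f:          # commit the checkpoint
            json.dump({"block": block_id}, f)

resumable_load([{"id": i} for i in range(3500)])
```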


2018 ◽  
Vol 14 (3) ◽  
pp. 44-68 ◽  
Author(s):  
Fatma Abdelhedi ◽  
Amal Ait Brahim ◽  
Gilles Zurfluh

Nowadays, most organizations need to improve their decision-making process using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information. Doing so requires dealing with new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. The influence of Big Data has challenged this traditional approach, primarily due to the changing nature of the data. As a result, using NoSQL databases has become a necessity for handling Big Data challenges. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors validate their approach with experiments on a case study in the health care field.
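
The transformation rules of Object2NoSQL are not reproduced in the abstract, so the following is only an assumed, simplified sketch of the kind of mapping involved: a class from a conceptual model becomes a table in a column-oriented physical model, with its attributes grouped into a column family. All names are hypothetical.

```python
# Simplified sketch of a conceptual-to-column-oriented mapping, in the
# spirit of Object2NoSQL but not its actual rules.
from dataclasses import dataclass, field

@dataclass
class ConceptualClass:                    # fragment of a UML conceptual model
    name: str
    attributes: dict = field(default_factory=dict)   # {attribute: type}

def to_column_oriented(cls: ConceptualClass) -> dict:
    """Generate a (hypothetical) column-family description for one class."""
    return {
        "table": cls.name.lower(),
        "column_families": {
            "base": list(cls.attributes),   # all attributes in one family
        },
    }

patient = ConceptualClass(
    "Patient", {"id": "string", "birthDate": "date", "ward": "string"})
print(to_column_oriented(patient))
```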


Author(s):  
Yassine Ramdane ◽  
Nadia Kabachi ◽  
Omar Boussaid ◽  
Fadila Bentayeb

2010 ◽  
Vol 2 (1) ◽  
pp. 99-116
Author(s):  
Katarzyna Rostek

Data Analytical Processing in Data Warehouses

The article presents issues connected with processing information from data warehouses (the analytical enterprise databases) and the two basic types of analytical data processing in a data warehouse. For each type of analysis, the genesis, main definitions, scope of application and real examples from business implementations are described. The article also presents the author's original method of knowledge discovery in databases, together with practical guidelines for its proper and effective use in the enterprise.

