Optimization of ETL Process in Data Warehouse Through a Combination of Parallelization and Shared Cache Memory

2016 ◽  
Vol 6 (6) ◽  
pp. 1241-1244 ◽  
Author(s):  
M. Faridi Masouleh ◽  
M. A. Afshar Kazemi ◽  
M. Alborzi ◽  
A. Toloie Eshlaghy

Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Creating ETL processes is potentially one of the greatest tasks in building a data warehouse, and it is a time-consuming and complicated procedure. Without optimization of these processes, the implementation of data warehouse projects is costly, complicated and time-consuming. The present paper combines parallelization methods with shared cache memory in distributed systems built on a data warehouse. In the conducted assessment, the proposed method exhibited a 7.1% improvement in ETL execution time over the Kettle optimization tool and 7.9% over the Talend tool. Therefore, parallelization can notably improve the ETL process, ultimately allowing the management and integration of big data to be carried out simply and at an acceptable speed.
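
As a rough illustration of the combination described above (not the authors' implementation), the following sketch parallelizes the transform phase of an ETL job across worker processes while sharing a lookup cache between them; the dimension-lookup logic, field names, and cache contents are hypothetical.

```python
# A minimal sketch (not the paper's implementation): parallelize the
# transform phase of an ETL job and share a lookup cache across workers.
from multiprocessing import Pool, Manager

def transform_row(args):
    row, cache = args
    # Hypothetical dimension lookup: reuse cached surrogate keys so that
    # every worker benefits from lookups already resolved by the others.
    key = row["customer_id"]
    if key not in cache:
        cache[key] = sum(ord(c) for c in key) % 10_000  # stand-in for a DB lookup
    return {**row, "customer_sk": cache[key]}

def run_etl(rows, workers=4):
    with Manager() as manager:
        shared_cache = manager.dict()            # cache shared across processes
        with Pool(workers) as pool:
            transformed = pool.map(transform_row,
                                   [(r, shared_cache) for r in rows])
    return transformed                            # ready for the load phase

if __name__ == "__main__":
    extracted = [{"customer_id": f"C{i % 50}", "amount": i} for i in range(200)]
    print(len(run_etl(extracted)), "rows transformed")
```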

2020 ◽  
Vol 12 (1) ◽  
pp. 1-24
Author(s):  
Khaled Dehdouh ◽  
Omar Boussaid ◽  
Fadila Bentayeb

In the Big Data warehouse context, a column-oriented NoSQL database system is considered a storage model that is highly adapted to data warehouses and online analysis. Indeed, NoSQL models make data scalability easy, and the columnar store is well suited to storing and managing massive data, especially for decisional queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software, such as Hive or Kylin, which provide a CUBE operator to build data cubes. With these tools, however, the cube is built according to a row-oriented approach, which does not fully exploit the benefits of a column-oriented one. In this article, the focus is on defining a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed way in which the data warehouse is stored.
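
The abstract does not give the MC-CUBE algorithm itself, but the general MapReduce cube idea it builds on can be sketched as follows, assuming hypothetical dimensions and a single-machine stand-in for the map and reduce phases.

```python
# A simplified, single-machine sketch of the MapReduce cube idea that
# MC-CUBE builds on: the map step emits every combination of dimension
# values (with "ALL" for rolled-up dimensions), and the reduce step
# aggregates the measure for each combination.
from itertools import combinations
from collections import defaultdict

DIMENSIONS = ("region", "product", "year")   # hypothetical dimensions

def map_cube(row):
    dims = tuple(row[d] for d in DIMENSIONS)
    for r in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), r):
            key = tuple(dims[i] if i in kept else "ALL"
                        for i in range(len(dims)))
            yield key, row["sales"]

def reduce_cube(rows):
    cube = defaultdict(float)
    for row in rows:
        for key, measure in map_cube(row):
            cube[key] += measure
    return cube

facts = [
    {"region": "EU", "product": "laptop", "year": 2015, "sales": 100.0},
    {"region": "US", "product": "laptop", "year": 2015, "sales": 250.0},
]
print(reduce_cube(facts)[("ALL", "laptop", "ALL")])   # 350.0
```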


2017 ◽  
Vol 37 (4) ◽  
Author(s):  
Sarah K. Wooller ◽  
Graeme Benstead-Hume ◽  
Xiangrong Chen ◽  
Yusuf Ali ◽  
Frances M.G. Pearl

Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse ‘big data’ that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications.


Author(s):  
Francesco Di Tria ◽  
Ezio Lefons ◽  
Filippo Tangorra

Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purposes. Examples of this kind of data source are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted to define multidimensional schemas in the presence of fast-changing situations or even undefined business requirements. In this paper, we propose a design methodology that adopts agile and automatic approaches in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analysis, since the underlying architecture is based on a virtual data warehouse that does not require an importing phase. Examples of application of the methodology are presented throughout the paper to show the validity of this approach compared to a traditional one.
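
As a loose illustration of the "virtual data warehouse without an importing phase" idea (the authors' actual architecture is not reproduced here), the sketch below computes aggregations on demand directly against a source-access function; the class, field names, and source are hypothetical.

```python
# A minimal illustration (not the authors' methodology) of the "virtual"
# idea: the fact data are never imported; aggregations are computed on
# demand by reading the source each time they are requested.
class VirtualFactTable:
    def __init__(self, fetch_source):
        self.fetch_source = fetch_source   # callable returning source rows

    def aggregate(self, group_by, measure):
        totals = {}
        for row in self.fetch_source():            # data read in place,
            key = tuple(row[d] for d in group_by)  # no loading phase
            totals[key] = totals.get(key, 0) + row[measure]
        return totals

# The source could stand in for a web feed, a blog crawl, or a sensor stream.
def fetch_from_source():
    return [{"topic": "etl", "mentions": 3},
            {"topic": "olap", "mentions": 5},
            {"topic": "etl", "mentions": 2}]

cube = VirtualFactTable(fetch_from_source)
print(cube.aggregate(group_by=("topic",), measure="mentions"))
```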


1999 ◽  
Vol 13 (1) ◽  
pp. 49-62 ◽  
Author(s):  
Daniel E. O'Leary

This article integrates McCarthy's REA (Resources-Events-Agents) model and the closely related REAL (Resources-Events-Agents-Locations) model with the general capabilities and requirements of data warehouses. REA/REAL contribute a theory for capturing information about events and a focus on control relationships. Data warehouses bring time-period information and a focus on information that facilitates the creation of value. Using aspects from both camps, a hybrid schema called REAL-D (REAL for Data warehouses) is developed. Existing data warehouse approaches lack theory, while REA/REAL are theory based. The unique demands on data warehouses, however, impose additional requirements on REA/REAL, including (1) addition of a time-period dimension to allow rollups from hour to day to week to month to year, (2) addition of location to facilitate rollups from office to city to district, (3) change from a pure location dimension to a nonhomogeneous dimension that allows rollup from person to office, and (4) change of the relationship with agents from one of control to a marketing-oriented one.
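
A minimal sketch of the rollup hierarchies that REAL-D adds, time (hour to day to week to month to year) and location (office to city to district), is shown below; the facts, level names, and function are illustrative rather than taken from the article.

```python
# Illustrative sketch (not from the article) of rolling facts up along the
# time and location hierarchies that REAL-D adds to the REA/REAL models.
from collections import defaultdict

TIME_LEVELS = ("hour", "day", "week", "month", "year")
LOCATION_LEVELS = ("office", "city", "district")

def roll_up(facts, level, measure="amount"):
    """Aggregate the measure at the requested level of a hierarchy."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[level]] += fact[measure]
    return dict(totals)

sales = [
    {"hour": "09", "day": "2023-01-02", "week": "2023-W01", "month": "2023-01",
     "year": "2023", "office": "O1", "city": "Boston", "district": "NE",
     "amount": 120.0},
    {"hour": "14", "day": "2023-01-02", "week": "2023-W01", "month": "2023-01",
     "year": "2023", "office": "O2", "city": "Boston", "district": "NE",
     "amount": 80.0},
]
print(roll_up(sales, "day"))    # rollup from hour to day
print(roll_up(sales, "city"))   # rollup from office to city
```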


2021 ◽  
Vol 3 (2) ◽  
pp. 82
Author(s):  
Mohammed Muddasir ◽  
Raghuveer K ◽  
Dayanand R

Data warehouses are loaded with data from sources such as operational databases. Failure of the loading process, or of any preceding process such as extraction or transformation, is expensive because the data become unavailable for analysis. With the advent of e-commerce and many real-time applications, analysis of data in real time has become the norm, and hence any failure while data are being loaded into the data warehouse needs to be handled in an efficient and optimized way. Techniques for handling the failure of the processes that populate the warehouse are just as important as the loading process itself. Alternative arrangements need to be made so that, in case of failure, the processes populating the data warehouse still complete in time. This paper explores the various ways in which a failed process of populating the data warehouse can be resumed. Various resumption techniques are compared, and a novel block-based technique is proposed to improve one of the existing resumption techniques.
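
A hedged sketch of a block-based resumption scheme, in the spirit of the technique the paper proposes but not its actual algorithm, is given below: rows are loaded in fixed-size blocks and the id of the last committed block is checkpointed, so a failed load restarts at the first unloaded block. The checkpoint file, block size, and load function are hypothetical.

```python
# Illustrative block-based resumable load: checkpoint after each block so a
# rerun after failure skips the blocks that were already committed.
import json
import os

CHECKPOINT = "load_checkpoint.json"     # hypothetical checkpoint file
BLOCK_SIZE = 1000                       # hypothetical block size

def last_committed_block():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["block"]
    return -1

def load_block(block_id, rows):
    # Stand-in for the real load into the warehouse (e.g. a bulk insert).
    print(f"loaded block {block_id}: {len(rows)} rows")

def resumable_load(rows):
    start = last_committed_block() + 1
    blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
    for block_id in range(start, len(blocks)):
        load_block(block_id, blocks[block_id])
        with open(CHECKPOINT, "w") as f:          # commit the checkpoint
            json.dump({"block": block_id}, f)

resumable_load([{"id": i} for i in range(3500)])
```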


2018 ◽  
Vol 14 (3) ◽  
pp. 44-68 ◽  
Author(s):  
Fatma Abdelhedi ◽  
Amal Ait Brahim ◽  
Gilles Zurfluh

Nowadays, most organizations need to improve their decision-making process using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information. Doing so requires dealing with new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. The influence of Big Data has challenged this traditional approach, primarily due to the changing nature of the data. As a result, using NoSQL databases has become a necessity for handling Big Data challenges. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors validate their approach with experiments on a case study in the health care field.
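
The transformation rules of Object2NoSQL are not reproduced in the abstract, so the following is only an assumed, simplified sketch of the kind of mapping involved: a class from a conceptual model becomes a table in a column-oriented physical model, with its attributes grouped into a column family. All names are hypothetical.

```python
# Simplified sketch of a conceptual-to-column-oriented mapping, in the
# spirit of Object2NoSQL but not its actual rules.
from dataclasses import dataclass, field

@dataclass
class ConceptualClass:                    # fragment of a UML conceptual model
    name: str
    attributes: dict = field(default_factory=dict)   # {attribute: type}

def to_column_oriented(cls: ConceptualClass) -> dict:
    """Generate a (hypothetical) column-family description for one class."""
    return {
        "table": cls.name.lower(),
        "column_families": {
            "base": list(cls.attributes),   # all attributes in one family
        },
    }

patient = ConceptualClass(
    "Patient", {"id": "string", "birthDate": "date", "ward": "string"})
print(to_column_oriented(patient))
```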


Author(s):  
Yassine Ramdane ◽  
Nadia Kabachi ◽  
Omar Boussaid ◽  
Fadila Bentayeb

2010 ◽  
Vol 2 (1) ◽  
pp. 99-116
Author(s):  
Katarzyna Rostek

Data Analytical Processing in Data Warehouses

The article presents issues connected with processing information from data warehouses (the analytical enterprise databases) and the two basic types of analytical data processing in a data warehouse. For each type of analysis, the genesis, main definitions, scope of application and real examples from business implementations are described. The article also presents the author's original method of knowledge discovery in databases, together with practical guidelines for its proper and effective use in the enterprise.

