SkipSJoin: A New Physical Design for Distributed Big Data Warehouses in Hadoop

Cloud computing provides a powerful, scalable and flexible infrastructure into which one can integrate, previously known, techniques and methods of Data Mining. The result of such integration should be strong and capacitive platform that will be able to deal with the increasing production of data, or that will create the conditions for the efficient mining of massive amounts of data from various data warehouses with the aim of creating (useful) information or the production of new knowledge. This paper discusses such technology - the technology of big data mining, known as Cloud Data Mining (CDM).

Download Full-text

Building OLAP Cubes From Columnar NoSQL Data Warehouses

Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5516-2.ch006 ◽

2019 ◽

pp. 129-157

Author(s):

Khaled Dehdouh

Keyword(s):

Big Data ◽

Database System ◽

Massive Data ◽

Data Warehouses ◽

Online Analysis ◽

Storage Model ◽

Data Cubes ◽

Nosql Database ◽

Oriented Approach

In the big data warehouses context, a column-oriented NoSQL database system is considered as the storage model which is highly adapted to data warehouses and online analysis. Indeed, the use of NoSQL models allows data scalability easily and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, the column-oriented NoSQL DBMS do not offer online analysis operators (OLAP). To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software such as HIVE or Kylin which has a CUBE operator to build data cubes. By using that, the cube is built according to the row-oriented approach and does not allow to fully obtain the benefits of a column-oriented approach. In this chapter, the main contribution is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which allows building columnar NoSQL cubes according to the columnar approach by taking into account the non-relational and distributed aspects when data warehouses are stored.

Download Full-text

Big Data Warehouses for Smart Industries

Encyclopedia of Big Data Technologies ◽

10.1007/978-3-319-77525-8_204 ◽

2019 ◽

pp. 341-351

Author(s):

Carlos Costa ◽

Carina Andrade ◽

Maribel Yasmina Santos

Keyword(s):

Big Data ◽

Data Warehouses

Download Full-text

Concept of operations for knowledge discovery from Big Data across enterprise data warehouses

10.1117/12.2016321 ◽

2013 ◽

Cited By ~ 3

Author(s):

Sreenivas R. Sukumar ◽

Mohammed M. Olama ◽

Allen W. McNair ◽

James J. Nutaro

Keyword(s):

Big Data ◽

Knowledge Discovery ◽

Data Warehouses ◽

Concept Of Operations

Download Full-text

Optimization of ETL Process in Data Warehouse Through a Combination of Parallelization and Shared Cache Memory

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.849 ◽

2016 ◽

Vol 6 (6) ◽

pp. 1241-1244 ◽

Cited By ~ 2

Author(s):

M. Faridi Masouleh ◽

M. A. Afshar Kazemi ◽

M. Alborzi ◽

A. Toloie Eshlaghy

Keyword(s):

Big Data ◽

Data Warehouse ◽

Cache Memory ◽

Data Warehouses ◽

Data Bases ◽

Shared Cache ◽

Optimization Management ◽

Implementation Time ◽

The Creation ◽

Management Improvement

Extraction, Transformation and Loading (ETL) is introduced as one of the notable subjects in optimization, management, improvement and acceleration of processes and operations in data bases and data warehouses. The creation of ETL processes is potentially one of the greatest tasks of data warehouses and so its production is a time-consuming and complicated procedure. Without optimization of these processes, the implementation of projects in data warehouses area is costly, complicated and time-consuming. The present paper used the combination of parallelization methods and shared cache memory in systems distributed on the basis of data warehouse. According to the conducted assessment, the proposed method exhibited 7.1% speed improvement to kattle optimization instrument and 7.9% to talend instrument in terms of implementation time of the ETL process. Therefore, parallelization could notably improve the ETL process. It eventually caused the management and integration processes of big data to be implemented in a simple way and with acceptable speed.

Download Full-text

Efficient Big Data Modelling and Organization for Hadoop Hive-Based Data Warehouses

Information Systems - Lecture Notes in Business Information Processing ◽

10.1007/978-3-319-65930-5_1 ◽

2017 ◽

pp. 3-16 ◽

Cited By ~ 12

Author(s):

Eduarda Costa ◽

Carlos Costa ◽

Maribel Yasmina Santos

Keyword(s):

Big Data ◽

Data Modelling ◽

Data Warehouses

Download Full-text

SDWP: A New Data Placement Strategy for Distributed Big Data Warehouses in Hadoop

Big Data Analytics and Knowledge Discovery - Lecture Notes in Computer Science ◽

10.1007/978-3-030-27520-4_14 ◽

2019 ◽

pp. 189-205

Author(s):

Yassine Ramdane ◽

Nadia Kabachi ◽

Omar Boussaid ◽

Fadila Bentayeb

Keyword(s):

Big Data ◽

Data Placement ◽

Data Warehouses

Download Full-text

A Framework for Evaluating Design Methodologies for Big Data Warehouses

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2018010102 ◽

2018 ◽

Vol 14 (1) ◽

pp. 15-39 ◽

Cited By ~ 3

Author(s):

Francesco Di Tria ◽

Ezio Lefons ◽

Filippo Tangorra

Keyword(s):

Big Data ◽

Unstructured Data ◽

Data Warehouses ◽

Qualitative Information ◽

Design Specifications ◽

Set Up ◽

Definition Of ◽

Data Context

This article describes how the evaluation of modern data warehouses considers new solutions adopted for facing the radical changes caused by the necessity of reducing the storage volume, while increasing the velocity in multidimensional design and data elaboration, even in presence of unstructured data that are useful for providing qualitative information. The aim is to set up a framework for the evaluation of the physical and methodological characteristics of a data warehouse, realized by considering the factors that affect the data warehouse's lifecycle when taking into account the Big Data issues (Volume, Velocity, Variety, Value, and Veracity). The contribution is the definition of a set of criteria for classifying Big Data Warehouses on the basis of their methodological characteristics. Based on these criteria, the authors defined a set of metrics for measuring the quality of Big Data Warehouses in reference to the design specifications. They show through a case study how the proposed metrics are able to check the eligibility of methodologies falling in different classes in the Big Data context.

Download Full-text

Big Data Warehouse

International Journal of Decision Support System Technology ◽

10.4018/ijdsst.2020010101 ◽

2020 ◽

Vol 12 (1) ◽

pp. 1-24

Author(s):

Khaled Dehdouh ◽

Omar Boussaid ◽

Fadila Bentayeb

Keyword(s):

Big Data ◽

Data Warehouse ◽

Database System ◽

Massive Data ◽

Data Warehouses ◽

Online Analysis ◽

Storage Model ◽

Nosql Database ◽

Big Data Warehouse ◽

Oriented Approach

In the Big Data warehouse context, a column-oriented NoSQL database system is considered as the storage model which is highly adapted to data warehouses and online analysis. Indeed, the use of NoSQL models allows data scalability easily and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, the column-oriented NoSQL DBMS do not offer online analysis operators (OLAP). To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software such as HIVE or Kylin which has a CUBE operator to build data cubes. By using that, the cube is built according to the row-oriented approach and does not allow to fully obtain the benefits of a column-oriented approach. In this article, the focus is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which allows building columnar NoSQL cubes according to the columnar approach by taking into account the non-relational and distributed aspects when data warehouses are stored.

Download Full-text