Big Data Warehouse

Khaled Dehdouh; Omar Boussaid; Fadila Bentayeb

doi:10.4018/ijdsst.2020010101

Big Data Warehouse

International Journal of Decision Support System Technology ◽

10.4018/ijdsst.2020010101 ◽

2020 ◽

Vol 12 (1) ◽

pp. 1-24

Author(s):

Khaled Dehdouh ◽

Omar Boussaid ◽

Fadila Bentayeb

Keyword(s):

Big Data ◽

Data Warehouse ◽

Database System ◽

Massive Data ◽

Data Warehouses ◽

Online Analysis ◽

Storage Model ◽

Nosql Database ◽

Big Data Warehouse ◽

Oriented Approach

In the Big Data warehouse context, a column-oriented NoSQL database system is considered as the storage model which is highly adapted to data warehouses and online analysis. Indeed, the use of NoSQL models allows data scalability easily and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, the column-oriented NoSQL DBMS do not offer online analysis operators (OLAP). To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software such as HIVE or Kylin which has a CUBE operator to build data cubes. By using that, the cube is built according to the row-oriented approach and does not allow to fully obtain the benefits of a column-oriented approach. In this article, the focus is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which allows building columnar NoSQL cubes according to the columnar approach by taking into account the non-relational and distributed aspects when data warehouses are stored.

Download Full-text

Building OLAP Cubes From Columnar NoSQL Data Warehouses

Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5516-2.ch006 ◽

2019 ◽

pp. 129-157

Author(s):

Khaled Dehdouh

Keyword(s):

Big Data ◽

Database System ◽

Massive Data ◽

Data Warehouses ◽

Online Analysis ◽

Storage Model ◽

Data Cubes ◽

Nosql Database ◽

Oriented Approach

In the big data warehouses context, a column-oriented NoSQL database system is considered as the storage model which is highly adapted to data warehouses and online analysis. Indeed, the use of NoSQL models allows data scalability easily and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, the column-oriented NoSQL DBMS do not offer online analysis operators (OLAP). To build OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software such as HIVE or Kylin which has a CUBE operator to build data cubes. By using that, the cube is built according to the row-oriented approach and does not allow to fully obtain the benefits of a column-oriented approach. In this chapter, the main contribution is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which allows building columnar NoSQL cubes according to the columnar approach by taking into account the non-relational and distributed aspects when data warehouses are stored.

Download Full-text

A Proposal of Methodology for Designing Big Data Warehouses

10.20944/preprints201806.0219.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Francesco Di Tria ◽

Ezio Lefons ◽

Filippo Tangorra

Keyword(s):

Big Data ◽

Data Warehouse ◽

Design Methodology ◽

Data Sources ◽

Massive Data ◽

Data Warehouses ◽

Short Intervals ◽

New Class ◽

New Business ◽

Business Requirements

Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purpose. Examples of this kind of data sources are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted to define multidimensional schemas in presence of fast-changing situations or even undefined business requirements. In the paper, we propose a design methodology that adopts agile and automatic approaches, in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analyses, since the underlying architecture is based on a virtual data warehouse that does not require the importing phase. Examples of application of the methodology are presented along the paper in order to show the validity of this approach compared to a traditional one.

Download Full-text

Design and Implementation of Distributed Spatial Indexing and Query Processing on a Big Data Warehouse System

Journal of Korean Society for Geospatial Information System ◽

10.7319/kogsis.2021.29.1.025 ◽

2021 ◽

Vol 29 (1) ◽

pp. 25-32

Author(s):

Hyun Gu Cho ◽

Mi Kyung Eum ◽

Jong Yun Lee ◽

Seok Chan Bae ◽

Kwang Woo Nam

Keyword(s):

Big Data ◽

Query Processing ◽

Data Warehouse ◽

Spatial Indexing ◽

Design And Implementation ◽

Big Data Warehouse

Download Full-text

A Semi-Automatic Design Methodology for (Big) Data Warehouse Transforming Facts into Dimensions

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2019.2925621 ◽

2021 ◽

Vol 33 (1) ◽

pp. 28-42

Author(s):

Lucile Sautot ◽

Sandro Bimonte ◽

Ludovic Journaux

Keyword(s):

Big Data ◽

Data Warehouse ◽

Design Methodology ◽

Automatic Design ◽

Big Data Warehouse

Download Full-text

NoSQL Big Data Warehouse: Review and Comparison

Advances in Intelligent Systems and Computing - Intelligent Systems Design and Applications ◽

10.1007/978-3-030-71187-0_36 ◽

2021 ◽

pp. 392-401

Author(s):

Senda Bouaziz ◽

Ahlem Nabli ◽

Faiez Gargouri

Keyword(s):

Big Data ◽

Data Warehouse ◽

Big Data Warehouse

Download Full-text

On construction of a big data warehouse accessing platform for campus power usages

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2019.05.011 ◽

2019 ◽

Vol 133 ◽

pp. 40-50

Author(s):

Chih-Hung Chang ◽

Fuu-Cheng Jiang ◽

Chao-Tung Yang ◽

Sheng-Cang Chou

Keyword(s):

Big Data ◽

Data Warehouse ◽

Big Data Warehouse

Download Full-text

Privacy-Aware Big Data Warehouse Architecture

2016 IEEE International Congress on Big Data (BigData Congress) ◽

10.1109/bigdatacongress.2016.53 ◽

2016 ◽

Cited By ~ 1

Author(s):

Karthik Navuluri ◽

Ravi Mukkamala ◽

Aftab Ahmad

Keyword(s):

Big Data ◽

Data Warehouse ◽

Big Data Warehouse

Download Full-text

Optimization of ETL Process in Data Warehouse Through a Combination of Parallelization and Shared Cache Memory

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.849 ◽

2016 ◽

Vol 6 (6) ◽

pp. 1241-1244 ◽

Cited By ~ 2

Author(s):

M. Faridi Masouleh ◽

M. A. Afshar Kazemi ◽

M. Alborzi ◽

A. Toloie Eshlaghy

Keyword(s):

Big Data ◽

Data Warehouse ◽

Cache Memory ◽

Data Warehouses ◽

Data Bases ◽

Shared Cache ◽

Optimization Management ◽

Implementation Time ◽

The Creation ◽

Management Improvement

Extraction, Transformation and Loading (ETL) is introduced as one of the notable subjects in optimization, management, improvement and acceleration of processes and operations in data bases and data warehouses. The creation of ETL processes is potentially one of the greatest tasks of data warehouses and so its production is a time-consuming and complicated procedure. Without optimization of these processes, the implementation of projects in data warehouses area is costly, complicated and time-consuming. The present paper used the combination of parallelization methods and shared cache memory in systems distributed on the basis of data warehouse. According to the conducted assessment, the proposed method exhibited 7.1% speed improvement to kattle optimization instrument and 7.9% to talend instrument in terms of implementation time of the ETL process. Therefore, parallelization could notably improve the ETL process. It eventually caused the management and integration processes of big data to be implemented in a simple way and with acceptable speed.

Download Full-text

Big Data

Pattern and Data Analysis in Healthcare Settings - Advances in Medical Technologies and Clinical Practice ◽

10.4018/978-1-5225-0536-5.ch009 ◽

2017 ◽

pp. 158-179

Author(s):

Vinod Kumar ◽

Ramjeevan Singh Thakur

Keyword(s):

Big Data ◽

Database System ◽

Smart Phones ◽

Data Handling ◽

Data Generation ◽

Business Organizations ◽

Nosql Databases ◽

Data Formats ◽

Nosql Database ◽

Tools And Techniques

With every passing day, data generation is increasing exponentially, its volume, variety, velocity are making it quite challenging to analyze, interpret, visualize for gaining the greater insights from the available data. Billions of networked sensors are being embedded in devices such as smart phones, automobiles, social media sites, laptop, PC's and industrial machines etc. that operates, generate and communicate data. Thus, the data obtained from various resources exists in structured, semi-structured and unstructured form. The traditional database system is not suitable to handle these data formats. Therefore, new tools and techniques are developed to work with these data. NoSQL is one of them. Currently, many NoSQL database are available in the market, each one of them specially designed to solve specific type of data handling problems, most of the NoSQL databases are developed with special attention to problem of business organizations and enterprises. The chapter focuses various aspects of NoSQL as tool for handling the big data.

Download Full-text

Towards Convergence in Information Systems Design

Novel Approaches to Information Systems Design - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-7998-2975-1.ch011 ◽

2020 ◽

pp. 247-263

Author(s):

Deepika Prakash

Keyword(s):

Machine Learning ◽

Big Data ◽

Information Systems ◽

Data Warehouse ◽

Business Intelligence ◽

Systems Design ◽

Information Model ◽

Information Systems Design ◽

Nosql Databases ◽

Nosql Database

Three technologies—business intelligence, big data, and machine learning—developed independently and address different types of problems. Data warehouses have been used as systems for business intelligence, and NoSQL databases are used for big data. In this chapter, the authors explore the convergence of business intelligence and big data. Traditionally, a data warehouse is implemented on a ROLAP or MOLAP platform. Whereas MOLAP suffers from having propriety architecture, ROLAP suffers from the inherent disadvantages of RDBMS. In order to mitigate the drawbacks of ROLAP, the authors propose implementing a data warehouse on a NoSQL database. They choose Cassandra as their database. For this they start by identifying a generic information model that captures the requirements of the system to-be. They propose mapping rules that map the components of the information model to the Cassandra data model. They finally show a small implementation using an example.

Download Full-text