Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management
Latest Publications

Total documents: 11 (five years: 11)
H-index: 1 (five years: 1)
Published by: IGI Global
ISBN: 9781522555162, 9781522555179

Author(s):  
Khaled Dehdouh

In the context of big data warehouses, column-oriented NoSQL database systems are considered a storage model well suited to data warehousing and online analysis. Indeed, NoSQL models allow data to scale easily, and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build the OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software, such as Hive or Kylin, which provides a CUBE operator for building data cubes. With this approach, however, the cube is built in a row-oriented fashion, and the benefits of the column-oriented approach are not fully obtained. The main contribution of this chapter is a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed aspects of how the data warehouses are stored.
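
For illustration only (this is not the chapter's MC-CUBE implementation), the following minimal Python sketch emulates a columnar layout as per-column lists and computes a small data cube with a map phase (emitting one grouping per cell of the cube lattice) and a reduce phase (summing per grouping key). The toy data, function name, and "ALL" marker are illustrative assumptions.

```python
from itertools import combinations
from collections import defaultdict

# Toy columnar store: each column held separately, as in a
# column-oriented NoSQL layout (illustrative assumption).
columns = {
    "region":  ["EU", "EU", "US", "US"],
    "product": ["p1", "p2", "p1", "p1"],
    "amount":  [10,   20,   30,   40],
}

def mc_cube_sketch(columns, dims, measure):
    """Map: emit a key for every subset of dims (the cube lattice);
    reduce: sum the measure per grouping key."""
    n = len(columns[measure])
    acc = defaultdict(int)
    for r in range(n):
        row = {d: columns[d][r] for d in dims}
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else "ALL" for d in dims)
                acc[key] += columns[measure][r]  # reduce step
    return dict(acc)

cube = mc_cube_sketch(columns, ["region", "product"], "amount")
print(cube[("EU", "ALL")])   # 30: EU total across all products
print(cube[("ALL", "ALL")])  # 100: grand total
```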


Author(s):  
Marwa Manaa ◽  
Thouraya Sakouhi ◽  
Jalel Akaichi

Mobility data has become an important paradigm for computing in various areas. Mobility data is at the core of revealing the traces of moving objects' displacements. While each area views trajectories through a different lens, all aim to enrich mobility data with domain knowledge. Semantic annotations may offer a common model for trajectories, and ontology design patterns seem to be promising solutions for defining such a trajectory-related pattern: they are more suitable for annotating multi-perspective data than ontologies alone. The trajectory ontology design pattern is used as a semantic layer for trajectory data warehouses, with the aim of analyzing the behaviors of mobile entities. In this chapter, the authors propose an approach for the semantic modeling of trajectories and trajectory data warehouses based on a trajectory ontology design pattern. They validate the proposal through real case studies dealing with behavior analysis and animal tracking.
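
As a rough illustration of the kind of trajectory model such a pattern describes (the concrete ontology design pattern is the authors' own; the class and field names below are illustrative assumptions), a trajectory can be represented as a sequence of semantically annotated episodes:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Episode:
    """One stop or move of a moving object, with a semantic annotation
    drawn from domain knowledge (e.g., 'feeding', 'migrating')."""
    kind: str                           # "stop" or "move"
    start: str                          # ISO timestamps (simplified)
    end: str
    points: List[Tuple[float, float]]   # (lat, lon) samples
    annotation: str                     # semantic label from the ontology

@dataclass
class Trajectory:
    object_id: str
    episodes: List[Episode] = field(default_factory=list)

# An annotated animal-tracking trajectory (toy data).
t = Trajectory("stork-42", [
    Episode("move", "2018-03-01T06:00", "2018-03-01T12:00",
            [(36.8, 10.2), (37.5, 10.0)], "migrating"),
    Episode("stop", "2018-03-01T12:00", "2018-03-01T14:00",
            [(37.5, 10.0)], "feeding"),
])
print([e.annotation for e in t.episodes])  # ['migrating', 'feeding']
```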


Author(s):  
Olfa Layouni ◽  
Jalel Akaichi

Spatio-temporal data warehouses store enormous amounts of data and are usually exploited by spatio-temporal OLAP systems to extract relevant information. To extract interesting information, the current user launches spatio-temporal OLAP (ST-OLAP) queries to navigate within a geographic data cube (geo-cube). Very often, choosing which part of the geo-cube to explore next, and thus designing the forthcoming ST-OLAP query, is a difficult task; hence the need to suggest ST-OLAP queries that help the user refine the query he or she has just launched against the geo-cube. Moreover, models that adapt to a specific user can improve the probability of that user being satisfied. In this chapter, the authors first focus on assessing the similarity between spatio-temporal OLAP queries in terms of their GeoMDX expressions. They then propose a personalized query suggestion model based on users' search behavior, injecting the relevance between queries in the current session and the current user's search behavior into a basic probabilistic model.
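
A minimal sketch of the general idea follows (the authors' actual similarity measure over GeoMDX and their probabilistic model are not reproduced here; the crude tokenization, the Jaccard measure, and the blended scoring formula are illustrative assumptions):

```python
def query_members(geomdx: str) -> set:
    """Crude tokenization of a GeoMDX query into its member references
    (a real measure would parse dimensions, levels, and slices)."""
    return {t for t in geomdx.replace(",", " ").split() if t.startswith("[")}

def similarity(q1: str, q2: str) -> float:
    a, b = query_members(q1), query_members(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(current_query: str, session: list, candidates: list, alpha=0.5):
    """Rank candidates by similarity to the current query blended with
    similarity to the session history (a stand-in for the chapter's
    personalized probabilistic model)."""
    def score(c):
        hist = sum(similarity(c, q) for q in session) / max(len(session), 1)
        return alpha * similarity(c, current_query) + (1 - alpha) * hist
    return sorted(candidates, key=score, reverse=True)

current = "SELECT [Measures].[Sales] ON 0 FROM [GeoCube] WHERE [Region].[EU]"
cands = [
    "SELECT [Measures].[Profit] ON 0 FROM [GeoCube] WHERE [Region].[US]",
    "SELECT [Measures].[Sales] ON 0 FROM [GeoCube] WHERE [Region].[EU].[France]",
]
print(suggest(current, [current], cands)[0])  # the EU drill-down ranks first
```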


Author(s):  
Jorge Bernardino ◽  
Joaquim Lapa ◽  
Ana Almeida

A big data warehouse enables the analysis of large amounts of information that typically comes from an organization's transactional systems (OLTP). However, traditional data warehouse systems do not have the capacity to handle the massive amount of data that is currently produced. Business intelligence (BI) is a collection of decision-support technologies that enable executives, managers, and analysts to make better and faster decisions. Organizations must make good use of BI platforms to quickly extract the desired information from huge volumes of data, reducing the time and increasing the efficiency of decision-making processes. In this chapter, the authors present a comparative analysis of the capabilities of commercial and open source BI tools, in order to aid organizations in selecting the most suitable BI platform. They evaluate and compare six major open source BI platforms: Actuate, Jaspersoft, Jedox/Palo, Pentaho, SpagoBI, and Vanilla; and six major commercial BI platforms: IBM Cognos, Microsoft BI, MicroStrategy, Oracle BI, SAP BI, and SAS BI & Analytics.


Author(s):  
Francisca Vale Lima ◽  
Carlos Costa ◽  
Maribel Yasmina Santos

The large volume of data that is constantly being generated leads to the need to extract useful patterns, trends, or insights from it, raising interest in business intelligence and big data analytics. The volume, velocity, and variety of this data highlight the need for concepts like real-time big data warehouses (RTBDWs). The lack of guidelines or methodological approaches for implementing these systems calls for further research on this recent topic. This chapter proposes an RTBDW architecture that includes the main components and data flows needed to collect, process, store, and analyze the available data, integrating streaming with batch data and enabling real-time decision making. Using Twitter data, several technologies were evaluated to understand their performance. The results obtained were satisfactory and allowed the identification of a methodological approach that can be followed for implementing this type of system.
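
The chapter's concrete architecture and technology choices are its own; as a loose illustration of combining a streaming path with a batch path, this Python sketch (all names and the micro-batch policy are assumptions) buffers incoming records and flushes them to a batch store in micro-batches, so fresh data becomes queryable with low latency:

```python
import queue, threading, time

events = queue.Queue()   # streaming path: e.g., tweets arriving
batch_store = []         # stands in for the batch/warehouse layer

def producer():
    for i in range(10):
        events.put({"id": i, "text": f"tweet {i}", "ts": time.time()})
        time.sleep(0.01)

def micro_batch_loader(batch_size=4):
    """Drain the stream into micro-batches and 'load' each batch."""
    buf = []
    while True:
        try:
            buf.append(events.get(timeout=0.5))
        except queue.Empty:
            break                            # stream idle: stop draining
        if len(buf) == batch_size:
            batch_store.append(list(buf))    # batch load
            buf.clear()
    if buf:
        batch_store.append(buf)              # flush the partial batch

threading.Thread(target=producer).start()
micro_batch_loader()
print(f"loaded {len(batch_store)} micro-batches")  # 3 (sizes 4, 4, 2)
```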


Author(s):  
Shigeaki Sakurai

This chapter introduces a method that discovers characteristic sequential patterns from sequential data based on background knowledge. The sequential data is composed of rows of items, and the chapter focuses on sequential data derived from tabular structured data; that is, each item is composed of an attribute and an attribute value. The chapter uses item constraints to describe the background knowledge. The constraints describe the combinations of items included in sequential patterns and can represent the interests of analysts, so analysts can easily discover, as characteristic sequential patterns, those sequential patterns coinciding with their interests. In addition, the chapter focuses on a special case of item constraints in which the constraint applies to the last item of the sequential patterns. The discovered patterns can be used to analyze causes and reasons, and can predict the last item when a sub-sequence is given. The chapter introduces the properties of item constraints on the last item.
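
A minimal sketch of mining frequent sequential patterns subject to a last-item constraint (an illustration of the general technique, not the chapter's algorithm; the function names and toy data are assumptions):

```python
from itertools import combinations

def subsequences(seq, max_len=3):
    """Enumerate all (non-contiguous) subsequences up to max_len items."""
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            yield tuple(seq[i] for i in idx)

def frequent_with_last_item(db, last_item, min_support=2):
    """Count subsequences across the database, keeping only patterns
    whose final item satisfies the analyst's constraint."""
    counts = {}
    for seq in db:
        for p in set(subsequences(seq)):     # count once per sequence
            counts[p] = counts.get(p, 0) + 1
    return {p: c for p, c in counts.items()
            if c >= min_support and p[-1] == last_item}

# Each row is a sequence of items; each item is an
# (attribute, attribute value) pair from tabular data.
db = [
    [("page", "home"), ("page", "cart"), ("event", "buy")],
    [("page", "home"), ("event", "buy")],
    [("page", "home"), ("page", "cart")],
]
print(frequent_with_last_item(db, ("event", "buy")))
# patterns ending in ('event', 'buy') with support >= 2
```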


Author(s):  
Salman Ahmed Shaikh ◽  
Kousuke Nakabasami ◽  
Toshiyuki Amagasa ◽  
Hiroyuki Kitagawa

Data warehousing and multidimensional analysis go hand in hand. Data warehouses provide clean and partially normalized data for fast, consistent, and interactive multidimensional analysis. With advances in data generation and collection technologies, businesses and organizations now generate big data (defined by the 3Vs: volume, variety, and velocity). Since big data differs from traditional data, it requires a different set of tools and techniques for processing and analysis. This chapter discusses multidimensional analysis (also known as online analytical processing, or OLAP) of big data, focusing particularly on data streams, which are characterized by huge volume and high velocity. For interactive analysis, OLAP must maintain a number of materialized views corresponding to user queries. Specifically, this chapter discusses the issues in maintaining materialized views for data streams, the use of a special window for maintaining materialized views, and the issues of coupling a stream processing engine (SPE) with an OLAP engine.
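
As a crude illustration of incrementally maintaining a materialized view over a stream with a window (the chapter's special window and SPE/OLAP coupling are not reproduced; the tuple-based sliding window and SUM-by-key view below are assumptions):

```python
from collections import defaultdict, deque

class WindowedView:
    """Materialized SUM-by-key view maintained incrementally over a
    sliding window of the most recent `size` stream tuples."""
    def __init__(self, size):
        self.window = deque()
        self.view = defaultdict(float)
        self.size = size

    def insert(self, key, value):
        self.window.append((key, value))
        self.view[key] += value              # incremental refresh
        if len(self.window) > self.size:     # expire the oldest tuple
            old_key, old_value = self.window.popleft()
            self.view[old_key] -= old_value

v = WindowedView(size=3)
for key, value in [("EU", 10), ("US", 5), ("EU", 7), ("US", 2)]:
    v.insert(key, value)
print(dict(v.view))  # {'EU': 7.0, 'US': 7.0}: the first EU tuple expired
```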


Author(s):  
Xiufeng Liu ◽  
Huan Huo ◽  
Nadeem Iftikhar ◽  
Per Sieverts Nielsen

Data warehousing populates data from different source systems into a central data warehouse (DW) through extraction, transformation, and loading (ETL). Massive transaction data are routinely recorded in a variety of applications, such as retail commerce, banking systems, and website management. Transaction data record the timestamp and the relevant reference data needed for a particular transaction record. Processing transaction data with dependencies and high velocity is a non-trivial task for a standard ETL. This chapter presents a two-tiered segmentation approach for transaction data warehousing. The approach uses a so-called two-staging ETL method to process detailed records from operational systems, followed by a dimensional data process that populates the data store with a star or snowflake schema. The proposed approach is an all-in-one solution capable of processing fast/slowly changing data and early/late-arriving data. The chapter evaluates the proposed method, and the results validate its effectiveness for processing transaction data.
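
A highly simplified sketch of the two-tier idea (staging detailed records first, then populating a star schema; the table layouts and data are illustrative assumptions, not the authors' implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Tier 1: staging area for detailed transaction records (late-arriving
# rows would simply land here before the next dimensional load).
cur.execute("CREATE TABLE staging (ts TEXT, customer TEXT, amount REAL)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", [
    ("2020-01-01", "alice", 10.0),
    ("2020-01-01", "bob", 5.0),
    ("2020-01-02", "alice", 7.5),
])

# Tier 2: dimensional process into a star schema (one dimension, one fact).
cur.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("CREATE TABLE fact_sales (ts TEXT, customer_id INTEGER, amount REAL)")
cur.execute("INSERT INTO dim_customer (name) SELECT DISTINCT customer FROM staging")
cur.execute("""INSERT INTO fact_sales
               SELECT s.ts, d.id, s.amount
               FROM staging s JOIN dim_customer d ON d.name = s.customer""")

print(cur.execute("""SELECT d.name, SUM(f.amount) FROM fact_sales f
                     JOIN dim_customer d ON d.id = f.customer_id
                     GROUP BY d.name""").fetchall())
# [('alice', 17.5), ('bob', 5.0)]
```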


Author(s):  
Marko Petrović ◽  
Nina Turajlić ◽  
Milica Vučković ◽  
Sladjan Babarogić ◽  
Nenad Aničić

ETL process development is the most complex and expensive phase of data warehouse development, so research has focused on its conceptualization and automation. A new solution based on domain-specific modeling, the model-driven ETL approach (M-ETL-A), is proposed for the formal specification of ETL processes and their implementation. Several domain-specific languages (DSLs) are introduced, each defining the concepts relevant to a specific aspect of an ETL process (primarily, languages for specifying the data flow and the control flow). A dedicated platform (ETL-PL) technologically supports modeling (using the DSLs) and the automated transformation of models into executable code for a specific application framework. The ETL-PL development environment comprises tools for ETL process modeling (tools for defining the abstract and concrete DSL syntax and for creating models in accordance with the DSLs). The ETL-PL execution environment consists of services responsible for automatically generating executable code from models and executing the generated code.
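
The DSLs and ETL-PL are the chapter's own contribution; as a toy analogue of specifying an ETL data flow declaratively and interpreting it (all names, the step vocabulary, and the data below are assumptions), consider:

```python
# A toy, declarative data-flow model: each step names an operation and
# its parameters, standing in for a model expressed in a data-flow DSL.
flow_model = [
    {"op": "extract",   "source": [{"name": "x", "qty": "3"},
                                   {"name": "y", "qty": "4"}]},
    {"op": "transform", "fn": lambda r: {**r, "qty": int(r["qty"])}},
    {"op": "filter",    "pred": lambda r: r["qty"] > 3},
    {"op": "load",      "target": []},
]

def execute(model):
    """A tiny execution engine: interprets the model step by step, the
    way a platform would turn models into executable code."""
    rows = []
    for step in model:
        if step["op"] == "extract":
            rows = list(step["source"])
        elif step["op"] == "transform":
            rows = [step["fn"](r) for r in rows]
        elif step["op"] == "filter":
            rows = [r for r in rows if step["pred"](r)]
        elif step["op"] == "load":
            step["target"].extend(rows)
    return model[-1]["target"]

print(execute(flow_model))  # [{'name': 'y', 'qty': 4}]
```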


Author(s):  
Kornelije Rabuzin

This chapter presents the concept of "deductive data warehouses." Deductive data warehouses rely on deductive databases but use a data warehouse in the background instead of a database. The author shows how Datalog, as a logic programming language, can be used to perform online analytical processing (OLAP) on data; for that purpose, a small data warehouse has been implemented. Furthermore, the chapter proposes and briefly discusses "Datalog by example" as a visual front-end tool for posing Datalog queries to deductive data warehouses.
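
To give a flavor of Datalog-style OLAP over a tiny fact table (an emulation in Python, not the chapter's Datalog programs or its "Datalog by example" tool; the predicate names and data are assumptions): a roll-up of sales to the region level corresponds to a rule with aggregation, evaluated here as a join followed by a group-by sum.

```python
from collections import defaultdict

# Facts, as a Datalog program would state them:
#   sales(Store, Amount).    store_in(Store, Region).
sales = [("s1", 100), ("s2", 50), ("s3", 70)]
store_in = [("s1", "EU"), ("s2", "EU"), ("s3", "US")]

# Rule (with aggregation), roughly:
#   region_sales(Region, sum<Amount>) :-
#       sales(Store, Amount), store_in(Store, Region).
def region_sales():
    region_of = dict(store_in)        # join on Store
    totals = defaultdict(int)
    for store, amount in sales:
        totals[region_of[store]] += amount
    return dict(totals)

print(region_sales())  # {'EU': 150, 'US': 70}
```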

