Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management
Latest Publications

Total documents: 11 (five years: 11)
H-index: 1 (five years: 1)
Published by: IGI Global
ISBN: 9781522555162, 9781522555179

Author(s):  
Khaled Dehdouh

In the context of big data warehouses, column-oriented NoSQL database systems are considered a storage model well suited to data warehousing and online analysis. Indeed, NoSQL models allow data to scale easily, and the columnar store is suitable for storing and managing massive data, especially for decisional queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build the OLAP cubes corresponding to the analysis contexts, the most common way is to integrate other software, such as Hive or Kylin, which provides a CUBE operator for building data cubes. With this approach, however, the cube is built in a row-oriented fashion, and the benefits of the column-oriented approach are not fully obtained. The main contribution of this chapter is a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed aspects of how the data warehouses are stored.
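
For illustration only (this is not the chapter's MC-CUBE implementation), the following minimal Python sketch emulates a columnar layout as per-column lists and computes a small data cube with a map phase (emitting one grouping per cell of the cube lattice) and a reduce phase (summing per grouping key). The toy data, function name, and "ALL" marker are illustrative assumptions.

```python
from itertools import combinations
from collections import defaultdict

# Toy columnar store: each column held separately, as in a
# column-oriented NoSQL layout (illustrative assumption).
columns = {
    "region":  ["EU", "EU", "US", "US"],
    "product": ["p1", "p2", "p1", "p1"],
    "amount":  [10,   20,   30,   40],
}

def mc_cube_sketch(columns, dims, measure):
    """Map: emit a key for every subset of dims (the cube lattice);
    reduce: sum the measure per grouping key."""
    n = len(columns[measure])
    acc = defaultdict(int)
    for r in range(n):
        row = {d: columns[d][r] for d in dims}
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else "ALL" for d in dims)
                acc[key] += columns[measure][r]  # reduce step
    return dict(acc)

cube = mc_cube_sketch(columns, ["region", "product"], "amount")
print(cube[("EU", "ALL")])   # 30: EU total across all products
print(cube[("ALL", "ALL")])  # 100: grand total
```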


Author(s):  
Marwa Manaa ◽  
Thouraya Sakouhi ◽  
Jalel Akaichi

Mobility data has become an important paradigm for computing in various areas. Mobility data is at the core of revealing the traces of moving objects' displacements. While each area views trajectories through a different lens, all aim to enrich mobility data with domain knowledge. Semantic annotations may offer a common model for trajectories, and ontology design patterns seem to be promising solutions for defining such a trajectory-related pattern: they are more suitable for annotating multi-perspective data than ontologies alone. The trajectory ontology design pattern is used as a semantic layer for trajectory data warehouses, with the aim of analyzing the behaviors of mobile entities. In this chapter, the authors propose an approach for the semantic modeling of trajectories and trajectory data warehouses based on a trajectory ontology design pattern. They validate the proposal through real case studies dealing with behavior analysis and animal tracking.
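
As a rough illustration of the kind of trajectory model such a pattern describes (the concrete ontology design pattern is the authors' own; the class and field names below are illustrative assumptions), a trajectory can be represented as a sequence of semantically annotated episodes:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Episode:
    """One stop or move of a moving object, with a semantic annotation
    drawn from domain knowledge (e.g., 'feeding', 'migrating')."""
    kind: str                           # "stop" or "move"
    start: str                          # ISO timestamps (simplified)
    end: str
    points: List[Tuple[float, float]]   # (lat, lon) samples
    annotation: str                     # semantic label from the ontology

@dataclass
class Trajectory:
    object_id: str
    episodes: List[Episode] = field(default_factory=list)

# An annotated animal-tracking trajectory (toy data).
t = Trajectory("stork-42", [
    Episode("move", "2018-03-01T06:00", "2018-03-01T12:00",
            [(36.8, 10.2), (37.5, 10.0)], "migrating"),
    Episode("stop", "2018-03-01T12:00", "2018-03-01T14:00",
            [(37.5, 10.0)], "feeding"),
])
print([e.annotation for e in t.episodes])  # ['migrating', 'feeding']
```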


Author(s):  
Olfa Layouni ◽  
Jalel Akaichi

Spatio-temporal data warehouses store enormous amounts of data and are usually exploited by spatio-temporal OLAP systems to extract relevant information. To extract interesting information, the current user launches spatio-temporal OLAP (ST-OLAP) queries to navigate within a geographic data cube (geo-cube). Very often, choosing which part of the geo-cube to explore next, and thus designing the forthcoming ST-OLAP query, is a difficult task; hence the need to suggest ST-OLAP queries that help the user refine the query he or she has just launched against the geo-cube. Moreover, models that adapt to a specific user can improve the probability of that user being satisfied. In this chapter, the authors first focus on assessing the similarity between spatio-temporal OLAP queries in terms of their GeoMDX expressions. They then propose a personalized query suggestion model based on users' search behavior, injecting the relevance between queries in the current session and the current user's search behavior into a basic probabilistic model.
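
A minimal sketch of the general idea follows (the authors' actual similarity measure over GeoMDX and their probabilistic model are not reproduced here; the crude tokenization, the Jaccard measure, and the blended scoring formula are illustrative assumptions):

```python
def query_members(geomdx: str) -> set:
    """Crude tokenization of a GeoMDX query into its member references
    (a real measure would parse dimensions, levels, and slices)."""
    return {t for t in geomdx.replace(",", " ").split() if t.startswith("[")}

def similarity(q1: str, q2: str) -> float:
    a, b = query_members(q1), query_members(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(current_query: str, session: list, candidates: list, alpha=0.5):
    """Rank candidates by similarity to the current query blended with
    similarity to the session history (a stand-in for the chapter's
    personalized probabilistic model)."""
    def score(c):
        hist = sum(similarity(c, q) for q in session) / max(len(session), 1)
        return alpha * similarity(c, current_query) + (1 - alpha) * hist
    return sorted(candidates, key=score, reverse=True)

current = "SELECT [Measures].[Sales] ON 0 FROM [GeoCube] WHERE [Region].[EU]"
cands = [
    "SELECT [Measures].[Profit] ON 0 FROM [GeoCube] WHERE [Region].[US]",
    "SELECT [Measures].[Sales] ON 0 FROM [GeoCube] WHERE [Region].[EU].[France]",
]
print(suggest(current, [current], cands)[0])  # the EU drill-down ranks first
```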


Author(s):  
Jorge Bernardino ◽  
Joaquim Lapa ◽  
Ana Almeida

A big data warehouse enables the analysis of large amounts of information that typically comes from an organization's transactional systems (OLTP). However, traditional data warehouse systems do not have the capacity to handle the massive amount of data that is currently produced. Business intelligence (BI) is a collection of decision-support technologies that enable executives, managers, and analysts to make better and faster decisions. Organizations must make good use of BI platforms to quickly extract the desired information from huge volumes of data, reducing the time and increasing the efficiency of decision-making processes. In this chapter, the authors present a comparative analysis of the capabilities of commercial and open source BI tools, in order to aid organizations in selecting the most suitable BI platform. They evaluate and compare six major open source BI platforms: Actuate, Jaspersoft, Jedox/Palo, Pentaho, SpagoBI, and Vanilla; and six major commercial BI platforms: IBM Cognos, Microsoft BI, MicroStrategy, Oracle BI, SAP BI, and SAS BI & Analytics.


Author(s):  
Francisca Vale Lima ◽  
Carlos Costa ◽  
Maribel Yasmina Santos

The large volume of data that is constantly being generated leads to the need to extract useful patterns, trends, or insights from it, raising interest in business intelligence and big data analytics. The volume, velocity, and variety of this data highlight the need for concepts like real-time big data warehouses (RTBDWs). The lack of guidelines or methodological approaches for implementing these systems calls for further research on this recent topic. This chapter proposes an RTBDW architecture that includes the main components and data flows needed to collect, process, store, and analyze the available data, integrating streaming with batch data and enabling real-time decision making. Using Twitter data, several technologies were evaluated to understand their performance. The results obtained were satisfactory and allowed the identification of a methodological approach that can be followed for implementing this type of system.
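
The chapter's concrete architecture and technology choices are its own; as a loose illustration of combining a streaming path with a batch path, this Python sketch (all names and the micro-batch policy are assumptions) buffers incoming records and flushes them to a batch store in micro-batches, so fresh data becomes queryable with low latency:

```python
import queue, threading, time

events = queue.Queue()   # streaming path: e.g., tweets arriving
batch_store = []         # stands in for the batch/warehouse layer

def producer():
    for i in range(10):
        events.put({"id": i, "text": f"tweet {i}", "ts": time.time()})
        time.sleep(0.01)

def micro_batch_loader(batch_size=4):
    """Drain the stream into micro-batches and 'load' each batch."""
    buf = []
    while True:
        try:
            buf.append(events.get(timeout=0.5))
        except queue.Empty:
            break                            # stream idle: stop draining
        if len(buf) == batch_size:
            batch_store.append(list(buf))    # batch load
            buf.clear()
    if buf:
        batch_store.append(buf)              # flush the partial batch

threading.Thread(target=producer).start()
micro_batch_loader()
print(f"loaded {len(batch_store)} micro-batches")  # 3 (sizes 4, 4, 2)
```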


Author(s):  
Shigeaki Sakurai

This chapter introduces a method that discovers characteristic sequential patterns from sequential data based on background knowledge. The sequential data is composed of rows of items, and the chapter focuses on sequential data derived from tabular structured data; that is, each item is composed of an attribute and an attribute value. The chapter uses item constraints to describe the background knowledge. The constraints describe the combinations of items included in sequential patterns and can represent the interests of analysts, so analysts can easily discover, as characteristic sequential patterns, those sequential patterns coinciding with their interests. In addition, the chapter focuses on a special case of item constraints in which the constraint applies to the last item of the sequential patterns. The discovered patterns can be used to analyze causes and reasons, and can predict the last item when a sub-sequence is given. The chapter introduces the properties of item constraints on the last item.
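
A minimal sketch of mining frequent sequential patterns subject to a last-item constraint (an illustration of the general technique, not the chapter's algorithm; the function names and toy data are assumptions):

```python
from itertools import combinations

def subsequences(seq, max_len=3):
    """Enumerate all (non-contiguous) subsequences up to max_len items."""
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            yield tuple(seq[i] for i in idx)

def frequent_with_last_item(db, last_item, min_support=2):
    """Count subsequences across the database, keeping only patterns
    whose final item satisfies the analyst's constraint."""
    counts = {}
    for seq in db:
        for p in set(subsequences(seq)):     # count once per sequence
            counts[p] = counts.get(p, 0) + 1
    return {p: c for p, c in counts.items()
            if c >= min_support and p[-1] == last_item}

# Each row is a sequence of items; each item is an
# (attribute, attribute value) pair from tabular data.
db = [
    [("page", "home"), ("page", "cart"), ("event", "buy")],
    [("page", "home"), ("event", "buy")],
    [("page", "home"), ("page", "cart")],
]
print(frequent_with_last_item(db, ("event", "buy")))
# patterns ending in ('event', 'buy') with support >= 2
```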


Author(s):  
Salman Ahmed Shaikh ◽  
Kousuke Nakabasami ◽  
Toshiyuki Amagasa ◽  
Hiroyuki Kitagawa

Data warehousing and multidimensional analysis go hand in hand. Data warehouses provide clean and partially normalized data for fast, consistent, and interactive multidimensional analysis. With advances in data generation and collection technologies, businesses and organizations now generate big data (defined by the 3Vs: volume, variety, and velocity). Since big data differs from traditional data, it requires a different set of tools and techniques for processing and analysis. This chapter discusses multidimensional analysis (also known as online analytical processing, or OLAP) of big data, focusing particularly on data streams, which are characterized by huge volume and high velocity. For interactive analysis, OLAP must maintain a number of materialized views corresponding to user queries. Specifically, this chapter discusses the issues in maintaining materialized views for data streams, the use of a special window for maintaining materialized views, and the issues of coupling a stream processing engine (SPE) with an OLAP engine.
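
As a crude illustration of incrementally maintaining a materialized view over a stream with a window (the chapter's special window and SPE/OLAP coupling are not reproduced; the tuple-based sliding window and SUM-by-key view below are assumptions):

```python
from collections import defaultdict, deque

class WindowedView:
    """Materialized SUM-by-key view maintained incrementally over a
    sliding window of the most recent `size` stream tuples."""
    def __init__(self, size):
        self.window = deque()
        self.view = defaultdict(float)
        self.size = size

    def insert(self, key, value):
        self.window.append((key, value))
        self.view[key] += value              # incremental refresh
        if len(self.window) > self.size:     # expire the oldest tuple
            old_key, old_value = self.window.popleft()
            self.view[old_key] -= old_value

v = WindowedView(size=3)
for key, value in [("EU", 10), ("US", 5), ("EU", 7), ("US", 2)]:
    v.insert(key, value)
print(dict(v.view))  # {'EU': 7.0, 'US': 7.0}: the first EU tuple expired
```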


Author(s):  
Xiufeng Liu ◽  
Huan Huo ◽  
Nadeem Iftikhar ◽  
Per Sieverts Nielsen

Data warehousing populates data from different source systems into a central data warehouse (DW) through extraction, transformation, and loading (ETL). Massive transaction data are routinely recorded in a variety of applications, such as retail commerce, banking systems, and website management. Transaction data record the timestamp and the relevant reference data needed for a particular transaction record. Processing transaction data with dependencies and high velocity is a non-trivial task for a standard ETL. This chapter presents a two-tiered segmentation approach for transaction data warehousing. The approach uses a so-called two-staging ETL method to process detailed records from operational systems, followed by a dimensional data process that populates the data store with a star or snowflake schema. The proposed approach is an all-in-one solution capable of processing fast/slowly changing data and early/late-arriving data. The chapter evaluates the proposed method, and the results validate its effectiveness for processing transaction data.
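
A highly simplified sketch of the two-tier idea (staging detailed records first, then populating a star schema; the table layouts and data are illustrative assumptions, not the authors' implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Tier 1: staging area for detailed transaction records (late-arriving
# rows would simply land here before the next dimensional load).
cur.execute("CREATE TABLE staging (ts TEXT, customer TEXT, amount REAL)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", [
    ("2020-01-01", "alice", 10.0),
    ("2020-01-01", "bob", 5.0),
    ("2020-01-02", "alice", 7.5),
])

# Tier 2: dimensional process into a star schema (one dimension, one fact).
cur.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("CREATE TABLE fact_sales (ts TEXT, customer_id INTEGER, amount REAL)")
cur.execute("INSERT INTO dim_customer (name) SELECT DISTINCT customer FROM staging")
cur.execute("""INSERT INTO fact_sales
               SELECT s.ts, d.id, s.amount
               FROM staging s JOIN dim_customer d ON d.name = s.customer""")

print(cur.execute("""SELECT d.name, SUM(f.amount) FROM fact_sales f
                     JOIN dim_customer d ON d.id = f.customer_id
                     GROUP BY d.name""").fetchall())
# [('alice', 17.5), ('bob', 5.0)]
```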


Author(s):  
Marko Petrović ◽  
Nina Turajlić ◽  
Milica Vučković ◽  
Sladjan Babarogić ◽  
Nenad Aničić

ETL process development is the most complex and expensive phase of data warehouse development, so research has focused on its conceptualization and automation. A new solution based on domain-specific modeling, the model-driven ETL approach (M-ETL-A), is proposed for the formal specification of ETL processes and their implementation. Several domain-specific languages (DSLs) are introduced, each defining the concepts relevant to a specific aspect of an ETL process (primarily, languages for specifying the data flow and the control flow). A dedicated platform (ETL-PL) technologically supports modeling (using the DSLs) and the automated transformation of models into executable code for a specific application framework. The ETL-PL development environment comprises tools for ETL process modeling (tools for defining the abstract and concrete DSL syntax and for creating models in accordance with the DSLs). The ETL-PL execution environment consists of services responsible for automatically generating executable code from models and executing the generated code.
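
The DSLs and ETL-PL are the chapter's own contribution; as a toy analogue of specifying an ETL data flow declaratively and interpreting it (all names, the step vocabulary, and the data below are assumptions), consider:

```python
# A toy, declarative data-flow model: each step names an operation and
# its parameters, standing in for a model expressed in a data-flow DSL.
flow_model = [
    {"op": "extract",   "source": [{"name": "x", "qty": "3"},
                                   {"name": "y", "qty": "4"}]},
    {"op": "transform", "fn": lambda r: {**r, "qty": int(r["qty"])}},
    {"op": "filter",    "pred": lambda r: r["qty"] > 3},
    {"op": "load",      "target": []},
]

def execute(model):
    """A tiny execution engine: interprets the model step by step, the
    way a platform would turn models into executable code."""
    rows = []
    for step in model:
        if step["op"] == "extract":
            rows = list(step["source"])
        elif step["op"] == "transform":
            rows = [step["fn"](r) for r in rows]
        elif step["op"] == "filter":
            rows = [r for r in rows if step["pred"](r)]
        elif step["op"] == "load":
            step["target"].extend(rows)
    return model[-1]["target"]

print(execute(flow_model))  # [{'name': 'y', 'qty': 4}]
```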


Author(s):  
Kornelije Rabuzin

This chapter presents the concept of "deductive data warehouses." Deductive data warehouses rely on deductive databases but use a data warehouse in the background instead of a database. The author shows how Datalog, as a logic programming language, can be used to perform online analytical processing (OLAP) on data; for that purpose, a small data warehouse has been implemented. Furthermore, the chapter proposes and briefly discusses "Datalog by example" as a visual front-end tool for posing Datalog queries to deductive data warehouses.
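
To give a flavor of Datalog-style OLAP over a tiny fact table (an emulation in Python, not the chapter's Datalog programs or its "Datalog by example" tool; the predicate names and data are assumptions): a roll-up of sales to the region level corresponds to a rule with aggregation, evaluated here as a join followed by a group-by sum.

```python
from collections import defaultdict

# Facts, as a Datalog program would state them:
#   sales(Store, Amount).    store_in(Store, Region).
sales = [("s1", 100), ("s2", 50), ("s3", 70)]
store_in = [("s1", "EU"), ("s2", "EU"), ("s3", "US")]

# Rule (with aggregation), roughly:
#   region_sales(Region, sum<Amount>) :-
#       sales(Store, Amount), store_in(Store, Region).
def region_sales():
    region_of = dict(store_in)        # join on Store
    totals = defaultdict(int)
    for store, amount in sales:
        totals[region_of[store]] += amount
    return dict(totals)

print(region_sales())  # {'EU': 150, 'US': 70}
```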

