A Genetic Algorithm for Selecting Horizontal Fragments

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch142 ◽

2011 ◽

pp. 920-925

Author(s):

Ladjel Bellatreche

Keyword(s):

Database Systems ◽

Data Partitioning ◽

Materialized Views ◽

Access Path ◽

Multiple Dimensions ◽

Physical Database Design ◽

Join Queries ◽

Speed Up ◽

Disjoint Sets ◽

Horizontal Partitioning

Decision support applications require complex queries, e.g., multi-way joins defined on huge warehouses usually modelled using star schemas, i.e., a fact table and a set of dimension tables (Papadomanolakis & Ailamaki, 2004). Star schemas have an important property in terms of join operations between dimension tables and the fact table: the fact table contains a foreign key for each dimension, and there are no join operations between dimension tables. Joins in data warehouses (called star join queries) are particularly expensive because the fact table (by far the largest table in the warehouse) participates in every join and multiple dimensions are likely to participate in each join. To speed up star join queries, many optimization structures have been proposed: redundant structures (materialized views and advanced index schemes) and non-redundant structures (data partitioning and parallel processing). Recently, data partitioning has been recognized as an important aspect of physical database design (Sanjay, Narasayya & Yang, 2004; Papadomanolakis & Ailamaki, 2004). Two types of data partitioning are available (Özsu & Valduriez, 1999): vertical and horizontal partitioning. Vertical partitioning allows tables to be decomposed into disjoint sets of columns. Horizontal partitioning allows tables, materialized views and indexes to be partitioned into disjoint sets of rows that are physically stored and usually accessed separately. Contrary to redundant structures, data partitioning does not replicate data, thereby reducing storage requirements and minimizing maintenance overhead. In this paper, we concentrate only on horizontal data partitioning (HP).
HP may positively affect (1) query performance, through partition elimination: if a query includes a partition key as a predicate in the WHERE clause, the query optimizer automatically routes the query to only the relevant partitions; and (2) database manageability: for instance, by allocating partitions to different machines or by splitting any access path (tables, materialized views, indexes, etc.). Most database systems offer three methods to perform HP using the PARTITION statement: RANGE, HASH and LIST (Sanjay, Narasayya & Yang, 2004). In range partitioning, an access path (table, view, or index) is split according to ranges of values of a given set of columns. The hash mode decomposes the data according to a hash function (provided by the system) applied to the values of the partitioning columns. List partitioning splits a table according to the listed values of a column. These methods can be combined to generate composite partitioning. Oracle currently supports range-hash and range-list composite partitioning using the PARTITION - SUBPARTITION statement. The following SQL statement shows an example of fragmenting a table Student using range partitioning.
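
The statement itself is not reproduced in this listing. As a rough stand-in, the three partitioning modes and partition elimination can be sketched in Python. The Student rows, the column names, and all helper functions below are hypothetical; they only mimic, under simplifying assumptions, what a DBMS does internally for RANGE, HASH and LIST partitioning.

```python
from bisect import bisect_right
from collections import defaultdict

def range_partition(rows, key, bounds):
    """Partition i holds keys below bounds[i] (and at or above bounds[i-1]),
    similar to VALUES LESS THAN; the last partition catches everything else."""
    parts = defaultdict(list)
    for row in rows:
        parts[bisect_right(bounds, row[key])].append(row)
    return parts

def hash_partition(rows, key, n):
    """Spread rows over n partitions using a hash of the partitioning column."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def list_partition(rows, key, value_lists):
    """value_lists maps a partition name to the column values it stores."""
    lookup = {v: name for name, values in value_lists.items() for v in values}
    parts = defaultdict(list)
    for row in rows:
        parts[lookup[row[key]]].append(row)
    return parts

def prune_range(bounds, low, high):
    """Partition elimination: ids of range partitions that may hold keys in [low, high]."""
    return list(range(bisect_right(bounds, low), bisect_right(bounds, high) + 1))

# Hypothetical Student rows (not taken from the source).
students = [
    {"id": 1, "age": 17, "city": "Poitiers"},
    {"id": 2, "age": 21, "city": "Paris"},
    {"id": 3, "age": 30, "city": "Poitiers"},
    {"id": 4, "age": 18, "city": "Paris"},
]
bounds = [18, 25]  # three partitions: age < 18, 18 <= age < 25, age >= 25
by_age = range_partition(students, "age", bounds)
by_id = hash_partition(students, "id", 2)
by_city = list_partition(students, "city", {"west": {"Poitiers"}, "centre": {"Paris"}})

# A predicate such as "age BETWEEN 20 AND 22" only needs to scan partition 1.
relevant = prune_range(bounds, 20, 22)
matches = [r for p in relevant for r in by_age[p] if 20 <= r["age"] <= 22]
```

The point of `prune_range` is that the predicate on the partitioning key decides which partitions are read before any row is touched; the other partitions are never scanned.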


Bitmap Join Indexes vs. Data Partitioning

Database Technologies ◽

10.4018/978-1-60566-058-5.ch140 ◽

2009 ◽

pp. 2292-2300

Author(s):

Ladjel Bellatreche

Keyword(s):

Parallel Processing ◽

Data Partitioning ◽

Optimization Techniques ◽

Sloan Digital Sky Survey ◽

Materialized Views ◽

Binary Operations ◽

Vertical Partitioning ◽

Speed Up ◽

Redundant Structure ◽

Horizontal Partitioning

Scientific databases and data warehouses store large amounts of data with several tables and attributes. For instance, the Sloan Digital Sky Survey (SDSS) astronomical database contains a large number of tables with hundreds of attributes, which can be queried in various combinations (Papadomanolakis & Ailamaki, 2004). These queries involve many tables using binary operations, such as joins. To speed up these queries, many optimization structures have been proposed, which can be divided into two main categories: redundant structures such as materialized views, advanced indexing schemes (bitmap indexes, bitmap join indexes, etc.) (Sanjay, Chaudhuri & Narasayya, 2000) and vertical partitioning (Sanjay, Narasayya & Yang, 2004); and non-redundant structures such as horizontal partitioning (Sanjay, Narasayya & Yang, 2004; Bellatreche, Boukhalfa & Mohania, 2007) and parallel processing (Datta, Moon, & Thomas, 2000; Stöhr, Märtens & Rahm, 2000). These optimization techniques are used either sequentially or in combination. The combinations are made within a category: materialized views with indexes for redundant structures, and partitioning with parallel processing for non-redundant ones. Materialized views and indexes compete for the same resource, namely storage, and incur maintenance overhead in the presence of updates (Sanjay, Chaudhuri & Narasayya, 2000). No existing work addresses the problem of selecting combined optimization structures. In this paper, we propose two approaches: one combines a non-redundant structure (horizontal partitioning) with a redundant structure (bitmap indexes) in order to reduce both query processing cost and maintenance overhead, and the other exploits vertical partitioning algorithms to generate bitmap join indexes. To facilitate the understanding of our approaches, we review these techniques in detail.
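
To make the notion of a bitmap join index concrete, here is a minimal Python sketch. It assumes a toy star schema (the `sales`, `customers` and `products` rows and every function name are hypothetical) and stores each bitmap as a Python integer; it illustrates the general idea only and is not the selection approach proposed in the chapter.

```python
def build_bitmap_join_index(fact_rows, dim_rows, fk, dim_key, dim_attr):
    """Map each value of dim_attr to an int bitmap whose bit i is set when
    fact row i joins a dimension row carrying that value."""
    attr_of = {d[dim_key]: d[dim_attr] for d in dim_rows}
    index = {}
    for i, fact in enumerate(fact_rows):
        value = attr_of[fact[fk]]
        index[value] = index.get(value, 0) | (1 << i)
    return index

def rows_for(bitmap):
    """Positions of the set bits, i.e. the matching fact row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical toy star schema.
customers = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Poitiers"}, {"id": 3, "city": "Paris"}]
products = [{"id": 10, "category": "toys"}, {"id": 11, "category": "books"}]
sales = [
    {"cust": 1, "prod": 10, "amount": 5},
    {"cust": 2, "prod": 11, "amount": 7},
    {"cust": 1, "prod": 11, "amount": 3},
    {"cust": 3, "prod": 10, "amount": 9},
]

city_idx = build_bitmap_join_index(sales, customers, "cust", "id", "city")
cat_idx = build_bitmap_join_index(sales, products, "prod", "id", "category")

# city = 'Paris' AND category = 'toys' becomes a bitwise AND, with no join at query time.
hits = rows_for(city_idx["Paris"] & cat_idx["toys"])
total = sum(sales[i]["amount"] for i in hits)
```

Because the index is precomputed over the fact table, a star join with selections on several dimensions reduces to bitwise operations; the price is the storage and maintenance overhead that the abstract attributes to redundant structures.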


Bitmap Join Indexes vs. Data Partitioning

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch028 ◽

2011 ◽

pp. 171-177

Author(s):

Ladjel Bellatreche

Keyword(s):

Parallel Processing ◽

Data Partitioning ◽

Optimization Techniques ◽

Sloan Digital Sky Survey ◽

Materialized Views ◽

Binary Operations ◽

Vertical Partitioning ◽

Speed Up ◽

Redundant Structure ◽

Horizontal Partitioning


View Selection and Materialization

Data Warehousing Design and Advanced Engineering Applications ◽

10.4018/978-1-60566-756-0.ch007 ◽

2010 ◽

pp. 114-130

Author(s):

Zohra Bellahsene

Keyword(s):

Data Warehousing ◽

Database Systems ◽

Selection Method ◽

Materialized Views ◽

Storage Space ◽

View Selection ◽

Dynamic View ◽

Processing Cost ◽

Speed Up ◽

Warehousing Systems

There are many motivations for investigating the view selection problem. First, materialized views are increasingly being supported by commercial database systems and are used to speed up query response time. Therefore, the problem of choosing an appropriate set of views to materialize in the database is crucial in order to improve query processing cost. Another application of the view selection issue is selecting views to materialize in data warehousing systems to answer decision support queries. The problem addressed in this paper is similar to that of deciding which views to materialize in data warehousing. However, most existing view selection methods are static. Moreover, none of these methods has considered the problem of de-materializing already materialized views. Yet this is a very important issue, since storage space is usually restricted. This chapter deals with the problem of dynamic view selection and with the pending issue of removing materialized views in order to replace less beneficial views with more beneficial ones. We propose a view selection method for deciding which views to materialize according to statistical metadata. More precisely, we have designed and implemented our view selection method, including a polynomial algorithm, to decide which views to materialize.
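
The abstract does not describe the algorithm itself. Purely as an illustration of dynamic selection under a storage budget, the sketch below uses a simple greedy rule (benefit per unit of storage), which is an assumption and not the chapter's method; the statistics, view names and function names are all hypothetical.

```python
def select_views(stats, budget):
    """stats maps a view name to (benefit, size). Greedily keep the views with
    the best benefit-per-size ratio that still fit in the storage budget."""
    chosen, used = set(), 0
    ranked = sorted(stats, key=lambda v: stats[v][0] / stats[v][1], reverse=True)
    for view in ranked:
        size = stats[view][1]
        if used + size <= budget:
            chosen.add(view)
            used += size
    return chosen

def plan_changes(materialized, stats, budget):
    """Compare the target set with what is currently stored and return
    (views to materialize, views to de-materialize)."""
    target = select_views(stats, budget)
    return sorted(target - materialized), sorted(materialized - target)

# Hypothetical statistics gathered from the workload: (benefit, size).
stats = {"v1": (100, 50), "v2": (90, 30), "v3": (20, 40), "v4": (50, 20)}
to_add, to_drop = plan_changes({"v1", "v3"}, stats, budget=100)
```

Re-running `plan_changes` whenever the statistics change is what makes the selection dynamic in this sketch: a view that has become less beneficial (here `v3`) is dropped to make room for more beneficial ones.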


Referential Horizontal Partitioning Selection Problem in Data Warehouses

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2009080701 ◽

2009 ◽

Vol 5 (4) ◽

pp. 1-23 ◽

Cited By ~ 21

Author(s):

Ladjel Bellatreche ◽

Kamel Boukhalfa ◽

Pascal Richard ◽

Komla Yamavo Woameno

Keyword(s):

Data Warehouse ◽

Cost Model ◽

Experimental Studies ◽

Optimal Solution ◽

Database Systems ◽

Selection Problem ◽

Hill Climbing ◽

Materialized Views ◽

Data Set ◽

Horizontal Partitioning

Horizontal partitioning has been widely adopted by the database community, where it plays a significant part in the physical design process. It is supported by most commercial database management systems (DBMS), which offer a native Data Definition Language for decomposing tables and materialized views using various modes. In traditional databases, horizontal partitioning has been extensively studied, and several fragmentation algorithms were proposed to partition tables in isolation. In the relational data warehouse environment, horizontal partitioning consists in decomposing the whole warehouse schema into sub-schemas, where each sub-schema contains fragments of dimension and fact tables. Dimension tables are fragmented using the primary partitioning mode, whereas the fact table is divided using the referential mode. In this article, the authors first focus on the evolution of horizontal partitioning in commercial DBMS motivated by decision support applications. Secondly, they give a formalization of the referential fragmentation schema selection problem in the data warehouse and study the hardness of selecting an optimal solution. Due to its high complexity, they develop two algorithms, hill climbing and simulated annealing, with several variants to select a near-optimal partitioning schema. Finally, extensive experimental studies are conducted using the data set of the APB1 benchmark to compare the quality of the proposed algorithms using a mathematical cost model. Based on these experiments, some recommendations are given to advise database administrators on how to make good use of horizontal partitioning.
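
The following Python sketch illustrates hill climbing over partitioning schemas in a deliberately simplified setting. It assumes that a schema is just the number of fragments per dimension attribute, that the number of fact fragments is the product of those numbers and must not exceed a threshold `w`, and that data are uniformly distributed. The cost function, the attribute names and the workload are hypothetical and are not the article's cost model or algorithm variants.

```python
from math import prod

def cost(schema, queries, fact_rows):
    """Toy cost: a query scans only the share of the fact table left after
    eliminating fragments on the attributes it filters on."""
    return sum(fact_rows / prod(schema[a] for a in q) for q in queries)

def neighbours(schema, max_cuts, w):
    """Schemas reachable by adding or removing one fragment on one attribute,
    keeping the number of fact fragments at or below w."""
    for attr in schema:
        for delta in (1, -1):
            n = schema[attr] + delta
            candidate = {**schema, attr: n}
            if 1 <= n <= max_cuts[attr] and prod(candidate.values()) <= w:
                yield candidate

def hill_climb(queries, max_cuts, w, fact_rows):
    """Start unpartitioned and move to the cheapest neighbour until none improves."""
    current = {attr: 1 for attr in max_cuts}
    while True:
        best = min(neighbours(current, max_cuts, w),
                   key=lambda s: cost(s, queries, fact_rows), default=None)
        if best is None or cost(best, queries, fact_rows) >= cost(current, queries, fact_rows):
            return current
        current = best

# Hypothetical workload: each query lists the dimension attributes it filters on.
max_cuts = {"city": 4, "month": 12, "category": 3}
queries = [("city",), ("city", "month"), ("month",)]
schema = hill_climb(queries, max_cuts, w=12, fact_rows=1200)
```

Hill climbing stops at the first local optimum, which is why simulated annealing, which occasionally accepts a worse neighbour, is a natural companion when the search space has many such optima.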


Horizontal Data Partitioning

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch023 ◽

2009 ◽

pp. 199-207

Author(s):

Ladjel Bellatreche

Keyword(s):

Distributed Databases ◽

Data Partitioning ◽

Distributed Environment ◽

Parallel Database ◽

Reading And Writing ◽

Query Performance ◽

Parallel Database System ◽

Speed Up ◽

In The Beginning ◽

Horizontal Partitioning

Horizontal data partitioning is the process of splitting access objects into disjoint sets of rows. It was first introduced at the end of the 1970s and the beginning of the 1980s (Ceri et al., 1982) for logically designing databases in order to improve query performance by eliminating unnecessary accesses to non-relevant data. It was highly successful in the early 1980s in the design of homogeneous distributed databases (Ceri et al., 1982; Ceri et al., 1984; Özsu et al., 1999) and parallel databases (DeWitt et al., 1992; Valduriez, 1993). In a distributed environment, horizontal partitioning decomposes global tables into horizontal fragments, where each partition may be spread over multiple nodes. End users at a node can perform local queries and transactions on the partition transparently (the fragmentation of data across multiple sites or processors is not visible to the users). This increases performance for sites that have regular transactions involving certain views of data, whilst maintaining availability and security. In the parallel database context (Rao et al., 2002), horizontal partitioning has been used to speed up query performance in a shared-nothing parallel database system (DeWitt et al., 1992). This is achieved through both intra-query and inter-query parallelism (Valduriez, 1993). It also facilitates the exploitation of the input/output bandwidth of the disks by reading and writing data in parallel. In this paper, we use the words fragmentation and partitioning interchangeably.


On resource scheduling of multi-join queries in parallel database systems

Information Processing Letters ◽

10.1016/0020-0190(93)90144-x ◽

1993 ◽

Vol 48 (4) ◽

pp. 189-195 ◽

Cited By ~ 7

Author(s):

Kian-Lee Tan ◽

Hongjun Lu

Keyword(s):

Resource Scheduling ◽

Database Systems ◽

Parallel Database ◽

Join Queries ◽

Parallel Database Systems


A data partitioning approach to speed up the fuzzy ARTMAP algorithm using the Hilbert space-filling curve

2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) ◽

10.1109/ijcnn.2004.1380997 ◽

2005 ◽

Author(s):

J. Castro ◽

M. Georgiopoulos ◽

R. Demara

Keyword(s):

Hilbert Space ◽

Data Partitioning ◽

Fuzzy Artmap ◽

Space Filling ◽

Space Filling Curve ◽

Speed Up ◽

Filling Curve


Data partitioning for multicomputer database systems: A cell-based approach

Information Systems ◽

10.1016/0306-4379(93)90032-v ◽

1993 ◽

Vol 18 (5) ◽

pp. 329-342 ◽

Cited By ~ 4

Author(s):

Kien A Hua ◽

Lee Chiang ◽

Honesty C Young

Keyword(s):

Database Systems ◽

Data Partitioning ◽

A Cell


Beyond equi-joins

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476306 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2599-2612

Author(s):

Nikolaos Tziavelis ◽

Wolfgang Gatterbauer ◽

Mirek Riedewald

Keyword(s):

Experimental Study ◽

State Of The Art ◽

Database Systems ◽

Ranking Function ◽

Space Complexity ◽

Time And Space ◽

Running Time ◽

Join Queries ◽

Time And Space Complexity ◽

Memory Efficient

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of O(n + k log k), i.e., the best known complexity for equi-joins, and even of O(n + k), i.e., the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is n^ℓ for a join of ℓ relations. The key ingredient is a novel O(n polylog n)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.
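
To illustrate what ranked enumeration of an inequality join means, here is a small Python sketch for two relations joined on R.a < S.b and ranked by the sum of tuple weights. It is a simple heap-based merge written for this listing, not the paper's factorized representation, and it does not achieve the stated complexity bounds; the relation layout and function name are assumptions.

```python
import heapq
from itertools import islice

def ranked_inequality_join(R, S):
    """Lazily yield (weight, r, s) for all pairs with r.a < s.b, in increasing
    order of r.weight + s.weight. Tuples are (value, weight) pairs."""
    s_sorted = sorted(S, key=lambda s: s[1])  # S ordered by weight

    def next_match(r, pos):
        # Skip S tuples that fail the inequality r.a < s.b.
        while pos < len(s_sorted) and not r[0] < s_sorted[pos][0]:
            pos += 1
        return pos

    # One heap entry per R tuple: its cheapest joining partner found so far.
    heap = []
    for i, r in enumerate(R):
        pos = next_match(r, 0)
        if pos < len(s_sorted):
            heap.append((r[1] + s_sorted[pos][1], i, pos))
    heapq.heapify(heap)

    while heap:
        weight, i, pos = heapq.heappop(heap)
        yield weight, R[i], s_sorted[pos]
        pos = next_match(R[i], pos + 1)
        if pos < len(s_sorted):
            heapq.heappush(heap, (R[i][1] + s_sorted[pos][1], i, pos))

# Hypothetical relations: (join value, weight).
R = [(1, 5), (4, 1)]
S = [(2, 3), (5, 2), (3, 10)]
top3 = list(islice(ranked_inequality_join(R, S), 3))
```

Because the generator is lazy, asking for the k best answers never materializes the full join result; the skipping loop in `next_match` is the part that a more careful data structure, such as the one the abstract describes, would avoid.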


Similarity Search for Voxelized CAD Objects

Database Modeling for Industrial Data Management ◽

10.4018/978-1-59140-684-6.ch004 ◽

2011 ◽

pp. 115-147

Author(s):

Hans-Peter Kriegel ◽

Peer Kröger ◽

Martin Pfeifle ◽

Stefan Brecheisen ◽

Marco Pötke ◽

...

Keyword(s):

Similarity Search ◽

Computer Aided Design ◽

Clustering Algorithm ◽

Large Data ◽

Database Systems ◽

Data Partitioning ◽

Modern Application ◽

Data Collections ◽

The Cost ◽

Aided Design

Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging, and many others. Especially for CAD (Computer-Aided Design), suitable similarity models and a clear representation of the results can help to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts. In this chapter, we present different similarity models for voxelized CAD data based on space partitioning and data partitioning. Based on these similarity models, we introduce an industrial prototype, called BOSS, which helps the user to get an overview of a set of CAD objects. BOSS allows the user to easily browse large data collections by graphically displaying the results of a hierarchical clustering algorithm. This representation is well suited for the evaluation of similarity models and to aid an industrial user searching for similar parts.
