A Query Beehive Algorithm for Data Warehouse Buffer Management and Query Scheduling

2014 ◽  
Vol 10 (3) ◽  
pp. 34-58 ◽  
Author(s):  
Amira Kerkad ◽  
Ladjel Bellatreche ◽  
Pascal Richard ◽  
Carlos Ordonez ◽  
Dominique Geniet

Analytical queries, like those used in data warehouses and OLAP, are generally interdependent. This is due to the fact that the database is usually modeled with a denormalized star schema or its variants, where most queries pass through a large central fact table. Such interaction has been largely exploited in query optimization techniques such as materialized views. Nevertheless, such approaches usually ignore buffer management and assume queries have a fixed order and are known in advance. We believe such assumptions are too strong and thus they need to be revisited and simplified. In this paper, we study the combination of two problems: buffer management and query scheduling, in both static and dynamic scenarios. We present an NP-hardness study of the joint problem, highlighting its complexity. We then introduce a new and highly efficient algorithm inspired by a beehive. We conduct an extensive experimental evaluation on a real DBMS showing the superiority of our algorithm compared to previous ones as well as its excellent scalability.
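
The interaction between query order and buffer content described in this abstract can be illustrated with a small sketch. It is not the beehive algorithm from the paper: it assumes a plain LRU buffer, a hypothetical workload in which each query is a tuple of object names it reads, and an exhaustive search over orderings that is only feasible for a handful of queries. The names `buffer_misses` and `best_schedule` are invented for this illustration.

```python
from collections import OrderedDict
from itertools import permutations


def buffer_misses(schedule, capacity):
    """Count buffer misses for a query schedule under an LRU buffer (assumed policy)."""
    buffer = OrderedDict()  # object name -> None, ordered from least to most recently used
    misses = 0
    for needed in schedule:
        for obj in needed:
            if obj in buffer:
                buffer.move_to_end(obj)  # hit: refresh recency
            else:
                misses += 1
                buffer[obj] = None
                if len(buffer) > capacity:
                    buffer.popitem(last=False)  # evict the least recently used object
    return misses


def best_schedule(queries, capacity):
    """Brute-force the ordering with the fewest misses (toy sizes only)."""
    return min(permutations(queries), key=lambda s: buffer_misses(s, capacity))


# Hypothetical workload: two queries share the fact table and a dimension.
queries = [
    ("fact", "dim_time"),
    ("dim_store", "dim_product"),
    ("fact", "dim_time"),
]

print(buffer_misses(queries, capacity=2))
print(buffer_misses(best_schedule(queries, capacity=2), capacity=2))
```

With a buffer of two objects, running the two fact-table queries back to back avoids reloading the shared objects, which is the kind of saving a joint scheduling and buffer management strategy aims for.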

2013 ◽  
Vol 9 (2) ◽  
pp. 1-20 ◽  
Author(s):  
Goetz Graefe ◽  
Anisoara Nica ◽  
Knut Stolze ◽  
Thomas Neumann ◽  
Todd Eavis ◽  
...  

A central promise of cloud services is elastic, on-demand provisioning. The provisioning of data on temporarily available nodes is what makes elastic database services a hard problem. The essential task that enables elastic data services is bringing a node and its data up-to-date. Strategies for high availability do not satisfy the need in this context because they bring nodes online and up-to-date by repeating history, e.g., by log shipping. Nodes must become up-to-date and useful for query processing incrementally by key range. What is wanted is a technique such that in a newly added node, during each short period of time, an additional small key range becomes up-to-date, until eventually the entire dataset becomes up-to-date and useful for query processing, with overall update performance comparable to a traditional high-availability strategy that carries the entire dataset forward without regard to key ranges. Even without the entire dataset being available, the node is productive and participates in query processing tasks. The authors’ proposed solution relies on techniques from partitioned B-trees, adaptive merging, deferred maintenance of secondary indexes and of materialized views, and query optimization using materialized views. The paper introduces a family of maintenance strategies for temporarily available copies, the space of possible query execution plans and their cost functions, as well as appropriate query optimization techniques.


2015 ◽  
Vol 11 (2) ◽  
pp. 62-84 ◽  
Author(s):  
Ahcene Boukorca ◽  
Ladjel Bellatreche ◽  
Sid-Ahmed Benali Senouci ◽  
Zoé Faget

Materialized views are queries whose results are stored and maintained in order to facilitate access to data in their underlying base tables of extremely large databases. Selecting the best materialized views for a given query workload is a hard problem. Studies on view selection have considered sharing common subexpressions and other multi-query optimization techniques. Multi-Query Optimization is a well-studied domain in traditional and advanced databases. It aims at optimizing a workload of queries by finding and reusing common subexpressions across queries. Finding the best shared expression is known to be an NP-hard problem. The shared expressions, usually identified through a graph structure, have been used as candidates for materialized views. This shows the strong interdependency between the problems of materialized view selection (PVS) and multi-query optimization (PMQO), since the PVS uses the graph structure of the PMQO. A survey of existing work on PVS that considers the interaction between PVS and PMQO reveals two main categories of studies: (i) those considering the PMQO as a black box where the output is the graph and (ii) those preparing the graph to guide the materialized view selection process. In the second category, the graph generation is based on individual query plans, an approach that does not scale, especially with the explosion of Big Data applications requiring a large number of complex queries with high interaction. To ensure a scalable solution, this work proposes a new technique to generate a global processing plan without using individual plans by borrowing techniques used in the electronic design automation (EDA) domain. This paper first presents a rich state of the art regarding the PVS and a classification of the most important existing work. Secondly, an analogy between the MQO problem and the EDA domain, in which large circuits are manipulated, is established. Thirdly, it proposes to model the problem with hypergraphs, which are massively used to design and test integrated circuits. Fourthly, it proposes a deterministic algorithm to select materialized views using the global processing plan. Finally, experiments are conducted to show the scalability of the approach.


1988 ◽  
Vol 11 (3) ◽  
pp. 241-265
Author(s):  
W. Marek ◽  
C. Rauszer

In this paper, we address the problem of query optimization in distributed databases. We show that horizontal partitions of databases, generated by products of equivalence relations, induce optimization techniques for the basic database operations (i.e., the selection, projection, and join operators). In the case of selection, our method allows for restriction of the number of blocks to be searched in the selection process and subsequent simplification of the selection formula at each block. For the natural join operation, we propose an algorithm that reduces the computation of fragments. Proofs of the correctness of our algorithms are also included.
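
The idea of restricting a selection to a subset of blocks can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: each equivalence relation is modeled as a hypothetical `(attribute, label_function)` pair, a block is identified by the tuple of class labels, and only equality selections are handled. The helper names `partition` and `select_equal` are invented here.

```python
from collections import defaultdict


def partition(rows, relations):
    """Split rows into blocks keyed by the tuple of equivalence-class labels."""
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(label(row[attr]) for attr, label in relations)
        blocks[key].append(row)
    return dict(blocks)


def select_equal(blocks, relations, attr, value):
    """Select rows with row[attr] == value, skipping blocks that cannot match.

    Returns the matching rows and the number of blocks actually scanned.
    """
    result, scanned = [], 0
    for key, block in blocks.items():
        # A block is skipped when a relation on `attr` puts `value` in another class.
        if any(a == attr and label(value) != k
               for (a, label), k in zip(relations, key)):
            continue
        scanned += 1
        result.extend(row for row in block if row[attr] == value)
    return result, scanned


# Hypothetical relations: same decade for `year`, same first letter for `region`.
relations = [("year", lambda y: y // 10), ("region", lambda r: r[0])]
rows = [
    {"year": 1985, "region": "north"},
    {"year": 1988, "region": "south"},
    {"year": 1993, "region": "north"},
    {"year": 1999, "region": "south"},
    {"year": 1988, "region": "north"},
]

blocks = partition(rows, relations)
result, scanned = select_equal(blocks, relations, "year", 1988)
print(len(blocks), scanned, result)
```

In this toy data the partition has four blocks, and a selection on `year` only needs to visit the two blocks whose decade label matches the requested value.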


2003 ◽  
Vol 12 (03) ◽  
pp. 325-363 ◽  
Author(s):  
Joseph Fong ◽  
Qing Li ◽  
Shi-Ming Huang

A data warehouse contains a vast amount of data to support the complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for a universal data warehouse with different data models. The frame metadata model represents the metadata of a data warehouse, which structures an application domain into classes, and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema, catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between RDB and OODB is achieved by unloading the source database into a sequential file and reloading it into the target database, through which an object-relational view can be defined so as to allow users to obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls, derived from the integrated schema. To validate our work, an application prototype system has been developed in a product sales data warehousing domain based on this approach.


2011 ◽  
Vol 219-220 ◽  
pp. 927-931
Author(s):  
Jun Qiang Liu ◽  
Xiao Ling Guan

In recent years, the processing of composite event queries over data streams has attracted a lot of research attention. Traditional database techniques were not designed for stream processing systems. Furthermore, continuous queries are often formulated in a declarative query language without specifying the semantics. To overcome these deficiencies, this article presents the design, implementation, and evaluation of a system that processes data streams with semantic information. A set of optimization techniques is then proposed for query handling. Our approach thus not only makes it possible to express queries with sound semantics, but also provides a solid foundation for query optimization. Experimental results show that our approach is effective and efficient for data streams and domain knowledge.


Author(s):  
Arijit Sengupta ◽  
V. Ramesh

This chapter presents DSQL, a conservative extension of SQL, as an ad-hoc query language for XML. The development of DSQL follows the theoretical foundations of first-order logic, and uses common query semantics already accepted for SQL. DSQL represents a core subset of XQuery that lends itself well to query optimization techniques, while at the same time allowing easy integration into current databases and applications that use SQL. The intent of DSQL is not to replace XQuery, the current W3C recommended XML query language, but to serve as an ad-hoc querying frontend to XQuery. Further, the authors present proofs for important query language properties such as complexity and closure. An empirical study comparing DSQL and XQuery for the purpose of ad-hoc querying demonstrates that users perform better with DSQL for both flat and tree structures, in terms of both accuracy and efficiency.


Author(s):  
Sheng-Uei Guan

This chapter presents an ontology-based query formation and information retrieval system under the mobile commerce (m-commerce) agent framework. A query formation approach that combines the usage of ontology and keywords is implemented. This approach takes advantage of the tree structure in ontology to form queries visually and efficiently. It also uses additional aids such as keywords to complete the query formation process more efficiently. The proposed information retrieval scheme focuses on using genetic algorithms (GAs) to improve computational effectiveness. Other query optimization techniques used include query restructuring by logical terms and numerical constraints replacement.


2009 ◽  
pp. 2292-2300
Author(s):  
Ladjel Bellatreche

Scientific databases and data warehouses store large amounts of data with several tables and attributes. For instance, the Sloan Digital Sky Survey (SDSS) astronomical database contains a large number of tables with hundreds of attributes, which can be queried in various combinations (Papadomanolakis & Ailamaki, 2004). These queries involve many tables using binary operations, such as joins. To speed up these queries, many optimization structures have been proposed, which can be divided into two main categories: redundant structures like materialized views, advanced indexing schemes (bitmap, bitmap join indexes, etc.) (Sanjay, Chaudhuri & Narasayya, 2000) and vertical partitioning (Sanjay, Narasayya & Yang 2004), and non-redundant structures like horizontal partitioning (Sanjay, Narasayya & Yang 2004; Bellatreche, Boukhalfa & Mohania, 2007) and parallel processing (Datta, Moon, & Thomas, 2000; Stöhr, Märtens & Rahm, 2000). These optimization techniques are used either sequentially or in combination. Such combinations are made within a single category: materialized views and indexes for redundant structures, and partitioning and parallel processing for non-redundant ones. Materialized views and indexes compete for the same resource, namely storage, and incur maintenance overhead in the presence of updates (Sanjay, Chaudhuri & Narasayya, 2000). No existing work addresses the problem of selecting combined optimization structures. In this paper, we propose two approaches: one for combining a non-redundant structure (horizontal partitioning) and a redundant structure (bitmap indexes) in order to reduce query processing cost and maintenance overhead, and another that exploits algorithms for vertical partitioning to generate bitmap join indexes. To facilitate the understanding of our approaches, we review these techniques in detail.

