A cost model for NDP-aware query optimization for KV-stores

Author(s):  
Christian Knödler ◽  
Tobias Vinçon ◽  
Arthur Bernhardt ◽  
Ilia Petrov ◽  
Leonardo Solis-Vasquez ◽  
...  
Author(s):  
Andreas M. Weiner ◽  
Theo Härder

Since the very beginning of query processing in database systems, cost-based query optimization has been the essential strategy for effectively answering complex queries on large documents. XML documents can be efficiently stored and processed using native XML database management systems. Even though such systems can choose from a huge repertoire of join operators (e.g., Structural Joins and Holistic Twig Joins) and various index access operators to efficiently evaluate queries on XML documents, the development of full-fledged XML query optimizers is still in its infancy. In particular, the evaluation of complex XQuery expressions using these operators is not well understood and needs further research. The extensible, rule-based, cost-based XML query optimization framework proposed in this chapter serves as a testbed for exploring whether and how well-known concepts from relational query optimization (e.g., join reordering) can be reused, and which new techniques can contribute significantly to speeding up query execution. Using the best practices and cost model developed with this framework, it can be turned into a robust cost-based XML query optimizer in the future.


Author(s):  
Qiang Zhu ◽  
Per-Åke Larson

A crucial challenge for global query optimization in a multidatabase system (MDBS) is that some local optimization information, such as local cost parameters, may not be accurately known at the global level because of local autonomy. Traditional query optimization techniques using a crisp cost model may not be suitable for an MDBS because they require precise information. In this paper, we present a new approach that performs global query optimization using a fuzzy cost model that allows fuzzy information. We suggest methods for establishing a fuzzy cost model and introduce a fuzzy optimization criterion that can be used with it. We discuss the relationship between the fuzzy optimization approach and the traditional (crisp) optimization approach and show that the former has a better chance of finding a good execution strategy for a query in an MDBS environment, but its complexity may grow exponentially compared with that of the latter. To reduce the complexity, we suggest using so-called k-approximate fuzzy values to approximate all fuzzy values during fuzzy query optimization. It is proven that the improved fuzzy approach has the same order of complexity as the crisp approach.
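The abstract's core ideas — fuzzy plan costs, an approximation that caps the size of fuzzy values, and a criterion for comparing plans — can be illustrated with a small sketch. The representation below (discrete possibility distributions, extension-principle addition, centroid defuzzification) is a common way to realize fuzzy arithmetic, not the paper's actual formulation; all numbers are made up.

```python
# Sketch: comparing query plans under a fuzzy cost model.
# A fuzzy cost is a dict {cost_value: membership_degree}.

def k_approximate(fuzzy, k):
    """Keep only the k values with the highest membership degrees,
    bounding the size of intermediate fuzzy values."""
    top = sorted(fuzzy.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)

def fuzzy_add(a, b):
    """Fuzzy addition by the extension principle: the membership of a
    sum is the max over contributing pairs of their min membership."""
    out = {}
    for va, ma in a.items():
        for vb, mb in b.items():
            v, m = va + vb, min(ma, mb)
            out[v] = max(out.get(v, 0.0), m)
    return out

def defuzzify(fuzzy):
    """Centroid defuzzification: membership-weighted average cost,
    used here as the plan-comparison criterion."""
    total = sum(fuzzy.values())
    return sum(v * m for v, m in fuzzy.items()) / total

# Two candidate strategies, each the fuzzy sum of two operator costs
# whose local parameters are only vaguely known.
plan1 = fuzzy_add({10: 0.5, 12: 1.0, 15: 0.3}, {5: 1.0, 7: 0.6})
plan2 = fuzzy_add({8: 0.8, 20: 1.0}, {6: 1.0, 9: 0.4})

# Without approximation, fuzzy values grow multiplicatively with each
# operation; k-approximation keeps them at constant size.
plan1 = k_approximate(plan1, 4)
plan2 = k_approximate(plan2, 4)
best = min((plan1, plan2), key=defuzzify)
```

The exponential blow-up mentioned in the abstract shows up in `fuzzy_add`: each addition can square the number of distinct cost values, which is exactly what the k-approximation step suppresses.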


2008 ◽  
Vol 8 (3) ◽  
pp. 393-409 ◽  
Author(s):  
EDNA RUCKHAUS ◽  
EDUARDO RUIZ ◽  
MARÍA-ESTHER VIDAL

We address the problem of answering Web ontology queries efficiently. An ontology is formalized as a deductive ontology base (DOB), a deductive database that comprises the ontology's inference axioms and facts. A cost-based query optimization technique for DOBs is presented. A hybrid cost model is proposed to estimate the cost and cardinality of basic and inferred facts. Cardinality and cost of inferred facts are estimated using an adaptive sampling technique, while techniques from traditional relational cost models are used to estimate the cost of basic facts and conjunctive ontology queries. Finally, we implement a dynamic-programming optimization algorithm to identify query evaluation plans that minimize the number of intermediate inferred facts. We modeled a subset of the Web Ontology Language (OWL) Lite as a DOB and performed an experimental study to analyze the predictive capacity of our cost model and the benefits of the query optimization technique. Our study, conducted over synthetic and real-world OWL ontologies, shows that the techniques are accurate and improve query performance.
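The dynamic-programming idea — keep, for each subset of a query's subgoals, the plan with the smallest total intermediate size — can be sketched as follows. This is a generic left-deep enumerator in the Selinger style, not the paper's implementation; the subgoal names, cardinalities, and the uniform selectivity are illustrative.

```python
# Sketch: dynamic-programming enumeration of left-deep evaluation
# orders over a conjunctive query's subgoals, minimizing the sum of
# estimated intermediate result sizes.
from itertools import combinations

def best_plan(cards, join_sel):
    """cards: {subgoal: estimated cardinality};
    join_sel(left, g): estimated selectivity of joining group `left`
    with subgoal `g`. Returns (cost, nested-pair plan)."""
    goals = list(cards)
    table = {frozenset([g]): (0.0, g) for g in goals}   # subset -> (cost, plan)
    size = {frozenset([g]): cards[g] for g in goals}    # subset -> result size
    for n in range(2, len(goals) + 1):
        for combo in combinations(goals, n):
            s, best = frozenset(combo), None
            for g in combo:                 # last subgoal joined in
                left = s - {g}
                out = size[left] * cards[g] * join_sel(left, g)
                cost = table[left][0] + out # charge every intermediate
                if best is None or cost < best[0]:
                    best = (cost, (table[left][1], g))
                    size[s] = out
            table[s] = best
    return table[frozenset(goals)]

# Hypothetical subgoals: one inferred, two basic; estimates made up.
cards = {"type(x,c)": 1000, "subClassOf(c,d)": 50, "hasLabel(x,l)": 800}
cost, plan = best_plan(cards, lambda left, g: 0.01)
```

With these numbers the enumerator joins the two cheaper subgoals first, deferring the 1000-fact predicate, which is the behavior the abstract's "minimize intermediate inferred facts" objective is after.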


2013 ◽  
Vol 380-384 ◽  
pp. 2850-2853
Author(s):  
Yan Qin Li ◽  
Cai Tian Zhang

Distributed databases emerged in the 1970s and developed alongside advances in computer and network technology: a distributed database system stores data dispersedly across physical sites while processing it under a logically centralized view. Because the storage sites are not uniform, the structure of a distributed database is considerably more complicated than that of a centralized database, and query optimization is correspondingly harder. To improve query optimization performance for distributed databases, this paper proposes an improved query optimization algorithm based on the genetic algorithm, together with a query execution cost model. Both the genetic algorithm and a dynamic exhaustive planning algorithm were run in query simulations for performance comparison. The results show that the genetic query optimization method performs better for distributed database queries: the case study and simulation results show that the algorithm reaches a satisfactory optimization result within a few iterations and minimizes the cost of query execution. The proposed method thus has good application performance and is worth putting into practice.
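The genetic approach described above can be sketched in a few lines: chromosomes are join-order permutations, and fitness is a cost that charges for intermediate result sizes and for shipping data between sites. The cost model, cardinalities, site assignments, and GA parameters below are all illustrative, not the paper's.

```python
# Sketch: genetic-algorithm search for a distributed join order.
import random

random.seed(7)
CARD = [1000, 50, 800, 200]   # made-up relation cardinalities
SITE = [0, 1, 0, 2]           # made-up site of each relation
SEL = 0.01                    # uniform join selectivity (assumption)

def cost(order):
    """Sum of intermediate sizes plus cross-site shipping, with the
    simplification that results stay at the first relation's site."""
    size, total, site = CARD[order[0]], 0.0, SITE[order[0]]
    for r in order[1:]:
        if SITE[r] != site:
            total += min(size, CARD[r])   # ship the smaller operand
        size *= CARD[r] * SEL             # intermediate result size
        total += size
    return total

def crossover(a, b):
    """Order crossover: keep a prefix of a, fill the rest in b's order."""
    head = a[:random.randrange(1, len(a))]
    return head + [r for r in b if r not in head]

def mutate(order):
    i, j = random.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]

def ga(pop_size=20, gens=30):
    n = len(CARD)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]   # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(*random.sample(survivors, 2))
            if random.random() < 0.2:
                mutate(child)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)
```

Unlike dynamic exhaustive planning, whose search space grows factorially with the number of relations, the GA evaluates only `pop_size * gens` candidates, which is the "satisfactory result in a few iterations" trade-off the abstract reports.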


2021 ◽  
Vol 14 (11) ◽  
pp. 2019-2032
Author(s):  
Parimarjan Negi ◽  
Ryan Marcus ◽  
Andreas Kipf ◽  
Hongzi Mao ◽  
Nesime Tatbul ◽  
...  

Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB), which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4–8× better 99th percentile runtimes on unseen templates with the same model architecture and training data.
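Q-Error, the baseline loss the abstract compares against, is the standard multiplicative accuracy metric for cardinality estimates; a minimal definition makes the abstract's point concrete (the clamping constant `eps` is a common convention, not taken from the paper):

```python
# Q-Error: the factor by which an estimate is off, symmetric in
# over- and underestimation.

def q_error(estimate, truth, eps=1.0):
    est, tru = max(estimate, eps), max(truth, eps)  # avoid divide-by-zero
    return max(est / tru, tru / est)
```

Under Q-Error, overestimating a true cardinality of 100 as 1000 and underestimating it as 10 score identically (a factor of 10 either way), yet the two errors can steer an optimizer toward very different plans; that indifference to downstream impact is the gap Flow-Loss is designed to close.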

