query optimizer
Recently Published Documents


TOTAL DOCUMENTS: 61 (FIVE YEARS: 11)
H-INDEX: 10 (FIVE YEARS: 0)

2021 · Author(s): Tin Vu, Alberto Belussi, Sara Migliorini, Ahmed Eldawy · Keyword(s):

2021 · Vol 14 (11) · pp. 2383-2396 · Author(s): Philipp Fent, Thomas Neumann

Groupjoins, the combined execution of a join and a subsequent group by, are common in analytical queries and occur in about 1/8 of the queries in TPC-H and TPC-DS. While they were originally invented to improve performance, efficient parallel execution of groupjoins can be hampered by contention, which limits their usefulness in a many-core system. Having an efficient implementation of groupjoins is highly desirable, as groupjoins are not only used to fuse group by and join but are also introduced by the unnesting component of the query optimizer to avoid nested-loops evaluation of aggregates. Furthermore, the query optimizer needs to be able to reason over the result of aggregation in order to schedule it correctly. Traditional selectivity and cardinality estimations quickly reach their limits when faced with computed columns from nested aggregates, which leads to poor cost estimations and thus suboptimal query plans. In this paper, we present techniques to efficiently estimate, plan, and execute groupjoins and nested aggregates. We propose two novel techniques: aggregate estimates to predict the result distribution of aggregates, and parallel groupjoin execution for a scalable execution of groupjoins. The resulting system has significantly better estimates and a contention-free evaluation of groupjoins, which can speed up some TPC-H queries by up to a factor of 2.
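The core idea of a groupjoin, fusing the join with the subsequent aggregation so the intermediate join result is never materialized, can be illustrated with a minimal hash-based sketch in Python. This is a didactic, serial sketch under assumed names and row layouts, not the authors' parallel, contention-free implementation:

```python
from collections import defaultdict

def hash_groupjoin(build_rows, probe_rows, build_key, probe_key, agg_col):
    """Hash-based groupjoin sketch: join build_rows with probe_rows on the
    given keys and aggregate (COUNT/SUM) probe values per build key, without
    materializing the intermediate join result."""
    # Build phase: one hash-table entry per distinct build key,
    # which doubles as the aggregation group.
    groups = {}
    for row in build_rows:
        groups.setdefault(row[build_key], {"count": 0, "sum": 0})

    # Probe phase: each matching probe row updates its group's aggregate in place.
    for row in probe_rows:
        entry = groups.get(row[probe_key])
        if entry is not None:
            entry["count"] += 1
            entry["sum"] += row[agg_col]
    return groups

# Roughly corresponds to: SELECT c.key, COUNT(*), SUM(o.amount)
#                         FROM customers c JOIN orders o ON c.key = o.ckey
#                         GROUP BY c.key
customers = [{"key": 1}, {"key": 2}]
orders = [{"ckey": 1, "amount": 10}, {"ckey": 1, "amount": 5}, {"ckey": 2, "amount": 7}]
print(hash_groupjoin(customers, orders, "key", "ckey", "amount"))
```

In a parallel setting, many threads probing the same shared hash table contend on these group entries, which is the bottleneck the paper's contention-free execution addresses.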


2021 · Vol 7 · pp. e580 · Author(s): Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh, Arash Sharifi, Aso Darwesh

Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for the given queries based on minimum resource usage. The problem is that, for a given query, there are plenty of equivalent execution plans, each with a corresponding execution cost; producing an effective query plan therefore requires examining a large number of alternative plans. Access plan recommendation is an alternative to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering such large datasets is challenging for traditional clustering algorithms due to the huge processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate the effectiveness of the parallel implementation of query workload clustering in achieving good scalability.
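The idea of clustering a query workload with MapReduce can be sketched as a single k-means iteration expressed as map and reduce functions. The feature extraction and the framework wiring below are simplifying assumptions made for illustration, not the paper's exact pipeline:

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Assign one query's feature vector to its nearest centroid."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

def map_phase(query_features, centroids):
    """Map side: emit (centroid_id, feature_vector) pairs, one per query."""
    for vec in query_features:
        yield nearest(vec, centroids), vec

def reduce_phase(pairs):
    """Reduce side: recompute each centroid as the mean of its assigned vectors."""
    buckets = defaultdict(list)
    for cid, vec in pairs:
        buckets[cid].append(vec)
    return {cid: [sum(col) / len(vecs) for col in zip(*vecs)]
            for cid, vecs in buckets.items()}

# Toy workload: each query (plan) reduced to a 2-D feature vector,
# e.g. (number of joins, estimated cost) -- purely illustrative.
workload = [(1, 10), (2, 12), (8, 90), (9, 95)]
centroids = [(1, 10), (9, 95)]
print(reduce_phase(map_phase(workload, centroids)))
```

In an actual MapReduce deployment the map and reduce functions run on partitions of the workload across the cluster, which is what makes clustering very large query logs tractable.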


2021 · Vol 13 (1) · pp. 21-40 · Author(s): Chenxiao Wang, Zach Arani, Le Gruenwald, Laurent d'Orazio, Eleazar Leal

In cloud environments, hardware configurations, data usage, and workload allocations are continuously changing. These changes make it difficult for the query optimizer of a cloud database management system (DBMS) to select an optimal query execution plan (QEP). In order to optimize a query with a more accurate cost estimation, performing query re-optimizations during query execution has been proposed in the literature. However, some of these re-optimizations may not provide any performance gain in terms of query response time or monetary cost, which are the two optimization objectives for cloud databases, and may also have negative impacts on performance due to their overheads. This raises the question of how to determine when a re-optimization is beneficial. In this paper, we present a technique called ReOptML that uses machine learning to enable effective re-optimizations. This technique executes a query in stages, employs a machine learning model to predict whether a query re-optimization is beneficial after a stage is executed, and invokes the query optimizer to perform the re-optimization automatically. Experiments comparing ReOptML with existing query re-optimization algorithms show that ReOptML improves query response time by 13% to 35% for skewed data and by 13% to 21% for uniform data, and improves the monetary cost paid to cloud service providers by 17% to 35% on skewed data.
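The staged-execution control flow described above can be sketched as follows. The Stage, Optimizer, and ThresholdModel classes are hypothetical stand-ins introduced only to make the loop runnable; they are not the ReOptML implementation or its trained model:

```python
class Stage:
    """Hypothetical execution stage that returns observed runtime statistics."""
    def __init__(self, name, rows):
        self.name, self.rows = name, rows
    def execute(self):
        return {self.name + "_rows": self.rows}

class ThresholdModel:
    """Stand-in for the learned classifier: flag re-optimization when observed
    cardinalities exceed twice what the optimizer expected."""
    def __init__(self, expected_rows):
        self.expected_rows = expected_rows
    def predict(self, stats):
        return any(v > 2 * self.expected_rows for v in stats.values())

class Optimizer:
    """Stand-in optimizer: here it simply keeps the remaining stages unchanged."""
    def reoptimize(self, remaining, stats):
        return remaining

def run_with_reopt(stages, optimizer, model):
    """Execute a query stage by stage; after each stage, ask the model whether
    re-optimizing the remaining plan is likely to pay off, and if so re-plan."""
    stats = {}
    remaining = list(stages)
    while remaining:
        stats.update(remaining.pop(0).execute())
        if remaining and model.predict(stats):
            remaining = optimizer.reoptimize(remaining, stats)
    return stats

print(run_with_reopt([Stage("scan", 5000), Stage("join", 120)],
                     Optimizer(), ThresholdModel(expected_rows=1000)))
```

The point of the learned predictor is to skip re-optimizations whose overhead would outweigh any improvement in response time or monetary cost.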


2021 · Vol 6 (1) · pp. 86-101 · Author(s): Hai Lan, Zhifeng Bao, Yuwei Peng

Abstract: The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, then uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples flowing through an operator, plays a crucial role. Due to inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind the limitations above. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, the cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.
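The interplay of the three components, cardinality estimation, cost model, and plan enumeration, can be sketched with a toy left-deep join-order enumerator. The catalog numbers, the independence-assumption estimator, and the cost function below are illustrative assumptions, not any system's actual optimizer:

```python
from itertools import permutations

# Toy catalog: base-table cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1,
       frozenset({"A", "C"}): 0.05}

def join_cardinality(card_left, right, joined):
    """Cardinality estimation: |left JOIN right| assuming independent predicates."""
    sel = 1.0
    for table in joined:
        sel *= SEL.get(frozenset({table, right}), 1.0)
    return card_left * CARD[right] * sel

def plan_cost(order):
    """Cost model: sum of intermediate-result sizes of a left-deep plan."""
    joined = [order[0]]
    card, cost = CARD[order[0]], 0.0
    for right in order[1:]:
        card = join_cardinality(card, right, joined)
        cost += card
        joined.append(right)
    return cost

def enumerate_best_plan(tables):
    """Plan enumeration: exhaustively try left-deep join orders, keep the cheapest."""
    return min(permutations(tables), key=plan_cost)

print(enumerate_best_plan(["A", "B", "C"]))
```

Even in this toy setting, a wrong selectivity in SEL changes which join order wins, which is why errors in cardinality estimation propagate directly into suboptimal plans.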


2020 · pp. 417-485 · Author(s): Jesper Wisborg Krogh · Keyword(s):

MapReduce, Flink, and Spark have also become more popular for big data processing lately. Flink is an open Apache Big Data processing platform for both batch and streaming data. Flink's query optimizer is built for historical (batch) data processing based on parallel storage-system approaches. It translates queries into jobs made up of different tasks that are submitted regularly; exploiting similarities between these tasks should therefore prevent redundant computation. In this article, a multi-query optimization model for Flink is planned and designed on top of the Flink software stack. It is considered an Apache Flink add-on to maximize multi-query information sharing. The system exploits information sharing among selection operators to reduce overlapping and duplicated in-network data movement across multiple queries. Research findings show that leveraging shared selection operators over large data for multiple requests yields promising query execution times. Therefore, in the streaming phase, the Flink approach can be used to boost application performance over time.
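The sharing idea, evaluating a common selection once and feeding its result to several queries instead of scanning the data once per query, can be illustrated with a small Python sketch. The query representation (name, predicate, projection) is an assumption made for illustration and is not Flink's API:

```python
from collections import defaultdict

def run_queries_shared(records, queries):
    """Group queries by their shared selection predicate, evaluate each
    predicate once, then apply the per-query projections to the shared result."""
    by_predicate = defaultdict(list)
    for name, predicate, project in queries:
        by_predicate[predicate].append((name, project))

    results = {}
    for predicate, consumers in by_predicate.items():
        selected = [r for r in records if predicate(r)]   # shared selection, run once
        for name, project in consumers:
            results[name] = [project(r) for r in selected]
    return results

records = [{"user": "a", "amount": 5}, {"user": "b", "amount": 50}]
big = lambda r: r["amount"] > 10          # selection predicate shared by both queries
queries = [("q1", big, lambda r: r["user"]),
           ("q2", big, lambda r: r["amount"])]
print(run_queries_shared(records, queries))
```

In a distributed setting the saving is larger still, because the shared selection also avoids moving the same filtered data across the network once per query.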

