query optimizer
Recently Published Documents


TOTAL DOCUMENTS: 61 (FIVE YEARS: 11)
H-INDEX: 10 (FIVE YEARS: 0)

2021 · Author(s): Tin Vu, Alberto Belussi, Sara Migliorini, Ahmed Eldawy · Keyword(s):

2021 · Vol 14 (11) · pp. 2383-2396 · Author(s): Philipp Fent, Thomas Neumann

Groupjoins, the combined execution of a join and a subsequent group by, are common in analytical queries and occur in about 1/8 of the queries in TPC-H and TPC-DS. While they were originally invented to improve performance, efficient parallel execution of groupjoins can be hampered by contention, which limits their usefulness in a many-core system. Having an efficient implementation of groupjoins is highly desirable, as groupjoins are not only used to fuse group by and join but are also introduced by the unnesting component of the query optimizer to avoid nested-loops evaluation of aggregates. Furthermore, the query optimizer needs to be able to reason over the result of aggregation in order to schedule it correctly. Traditional selectivity and cardinality estimations quickly reach their limits when faced with computed columns from nested aggregates, which leads to poor cost estimations and thus suboptimal query plans. In this paper, we present techniques to efficiently estimate, plan, and execute groupjoins and nested aggregates. We propose two novel techniques: aggregate estimates to predict the result distribution of aggregates, and parallel groupjoin execution for a scalable execution of groupjoins. The resulting system has significantly better estimates and a contention-free evaluation of groupjoins, which can speed up some TPC-H queries by up to a factor of 2.
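The core idea of a groupjoin, fusing the join with the subsequent aggregation so the intermediate join result is never materialized, can be illustrated with a minimal hash-based sketch in Python. This is a didactic, serial sketch under assumed names and row layouts, not the authors' parallel, contention-free implementation:

```python
from collections import defaultdict

def hash_groupjoin(build_rows, probe_rows, build_key, probe_key, agg_col):
    """Hash-based groupjoin sketch: join build_rows with probe_rows on the
    given keys and aggregate (COUNT/SUM) probe values per build key, without
    materializing the intermediate join result."""
    # Build phase: one hash-table entry per distinct build key,
    # which doubles as the aggregation group.
    groups = {}
    for row in build_rows:
        groups.setdefault(row[build_key], {"count": 0, "sum": 0})

    # Probe phase: each matching probe row updates its group's aggregate in place.
    for row in probe_rows:
        entry = groups.get(row[probe_key])
        if entry is not None:
            entry["count"] += 1
            entry["sum"] += row[agg_col]
    return groups

# Roughly corresponds to: SELECT c.key, COUNT(*), SUM(o.amount)
#                         FROM customers c JOIN orders o ON c.key = o.ckey
#                         GROUP BY c.key
customers = [{"key": 1}, {"key": 2}]
orders = [{"ckey": 1, "amount": 10}, {"ckey": 1, "amount": 5}, {"ckey": 2, "amount": 7}]
print(hash_groupjoin(customers, orders, "key", "ckey", "amount"))
```

In a parallel setting, many threads probing the same shared hash table contend on these group entries, which is the bottleneck the paper's contention-free execution addresses.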


2021 · Vol 7 · pp. e580 · Author(s): Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh, Arash Sharifi, Aso Darwesh

Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for the given queries based on minimum resource usage. The problem is that, for a given query, there are plenty of equivalent execution plans, each with a corresponding execution cost; producing an effective query plan therefore requires examining a large number of alternative plans. Access plan recommendation is an alternative to database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering such large datasets is challenging for traditional clustering algorithms due to the huge processing time. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate the effectiveness of the parallel implementation of query workload clustering in achieving good scalability.
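The idea of clustering a query workload with MapReduce can be sketched as a single k-means iteration expressed as map and reduce functions. The feature extraction and the framework wiring below are simplifying assumptions made for illustration, not the paper's exact pipeline:

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Assign one query's feature vector to its nearest centroid."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

def map_phase(query_features, centroids):
    """Map side: emit (centroid_id, feature_vector) pairs, one per query."""
    for vec in query_features:
        yield nearest(vec, centroids), vec

def reduce_phase(pairs):
    """Reduce side: recompute each centroid as the mean of its assigned vectors."""
    buckets = defaultdict(list)
    for cid, vec in pairs:
        buckets[cid].append(vec)
    return {cid: [sum(col) / len(vecs) for col in zip(*vecs)]
            for cid, vecs in buckets.items()}

# Toy workload: each query (plan) reduced to a 2-D feature vector,
# e.g. (number of joins, estimated cost) -- purely illustrative.
workload = [(1, 10), (2, 12), (8, 90), (9, 95)]
centroids = [(1, 10), (9, 95)]
print(reduce_phase(map_phase(workload, centroids)))
```

In an actual MapReduce deployment the map and reduce functions run on partitions of the workload across the cluster, which is what makes clustering very large query logs tractable.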


2021 · Vol 13 (1) · pp. 21-40 · Author(s): Chenxiao Wang, Zach Arani, Le Gruenwald, Laurent d'Orazio, Eleazar Leal

In cloud environments, hardware configurations, data usage, and workload allocations are continuously changing. These changes make it difficult for the query optimizer of a cloud database management system (DBMS) to select an optimal query execution plan (QEP). In order to optimize a query with a more accurate cost estimation, performing query re-optimizations during query execution has been proposed in the literature. However, some of these re-optimizations may not provide any performance gain in terms of query response time or monetary cost, which are the two optimization objectives for cloud databases, and may also have negative impacts on performance due to their overheads. This raises the question of how to determine when a re-optimization is beneficial. In this paper, we present a technique called ReOptML that uses machine learning to enable effective re-optimizations. This technique executes a query in stages, employs a machine learning model to predict whether a query re-optimization is beneficial after a stage is executed, and invokes the query optimizer to perform the re-optimization automatically. Experiments comparing ReOptML with existing query re-optimization algorithms show that ReOptML improves query response time by 13% to 35% for skewed data and by 13% to 21% for uniform data, and improves the monetary cost paid to cloud service providers by 17% to 35% on skewed data.
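The staged-execution control flow described above can be sketched as follows. The Stage, Optimizer, and ThresholdModel classes are hypothetical stand-ins introduced only to make the loop runnable; they are not the ReOptML implementation or its trained model:

```python
class Stage:
    """Hypothetical execution stage that returns observed runtime statistics."""
    def __init__(self, name, rows):
        self.name, self.rows = name, rows
    def execute(self):
        return {self.name + "_rows": self.rows}

class ThresholdModel:
    """Stand-in for the learned classifier: flag re-optimization when observed
    cardinalities exceed twice what the optimizer expected."""
    def __init__(self, expected_rows):
        self.expected_rows = expected_rows
    def predict(self, stats):
        return any(v > 2 * self.expected_rows for v in stats.values())

class Optimizer:
    """Stand-in optimizer: here it simply keeps the remaining stages unchanged."""
    def reoptimize(self, remaining, stats):
        return remaining

def run_with_reopt(stages, optimizer, model):
    """Execute a query stage by stage; after each stage, ask the model whether
    re-optimizing the remaining plan is likely to pay off, and if so re-plan."""
    stats = {}
    remaining = list(stages)
    while remaining:
        stats.update(remaining.pop(0).execute())
        if remaining and model.predict(stats):
            remaining = optimizer.reoptimize(remaining, stats)
    return stats

print(run_with_reopt([Stage("scan", 5000), Stage("join", 120)],
                     Optimizer(), ThresholdModel(expected_rows=1000)))
```

The point of the learned predictor is to skip re-optimizations whose overhead would outweigh any improvement in response time or monetary cost.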


2021 · Vol 6 (1) · pp. 86-101 · Author(s): Hai Lan, Zhifeng Bao, Yuwei Peng

Abstract: The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, then uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples flowing through an operator, plays a crucial role. Due to inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind the limitations above. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, the cost model, and plan enumeration. We also provide our insights on future directions for each of these aspects.
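The interplay of the three components, cardinality estimation, cost model, and plan enumeration, can be sketched with a toy left-deep join-order enumerator. The catalog numbers, the independence-assumption estimator, and the cost function below are illustrative assumptions, not any system's actual optimizer:

```python
from itertools import permutations

# Toy catalog: base-table cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1,
       frozenset({"A", "C"}): 0.05}

def join_cardinality(card_left, right, joined):
    """Cardinality estimation: |left JOIN right| assuming independent predicates."""
    sel = 1.0
    for table in joined:
        sel *= SEL.get(frozenset({table, right}), 1.0)
    return card_left * CARD[right] * sel

def plan_cost(order):
    """Cost model: sum of intermediate-result sizes of a left-deep plan."""
    joined = [order[0]]
    card, cost = CARD[order[0]], 0.0
    for right in order[1:]:
        card = join_cardinality(card, right, joined)
        cost += card
        joined.append(right)
    return cost

def enumerate_best_plan(tables):
    """Plan enumeration: exhaustively try left-deep join orders, keep the cheapest."""
    return min(permutations(tables), key=plan_cost)

print(enumerate_best_plan(["A", "B", "C"]))
```

Even in this toy setting, a wrong selectivity in SEL changes which join order wins, which is why errors in cardinality estimation propagate directly into suboptimal plans.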


2020 · pp. 417-485 · Author(s): Jesper Wisborg Krogh · Keyword(s):

MapReduce, Flink, and Spark have also become more popular for big data processing lately. Flink is an open Apache Big Data processing platform for both batch and streaming data. Flink's query optimizer is built for historical (batch) data processing based on parallel storage-system approaches. It translates queries into jobs made up of different tasks that are submitted regularly; exploiting similarities between these tasks should therefore prevent redundant computation. In this article, a multi-query optimization model for Flink is planned and designed on top of the Flink software stack. It is considered an Apache Flink add-on to maximize multi-query information sharing. The system exploits information sharing among selection operators to reduce overlapping and duplicated in-network data movement across multiple queries. Research findings show that leveraging shared selection operators over large data for multiple requests yields promising query execution times. Therefore, in the streaming phase, the Flink approach can be used to boost application performance over time.
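The sharing idea, evaluating a common selection once and feeding its result to several queries instead of scanning the data once per query, can be illustrated with a small Python sketch. The query representation (name, predicate, projection) is an assumption made for illustration and is not Flink's API:

```python
from collections import defaultdict

def run_queries_shared(records, queries):
    """Group queries by their shared selection predicate, evaluate each
    predicate once, then apply the per-query projections to the shared result."""
    by_predicate = defaultdict(list)
    for name, predicate, project in queries:
        by_predicate[predicate].append((name, project))

    results = {}
    for predicate, consumers in by_predicate.items():
        selected = [r for r in records if predicate(r)]   # shared selection, run once
        for name, project in consumers:
            results[name] = [project(r) for r in selected]
    return results

records = [{"user": "a", "amount": 5}, {"user": "b", "amount": 50}]
big = lambda r: r["amount"] > 10          # selection predicate shared by both queries
queries = [("q1", big, lambda r: r["user"]),
           ("q2", big, lambda r: r["amount"])]
print(run_queries_shared(records, queries))
```

In a distributed setting the saving is larger still, because the shared selection also avoids moving the same filtered data across the network once per query.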

