Query Optimizers
Recently Published Documents


TOTAL DOCUMENTS: 39 (five years: 10)

H-INDEX: 8 (five years: 1)

2021 ◽  
Vol 37 (3) ◽  
pp. 223-238
Author(s):  
Hung Q. Ngo

I would like to dedicate this little exposition to Prof. Phan Dinh Dieu, one of the giants and pioneers of Mathematics in Computer Science in Vietnam. In the past 15 years or so, new and exciting connections have emerged between fundamental problems in database theory and information theory. There are several angles one can take to describe this connection. This paper takes one such angle, influenced by the author's own bias and research results. In particular, we describe how the cardinality estimation problem -- a cornerstone problem for query optimizers -- is deeply connected to information-theoretic inequalities. Furthermore, we explain how these inequalities can also be used to derive classic geometric results such as the Loomis-Whitney inequality. One purpose of the article is to introduce the reader to these new connections, where theory and practice meet in a wonderful way. Another is to point the reader to a research area with many new open questions.
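
To make the connection concrete, here is the best-known instance (a standard derivation, not reproduced from the paper): an entropy argument bounds the output size of the triangle join and, as a corollary, yields the Loomis-Whitney inequality. Let $Q(x,y,z) = R(x,y) \bowtie S(y,z) \bowtie T(x,z)$ and let $(X,Y,Z)$ be uniformly distributed over the tuples of $Q$, so that $H(X,Y,Z) = \log |Q|$. Shearer's lemma gives
\[
H(X,Y,Z) \le \tfrac{1}{2}\bigl(H(X,Y) + H(Y,Z) + H(X,Z)\bigr)
          \le \tfrac{1}{2}\bigl(\log|R| + \log|S| + \log|T|\bigr),
\]
hence $|Q| \le \sqrt{|R|\,|S|\,|T|}$, a worst-case bound no histogram can certify. Taking $R$, $S$, $T$ to be the three coordinate projections of a finite point set $P \subseteq \mathbb{Z}^3$ recovers the Loomis-Whitney inequality $|P|^2 \le |\pi_{xy}(P)|\,|\pi_{yz}(P)|\,|\pi_{xz}(P)|$.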


2021 ◽  
Author(s):  
Richard T. Snodgrass ◽  
Sabah Currim ◽  
Young-Kyoon Suh
Keyword(s):  

2021 ◽  
Vol 15 (1) ◽  
pp. 85-97
Author(s):  
Ji Sun ◽  
Jintao Zhang ◽  
Zhaoyan Sun ◽  
Guoliang Li ◽  
Nan Tang

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially those based on histograms and sampling, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samples summarize only one or a few columns, so they fall short of capturing the joint data distribution over an arbitrary combination of columns, because they oversimplify the original relational table(s). Consequently, these traditional methods typically make poor predictions in hard cases such as queries over multiple columns, queries with multiple predicates, and joins between multiple tables. Recently, learned cardinality estimators have been widely studied. Because these learned estimators better capture data distributions and query characteristics, empowered by recent advances in (deep) learning models, they outperform non-learned methods in many cases. The goals of this paper are to explore the design space of learned cardinality estimators and to compare the state-of-the-art learned approaches comprehensively, so as to guide practitioners in deciding which method to use in various practical scenarios.
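
A minimal sketch (not from the paper) of the failure mode described above: when two columns are perfectly correlated, a histogram-style estimate that multiplies per-column selectivities under an independence assumption underestimates the true cardinality by an order of magnitude.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    a = rng.integers(0, 100, size=n)  # column A, uniform over 100 values
    b = a.copy()                      # column B, perfectly correlated with A

    # Per-column selectivities, as a one-dimensional histogram would report them.
    sel_a = np.mean(a < 10)
    sel_b = np.mean(b < 10)

    est_independent = sel_a * sel_b * n      # independence assumption
    true_card = np.sum((a < 10) & (b < 10))  # actual result size

    print(f"estimate: {est_independent:.0f}  true: {true_card}")
    # estimate: ~1000  true: ~10000 -- a 10x underestimate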


2021 ◽  
Author(s):  
Srikanth Kandula ◽  
Laurel Orr ◽  
Surajit Chaudhuri
Keyword(s):  

Author(s):  
Parimarjan Negi ◽  
Matteo Interlandi ◽  
Ryan Marcus ◽  
Mohammad Alizadeh ◽  
Tim Kraska ◽  
...  
Keyword(s):  
Big Data ◽  

2021 ◽  
Vol 34 - 2020 - Special... ◽ 
Author(s):  
Simon Pierre Dembele ◽  
Ladjel Bellatreche ◽  
Carlos Ordonez ◽  
Nabil Gmati ◽  
Mathieu Roche ◽  
...  

Computers and electronic machines in businesses consume a significant amount of electricity, releasing carbon dioxide (CO2) and thereby contributing to greenhouse gas emissions. Energy efficiency is a pressing concern in IT systems, from mobile devices to large servers in data centers, as they strive to be more environmentally responsible. To meet the growing demand for awareness of excessive energy consumption, many energy-efficiency initiatives for big data processing have been launched, covering electronic components, software, and applications. Query optimizers are among the most power-consuming components of a DBMS. They can be modified to take the energy cost of query plans into account through energy-based cost models, with the aim of reducing the power consumption of computer systems. In this paper, we study, describe, and evaluate the design of three energy cost models whose energy-sensitive parameter values are determined using nonlinear regression and random forest techniques. To this end, we study the operating principles of the selected DBMSs in depth and present an analysis comparing the execution time and energy consumption of typical TPC-H benchmark queries. We perform extensive experiments on a physical testbed running PostgreSQL, MonetDB, and Hyrise, using workloads generated from the TPC-H benchmark, to validate our proposal.
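
As an illustration of the parameter-fitting step (a sketch under assumed, synthetic features; the paper's actual models and measurements are not reproduced here), one can fit a random forest that predicts a plan's energy consumption from plan-level features:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Hypothetical plan features: estimated rows, number of joins,
    # pages scanned, sort memory. In practice the energy labels (joules)
    # would come from a power meter attached to the testbed.
    rng = np.random.default_rng(42)
    X = rng.random((500, 4))
    y = 5.0 + 20.0 * X[:, 0] + 8.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, 500)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_tr, y_tr)
    print("held-out R^2:", round(model.score(X_te, y_te), 3))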


Classical query optimizers rely on sophisticated cost models to estimate the cost of executing a query and its operators. Using this cost model, the optimizer constructs an efficient global plan for executing a given query. Such cost modeling is difficult to implement in Web query engines because many local data sources are unwilling to share metadata for confidentiality reasons. In this work, efficient and effective cost modeling techniques for Web query engines are proposed. These techniques do not force local data sources to reveal their metadata; instead, they employ a learning mechanism to estimate the cost of executing a given local query. Two cost modeling algorithms are presented: a Poisson cost model and an Exponential cost model. Empirical results on real-world datasets demonstrate the efficiency and effectiveness of the new cost models.
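
A minimal sketch of the idea (the paper's actual algorithms are not given here; the data and names below are hypothetical): an exponential cost model can be fit from observed response times alone, with no metadata from the source, using the maximum-likelihood rate 1/mean.

    import numpy as np

    # Observed response times (seconds) for past queries against one
    # remote source; no source metadata is required.
    times = np.array([0.8, 1.1, 0.9, 1.4, 0.7, 1.2, 1.0, 0.95])

    rate = 1.0 / times.mean()              # MLE of the exponential rate
    expected_cost = 1.0 / rate             # predicted cost of the next query
    p_within_2s = 1.0 - np.exp(-rate * 2)  # P(response time <= 2 s)

    print(f"expected cost: {expected_cost:.2f}s  P(t<=2s): {p_within_2s:.2f}")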


2019 ◽  
Vol 13 (3) ◽  
pp. 348-361 ◽  
Author(s):  
Jyoti Leeka ◽  
Kaushik Rajan
Keyword(s):  
Big Data ◽  

2019 ◽  
Vol 23 (3) ◽  
pp. 2323-2345
Author(s):  
Simon Pierre Dembele ◽  
Ladjel Bellatreche ◽  
Carlos Ordonez ◽  
Amine Roukh
Keyword(s):  
