cardinality estimation Latest Research Papers

Aggregate-based Training Phase for ML-based Cardinality Estimation

Datenbank-Spektrum ◽

10.1007/s13222-021-00400-z ◽

2022 ◽

Author(s):

Lucas Woltmann ◽

Claudio Hartmann ◽

Dirk Habich ◽

Wolfgang Lehner

Keyword(s):

Training Phase ◽

Aggregated Data ◽

Cardinality Estimation ◽

Speed Up ◽

Query Structure ◽

Database Query Processing ◽

Model Training ◽

Core Idea ◽

Traditional Approaches ◽

Base Data

AbstractCardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our aggregate-based training phase and thus outperform indexes.

RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/37/3/16129 ◽

2021 ◽

Vol 37 (3) ◽

pp. 223-238

Author(s):

Hung Q. Ngo

Keyword(s):

Research Area ◽

Theory And Practice ◽

Geometric Inequalities ◽

Information Theoretic ◽

Cardinality Estimation ◽

The Past ◽

Open Questions ◽

Database Theory ◽

Query Optimizers ◽

Information Theoretic Inequalities

I would like to dedicate this little exposition to Prof. Phan Dinh Dieu, one of the giants and pioneers of Mathematics in Computer Science in Vietnam. In the past 15 years or so, new and exciting connections between fundamental problems in database theory and information theory have emerged. There are several angles one can take to describe this connection. This paper takes one such angle, influenced by the author's own bias and research results. In particular, we will describe how the cardinality estimation problem -- a corner-stone problem for query optimizers -- is deeply connected to information theoretic inequalities. Furthermore, we explain how inequalities can also be used to derive a couple of classic geometric inequalities such as the Loomis-Whitney inequality. A purpose of the article is to introduce the reader to these new connections, where theory and practice meet in a wonderful way. Another objective is to point the reader to a research area with many new open questions.

Learned cardinality estimation

Proceedings of the VLDB Endowment ◽

10.14778/3485450.3485459 ◽

2021 ◽

Vol 15 (1) ◽

pp. 85-97

Author(s):

Ji Sun ◽

Jintao Zhang ◽

Zhaoyan Sun ◽

Guoliang Li ◽

Nan Tang

Keyword(s):

Deep Learning ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Data Distribution ◽

Learning Models ◽

Cardinality Estimation ◽

Comprehensive Comparison ◽

Query Optimizers ◽

Relational Table

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially based on histograms and samplings, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samplings can only be used to summarize one or few columns, which fall short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of histograms and samplings over the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple columns, with multiple predicates, and joins between multiple tables. Recently, learned cardinality estimators have been widely studied. Because these learned estimators can better capture the data distribution and query characteristics, empowered by the recent advance of (deep learning) models, they outperform non-learned methods on many cases. The goals of this paper are to provide a design space exploration of learned cardinality estimators and to have a comprehensive comparison of the SOTA learned approaches so as to provide a guidance for practitioners to decide what method to use under various practical scenarios.

Fauce

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476254 ◽

2021 ◽

Vol 14 (11) ◽

pp. 1950-1963

Author(s):

Jie Liu ◽

Wenqian Dong ◽

Qingqing Zhou ◽

Dong Li

Keyword(s):

Deep Learning ◽

State Of The Art ◽

The State ◽

Light Weight ◽

Critical Problem ◽

Estimation Errors ◽

Cardinality Estimation ◽

Complex Queries ◽

Uncertainty Information ◽

Deep Learning Model

Cardinality estimation is a fundamental and critical problem in databases. Recently, many estimators based on deep learning have been proposed to solve this problem and they have achieved promising results. However, these estimators struggle to provide accurate results for complex queries, due to not capturing real inter-column and inter-table correlations. Furthermore, none of these estimators contain the uncertainty information about their estimations. In this paper, we present a join cardinality estimator called Fauce. Fauce learns the correlations across all columns and all tables in the database. It also contains the uncertainty information of each estimation. Among all studied learned estimators, our results are promising: (1) Fauce is a light-weight estimator, it has 10× faster inference speed than the state of the art estimator; (2) Fauce is robust to the complex queries, it provides 1.3×--6.7× smaller estimation errors for complex queries compared with the state of the art estimator; (3) To the best of our knowledge, Fauce is the first estimator that incorporates uncertainty information for cardinality estimation into a deep learning model.

Flow-loss

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476259 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2019-2032

Author(s):

Parimarjan Negi ◽

Ryan Marcus ◽

Andreas Kipf ◽

Hongzi Mao ◽

Nesime Tatbul ◽

...

Keyword(s):

Query Optimization ◽

Search Algorithm ◽

Cost Model ◽

Estimation Error ◽

Ground Truth ◽

Training Data ◽

Estimation Accuracy ◽

Routing Problem ◽

Cardinality Estimation ◽

Flow Loss

Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB) which contains the ground truth cardinalities for sub-plans of over 16 K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4 -- 8× better 99th percentile runtimes on unseen templates with the same model architecture and training data.

Information theoretic limits of cardinality estimation: Fisher meets Shannon

Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing ◽

10.1145/3406325.3451032 ◽

2021 ◽

Author(s):

Seth Pettie ◽

Dingyu Wang

Keyword(s):

Information Theoretic ◽

Cardinality Estimation

Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3452821 ◽

2021 ◽

Cited By ~ 1

Author(s):

Yuan Qiu ◽

Yilei Wang ◽

Ke Yi ◽

Feifei Li ◽

Bin Wu ◽

...

Keyword(s):

Cardinality Estimation

Learned Cardinality Estimation for Similarity Queries

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3452790 ◽

2021 ◽

Author(s):

Ji Sun ◽

Guoliang Li ◽

Nan Tang

Keyword(s):

Similarity Queries ◽

Cardinality Estimation

Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3457246 ◽

2021 ◽

Author(s):

Kyoungmin Kim ◽

Hyeonji Kim ◽

George Fletcher ◽

Wook-Shin Han

Keyword(s):

Worst Case ◽

Cardinality Estimation ◽

Graph Pattern

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3452830 ◽

2021 ◽

Author(s):

Peizhi Wu ◽

Gao Cong

Keyword(s):

Cardinality Estimation ◽

Deep Model

cardinality estimation
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Aggregate-based Training Phase for ML-based Cardinality Estimation

RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES

Learned cardinality estimation

Fauce

Flow-loss

Information theoretic limits of cardinality estimation: Fisher meets Shannon

Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Learned Cardinality Estimation for Similarity Queries

Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

Export Citation Format

cardinality estimationRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Aggregate-based Training Phase for ML-based Cardinality Estimation

RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES

Learned cardinality estimation

Fauce

Flow-loss

Information theoretic limits of cardinality estimation: Fisher meets Shannon

Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Learned Cardinality Estimation for Similarity Queries

Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

cardinality estimation
Recently Published Documents