selectivity estimation
Recently Published Documents


Total documents: 158 (five years: 15)

H-index: 22 (five years: 2)

2020 ◽  
Vol 14 (4) ◽  
pp. 471-484
Author(s):  
Suraj Shetiya ◽  
Saravanan Thirumuruganathan ◽  
Nick Koudas ◽  
Gautam Das

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging and necessitates a dedicated study. Traditional approaches often build pruned summary data structures such as tries and then estimate selectivity using statistical correlations. However, this produces insufficiently accurate cardinality estimates, which leads the query optimizer to select sub-optimal plans. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use these encodings to train a model. While this is an improvement over traditional approaches, substantial room for improvement remains. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is aware of both query type (prefix, substring, and suffix) and selectivity. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to that of 'abc' than to that of 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used to estimate selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
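For readers unfamiliar with the quantity being estimated, the following minimal Python sketch computes the exact selectivity of a prefix, substring, or suffix pattern by scanning a string collection. It is not Astrid or any part of the authors' method; it only illustrates the ground truth that an estimator tries to approximate without a full scan. The function name `exact_selectivity` and the sample strings are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical helper (not from the paper): maps each query type named in
# the abstract to a predicate over (string, pattern).
_MATCHERS: Dict[str, Callable[[str, str], bool]] = {
    "prefix": lambda s, p: s.startswith(p),
    "substring": lambda s, p: p in s,
    "suffix": lambda s, p: s.endswith(p),
}


def exact_selectivity(strings: List[str], pattern: str, kind: str) -> float:
    """Return the fraction of `strings` matching `pattern` under `kind`.

    This is a brute-force scan, i.e. the exact answer that a selectivity
    estimator approximates from a compact summary or a learned model.
    """
    if kind not in _MATCHERS:
        raise ValueError(f"unknown query type: {kind!r}")
    if not strings:
        raise ValueError("cannot compute selectivity over an empty collection")
    match = _MATCHERS[kind]
    hits = sum(1 for s in strings if match(s, pattern))
    return hits / len(strings)


# Illustrative data only; these strings do not come from the paper.
sample = ["abc", "abd", "xab", "cab", "b"]
prefix_sel = exact_selectivity(sample, "ab", "prefix")        # 'abc', 'abd'
substring_sel = exact_selectivity(sample, "ab", "substring")  # all but 'b'
suffix_sel = exact_selectivity(sample, "ab", "suffix")        # 'xab', 'cab'
```

A scan like this costs time linear in the size of the collection for every query, which is why query optimizers rely on estimates instead.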


Author(s):  
Shohedul Hasan ◽  
Saravanan Thirumuruganathan ◽  
Jees Augustine ◽  
Nick Koudas ◽  
Gautam Das

2020 ◽  
Vol 50 (8) ◽  
Author(s):  
Severino Adriano de Oliveira Lima ◽  
Humber Agrelli Andrade ◽  
Alfredo Olivera Gálvez

ABSTRACT: A type of dredge was introduced as fishing gear on the extractive bank of Mangue Seco - PE, which yields the largest annual catch of Anomalocardia flexuosa in the world. This study aimed to estimate the selectivity of the new fishing gear and to evaluate quantitatively which length classes are most compromised by the catches, taking 20 mm as the reference value. Specimens larger than this size are most likely to be mature. Selectivity was estimated with the method that uses codends (16 or 20 mm) and a small-meshed cover (2 mm). The selectivity parameters were estimated using logistic regression and a Bayesian approach. The transition from the state in which a specimen is invulnerable to the fishing gear to the state in which it is vulnerable occurs between 10 and 18 mm with a 16 mm mesh, and between 14 and 20 mm with a 20 mm mesh. Dredges with 16 mm and 20 mm mesh compromise a large proportion of specimens smaller than 20 mm. If the intention is to protect this part of the population, measures would be necessary, such as a total restriction of the 16 mm mesh with use of the 20 mm mesh only in the months of lower catch incidence, or an increase of the mesh to 25 mm.
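The logistic selectivity curve mentioned in the abstract can be sketched as follows. This is a generic illustration of a logistic retention curve, not the authors' fitted model or their Bayesian procedure. The parameterization by L50 (length at 50% retention) and selection range (L75 minus L25) is a common convention assumed here, and every numeric value below is hypothetical rather than taken from the study.

```python
import math


def retention_probability(length_mm: float, l50_mm: float,
                          selection_range_mm: float) -> float:
    """Logistic probability that a specimen of `length_mm` is retained.

    Assumed parameterization: `l50_mm` is the length retained with
    probability 0.5 and `selection_range_mm` is L75 - L25, which gives a
    slope of 2*ln(3) / selection range.
    """
    if selection_range_mm <= 0:
        raise ValueError("selection range must be positive")
    slope = 2.0 * math.log(3.0) / selection_range_mm
    return 1.0 / (1.0 + math.exp(-slope * (length_mm - l50_mm)))


def observed_retention(codend_count: int, cover_count: int) -> float:
    """Observed retained fraction for one length class.

    With a small-meshed cover around the codend, specimens that escape the
    codend are caught by the cover, so the retained fraction is
    codend / (codend + cover).
    """
    total = codend_count + cover_count
    if total == 0:
        raise ValueError("no specimens in this length class")
    return codend_count / total


# Hypothetical parameters, chosen only to show the shape of the curve.
L50_MM = 14.0
SELECTION_RANGE_MM = 4.0

p_at_l50 = retention_probability(14.0, L50_MM, SELECTION_RANGE_MM)
p_at_l75 = retention_probability(16.0, L50_MM, SELECTION_RANGE_MM)
p_at_l25 = retention_probability(12.0, L50_MM, SELECTION_RANGE_MM)

# Hypothetical counts for one length class: 30 in the codend, 10 in the cover.
observed = observed_retention(30, 10)
```

In a study of this kind, the observed fractions per length class would be the data to which the curve parameters are fitted.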

