Execution Time Prediction for Cypher Queries in the Neo4j Database Using a Learning Approach

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 55
Author(s):  
Zhenzhen He ◽  
Jiong Yu ◽  
Binglei Guo

As database management systems grow more complex, predicting the execution time of graph queries before they run has become a key challenge for query scheduling, workload management, resource allocation, and progress monitoring. A comparison of query performance prediction methods shows that existing work addresses this problem for traditional SQL queries, but those methods cannot be applied directly to Cypher queries on the Neo4j database. Additionally, most query performance prediction methods focus on measuring the relationship between correlation coefficients and retrieval performance. Inspired by machine-learning methods and graph query optimization technologies, we used an RBF neural network as the prediction model to train on and predict the execution time of Cypher queries. The corresponding query pattern features, graph data features, and query plan features were fused together and used to train our prediction models. Furthermore, we deployed a monitor node and designed a Cypher query benchmark for the database clusters to obtain query plan information and the native data store. The experimental results on four benchmarks showed that the average mean relative error of the RBF model reached 16.5% on the Northwind dataset, 12% on the FIFA2021 dataset, and 16.25% on the CORD-19 dataset. These experiments demonstrate the effectiveness of our proposed approach on three real-world datasets.
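The abstract describes training an RBF neural network on fused query-pattern, graph-data, and query-plan features. The sketch below is a minimal illustration of that idea, not the authors' implementation: the feature dimensionality, number of centers, kernel width, and synthetic data are all assumptions standing in for the paper's benchmark queries and monitored execution times.

```python
# Minimal sketch of an RBF-network regressor for query execution-time
# prediction. Feature extraction (query pattern, graph data, query plan)
# is assumed to already yield one numeric vector per Cypher query.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler


class RBFNetwork:
    """RBF network: k-means centers -> Gaussian activations -> linear output."""

    def __init__(self, n_centers=20, gamma=1.0):
        self.n_centers = n_centers
        self.gamma = gamma

    def _activations(self, X):
        # Gaussian RBF activation for every (sample, center) pair.
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        self.scaler_ = StandardScaler().fit(X)
        Xs = self.scaler_.transform(X)
        self.centers_ = KMeans(self.n_centers, n_init=10).fit(Xs).cluster_centers_
        self.head_ = Ridge(alpha=1e-3).fit(self._activations(Xs), y)
        return self

    def predict(self, X):
        return self.head_.predict(self._activations(self.scaler_.transform(X)))


# Hypothetical usage: X fuses query-pattern, graph-data, and query-plan
# features; y holds measured execution times (ms) from a monitor node.
X = np.random.rand(200, 12)
y = 10 + np.random.rand(200) * 100
model = RBFNetwork(n_centers=15, gamma=0.5).fit(X[:150], y[:150])
pred = model.predict(X[150:])
mre = np.mean(np.abs(pred - y[150:]) / y[150:])   # mean relative error
print(f"mean relative error: {mre:.2%}")
```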

2021 ◽  
Vol 31 (2) ◽  
pp. 1-28
Author(s):  
Gopinath Chennupati ◽  
Nandakishore Santhi ◽  
Phill Romero ◽  
Stephan Eidenbenz

Hardware architectures become increasingly complex as compute capabilities grow toward exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform, which is defined by the input parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build the prediction models from small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and the basic block counts. We validate predicted SNAP runtimes against actual measured times.
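The memory side of the model rests on reuse distance distributions per basic block. The following sketch only illustrates the underlying reuse (stack) distance computation on a toy address trace; it is not PPT-AMMP code, and the trace is a hypothetical stand-in for one basic block's virtual memory accesses.

```python
# Minimal sketch of reuse (stack) distance: for each memory access, count
# how many distinct addresses were touched since the previous access to
# the same address. A cold (first-touch) access has no finite distance.
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of each access (None for cold misses)."""
    stack = OrderedDict()          # addresses in least- to most-recently-used order
    distances = []
    for addr in trace:
        if addr in stack:
            # Distance = number of distinct addresses accessed since last use.
            keys = list(stack.keys())
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)  # first touch: infinite reuse distance
            stack[addr] = True
    return distances


# Toy trace: address 0x10 is reused after two distinct other addresses.
trace = [0x10, 0x20, 0x30, 0x10, 0x20]
print(reuse_distances(trace))  # [None, None, None, 2, 2]
```

A histogram of these distances per basic block would give the kind of reuse distance distribution that a memory model can feed into cache hit-rate and runtime estimates.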


Author(s):  
Keyvan Sasani ◽  
Mohammad Hossein Namaki ◽  
Yinghui Wu ◽  
Assefaw H. Gebremedhin

2017 ◽  
Vol 53 (6) ◽  
pp. 1320-1341 ◽  
Author(s):  
Maram Hasanain ◽  
Tamer Elsayed

2021 ◽  
Author(s):  
Arabzadehghahyazi Negar

Pre-retrieval Query Performance Prediction (QPP) methods are oblivious to the performance of the retrieval model, as they predict query difficulty prior to observing the set of documents retrieved for the query. Among pre-retrieval query performance predictors, specificity-based metrics investigate how corpus, query, and corpus-query level statistics can be used to predict the performance of the query. In this thesis, we explore how neural embeddings can be utilized to define corpus-independent and semantics-aware specificity metrics. Our metrics are based on the intuition that a term that is closely surrounded by other terms in the embedding space is more likely to be specific, while a term surrounded by less closely related terms is more likely to be generic. On this basis, we leverage geometric properties between embedded terms to define four groups of metrics: (1) neighborhood-based, (2) graph-based, (3) cluster-based, and (4) vector-based metrics. Moreover, we employ learning-to-rank techniques to analyze the importance of individual specificity metrics. To evaluate the proposed metrics, we have curated and publicly shared a test collection of term specificity measurements based on the Wikipedia category hierarchy and the DMOZ taxonomy. We report on our extensive experiments on the effectiveness of our metrics through metric comparison, an ablation study, and comparison against state-of-the-art baselines. We show that our proposed set of pre-retrieval QPP metrics based on the properties of pre-trained neural embeddings is more effective for performance prediction than state-of-the-art methods. We report our findings based on the Robust04, ClueWeb09, and Gov2 corpora and their associated TREC topics.
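The thesis defines four families of specificity metrics over neural embeddings. The sketch below illustrates one plausible neighborhood-based variant, not the thesis's exact formulation: the mean cosine similarity between a term and its k nearest neighbours, so that terms with tightly packed neighbourhoods score as more specific. The tiny embedding table is hypothetical; in practice the vectors would come from a pre-trained model.

```python
# Minimal sketch of a neighborhood-based specificity score: a term whose
# nearest neighbours in the embedding space lie close to it is treated as
# specific; one with distant neighbours is treated as generic.
import numpy as np


def neighborhood_specificity(term, embeddings, k=3):
    """Mean cosine similarity between `term` and its k nearest neighbours."""
    v = embeddings[term]
    sims = []
    for other, u in embeddings.items():
        if other == term:
            continue
        sims.append(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)))
    return float(np.mean(sorted(sims, reverse=True)[:k]))


# Hypothetical 4-dimensional vectors for a handful of terms.
rng = np.random.default_rng(0)
vocab = ["neo4j", "cypher", "database", "system", "thing"]
embeddings = {w: rng.normal(size=4) for w in vocab}

scores = {w: neighborhood_specificity(w, embeddings, k=2) for w in vocab}
print(scores)  # higher score -> denser neighbourhood -> more specific term
```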

