Predicting SQL Query Execution Time with a Cost Model for Spark Platform

Author(s):  
Aleksey Burdakov ◽  
Viktoria Proletarskaya ◽  
Andrey Ploutenko ◽  
Oleg Ermakov ◽  
Uriy Grigorev
2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Aaron Kite-Powell ◽  
Michael Coletta ◽  
Jamie Smimble

Objective: The objective of this work is to describe the use and performance of the NSSP ESSENCE system by analyzing the structured query language (SQL) logs generated by users of the National Syndromic Surveillance Program’s (NSSP) Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE).
Introduction: As system users develop queries within ESSENCE, they step through the user interface to select the data sources and parameters needed for their query. Then they select from the available output options (e.g., time series, table builder, data details). These activities execute SQL queries on the database, the majority of which are saved in a log so that system developers can troubleshoot problems. Secondarily, these data can be used as a form of web analytics to describe user query choices, query volume, and query execution time, and to develop an understanding of ESSENCE query patterns.
Methods: ESSENCE SQL query logs were extracted from April 1, 2016 to August 23rd, 2017. Overall query volume was assessed by summarizing the volume of queries over time (e.g., by hour, day, and week) and by Site. To better understand system performance, the mean, median, and maximum query execution times were summarized over time and by Site. SQL query text was parsed so that we could isolate 1) Syndromes queried, 2) Sub-syndromes queried, 3) Keyword categories queried, and 4) Free text query terms used. Syndromes, sub-syndromes, and keyword categories were tabulated in total and by Site. Frequencies of free text query terms were analyzed using n-grams, wordclouds, and term co-occurrence relationships. Term co-occurrence network graphs were used to visualize the structure and relationships among terms.
Results: There were a total of 354,101 SQL queries generated by users of ESSENCE between April 1, 2016 and August 23rd, 2017. Over this entire time period there was a weekly mean of 4,785 SQL queries performed by users. For 2017 data through August 23rd, this figure increases to a mean of 7,618 SQL queries per week, and since May 2017 the mean number of SQL queries has increased to 10,485 per week. The maximum number of user-generated SQL queries in a week was 29,173. The mean, median, and maximum query execution times for all data were 0.61 minutes, 0 minutes, and 365 minutes, respectively. For queries with a free text component only, the mean query execution time increases slightly to 0.94 minutes, though the median is still 0 minutes. The peak usage period, based on the number of SQL queries performed, is between 12:00pm and 3:00pm EST.
Conclusions: The use of NSSP ESSENCE has grown since implementation. This is the first time the ESSENCE system has been used at a national level with this volume of data and number of users. Our focus to date has been on successfully on-boarding new Sites so that they can benefit from use of the available tools, providing training to new users, and optimizing ESSENCE performance. Routine analysis of the ESSENCE SQL logs can assist us in understanding how the system is being used, how well it is performing, and in evaluating our system optimization efforts.
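
The kind of log summary described in the Methods can be sketched as follows. This is a minimal illustration only: the record layout, the site names, and the sample values are hypothetical and are not taken from the ESSENCE logs. It computes the mean, median, and maximum execution times overall and for free text queries, and counts two-word n-grams (bigrams) in the free text terms.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical log records: (site, execution time in minutes, free text terms).
# An empty string means the query had no free text component.
LOG = [
    ("Site A", 0.0, "fever cough"),
    ("Site A", 2.0, ""),
    ("Site B", 1.0, "opioid overdose"),
    ("Site B", 0.0, "fever rash"),
    ("Site B", 7.0, "opioid overdose heroin"),
]

def summarize(times):
    """Return the mean, median, and maximum of a list of execution times."""
    return {"mean": mean(times), "median": median(times), "max": max(times)}

def bigrams(text):
    """Split free text into lowercase words and pair each word with the next."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

# Summary over all queries, then over queries with a free text component.
overall = summarize([t for _, t, _ in LOG])
free_text = summarize([t for _, t, q in LOG if q])

# Frequency of each bigram across all free text queries.
bigram_counts = Counter(b for _, _, q in LOG for b in bigrams(q))

print(overall)
print(free_text)
print(bigram_counts.most_common(1))
```

Grouping the same summary by site, hour, or week would only require filtering or keying the records on the corresponding field before calling `summarize`.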


2014 ◽  
Vol 7 (14) ◽  
pp. 1857-1868 ◽  
Author(s):  
Wentao Wu ◽  
Xi Wu ◽  
Hakan Hacigümüş ◽  
Jeffrey F. Naughton

Author(s):  
NAPHAT KEAWPIBA ◽  
LADDA PREECHAVEERAKUL ◽  
SIRIRUT VANICHAYOBON

A bitmap-based index is an effective and efficient indexing method for answering selective queries in a read-only environment. It improves query execution time by applying low-cost Boolean operators to the index directly, before accessing raw data. A drawback of the bitmap index is that its size increases with the cardinality of the indexed attributes, which in turn increases query execution time because more bitmap vectors must be scanned to answer a query. In this paper, we propose a new encoding bitmap index, called the HyBiX bitmap index. The HyBiX bitmap index was experimentally compared with existing encoding bitmap indexes in terms of space requirement, query execution time, and space and time trade-off for equality and range queries. The experimental results show that the HyBiX bitmap index reduces space requirements for high-cardinality attributes while keeping execution times satisfactory for both equality and range queries. In terms of space and time trade-off, the HyBiX bitmap index ranks second for equality queries and first for range queries.
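
For orientation, the sketch below shows a plain equality-encoded bitmap index, which is the textbook baseline and not the HyBiX encoding proposed in the paper. The function names and the sample column are assumptions made for illustration. It keeps one bit vector per distinct value, so the number of vectors grows with attribute cardinality, and a range query has to OR together every vector whose value falls in the range.

```python
def build_bitmap_index(column):
    """Equality encoding: one bit vector (stored as an int) per distinct value.

    Bit i of a vector is set when row i holds that value.
    """
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_of(bitmap):
    """Return the row numbers whose bit is set in the bit vector."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def equality_query(index, value):
    """Rows where the attribute equals `value`: a single vector lookup."""
    return rows_of(index.get(value, 0))

def range_query(index, low, high):
    """Rows where low <= attribute <= high: OR of all matching vectors."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap
    return rows_of(result)

# Hypothetical column with four distinct values, hence four bit vectors.
ages = [30, 25, 30, 41, 25, 37]
idx = build_bitmap_index(ages)

print(len(idx))
print(equality_query(idx, 30))
print(range_query(idx, 26, 40))
```

The Boolean OR runs on the index alone, before any raw data is touched. Alternative encodings reduce how many vectors are stored or scanned, which is the trade-off the abstract evaluates.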


Author(s):  
Eman A. Khashan ◽  
Ali I. El Desouky ◽  
Sally M. Elghamrawy

The growth of data on the web poses major challenges. The volume of stored data and the ability to query multiple data sources have become essential considerations for large data systems. Many platforms are used to handle the NoSQL database model, such as Spark, H2O, and Hadoop HDFS/MapReduce, which are suitable for controlling and managing big data. Application developers face difficult tasks when they interact with data stores that use mixed data models through different APIs and queries. In this paper, we propose a complex SQL Query and NoSQL (CQNS) framework that acts as an interpreter, sending complex queries received from any data store to the corresponding execution engine. The proposed framework supports application queries and database transformation at the same time, which in turn speeds up the process. Moreover, CQNS handles many NoSQL databases, such as MongoDB and Cassandra. This paper provides a Spark framework that can handle SQL and NoSQL databases. This work also examines the importance of MongoDB block sharding and composition. The Cassandra database deals with two types of partitioning: vertex and edge. Datasets from four scenarios are used to evaluate the proposed CQNS when querying the various NoSQL databases, in terms of optimization performance and query execution time. The results show that, among the compared systems, CQNS achieves the best latency and productivity.


2018 ◽  
Vol 5 (1) ◽  
pp. 27 ◽  
Author(s):  
Kukuh Triyuliarno Hidayat ◽  
Riza Arifudin ◽  
Alamsyah Alamsyah

A relational database is a database organized as tables connected by relationships. Each table holds a collection of information. The information is processed in the database by using queries, such as data retrieval, data storage, and data conversion. If the tables hold a large amount of data, query processing becomes slow. In this paper, a genetic algorithm is used to process queries in order to optimize them and reduce query execution time. The results show that query execution with genetic algorithm optimization yields the best execution time. The genetic algorithm processes the query by changing the structure of its relations and rearranging them. The fitness value generated by the genetic algorithm identifies the best solution; the fitness used is the highest fitness among the results of each experiment. In this experiment, the database used is the MySQL sample database named employees. The database has a total of over 3,000,000 rows in 6 tables. Queries are designed by using 5 relations in the form of a left-deep tree. The execution time of the query is 8.14247 seconds before optimization and 6.08535 seconds after optimization by the genetic algorithm, with a fitness value of 0.90509. The execution time is thus reduced by 25.3%. This shows that a genetic algorithm can reduce query execution time by optimizing the relation part of the query. Therefore, query optimization with a genetic algorithm can be an alternative solution and can be used to maximize query performance.
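
The reported 25.3% reduction can be checked from the two execution times given in the abstract. The variable names below are illustrative only.

```python
# Execution times in seconds, as reported in the abstract.
before = 8.14247  # original query
after = 6.08535   # after genetic algorithm optimization

# Relative reduction in execution time, as a percentage of the original time.
reduction_pct = (before - after) / before * 100

print(round(reduction_pct, 1))
```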

