query execution Latest Research Papers

Various distributed processing schemes have been studied to make efficient use of large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments in order to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries based on its WHERE clause. The proposed scheme reduces data communication costs by using an index to group the divided subqueries on related nodes and processing them there; for the grouped subqueries, it calculates the cost of all possible query execution paths and selects an efficient one. The selection algorithm considers the data parsing cost of each possible query execution path, the amount of data communication, and the queue time per node. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
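The path selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simple additive cost model, and the names `GROUPS`, `QUEUE_TIME`, `COST_PER_ROW`, `path_cost`, and `best_path`, as well as all numbers, are hypothetical.

```python
from itertools import permutations

# Hypothetical statistics per subquery group; the numbers are illustrative only.
GROUPS = {
    "g1": {"parse": 4.0, "rows": 1000, "node": "n1"},
    "g2": {"parse": 2.0, "rows": 50, "node": "n2"},
    "g3": {"parse": 3.0, "rows": 200, "node": "n1"},
}
QUEUE_TIME = {"n1": 1.0, "n2": 5.0}  # assumed waiting time per node
COST_PER_ROW = 0.01  # assumed cost of shipping one intermediate row


def path_cost(path):
    """Additive cost of running the subquery groups in the given order."""
    cost = 0.0
    for i, name in enumerate(path):
        group = GROUPS[name]
        cost += group["parse"] + QUEUE_TIME[group["node"]]
        # Ship this group's rows only if the next group runs on another node.
        if i + 1 < len(path) and GROUPS[path[i + 1]]["node"] != group["node"]:
            cost += group["rows"] * COST_PER_ROW
    return cost


def best_path(groups):
    """Enumerate every ordering of the groups and keep the cheapest one."""
    return min(permutations(groups), key=path_cost)
```

Under these assumed numbers, orderings that start with the small group `g2` avoid shipping the 1000 rows of `g1` across nodes. Exhaustive enumeration is only practical for a handful of groups, since the number of orderings grows factorially.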

A Survey of IoT Stream Query Execution Latency Optimization within Edge and Cloud

Wireless Communications and Mobile Computing ◽ 10.1155/2021/4811018 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-16

Author(s): Fatima Abdullah ◽ Limei Peng ◽ Byungchul Tak

Keyword(s): Query Processing ◽ Streaming Data ◽ Query Execution ◽ Cloud Environment ◽ Future Directions ◽ Stream Query Processing ◽ Execution Model ◽ IoT Devices ◽ Edge Based ◽ Execution Models

IoT (Internet of Things) streaming data has increased dramatically in recent years and continues to grow rapidly due to the exponential growth of connected IoT devices. For many IoT applications, fast stream query processing is crucial for correct operation. To achieve better query performance and quality, researchers and practitioners have developed various types of query execution models: purely cloud-based, geo-distributed, edge-based, and edge-cloud-based. Each execution model presents unique challenges and limitations for query processing optimization. In this work, we provide a comprehensive review and analysis of query execution models in the context of query execution latency optimization. We also present a detailed overview of various query execution styles for the different query execution models and highlight their contributions. Finally, the paper concludes by proposing promising future directions for advancing query execution in edge and cloud environments.

An authorization model for query execution in the cloud

The VLDB Journal ◽ 10.1007/s00778-021-00709-x ◽ 2021

Author(s): Sabrina De Capitani di Vimercati ◽ Sara Foresti ◽ Sushil Jajodia ◽ Giovanni Livraga ◽ Stefano Paraboschi ◽ ...

Keyword(s): Query Execution ◽ Authorization Model

Optimizing storage of RDF archives using bidirectional delta chains

Semantic Web ◽ 10.3233/sw-210449 ◽ 2021 ◽ pp. 1-30

Author(s): Ruben Taelman ◽ Thibault Mahieu ◽ Martin Vanbrabant ◽ Ruben Verborgh

Keyword(s): Chain Structure ◽ Additional Benefit ◽ Query Execution ◽ Hybrid Storage ◽ Ingestion Time ◽ Open Datasets ◽ The Cost ◽ Future Work ◽ Over Time ◽ The Web

Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to be able to store such evolving RDF datasets, and query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to lead to more efficient query execution for all versioned query types at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, it suffers from exponential ingestion times in terms of the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy where we make use of a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and even improved query execution performance in some cases. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.
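To make the idea of a bidirectional delta chain concrete, here is a minimal sketch. It is not the architecture from the article: it assumes that each version is a plain set of triples, that the snapshot sits in the middle of the chain, and that every other version is stored as one delta relative to that snapshot. The class `BidirectionalDeltaChain` and its methods are hypothetical.

```python
class BidirectionalDeltaChain:
    """Toy versioned store: one snapshot plus deltas in both directions."""

    def __init__(self, versions, snapshot_index=None):
        if snapshot_index is None:
            snapshot_index = len(versions) // 2  # assumed: snapshot mid-chain
        self.snapshot_index = snapshot_index
        self.snapshot = frozenset(versions[snapshot_index])
        self.deltas = {}
        for i, triples in enumerate(versions):
            if i == snapshot_index:
                continue
            triples = frozenset(triples)
            # Delta relative to the snapshot: (additions, deletions).
            self.deltas[i] = (triples - self.snapshot, self.snapshot - triples)

    def materialize(self, index):
        """Rebuild the full set of triples for one version."""
        if index == self.snapshot_index:
            return set(self.snapshot)
        added, removed = self.deltas[index]
        return (set(self.snapshot) - removed) | added

    def delta_size(self, index):
        """Number of triples stored for a non-snapshot version."""
        added, removed = self.deltas[index]
        return len(added) + len(removed)
```

In this sketch, versions before the snapshot are reached through reverse deltas and versions after it through forward deltas, so no version is more than about half the chain away from the snapshot.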

MuSe: a multi-level storage scheme for big RDF data using MapReduce

Journal of Big Data ◽ 10.1186/s40537-021-00519-6 ◽ 2021 ◽ Vol 8 (1)

Author(s): Tanvi Chawla ◽ Girdhari Singh ◽ Emmanuel S. Pilli

Keyword(s): Linked Data ◽ State Of The Art ◽ Query Execution ◽ Hadoop MapReduce ◽ Storage Scheme ◽ The Common ◽ Multi Level ◽ RDF Data ◽ Description Framework ◽ Resource Description

The Resource Description Framework (RDF) model, owing to its flexible structure, is increasingly being used to represent Linked Data. The rise in the amount of Linked Data and knowledge graphs has resulted in an increase in the volume of RDF data. RDF is used to model metadata, especially for social media domains where the data is linked. With the plethora of RDF data sources available on the Web, scalable RDF data management becomes a tedious task. In this paper, we present MuSe, an efficient distributed RDF storage scheme for storing and querying RDF data with Hadoop MapReduce. In MuSe, the big RDF data is stored at two levels for answering the common triple patterns in SPARQL queries. MuSe considers the types of frequently occurring triple patterns and optimizes RDF storage to answer such triple patterns in minimum time. It accesses only the tables that are sufficient for answering a triple pattern instead of scanning the whole RDF dataset. Extensive experiments on two synthetic RDF datasets, LUBM and WatDiv, show that MuSe outperforms the compared state-of-the-art frameworks in terms of query execution time and scalability.

Improving Lookup and Query Execution Performance in Distributed Big Data Systems using Cuckoo Filter

10.21203/rs.3.rs-877903/v1 ◽ 2021

Author(s): Sharafat Ibn Mollah Mosharraf ◽ Muhammad Abdullah Adnan

Keyword(s): Big Data ◽ Data Warehouse ◽ Performance Improvement ◽ Query Execution ◽ Data Systems ◽ Round Trip ◽ Improve Performance ◽ Cuckoo Filter ◽ Big Data Systems ◽ Big Data Warehouse

Performance is a critical concern when reading and writing data among billions of records stored in a Big Data warehouse. We introduce two scopes for query performance improvement. One is to improve the performance of lookup queries after data deletion in Big Data systems that use eventual consistency. We propose a scheme to improve lookup performance after data deletion by using a Cuckoo Filter. Another scope for improvement is to avoid unnecessary network round trips for querying remote nodes in a distributed Big Data cluster when it is known that the nodes do not have the requested partition of data. We propose a scheme using probabilistic filters that are looked up before querying remote nodes, so that queries resulting in no data can be skipped instead of passing through the network. We evaluate our schemes with Cassandra using a real dataset and show that each scheme can improve the performance of lookup queries by up to 100%.
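The second scheme, consulting a probabilistic filter before contacting a remote node, can be sketched as follows. This is a minimal illustration, not the authors' Cassandra integration: the `CuckooFilter` class is a small textbook-style filter with 8-bit fingerprints, and `lookup`, `node_filters`, and `remote_fetch` are hypothetical names.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Minimal cuckoo filter storing 8-bit fingerprints in small buckets."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100, seed=0):
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.m = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _fingerprint(self, key: str) -> int:
        return _h(b"fp:" + key.encode()) % 255 + 1  # value in 1..255

    def _alt(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint; applying it twice returns index.
        return index ^ (_h(bytes([fp])) % self.m)

    def _candidates(self, key: str):
        fp = self._fingerprint(key)
        i1 = _h(key.encode()) % self.m
        return fp, i1, self._alt(i1, fp)

    def insert(self, key: str) -> bool:
        fp, i1, i2 = self._candidates(key)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random resident fingerprint and move it to its other bucket.
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # table too full; a real system would resize or rebuild

    def contains(self, key: str) -> bool:
        fp, i1, i2 = self._candidates(key)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, key: str) -> bool:
        fp, i1, i2 = self._candidates(key)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


def lookup(key, node_filters, remote_fetch):
    """Contact only the nodes whose filter says the key may be present."""
    results = []
    for node, node_filter in node_filters.items():
        if node_filter.contains(key):  # may be a false positive
            results.append(remote_fetch(node, key))
    return results
```

Unlike a Bloom filter, a cuckoo filter supports `delete`, which is why it suits the deletion scenario in the abstract. Deleting a key that was never inserted can remove another key's fingerprint, so callers are assumed to delete only keys they previously inserted.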

DeepEverest

Proceedings of the VLDB Endowment ◽ 10.14778/3485450.3485460 ◽ 2021 ◽ Vol 15 (1) ◽ pp. 98-111

Author(s): Dong He ◽ Maureen Daum ◽ Walter Cai ◽ Magdalena Balazinska

Keyword(s): Neural Network ◽ Deep Neural Network ◽ Query Execution ◽ Indexing Technique ◽ Efficient Execution

We design, implement, and evaluate DeepEverest, a system for the efficient execution of interpretation by example queries over the activation values of a deep neural network. DeepEverest consists of an efficient indexing technique and a query execution algorithm with various optimizations. We prove that the proposed query execution algorithm is instance optimal. Experiments with our prototype show that DeepEverest, using less than 20% of the storage of full materialization, significantly accelerates individual queries by up to 63X and consistently outperforms other methods on multi-query workloads that simulate DNN interpretation processes.

Robust Cardinality: a novel approach for cardinality prediction in SQL queries

Journal of the Brazilian Computer Society ◽ 10.1186/s13173-021-00115-9 ◽ 2021 ◽ Vol 27 (1)

Author(s): Francisco D. B. S. Praciano ◽ Paulo R. P. Amora ◽ Italo C. Abreu ◽ Francisco L. F. Pereira ◽ Javam C. Machado

Keyword(s): Estimation Error ◽ Experimental Tests ◽ Query Execution ◽ New Approach ◽ Modern Technique ◽ Lower Estimation ◽ Novel Approach ◽ Execution Engine ◽ Query Estimation ◽ Query Operations

Background: Database Management Systems (DBMSs) use a declarative language to execute queries on stored data. The DBMS defines how data will be processed and ultimately retrieved. Therefore, it must choose the best option from the different possibilities based on an estimation process. The optimization process uses estimated cardinalities to make optimization decisions, such as choosing predicate order.

Methods: In this paper, we propose Robust Cardinality, an approach to calculate cardinality estimates of query operations to guide the execution engine of the DBMS to choose the best possible form or at least avoid the worst one. By using machine learning instead of the current histogram heuristics, it is possible to improve these estimates, leading to more efficient query execution.

Results: We perform experimental tests using PostgreSQL, comparing both estimators and a modern technique proposed in the literature. With Robust Cardinality, a lower estimation error was obtained for a batch of queries, and PostgreSQL executed these queries more efficiently than when using the default estimator. We observed a 3% reduction in execution time after reducing the query estimation error by a factor of 4.

Conclusions: From the results, it is possible to conclude that this new approach results in improvements in query processing in DBMSs, especially in the generation of cardinality estimates.

Optimizing RDF Stream Processing for Uncertainty Management

10.3233/ssw210039 ◽ 2021

Author(s): Robin Keskisärkkä ◽ Eva Blomqvist ◽ Olaf Hartig

Keyword(s): Stream Processing ◽ Uncertainty Management ◽ Optimization Techniques ◽ Query Execution ◽ Multiple Factors ◽ Web Standards ◽ Uncertainty Function ◽ The Cost ◽ The Impact ◽ Uncertainty Functions

RDF Stream Processing (RSP) has been proposed as a way of bridging the gap between the Complex Event Processing (CEP) paradigm and the Semantic Web standards. Uncertainty has been recognized as a critical aspect in CEP, but it has received little attention within the context of RSP. In this paper, we investigate the impact of different RSP optimization strategies for uncertainty management. The paper describes (1) an extension of the RSP-QL⋆ data model to capture bind expressions, filter expressions, and uncertainty functions; (2) optimization techniques related to lazy variables and caching of uncertainty functions, and a heuristic for reordering uncertainty filters in query plans; and (3) an evaluation of these strategies in a prototype implementation. The results show that using a lazy variable mechanism for uncertainty functions can improve query execution performance by orders of magnitude while introducing negligible overhead. The results also show that caching uncertainty function results can improve performance under most conditions, but that maintaining this cache can potentially add overhead to the overall query execution process. Finally, the effect of the proposed heuristic on query execution performance was shown to depend on multiple factors, including the selectivity of uncertainty filters, the size of intermediate results, and the cost associated with the evaluation of the uncertainty functions.
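The two optimizations named in this abstract, lazy variables and caching of uncertainty function results, can be illustrated in a few lines. This sketch is not the RSP-QL⋆ prototype: the `uncertainty` function (the probability that a normally distributed reading exceeds a threshold), the `Lazy` wrapper, the `run` pipeline, and the `CALLS` counter are all hypothetical.

```python
import math
from functools import lru_cache

CALLS = {"n": 0}  # counts real evaluations, for illustration only


@lru_cache(maxsize=1024)
def uncertainty(mean: float, std: float, threshold: float) -> float:
    """Hypothetical uncertainty function: P(X > threshold) for X ~ N(mean, std)."""
    CALLS["n"] += 1
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2)))


class Lazy:
    """Binding whose value is computed only when first requested."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


def run(readings, threshold=30.0, min_prob=0.9):
    """Keep readings from sensor 's1' that exceed the threshold with high probability."""
    out = []
    for r in readings:
        # Bind the probability lazily; r=r captures the current reading.
        p = Lazy(lambda r=r: uncertainty(r["mean"], r["std"], threshold))
        # The cheap filter runs first, so p is never computed for other sensors.
        if r["sensor"] == "s1" and p.get() >= min_prob:
            out.append((r["sensor"], r["mean"]))
    return out
```

The cache pays off only when the same arguments recur. As the abstract notes for the real system, maintaining a cache can itself add overhead, which `lru_cache` bounds here through `maxsize`.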

query execution
Recently Published Documents

A Comparison of Query Execution Speeds for Large Amounts of Data Using Various DBMS Engines Executing on Selected RAM and CPU Configurations

An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments

A Survey of IoT Stream Query Execution Latency Optimization within Edge and Cloud

An authorization model for query execution in the cloud

Optimizing storage of RDF archives using bidirectional delta chains

MuSe: a multi-level storage scheme for big RDF data using MapReduce

Improving Lookup and Query Execution Performance in Distributed Big Data Systems using Cuckoo Filter

DeepEverest

Robust Cardinality: a novel approach for cardinality prediction in SQL queries

Optimizing RDF Stream Processing for Uncertainty Management
