A Distributed SPARQL Query Processing Scheme Considering Data Locality and Query Execution Path

Various distributed processing schemes have been studied to make efficient use of large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme for Spark environments that considers communication costs in order to reduce I/O costs during query processing. To process a query over an RDF graph stored in a distributed environment, the scheme divides a SPARQL query into several subqueries based on its WHERE clause. It reduces data communication costs by using an index to group the subqueries by the nodes that hold the related data and processing each group there. For the grouped subqueries, it then calculates the cost of every possible query execution path and selects an efficient one, using an algorithm that considers the data parsing cost, the amount of data communication, and the queue time per node. Various performance evaluations show that the proposed scheme outperforms existing schemes.
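The path-selection step described in this abstract can be pictured with a small sketch. It is not the paper's algorithm: the `SubqueryGroup` structure, the additive cost formula, and the brute-force enumeration are all assumptions made for illustration, using only the three cost factors the abstract names (parsing cost, amount of data communicated, and queue time per node).

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SubqueryGroup:
    """A group of subqueries assigned to one node (hypothetical structure)."""
    node: str
    parse_cost: float    # assumed estimate of the data parsing cost on the node
    output_size: float   # assumed estimate of data shipped to the next node
    queue_time: float    # assumed estimate of the waiting time on the node


def path_cost(path):
    """Assumed cost model: parsing plus queue time per node, plus shipped data."""
    cost = sum(g.parse_cost + g.queue_time for g in path)
    # Intermediate results leave every group except the last one in the path.
    cost += sum(g.output_size for g in path[:-1])
    return cost


def select_execution_path(groups):
    """Enumerate every ordering of the groups and return the cheapest one."""
    return min(permutations(groups), key=path_cost)
```

Under this simplified model the per-node costs are the same for every ordering, so the choice is driven entirely by which group's output never has to be shipped. Enumerating all orderings grows factorially with the number of groups, so a sketch like this is only practical for a handful of groups.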

Enhancement of Query Execution Time in SPARQL Query Processing

2020 International Conference on Advanced Information Technologies (ICAIT) ◽

10.1109/icait51105.2020.9261805 ◽

2020 ◽

Author(s):

Khin Myat Kyu ◽

Aung Nway Oo

Keyword(s):

Query Processing ◽

Execution Time ◽

Sparql Query ◽

Query Execution

An empirical evaluation of cost-based federated SPARQL query processing engines

Semantic Web ◽

10.3233/sw-200420 ◽

2021 ◽

pp. 1-26

Author(s):

Umair Qudus ◽

Muhammad Saleem ◽

Axel-Cyrille Ngonga Ngomo ◽

Young-Koo Lee

Keyword(s):

Query Processing ◽

Detailed Analysis ◽

Performance Metrics ◽

Empirical Evaluation ◽

Sparql Query ◽

Evaluation Metrics ◽

Future Cost ◽

Query Plan ◽

Fine Grained ◽

Runtime Performance

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful for the development of future cost-based federated SPARQL query processing engines.
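To make the idea of measuring cardinality-estimation error concrete, here is a minimal sketch of the q-error, a measure widely used in the query optimization literature. It is shown purely as an illustration and is not claimed to be one of the novel metrics this paper proposes; the function name and the clamping of values below 1 are assumptions of this sketch.

```python
def q_error(estimated, actual):
    """Return max(est/act, act/est), clamping both values to at least 1.

    A result of 1.0 means a perfect estimate; larger values mean the
    estimate was off by that factor in either direction.
    """
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)
```

Because the measure is symmetric, an overestimate by a factor of ten and an underestimate by a factor of ten receive the same score, which is one reason a benchmark may want to relate such errors to the observed query runtime rather than report them alone.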

The Data Cyclotron query processing scheme

Proceedings of the 13th International Conference on Extending Database Technology - EDBT '10 ◽

10.1145/1739041.1739054 ◽

2010 ◽

Cited By ~ 7

Author(s):

R. Goncalves ◽

M. Kersten

Keyword(s):

Query Processing ◽

Processing Scheme

A Weight-Bind-Based Safe Top-k Query Processing Scheme in Two-Tiered Sensor Networks

Security, Privacy, and Anonymity in Computation, Communication, and Storage - Lecture Notes in Computer Science ◽

10.1007/978-3-319-72395-2_59 ◽

2017 ◽

pp. 653-666

Author(s):

Xiaoyan Kui ◽

Shigeng Zhang ◽

Wei Li ◽

Ping Zhong ◽

Xingpo Ma ◽

...

Keyword(s):

Sensor Networks ◽

Query Processing ◽

Processing Scheme

XML Stream Query Processing

Open and Novel Issues in XML Database Applications ◽

10.4018/978-1-60566-308-1.ch005 ◽

2010 ◽

pp. 89-107

Author(s):

Mingzhu Wei ◽

Ming Li ◽

Elke A. Rundensteiner ◽

Murali Mani ◽

Hong Su

Keyword(s):

Query Processing ◽

Input Data ◽

Stream Processing ◽

Query Execution ◽

Data Semantics ◽

Stream Query Processing ◽

Xml Stream ◽

And Algebra ◽

Xml Streams ◽

Compare And Contrast

Stream applications bring the challenge of efficiently processing queries on sequentially accessible XML data streams. In this chapter, the authors study the current techniques and open challenges of XML stream processing. First, they examine the input data semantics in XML streams and introduce the state of the art in XML stream processing. Second, they compare and contrast the automaton-based and algebra-based techniques used in XML stream query execution. Third, they study different optimization strategies that have been investigated for XML stream processing; in particular, they discuss cost-based and schema-based optimization strategies. Last but not least, the authors list several key open challenges in XML stream processing.
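As a rough picture of what processing a query over a sequentially accessible XML stream involves, the sketch below matches a fixed root-to-element path against a stream of events using a stack. It is a heavily simplified stand-in for the automaton-based techniques the chapter surveys, not a reproduction of any of them; the event tuple format and the function name `match_path` are assumptions of this sketch.

```python
def match_path(events, path):
    """Yield the text of each element whose root-to-element tag path is `path`.

    `events` is an iterable of ("start", tag), ("text", data) and
    ("end", tag) tuples (an assumed format), consumed strictly in order,
    as a stream would deliver them.
    """
    path = list(path)
    stack = []      # tags of the currently open elements
    buffer = None   # collects text while inside a matching element
    for kind, value in events:
        if kind == "start":
            stack.append(value)
            if stack == path:
                buffer = []
        elif kind == "text":
            # Only text directly inside the matching element is kept.
            if buffer is not None and stack == path:
                buffer.append(value)
        elif kind == "end":
            if buffer is not None and stack == path:
                yield "".join(buffer)
                buffer = None
            stack.pop()
```

The matcher keeps only the stack of open tags and one text buffer, so memory use depends on document depth rather than document size, which is the property that makes single-pass processing of a stream feasible.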

Efficient SPARQL Query Processing Based on Adjacent-Predicate Structure Index

2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC) ◽

10.1109/ihmsc.2018.00070 ◽

2018 ◽

Author(s):

Haoyuan Guan ◽

Bin Zhu ◽

Guanyu Li ◽

Yongjia Cai

Keyword(s):

Query Processing ◽

Sparql Query ◽

Structure Index

Query Optimization of Distributed RDF Data Based on MapReduce

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.441.970 ◽

2013 ◽

Vol 441 ◽

pp. 970-973

Author(s):

Yan Qin Zhang ◽

Jing Bin Wang

Keyword(s):

Query Optimization ◽

Query Language ◽

Sparql Query ◽

Query Execution ◽

Data Set ◽

Greedy Strategy ◽

Lehigh University ◽

Join Algorithm ◽

Simple Protocol ◽

Rdf Data

With the development of the semantic web, RDF data sets have grown rapidly, making it difficult to query massive amounts of RDF data. Using distributed techniques to execute SPARQL (Simple Protocol and RDF Query Language) queries is a new way of addressing this problem. At present, most Hadoop-based RDF query strategies need multiple MapReduce jobs to complete a query, which wastes time. To overcome this drawback, the paper proposes the MRQJ (using MapReduce to query and join) algorithm, which first uses a greedy strategy to generate a join plan and then needs only one MapReduce job to produce the query results during SPARQL query execution. Finally, a comparative experiment is conducted on the LUBM (Lehigh University Benchmark) test data set; the results show that the MRQJ method has a clear advantage for more complicated queries.
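The abstract does not say which heuristic MRQJ's greedy strategy uses, so the following is only a generic sketch of greedy join planning over triple patterns, not the MRQJ algorithm. The input format (pattern name mapped to a set of variables and an estimated result size), the "smallest connected pattern next" rule, and the function name are all assumptions of this sketch.

```python
def greedy_join_order(patterns):
    """Return pattern names in a greedy join order.

    `patterns` maps a name to (set of variable names, estimated size).
    Start with the smallest pattern, then repeatedly append the smallest
    remaining pattern that shares a variable with the plan built so far.
    """
    remaining = dict(patterns)
    if not remaining:
        return []
    first = min(remaining, key=lambda n: remaining[n][1])
    order = [first]
    bound = set(remaining.pop(first)[0])  # variables bound by the plan
    while remaining:
        connected = [n for n in remaining if remaining[n][0] & bound]
        # Fall back to any pattern if none is connected (a cross product).
        candidates = connected or list(remaining)
        nxt = min(candidates, key=lambda n: remaining[n][1])
        order.append(nxt)
        bound |= remaining.pop(nxt)[0]
    return order
```

Preferring connected patterns avoids cross products where possible, while choosing the smallest estimate at each step keeps intermediate results small; like any greedy strategy, it makes locally cheap choices and does not guarantee the globally cheapest plan.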
