A Distributed SPARQL Query Processing Scheme Considering Data Locality and Query Execution Path

2017 ◽  
Vol 23 (5) ◽  
pp. 275-283
Author(s):  
Byounghoon Kim ◽  
Daeyun Kim ◽  
Geonsik Ko ◽  
Yeonwoo Noh ◽  
Jongtae Lim ◽  
...  
2021 ◽  
Vol 12 (1) ◽  
pp. 122
Author(s):  
Jongtae Lim ◽  
Byounghoon Kim ◽  
Hyeonbyeong Lee ◽  
Dojin Choi ◽  
Kyoungsoo Bok ◽  
...  

Various distributed processing schemes have been studied to efficiently utilize large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries based on its WHERE clause. The proposed scheme reduces data communication costs by using an index to group the divided subqueries by related nodes and processing them there. For the grouped subqueries, it calculates the cost of all possible query execution paths and selects an efficient one. The selection algorithm considers the data parsing cost, the amount of data communication, and the queue time per node for each possible query execution path. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
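The path selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `SubqueryGroup` fields, the additive cost formula, and the `transfer_cost_per_unit` parameter are all assumptions chosen only to show how parsing cost, communication volume, and per-node queue time might be combined when every possible execution path is enumerated.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SubqueryGroup:
    """Hypothetical description of one group of subqueries placed on a node."""
    name: str
    node: str           # node that holds the data for this group
    parse_cost: float   # assumed cost of parsing the group's data
    result_size: float  # assumed size of the intermediate result it produces


def path_cost(path, queue_time, transfer_cost_per_unit=1.0):
    """Sum parsing cost, queue time, and inter-node transfer cost along a path."""
    total = 0.0
    for i, group in enumerate(path):
        total += group.parse_cost + queue_time[group.node]
        # Intermediate results are shipped only when the next group
        # runs on a different node (an assumption of this sketch).
        if i + 1 < len(path) and path[i + 1].node != group.node:
            total += group.result_size * transfer_cost_per_unit
    return total


def best_path(groups, queue_time):
    """Enumerate every ordering of the groups and return the cheapest one."""
    return min(permutations(groups), key=lambda p: path_cost(p, queue_time))
```

Exhaustive enumeration is factorial in the number of groups, so a sketch like this is only practical when grouping has already reduced the query to a handful of units.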


Semantic Web ◽  
2021 ◽  
pp. 1-26
Author(s):  
Umair Qudus ◽  
Muhammad Saleem ◽  
Axel-Cyrille Ngonga Ngomo ◽  
Young-Koo Lee

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful for the development of future cost-based federated SPARQL query processing engines.
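The abstract does not name the metrics it proposes. As a point of reference only, the sketch below computes the q-error, a commonly used measure of cardinality estimation accuracy, which is assumed here for illustration and is not claimed to be one of the paper's novel metrics. The function name and the clamping of values below 1 are choices of this sketch.

```python
def q_error(estimated, actual):
    """Return the q-error: the factor by which an estimate is off, in either direction.

    Both values are clamped to at least 1 so that empty results do not
    cause a division by zero (a convention assumed in this sketch).
    """
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)
```

A q-error of 1 means the estimate was exact; underestimating and overestimating by the same factor give the same value.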


Author(s):  
Mingzhu Wei ◽  
Ming Li ◽  
Elke A. Rundensteiner ◽  
Murali Mani ◽  
Hong Su

Stream applications bring the challenge of efficiently processing queries on sequentially accessible XML data streams. In this chapter, the authors study the current techniques and open challenges of XML stream processing. Firstly, they examine the input data semantics in XML streams and introduce the state of the art in XML stream processing. Secondly, they compare and contrast the automaton-based and algebra-based techniques used in XML stream query execution. Thirdly, they study different optimization strategies that have been investigated for XML stream processing; in particular, they discuss cost-based optimization as well as schema-based optimization strategies. Last but not least, the authors list several key open challenges in XML stream processing.


2013 ◽  
Vol 441 ◽  
pp. 970-973
Author(s):  
Yan Qin Zhang ◽  
Jing Bin Wang

With the development of the semantic web, RDF data sets have grown rapidly, which makes querying massive RDF data a problem. Using distributed techniques to execute SPARQL (SPARQL Protocol and RDF Query Language) queries is a new way of addressing the query problem for large amounts of RDF data. At present, most Hadoop-based RDF query strategies need multiple MapReduce jobs to complete a query, which wastes time. To overcome this drawback, the paper proposes the MRQJ (using MapReduce to query and join) algorithm, which first uses a greedy strategy to generate a join plan and then needs only one MapReduce job to obtain the query results during SPARQL query execution. Finally, a comparative experiment is conducted on the LUBM (Lehigh University Benchmark) test data set; the results show that the MRQJ method has a clear advantage when the query is more complicated.
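The abstract does not spell out MRQJ's greedy strategy, so the following is a generic greedy join-ordering sketch rather than the paper's algorithm. It assumes each triple pattern is described by a hypothetical name, the set of variables it mentions, and an estimated cardinality; it starts from the most selective pattern and repeatedly adds the cheapest pattern that shares a variable with those already chosen.

```python
def greedy_join_order(patterns):
    """Order triple patterns greedily by estimated cardinality.

    patterns maps a pattern name to (set_of_variables, estimated_cardinality).
    Patterns sharing a variable with the already-joined set are preferred,
    which avoids Cartesian products where possible.
    """
    remaining = dict(patterns)
    # Start with the pattern expected to return the fewest results.
    first = min(remaining, key=lambda name: remaining[name][1])
    order = [first]
    bound = set(remaining.pop(first)[0])
    while remaining:
        connected = [n for n in remaining if remaining[n][0] & bound]
        # Fall back to any remaining pattern if none is connected.
        candidates = connected or list(remaining)
        nxt = min(candidates, key=lambda name: remaining[name][1])
        order.append(nxt)
        bound |= remaining.pop(nxt)[0]
    return order
```

A plan produced this way is locally cheap at each step but is not guaranteed to be globally optimal, which is the usual trade-off of a greedy strategy.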


2018 ◽  
pp. 21-55
Author(s):  
Bernd Amann ◽  
Olivier Curé ◽  
Hubert Naacke

2015 ◽  
Vol 9 (6) ◽  
pp. 919-933
Author(s):  
Xiaoyan Wang ◽  
Tao Yang ◽  
Jinchuan Chen ◽  
Long He ◽  
Xiaoyong Du
