DESERT: A Continuous SPARQL Query Engine for On-Demand Query Answering

2018 ◽  
Vol 12 (03) ◽  
pp. 373-397 ◽  
Author(s):  
Farah Karim ◽  
Ioanna Lytra ◽  
Christian Mader ◽  
Sören Auer ◽  
Maria-Esther Vidal

The Internet of Things (IoT) has been rapidly adopted in many domains, ranging from household appliances (e.g., ventilation, lighting, and heating) to industrial manufacturing and transport networks. Despite the enormous benefits of optimization, monitoring, and maintenance rendered by IoT devices, an ample amount of data is generated continuously. Semantically describing IoT-generated data using ontologies enables a precise interpretation of this data. However, ontology-based descriptions tremendously increase the size of IoT data, and in the presence of repeated sensor measurements, a large amount of the data are duplicates that do not contribute new insights during query processing or IoT data analytics. To ensure that only the required ontology-based descriptions are generated, we devise a knowledge-driven approach named DESERT that is able to on-Demand factorizE and Semantically Enrich stReam daTa. DESERT resorts to a knowledge graph to describe IoT stream data; it utilizes only the data that is required to answer an input continuous SPARQL query and applies a novel method of data factorization to reduce duplicated measurements in the knowledge graph. The performance of DESERT is empirically studied on a collection of continuous SPARQL queries from SRBench, a benchmark of IoT stream data and continuous SPARQL queries. Furthermore, data streams with various combinations of uniform and varying data stream speeds and streaming window size dimensions are considered in the study. Experimental results suggest that DESERT is capable of speeding up continuous query processing while creating knowledge graphs that include no replications.
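The idea of factorizing duplicated measurements can be illustrated with a minimal Python sketch. It is not DESERT's actual algorithm, which the abstract does not detail; it only assumes that each observation is a hypothetical tuple (observation id, property, value, unit) and that observations sharing the same measurement can point to a single shared description instead of repeating it.

```python
from collections import defaultdict

def factorize(observations):
    """Group observation ids by their (property, value, unit) measurement.

    Each distinct measurement is stored once; the observations that share it
    are kept as a list of ids. The tuple layout is an illustrative assumption.
    """
    groups = defaultdict(list)
    for obs_id, prop, value, unit in observations:
        groups[(prop, value, unit)].append(obs_id)
    return dict(groups)

# Hypothetical sensor readings with repeated measurements.
observations = [
    ("obs1", "temperature", 21.5, "C"),
    ("obs2", "temperature", 21.5, "C"),
    ("obs3", "temperature", 22.0, "C"),
    ("obs4", "humidity", 40.0, "%"),
    ("obs5", "temperature", 21.5, "C"),
]

factorized = factorize(observations)
for measurement, obs_ids in factorized.items():
    print(measurement, obs_ids)
```

In this toy input, five observations reduce to three distinct measurement descriptions, which is the kind of saving that factorization aims for when sensors report the same value repeatedly.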

Author(s):  
Hyunjoong Kang ◽  
Sanghyun Hong ◽  
Kookjin Lee ◽  
Noseong Park ◽  
Soonhyun Kwon

Author(s):  
K. V. Metre

In recent years, many data-intensive and location-based applications have emerged that need to process stream data, in areas such as network monitoring, telecommunications data management, and sensor networks. Unlike regular queries, a continuous query exists for a certain period of time and needs to be continuously processed during this time. The data processing algorithms used in traditional database systems are not suited to tackling complex and varied continuous queries over dynamic streaming data. Indexing finite queries is preferred to indexing infinite data, since it avoids expensive index maintenance operations. Previous related work focused on moving queries over static objects or static queries over moving objects. Nowadays, however, queries as well as objects are dynamic. Hybrid indexing of queries therefore significantly reduces space costs and scales well with increasing data. To deal with the speed of unbounded data, it is necessary to use data parallelism in query processing, which offers better performance, availability, and scalability.


Semantic Web ◽  
2021 ◽  
pp. 1-26
Author(s):  
Umair Qudus ◽  
Muhammad Saleem ◽  
Axel-Cyrille Ngonga Ngomo ◽  
Young-Koo Lee

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful for the development of future cost-based federated SPARQL query processing engines.
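To make the notion of cardinality estimation error concrete, the sketch below computes the q-error, a measure commonly used in the database literature. This is an illustrative assumption: the abstract does not define its novel metrics, and the q-error is not claimed to be one of them. The function name and the clamping of values below 1 are choices made for this example.

```python
def q_error(estimated, actual):
    """Return the q-error: the factor by which an estimate is off.

    Both values are clamped to at least 1 to avoid division by zero,
    which is a convention assumed here for illustration.
    """
    est = max(estimated, 1)
    act = max(actual, 1)
    return max(est / act, act / est)

# Hypothetical (estimated, actual) cardinalities for three joins in a plan.
pairs = [(100, 100), (10, 1000), (1000, 10)]
errors = [q_error(est, act) for est, act in pairs]
print(errors)
```

A q-error of 1.0 means the estimate was exact; underestimation and overestimation by the same factor yield the same value, so the measure is symmetric.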


2021 ◽  
Author(s):  
Hamed Hasibi ◽  
Saeed Sedighian Kashi

Fog computing brings cloud capabilities closer to Internet of Things (IoT) devices. IoT devices generate a tremendous amount of stream data, which travels toward the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without the fog layer, stream query processing executes in the cloud, which forwards a large volume of traffic toward the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries that run on fog nodes by using distributed stream processing. In this paper, we propose an approach to assign stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. First, in the emulation step, we create Docker container instances from SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then, in the placement step, we assign queries among fog nodes by using a genetic algorithm. The practical approach used in SQPF achieves a near-the-best assignment based on the lowest application deadline in real scenarios, and the evaluation results support this.
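The placement step can be pictured with a small genetic algorithm sketch in Python. It is not the SQPF implementation: the cost model (the delay of the most loaded node), the parameter values, and all names such as `place_queries` and `node_delay_per_unit` are assumptions made for illustration only.

```python
import random

def end_to_end_delay(assignment, query_load, node_delay_per_unit):
    """Toy cost: the largest per-node delay (total load times per-unit delay)."""
    load = [0.0] * len(node_delay_per_unit)
    for query, node in enumerate(assignment):
        load[node] += query_load[query]
    return max(l * d for l, d in zip(load, node_delay_per_unit))

def place_queries(query_load, node_delay_per_unit,
                  generations=50, pop_size=20, seed=0):
    """Search for a query-to-node assignment with low toy delay."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    n_queries, n_nodes = len(query_load), len(node_delay_per_unit)

    def cost(assignment):
        return end_to_end_delay(assignment, query_load, node_delay_per_unit)

    # Each individual maps query index -> fog node index.
    population = [[rng.randrange(n_nodes) for _ in range(n_queries)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_queries)    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # occasional mutation
                child[rng.randrange(n_queries)] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=cost)

# Hypothetical workloads and node speeds.
query_load = [4.0, 2.0, 3.0, 1.0, 5.0, 2.0]
node_delay_per_unit = [1.0, 1.5, 2.0]
best = place_queries(query_load, node_delay_per_unit)
print(best, end_to_end_delay(best, query_load, node_delay_per_unit))
```

In a setting like the one the abstract describes, the toy cost function would presumably be replaced by the delay and throughput figures measured in the emulation step.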

