C-SPARQL: A CONTINUOUS QUERY LANGUAGE FOR RDF DATA STREAMS

This article defines C-SPARQL, an extension of SPARQL whose distinguishing feature is the support of continuous queries, i.e. queries registered over RDF data streams and then continuously executed. Queries consider windows, i.e. the most recent triples of such streams, observed while data is continuously flowing. Supporting streams in RDF format guarantees interoperability and opens up important applications in which reasoners can deal with knowledge that evolves over time. C-SPARQL is presented by means of a full specification of the syntax, a formal semantics, and a comprehensive set of examples, drawn from urban computing applications, that systematically cover the SPARQL extensions. The expression of meaningful queries over streaming data is strictly connected to the availability of aggregation primitives; thus, C-SPARQL also includes extensions in this respect.
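
The window-plus-aggregation idea in the abstract above can be illustrated with a minimal Python sketch. It is not C-SPARQL syntax and not the authors' semantics: the names `TimedTriple` and `SlidingWindow`, the time-based window, and the per-object count are all assumptions chosen for illustration, and triples are assumed to arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, NamedTuple


class TimedTriple(NamedTuple):
    """An RDF-style triple annotated with an arrival timestamp in seconds."""
    subject: str
    predicate: str
    obj: str
    timestamp: int


class SlidingWindow:
    """Keeps only the triples that arrived within the last `range_s` seconds."""

    def __init__(self, range_s: int) -> None:
        self.range_s = range_s
        self.triples: Deque[TimedTriple] = deque()

    def push(self, triple: TimedTriple) -> None:
        # Assumption: triples arrive in non-decreasing timestamp order.
        self.triples.append(triple)

    def count_by_object(self, now: int, predicate: str) -> Counter:
        """Re-evaluate the registered aggregate over the window ending at `now`."""
        # Evict triples that have fallen out of the window (now - range_s, now].
        while self.triples and self.triples[0].timestamp <= now - self.range_s:
            self.triples.popleft()
        return Counter(t.obj for t in self.triples if t.predicate == predicate)


if __name__ == "__main__":
    window = SlidingWindow(range_s=10)
    window.push(TimedTriple("car1", "enters", "districtA", 0))
    window.push(TimedTriple("car2", "enters", "districtA", 5))
    print(window.count_by_object(now=9, predicate="enters"))
```

Calling `count_by_object` repeatedly as time advances mimics the continuous execution of a registered query: old triples are evicted, and the aggregate reflects only the most recent portion of the stream.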

Query Rewriting for Incremental Continuous Query Evaluation in HIFUN

Algorithms ◽

10.3390/a14050149 ◽

2021 ◽

Vol 14 (5) ◽

pp. 149

Author(s):

Petros Zervoudakis ◽

Haridimos Kondylakis ◽

Nicolas Spyratos ◽

Dimitris Plexousakis

Keyword(s):

Query Optimization ◽

Query Language ◽

Computational Cost ◽

Continuous Queries ◽

Continuous Query ◽

Query Rewriting ◽

Query Evaluation ◽

Clear Separation ◽

Complete Dataset ◽

High Level

HIFUN is a high-level query language for expressing analytic queries of big datasets, offering a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of data, and the physical layer, where queries are evaluated. In this paper, we present a methodology based on the HIFUN language, and the corresponding algorithms for the incremental evaluation of continuous queries. In essence, our approach is able to process the most recent data batch by exploiting already computed information, without requiring the evaluation of the query over the complete dataset. We present the generic algorithm, which we translated to both SQL and MapReduce using SPARK; it implements various query rewriting methods. We demonstrate the effectiveness of our approach in terms of query answering efficiency. Finally, we show that by exploiting the formal query rewriting methods of HIFUN, we can further reduce the computational cost, adding another layer of query optimization to our implementation.
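
The principle of processing only the most recent batch while reusing already computed results can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm or HIFUN syntax: the function names `evaluate_full` and `evaluate_incremental`, the `(key, value)` row shape, and the choice of a grouped sum are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float]  # (grouping key, measure value); hypothetical row shape


def evaluate_full(rows: Iterable[Row]) -> Dict[str, float]:
    """Evaluate a grouped sum over every row supplied."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def evaluate_incremental(previous: Dict[str, float],
                         batch: Iterable[Row]) -> Dict[str, float]:
    """Merge the answer for a new batch into the previously computed answer."""
    merged = dict(previous)
    # Only the new batch is scanned; earlier data is never read again.
    for key, value in evaluate_full(batch).items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


if __name__ == "__main__":
    batch1 = [("athens", 10.0), ("paris", 4.0)]
    batch2 = [("athens", 2.5), ("rome", 1.0)]
    answer = evaluate_incremental(evaluate_incremental({}, batch1), batch2)
    print(answer == evaluate_full(batch1 + batch2))
```

The merge step works directly for sums and counts because partial results can simply be added. An aggregate such as an average would need to carry a (sum, count) pair per group instead of the final value.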

TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams

Sensors ◽

10.3390/s20205829 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5829 ◽

Cited By ~ 1

Author(s):

Jen-Wei Huang ◽

Meng-Xun Zhong ◽

Bijay Prasad Jaysawal

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

State Of The Art ◽

Streaming Data ◽

Current State ◽

Data Points ◽

Local Outlier ◽

Time Aware ◽

Over Time

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent advances in outlier detection based on density-based local outlier factor (LOF) algorithms do not consider variations in data that change over time. For example, a new cluster of data points may appear in the data stream over time. Therefore, we present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF), to overcome this issue. In addition, we have developed a means for estimating the LOF score, termed "approximate LOF," based on historical information following the removal of outdated data. The results of experiments demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar performance in terms of execution time. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.

Technology of Continuous Query Optimization over Data Streams

2008 International Symposium on Information Science and Engineering ◽

10.1109/isise.2008.36 ◽

2008 ◽

Author(s):

Feng Weibing ◽

Li Zhanhuai

Keyword(s):

Query Optimization ◽

Data Streams ◽

Continuous Query

A Fitting Approach to Construct and Measurement Alignment

Organizational Research Methods ◽

10.1177/1094428117728372 ◽

2017 ◽

Vol 21 (3) ◽

pp. 592-632 ◽

Cited By ~ 19

Author(s):

Margaret M. Luciano ◽

John E. Mathieu ◽

Semin Park ◽

Scott I. Tannenbaum

Keyword(s):

Big Data ◽

Data Streams ◽

Iterative Process ◽

Emerging Technologies ◽

Measurement Techniques ◽

Great Promise ◽

Big Data Technologies ◽

Nearly Continuous ◽

Dynamic Phenomena ◽

Over Time

Many phenomena of interest to management and psychology scholars are dynamic and change over time. One of the primary impediments to the examination of dynamic phenomena has been challenges associated with collecting data at a sufficient frequency and duration to accurately model such changes. Emerging technologies that produce nearly continuous streams of big data offer great promise to address those challenges; however, they introduce new methodological challenges and construct validity concerns. We seek to integrate the emerging big data technologies into the existing repertoire of measurement techniques and advance an iterative process to enhance their measurement fit. First, we provide an overview of dynamic constructs and temporal frameworks, highlighting their measurement implications. Second, we discuss different data streams and feature emerging technologies that leverage big data as a means to index dynamic constructs. Third, we integrate the previous sections and advance an iterative approach to achieving measurement fit, highlighting factors that make some measurement choices more suitable and viable than others. In so doing, we hope to accelerate the advancement of dynamic theories and methods.

Composite Event Processing for Data Streams and Domain Knowledge

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.219-220.927 ◽

2011 ◽

Vol 219-220 ◽

pp. 927-931

Author(s):

Jun Qiang Liu ◽

Xiao Ling Guan

Keyword(s):

Query Optimization ◽

Data Streams ◽

Domain Knowledge ◽

Semantic Information ◽

Query Language ◽

Processing System ◽

Optimization Techniques ◽

Research Attention ◽

Composite Event ◽

Solid Foundation

In recent years, the processing of composite event queries over data streams has attracted a lot of research attention. Traditional database techniques were not designed for stream processing systems. Furthermore, example continuous queries are often formulated in a declarative query language without specifying the semantics. To overcome these deficiencies, this article presents the design, implementation, and evaluation of a system that processes data streams with semantic information. Then, a set of optimization techniques is proposed for query handling. Thus, our approach not only makes it possible to express queries with a sound semantics, but also provides a solid foundation for query optimization. Experimental results show that our approach is effective and efficient for data streams and domain knowledge.

A Query Language for Workflow Logs

ACM Transactions on Management Information Systems ◽

10.1145/3482968 ◽

2022 ◽

Vol 13 (2) ◽

pp. 1-28

Author(s):

Yan Tang ◽

Weilong Cui ◽

Jianwen Su

Keyword(s):

Business Process ◽

Evaluation Method ◽

Ad Hoc ◽

Query Language ◽

Cost Model ◽

Formal Semantics ◽

Control Flow ◽

Query Evaluation ◽

Evaluation Algorithm ◽

Laws And Policies

A business process (workflow) is an assembly of tasks to accomplish a business goal. Real-world workflow models are often required to change due to new laws and policies, changes in the environment, and so on. To understand the inner workings of a business process and so facilitate changes, workflow logs have the potential to enable inspecting, monitoring, diagnosing, analyzing, and improving the design of a complex workflow. Querying workflow logs, however, is still mostly an ad hoc practice by workflow managers. In this article, we focus on the problem of querying workflow logs concerning both control flow and dataflow properties. We develop a query language based on “incident patterns” to allow the user to directly query workflow logs instead of having to transform such queries into database operations. We provide the formal semantics and a query evaluation algorithm of our language. By deriving an accurate cost model, we develop an optimization mechanism to accelerate query evaluation. Our experimental results demonstrate the effectiveness of the optimization, which achieves up to 50× speedup over an adaptation of an existing evaluation method.

IDSM ChemWebRDF: SPARQLing small-molecule datasets

Journal of Cheminformatics ◽

10.1186/s13321-021-00515-1 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Jakub Galgonek ◽

Jiří Vondrášek

Keyword(s):

Data Storage ◽

Small Molecule ◽

Web Application ◽

Query Language ◽

Data Interoperability ◽

Sparql Endpoint ◽

Data Source ◽

Rdf Data ◽

Relational Form ◽

Federated Queries

The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
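
The general idea of translating graph patterns into SQL over relational storage can be shown with a toy Python sketch. It is not the IDSM engine: the single `triples(s, p, o)` table, the function name `bgp_to_sql`, the string encoding of terms, and the use of SQLite instead of PostgreSQL are all assumptions made to keep the example self-contained, and each pattern list is assumed to contain at least one variable.

```python
import sqlite3
from typing import List, Sequence, Tuple

Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def bgp_to_sql(patterns: Sequence[Pattern]) -> Tuple[str, List[str]]:
    """Translate triple patterns into one SELECT over a triples(s, p, o) table."""
    bound = {}               # variable name -> first column it was bound to
    where: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")        # constant: filter by value
                params.append(term)
            elif term in bound:
                where.append(f"{ref} = {bound[term]}")  # repeated variable: join
            else:
                bound[term] = ref
    columns = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in bound.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("aspirin", "type", "Compound"), ("aspirin", "mass", "180.16"),
        ("caffeine", "type", "Compound"), ("caffeine", "mass", "194.19"),
    ])
    sql, params = bgp_to_sql([("?c", "type", "Compound"), ("?c", "mass", "?m")])
    print(sql)
    print(sorted(db.execute(sql, params).fetchall()))
```

Each triple pattern becomes one alias of the table, constants become parameterised filters, and a variable shared between patterns becomes a join condition. A production engine would map predicates to dedicated tables and optimise the generated SQL, which this sketch does not attempt.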
