Multi-Dimensional Event Data in Graph Databases

Author(s):  
Stefan Esser ◽  
Dirk Fahland

Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as “directly/eventually-follows,” it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide 5 converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple inter-related entities using off-the-shelf technology.
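The idea of a "directly-follows" relation per entity can be illustrated with a small sketch. This is plain Python rather than the Cypher queries the abstract refers to, and it is not the authors' data model: the event tuples, the entity types `Order` and `Item`, and the function names `directly_follows` and `aggregate` are all hypothetical, chosen only to show how one event can sit on the paths of several entities at once.

```python
from collections import defaultdict

# Hypothetical toy events: (event_id, activity, timestamp, {entity_type: entity_id}).
# An event may be correlated to more than one entity.
EVENTS = [
    ("e1", "Create Order", 1, {"Order": "o1"}),
    ("e2", "Pick Item", 2, {"Order": "o1", "Item": "i1"}),
    ("e3", "Pick Item", 3, {"Order": "o1", "Item": "i2"}),
    ("e4", "Ship Item", 4, {"Item": "i1"}),
    ("e5", "Ship Item", 5, {"Item": "i2"}),
]


def directly_follows(events):
    """Return edges (entity_type, entity_id, from_event, to_event).

    For each entity, its events are sorted by timestamp and every pair of
    consecutive events is linked by one directly-follows edge.
    """
    by_entity = defaultdict(list)
    for event_id, _activity, timestamp, entities in events:
        for entity_type, entity_id in entities.items():
            by_entity[(entity_type, entity_id)].append((timestamp, event_id))

    edges = set()
    for (entity_type, entity_id), entity_events in by_entity.items():
        entity_events.sort()
        for (_, source), (_, target) in zip(entity_events, entity_events[1:]):
            edges.add((entity_type, entity_id, source, target))
    return edges


def aggregate(events, edges, entity_type):
    """Count activity-level directly-follows pairs for one entity type."""
    activity_of = {event_id: activity for event_id, activity, _, _ in events}
    counts = defaultdict(int)
    for edge_type, _entity_id, source, target in edges:
        if edge_type == entity_type:
            counts[(activity_of[source], activity_of[target])] += 1
    return dict(counts)


edges = directly_follows(EVENTS)
item_view = aggregate(EVENTS, edges, "Item")
order_view = aggregate(EVENTS, edges, "Order")
```

In this toy data, event `e2` lies both on the path of order `o1` and on the path of item `i1`, which a single sequential log cannot express; aggregating per entity type yields one activity-level view for orders and a different one for items.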

Author(s):  
Etienne Toussaint ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.


2011 ◽  
Vol 10 (02) ◽  
pp. 193-208 ◽  
Author(s):  
Georgios John Fakas ◽  
Ben Cawley ◽  
Zhi Cai

This paper presents a novel approach for extracting personal data and automatically generating Personal Data Reports (PDRs) from relational databases. Such PDRs can be used, among other purposes, for compliance with Subject Access Requests of Data Protection Acts. Two methodologies with different usability characteristics are introduced: (1) the GDSBased Method and (2) the By Schema Browsing Method. The proposed methodologies combine the use of graphs and query languages for the construction of PDRs. The novelty of these methodologies is that they do not require users to have any prior knowledge of either the database schema or any query language. An optimisation algorithm is proposed that employs Hash Tables and reuses already found data. We conducted several queries on two standard benchmark databases (i.e. TPC-H and Microsoft Northwind) and we present the performance results.


Author(s):  
Artem Chebotko ◽  
Shiyong Lu

Relational technology has been shown to be very useful for scalable Semantic Web data management. Numerous researchers have proposed to use RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. This chapter studies how RDF queries with the so-called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. The authors propose to extend relational algebra with a novel relational operator, nested optional join (NOJ), that is more efficient than left outer join in processing nested optional patterns of well-designed graph patterns. They design three efficient algorithms to implement the new operator in relational databases: (1) nested-loops NOJ algorithm, NL-NOJ, (2) sort-merge NOJ algorithm, SM-NOJ, and (3) simple hash NOJ algorithm, SH-NOJ. Using a real-life RDF dataset, the authors demonstrate the efficiency of their algorithms by comparing them with the corresponding left outer join implementations and explore the effect of join selectivity on the performance of these algorithms.


Author(s):  
Kornelije Rabuzin

In the past few years, many NoSQL databases have emerged, including graph databases. NoSQL databases have certain advantages and they can be used in certain domains as an alternative to relational databases. In order to use graph databases, one needs to be familiar with specific languages like Cypher Query Language (CQL) or Gremlin. However, some statements in CQL can be considered too complex for end users, as is shown later on. Because of that, the main idea of this chapter is to explore two other languages for graph databases. One of them is new and is used to pose queries visually. Since CQL does not support recursion, views, etc., the other language is used to show how to use recursion and views on a graph database.


Author(s):  
Sapiahon Khaidarova

The article outlines methods for creating SQL queries in relational databases. The use of the structured query language SQL in relational databases is substantiated. The article provides information about the SQL standard and the three-tier database organization system. The author describes the choice of a data model at the conceptual level, using the Kokand Pedagogical Institute as an example of a relational database model. A relational conceptual diagram of the information model of a pedagogical institute is compiled. This conceptual diagram is depicted using a cluster. Objects of the subject area are depicted in the form of tables, which differ from each other in geometric shape or color. The relationships between tables in Microsoft Access are presented. The basic rules for creating and filling tables in SQL using the CREATE TABLE and INSERT INTO statements are considered. The syntax of the SELECT statement is given. All clauses of the SELECT statement and their order are listed. Examples are given of compiling simple queries and subqueries in SQL using the SELECT statement for the database of the Kokand Pedagogical Institute. Information about the order of execution of inner and outer queries is given. The article considers the ORDER BY clause of a SELECT statement for sorting query results.
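The statements named in this abstract can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module with an in-memory SQLite database instead of Microsoft Access, and the `faculty` and `student` tables, their columns, and the sample rows are hypothetical; they are not taken from the article's Kokand Pedagogical Institute schema.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for Microsoft Access.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: two hypothetical tables linked by a foreign key.
cur.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    """
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        faculty_id INTEGER REFERENCES faculty(id)
    )
    """
)

# INSERT INTO: fill both tables with made-up rows.
cur.executemany(
    "INSERT INTO faculty (id, name) VALUES (?, ?)",
    [(1, "Physics"), (2, "History")],
)
cur.executemany(
    "INSERT INTO student (id, name, faculty_id) VALUES (?, ?, ?)",
    [(1, "Zarina", 1), (2, "Anvar", 2), (3, "Bobur", 1)],
)

# SELECT with a subquery and ORDER BY. Conceptually, the inner query finds
# the id of the Physics faculty, the outer query then selects the matching
# students, and ORDER BY sorts the final result by name.
rows = cur.execute(
    """
    SELECT name
    FROM student
    WHERE faculty_id IN (SELECT id FROM faculty WHERE name = 'Physics')
    ORDER BY name
    """
).fetchall()

conn.close()
```

Because the inner query here does not refer to the outer one, it can be evaluated once before the outer query filters the `student` rows; a correlated subquery would instead be evaluated for each outer row.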


Author(s):  
Kornelije Rabuzin

In the past few years, many NoSQL databases have emerged, including graph databases. NoSQL databases have certain advantages and they can be used in certain domains as an alternative to relational databases. In order to use graph databases, one needs to be familiar with specific languages like Cypher Query Language (CQL) or Gremlin. However, some statements in CQL can be considered too complex for end users, as is shown later on. Because of that, the main idea of this paper is to explore two other languages for graph databases. One of them is new and is used to pose queries visually. Since CQL does not support recursion, views, etc., the other language is used to show how to use recursion and views on a graph database.


2004 ◽  
Vol 3 (4) ◽  
pp. 241-252 ◽  
Author(s):  
Michael Rice ◽  
William Gladstone ◽  
Michael Weir

We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.


Author(s):  
Renu Chaudhary ◽  
Gagangeet Singh

NoSQL databases (commonly interpreted by developers as "not only SQL" databases rather than "no SQL") are an emerging alternative to the most widely used relational databases. As the name suggests, NoSQL does not completely replace SQL but complements it in such a way that the two can co-exist. In this paper we discuss the NoSQL data model, types of NoSQL data stores, characteristics and features of each data store, query languages used in NoSQL, advantages and disadvantages of NoSQL over RDBMS, and the future prospects of NoSQL. Motivation/Background: NoSQL systems exhibit the ability to store and index arbitrarily big data sets while enabling a large amount of concurrent user requests. Method: Many people think NoSQL is a derogatory term created to poke at SQL. In reality, the term means Not Only SQL. The idea is that both technologies can coexist and each has its place. Results: Large-scale data processing (parallel processing over distributed systems); Embedded IR (basic machine-to-machine information look-up & retrieval); Exploratory analytics on semi-structured data (expert level); Large volume data storage (unstructured, semi-structured, small-packet structured). Conclusions: The motivation of this study is to provide an independent understanding of the strengths and weaknesses of various NoSQL database approaches to supporting applications that process huge volumes of data, as well as to provide a global overview of non-relational NoSQL databases.


Author(s):  
DONG-HYUK IM ◽  
SANG-WON LEE ◽  
HYOUNG-JOO KIM

RDF is widely used as an ontology language for representing metadata in the Semantic Web, knowledge management systems, and e-commerce. Since ontologies model the knowledge in a particular domain, they may change over time. Furthermore, ontologies are usually developed and controlled in a distributed and collaborative way. Thus, it is very important to be able to manage multiple versions of RDF data. Earlier studies on RDF versions have focused on providing access to different versions (i.e. snapshots) and computing the differences between two versions. However, the existing approaches suffer from space overhead for large-scale data, since all snapshots must be kept redundantly in a repository. Moreover, it is very time consuming to compute the delta between two specific versions, an operation that is very common in RDF applications. In this paper, we propose a framework for RDF version management in relational databases. It stores the original version and the deltas between consecutive versions, thereby reducing the space requirement considerably. Another benefit of our approach is that it is well suited to change queries. On the flip side, in order to answer a query on a specific logical version, that version must be constructed on the fly by applying the deltas between the original version and the logical version. This can slow down query performance. To overcome this, we propose a compression technique for deltas, called Aggregated Delta, to create a logical version directly rather than executing the sequence of deltas. An experimental study with real-life RDF data sets shows that our framework maintains multiple versions efficiently.
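The trade-off described above can be sketched as follows: storing only the original version plus per-step deltas saves space, but a logical version must then be rebuilt by replaying deltas, unless the deltas are first folded into a single one. The representation below (triples as Python tuples, a delta as a pair of added and removed sets) and the function names are assumptions made for illustration; in particular, `aggregate_deltas` is only one plausible reading of "Aggregated Delta" and is not the authors' relational implementation.

```python
def apply_delta(triples, delta):
    """Apply one delta, given as (added, removed) sets of triples."""
    added, removed = delta
    return (triples - removed) | added


def build_version(original, deltas, n):
    """Rebuild logical version n by replaying the first n deltas in order."""
    version = set(original)
    for delta in deltas[:n]:
        version = apply_delta(version, delta)
    return version


def aggregate_deltas(deltas):
    """Fold a sequence of deltas into a single (added, removed) pair.

    A triple that is added and later removed (or removed and later added)
    is cancelled out, so only the net change survives.
    """
    agg_added, agg_removed = set(), set()
    for added, removed in deltas:
        agg_added = (agg_added - removed) | added
        agg_removed = (agg_removed | removed) - added
    return agg_added, agg_removed


# Hypothetical RDF-like triples: (subject, predicate, object).
original = {("a", "type", "Person"), ("a", "name", "Ann")}
deltas = [
    ({("a", "age", "30")}, {("a", "name", "Ann")}),   # version 0 -> 1
    ({("a", "name", "Anna")}, {("a", "age", "30")}),  # version 1 -> 2
]

replayed = build_version(original, deltas, 2)
direct = apply_delta(set(original), aggregate_deltas(deltas))
```

Here the `("a", "age", "30")` triple is added in the first delta and removed in the second, so it drops out of the aggregated delta's added set and the target version is produced in a single step.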


2021 ◽  
Vol 1 (2) ◽  
pp. 17-20
Author(s):  
Renas Rajab Asaad ◽  
Revink Masoud Abdulhakim

This paper reviews the concept of data mining, the need for it, its objectives, and its uses in various fields. It explains data mining procedures and tools, the type of data that is mined, and the structure of that data, while giving a simplified account of databases, relational databases, and the query language. It also explains the benefits and uses of mining data stored in specialized databases in various vital areas of society. Data mining is the process of analyzing data from different perspectives and discovering imbalances, patterns, and correlations in data sets that are insightful and useful for predicting results and that help in making good decisions. To return to the mining example: when you plan to prospect for gold or any other valuable mineral, you first have to determine where you think the gold is before you start digging. Data mining follows the same concept. To mine data, you must first collect it from various sources, prepare it, and store it in one place, since data mining itself does not cover the process of searching for the data. Currently, companies store this data in what is called a data warehouse, which is discussed in detail at a later stage.

