Nested Optional Join for Efficient Evaluation of SPARQL Nested Optional Graph Patterns

Author(s):  
Artem Chebotko ◽  
Shiyong Lu

Relational technology has been shown to be very useful for scalable Semantic Web data management. Numerous researchers have proposed using RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. This chapter studies how RDF queries with so-called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. The authors propose to extend relational algebra with a novel relational operator, nested optional join (NOJ), that is more efficient than left outer join in processing nested optional patterns of well-designed graph patterns. They design three efficient algorithms to implement the new operator in relational databases: (1) nested-loops NOJ algorithm, NL-NOJ, (2) sort-merge NOJ algorithm, SM-NOJ, and (3) simple hash NOJ algorithm, SH-NOJ. Using a real-life RDF dataset, the authors demonstrate the efficiency of their algorithms by comparing them with the corresponding left outer join implementations and explore the effect of join selectivity on the performance of these algorithms.
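For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal nested-loops left outer join over variable bindings, which is how a single SPARQL OPTIONAL pattern is commonly evaluated relationally. It is not the NOJ operator itself, whose definition the abstract does not give. The function name, the dict-based row representation, and the sample data are assumptions made for illustration, and the join variables are assumed to be bound (non-NULL) in every left row.

```python
def nested_loops_left_outer_join(left, right, join_vars):
    """Baseline left outer join; rows are dicts mapping variable -> value."""
    # Variables that only the right (optional) side can bind.
    right_only = sorted({v for row in right for v in row} - set(join_vars))
    result = []
    for l in left:
        matched = False
        for r in right:
            # Assumption: join variables are never None on the left side.
            if all(l[v] == r[v] for v in join_vars):
                result.append({**l, **{v: r.get(v) for v in right_only}})
                matched = True
        if not matched:
            # No optional match: keep the left row, pad with NULLs (None).
            result.append({**l, **{v: None for v in right_only}})
    return result


# Hypothetical bindings for: ?p a Person . OPTIONAL { ?p email ?email }
people = [{"p": "alice"}, {"p": "bob"}]
emails = [{"p": "alice", "email": "a@example.org"}]
rows = nested_loops_left_outer_join(people, emails, ["p"])
```

Here `rows` keeps `bob` with an unbound `email`, which is the behaviour an OPTIONAL pattern requires.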

Author(s):  
Z.M. Ma ◽  
Yanhui Lv ◽  
Li Yan

Ontology is an important part of the W3C standards for the Semantic Web, used to specify standard conceptual vocabularies for exchanging data among systems, provide reusable knowledge bases, and facilitate interoperability across multiple heterogeneous systems and databases. However, current ontologies are not sufficient for handling the vague information that is commonly found in many application domains. A feasible solution is to extend classical ontologies with fuzzy capabilities. In this article, we propose a framework for generating fuzzy ontologies from fuzzy relational databases, in which the fuzzy ontology consists of a fuzzy ontology structure and instances. We consider the schema and the instances of the fuzzy relational database together, and transform them into a fuzzy ontology structure and a fuzzy RDF data model, respectively. This preserves the integrity of the original structure as well as the completeness and consistency of the original instances in the fuzzy relational database.


Author(s):  
Trupti Padiya ◽  
Minal Bhise ◽  
Sanjay Chaudhary

A Semantic Web database is an RDF database, and with the increased use of the Semantic Web in real-life applications, RDF databases are growing heavily. Given this tremendous increase in RDF data, performance and scalability are the main concerns. This chapter discusses improving and scaling up query performance for the ever-growing Semantic Web. It discusses current Semantic Web data storage techniques, which have been found to scale poorly and to have poor query performance. It discusses two partitioning techniques, vertical and horizontal partitioning, for improving query performance. To further improve query performance, various compression techniques can be used along with these partitioning techniques. Relational data offers faster execution of queries as compared to RDF data. To demonstrate these ideas, semantic data is converted to relational data and then query performance improvement techniques are applied. The scaling up of Semantic Web data is also discussed.
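As a rough illustration of vertical partitioning, the sketch below splits a triple list into one two-column (subject, object) table per predicate. This assumes the common reading of vertical partitioning for RDF stores; the abstract does not spell out the chapter's exact scheme, and the function name and sample triples are hypothetical.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Group (subject, predicate, object) triples into per-predicate tables."""
    tables = defaultdict(list)
    for s, p, o in triples:
        # Each predicate gets its own table holding (subject, object) rows.
        tables[p].append((s, o))
    return dict(tables)


# Hypothetical sample data.
triples = [
    ("alice", "type", "Person"),
    ("alice", "knows", "bob"),
    ("bob", "type", "Person"),
]
tables = vertically_partition(triples)
```

A query that touches only one predicate, such as `knows`, then scans `tables["knows"]` instead of the whole triple table.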


2018 ◽  
Vol 8 (1) ◽  
pp. 18-37 ◽  
Author(s):  
Median Hilal ◽  
Christoph G. Schuetz ◽  
Michael Schrefl

The foundations for traditional data analysis are Online Analytical Processing (OLAP) systems that operate on multidimensional (MD) data. The Resource Description Framework (RDF) serves as the foundation for the publication of a growing amount of semantic web data still largely untapped by companies for data analysis. Most RDF data sources, however, do not correspond to the MD modeling paradigm and, as a consequence, elude traditional OLAP. The complexity of RDF data in terms of structure, semantics, and query languages renders RDF data analysis challenging for a typical analyst not familiar with the underlying data model or the SPARQL query language. Hence, conducting RDF data analysis is not a straightforward task. We propose an approach for the definition of superimposed MD schemas over arbitrary RDF datasets and show how to represent the superimposed MD schemas using well-known semantic web technologies. On top of that, we introduce OLAP patterns for RDF data analysis, which are recurring, domain-independent elements of data analysis. Analysts may compose queries by instantiating a pattern using only the MD concepts and business terms. Upon pattern instantiation, the corresponding SPARQL query over the source data can be automatically generated, sparing analysts from technical details and fostering self-service capabilities.


2018 ◽  
Vol 7 (1) ◽  
pp. 35-46
Author(s):  
Krzysztof Myszkorowski

Recursive relationships are used for modelling real-life problems, for example a relationship describing formal dependencies between employees of an enterprise, where the creation of work groups and teams requires analysis of many elements. In conventional database systems, the precision of data is assumed. If our knowledge of the fragment of reality to be modelled is imperfect, one should apply tools for describing uncertain or imprecise information. One of them is fuzzy set theory. The paper deals with recursive relationships in fuzzy databases. The analysis is performed with the use of the theory of interval-valued fuzzy sets. A definition of a fuzzy interval recursive relationship is presented. The paper defines different connections of entities which participate in such relationships. Operations of the extended relational algebra are also discussed.


Author(s):  
DONG-HYUK IM ◽  
SANG-WON LEE ◽  
HYOUNG-JOO KIM

RDF is widely used as an ontology language for representing metadata in the Semantic Web, knowledge management systems, and e-commerce. Since ontologies model the knowledge in a particular domain, they may change over time. Furthermore, ontologies are usually developed and controlled in a distributed and collaborative way. Thus, it is very important to be able to manage multiple versions of RDF data. Earlier studies on RDF versions have focused on providing access to different versions (i.e. snapshots) and computing the differences between two versions. However, the existing approaches suffer from space overhead for large-scale data, since all snapshots must be kept redundantly in a repository. Moreover, it is very time consuming to compute the delta between two specific versions, an operation that is very common in RDF applications. In this paper, we propose a framework for RDF version management in relational databases. It stores the original version and the deltas between consecutive versions, thereby reducing the space requirement considerably. Another benefit of our approach is that it is well suited to change queries. On the flip side, in order to answer a query on a specific logical version, that version must be constructed on the fly by applying the deltas between the original version and the logical version. This can slow down query performance. To overcome this, we propose a compression technique for deltas, called Aggregated Delta, that creates a logical version directly rather than executing the sequence of deltas. An experimental study with real-life RDF data sets shows that our framework maintains multiple versions efficiently.
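The trade-off described here can be sketched with plain sets of triples. The code below stores a base version plus per-step deltas, rebuilds a version by replaying the deltas, and composes consecutive deltas into a single one. The composition rule is only one plausible reading of an aggregated delta; the abstract does not define the paper's Aggregated Delta precisely, and all names and data below are assumptions.

```python
from functools import reduce


def apply_delta(version, delta):
    """Apply one delta (added, deleted) to a set of triples."""
    added, deleted = delta
    return (version - deleted) | added


def aggregate(d1, d2):
    """Compose two consecutive deltas into one equivalent delta (assumed rule)."""
    a1, r1 = d1
    a2, r2 = d2
    # Later deletions cancel earlier additions, and vice versa.
    return ((a1 - r2) | a2, (r1 - a2) | r2)


def build_version(base, deltas):
    """Rebuild a logical version by replaying every delta in order."""
    return reduce(apply_delta, deltas, base)


# Hypothetical base version and two consecutive deltas.
base = {("a", "type", "Person"), ("a", "name", "Ann")}
d1 = ({("b", "type", "Person")}, {("a", "name", "Ann")})
d2 = ({("a", "name", "Anna")}, {("b", "type", "Person")})

replayed = build_version(base, [d1, d2])
direct = apply_delta(base, reduce(aggregate, [d1, d2]))
```

Both `replayed` and `direct` yield the same version, but `direct` applies one precomputed delta instead of walking the whole sequence.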


1998 ◽  
Vol 09 (04) ◽  
pp. 399-429
Author(s):  
KIM S. LARSEN

A relation of degree k can be sorted lexicographically in k! different ways, i.e., according to any one of the possible permutations of the schema of the relation. Such permutations are referred to as sort orders. When evaluating unary and binary relational algebra operators using sort-merge algorithms, sort orders fulfilling the constraints enforced by the operators are chosen for the operand relations. The relations are then sorted according to their assigned sort orders, and the result is obtained by merging. Should the operands already be sorted according to one of the permissible sort orders, then only a merging is required. The sort order of the result will depend on the sort orders of the operands. When evaluating whole relational algebra expressions, the result of one operation will be used as an operand to the next. It is desirable to choose sort orders in such a way that the result of one operation will automatically fulfill the requirements of the next. In general, one would like to find a minimal number of operators in the expression for which this cannot be obtained, bearing in mind the overall goal of minimizing the total work. We show that this problem is NP-hard, and that the corresponding decision problem is NP-complete. However, most simplifications of the original problem give rise to efficient algorithms. In fact, most frequently occurring queries can be analyzed in linear time in the size of the query. This is due to the fact that only a very limited number of subsets of all permutations of schemas can be encountered in the algorithms, which means that compact representations for these subsets can be found.
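To make the notion of sort orders concrete, the following sketch enumerates the k! permutations of a schema, sorts a relation lexicographically by one of them, and evaluates a binary operator (set intersection) by merging two operands sorted in the same order. It illustrates the setting only, not the paper's optimization algorithm. The function names and sample relations are hypothetical, and relations are assumed to be duplicate-free lists of tuples.

```python
from itertools import permutations
from math import factorial


def sort_orders(schema):
    """All possible sort orders: one per permutation of the schema."""
    return list(permutations(schema))


def sort_relation(rows, schema, order):
    """Sort tuples lexicographically according to the given sort order."""
    idx = [schema.index(attr) for attr in order]
    return sorted(rows, key=lambda row: tuple(row[i] for i in idx))


def merge_intersection(left, right, schema, order):
    """Intersect two relations that are already sorted by the same order."""
    idx = [schema.index(attr) for attr in order]
    key = lambda row: tuple(row[i] for i in idx)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl == kr:
            out.append(left[i])
            i += 1
            j += 1
        elif kl < kr:
            i += 1
        else:
            j += 1
    return out


schema = ("a", "b", "c")
order = ("b", "a", "c")
r = sort_relation([(2, 1, 0), (1, 2, 0), (1, 1, 5)], schema, order)
s = sort_relation([(1, 2, 0), (9, 9, 9), (1, 1, 5)], schema, order)
common = merge_intersection(r, s, schema, order)
```

The output of `merge_intersection` is itself sorted by `order`, so a following operator that accepts this sort order needs only a merge and no re-sort.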


Author(s):  
D. Ulutaş Karakol ◽  
G. Kara ◽  
C. Yılmaz ◽  
Ç. Cömert

Large amounts of spatial data are held in relational databases. Spatial data in relational databases must be converted to RDF for semantic web applications. Spatial data is an important key factor for creating spatial RDF data. Linked Data is the way most preferred by users to publish and share data from relational databases on the Web. In order to define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking data of a resource vocabulary with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, provides automatic reasoning and prevents data duplication. The need to convert relational data to RDF is emerging due to the semantic expressiveness of Semantic Web technologies. One of the important key factors of the Semantic Web is ontologies. Ontology means “explicit specification of a conceptualization”. The semantics of spatial data relies on ontologies. Linking spatial data from relational databases to web data sources, in order to share machine-readable interlinked data on the Web, is not an easy task. Tim Berners-Lee, the inventor of the World Wide Web and the advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these rules, firstly, spatial data in relational databases must be converted to RDF with the use of supporting tools. Secondly, spatial RDF data must be linked to upper-level domain ontologies and related web data sources. Thirdly, external data sources (ontologies and web data sources) must be determined and spatial RDF data must be linked to the related data sources. Finally, spatial linked data must be published on the web.
The main contribution of this study is to determine the requirements for finding RDF links and to put forward the deficiencies in creating or publishing linked spatial data. To achieve this objective, this study researches existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we have investigated the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and the web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The implementation of linking spatial RDF data to web data sources is demonstrated with an example use case. Road data has been linked to one of the related popular web data sources, DBpedia. SILK, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework. We also evaluated link discovery tools, e.g. LIMES and Silk, and compared the results for the matching/linking task. As a result, linked road data is shared and represented as an information resource on the web and enriched with definitions of different related resources. In this way, road datasets are also linked by the related classes, individuals, spatial relations and properties they cover, such as construction date, road length, coordinates, etc.


Author(s):  
Stefan Esser ◽  
Dirk Fahland

Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as “directly/eventually-follows,” it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide 5 converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple inter-related entities using off-the-shelf technology.
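The idea of keeping temporal relations per entity can be sketched in a few lines. The code below derives "directly-follows" edges separately for each entity an event is correlated to, so one event can sit on several entity-specific paths. This is a simplified in-memory illustration under assumed field names (`id`, `time`, `entities`), not the paper's graph schema or its Cypher queries.

```python
from collections import defaultdict


def directly_follows(events):
    """Return (earlier_event, later_event, entity) edges, one path per entity."""
    by_entity = defaultdict(list)
    for ev in events:
        for entity in ev["entities"]:
            by_entity[entity].append(ev)
    edges = []
    for entity, evs in by_entity.items():
        # Order each entity's events by time, then link neighbours.
        evs.sort(key=lambda e: e["time"])
        for a, b in zip(evs, evs[1:]):
            edges.append((a["id"], b["id"], entity))
    return edges


# Hypothetical events; e2 is correlated to two entities at once.
events = [
    {"id": "e1", "time": 1, "entities": ["order1"]},
    {"id": "e2", "time": 2, "entities": ["order1", "item1"]},
    {"id": "e3", "time": 3, "entities": ["item1"]},
    {"id": "e4", "time": 4, "entities": ["order1"]},
]
edges = directly_follows(events)
```

In a single sequential log, `e3` would sit between `e2` and `e4`; here `order1` and `item1` each keep their own path through the shared event `e2`.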


Author(s):  
Etienne Toussaint ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.

