Riso-Tree: An Efficient and Scalable Index for Spatial Entities in Graph Database Management Systems

2021 ◽  
Vol 7 (3) ◽  
pp. 1-39
Author(s):  
Yuhan Sun ◽  
Mohamed Sarwat

With the ubiquity of spatial data, vertexes or edges in graphs can possess spatial location attributes side by side with other non-spatial attributes. For instance, as of June 2018, the Wikidata knowledge graph contains 48,547,142 data items (i.e., vertexes) and 13% of them have spatial location attributes. The article proposes Riso-Tree, a generic, efficient, and scalable indexing framework for spatial entities in graph database management systems. Riso-Tree enables the fast execution of graph queries that involve different types of spatial predicates (GraSp queries). The proposed framework augments the classic R-Tree structure with pre-materialized sub-graph entries, so that the pruning power of the R-Tree is enhanced with sub-graph information. Riso-Tree partitions the graph into sub-graphs based on their connectivity to the spatial sub-regions. The proposed index allows for the fast execution of GraSp queries by efficiently pruning the traversed vertexes/edges based upon the materialized sub-graph information. The experiments show that the proposed Riso-Tree achieves up to two orders of magnitude faster execution time than its counterparts when executing GraSp queries on real graphs (e.g., Wikidata). A strategy of limiting the size of each sub-graph entry (PN_max) is proposed to reduce the storage overhead of Riso-Tree. According to the experiments, this strategy can save up to around 70% of the storage without harming query performance. Another strategy, Irrelevant Vertexes Skipping, is proposed to ensure the performance of index maintenance. The experiments show that this strategy can improve performance, especially for slow updates. This shows that Riso-Tree is useful for applications that need to support frequent updates.
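As a rough illustration of the pruning idea described in this abstract, the following Python sketch attaches to each spatial sub-region the set of graph vertexes adjacent to the spatial vertexes it contains, and skips any region whose set does not contain the query's start vertex. It is not the Riso-Tree implementation: a flat list of rectangles stands in for the R-Tree, only one-hop neighbors are materialized, and every name (`Region`, `build_regions`, `query`) and all sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A rectangular sub-region with a materialized one-hop neighbor set."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    spatial_vertices: list = field(default_factory=list)  # ids located inside
    neighbors: set = field(default_factory=set)           # vertexes adjacent to them

    def intersects(self, window):
        qxmin, qymin, qxmax, qymax = window
        return not (self.xmax < qxmin or qxmax < self.xmin or
                    self.ymax < qymin or qymax < self.ymin)


def build_regions(regions, locations, edges):
    """Assign spatial vertexes to regions and materialize their neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    for vid, (x, y) in locations.items():
        for r in regions:
            if r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax:
                r.spatial_vertices.append(vid)
                r.neighbors |= adjacency.get(vid, set())
                break
    return adjacency


def query(regions, locations, adjacency, start, window):
    """Return spatial vertexes inside `window` that are adjacent to `start`,
    plus the number of regions that survived pruning."""
    qxmin, qymin, qxmax, qymax = window
    result, visited_regions = [], 0
    for r in regions:
        # Prune by geometry first, then by the materialized neighbor set.
        if not r.intersects(window) or start not in r.neighbors:
            continue
        visited_regions += 1
        for vid in r.spatial_vertices:
            x, y = locations[vid]
            if (qxmin <= x <= qxmax and qymin <= y <= qymax
                    and start in adjacency.get(vid, set())):
                result.append(vid)
    return sorted(result), visited_regions


# Hypothetical sample data: two regions, three spatial vertexes, two people.
regions = [Region(0, 0, 9, 9), Region(10, 0, 19, 9)]
locations = {"cafe": (2, 3), "museum": (12, 4), "park": (15, 5)}
edges = [("alice", "cafe"), ("bob", "museum"), ("alice", "park")]
adjacency = build_regions(regions, locations, edges)

print(query(regions, locations, adjacency, "alice", (0, 0, 19, 9)))
print(query(regions, locations, adjacency, "bob", (0, 0, 19, 9)))
```

In this toy data the query for `bob` inspects only one of the two regions, because the first region's neighbor set does not contain `bob`. Capping the size of each neighbor set, as the PN_max strategy does for sub-graph entries, would trade some of this pruning for lower storage.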

2019 ◽  
Vol 4 (4) ◽  
pp. 309-322 ◽  
Author(s):  
Yijian Cheng ◽  
Pengjie Ding ◽  
Tongtong Wang ◽  
Wei Lu ◽  
Xiaoyong Du

For decades, relational database management systems (RDBMSs) have been the first choice for managing data. Recently, due to the variety of big data, graph database management systems (GDBMSs) have emerged as an important complement to RDBMSs. As pointed out in the existing literature, both RDBMSs and GDBMSs are capable of managing graph data and relational data; however, the boundaries between them remain unclear. For this reason, in this paper, we first extend a unified benchmark for RDBMSs and GDBMSs over the same datasets using the same query workload under the same metrics. We then conduct extensive experiments to evaluate them and make the following findings: (1) RDBMSs outperform GDBMSs by a substantial margin under workloads that mainly consist of group by, sort, and aggregation operations, and their combinations; (2) GDBMSs show their superiority under workloads that mainly consist of multi-table join, pattern match, path identification, and their combinations.


This article is devoted to graph database management systems. The main characteristics and capabilities of those systems are examined. Problems that may occur during social network development were selected to be solved using a graph data model. Three of the most popular graph database management systems, namely Neo4J, OrientDB and ArangoDB, were chosen for the study. The following characteristics of the selected databases are considered: whether the software is proprietary or freely distributed, whether the documentation is up to date, whether the database is supported by its developers, whether there is a community where users can get answers to their questions, and how much time is needed to master the database. Typical social network queries, which must return results quickly at a large search depth, were developed using the query languages Cypher, OrientDB SQL and AQL used in Neo4J, OrientDB and ArangoDB respectively. Query execution speed was compared across the selected databases. For this purpose, a graph with 5000 nodes and 24900 connections was built using the Barabási-Albert model for generating random scale-free networks. Test tasks for finding the friends of three users to a depth of 5 were generated. The average time for each request was estimated over several executions. Conclusions are drawn and recommendations are made regarding the selection of the best graph database for social network implementation.
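The test setup described in this abstract (a Barabási-Albert graph and a friends search to depth 5) can be approximated in plain Python as a reference point. This is a minimal sketch under stated assumptions, not the study's code: the attachment parameter `m = 5`, the seeding scheme, the random seed, and the function names `barabasi_albert` and `friends_within` are all hypothetical. With these choices the graph has 24,975 edges, close to but not the same as the 24,900 connections reported, since the abstract does not state the exact generation parameters.

```python
import random
from collections import deque


def barabasi_albert(n, m, seed=0):
    """Build an undirected graph by preferential attachment.

    Vertex m is linked to vertexes 0..m-1; every later vertex is linked to
    m distinct existing vertexes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    repeated = []              # each vertex appears once per incident edge
    targets = list(range(m))
    for v in range(m, n):
        for t in targets:
            adj[v].add(t)
            adj[t].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # Pick m distinct targets for the next vertex, weighted by degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


def friends_within(adj, start, depth):
    """Return all vertexes reachable from `start` in at most `depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        v, d = queue.popleft()
        if d == depth:
            continue           # do not expand beyond the depth limit
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    seen.discard(start)
    return seen


graph = barabasi_albert(5000, 5)
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
print("edges:", edge_count)
for user in (0, 2500, 4999):   # three arbitrary users
    print(user, len(friends_within(graph, user, 5)))
```

An in-memory breadth-first search like this gives a baseline for result sizes only; it says nothing about the relative speed of Cypher, OrientDB SQL or AQL.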


2021 ◽  
Vol 14 (11) ◽  
pp. 2491-2504
Author(s):  
Pranjal Gupta ◽  
Amine Mhedhbi ◽  
Semih Salihoglu

We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Similar to column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads that, however, have fundamentally different data access patterns than traditional analytical workloads. We first derive a set of desiderata for optimizing storage and query processors of GDBMSs based on their access patterns. We then present the design of columnar storage, compression, and query processing techniques based on these desiderata. In addition to showing direct integration of existing techniques from columnar RDBMSs, we also propose novel ones that are optimized for GDBMSs. These include a novel list-based query processor, which avoids expensive data copies of traditional block-based processors under many-to-many joins, a new data structure we call single-indexed edge property pages and an accompanying edge ID scheme, and a new application of Jacobson's bit vector index for compressing NULL values and empty lists. We integrated our techniques into the GraphflowDB in-memory GDBMS. Through extensive experiments, we demonstrate the scalability and query performance benefits of our techniques.
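The NULL-compression idea mentioned in this abstract can be illustrated with a bit vector and precomputed rank counts: non-NULL values are stored densely, and the dense position of a value is found by counting the set bits before its slot. The Python sketch below is a simplified illustration in the spirit of a rank index, not the GraphflowDB implementation; the class name `NullCompressedColumn`, the block size of 16, and the method names are assumptions.

```python
class NullCompressedColumn:
    """Stores only non-NULL values; a bit vector marks which slots are set."""

    BLOCK = 16  # slots per block; illustrative, not taken from the paper

    def __init__(self, values):
        self.n = len(values)
        self.bits = [v is not None for v in values]
        self.dense = [v for v in values if v is not None]
        # block_rank[b] = number of non-NULL slots before block b starts
        self.block_rank = []
        count = 0
        for i, bit in enumerate(self.bits):
            if i % self.BLOCK == 0:
                self.block_rank.append(count)
            count += bit

    def rank(self, i):
        """Number of non-NULL slots in positions [0, i), for 0 <= i < n."""
        b = i // self.BLOCK
        # Precomputed prefix count plus a scan of at most BLOCK - 1 bits.
        return self.block_rank[b] + sum(self.bits[b * self.BLOCK:i])

    def get(self, i):
        """Return the value at slot i, or None if the slot is NULL."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        if not self.bits[i]:
            return None
        return self.dense[self.rank(i)]


# A sparse property column: 40 slots, 3 non-NULL values.
values = [None] * 40
values[3], values[17], values[35] = "a", "b", "c"
col = NullCompressedColumn(values)
print(col.get(17), col.get(4), len(col.dense))
```

A production version would pack the bits into machine words and use a popcount instruction for the in-block scan; the Python list of booleans here only shows the lookup logic.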


Author(s):  
Kornelije Rabuzin ◽  
Sonja Ristić ◽  
Robert Kudelić ◽  
...  

In recent years, graph databases have become far more important. They have proven to be an excellent choice for storing and managing large amounts of interconnected data. Since graph databases (GDBs) rely on a graph data model based on graph theory, this study examines whether currently available graph database management systems support the principles of graph theory, and, if so, to what extent. We also show how these systems differ in terms of implementation and languages, and discuss which graph database management systems are used today and why.


2021 ◽  
Vol 310 ◽  
pp. 06001
Author(s):  
Alexey A. Kolesnikov ◽  
Pavel M. Kikin

An increasing number of database management systems are expanding their functionality to work with various types of spatial data. This is true for both relational and NoSQL data models. The article describes the main features of those data models for which the functions of storing and processing spatial data are implemented. A comparative analysis of the performance of typical spatial queries for database management systems based on various data models, including multi-model ones, is carried out. The dataset on which the comparison is performed is presented in the form of three blocks of OpenStreetMap vector data for the territory of the Novosibirsk region. Based on the results of the study, recommendations are made on the use of certain data models, depending on the available data and the tasks to be solved.


2021 ◽  
Vol 1902 (1) ◽  
pp. 012059
Author(s):  
A S Dubrovin ◽  
O V Ogorodnikova ◽  
E G Tsarkova ◽  
E A Andreeva ◽  
T N Kulikova
