GrAPL 2021 Keynote 1: Sparse Adjacency Matrices at the Core of Graph Databases: GraphBLAS the Engine Behind RedisGraph Property Graph Database

Abstract Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, with a graph-based representation, the consequent need to manage adequately a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. For this purpose, currently available graph visualisation tools and libraries fall short with Hi-C data. Graph databases, instead, support both the analysis and the visualisation of the spatial patterns present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. Moreover, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions make it possible to highlight similarities and differences among Hi-C experiments in different cell conditions or cell types. Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5). Conclusion: With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.

Applying graph database technology for analyzing perturbed co-expression networks in cancer

Database ◽

10.1093/database/baaa110 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Claire M Simpson ◽

Florian Gnad

Keyword(s):

Relational Databases ◽

Molecular Mechanisms ◽

Biological Data ◽

Database Management System ◽

Graph Database ◽

Graph Databases ◽

Graph Representations ◽

Rnaseq Data ◽

Database Technology ◽

Speed Accuracy

Abstract Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.
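
As a rough illustration of the tumor-versus-healthy comparison described in this abstract, the sketch below computes degree centrality for two undirected co-expression networks and reports how each gene's degree changes. It is an in-memory stand-in written for this listing, not the authors' Neo4j workflow; the function names and the gene identifiers `G1` to `G3` are hypothetical.

```python
from collections import Counter


def degree(edges):
    """Degree of every gene in an undirected co-expression edge list."""
    deg = Counter()
    for gene_a, gene_b in edges:
        deg[gene_a] += 1
        deg[gene_b] += 1
    return deg


def degree_shift(healthy_edges, tumor_edges):
    """Per-gene change in degree (tumor minus healthy).

    A large negative value hints at a hub lost in the tumor network,
    a large positive value at a hub gained.
    """
    healthy, tumor = degree(healthy_edges), degree(tumor_edges)
    return {gene: tumor[gene] - healthy[gene] for gene in set(healthy) | set(tumor)}


# Hypothetical toy networks: G1 is a hub in healthy tissue only.
healthy = [("G1", "G2"), ("G1", "G3")]
tumor = [("G2", "G3")]
shift = degree_shift(healthy, tumor)
```

Degree is only the simplest centrality measure; the abstract also mentions community detection and pathfinding, which this sketch does not cover.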

A Rule-based Conversion of an EER Schema to Neo4j Schema Constraints

10.5753/sbbd.2021.17876 ◽

2021 ◽

Author(s):

Telmo Henrique Valverde da Silva ◽

Ronaldo dos Santos Mello

Keyword(s):

Data Model ◽

Design Methodology ◽

A Priori ◽

Database Management System ◽

Conversion Process ◽

Graph Database ◽

Graph Databases ◽

Rule Based ◽

Promising Solution ◽

Entity Relationship

Several application domains hold highly connected data, such as supply chains and social networks. In this context, NoSQL graph databases arise as a promising solution since relationships are first-class citizens in their data model. Nevertheless, a traditional database design methodology initially defines a conceptual schema of the domain data, and the Enhanced Entity-Relationship (EER) model is a common tool for this. This paper presents a rule-based conversion process from an EER schema to Neo4j schema constraints, as Neo4j is the most representative NoSQL graph database management system with an expressive data model. Unlike related work, our conversion process deals with all EER model concepts and generates rules for ensuring schema constraints through a set of Cypher instructions ready to run on a Neo4j database instance, since Neo4j is a schemaless system in which it is not possible to create a schema a priori. We also present an experimental evaluation that demonstrates the viability of our process in terms of performance.
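
To make the idea of emitting Cypher constraint statements from a conceptual schema more concrete, here is a minimal sketch. It is not the paper's rule set: the `EntityType` class, the constraint naming scheme and the example entity are assumptions made for illustration, and it covers only key and mandatory attributes. The statements use the `CREATE CONSTRAINT ... FOR ... REQUIRE` form found in recent Neo4j versions, which may differ from the syntax the authors generated; property existence constraints (`IS NOT NULL`) additionally require Neo4j Enterprise Edition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityType:
    """Hypothetical, heavily simplified stand-in for an EER entity type."""
    name: str
    key_attributes: List[str]
    mandatory_attributes: List[str] = field(default_factory=list)


def constraints_for(entity: EntityType) -> List[str]:
    """Return Cypher statements for key (unique) and mandatory attributes."""
    prefix = entity.name.lower()
    statements = []
    for attr in entity.key_attributes:
        statements.append(
            f"CREATE CONSTRAINT {prefix}_{attr}_unique IF NOT EXISTS "
            f"FOR (n:{entity.name}) REQUIRE n.{attr} IS UNIQUE"
        )
    for attr in entity.mandatory_attributes:
        statements.append(
            f"CREATE CONSTRAINT {prefix}_{attr}_exists IF NOT EXISTS "
            f"FOR (n:{entity.name}) REQUIRE n.{attr} IS NOT NULL"
        )
    return statements


# Hypothetical entity with one key attribute and one mandatory attribute.
statements = constraints_for(EntityType("Person", ["ssn"], ["name"]))
```

The generated strings could then be sent to a database through any Neo4j client; that step is left out so the sketch stays self-contained.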

Exploiting NoSQL Graph Databases and in Memory Architectures for Extracting Graph Structural Data Summaries

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517500040 ◽

2017 ◽

Vol 25 (01) ◽

pp. 81-109 ◽

Cited By ~ 4

Author(s):

Arnaud Castelltort ◽

Anne Laurent

Keyword(s):

Relational Databases ◽

Structural Data ◽

Scientific Data ◽

Graph Database ◽

Graph Databases ◽

Fuzzy Operators ◽

Data Volume ◽

Novel Method ◽

Memory Architectures ◽

Data Summaries

NoSQL graph databases have been introduced in recent years for dealing with large collections of graph-based data. Scientific data and social networks are among the best examples of the dramatic increase in the use of such structures. NoSQL repositories allow large amounts of data to be stored and queried. Such data are not structured with a predefined schema, as they would be in relational databases. Rather, they are composed of nodes and relationships of certain types. For instance, a node can represent a Person and a relationship a Friendship. Retrieving the structure of the graph database is thus of great help to users, for example when they must know how to query the data or identify relevant data sources for recommender systems. For this reason, this paper introduces methods to retrieve structural summaries. Such structural summaries are extracted at different levels of information from the NoSQL graph database. The expression of the mining queries is facilitated by two frameworks: Fuzzy4S, which allows fuzzy operators and operations to be defined in Scala, and Cypherf, which allows fuzzy operators and operations to be used in declarative queries over NoSQL graph databases. We show that extracting such summaries can be impossible with the NoSQL query engines because of the data volume and the complexity of the task of automatic knowledge extraction. A novel method based on in-memory architectures is thus introduced. This paper provides the definitions of the summaries, together with methods to extract them automatically either from NoSQL graph databases alone or with the help of in-memory architectures. The benefit of our proposition is demonstrated by experimental results.
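
One simple reading of a structural summary is a count of how often each (source node type, relationship type, target node type) pattern occurs. The sketch below computes that count in memory. It is a crisp, non-fuzzy illustration only; it does not use Fuzzy4S or Cypherf, and the function name and sample data are assumptions introduced here.

```python
from collections import Counter


def structural_summary(node_types, relationships):
    """Count (source type, relationship type, target type) patterns.

    node_types: dict mapping node id -> node type (label)
    relationships: iterable of (source id, relationship type, target id)
    """
    return Counter(
        (node_types[src], rel_type, node_types[dst])
        for src, rel_type, dst in relationships
    )


# Hypothetical data echoing the Person/Friendship example in the abstract.
node_types = {1: "Person", 2: "Person", 3: "City"}
relationships = [
    (1, "Friendship", 2),
    (2, "Friendship", 1),
    (1, "LivesIn", 3),
]
summary = structural_summary(node_types, relationships)
```

A summary like this tells a user which query patterns can return results at all, which is the kind of help the abstract describes.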

GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis

PeerJ ◽

10.7717/peerj.3509 ◽

2017 ◽

Vol 5 ◽

pp. e3509 ◽

Cited By ~ 8

Author(s):

Raquel L. Costa ◽

Luiz Gadelha ◽

Marcelo Ribeiro-Alves ◽

Fábio Porto

Keyword(s):

Protein Interactions ◽

Gene Annotation ◽

Scientific Workflow ◽

Biological Data ◽

Scientific Workflows ◽

Gene Set Enrichment Analysis ◽

Graph Database ◽

Graph Databases ◽

Biological Databases ◽

Transcriptome Data

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.

A study on time models in graph databases for security log analysis

International Journal of Web Information Systems ◽

10.1108/ijwis-03-2021-0023 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Daniel Hofer ◽

Markus Jäger ◽

Aya Khaled Youssef Sayed Mohamed ◽

Josef Küng

Keyword(s):

Computer Security ◽

Suitable Model ◽

Graph Database ◽

Graph Databases ◽

Content Type ◽

Log Files ◽

Log File ◽

Set Up ◽

The One ◽

The Time Domain

Purpose: Log files are a crucial source of information for computer security experts. The time domain is especially important because, in most cases, timestamps are the only link between events caused by attackers, faulty systems or simple errors and their corresponding entries in log files. To store and analyze this log information in graph databases, a suitable model is needed to store and connect timestamps and their events. This paper aims to find and evaluate different approaches to storing timestamps in graph databases, along with their individual benefits and drawbacks. Design/methodology/approach: We analyse three different approaches to representing and storing timestamp information in graph databases. To check the models, we set up four typical questions that are important for log file analysis and tested them against each model. During the evaluation, we used performance and other properties as metrics of how suitable each model is for representing the log files' timestamp information. In the last part, we try to improve one promising model. Findings: We conclude that the simplest model, which uses the fewest graph database-specific concepts, is also the one yielding the simplest and fastest queries. Research limitations/implications: Only one graph database was studied, and improvements to the query engine might change future results. Originality/value: In the study, we addressed the issue of storing timestamps in graph databases in a meaningful, practical and efficient way. The results can be used as a pattern for similar scenarios and applications.

XACML Implementation Based on Graph Databases

10.29007/rf56 ◽

2019 ◽

Author(s):

Ying Jin ◽

Krishna Kaja

Keyword(s):

Access Control ◽

Security Policy ◽

Markup Language ◽

Graph Database ◽

Graph Databases ◽

Query Result ◽

Policy Specification ◽

Conflict Resolution Strategies ◽

Control Decision ◽

High Level

Extensible Access Control Markup Language (XACML) is an OASIS standard for security policy specification. It consists of a policy language to define security authorizations and an access control decision language for requests and responses. The high-level policy specification is independent of underlying implementation. Different from existing approaches, this research uses a graph database for XACML implementation. Once a policy is specified, it will be parsed and the parsing results will be processed by eliminating duplicates and resolving conflicts. The final results are saved as graphs in the persistent storage. When a XACML request is submitted, the request is processed as a query to the graph database. Based on this query result, a XACML response will be produced to permit or deny the user’s request. This paper describes the architecture, implementation details, and conflict resolution strategies of our system to implement XACML.
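
The request-to-decision flow described above can be sketched with a simplified version of deny-overrides, one of the standard XACML combining algorithms. The abstract does not say which conflict resolution strategies the authors implemented, so this choice is an assumption, as are the `Rule` type, the `evaluate` function and the sample policy. A plain list stands in for the graph database, and the Indeterminate decision is ignored.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical flattened policy rule; real XACML rules are richer."""
    subject: str
    resource: str
    action: str
    effect: str  # "Permit" or "Deny"


def evaluate(rules: List[Rule], subject: str, resource: str, action: str) -> str:
    """Simplified deny-overrides: any matching Deny wins over any Permit."""
    effects = {
        rule.effect
        for rule in rules
        if (rule.subject, rule.resource, rule.action) == (subject, resource, action)
    }
    if "Deny" in effects:
        return "Deny"
    if "Permit" in effects:
        return "Permit"
    return "NotApplicable"


# Hypothetical policy with one conflicting pair of rules.
policy = [
    Rule("alice", "report", "read", "Permit"),
    Rule("alice", "report", "write", "Permit"),
    Rule("alice", "report", "write", "Deny"),
]
read_decision = evaluate(policy, "alice", "report", "read")
write_decision = evaluate(policy, "alice", "report", "write")
other_decision = evaluate(policy, "bob", "report", "read")
```

In the system the abstract describes, conflicts are resolved when the policy is stored rather than at request time; the sketch resolves them at evaluation time purely to keep the example short.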

Efficient Fault-Tolerant Transactions for Distributed Graph Database

Singular Engenharia, Tecnologia e Gestão ◽

10.33911/singular-etg.v1i2.59 ◽

2019 ◽

Vol 1 (2) ◽

pp. 14-20

Author(s):

Ray Neiheiser ◽

Roland Schmitz ◽

Luciana Rech ◽

Manfredo Manfredini

Keyword(s):

Social Networks ◽

Fault Tolerance ◽

Linked Data ◽

Fault Tolerant ◽

Graph Database ◽

Graph Databases ◽

Massive Growth ◽

Atomic Broadcast ◽

A Performance

Driven by the massive growth of linked data produced by social networks, graph technologies are an ongoing trend and graph databases have gained popularity. Replication, a common approach to increasing availability in databases, is also used by diverse graph database solutions. However, few approaches to implementing fault tolerance in graph databases have been proposed so far. This paper considers deferred update replication using atomic broadcast to implement fault tolerance in distributed graph databases. The main contribution of this paper is a deferred update algorithm adapted to graph databases that offers a more scalable and faster solution, showing a performance advantage of over 30% compared to existing approaches.
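
For readers unfamiliar with deferred update replication, the sketch below shows its general shape. A transaction runs locally and records the versions it read, its read and write sets are delivered to every replica in the same total order, and each replica certifies it deterministically. This is a generic textbook-style illustration, not the graph-adapted algorithm from the paper. The class names are hypothetical, plain string keys stand in for graph elements, and the atomic broadcast is simulated by delivering transactions to each replica in the same order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Transaction:
    read_versions: Dict[str, int]  # key -> version observed during execution
    writes: Dict[str, Any]         # key -> new value


@dataclass
class Replica:
    values: Dict[str, Any] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)

    def deliver(self, tx: Transaction) -> bool:
        """Certify and apply a delivered transaction; return True on commit."""
        # Certification: abort if anything the transaction read has changed.
        for key, seen in tx.read_versions.items():
            if self.versions.get(key, 0) != seen:
                return False
        # Commit: apply the writes and bump the version of each written key.
        for key, value in tx.writes.items():
            self.values[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True


# Two conflicting transactions that both read "node:1" at version 0.
t1 = Transaction({"node:1": 0}, {"node:1": "red"})
t2 = Transaction({"node:1": 0}, {"node:1": "blue"})

replicas = [Replica(), Replica()]
# Simulated atomic broadcast: every replica sees t1 and then t2.
outcomes = [[replica.deliver(tx) for tx in (t1, t2)] for replica in replicas]
```

Because certification depends only on the delivery order and the replica state, every replica reaches the same commit or abort decision without further coordination.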

Query Languages for Graph Databases

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch047 ◽

2019 ◽

pp. 645-659

Author(s):

Kornelije Rabuzin

Keyword(s):

Relational Databases ◽

Query Language ◽

Main Idea ◽

Query Languages ◽

End Users ◽

The Other ◽

Graph Database ◽

Graph Databases ◽

Nosql Databases ◽

The Past

In the past few years, many NoSQL databases have emerged, including graph databases. NoSQL databases have certain advantages and can be used in certain domains as an alternative to relational databases. To use graph databases, one needs to be familiar with specific languages such as Cypher Query Language (CQL) or Gremlin. However, some statements in CQL can be considered too complex for end users, as shown later in the chapter. Because of that, the main idea of this chapter is to explore two other languages for graph databases. One of them is new and is used to pose queries visually. Since CQL does not support recursion, views, etc., the other language is used to show how to use recursion and views on a graph database.

Designing Graph Databases With GRAPHED

Journal of Database Management ◽

10.4018/jdm.2019010103 ◽

2019 ◽

Vol 30 (1) ◽

pp. 41-60 ◽

Cited By ~ 1

Author(s):

Gustavo Cordeiro Galvão Van Erven ◽

Rommel Novaes Carvalho ◽

Waldeyr Mendes Cordeiro da Silva ◽

Sergio Lifschitz ◽

Harley Vera-Olivera ◽

...

Keyword(s):

Social Networks ◽

Data Model ◽

Biological Network ◽

Database Systems ◽

Graph Database ◽

Graph Databases ◽

Database Modeling ◽

Schema Design ◽

Conceptual Data ◽

The Relationship

In recent years, graph database systems have become very popular and have been deployed mainly in situations where the relationships between data are significant, such as in social networks. Although they do not require a particular schema design, a data model contributes to their consistency. Designing diagrams is one approach to satisfying this demand for a conceptual data model. While researchers and companies have been developing concepts and notations for graph database modeling, their notations focus on their specific implementations. In this article, the authors propose a diagram named GRAPHED (Graph Description Diagram for Graph Databases) to address this lack of a generic and comprehensive notation for graph database modeling. The authors verified the effectiveness and compatibility of GRAPHED in two case studies: fraud identification and a biological network model.
