Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs

The Graph Signature

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2015040102 ◽

2015 ◽

Vol 11 (2) ◽

pp. 36-65 ◽

Cited By ~ 1

Author(s):

Mustafa Jarrar ◽

Anton Deik

Keyword(s):

Large Data ◽

Research Community ◽

Graph Database ◽

Semantic Technologies ◽

Original Graph ◽

Rdf Graphs ◽

Trace Equivalence

Querying large data graphs has brought the attention of the research community. Many solutions were proposed, such as Oracle Semantic Technologies, Virtuoso, RDF3X, and C-Store, among others. Although such approaches have shown good performance in queries with medium complexity, they perform poorly when the complexity of the queries increases. In this paper, the authors propose the Graph Signature Index, a novel and scalable approach to index and query large data graphs. The idea is that they summarize a graph and instead of executing the query on the original graph, they execute it on the summaries. The authors' experiments with Yago (16M triples) have shown that e.g., a query with 4 levels costs 62 sec using Oracle but it only costs about 0.6 sec with their index. Their index can be implemented on top of any Graph database, but they chose to implement it as an extension to Oracle on top of the SEM_MATCH table function. The paper also introduces disk-based versions of the Trace Equivalence and Bisimilarity algorithms to summarize data graphs, and discusses their complexity and usability for RDF graphs.

Download Full-text

Transforming Product Catalogue Relational into Graph Database: a Performance Comparison

2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) ◽

10.23919/mipro48935.2020.9245152 ◽

2020 ◽

Author(s):

Josip Lorincz ◽

Vlatka Huljic ◽

Dinko Begusic

Keyword(s):

Performance Comparison ◽

Graph Database ◽

A Performance

Download Full-text

RAKING: An Efficient K-Maximal Frequent Pattern Mining Algorithm on Uncertain Graph Database

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2010.01387 ◽

2010 ◽

Vol 33 (8) ◽

pp. 1387-1395 ◽

Cited By ~ 4

Author(s):

Meng HAN ◽

Wei ZHANG ◽

Jian-Zhong LI

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Graph Database ◽

Uncertain Graph ◽

Mining Algorithm ◽

Maximal Frequent Pattern

Download Full-text

Application of Java Relationship Graphs (JRG) to plagiarism detection in Java Projects: A Neo4j Graph Database Approach

2021 The 4th International Conference on Software Engineering and Information Management ◽

10.1145/3451471.3451479 ◽

2021 ◽

Author(s):

Ritu Arora ◽

Arun Motilal Maurya ◽

Yashvardhan Sharma

Keyword(s):

Graph Database ◽

Plagiarism Detection

Download Full-text

Design and implementation of express logistics system based on graph database and Baidu map

2020 2nd International Conference on Information Technology and Computer Application (ITCA) ◽

10.1109/itca52113.2020.00089 ◽

2020 ◽

Author(s):

Zhang Xiaoliang ◽

Zeng Qingtao ◽

Tang Mingjie ◽

Huang Hui

Keyword(s):

Graph Database ◽

Logistics System ◽

Design And Implementation

Download Full-text

Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽

10.1186/s12859-020-03937-0 ◽

2021 ◽

Vol 22 (S2) ◽

Author(s):

Daniele D’Agostino ◽

Pietro Liò ◽

Marco Aldinucci ◽

Ivan Merelli

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Cell Types ◽

Graph Database ◽

Graph Databases ◽

Sources Of Information ◽

Chromosome Conformation ◽

Wide Scale ◽

User Friendly ◽

Different Cell Types

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.

Download Full-text

Parallel Betweenness Computation in Graph Database for Contingency Selection

2020 IEEE Power & Energy Society General Meeting (PESGM) ◽

10.1109/pesgm41954.2020.9281492 ◽

2020 ◽

Author(s):

Yongli Zhu ◽

Renchang Dai ◽

Guangyi Liu

Keyword(s):

Graph Database

Download Full-text

Applying graph database technology for analyzing perturbed co-expression networks in cancer

Database ◽

10.1093/database/baaa110 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Claire M Simpson ◽

Florian Gnad

Keyword(s):

Relational Databases ◽

Molecular Mechanisms ◽

Biological Data ◽

Database Management System ◽

Graph Database ◽

Graph Databases ◽

Graph Representations ◽

Rnaseq Data ◽

Database Technology ◽

Speed Accuracy

Abstract Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.

Download Full-text