Applying graph database technology for analyzing perturbed co-expression networks in cancer

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Claire M Simpson ◽  
Florian Gnad

Abstract Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.
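The tumor-versus-healthy comparison of central nodes described above can be illustrated with a minimal sketch in plain Python. It is not the authors' Neo4j pipeline: the gene names, the edge lists and the helper functions `build_graph`, `degree_centrality` and `centrality_shift` are hypothetical, and degree centrality stands in for whichever centrality algorithms the study applied.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {gene: 0.0 for gene in adj}
    return {gene: len(neighbors) / (n - 1) for gene, neighbors in adj.items()}


def centrality_shift(normal_edges, tumor_edges):
    """Tumor centrality minus healthy-tissue centrality for every gene."""
    normal = degree_centrality(build_graph(normal_edges))
    tumor = degree_centrality(build_graph(tumor_edges))
    genes = set(normal) | set(tumor)
    return {g: tumor.get(g, 0.0) - normal.get(g, 0.0) for g in genes}


# Toy, made-up co-expression edges (not data from the study).
normal_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4")]
tumor_edges = [("G2", "G3"), ("G3", "G4"), ("G1", "G4")]

shift = centrality_shift(normal_edges, tumor_edges)
most_disrupted = min(shift, key=shift.get)  # gene that lost the most centrality
```

In this toy input, G1 is a hub in the healthy network and loses most of its edges in the tumor network, so it has the most negative shift. In a graph database the same comparison would be expressed as queries over stored nodes and relationships rather than over in-memory dictionaries.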

2021 ◽  
Author(s):  
Telmo Henrique Valverde da Silva ◽  
Ronaldo dos Santos Mello

Several application domains hold highly connected data, such as supply chains and social networks. In this context, NoSQL graph databases arise as a promising solution, since relationships are first-class citizens in their data model. Nevertheless, a traditional database design methodology initially defines a conceptual schema of the domain data, and the Enhanced Entity-Relationship (EER) model is a common tool for this. This paper presents a rule-based conversion process from an EER schema to Neo4j schema constraints, as Neo4j is the most representative NoSQL graph database management system with an expressive data model. Unlike related work, our conversion process deals with all EER model concepts. Because Neo4j is a schemaless system in which a schema cannot be created a priori, the process generates rules that enforce the schema constraints as a set of Cypher instructions ready to run on a Neo4j database instance. We also present an experimental evaluation that demonstrates the viability of our process in terms of performance.
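As a rough illustration of what a rule-based mapping from a conceptual schema to Cypher can look like, the sketch below turns one entity type into constraint statements. It is not the conversion process of the paper: the `EntityType` class, the constraint naming scheme and the two rules (key attribute to uniqueness constraint, mandatory attribute to existence constraint) are assumptions made for the example. The generated statements use the `CREATE CONSTRAINT ... FOR ... REQUIRE` form of recent Neo4j releases; older releases used a different form, and property existence constraints are, to our knowledge, limited to the Enterprise Edition.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """Hypothetical, simplified view of an EER entity type."""
    name: str
    key_attributes: list = field(default_factory=list)
    mandatory_attributes: list = field(default_factory=list)


def to_cypher_constraints(entity):
    """Emit one Cypher constraint statement per key or mandatory attribute."""
    label = entity.name
    statements = []
    for attr in entity.key_attributes:
        # Assumed rule: a key attribute becomes a uniqueness constraint.
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{attr}_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{attr} IS UNIQUE"
        )
    for attr in entity.mandatory_attributes:
        # Assumed rule: a mandatory attribute becomes an existence constraint.
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{attr}_exists IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{attr} IS NOT NULL"
        )
    return statements


person = EntityType("Person", key_attributes=["id"], mandatory_attributes=["name"])
statements = to_cypher_constraints(person)
for statement in statements:
    print(statement)
```

Generating plain strings keeps the sketch independent of any driver; the statements would still have to be run against a database instance to take effect.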


Author(s):  
Arnaud Castelltort ◽  
Anne Laurent

NoSQL graph databases have been introduced in recent years for dealing with large collections of graph-based data. Scientific data and social networks are among the best examples of the dramatic increase in the use of such structures. NoSQL repositories allow large amounts of data to be stored and queried. Such data are not structured with a predefined schema, as data in relational databases are. Rather, they are composed of nodes and relationships of a certain type. For instance, a node can represent a Person and a relationship a Friendship. Retrieving the structure of the graph database is thus of great help to users, for example when they must know how to query the data or identify relevant data sources for recommender systems. For this reason, this paper introduces methods to retrieve structural summaries. Such structural summaries are extracted at different levels of information from the NoSQL graph database. The expression of the mining queries is facilitated by two frameworks: Fuzzy4S, which allows fuzzy operators and operations to be defined in Scala, and Cypherf, which allows fuzzy operators and operations to be used in declarative queries over NoSQL graph databases. We show that extracting such summaries can be impossible with the NoSQL query engines because of the data volume and the complexity of automatic knowledge extraction. A novel method based on in-memory architectures is thus introduced. This paper provides the definitions of the summaries, along with methods to extract them automatically either from NoSQL graph databases alone or with the help of in-memory architectures. The benefit of our proposition is demonstrated by experimental results.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3509 ◽  
Author(s):  
Raquel L. Costa ◽  
Luiz Gadelha ◽  
Marcelo Ribeiro-Alves ◽  
Fábio Porto

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. 
The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rats coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.


Author(s):  
Kornelije Rabuzin

In the past few years, many NoSQL databases have emerged, including graph databases. NoSQL databases have certain advantages and they can be used in certain domains as an alternative to relational databases. In order to use graph databases, one needs to be familiar with specific languages like Cypher Query Language (CQL) or Gremlin. However, some statements in CQL can be considered too complex for end users, as is shown later on. Because of that, the main idea of this chapter is to explore two other languages for graph databases. One of them is new and it is used to pose queries visually. Since CQL does not support recursion, views, etc., the other language is used to show how to use recursion and views on a graph database.


Relational databases hold most of the data underpinning the web. They have an excellent record of convenience and efficiency in storage, optimized query execution, scalability, security and accuracy. Recently, graph databases have come to be seen as a good replacement for relational databases. Compared to the relational data model, the graph data model is more vivid and robust, and data expressed in it model relationships among data properly. An important requirement is to bring the vast quantities of data stored in relational databases (RDB) onto the web. In this situation, migration from the relational to the graph format is very advantageous. Both kinds of database have advantages and limitations depending on the form of the queries. Thus, this paper converts a relational database to a graph database by utilizing the schema, in order to develop a dual database system through migration that merges the capabilities of both relational and graph databases. Experimental results are provided to demonstrate the practicability of the method and the query response time over the target database. The proposed concept is proved by implementing it on MySQL and Neo4j.
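A schema-driven migration of this kind can be sketched in a few lines of Python. The sketch is only one plausible reading of "utilizing the schema" and not the method of the paper: it assumes every table has an `id` primary key, takes foreign keys from a hand-written list, and uses made-up table names, rows and relationship types. Each row becomes a node and each non-null foreign key value becomes a relationship.

```python
def migrate(tables, foreign_keys):
    """Map relational rows to graph nodes and foreign keys to relationships.

    tables:       {table_name: [row dicts, each with an 'id' primary key]}
    foreign_keys: [(table, column, referenced_table, relationship_type)]
    """
    fk_columns = {(table, column) for table, column, _, _ in foreign_keys}

    nodes = []
    for table, rows in tables.items():
        for row in rows:
            # Foreign key columns are expressed as relationships, not properties.
            props = {k: v for k, v in row.items() if (table, k) not in fk_columns}
            nodes.append({"label": table, "key": (table, row["id"]), "props": props})

    edges = []
    for table, column, ref_table, rel_type in foreign_keys:
        for row in tables[table]:
            if row.get(column) is not None:
                edges.append(((table, row["id"]), rel_type, (ref_table, row[column])))
    return nodes, edges


# Hypothetical two-table schema with one foreign key.
tables = {
    "Customer": [{"id": 1, "name": "Ada"}],
    "Order": [{"id": 10, "customer_id": 1, "total": 25.0}],
}
foreign_keys = [("Order", "customer_id", "Customer", "PLACED_BY")]

nodes, edges = migrate(tables, foreign_keys)
```

A real migration would read the table and foreign key definitions from the database catalog (for example from MySQL's information schema) and write the result to the graph store; the in-memory lists here only show the shape of the mapping.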


2019 ◽  
Vol 8 (2) ◽  
pp. 1722-1726

The relational database model (relational databases are also called SQL databases) is one of the most prevalent models used with structured data. New demands are currently arising from the scale at which the internet and social networks are used, which has brought importance to graph-structured data. Graph databases (a kind of NoSQL database) deal more naturally with highly connected data and are thus becoming a popular and efficient choice. Due to the limitations that relational databases face in handling relationships (highly connected data), enterprise information systems see graph databases as a promising alternative. Depending on the form of the queries and the properties of the data, both relational and graph databases have strengths and flaws. Since most data is available in a relational schema, converting an application from a relational to a graph format is very beneficial in this context. Thus, this paper develops a dual database system through migration, which unifies the strengths of both relational databases and graph databases. Experimental results show that this hybrid system performs efficiently.


Author(s):  
Kornelije Rabuzin

In the past few years, many NoSQL databases have emerged, including graph databases. NoSQL databases have certain advantages and they can be used in certain domains as an alternative to relational databases. In order to use graph databases, one needs to be familiar with specific languages like Cypher Query Language (CQL) or Gremlin. However, some statements in CQL can be considered too complex for end users, as is shown later on. Because of that, the main idea of this paper is to explore two other languages for graph databases. One of them is new and it is used to pose queries visually. Since CQL does not support recursion, views, etc., the other language is used to show how to use recursion and views on a graph database.


2015 ◽  
Vol 09 (04) ◽  
pp. 523-545 ◽  
Author(s):  
Shao-Ting Wang ◽  
Jennifer Jin ◽  
Pete Rivett ◽  
Atsushi Kitazawa

Graph databases can be defined as databases that use graph structures with nodes, edges and properties to store data. Semantic queries and graph-oriented operations are used to access them. With the rapidly growing amount of information on the Internet in recent years, relational databases suffer performance degradation as large numbers of nodes are added, due to the number of entries in join tables. Therefore, based on the network nature of Internet activities, graph databases are designed for fast access to the complex data found in social networks, recommendation engines and networked systems. The main objective of this survey is to present the work that has been done in the area of graph databases, including query languages, processing, and related applications.
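The contrast with join tables can be made concrete with a small traversal sketch. When each node keeps direct references to its neighbors, "everything within k hops" is a breadth-first walk over adjacency lists, whereas a relational design would typically need one self-join on the link table per hop. The function `within_hops` and the toy `follows` data are hypothetical and not taken from the survey.

```python
from collections import deque


def within_hops(adj, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in <= max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adj.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # report only the other nodes
    return distance


# Toy directed "follows" graph stored as adjacency lists.
follows = {
    "ann": ["bob"],
    "bob": ["cy", "dee"],
    "cy": ["eve"],
    "dee": [],
    "eve": [],
}

reachable = within_hops(follows, "ann", 2)
```

With a limit of two hops the walk stops before reaching `eve`, which is three hops from `ann`. The work done depends on the part of the graph actually visited rather than on the total size of a join table.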


2016 ◽  
Author(s):  
Raquel L. Costa ◽  
Luiz M. R. Gadelha ◽  
Marcelo Ribeiro-Alves ◽  
Fabio Porto

Abstract. Background: There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced may additionally be integrated with other biological databases, such as Protein-Protein Interactions and annotations. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Running in-silico experiments to structure and compose the information as needed for analysis is a daunting task. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount.
Results: We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. GeNNet includes pre-loaded biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene regulatory networks.
Conclusions: GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can also add new functionality to each component of GeNNet. The resulting data allows for testing previous hypotheses about an experiment as well as exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rats coming from Affymetrix platforms.


2017 ◽  
Vol 1 (1) ◽  
pp. 04 ◽  
Author(s):  
Jaroslav Pokorny

When graph databases are compared with traditional databases, e.g., relational ones, some important database features are often missing. In particular, a graph database schema including integrity constraints is mostly not explicitly defined, and conceptual modelling is not used. It is hard to check the consistency of a graph database, because almost no integrity constraints are defined or only very simple ones can be specified. In the paper, we discuss these issues and present current possibilities and challenges in graph database modelling. We also focus on integrity constraint modelling and propose functional dependencies between entity types, which are reminiscent of the functional dependencies known from relational databases. We show a number of examples of often-cited GDBMSs and their approaches to specifying database schemas and ICs. The conceptual level of graph database design is also considered. We propose a sufficient conceptual model based on a binary variant of the ER model and show its relationship to a graph database model, i.e. a mapping of conceptual schemas to database schemas. An alternative based on conceptual functions called attributes is presented. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

