On graph query optimization in large networks

Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting toward queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, which heavily rely on composition, and the semi-join algebras, which replace composition in favor of semi-joins. Our main result is that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries evaluating to sets of nodes. For practical relevance, we exhibit constructive rules for rewriting relation algebra queries to semi-join algebra queries and prove that they lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries. In addition, on sibling-ordered trees, we establish new relationships among the expressive power of Regular XPath, Conditional XPath, FO-logic and the semi-join algebra augmented with restricted fixpoint operators.
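The idea of replacing composition with semi-joins for node queries can be illustrated with a minimal sketch. It is not the paper's rewriting system: the edge relations `a` and `b`, the node names, and all function names are hypothetical, and the query ("nodes with an outgoing a-edge followed by a b-edge") is only one simple instance. The composition version materializes a set of node pairs before projecting; the semi-join version only ever handles sets of nodes.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def compose(r: Set[Edge], s: Set[Edge]) -> Set[Edge]:
    """Relational composition: all pairs (x, z) with (x, y) in r and (y, z) in s."""
    targets_by_source: Dict[str, Set[str]] = {}
    for y, z in s:
        targets_by_source.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in targets_by_source.get(y, ())}


def sources(r: Set[Edge]) -> Set[str]:
    """Project a binary relation onto its first column."""
    return {x for x, _ in r}


def semijoin_sources(r: Set[Edge], nodes: Set[str]) -> Set[str]:
    """Nodes x with an r-edge into `nodes`; no intermediate pairs are built."""
    return {x for x, y in r if y in nodes}


def query_with_composition(a: Set[Edge], b: Set[Edge]) -> Set[str]:
    # Builds the full composed relation, then keeps only the start nodes.
    return sources(compose(a, b))


def query_with_semijoin(a: Set[Edge], b: Set[Edge]) -> Set[str]:
    # Works on node sets only: first the nodes with a b-edge, then their a-predecessors.
    return semijoin_sources(a, sources(b))


# Hypothetical example data.
a = {("n1", "n2"), ("n1", "n3"), ("n4", "n5")}
b = {("n2", "n6"), ("n3", "n6"), ("n3", "n7")}

result_composition = query_with_composition(a, b)
result_semijoin = query_with_semijoin(a, b)
```

Both functions return the same node set on this data, but the semi-join variant never stores more than one node set at a time, which is where the evaluation cost reduction described in the abstract would come from in a setting like this.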


Towards plug-and-play visual graph query interfaces

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476256 ◽

2021 ◽

Vol 14 (11) ◽

pp. 1979-1991

Author(s):

Zifeng Yuan ◽

Huey Eng Chua ◽

Sourav S Bhowmick ◽

Zekun Ye ◽

Wook-Shin Han ◽

...

Keyword(s):

Real World ◽

Domain Knowledge ◽

Experimental Studies ◽

Plug And Play ◽

Large Networks ◽

Graph Query ◽

Query Interfaces ◽

Wide Range ◽

Real World Datasets ◽

Visual Graph

Canned patterns (i.e., small subgraph patterns) in visual graph query interfaces (a.k.a. GUIs) facilitate efficient query formulation by enabling a pattern-at-a-time construction mode. However, existing GUIs for querying large networks either do not expose any canned patterns or, if they do, the patterns are typically selected manually based on domain knowledge. Unfortunately, manual generation of canned patterns is not only labor intensive but may also lack diversity for supporting efficient visual formulation of a wide range of subgraph queries. In this paper, we present a novel, generic, and extensible framework called TATTOO that takes a data-driven approach to automatically select canned patterns for a GUI from large networks. Specifically, it first decomposes the underlying network into truss-infested and truss-oblivious regions. Then candidate canned patterns capturing different real-world query topologies are generated from these regions. Canned patterns based on a user-specified plug are then selected for the GUI from these candidates by maximizing coverage and diversity, and by minimizing the cognitive load of the pattern set. Experimental studies with real-world datasets demonstrate the benefits of TATTOO. Importantly, this work takes a concrete step towards realizing plug-and-play visual graph query interfaces for large networks.
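The abstract does not define the truss-infested and truss-oblivious regions. The sketch below assumes one simple reading: an edge is placed in the first region if it belongs to at least one triangle, and in the second if it belongs to none. That split, the function name, and the example graph are assumptions made for illustration and are not taken from TATTOO itself; the input is assumed to be a simple undirected graph without self-loops.

```python
from typing import FrozenSet, Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def split_by_triangle_membership(
    edges: Iterable[Edge],
) -> Tuple[Set[FrozenSet[Node]], Set[FrozenSet[Node]]]:
    """Split undirected edges into those inside a triangle and those in none."""
    edge_list = list(edges)

    # Build an adjacency map for the undirected graph.
    neighbors: Dict[Node, Set[Node]] = {}
    for u, v in edge_list:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    in_triangle: Set[FrozenSet[Node]] = set()
    triangle_free: Set[FrozenSet[Node]] = set()
    for u, v in edge_list:
        edge = frozenset((u, v))
        # A common neighbor w of u and v closes the triangle u-v-w.
        if neighbors[u] & neighbors[v]:
            in_triangle.add(edge)
        else:
            triangle_free.add(edge)
    return in_triangle, triangle_free


# Hypothetical example: a triangle 1-2-3 with a pendant edge 3-4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
dense_region, sparse_region = split_by_triangle_membership(example_edges)
```

Under this assumed split, candidate patterns for dense topologies would be drawn from the first edge set and patterns such as paths or stars from the second.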


Query Optimization: Fund Data Generation Applying NonClustered Indexing and MapReduced Data Cube Numerosity Reduction Method

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/1991.12020 ◽

2020 ◽

Vol 9 (1.1 S I) ◽

pp. 102-109

Author(s):

Mercy Burawis

Keyword(s):

Query Optimization ◽

Reduction Method ◽

Data Cube ◽

Data Generation


Multi-dimensional query optimization algorithm for bitmap index with binning

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02013 ◽

2010 ◽

Vol 30 (8) ◽

pp. 2013-2016 ◽

Cited By ~ 1

Author(s):

Li-ming WANG ◽

Xiao CHENG ◽

Yu-mei CHAI

Keyword(s):

Query Optimization ◽

Optimization Algorithm ◽

Bitmap Index


A Query Optimization Method Based on User Profiles Reasoning

JOURNAL OF ELECTRONICS INFORMATION TECHNOLOGY ◽

10.3724/sp.j.1146.2006.00758 ◽

2011 ◽

Vol 30 (1) ◽

pp. 33-37

Author(s):

Xiang Mei ◽

Xiang-wu Meng ◽

Jun-Liang Chen ◽

Meng Xu

Keyword(s):

Query Optimization ◽

Optimization Method ◽

User Profiles


An Optimal Framework for Spatial Query Optimization Using Hadoop in Big Data Analytics

Recent Patents on Computer Science ◽

10.2174/2213275912666190419215231 ◽

2019 ◽

Vol 12 ◽

Author(s):

Pankaj Dadheech ◽

Dinesh Goyal ◽

Sumit Srivastava ◽

Ankit Kumar

Keyword(s):

Big Data ◽

Query Optimization ◽

Spatial Data ◽

Spatial Information ◽

Big Data Analytics ◽

Spatial Query ◽

Data Process ◽

Boolean Queries ◽

Spatial Query Optimization ◽

Hadoop System

Spatial queries are frequently used in Hadoop for large-scale data processing. However, the massive size of spatial information makes it difficult to process spatial queries efficiently, which motivates using the Hadoop system to process Big Data. We use Boolean Queries and Geometry Boolean Spatial Data for query optimization on the Hadoop system. In this paper, we present a lightweight and adaptable spatial data index for big data that can be processed in Hadoop frameworks. Results demonstrate the efficiency and effectiveness of our spatial indexing system for various spatial queries.


Query Rewriting for Incremental Continuous Query Evaluation in HIFUN

Algorithms ◽

10.3390/a14050149 ◽

2021 ◽

Vol 14 (5) ◽

pp. 149

Author(s):

Petros Zervoudakis ◽

Haridimos Kondylakis ◽

Nicolas Spyratos ◽

Dimitris Plexousakis

Keyword(s):

Query Optimization ◽

Query Language ◽

Computational Cost ◽

Continuous Queries ◽

Continuous Query ◽

Query Rewriting ◽

Query Evaluation ◽

Clear Separation ◽

Complete Dataset ◽

High Level

HIFUN is a high-level query language for expressing analytic queries of big datasets, offering a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of data, and the physical layer, where queries are evaluated. In this paper, we present a methodology based on the HIFUN language, and the corresponding algorithms for the incremental evaluation of continuous queries. In essence, our approach is able to process the most recent data batch by exploiting already computed information, without requiring the evaluation of the query over the complete dataset. We present the generic algorithm which we translated to both SQL and MapReduce using SPARK; it implements various query rewriting methods. We demonstrate the effectiveness of our approach in terms of query answering efficiency. Finally, we show that by exploiting the formal query rewriting methods of HIFUN, we can further reduce the computational cost, adding another layer of query optimization to our implementation.
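The principle of processing only the most recent batch while reusing already computed information can be sketched for a grouped sum. This is an illustration under stated assumptions, not the HIFUN algorithm: the class `IncrementalSum`, the record layout `(group key, value)`, and the sample batches are hypothetical, and the approach shown relies on the aggregate being mergeable with previous results, as a sum is.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (group key, measured value); a hypothetical layout


class IncrementalSum:
    """Keeps per-group totals and folds in each new batch without rescanning old data."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def update(self, batch: Iterable[Record]) -> Dict[str, int]:
        # Only the new batch is read; earlier batches are represented by self.totals.
        for key, value in batch:
            self.totals[key] += value
        return dict(self.totals)


def recompute_from_scratch(batches: List[List[Record]]) -> Dict[str, int]:
    """Baseline: evaluate the same query over the complete dataset."""
    totals: Dict[str, int] = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            totals[key] += value
    return dict(totals)


# Hypothetical stream of three batches.
batches = [
    [("branch_a", 10), ("branch_b", 5)],
    [("branch_a", 3)],
    [("branch_c", 7), ("branch_b", 1)],
]

state = IncrementalSum()
latest: Dict[str, int] = {}
for batch in batches:
    latest = state.update(batch)

baseline = recompute_from_scratch(batches)
```

After each batch, the incremental answer matches what a full re-evaluation would return, while the work per step is proportional to the batch size and not to the size of the accumulated dataset.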


Bayesian graphical models for modern biological applications

Statistical Methods & Applications ◽

10.1007/s10260-021-00572-8 ◽

2021 ◽

Author(s):

Yang Ni ◽

Veerabhadran Baladandayuthapani ◽

Marina Vannucci ◽

Francesco C. Stingo

Keyword(s):

Graphical Models ◽

Cancer Genomics ◽

Graph Structure ◽

Biological Processes ◽

Limited Sample ◽

Large Networks ◽

Bayesian Approaches ◽

Complex Sampling ◽

Complex Dependence ◽

Bayesian Graphical Models

Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for a holistic, systems-level view of the various biological processes, for intuitive and rigorous understanding and interpretations. In the context of large networks, Bayesian approaches are particularly suitable because they encourage sparsity of the graphs, incorporate prior information, and, most importantly, account for uncertainty in the graph structure. These features are particularly important in applications with limited sample size, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
