Aggregate Queries
Recently Published Documents


TOTAL DOCUMENTS

182
(FIVE YEARS 27)

H-INDEX

17
(FIVE YEARS 2)

2021 ◽  
Vol. 17, Issue 3 ◽  
Author(s):  
Ester Livshits ◽  
Leopoldo Bertossi ◽  
Benny Kimelfeld ◽  
Moshe Sebag

We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory, and in many applications of game theory, for assessing the contribution of a player to a coalition game. It was established in the 1950s and is theoretically justified as the unique wealth-distribution measure that satisfies certain natural axioms. While this value has been investigated in several areas, it has received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers, and for the hard cases we present approximation algorithms.
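To make the setup concrete, the following sketch treats database facts as the players of a coalition game whose value function is whether a Boolean conjunctive query holds, and computes exact Shapley values by brute-force enumeration of orderings. The tiny database and query are illustrative assumptions, not examples from the paper, and enumeration is only feasible for a handful of facts.

```python
from itertools import permutations

# Toy database: each fact is a "player". The Boolean query asks
# whether R(x), S(x) is satisfiable, i.e. some value occurs in both
# R and S. Facts and query are invented for illustration.
R = [("a",), ("b",)]
S = [("a",)]
facts = [("R", t) for t in R] + [("S", t) for t in S]

def query_holds(subset):
    """v(T): 1 if the Boolean join R(x), S(x) holds on the sub-database."""
    rs = {t[0] for rel, t in subset if rel == "R"}
    ss = {t[0] for rel, t in subset if rel == "S"}
    return 1 if rs & ss else 0

def shapley(fact):
    """Exact Shapley value: the fact's marginal contribution to the
    query, averaged over all orderings of the facts."""
    total, n = 0, 0
    for perm in permutations(facts):
        before = set()
        for f in perm:
            if f == fact:
                total += query_holds(before | {f}) - query_holds(before)
                break
            before.add(f)
        n += 1
    return total / n

for f in facts:
    print(f, shapley(f))
```

Here R(a) and S(a) are symmetric and each receives value 1/2, while R(b), which never helps satisfy the query, receives 0; this is the kind of tuple-level attribution the paper studies, with the hardness coming from avoiding the factorial enumeration.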


2021 ◽  
Vol 15 (1) ◽  
pp. 59-71
Author(s):  
Yin Lin ◽  
Brit Youngmann ◽  
Yuval Moskovitch ◽  
H. V. Jagadish ◽  
Tova Milo

Generalizing from detailed data to statements in a broader context is often critical for users to make sense of large data sets. Conversely, poorly constructed generalizations may convey misleading information even when the statements are technically supported by the data. For example, a cherry-picked level of aggregation could obscure substantial sub-groups that oppose the generalization. We present a framework for detecting and explaining cherry-picked generalizations by refining aggregate queries. We present a scoring method to indicate the appropriateness of a generalization and design efficient algorithms for score computation. To provide a better understanding of the resulting score, we also formulate practical explanation tasks that disclose significant counterexamples and suggest better alternatives to the statement. We conduct experiments on real-world data sets and examples to show the effectiveness of our proposed evaluation metric and the efficiency of our algorithmic framework.
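The core counterexample search can be illustrated with a minimal sketch: check a generalization at the aggregate level, then refine the aggregate query one grouping level down and collect sub-groups whose trend opposes it. The data and the single-level drill-down are illustrative assumptions, not the paper's scoring method.

```python
from statistics import mean

# Invented data: the claim "average score rose from 2020 to 2021"
# holds overall but is reversed inside sub-group B.
rows = [
    # (group, year, score)
    ("A", 2020, 50), ("A", 2021, 80),
    ("B", 2020, 70), ("B", 2021, 60),
]

def avg(group=None, year=None):
    """Aggregate query with optional group/year refinement."""
    vals = [s for g, y, s in rows
            if (group is None or g == group) and (year is None or y == year)]
    return mean(vals)

# The generalization, checked at the coarse aggregation level.
overall_up = avg(year=2021) > avg(year=2020)

# Refinement: drill down one level and flag opposing sub-groups.
opposing = [g for g in {g for g, _, _ in rows}
            if (avg(g, 2021) > avg(g, 2020)) != overall_up]

print(overall_up, sorted(opposing))  # True ['B']
```

A scoring method as in the paper would go further than this Boolean check, e.g. by weighting how large and how strongly opposed the flagged sub-groups are.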


2021 ◽  
Vol 50 (2) ◽  
pp. 6-17
Author(s):  
Johannes Doleschal ◽  
Benny Kimelfeld ◽  
Wim Martens

A common conceptual view of text analysis is that of a two-step process: first extract relations from text documents, then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among others, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
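The two-step view can be sketched in a few lines: a regex-based extractor plays the role of a spanner, materializing relations from text, and a relational query is then evaluated over the extracted relations. The text, patterns, and relation names are illustrative assumptions, and real spanners work over spans (offsets) rather than bare strings.

```python
import re

# Step 1: "spanner"-style extraction of two relations from text.
text = "Alice lives in Paris. Bob lives in Oslo. Alice works in Oslo."
lives = re.findall(r"(\w+) lives in (\w+)", text)   # relation lives(person, city)
works = re.findall(r"(\w+) works in (\w+)", text)   # relation works(person, city)

# Step 2: relational query over the result — people who work in a
# city where someone lives (a join on the city attribute).
answer = {(p, c) for p, c in works
          for _, c2 in lives if c == c2}
print(answer)  # {('Alice', 'Oslo')}
```

This separation is exactly what lets database notions such as query evaluation, aggregation, and provenance be transferred to the extraction setting.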


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 34 ◽  
Author(s):  
Maria-Evangelia Papadaki ◽  
Nicolas Spyratos ◽  
Yannis Tzitzikas

The continuous accumulation of multi-dimensional data and the growth of the Semantic Web and of Linked Data published in the Resource Description Framework (RDF) bring new requirements for data analytics tools. Such tools should take into account the special features of RDF graphs, exploit the semantics of RDF and support flexible aggregate queries. In this paper, we present an approach for applying analytics to RDF data based on a high-level functional query language called HIFUN. In that language, each analytical query is a well-formed expression of a functional algebra whose definition is independent of the nature and structure of the data. We investigate how HIFUN can be used to ease the formulation of analytic queries over RDF data. We detail the applicability of HIFUN to RDF and the data transformations that may be required, introduce translation rules from HIFUN queries to SPARQL, and describe a first implementation of the proposed model.
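As a rough sketch of the translation idea: an analytic query in the HIFUN style can be seen as a triple of a grouping function, a measured function, and an aggregate operation, which maps naturally to a SPARQL GROUP BY query. The predicate IRIs and the triple representation below are hypothetical illustrations, not the paper's actual translation rules.

```python
# Hypothetical analytic query: total quantity per branch.
# (grouping function, measured function, aggregate operation)
grouping, measure, op = "ex:branch", "ex:quantity", "SUM"

# A GROUP BY query in SPARQL realizing that analytic query.
sparql = f"""
PREFIX ex: <http://example.org/>
SELECT ?g ({op}(?m) AS ?agg)
WHERE {{
  ?s {grouping} ?g ;
     {measure}  ?m .
}}
GROUP BY ?g
"""
print(sparql)
```

The appeal of the functional formulation is that the same triple could equally be compiled to SQL or to an API call, since its definition does not depend on the data model.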


2021 ◽  
pp. 527-535
Author(s):  
Ningchao Ge ◽  
Peng Peng ◽  
Zheng Qin ◽  
Mingdao Li
Keyword(s):  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Somayeh Dolatnezhad Samarin ◽  
Morteza Amini

Author(s):  
Mahmoud Abo-Khamis ◽  
Sungjin Im ◽  
Benjamin Moseley ◽  
Kirk Pruhs ◽  
Alireza Samadian
Keyword(s):  

2020 ◽  
Vol 45 (4) ◽  
pp. 1-41
Author(s):  
Mahmoud Abo Khamis ◽  
Ryan R. Curtin ◽  
Benjamin Moseley ◽  
Hung Q. Ngo ◽  
Xuanlong Nguyen ◽  
...  
Keyword(s):  

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 615
Author(s):  
Robin Andeer ◽  
Måns Magnusson ◽  
Anna Wedell ◽  
Henrik Stranneheim

Coverage analysis is essential when analysing massive parallel sequencing (MPS) data. The analysis can indicate the existence of false negatives or positives in a region of interest, or of poorly covered genomic regions. Several tools perform well for coverage analysis on a few samples with predefined regions. However, no current tool collects samples over a longer period of time for aggregated coverage analysis of multiple samples or sequencing methods. Furthermore, current coverage analysis tools do not generate customized coverage reports or enable exploratory coverage analysis without extensive bioinformatic skill and access to the original alignment files.

We present Chanjo, a user-friendly coverage analysis tool for persistent storage of coverage data that, accompanied by Chanjo Report, produces coverage reports summarizing coverage data for predefined regions in an elegant manner. Chanjo Report can produce both structured coverage reports and dynamic reports tailored to a subset of genomic regions, coverage cut-offs or samples. Chanjo stores data in an SQL database to which thousands of samples can be added over time, which allows for aggregate queries to discover problematic regions. Chanjo is well tested, supports whole exome and genome sequencing, and follows common UNIX standards, allowing for easy integration into existing pipelines. Chanjo is easy to install and operate, and provides a solution for persistent coverage analysis and clinical-grade reporting. It makes it easy to set up a local database and automate the addition of multiple samples and report generation. To our knowledge there is no other tool with matching capabilities. Chanjo handles the common file formats in genetics, such as BED and BAM, and makes it easy to produce PDF coverage reports that are highly valuable for individuals with limited bioinformatic expertise. We believe Chanjo to be a vital tool for clinicians and researchers performing MPS analysis.
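The core idea of persisting per-sample coverage in SQL and running aggregate queries across samples can be sketched as follows. The schema, column names, and cut-off are invented for illustration and are not Chanjo's actual database layout.

```python
import sqlite3

# Hypothetical schema: per-sample, per-region coverage metrics
# persisted in SQL so samples can be added over time.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE coverage (
    sample TEXT, region TEXT, mean_cov REAL, completeness_10x REAL)""")
db.executemany("INSERT INTO coverage VALUES (?, ?, ?, ?)", [
    ("s1", "BRCA1", 120.0, 0.99),
    ("s2", "BRCA1", 110.0, 0.98),
    ("s1", "GeneX",  12.0, 0.40),
    ("s2", "GeneX",  15.0, 0.45),
])

# Aggregate query across samples: flag regions whose average
# completeness at 10x falls below a chosen cut-off.
rows = db.execute("""
    SELECT region, AVG(completeness_10x) AS avg_c
    FROM coverage
    GROUP BY region
    HAVING avg_c < 0.9
""").fetchall()
print(rows)  # only the poorly covered region ("GeneX") is flagged
```

Because the flagged regions emerge only when many samples are aggregated, this is precisely the kind of analysis that per-run coverage tools without persistent storage cannot offer.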

