Aggregate Queries
Recently Published Documents


TOTAL DOCUMENTS

182
(FIVE YEARS 27)

H-INDEX

17
(FIVE YEARS 2)

2021 ◽  
Vol. 17, Issue 3 ◽  
Author(s):  
Ester Livshits ◽  
Leopoldo Bertossi ◽  
Benny Kimelfeld ◽  
Moshe Sebag

We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory, and in many applications of game theory, for assessing the contribution of a player to a coalition game. It was established in the 1950s and is theoretically justified as the unique wealth-distribution measure that satisfies certain natural axioms. While this value has been investigated in several areas, it has received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers, and for the hard cases we present approximation algorithms.
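To make the setup concrete, the following sketch treats database facts as the players of a coalition game whose value function is whether a Boolean conjunctive query holds, and computes exact Shapley values by brute-force enumeration of orderings. The tiny database and query are illustrative assumptions, not examples from the paper, and enumeration is only feasible for a handful of facts.

```python
from itertools import permutations

# Toy database: each fact is a "player". The Boolean query asks
# whether R(x), S(x) is satisfiable, i.e. some value occurs in both
# R and S. Facts and query are invented for illustration.
R = [("a",), ("b",)]
S = [("a",)]
facts = [("R", t) for t in R] + [("S", t) for t in S]

def query_holds(subset):
    """v(T): 1 if the Boolean join R(x), S(x) holds on the sub-database."""
    rs = {t[0] for rel, t in subset if rel == "R"}
    ss = {t[0] for rel, t in subset if rel == "S"}
    return 1 if rs & ss else 0

def shapley(fact):
    """Exact Shapley value: the fact's marginal contribution to the
    query, averaged over all orderings of the facts."""
    total, n = 0, 0
    for perm in permutations(facts):
        before = set()
        for f in perm:
            if f == fact:
                total += query_holds(before | {f}) - query_holds(before)
                break
            before.add(f)
        n += 1
    return total / n

for f in facts:
    print(f, shapley(f))
```

Here R(a) and S(a) are symmetric and each receives value 1/2, while R(b), which never helps satisfy the query, receives 0; this is the kind of tuple-level attribution the paper studies, with the hardness coming from avoiding the factorial enumeration.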


2021 ◽  
Vol 15 (1) ◽  
pp. 59-71
Author(s):  
Yin Lin ◽  
Brit Youngmann ◽  
Yuval Moskovitch ◽  
H. V. Jagadish ◽  
Tova Milo

Generalizing from detailed data to statements in a broader context is often critical for users to make sense of large data sets. Conversely, poorly constructed generalizations may convey misleading information even when the statements are technically supported by the data. For example, a cherry-picked level of aggregation could obscure substantial sub-groups that oppose the generalization. We present a framework for detecting and explaining cherry-picked generalizations by refining aggregate queries. We present a scoring method to indicate the appropriateness of a generalization and design efficient algorithms for score computation. To provide a better understanding of the resulting score, we also formulate practical explanation tasks that disclose significant counterexamples and suggest better alternatives to the statement. We conduct experiments on real-world data sets and examples to show the effectiveness of our proposed evaluation metric and the efficiency of our algorithmic framework.
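The core counterexample search can be illustrated with a minimal sketch: check a generalization at the aggregate level, then refine the aggregate query one grouping level down and collect sub-groups whose trend opposes it. The data and the single-level drill-down are illustrative assumptions, not the paper's scoring method.

```python
from statistics import mean

# Invented data: the claim "average score rose from 2020 to 2021"
# holds overall but is reversed inside sub-group B.
rows = [
    # (group, year, score)
    ("A", 2020, 50), ("A", 2021, 80),
    ("B", 2020, 70), ("B", 2021, 60),
]

def avg(group=None, year=None):
    """Aggregate query with optional group/year refinement."""
    vals = [s for g, y, s in rows
            if (group is None or g == group) and (year is None or y == year)]
    return mean(vals)

# The generalization, checked at the coarse aggregation level.
overall_up = avg(year=2021) > avg(year=2020)

# Refinement: drill down one level and flag opposing sub-groups.
opposing = [g for g in {g for g, _, _ in rows}
            if (avg(g, 2021) > avg(g, 2020)) != overall_up]

print(overall_up, sorted(opposing))  # True ['B']
```

A scoring method as in the paper would go further than this Boolean check, e.g. by weighting how large and how strongly opposed the flagged sub-groups are.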


2021 ◽  
Vol 50 (2) ◽  
pp. 6-17
Author(s):  
Johannes Doleschal ◽  
Benny Kimelfeld ◽  
Wim Martens

A common conceptual view of text analysis is that of a two-step process: first extract relations from text documents, then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among others, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
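The two-step view can be sketched in a few lines: a regex-based extractor plays the role of a spanner, materializing relations from text, and a relational query is then evaluated over the extracted relations. The text, patterns, and relation names are illustrative assumptions, and real spanners work over spans (offsets) rather than bare strings.

```python
import re

# Step 1: "spanner"-style extraction of two relations from text.
text = "Alice lives in Paris. Bob lives in Oslo. Alice works in Oslo."
lives = re.findall(r"(\w+) lives in (\w+)", text)   # relation lives(person, city)
works = re.findall(r"(\w+) works in (\w+)", text)   # relation works(person, city)

# Step 2: relational query over the result — people who work in a
# city where someone lives (a join on the city attribute).
answer = {(p, c) for p, c in works
          for _, c2 in lives if c == c2}
print(answer)  # {('Alice', 'Oslo')}
```

This separation is exactly what lets database notions such as query evaluation, aggregation, and provenance be transferred to the extraction setting.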


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 34 ◽  
Author(s):  
Maria-Evangelia Papadaki ◽  
Nicolas Spyratos ◽  
Yannis Tzitzikas

The continuous accumulation of multi-dimensional data and the growth of the Semantic Web and of Linked Data published in the Resource Description Framework (RDF) bring new requirements for data analytics tools. Such tools should take into account the special features of RDF graphs, exploit the semantics of RDF and support flexible aggregate queries. In this paper, we present an approach for applying analytics to RDF data based on a high-level functional query language called HIFUN. In that language, each analytical query is a well-formed expression of a functional algebra whose definition is independent of the nature and structure of the data. We investigate how HIFUN can be used to ease the formulation of analytic queries over RDF data. We detail the applicability of HIFUN to RDF and the data transformations that may be required, introduce translation rules from HIFUN queries to SPARQL, and describe a first implementation of the proposed model.
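As a rough sketch of the translation idea: an analytic query in the HIFUN style can be seen as a triple of a grouping function, a measured function, and an aggregate operation, which maps naturally to a SPARQL GROUP BY query. The predicate IRIs and the triple representation below are hypothetical illustrations, not the paper's actual translation rules.

```python
# Hypothetical analytic query: total quantity per branch.
# (grouping function, measured function, aggregate operation)
grouping, measure, op = "ex:branch", "ex:quantity", "SUM"

# A GROUP BY query in SPARQL realizing that analytic query.
sparql = f"""
PREFIX ex: <http://example.org/>
SELECT ?g ({op}(?m) AS ?agg)
WHERE {{
  ?s {grouping} ?g ;
     {measure}  ?m .
}}
GROUP BY ?g
"""
print(sparql)
```

The appeal of the functional formulation is that the same triple could equally be compiled to SQL or to an API call, since its definition does not depend on the data model.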


2021 ◽  
pp. 527-535
Author(s):  
Ningchao Ge ◽  
Peng Peng ◽  
Zheng Qin ◽  
Mingdao Li
Keyword(s):  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Somayeh Dolatnezhad Samarin ◽  
Morteza Amini

Author(s):  
Mahmoud Abo-Khamis ◽  
Sungjin Im ◽  
Benjamin Moseley ◽  
Kirk Pruhs ◽  
Alireza Samadian
Keyword(s):  

2020 ◽  
Vol 45 (4) ◽  
pp. 1-41
Author(s):  
Mahmoud Abo Khamis ◽  
Ryan R. Curtin ◽  
Benjamin Moseley ◽  
Hung Q. Ngo ◽  
Xuanlong Nguyen ◽  
...  
Keyword(s):  

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 615
Author(s):  
Robin Andeer ◽  
Måns Magnusson ◽  
Anna Wedell ◽  
Henrik Stranneheim

Coverage analysis is essential when analysing massive parallel sequencing (MPS) data. The analysis can indicate the existence of false negatives or positives in a region of interest, or of poorly covered genomic regions. Several tools perform well for coverage analysis on a few samples with predefined regions. However, no current tool collects samples over a longer period of time for aggregated coverage analysis of multiple samples or sequencing methods. Furthermore, current coverage analysis tools do not generate customized coverage reports or enable exploratory coverage analysis without extensive bioinformatic skill and access to the original alignment files.

We present Chanjo, a user-friendly coverage analysis tool for persistent storage of coverage data that, accompanied by Chanjo Report, produces coverage reports summarizing coverage data for predefined regions in an elegant manner. Chanjo Report can produce both structured coverage reports and dynamic reports tailored to a subset of genomic regions, coverage cut-offs or samples. Chanjo stores data in an SQL database to which thousands of samples can be added over time, which allows for aggregate queries to discover problematic regions. Chanjo is well tested, supports whole exome and genome sequencing, and follows common UNIX standards, allowing for easy integration into existing pipelines. Chanjo is easy to install and operate, and provides a solution for persistent coverage analysis and clinical-grade reporting. It makes it easy to set up a local database and automate the addition of multiple samples and report generation. To our knowledge there is no other tool with matching capabilities. Chanjo handles the common file formats in genetics, such as BED and BAM, and makes it easy to produce PDF coverage reports that are highly valuable for individuals with limited bioinformatic expertise. We believe Chanjo to be a vital tool for clinicians and researchers performing MPS analysis.
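The core idea of persisting per-sample coverage in SQL and running aggregate queries across samples can be sketched as follows. The schema, column names, and cut-off are invented for illustration and are not Chanjo's actual database layout.

```python
import sqlite3

# Hypothetical schema: per-sample, per-region coverage metrics
# persisted in SQL so samples can be added over time.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE coverage (
    sample TEXT, region TEXT, mean_cov REAL, completeness_10x REAL)""")
db.executemany("INSERT INTO coverage VALUES (?, ?, ?, ?)", [
    ("s1", "BRCA1", 120.0, 0.99),
    ("s2", "BRCA1", 110.0, 0.98),
    ("s1", "GeneX",  12.0, 0.40),
    ("s2", "GeneX",  15.0, 0.45),
])

# Aggregate query across samples: flag regions whose average
# completeness at 10x falls below a chosen cut-off.
rows = db.execute("""
    SELECT region, AVG(completeness_10x) AS avg_c
    FROM coverage
    GROUP BY region
    HAVING avg_c < 0.9
""").fetchall()
print(rows)  # only the poorly covered region ("GeneX") is flagged
```

Because the flagged regions emerge only when many samples are aggregated, this is precisely the kind of analysis that per-run coverage tools without persistent storage cannot offer.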

