Computing possible and certain answers over order-incomplete data

On Querying Incomplete Information in Databases under Bag Semantics

2017 ◽

Cited By ~ 6

Author(s):

Marco Console ◽

Paolo Guagliardo ◽

Leonid Libkin

Keyword(s):

Incomplete Data ◽

Real Life ◽

Relational Algebra ◽

Approximation Schemes ◽

Query Rewriting ◽

Possible World ◽

First Order ◽

Minimum Number ◽

Database Technology ◽

Certain Answers

Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries, expressed either in positive relational algebra or as unions of conjunctive queries, finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency.
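
As a small illustration of the "range of the numbers of occurrences" mentioned in the abstract, the sketch below enumerates possible worlds by brute force and reports the minimum and maximum multiplicity of an answer tuple under bag semantics. It is not the authors' algorithm: the `Null` class, the relation `R`, the fixed query, and the finite domain `[3, 4]` are all assumptions made for the example, and the enumeration is exponential in the number of nulls.

```python
from collections import Counter
from itertools import product


class Null:
    """A marked null standing for an unknown value (hypothetical helper)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Null({self.name!r})"


def possible_worlds(bag, domain):
    """Yield every complete bag obtained by replacing nulls with domain values.

    Assumption: nulls range over a small finite domain, so that the
    worlds can be enumerated explicitly.
    """
    nulls = sorted(
        {v for row in bag for v in row if isinstance(v, Null)},
        key=lambda n: n.name,
    )
    for values in product(domain, repeat=len(nulls)):
        valuation = dict(zip(nulls, values))
        yield [tuple(valuation.get(v, v) for v in row) for row in bag]


def query(world):
    """Bag-semantics projection on A after selecting B = 3 (duplicates kept)."""
    return Counter((a,) for a, b in world if b == 3)


def occurrence_range(bag, domain, answer):
    """Return (min, max) multiplicity of `answer` over all possible worlds."""
    counts = [query(w)[answer] for w in possible_worlds(bag, domain)]
    return min(counts), max(counts)


x, y = Null("x"), Null("y")
R = [(1, x), (1, y), (2, 3)]  # a bag with two unknown B-values

print(occurrence_range(R, [3, 4], (1,)))  # (0, 2)
print(occurrence_range(R, [3, 4], (2,)))  # (1, 1)
```

Here the tuple `(1,)` occurs between 0 and 2 times depending on how the nulls are interpreted, while `(2,)` occurs exactly once in every world.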

Best Answers over Incomplete Data: Complexity and First-Order Rewritings

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/236 ◽

2019 ◽

Author(s):

Amélie Gheerbrant ◽

Cristina Sirangelo

Keyword(s):

Incomplete Data ◽

Decision Problem ◽

Possible Worlds ◽

Query Answering ◽

Query Rewriting ◽

Conjunctive Queries ◽

First Order ◽

Database Technology ◽

Practical Algorithm ◽

Certain Answers

Answering queries over incomplete data is ubiquitous in data management and in many AI applications that use query rewriting to take advantage of relational database technology. In these scenarios one lacks full information on the data but queries still need to be answered with certainty. The certainty aspect often makes query answering unfeasible except for restricted classes, such as unions of conjunctive queries. In addition, there are often no certain answers, or very few, so the expensive computation is in vain. Therefore we study a relaxation of certain answers called best answers. They are defined as those answers for which there is no better one (that is, no answer true in more possible worlds). When certain answers exist the two notions coincide. We compare different ways of casting query answering as a decision problem and characterise its complexity for first-order queries, showing significant differences in the behavior of best and certain answers. We then restrict attention to best answers for unions of conjunctive queries and produce a practical algorithm for finding them based on query rewriting techniques.
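
The difference between best and certain answers can be seen on a toy instance. The sketch below is a brute-force illustration, not the rewriting-based algorithm the abstract refers to. It assumes that "true in more possible worlds" means strict inclusion of the sets of supporting worlds, and it uses a hypothetical `Null` marker, relation `R`, query, and finite domain `[3, 4]` chosen only for the example.

```python
from itertools import product


class Null:
    """A marked null standing for an unknown value (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def possible_worlds(rows, domain):
    """Enumerate complete instances; assumes nulls range over a finite domain."""
    nulls = sorted(
        {v for row in rows for v in row if isinstance(v, Null)},
        key=lambda n: n.name,
    )
    for values in product(domain, repeat=len(nulls)):
        valuation = dict(zip(nulls, values))
        yield [tuple(valuation.get(v, v) for v in row) for row in rows]


def query(world):
    """Set-semantics query: return A for rows where B = 3 and C = 3."""
    return {a for a, b, c in world if b == 3 and c == 3}


def supports(rows, domain):
    """Map each answer to the set of world indices in which it is returned."""
    worlds = list(possible_worlds(rows, domain))
    support = {}
    for i, world in enumerate(worlds):
        for answer in query(world):
            support.setdefault(answer, set()).add(i)
    return support, len(worlds)


def best_answers(support):
    """Answers whose supporting worlds are not strictly contained in another's."""
    return {
        a for a, s in support.items()
        if not any(s < t for t in support.values())
    }


def certain_answers(support, n_worlds):
    """Answers returned in every possible world."""
    return {a for a, s in support.items() if len(s) == n_worlds}


x, y = Null("x"), Null("y")
R = [(1, x, 3), (2, y, 3), (3, x, y)]

support, n_worlds = supports(R, [3, 4])
best = best_answers(support)
certain = certain_answers(support, n_worlds)
print(best)     # {1, 2}
print(certain)  # set()
```

No answer holds in all four worlds, so there are no certain answers. Answers 1 and 2 are best because neither's supporting worlds are contained in the other's, while answer 3 is dominated by both since it requires `x` and `y` to equal 3 simultaneously.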

Explainable Certain Answers

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/233 ◽

2018 ◽

Cited By ~ 2

Author(s):

Giovanni Amendola ◽

Leonid Libkin

Keyword(s):

Incomplete Data ◽

General Framework ◽

Relational Databases ◽

Possible Worlds ◽

The Other ◽

Closed World ◽

Common Intersection ◽

The Common ◽

Certain Answers ◽

Natural Way

When a dataset is not fully specified and can represent many possible worlds, one commonly answers queries by computing certain answers to them. A natural way of defining certainty is to say that an answer is certain if it is consistent with query answers in all possible worlds, and is furthermore the most informative answer with this property. However, the existence and complexity of such answers is not yet well understood even for relational databases. Thus in applications one tends to use different notions, essentially the intersection of query answers in possible worlds. However, justification of such notions has long been questioned. This leads to two problems: are certain answers based on informativeness feasible in applications? and can a clean justification be provided for intersection-based notions? Our goal is to answer both. For the former, we show that such answers may not exist, or be very large, even in simple cases of querying incomplete data. For the latter, we add the concept of explanations to the notion of informativeness: it shows not only that one object is more informative than the other, but also says why this is so. This leads to a modified notion of certainty: explainable certain answers. We present a general framework for reasoning about them, and show that for open and closed world relational databases, they are precisely the common intersection-based notions of certainty.

Knowledge-Preserving Certain Answers for SQL-like Queries

Proceedings of the Seventeenth International Conference on Principles of Knowledge Representation and Reasoning ◽

10.24963/kr.2020/78 ◽

2020 ◽

Author(s):

Etienne Toussaint ◽

Paolo Guagliardo ◽

Leonid Libkin

Keyword(s):

Information Content ◽

Data Model ◽

Incomplete Data ◽

Relational Databases ◽

Missing Values ◽

A Priori ◽

Real Life ◽

Query Languages ◽

Incomplete Databases ◽

Certain Answers

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.

Handling ray transform of symmetric tensor fields and the Radon transform of differential forms on incomplete data

Daghestan Electronic Mathematical Reports ◽

10.31029/demr.1.2 ◽

2014 ◽

pp. 56-70

Author(s):

Ziyaudin Medzhidov ◽

Keyword(s):

Radon Transform ◽

Incomplete Data ◽

Differential Forms ◽

Symmetric Tensor ◽

Tensor Fields ◽

Ray Transform

Faculty Opinions recommendation of Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1846956.1388054 ◽

2010 ◽

Author(s):

Mike Kenward ◽

Rhian Daniel

Keyword(s):

Incomplete Data ◽

Robust Estimator ◽

Population Mean ◽

Doubly Robust

Random Value Imputation -- An Advanced Way of Managing Incomplete Data

SSRN Electronic Journal ◽

10.2139/ssrn.2602181 ◽

2015 ◽

Author(s):

Tathagata Mukhopadhyay

Keyword(s):

Incomplete Data

Corrigendum to “Bayesian test for asymmetry and nonstationarity in MTAR model with possibly incomplete data”

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2005.07.014 ◽

2006 ◽

Vol 50 (2) ◽

pp. 584

Author(s):

Soo Jung Park ◽

Dong Wan Shin ◽

Byeong Uk Park ◽

Woo Chul Kim ◽

Man-Suk Oh

Keyword(s):

Incomplete Data ◽

Bayesian Test

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Soft Computing ◽

10.1007/s00500-021-05590-y ◽

2021 ◽

Author(s):

Baligh Al-Helali ◽

Qi Chen ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Genetic Programming ◽

Incomplete Data ◽

Symbolic Regression ◽

Imputation Method

A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has appeared as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain the adequate values of the parameters for these three modified algorithms, with the purpose of applying them in the clustering task. We also provide an unbiased comparison among several metaheuristics based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.
