scholarly journals Computing possible and certain answers over order-incomplete data

2019 ◽  
Vol 797 ◽  
pp. 42-76
Author(s):  
Antoine Amarilli ◽  
Mouhamadou Lamine Ba ◽  
Daniel Deutch ◽  
Pierre Senellart
Author(s):  
Marco Console ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries, expressed either in positive relational algebra or as unions of conjunctive queries, finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency.


Author(s):  
Amélie Gheerbrant ◽  
Cristina Sirangelo

Answering queries over incomplete data is ubiquitous in data management and in many AI applications that use query rewriting to take advantage of relational database technology. In these scenarios one lacks full information on the data but queries still need to be answered with certainty. The certainty aspect often makes query answering unfeasible except for restricted classes, such as unions of conjunctive queries. In addition often there are no, or very few certain answers, thus expensive computation is in vain. Therefore we study a relaxation of certain answers called best answers. They are defined as those answers for which there is no better one (that is, no answer true in more possible worlds). When certain answers exist the two notions coincide. We compare different ways of casting query answering as a decision problem and characterise its complexity for first-order queries, showing significant differences in the behavior of best and certain answers.We then restrict attention to best answers for unions of conjunctive queries and produce a practical algorithm for finding them based on query rewriting techniques.


Author(s):  
Giovanni Amendola ◽  
Leonid Libkin

When a dataset is not fully specified and can represent many possible worlds, one commonly answers queries by computing certain answers to them. A natural way of defining certainty is to say that an answer is certain if it is consistent with query answers in all possible worlds, and is furthermore the most informative answer with this property. However, the existence and complexity of such answers is not yet well understood even for relational databases. Thus in applications one tends to use different notions, essentially the intersection of query answers in possible worlds. However, justification of such notions has long been questioned. This leads to two problems: are certain answers based on informativeness feasible in applications? and can a clean justification be provided for intersection-based notions? Our goal is to answer both. For the former, we show that such answers may not exist, or be very large, even in simple cases of querying incomplete data. For the latter, we add the concept of explanations to the notion of informativeness: it shows not only that one object is more informative than the other, but also says why this is so. This leads to a modified notion of certainty: explainable certain answers. We present a general framework for reasoning about them, and show that for open and closed world relational databases, they are precisely the common intersection-based notions of certainty.


Author(s):  
Etienne Toussaint ◽  
Paolo Guagliardo ◽  
Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.


2006 ◽  
Vol 50 (2) ◽  
pp. 584
Author(s):  
Soo Jung Park ◽  
Dong Wan Shin ◽  
Byeong Uk Park ◽  
Woo Chul Kim ◽  
Man-Suk Oh

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has appeared as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain the adequate values of the parameters for these three modified algorithms, with the purpose of applying them in the clustering task. We also provide an unbiased comparison among several metaheuristics based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.


Sign in / Sign up

Export Citation Format

Share Document