Preferred Repairs for Inconsistent Databases

Author(s):  
Sergio Greco ◽  
Cristina Sirangelo ◽  
Irina Trubitsyna ◽  
Ester Zumpano

The objective of this article is to investigate the problems related to the extensional integration of information sources. In particular, we propose an approach for managing inconsistent databases, that is, databases violating integrity constraints. The problem of dealing with inconsistent information has recently assumed additional relevance as it plays a key role in all the areas in which duplicate information or conflicting information is likely to occur (Agarwal et al., 1995; Arenas, Bertossi & Chomicki, 1999; Bry, 1997; Dung, 1996; Lin & Mendelzon, 1999; Subrahmanian, 1994).

Author(s):  
Sergio Flesca ◽  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints are a fundamental part of a database schema. They are generally used to define constraints on data (functional dependencies, inclusion dependencies, exclusion dependencies, etc.), and their enforcement ensures a semantically correct state of a database. As the presence of data inconsistent with respect to integrity constraints is not unusual, its management plays a key role in all the areas in which duplicate or conflicting information is likely to occur, such as database integration, data warehousing, and federated databases (Bry, 1997; Lin, 1996; Subrahmanian, 1994). It is well known that the presence of inconsistent data can be managed by “repairing” the database, that is, by providing consistent databases, obtained by a minimal set of update operations on the inconsistent original environment, or by consistently answering queries posed over the inconsistent database.


Author(s):  
Luciano Caroprese ◽  
Ester Zumpano

Data integration aims to provide uniform integrated access to multiple heterogeneous information sources designed independently and having strictly related contents. However, the integrated view, constructed by combining the information provided by the different data sources by means of a specified integration strategy, could potentially contain inconsistent data; that is, it can violate some of the constraints defined on the data. In the presence of an inconsistent integrated database, in other words, a database that does not satisfy some integrity constraints, two possible solutions have been investigated in the literature (Agarwal, Keller, Wiederhold, & Saraswat, 1995; Bry, 1997; Calì, Calvanese, De Giacomo, & Lenzerini, 2002; Dung, 1996; Grant & Subrahmanian, 1995; S. Greco & Zumpano, 2000; Lin & Mendelzon, 1999): repairing the database or computing consistent answers over the inconsistent database. Intuitively, a repair of the database consists of deleting or inserting a minimal number of tuples so that the resulting database is consistent, whereas the computation of the consistent answer consists of selecting the set of certain tuples (i.e., those belonging to all repaired databases) and the set of uncertain tuples (i.e., those belonging to a proper subset of repaired databases).


2020 ◽  
Vol 13 (10) ◽  
pp. 1682-1695
Author(s):  
Ester Livshits ◽  
Alireza Heidari ◽  
Ihab F. Ilyas ◽  
Benny Kimelfeld

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and to detect rules that are generally correct but may have a few exceptions. It also allows us to avoid overfitting and obtain constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs, but rather allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
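To make the idea of an "almost" satisfied constraint concrete, the following is a minimal sketch of one possible approximation function: the fraction of tuple pairs that violate a functional dependency, viewed as a denial constraint. It is an illustration only and is not taken from ADCMiner, which the abstract says does not fix any particular function. The names violation_ratio and approximately_holds, the threshold epsilon, and the toy rows are all assumptions made for this example.

```python
from itertools import combinations

def violation_ratio(rows, lhs, rhs):
    """Fraction of unordered tuple pairs that violate the FD lhs -> rhs.

    A pair violates the FD when both tuples agree on every lhs column
    but differ on at least one rhs column.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    bad = sum(
        1
        for a, b in pairs
        if all(a[c] == b[c] for c in lhs) and any(a[c] != b[c] for c in rhs)
    )
    return bad / len(pairs)

def approximately_holds(rows, lhs, rhs, epsilon):
    """Hypothetical acceptance test: the constraint holds up to epsilon."""
    return violation_ratio(rows, lhs, rhs) <= epsilon

# Toy data (illustrative only): one row spells the city differently.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
ratio = violation_ratio(rows, ["zip"], ["city"])
```

Here two of the six tuple pairs violate zip -> city, so the dependency fails exactly but would be accepted under a tolerance such as epsilon = 0.5. Counting violating pairs is only one option; other measures, for example the number of tuples that must be removed, would slot into the same acceptance test.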


2021 ◽  
Vol 11 (2) ◽  
pp. 79-87
Author(s):  
Meredith Carroll ◽  
Paige Sanchez ◽  
Donna Wilt

Abstract. The purpose of this study was to examine how pilots respond to conflicting information on the flight deck. In this study, 108 airline, corporate, and general aviation pilots completed an online questionnaire reporting weather, traffic, and navigation information conflicts experienced on the flight deck, including which information sources they trusted and acted on. Results indicated that weather information conflicts are most commonly experienced, and typically between a certified source in the panel and an uncertified electronic flight bag application. Most participants (a) trusted certified systems due to their accuracy, reliability, recency, and knowledge about the source, and (b) acted on the certified system due to trust, being trained and required to use it, and its indicating a more hazardous situation.


Author(s):  
Sergio Greco ◽  
Ester Zumpano

Data integration aims at providing a uniform integrated access to multiple heterogeneous information sources, which were designed independently for autonomous applications and whose contents are strictly related.


Author(s):  
Sergio Greco ◽  
Ester Zumpano

The aim of data integration is to provide a uniform integrated access to multiple heterogeneous information sources, which were designed independently for autonomous applications and whose contents are strictly related.


2014 ◽  
Vol 16 (2) ◽  
pp. e60 ◽  
Author(s):  
Katri Hämeen-Anttila ◽  
Hedvig Nordeng ◽  
Esa Kokki ◽  
Johanna Jyrkkä ◽  
Angela Lupattelli ◽  
...  

Author(s):  
Samson Abramsky ◽  
Giovanni Carù

We establish a strong link between two apparently unrelated topics: the study of conflicting information in the formal framework of valuation algebras, and the phenomena of non-locality and contextuality. In particular, we show that these peculiar features of quantum theory are mathematically equivalent to a general notion of disagreement between information sources. This result vastly generalizes previously observed connections between contextuality, relational databases, constraint satisfaction problems and logical paradoxes, and gives further proof that contextual behaviour is not a phenomenon limited to quantum physics, but pervades various domains of mathematics and computer science. The connection allows one to translate theorems, methods and algorithms from one field to the other, and paves the way for the application of generic inference algorithms to study contextuality. This article is part of the theme issue ‘Contextuality and probability in quantum mechanics and beyond’.


Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources consists of two main steps: in the first, the various relations are merged together; in the second, some tuples are removed from (or inserted into) the resulting database in order to satisfy integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture we choose, the heterogeneity of the sources to be integrated causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and have, nowadays, a wide applicability in several contexts such as semantic query optimization, cooperative query answering, database integration, and view update. Since the satisfaction of integrity constraints cannot generally be guaranteed when the database is obtained from the integration of different information sources, in the evaluation of queries we must compute answers that are consistent with the integrity constraints. The following example shows a case of inconsistency.
Example 1: Consider the following database schema consisting of the single binary relation Teaches (Course, Professor) where the attribute Course is a key for the relation. Assume there are two different instances for the relation Teaches, D1={(c1,p1),(c2,p2)} and D2={(c1,p1),(c2,p3)}. The two instances satisfy the constraint that Course is a key, but from their union we derive a relation that does not satisfy the constraint since there are two distinct tuples with the same value for the attribute Course.
In the integration of two conflicting databases, simple solutions could be based on the definition of preference criteria such as a partial order on the source information or a majority criterion (Lin & Mendelzon, 1996). However, these solutions are not generally satisfactory, and more useful solutions are those based on (1) the computation of “repairs” for the database, and (2) the computation of consistent answers (Arenas, Bertossi, & Chomicki, 1999). The computation of repairs is based on the definition of minimal sets of insertion and deletion operations so that the resulting database satisfies all constraints. The computation of consistent answers is based on the identification of tuples satisfying integrity constraints and on the selection of tuples matching the goal. For instance, for the integrated database of Example 1, we have two alternative repairs, each consisting of the deletion of one of the tuples (c2,p2) and (c2,p3). The consistent answer to a query over the relation Teaches contains the unique tuple (c1,p1), so we do not know which professor teaches course c2. Therefore, it is very important, in the presence of inconsistent data, not only to compute the set of consistent answers, but also to know which facts are unknown and whether there are possible repairs for the database.
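The repairs and the consistent answer for Example 1 can be worked through with a short sketch. It assumes the simplest setting only: a single key constraint, where every repair is obtained by deletions alone (keeping exactly one tuple per key value). General integrity constraints may also require insertions, which this sketch does not handle. The function names key_repairs and consistent_answer are hypothetical and chosen for illustration.

```python
from itertools import product

def key_repairs(tuples, key_index=0):
    """Enumerate deletion-only repairs for a single key constraint.

    Tuples are grouped by their key value; each repair keeps exactly one
    tuple from every group, so it is a maximal consistent subset.
    """
    groups = {}
    for t in sorted(set(tuples)):
        groups.setdefault(t[key_index], []).append(t)
    return [set(choice) for choice in product(*groups.values())]

def consistent_answer(repairs):
    """Split tuples into certain (in all repairs) and uncertain (in some)."""
    certain = set.intersection(*repairs)
    uncertain = set.union(*repairs) - certain
    return certain, uncertain

# The two source instances of Teaches(Course, Professor) from Example 1.
d1 = {("c1", "p1"), ("c2", "p2")}
d2 = {("c1", "p1"), ("c2", "p3")}

integrated = d1 | d2          # the union violates the key on Course
repairs = key_repairs(integrated)
certain, uncertain = consistent_answer(repairs)
```

For this input there are two repairs, one dropping (c2,p2) and one dropping (c2,p3). The certain set contains only (c1,p1), while (c2,p2) and (c2,p3) are uncertain, which matches the observation that the professor teaching c2 is unknown. Enumerating all repairs is exponential in the number of conflicting key values, so this brute-force approach is suitable only for small illustrations.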


2002 ◽  
pp. 172-202
Author(s):  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.). Nowadays integrity constraints have a wide applicability in several contexts such as semantic query optimization, cooperative query answering, database integration and view update. Often databases may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. This may happen, for instance, when the database is obtained from the integration of different information sources. The integration of knowledge from multiple sources is an important aspect in several areas such as data warehousing, database integration, automated reasoning systems and active reactive databases.

