Consistent Queries over Databases with Integrity Constraints

2009 ◽  
pp. 2051-2058
Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources consists of two main steps: first, the various relations are merged; second, tuples are removed from (or inserted into) the resulting database so that it satisfies the integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture is chosen, the heterogeneity of the sources causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to the integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and nowadays have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update.
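
A minimal sketch of the two steps in Python (the relation and data are illustrative, anticipating Example 1 below): merging two sources that each satisfy a key constraint can yield a union that violates it, which is what the second step must then resolve.

    from collections import defaultdict

    def key_violations(tuples, key_index=0):
        """Group tuples by key value; a key value mapped to more than one
        distinct tuple witnesses a violation."""
        groups = defaultdict(set)
        for t in tuples:
            groups[t[key_index]].add(t)
        return {k: g for k, g in groups.items() if len(g) > 1}

    d1 = {("c1", "p1"), ("c2", "p2")}   # each source alone satisfies the key
    d2 = {("c1", "p1"), ("c2", "p3")}

    merged = d1 | d2                    # step 1: merge the relations
    print(key_violations(merged))       # {'c2': {('c2', 'p2'), ('c2', 'p3')}}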

2002 ◽  
pp. 172-202
Author(s):  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.), and nowadays they have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update. Databases may often be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. This may happen, for instance, when the database is obtained from the integration of different information sources. The integration of knowledge from multiple sources is an important aspect of several areas, such as data warehousing, database integration, automated reasoning systems, and active and reactive databases.


Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources consists of two main steps: first, the various relations are merged; second, tuples are removed from (or inserted into) the resulting database so that it satisfies the integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture is chosen, the heterogeneity of the sources causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to the integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and nowadays have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update. Since the satisfaction of integrity constraints cannot generally be guaranteed when the database is obtained from the integration of different information sources, query evaluation must compute answers that are consistent with the integrity constraints. The following example shows a case of inconsistency.

Example 1: Consider a database schema consisting of the single binary relation Teaches(Course, Professor), where the attribute Course is a key for the relation. Assume there are two different instances of Teaches, D1 = {(c1,p1), (c2,p2)} and D2 = {(c1,p1), (c2,p3)}. Each instance satisfies the constraint that Course is a key, but their union yields a relation that does not, since it contains two distinct tuples with the same value for the attribute Course.

In the integration of two conflicting databases, simple solutions could be based on preference criteria such as a partial order on the source information or a majority criterion (Lin & Mendelzon, 1996). However, these solutions are not generally satisfactory; more useful are those based on (1) the computation of “repairs” for the database, and (2) the computation of consistent answers (Arenas, Bertossi, & Chomicki, 1999). The computation of repairs is based on the definition of minimal sets of insertion and deletion operations such that the resulting database satisfies all constraints. The computation of consistent answers is based on the identification of tuples satisfying the integrity constraints and on the selection of tuples matching the goal. For instance, for the integrated database of Example 1, there are two alternative repairs, consisting of the deletion of one of the tuples (c2,p2) and (c2,p3). The consistent answer to a query over the relation Teaches contains only the tuple (c1,p1), so we do not know which professor teaches course c2. In the presence of inconsistent data, therefore, it is important not only to compute the set of consistent answers, but also to know which facts are unknown and whether repairs of the database exist.
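
A brute-force sketch of these two notions in Python for the key constraint of Example 1 (illustrative only, not the authors' algorithm): a repair here is a maximal subset of the database that satisfies the key, and the consistent answer is the set of tuples present in every repair.

    from itertools import combinations

    def satisfies_key(tuples, key_index=0):
        keys = [t[key_index] for t in tuples]
        return len(keys) == len(set(keys))

    def repairs(db, key_index=0):
        """All maximal consistent subsets of db (deletion-only repairs).
        Brute force and exponential: for illustration only."""
        db, result = list(db), []
        for r in range(len(db), -1, -1):            # larger subsets first
            for subset in combinations(db, r):
                if satisfies_key(subset, key_index) and \
                   not any(set(subset) < rep for rep in result):
                    result.append(set(subset))
        return result

    def consistent_answer(db, key_index=0):
        """Tuples true in every repair."""
        reps = repairs(db, key_index)
        return set.intersection(*reps) if reps else set()

    teaches = {("c1", "p1"), ("c2", "p2"), ("c2", "p3")}
    print(repairs(teaches))            # two repairs, each dropping one c2 tuple
    print(consistent_answer(teaches))  # {('c1', 'p1')}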


Author(s):  
Sergio Flesca ◽  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints are a fundamental part of a database schema. They are generally used to define constraints on data (functional dependencies, inclusion dependencies, exclusion dependencies, etc.), and their enforcement ensures a semantically correct database state. Since the presence of data inconsistent with respect to integrity constraints is not unusual, managing such data plays a key role in all the areas in which duplicate or conflicting information is likely to occur, such as database integration, data warehousing, and federated databases (Bry, 1997; Lin, 1996; Subrahmanian, 1994). It is well known that inconsistent data can be managed by “repairing” the database, that is, by providing consistent databases obtained from the inconsistent original one by a minimal set of update operations, or by consistently answering queries posed over the inconsistent database.
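
A small sketch of repairing with a constraint class where insertions, rather than deletions, are the natural minimal update (the schema and the inclusion dependency Teaches[Professor] ⊆ Professor[Name] are assumed for illustration and are not from the chapter):

    def repair_inclusion(teaches, professors):
        """Insertion-only repair of Teaches[Professor] ⊆ Professor[Name]:
        add a Professor tuple for every professor referenced in Teaches.
        Inserting exactly the missing names is the minimal insertion set."""
        missing = {p for (_, p) in teaches} - {name for (name,) in professors}
        return professors | {(name,) for name in missing}

    teaches = {("c1", "p1"), ("c2", "p2")}
    professors = {("p1",)}
    print(repair_inclusion(teaches, professors))   # {('p1',), ('p2',)}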


Author(s):  
Pongtawat Chippimolchai ◽  
Kiyoshi Akama ◽  
Vilas Wuwongse

We develop a semantic query optimization framework for deductive databases based on equivalent transformation (ET) rules. ET rules, prepared from the semantic knowledge of a database, such as its integrity constraints, transform given queries into syntactically different but semantically equivalent, more efficient forms. We formally prove the correctness of query transformations by ET rules. For efficiency, we propose a two-phase heuristic-based strategy to guide query transformations and introduce a condition-based control strategy to prevent unwanted, unnecessary transformations. We give examples demonstrating the possible optimizations.
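
A toy sketch of one such transformation in Python (the rule, the query representation, and the key constraint Course → Professor on Teaches are illustrative stand-ins, not the paper's formalism): a key constraint licenses removing a redundant join atom, yielding a syntactically different but semantically equivalent query.

    def apply_key_et_rule(atoms):
        """ET rule from the key Course -> Professor on teaches: two teaches
        atoms sharing the Course argument must bind the same Professor, so
        unify the Professor variables and drop the redundant atom."""
        for i, (r1, (c1, p1)) in enumerate(atoms):
            for j, (r2, (c2, p2)) in enumerate(atoms):
                if i < j and r1 == r2 == "teaches" and c1 == c2:
                    rest = [a for k, a in enumerate(atoms) if k != j]
                    # substitute p2 by p1 in the remaining atoms
                    return [(r, tuple(p1 if v == p2 else v for v in args))
                            for r, args in rest]
        return atoms

    # teaches(C, P1), teaches(C, P2)  ~>  teaches(C, P1)  (with P2 := P1)
    query = [("teaches", ("C", "P1")), ("teaches", ("C", "P2"))]
    print(apply_key_et_rule(query))    # [('teaches', ('C', 'P1'))]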


Author(s):  
B Sathiya ◽  
T.V. Geetha

The prime textual sources used for ontology learning are a domain corpus and large, dynamic text from web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that utilizes different sources of text, namely a corpus, web pages, and the massive probabilistic knowledge base Probase, for effective automated construction of an ontology. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web-page-based two-level semantic query formation methodology using lexical-syntactic patterns (LSP), together with a novel scoring measure, Fitness, built on Probase, is proposed. In addition, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring and the Domain and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
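
A rough sketch of the flavor of these ingredients in Python (the pattern, the scoring, and the data are illustrative stand-ins; the paper's Fitness measure over Probase and its exact COS formula are not reproduced here):

    import re

    SUCH_AS = re.compile(r"(\w+)\s+such\s+as\s+(\w+)", re.IGNORECASE)

    def taxonomic_candidates(text):
        """(hypernym, hyponym) pairs matched by the Hearst-style LSP
        'X such as Y'."""
        return [(m.group(1).lower(), m.group(2).lower())
                for m in SUCH_AS.finditer(text)]

    def cooccurrence_score(pair, sentences):
        """Crude co-occurrence strength: the fraction of sentences that
        mention both terms of the candidate pair."""
        hits = sum(1 for s in sentences
                   if pair[0] in s.lower() and pair[1] in s.lower())
        return hits / len(sentences) if sentences else 0.0

    corpus = ["Constraints such as keys restrict data.",
              "Keys identify tuples."]
    pairs = taxonomic_candidates(corpus[0])
    print(pairs)                                           # [('constraints', 'keys')]
    print([cooccurrence_score(p, corpus) for p in pairs])  # [0.5]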


2020 ◽  
Vol 13 (10) ◽  
pp. 1682-1695
Author(s):  
Ester Livshits ◽  
Alireza Heidari ◽  
Ihab F. Ilyas ◽  
Benny Kimelfeld

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and to detect rules that are generally correct but may have a few exceptions. It also helps avoid overfitting, yielding constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs, but rather allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
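
A sketch of one approximation function that fits this setting, the ratio of violating tuple pairs (analogous to the classic g1 measure for FDs); ADCMiner itself is parametric in this choice, and the code below is illustrative, not the paper's implementation.

    from itertools import permutations

    def violation_ratio(tuples, predicates):
        """A denial constraint forbids any ordered tuple pair satisfying all
        of its predicates; return the fraction of pairs that do."""
        pairs = list(permutations(tuples, 2))
        bad = sum(1 for s, t in pairs if all(p(s, t) for p in predicates))
        return bad / len(pairs) if pairs else 0.0

    # DC stating a key as a denial constraint: no two tuples agree on Course
    # but differ on Professor.
    same_course = lambda s, t: s[0] == t[0]
    diff_prof   = lambda s, t: s[1] != t[1]

    db = [("c1", "p1"), ("c2", "p2"), ("c2", "p3")]
    print(violation_ratio(db, [same_course, diff_prof]))  # 2/6 ≈ 0.33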

