Satisfaction and Implication of Integrity Constraints in Ontology-based Data Access

Author(s):  
Charalampos Nikolaou ◽  
Bernardo Cuenca Grau ◽  
Egor V. Kostylev ◽  
Mark Kaminski ◽  
Ian Horrocks

We extend ontology-based data access with integrity constraints over both the source and target schemas. The relevant reasoning problems in this setting are constraint satisfaction—to check whether a database satisfies the target constraints given the mappings and the ontology—and source-to-target (resp., target-to-source) constraint implication, which is to check whether a target constraint (resp., a source constraint) is satisfied by each database satisfying the source constraints (resp., the target constraints). We establish decidability and complexity bounds for all these problems in the case where ontologies are expressed in DL-Lite_R and constraints range from functional dependencies to disjunctive tuple-generating dependencies.
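
To make the plain database side of constraint satisfaction concrete, here is a minimal sketch (with hypothetical relation and attribute names) of checking whether a relation satisfies a functional dependency, the simplest kind of constraint considered above; the actual problem additionally routes the check through the mappings and the DL-Lite_R ontology.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check whether a relation (a list of dicts) satisfies the
    functional dependency lhs -> rhs: no two rows may agree on all
    lhs attributes while disagreeing on some rhs attribute."""
    seen = {}  # lhs-projection -> rhs-projection
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.get(key, val) != val:
            return False
        seen[key] = val
    return True

# Hypothetical toy instance: each employee should have one department.
emp = [
    {"emp": "alice", "dept": "sales"},
    {"emp": "bob",   "dept": "hr"},
    {"emp": "alice", "dept": "hr"},   # violates emp -> dept
]
print(satisfies_fd(emp, ["emp"], ["dept"]))  # False
```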

2020 ◽  
Vol 13 (10) ◽  
pp. 1682-1695
Author(s):  
Ester Livshits ◽  
Alireza Heidari ◽  
Ihab F. Ilyas ◽  
Benny Kimelfeld

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and detect rules that are generally correct but may have a few exceptions. It also allows us to avoid overfitting and obtain constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs, but rather allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
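
As a rough illustration of what an approximation function may look like (one natural choice in the spirit of the axioms the paper postulates, not ADCMiner itself), the sketch below measures the fraction of ordered tuple pairs that violate a denial constraint; attribute names are hypothetical.

```python
from itertools import permutations

def violating_pairs(rows, witnesses):
    """Count ordered tuple pairs (t, t2) that jointly violate a DC.
    `witnesses(t, t2)` returns True iff the pair satisfies every
    predicate of the DC body, i.e., witnesses a violation."""
    return sum(1 for t, t2 in permutations(rows, 2) if witnesses(t, t2))

def approximation_degree(rows, witnesses):
    """One candidate approximation function: the fraction of ordered
    pairs that violate the constraint (0.0 means exactly satisfied)."""
    n = len(rows)
    total = n * (n - 1)
    return violating_pairs(rows, witnesses) / total if total else 0.0

# Hypothetical DC: no employee may earn more than a more senior one.
rows = [
    {"salary": 90, "years": 10},
    {"salary": 95, "years": 2},   # violates the DC with the first row
    {"salary": 60, "years": 1},
]
viol = lambda t, t2: t["salary"] > t2["salary"] and t["years"] < t2["years"]
print(approximation_degree(rows, viol))  # ~0.17: "almost" satisfied
```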


2018 ◽  
Vol 61 ◽  
pp. 171-213 ◽  
Author(s):  
Sergio Abriola ◽  
Pablo Barceló ◽  
Diego Figueira ◽  
Santiago Figueira

Bisimulation provides structural conditions to characterize indistinguishability, from the point of view of an external observer, between nodes of labeled graphs. It is a fundamental notion used in many areas, such as verification, graph-structured databases, and constraint satisfaction. However, several current applications use graphs whose nodes also contain data (so-called "data graphs"), and where observers can test for equality or inequality of data values (e.g., requiring the attribute 'name' of a node to differ from that of all its neighbors). The present work constitutes a first investigation of "data-aware" bisimulations on data graphs. We study the problem of computing such bisimulations, based on the observational indistinguishability for XPath (a language that extends modal logics like PDL with tests for data equality) with and without transitive closure operators. We show that in general the problem is PSpace-complete, but we identify several restrictions that yield better complexity bounds (coNP, PTime) by controlling suitable parameters of the problem, namely the amount of non-locality allowed and the class of models considered (graphs, DAGs, trees). In particular, this analysis yields a hierarchy of tractable fragments.
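
A minimal sketch of the flavor of computation involved, assuming a naive greatest-fixpoint algorithm rather than the paper's techniques: node pairs are repeatedly discarded until every surviving pair agrees on labels and can match each other's moves while preserving a local data-equality test.

```python
def data_bisimulation(nodes, edges, label, data):
    """Naive greatest-fixpoint computation of a 'data-aware'
    bisimulation (a sketch, not the paper's algorithm).
    edges: dict node -> set of successors; label/data: dicts."""
    rel = {(u, v) for u in nodes for v in nodes if label[u] == label[v]}

    def matched(a, b, r):
        # Every a-successor must be matched by a related b-successor
        # that answers the local data test "successor value equals
        # source value" in the same way.
        return all(
            any((a2, b2) in r and
                (data[a2] == data[a]) == (data[b2] == data[b])
                for b2 in edges.get(b, ()))
            for a2 in edges.get(a, ()))

    changed = True
    while changed:
        changed = False
        for (u, v) in list(rel):
            if not (matched(u, v, rel) and matched(v, u, rel)):
                rel.discard((u, v))
                rel.discard((v, u))
                changed = True
    return rel

# Two one-edge chains that differ only in whether the data value changes:
nodes = ["a", "b", "c", "d"]
edges = {"a": {"b"}, "c": {"d"}}
label = {n: "P" for n in nodes}
data = {"a": 1, "b": 1, "c": 1, "d": 2}
print(("a", "c") in data_bisimulation(nodes, edges, label, data))  # False
```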


2009 ◽  
pp. 2051-2058
Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources consists of two main steps: a first step in which the various relations are merged together, and a second step in which some tuples are removed from (or inserted into) the resulting database in order to satisfy integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture we choose, the heterogeneity of the sources to be integrated causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and nowadays have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update.
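
A toy rendering of the two steps just described, under the simplest possible assumptions (set-union merge; repair by deleting every tuple involved in a key conflict); both the relation names and the deletion policy are illustrative only.

```python
def integrate(rel_a, rel_b, key):
    """Step 1: merge the source relations (set union of tuples).
    Step 2: repair w.r.t. the key constraint `key` by dropping every
    group of key-equal tuples that disagree (a cautious repair that
    keeps only undisputed facts)."""
    merged = [dict(t) for t in {tuple(sorted(r.items()))
                                for r in rel_a + rel_b}]
    by_key = {}
    for t in merged:
        by_key.setdefault(tuple(t[a] for a in key), []).append(t)
    return [ts[0] for ts in by_key.values() if len(ts) == 1]

a = [{"emp": "alice", "dept": "sales"}]
b = [{"emp": "alice", "dept": "hr"}, {"emp": "bob", "dept": "hr"}]
print(integrate(a, b, key=["emp"]))  # only bob survives the repair
```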


Author(s):  
Sergio Flesca ◽  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints are a fundamental part of a database schema. They are generally used to define constraints on data (functional dependencies, inclusion dependencies, exclusion dependencies, etc.), and their enforcement ensures a semantically correct state of a database. As the presence of data inconsistent with respect to integrity constraints is not unusual, its management plays a key role in all the areas in which duplicate or conflicting information is likely to occur, such as database integration, data warehousing, and federated databases (Bry, 1997; Lin, 1996; Subrahmanian, 1994). It is well known that the presence of inconsistent data can be managed by "repairing" the database, that is, by providing consistent databases obtained through a minimal set of update operations on the original inconsistent database, or by consistently answering queries posed over the inconsistent database.
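
The two options mentioned last, repairing and consistent query answering, can be sketched for the special case of a key constraint: each repair keeps exactly one tuple from every group of key-equal tuples, and a Boolean query is consistently true iff it holds in all repairs. A sketch under these assumptions, not a general algorithm.

```python
from itertools import product

def repairs(rows, key):
    """Enumerate all minimal deletion-repairs w.r.t. the key `key`:
    keep exactly one tuple from each group of key-equal tuples."""
    groups = {}
    for t in rows:
        groups.setdefault(tuple(t[a] for a in key), []).append(t)
    for choice in product(*groups.values()):
        yield list(choice)

def consistent_answer(rows, key, query):
    """A Boolean query is consistently true iff true in every repair."""
    return all(query(rep) for rep in repairs(rows, key))

rows = [{"emp": "alice", "dept": "sales"},
        {"emp": "alice", "dept": "hr"}]   # key 'emp' is violated
q = lambda db: any(t["emp"] == "alice" for t in db)
print(consistent_answer(rows, ["emp"], q))  # True: alice is in every repair
```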


2016 ◽  
Vol 25 (03) ◽  
pp. 1650008
Author(s):  
L. Nourine ◽  
R. Ragab ◽  
F. Toumani

Automatic synthesis of web services business protocols (BPs) aims at solving algorithmically the problem of deriving a mediator that realizes a BP of a target service using a set of specifications of available services. This problem, and its variants, has given rise to a large body of fundamental research over the last decade. However, existing works considered this problem under the restriction that the number of instances of an available service that can be involved in a composition is bounded by a constant k which is fixed a priori. This paper investigates the unbounded variant of this problem using a formal framework in which web service BPs are described by means of finite state machines (FSMs). We show that in this context, the protocol synthesis problem can be reduced to that of testing the simulation preorder between an FSM and an (infinitely) iterated product of FSMs. Existing results regarding closely related decision problems in the context of the so-called shuffle languages are rather negative and cannot be directly exploited in our context. In this paper, we develop a novel technique to prove the decidability of testing simulation in our case of interest. We provide complexity bounds for the general protocol synthesis problem and identify two cases of particular interest, namely loop-free target services and hybrid-state-free component services, for which protocol synthesis is shown to be respectively NP-complete and EXPTIME-complete.
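
For the bounded case, testing the simulation preorder between two FSMs reduces to a standard greatest-fixpoint computation, sketched below; the paper's actual contribution, simulation against an infinitely iterated product, requires the novel technique and is not captured by this naive check.

```python
def simulates(a, b):
    """Test whether machine B simulates machine A. A machine is a
    triple (states, init, delta) with delta: dict mapping
    (state, action) -> set of successor states."""
    sa, ia, da = a
    sb, ib, db = b
    rel = {(p, q) for p in sa for q in sb}  # start from all pairs
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            # every p-move must be matched by a q-move on the same
            # action, landing in a still-related pair of states
            ok = all(
                any((p2, q2) in rel for q2 in db.get((q, act), ()))
                for (s, act), succs in da.items() if s == p
                for p2 in succs)
            if not ok:
                rel.discard((p, q))
                changed = True
    return (ia, ib) in rel

da = {("p0", "a"): {"p1"}}
db_ = {("q0", "a"): {"q1"}}
print(simulates(({"p0", "p1"}, "p0", da),
                ({"q0", "q1"}, "q0", db_)))  # True
```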


2020 ◽  
Vol 34 (03) ◽  
pp. 2790-2797
Author(s):  
Marco Console ◽  
Maurizio Lenzerini

Ontology-based data management (OBDM) is a powerful knowledge-oriented paradigm for managing data spread over multiple heterogeneous sources. In OBDM, the data sources of an information system are handled through the reconciled view provided by an ontology, i.e., a conceptualization of the underlying domain of interest expressed in some formal language. In any information system where the basic knowledge resides in data sources, it is of paramount importance to specify the acceptable states of such information. Usually, this is done via integrity constraints, i.e., requirements that the data must satisfy, formally expressed in some specific language. However, while the semantics of integrity constraints are clear in the context of databases, the presence of inferred information, typical of OBDM systems, considerably complicates the matter. In this paper, we establish a novel framework for integrity constraints in OBDM scenarios, based on the notion of the knowledge state of the information system. For integrity constraints in this framework, we define a language based on epistemic logic, and we study the decidability and complexity of both checking satisfaction and performing different forms of static analysis on them.
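
The epistemic reading can be illustrated with a minimal sketch, under the simplifying assumption that the knowledge state is abstracted as the set of facts the system is certain about: a constraint is evaluated over what is known, not over each model of the ontology separately. Predicates here are hypothetical.

```python
def check_epistemic_ic(certain_facts, constraint):
    """Evaluate an integrity constraint over the knowledge state,
    i.e., over the facts the system is certain about, rather than
    over every model of the ontology (the epistemic reading)."""
    return constraint(certain_facts)

# Hypothetical IC: every known employee must have a known department.
certain = {("Emp", "alice"), ("Dept", "alice", "sales")}

def ic(kb):
    employees = [f[1] for f in kb if f[0] == "Emp"]
    return all(any(f[0] == "Dept" and f[1] == e for f in kb)
               for e in employees)

print(check_epistemic_ic(certain, ic))  # True
```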


2013 ◽  
Vol 4 (3) ◽  
pp. 17-30 ◽  
Author(s):  
Oliver Curé ◽  
Fadhela Kerdjoudj ◽  
David Faye ◽  
Chan Le Duc ◽  
Myriam Lamolle

NoSQL stores are emerging as an efficient alternative to relational database management systems in the context of big data. Many actors in this domain consider that, to gain wider adoption, several extensions have to be integrated: richer schema support, adapted declarative query languages, and integrity constraints to control data consistency and enhance data quality. The authors consider that these issues can be dealt with in the context of Ontology-Based Data Access (OBDA). OBDA is a new data management paradigm that exploits the semantic knowledge represented in ontologies when querying data stored in a database. They provide a proof of concept of OBDA's ability to tackle these three issues in a social application related to the medical domain.
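
The mapping side of such a setup can be sketched in a few lines, assuming a hypothetical document store of patient records: documents are exposed as ontology-level triples, over which declarative queries and constraints can then be formulated. This is an illustration of the idea, not the authors' system.

```python
def map_documents(docs):
    """Toy OBDA-style mapping from a document store to ontology-level
    triples (predicates and vocabulary are hypothetical)."""
    triples = set()
    for d in docs:
        pid = d["id"]
        triples.add((pid, "rdf:type", ":Patient"))
        for drug in d.get("drugs", []):
            triples.add((pid, ":takes", drug))
    return triples

docs = [{"id": "p1", "drugs": ["aspirin"]}, {"id": "p2"}]
triples = map_documents(docs)
# Declarative query over the virtual triples: who takes something?
print({s for (s, p, o) in triples if p == ":takes"})  # {'p1'}
```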


2013 ◽  
Vol 48 ◽  
pp. 115-174 ◽  
Author(s):  
A. Calì ◽  
G. Gottlob ◽  
M. Kifer

The chase algorithm is a fundamental tool for query evaluation and for testing query containment under tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). So far, most of the research on this topic has focused on cases where the chase procedure terminates. This paper introduces expressive classes of TGDs defined via syntactic restrictions: guarded TGDs (GTGDs) and weakly guarded sets of TGDs (WGTGDs). For these classes, the chase procedure is not guaranteed to terminate and thus may have an infinite outcome. Nevertheless, we prove that the problems of conjunctive-query answering and query containment under such TGDs are decidable. We provide decision procedures and tight complexity bounds for these problems. Then we show how EGDs can be incorporated into our results by providing conditions under which EGDs do not harmfully interact with TGDs and do not affect the decidability and complexity of query answering. We show applications of the aforesaid classes of constraints to the problem of answering conjunctive queries in F-Logic Lite, an object-oriented ontology language, and in some tractable Description Logics.
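
The following is a compact sketch of the (oblivious) chase underlying these results, with hypothetical predicates: TGDs are fired on homomorphic matches of their bodies, existential head variables are replaced by fresh labelled nulls, and, as the paper stresses, the procedure need not terminate, hence the explicit step bound.

```python
import itertools

fresh = itertools.count()

def matches(body, facts, subst=None):
    """Enumerate substitutions (variable -> constant) under which every
    atom of `body` occurs in `facts`. Variables start with '?'."""
    subst = subst or {}
    if not body:
        yield dict(subst)
        return
    pred, *args = body[0]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(body[0]):
            continue
        s, ok = dict(subst), True
        for a, c in zip(args, fact[1:]):
            if a.startswith("?"):
                if s.setdefault(a, c) != c:
                    ok = False
                    break
            elif a != c:
                ok = False
                break
        if ok:
            yield from matches(body[1:], facts, s)

def chase(facts, tgds, max_steps=10):
    """Oblivious chase: fire each TGD once per body match, inventing a
    fresh null for each existential head variable."""
    facts, fired = set(facts), set()
    for _ in range(max_steps):
        new = set()
        for i, (body, head) in enumerate(tgds):
            for s in matches(body, facts):
                key = (i, tuple(sorted(s.items())))
                if key in fired:
                    continue
                fired.add(key)
                loc = dict(s)
                for pred, *args in head:
                    for a in args:
                        if a.startswith("?") and a not in loc:
                            loc[a] = f"_n{next(fresh)}"  # fresh null
                    new.add((pred, *[loc.get(a, a) for a in args]))
        if not new - facts:
            return facts  # fixpoint: the chase terminates
        facts |= new
    return facts  # bound hit: the chase may be infinite

# person(x) -> exists y: parent(x, y), person(y) -- never terminates.
tgds = [([("person", "?x")], [("parent", "?x", "?y"), ("person", "?y")])]
print(sorted(chase({("person", "adam")}, tgds, max_steps=3)))
```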


2012 ◽  
Vol 12 (4-5) ◽  
pp. 701-718 ◽  
Author(s):  
MARIO ALVIANO ◽  
WOLFGANG FABER ◽  
NICOLA LEONE ◽  
MARCO MANNA

Datalog is one of the best-known rule-based languages, and extensions of it are used in a wide variety of applications. An important Datalog extension is Disjunctive Datalog, which significantly increases the expressivity of the basic language. Disjunctive Datalog is useful in a wide range of applications, ranging from databases (e.g., data integration) to artificial intelligence (e.g., diagnosis and planning under incomplete knowledge). However, in recent years an important shortcoming of Datalog-based languages became evident, e.g., in the context of data integration (consistent query answering, ontology-based data access) and Semantic Web applications: the language does not permit any generation of, and reasoning with, unnamed individuals in an obvious way. In general, it is weak in supporting many cases of existential quantification. To overcome this problem, Datalog∃ has recently been proposed, which extends traditional Datalog by existential quantification in rule heads. In this work, we propose a natural extension of Disjunctive Datalog and Datalog∃, called Datalog∃,∨, which allows both disjunctions and existential quantification in rule heads and is therefore an attractive language for knowledge representation and reasoning, especially in domains where ontology-based reasoning is needed. We formally define the syntax and semantics of Datalog∃,∨, and provide a notion of instantiation, which we prove to be adequate for Datalog∃,∨. A main issue of Datalog∃, and hence also of Datalog∃,∨, is that decidability is no longer guaranteed for typical reasoning tasks. In order to address this issue, we identify many decidable fragments of the language, which extend, in a natural way, analogous classes defined in the non-disjunctive case. Moreover, we carry out an in-depth complexity analysis, deriving interesting results which range from Logarithmic Space to Exponential Time.
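
As an illustration of the two head constructs (all predicates hypothetical), a single Datalog∃,∨ rule can state that every person either has some, possibly unnamed, parent or is an orphan:

    ∃Y hasParent(X, Y) ∨ orphan(X) :- person(X)

Reasoning over such a rule must consider both disjuncts and, in the first, an invented individual for Y; this interplay between disjunction and existential quantification is precisely where decidability becomes delicate.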

