Fuzzy Functional Dependencies (FFDs) as Integrity Constraints

1998 ◽  
pp. 119-134
Author(s):  
Guoqing Chen
2020 ◽  
Vol 13 (10) ◽  
pp. 1682-1695
Author(s):  
Ester Livshits ◽  
Alireza Heidari ◽  
Ihab F. Ilyas ◽  
Benny Kimelfeld

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints, including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs from data, that is, DCs that are "almost" satisfied. Approximation allows us to discover more accurate constraints in inconsistent databases and to detect rules that are generally correct but may have a few exceptions. It also allows us to avoid overfitting and to obtain constraints that are more general, more natural, and less contrived. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific approximation function for DCs, but rather allows for arbitrary approximation functions that satisfy some natural axioms that we define in the paper. We also show how our algorithm can be combined with sampling to return highly accurate results considerably faster.
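One common way to quantify how "almost satisfied" a DC is — a candidate approximation function of the kind the paper axiomatizes, not necessarily the one used by ADCMiner — is the fraction of tuple pairs that jointly satisfy all of the DC's predicates (and therefore violate it). A minimal Python sketch, with a hypothetical predicate-based DC encoding:

```python
from itertools import combinations

def violation_ratio(rows, dc_predicates):
    """Fraction of tuple pairs on which every predicate of the denial
    constraint holds simultaneously, i.e., pairs that violate the DC.
    0.0 means the DC holds exactly; small values mean "almost" satisfied."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    violating = sum(
        1 for s, t in pairs
        if all(pred(s, t) for pred in dc_predicates)
    )
    return violating / len(pairs)

# Hypothetical DC: no two employees in the same department may have
# salaries differing by more than a factor of 10.
rows = [
    {"dept": "R&D", "salary": 100},
    {"dept": "R&D", "salary": 120},
    {"dept": "R&D", "salary": 2000},  # the kind of exception approximation tolerates
]
dc = [
    lambda s, t: s["dept"] == t["dept"],
    lambda s, t: max(s["salary"], t["salary"]) > 10 * min(s["salary"], t["salary"]),
]
print(violation_ratio(rows, dc))  # 2 of 3 pairs violate -> ~0.67
```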


2009 ◽  
pp. 2051-2058
Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources consists of two main steps: in the first, the various relations are merged together; in the second, some tuples are removed from (or inserted into) the resulting database in order to satisfy integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture we choose, the heterogeneity of the sources to be integrated causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and nowadays have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update.
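As an illustration of the two steps — simplified to a single key constraint and one deletion policy, not taken from the chapter itself — the following Python sketch first merges two source relations and then repairs the result by removing conflicting tuples:

```python
def merge(*sources):
    """Step 1: union the tuples coming from the different sources."""
    merged = []
    for relation in sources:
        merged.extend(relation)
    return merged

def repair_by_deletion(rows, key):
    """Step 2: enforce a key constraint by deleting tuples whose key value
    was already seen with a conflicting payload. Which tuple to keep is one
    simple policy; choosing it is exactly what repair semantics must decide."""
    seen = {}
    repaired = []
    for row in rows:
        k = row[key]
        if k not in seen:
            seen[k] = row
            repaired.append(row)
        # exact duplicates are silently absorbed; conflicting tuples dropped
    return repaired

source_a = [{"ssn": 1, "city": "Rome"}]
source_b = [{"ssn": 1, "city": "Milan"}, {"ssn": 2, "city": "Turin"}]
print(repair_by_deletion(merge(source_a, source_b), key="ssn"))
# [{'ssn': 1, 'city': 'Rome'}, {'ssn': 2, 'city': 'Turin'}]
```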


Author(s):  
Sergio Flesca ◽  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints are a fundamental part of a database schema. They are generally used to define constraints on data (functional dependencies, inclusion dependencies, exclusion dependencies, etc.), and their enforcement ensures a semantically correct state of a database. As the presence of data inconsistent with respect to integrity constraints is not unusual, managing such inconsistency plays a key role in all the areas in which duplicate or conflicting information is likely to occur, such as database integration, data warehousing, and federated databases (Bry, 1997; Lin, 1996; Subrahmanian, 1994). It is well known that inconsistent data can be managed by "repairing" the database, that is, by providing consistent databases obtained from the original inconsistent one through a minimal set of update operations, or by consistently answering queries posed over the inconsistent database.
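A hedged sketch of the two options just mentioned, restricted to a single key constraint (the constraint and the minimality criterion are simplifying assumptions, not the chapter's formal definitions): each repair keeps exactly one tuple per key value, and a consistent answer is one returned by the query in every repair.

```python
from itertools import product
from functools import reduce

def repairs(rows, key):
    """All repairs under one key constraint: for each key value, keep
    exactly one of the tuples sharing it (deletion-minimal repairs)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return [list(choice) for choice in product(*groups.values())]

def consistent_answers(rows, key, query):
    """Consistent query answering: keep only answers the query returns
    in *every* repair of the inconsistent database."""
    per_repair = [set(query(r)) for r in repairs(rows, key)]
    return reduce(set.intersection, per_repair)

db = [
    {"ssn": 1, "city": "Rome"},
    {"ssn": 1, "city": "Milan"},  # conflicts with the tuple above
    {"ssn": 2, "city": "Turin"},
]
# "Which cities appear?" -- Turin is certain; Rome/Milan depend on the repair.
print(consistent_answers(db, "ssn", lambda rows: {r["city"] for r in rows}))
# {'Turin'}
```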


Author(s):  
Charalampos Nikolaou ◽  
Bernardo Cuenca Grau ◽  
Egor V. Kostylev ◽  
Mark Kaminski ◽  
Ian Horrocks

We extend ontology-based data access with integrity constraints over both the source and target schemas. The relevant reasoning problems in this setting are constraint satisfaction—to check whether a database satisfies the target constraints given the mappings and the ontology—and source-to-target (resp., target-to-source) constraint implication, which is to check whether a target constraint (resp., a source constraint) is satisfied by each database satisfying the source constraints (resp., the target constraints). We establish decidability and complexity bounds for all these problems in the case where ontologies are expressed in DL-LiteR and constraints range from functional dependencies to disjunctive tuple-generating dependencies.
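Setting aside the ontology reasoning (the hard, DL-LiteR-specific part of the paper), the constraint-satisfaction check can be pictured as: materialize the target relations via the mappings, then test the target constraint on the result. A simplified Python sketch with a hypothetical GAV-style mapping, offered only to fix intuitions:

```python
def materialize_target(source_db, mappings):
    """Apply GAV-style mappings: each mapping turns the rows of a source
    relation into rows of a target relation."""
    target = {}
    for src_rel, tgt_rel, transform in mappings:
        target.setdefault(tgt_rel, [])
        target[tgt_rel].extend(transform(row) for row in source_db[src_rel])
    return target

def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on a target relation."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

source = {"Employees": [{"id": 1, "dept": "R&D"}, {"id": 1, "dept": "HR"}]}
mappings = [("Employees", "WorksIn", lambda r: {"emp": r["id"], "dept": r["dept"]})]
target = materialize_target(source, mappings)
print(satisfies_fd(target["WorksIn"], lhs=["emp"], rhs=["dept"]))  # False
```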


2002 ◽  
pp. 172-202
Author(s):  
Sergio Greco ◽  
Ester Zumpano

Integrity constraints represent an important source of information about the real world. They are usually used to define constraints on data (functional dependencies, inclusion dependencies, etc.). Nowadays, integrity constraints have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update. Often, databases may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. This may happen, for instance, when the database is obtained from the integration of different information sources. The integration of knowledge from multiple sources is an important aspect in several areas, such as data warehousing, database integration, automated reasoning systems, and active and reactive databases.


2017 ◽  
Vol 1 (1) ◽  
pp. 04 ◽  
Author(s):  
Jaroslav Pokorny

Compared with traditional (e.g., relational) databases, graph databases often lack some important database features. In particular, a graph database schema, including integrity constraints, is mostly not explicitly defined, and conceptual modelling is usually not used. Checking the consistency of a graph database is hard, because almost no integrity constraints are defined, or only very simple representatives of them can be specified. In the paper, we discuss these issues and present current possibilities and challenges in graph database modelling. We also focus on integrity constraint modelling and propose functional dependencies between entity types, which resembles the modelling of functional dependencies known from relational databases. We show a number of examples of often-cited GDBMSs and their approaches to database schema and IC specification. A conceptual level of graph database design is also considered. We propose a sufficient conceptual model based on a binary variant of the ER model and show its relationship to a graph database model, i.e., a mapping of conceptual schemas to database schemas. An alternative based on conceptual functions called attributes is presented.
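As a toy illustration of a functional dependency between entity types in a property graph (the graph encoding and property names below are hypothetical, not the paper's notation), one can check that all nodes of type Person agreeing on ssn also agree on name:

```python
def check_graph_fd(nodes, label, lhs, rhs):
    """Check an FD lhs -> rhs over all nodes carrying a given label,
    mirroring how relational FDs are checked over a relation."""
    seen = {}
    violations = []
    for node in nodes:
        if node["label"] != label:
            continue
        key = tuple(node["props"].get(a) for a in lhs)
        val = tuple(node["props"].get(a) for a in rhs)
        if key in seen and seen[key] != val:
            violations.append((key, seen[key], val))
        seen.setdefault(key, val)
    return violations

graph = [
    {"label": "Person", "props": {"ssn": 1, "name": "Ann"}},
    {"label": "Person", "props": {"ssn": 1, "name": "Anna"}},  # FD violation
    {"label": "City",   "props": {"name": "Prague"}},
]
print(check_graph_fd(graph, "Person", lhs=["ssn"], rhs=["name"]))
# [((1,), ('Ann',), ('Anna',))]
```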


2022 ◽  
Vol 18 (1) ◽  
Author(s):  
Batya Kenig ◽  
Dan Suciu

Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
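The standard information-theoretic degree of satisfaction of an FD X → Y is the conditional entropy H(Y | X) of the relation's empirical distribution, which is 0 exactly when the FD holds. A short Python sketch computing it (my illustration of the measure, not the paper's algorithms):

```python
from collections import Counter
from math import log2

def conditional_entropy(rows, X, Y):
    """H(Y | X) over the empirical distribution of the relation:
    0.0 iff the FD X -> Y holds exactly; larger values measure how
    badly it fails (the degree of satisfaction that relaxation bounds)."""
    n = len(rows)
    xy = Counter((tuple(r[a] for a in X), tuple(r[a] for a in Y)) for r in rows)
    x = Counter(tuple(r[a] for a in X) for r in rows)
    return sum(c / n * log2(x[xk] / c) for (xk, _), c in xy.items())

rows = [
    {"A": 0, "B": 0},
    {"A": 0, "B": 0},
    {"A": 1, "B": 0},
    {"A": 1, "B": 1},  # A -> B fails on the A=1 group
]
print(conditional_entropy(rows, X=["A"], Y=["B"]))  # 0.5
```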


2020 ◽  
pp. 9-13
Author(s):  
A. V. Lapko ◽  
V. A. Lapko

An original technique is justified for the fast selection of bandwidths of kernel functions in a nonparametric estimate of a multidimensional probability density of the Rosenblatt–Parzen type. The proposed method makes it possible to significantly increase the computational efficiency of the optimization procedure for kernel probability density estimates under conditions of large-volume statistical data, in comparison with traditional approaches. The basis of the proposed approach is the analysis of the formula for the optimal bandwidth parameters of a multidimensional kernel probability density estimate. Dependencies of the nonlinear functional of the probability density and its derivatives up to the second order, inclusive, on the antikurtosis coefficients of the random variables are found. The bandwidth for each random variable is represented as the product of an undetermined parameter and its standard deviation. The influence of the error in restoring the established functional dependencies on the approximation properties of the kernel probability density estimate is determined. The obtained results are implemented as a method for the synthesis and analysis of fast bandwidth selection for the kernel estimate of the two-dimensional probability density of independent random variables. This method uses data on the quantitative characteristics of a family of lognormal distribution laws.
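The "bandwidth = parameter × standard deviation" structure has the same shape as classical plug-in rules. A hedged Python sketch using Scott's rule-of-thumb constant (a standard textbook choice standing in for the authors' optimized parameter, which this sketch does not reproduce):

```python
import numpy as np

def rule_of_thumb_bandwidths(data):
    """Per-dimension bandwidths h_j = c * sigma_j for a Rosenblatt-Parzen
    estimate, with c = n^(-1/(d+4)) (Scott's rule). The paper tunes the
    constant via functionals of the density instead, but the
    product-with-sigma structure is the same."""
    n, d = data.shape
    c = n ** (-1.0 / (d + 4))
    return c * data.std(axis=0, ddof=1)

def kde(query, data, h):
    """Gaussian product-kernel density estimate at the query points."""
    # (m, n, d) array of scaled differences between queries and samples
    u = (query[:, None, :] - data[None, :, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.prod(axis=2).mean(axis=1) / h.prod()

rng = np.random.default_rng(0)
data = rng.lognormal(size=(1000, 2))  # lognormal laws, as in the paper's study
h = rule_of_thumb_bandwidths(data)
print(h, kde(np.array([[1.0, 1.0]]), data, h))
```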


2011 ◽  
Vol 21 (SI) ◽  
pp. 95-123 ◽  
Author(s):  
François Pinet ◽  
Magali Duboisset ◽  
Michel Schneider

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rebecca Davies ◽  
Ling Liu ◽  
Sheng Taotao ◽  
Natasha Tuano ◽  
Richa Chaturvedi ◽  
...  

Introduction: Genes contain multiple promoters that can drive the expression of various transcript isoforms. Although transcript isoforms from the same gene could have diverse and non-overlapping functions, current loss-of-function methodologies are not able to differentiate between isoform-specific phenotypes.
Results: Here, we show that CRISPR interference (CRISPRi) can be adopted for targeting specific promoters within a gene, enabling isoform-specific loss-of-function genetic screens. We use this strategy to test functional dependencies of 820 transcript isoforms that are gained in gastric cancer (GC). We identify a subset of GC-gained transcript isoform dependencies, and of these, we validate CIT kinase as a novel GC dependency. We further show that some genes express isoforms with opposite functions. Specifically, we find that the tumour suppressor ZFHX3 expresses an isoform that has a paradoxical oncogenic role that correlates with poor patient outcome.
Conclusions: Our work finds isoform-specific phenotypes that would not be identified using current loss-of-function approaches that are not designed to target specific transcript isoforms.
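A promoter-level dependency score in such a screen is typically an aggregate of guide-level log-fold-changes in abundance between final and initial time points. A purely illustrative Python sketch (the column names, pseudocount, and median aggregation are hypothetical conventions, not the paper's pipeline):

```python
from statistics import median
from math import log2

def promoter_scores(guide_counts):
    """Aggregate guide-level log2 fold-changes (final vs. initial abundance)
    into a median score per targeted promoter; strongly negative scores
    suggest the isoform driven by that promoter is a dependency."""
    by_promoter = {}
    for g in guide_counts:
        lfc = log2((g["final"] + 1) / (g["initial"] + 1))  # pseudocount of 1
        by_promoter.setdefault(g["promoter"], []).append(lfc)
    return {p: median(lfcs) for p, lfcs in by_promoter.items()}

guides = [
    {"promoter": "CIT_P2", "initial": 500, "final": 60},
    {"promoter": "CIT_P2", "initial": 450, "final": 80},
    {"promoter": "CIT_P2", "initial": 480, "final": 50},
    {"promoter": "ZFHX3_P1", "initial": 400, "final": 900},
]
print(promoter_scores(guides))  # CIT_P2 strongly depleted -> candidate dependency
```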

