On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

2021 ◽  
Author(s):  
Van Tran Bao Le

<p>A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Therefore, Armstrong databases are concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd’s Turing Award-winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings. This research establishes a toolbox of Armstrong databases for SQL data. This is challenging because SQL data can contain null marker occurrences in columns declared NULL, and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the "no information" interpretation used in SQL, and the "exists but unknown" interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data. As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic, and logical characterizations of the associated implication problem that are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database. As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term “useful”, and detailed experiments show that Armstrong tables, as computed by the tool, are indeed useful. In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.</p>
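The abstract's remark that duplicate rows separate uniqueness constraints from functional dependencies can be illustrated with a small sketch. It is not taken from the thesis: the helper names `satisfies_fd` and `satisfies_unique` and the sample table are hypothetical, and the table is assumed to be null-free so that neither null interpretation needs to be modeled.

```python
from collections import Counter

def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True

def satisfies_unique(rows, cols):
    """True if no two row occurrences agree on cols (bag semantics)."""
    counts = Counter(tuple(row[a] for a in cols) for row in rows)
    return all(c == 1 for c in counts.values())

# Hypothetical null-free table containing one duplicate row.
rows = [
    {"emp": "ann", "dept": "db"},
    {"emp": "ann", "dept": "db"},  # duplicate, permitted in an SQL table
    {"emp": "bob", "dept": "ai"},
]

fd_holds = satisfies_fd(rows, ["emp"], ["emp", "dept"])
uc_holds = satisfies_unique(rows, ["emp"])
print(fd_holds, uc_holds)
```

In a duplicate-free relation, the functional dependency from `emp` to all columns would make `emp` a key. Here the dependency holds while the uniqueness constraint on `emp` fails, which is why the two classes have to be treated separately once duplicates are allowed.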

2016 ◽  
Vol 27 (2) ◽  
pp. 27-48
Author(s):  
András Benczúr ◽  
Gyula I. Szabó

This paper introduces a generalized database concept that unites the relational and semi-structured data models. As an important theoretical result, we found a quadratic decision algorithm for the implication problem of functional and join dependencies defined on the united data model. As a practical contribution, we present a normal form for the new data model as a tool for database design. With our novel representations of regular expressions, a more effective search method could be developed. XML elements are described by XML schema languages such as a DTD or an XML Schema definition. The instances of these elements are semi-structured tuples. A semi-structured tuple is an ordered list of (attribute: value) pairs. We may think of a semi-structured tuple as a sentence of a formal language, where the values are the terminal symbols and the attribute names are the non-terminal symbols. In the authors' former work (Szabó and Benczúr, 2015) they introduced the notion of the extended tuple as a sentence from a regular language generated by a grammar where the non-terminal symbols of the grammar are the attribute names of the tuple. Sets of extended tuples are the extended relations. The authors then introduced the dual language, which generates the tuple types allowed to occur in extended relations. They defined functional dependencies (regular FD - RFD) over extended relations. In this paper they rephrase the RFD concept by directly using regular expressions over attribute names to define extended tuples. With the help of a special vertex-labeled graph associated with regular expressions, the substring selection for the projection operation can be specified. Normalization for regular schemas is more complex than in the relational model, because the schema of an extended relation can contain an infinite number of tuple types. However, the authors can define selection, projection, and join operations on extended relations too, so a lossless-join decomposition can be performed. They extended their previous model to deal with XML schema indicators as well, e.g., with numerical constraints. They also added line and set constructors in order to extend their model with more general projection and selection operators. This model establishes a query language with table join functionality for collected XML element data.
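The idea of a tuple type described by a regular expression over attribute names can be sketched as follows. This is only an illustration under assumptions, not the authors' formalism: the function `matches_schema`, the schema string, and the sample tuples are hypothetical, and attribute names are assumed to be single tokens without spaces.

```python
import re

def matches_schema(pairs, schema_regex):
    """Check whether the attribute-name sequence of a semi-structured
    tuple (an ordered list of (attribute, value) pairs) is a sentence
    of the regular language given by schema_regex."""
    # Each attribute name is followed by one space so that names
    # cannot run together inside the regular expression.
    names = "".join(name + " " for name, _ in pairs)
    return re.fullmatch(schema_regex, names) is not None

# Hypothetical schema: one name, any number of phones, then one email.
schema = r"name (phone )*email "

t1 = [("name", "Ann"), ("phone", "1"), ("phone", "2"), ("email", "ann@example.org")]
t2 = [("name", "Bob"), ("email", "bob@example.org")]
t3 = [("email", "cy@example.org"), ("name", "Cy")]  # wrong attribute order

results = [matches_schema(t, schema) for t in (t1, t2, t3)]
print(results)
```

Because of the starred `phone` group, this one schema admits infinitely many tuple types, which is the property the abstract cites as the reason normalization is harder than in the relational model.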


Author(s):  
Radim Belohlavek ◽  
Vilem Vychodil

This chapter deals with data dependencies in Codd’s relational model of data. In particular, we deal with fuzzy logic extensions of the relational model that consist of adding similarity relations to domains and consider functional dependencies in these extensions. We present a particular extension and functional dependencies in this extension that follow the principles of fuzzy logic in a narrow sense. We present selected features and results regarding this extension. Then, we use this extension as a reference model and compare it to several other extensions proposed in the literature. We argue that following the principles of fuzzy logic in a narrow sense, the same way we can follow the principles of classical logic in the case of the ordinary Codd relational model, helps achieve transparency, versatility, conceptual clarity, and theoretical and computational tractability of the extension. We outline several topics for future research.


10.28945/3199 ◽  
2008 ◽  
Author(s):  
Milos Bogdanovic ◽  
Aleksandar Stanimirovic ◽  
Nikola Davidovic ◽  
Leonid Stoimenov

Most universities where students study information technology and computer science have an introductory course dealing with the development and design of databases. These courses often include the use of database design tools. This paper presents the #EER tool, which aims to make relational database design easier for students and to partially automate it. The tool grew out of experience with using similar tools for educational purposes. It enables fast and efficient development of the conceptual model of a relational database and its automated compilation into a relational model and further into data definition language (DDL) commands. The #EER tool is based on the extended entity-relationship (EER) model for conceptual modeling of relational databases. The modular architecture of the tool, whose development is based on design patterns, is also presented, along with the benefits that this architecture brings.


2022 ◽  
Vol 18 (1) ◽  
Author(s):  
Batya Kenig ◽  
Dan Suciu

Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
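One way to make "degree of satisfaction" concrete is sketched below. The abstract says only that information theory is used, so the specific measure here is an assumption: the degree to which a table violates an FD X → Y is taken to be the conditional entropy H(Y|X) of the uniform distribution over the table's rows, which is 0 exactly when the FD holds. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(rows, cols):
    """Shannon entropy (in bits) of the projection of rows onto cols,
    under the uniform distribution over the rows."""
    n = len(rows)
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def cond_entropy(rows, xs, ys):
    """H(Y | X) = H(XY) - H(X); assumed violation measure for X -> Y."""
    return entropy(rows, list(xs) + list(ys)) - entropy(rows, list(xs))

# Hypothetical table in which A -> B holds exactly.
exact = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]

# Hypothetical table in which A -> B fails only for A = 3.
approx = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 3, "B": "y"},
    {"A": 3, "B": "z"},
]

h_exact = cond_entropy(exact, ["A"], ["B"])
h_approx = cond_entropy(approx, ["A"], ["B"])
print(h_exact, h_approx)
```

Under this assumed measure, an approximate implication in the abstract's sense would bound the conditional entropy of the consequent by a constant factor times the sum of the corresponding quantities for the antecedents.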


Author(s):  
Morad Hajji ◽  
Mohammed Qbadou ◽  
Khalifa Mansouri

Ontologies are increasingly widespread in information technology as a preferred solution for formalizing knowledge. The theoretical model of ontologies is very promising, and ontologies are becoming ubiquitous because of the benefits they offer. Despite the proliferation of research proposing approaches for designing a database from an ontology, tools that support this task are rare or inaccessible. Thus, in this contribution, we present our approach to developing an Eclipse plug-in that automatically generates a conceptual model of a relational database from an ontology. To evaluate the usefulness of our approach, we used the resulting Eclipse plug-in to automatically generate a conceptual model of a relational database from an ontology, customize it, and automatically generate the corresponding SQL script for data definition. The results of this experiment showed that our plug-in is a concrete realization of our approach and a means of automatic translation from the ontological model to the relational model.


Author(s):  
Jean-Marc Petit ◽  
Mohand-Saïd Hacid

This chapter revisits conceptual database design and focuses on so-called “logical database tuning”. We first recall fundamental differences between constructor-oriented models (like extended Entity-Relationship models) and attribute-oriented models (like the relational model). Then, we introduce an integrated algorithm for translating ER-like conceptual database schemas to relational database schemas. To consider the tuning of such logical databases, we highlight two extreme cases: null-free databases and efficient (though non-redundant) databases. Finally, we point out how SQL workloads could be used a posteriori to help designers and database administrators reach a compromise between these extreme cases. Although many papers and books have been devoted to database design over the years, we hope that this chapter will clarify matters for database designers when implementing their databases and for database administrators when maintaining them.

