The implication problem of data dependencies over SQL table definitions

2012 ◽  
Vol 37 (2) ◽  
pp. 1-40 ◽  
Author(s):  
Sven Hartmann ◽  
Sebastian Link

2022 ◽  
Vol 18, Issue 1 ◽  
Author(s):  
Batya Kenig ◽  
Dan Suciu

Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem asks whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that admits no relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem, for example that the implication of MVDs and FDs can be checked by considering only two-tuple relations.
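For intuition, the relaxation statement has the following shape in the paper's information-theoretic setting (the symbols h and λ below are illustrative notation, and Z denotes the attributes outside X and Y):

```latex
% Sketch of the relaxation notion (notation illustrative, not verbatim).
% Degrees of satisfaction are measured information-theoretically, e.g.
%   h(X -> Y)  = H(Y | X)      (an FD holds exactly iff this is 0)
%   h(X ->> Y) = I(Y; Z | X)   (an MVD holds exactly iff this is 0)
\[
  \sigma_1, \dots, \sigma_k \models \tau
  \quad\text{relaxes to}\quad
  h(\tau) \;\le\; \lambda \sum_{i=1}^{k} h(\sigma_i)
  \quad\text{for every distribution,}
\]
% where the paper shows \lambda can be taken at most quadratic in the number
% of variables for MVD+FD antecedents, and \lambda = 1 when \tau is an FD.
```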


2021 ◽  
Author(s):  
Van Tran Bao Le

A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Therefore, Armstrong databases are concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd’s Turing Award winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings. This research establishes a toolbox of Armstrong databases for SQL data. This is challenging as SQL data can contain null marker occurrences in columns declared NULL, and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the "no information" interpretation used in SQL, and the "exists but unknown" interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data. As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic, and logical characterizations of the associated implication problem that are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database. As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term "useful", and detailed experiments show that Armstrong tables, as computed by the tool, are indeed useful. In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.
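To make the defining property concrete, the following is a minimal sketch for the idealized, null- and duplicate-free case that the thesis generalizes to SQL data; the function names and the brute-force check are illustrative, not the thesis algorithm:

```python
# Minimal sketch: a table is Armstrong for a set sigma of FDs iff it
# satisfies exactly the FDs that sigma implies (idealized relational case).
from itertools import combinations

def satisfies_fd(rows, lhs, rhs):
    """True iff every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

def implied(sigma, lhs, rhs):
    """FD lhs -> rhs follows from sigma iff rhs lies in the closure of lhs."""
    closure, changed = set(lhs), True
    while changed:
        changed = False
        for x, y in sigma:
            if set(x) <= closure and not set(y) <= closure:
                closure |= set(y)
                changed = True
    return set(rhs) <= closure

def is_armstrong(rows, sigma, attrs):
    """Check the Armstrong property for all nontrivial single-attribute FDs."""
    for r in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, r):
            for a in attrs:
                if a in lhs:
                    continue  # trivial FDs hold everywhere
                if satisfies_fd(rows, lhs, (a,)) != implied(sigma, lhs, (a,)):
                    return False
    return True

# Example: a table that is Armstrong for sigma = { A -> B } over (A, B, C).
rows = [{"A": 0, "B": 0, "C": 0},
        {"A": 0, "B": 0, "C": 1},
        {"A": 1, "B": 2, "C": 0},
        {"A": 2, "B": 0, "C": 2},
        {"A": 3, "B": 0, "C": 0}]
print(is_armstrong(rows, [(("A",), ("B",))], ("A", "B", "C")))  # True
```

The example table satisfies A → B yet violates every functional dependency not implied by it (for instance B → A, C → B, and BC → A), which is exactly the Armstrong property.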


Micromachines ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 183 ◽  
Author(s):  
Jose Ricardo Gomez-Rodriguez ◽  
Remberto Sandoval-Arechiga ◽  
Salvador Ibarra-Delgado ◽  
Viktor Ivan Rodriguez-Abdala ◽  
Jose Luis Vazquez-Avila ◽  
...  

Current computing platforms encourage the integration of thousands of processing cores, and their interconnections, into a single chip. Mobile smartphones, IoT and embedded devices, desktops, and data centers use Many-Core Systems-on-Chip (SoCs) to exploit this compute power and parallelism to meet dynamic workload requirements. Networks-on-Chip (NoCs) provide scalable connectivity for diverse applications with distinct traffic patterns and data dependencies. However, when the system executes various applications on a traditional NoC, which is optimized and fixed at synthesis time, the mismatch between the interconnect and the differing application requirements limits performance. In the literature, NoC designs have embraced the Software-Defined Networking (SDN) strategy to evolve into an adaptable interconnection solution for future chips. However, the works surveyed implement only a partial Software-Defined Network-on-Chip (SDNoC) approach, leaving aside the layered SDN architecture that brings interoperability to conventional networking. This paper explores the SDNoC literature and classifies it with respect to the desired SDN features that each work presents. We then describe the challenges and opportunities identified in the survey, explain the motivation for an SDNoC approach, and present both SDN and SDNoC concepts and architectures. We observe that the works in the literature employ an incomplete layered SDNoC approach, which leaves several fertile areas of the SDNoC architecture where researchers may contribute to Many-Core SoC designs.
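As a minimal illustration of the control/data-plane split that the surveyed SDNoC designs adopt only partially, consider the following sketch (class and field names are hypothetical; real SDNoC routers operate on flits with hardware flow tables):

```python
# Illustrative sketch of the SDN separation of concerns: a centralized
# controller computes forwarding rules, switches only do table lookups.
class Controller:
    """Control plane: computes and installs forwarding rules on demand."""
    def compute_rule(self, packet):
        # Placeholder policy: pick an output port from the destination id
        # (an SDNoC controller might instead compute an XY route here).
        return packet["dst"], "port-%d" % (packet["dst"] % 4)

class Switch:
    """Data plane: forwards by flow-table lookup, asks controller on a miss."""
    def __init__(self, controller):
        self.table = {}            # dst -> output port
        self.controller = controller
    def forward(self, packet):
        dst = packet["dst"]
        if dst not in self.table:                  # table miss
            key, port = self.controller.compute_rule(packet)
            self.table[key] = port                 # rule installed once,
        return self.table[dst]                     # reused for later packets

sw = Switch(Controller())
print(sw.forward({"dst": 6}))   # miss -> controller computes "port-2"
print(sw.forward({"dst": 6}))   # hit  -> forwarded without the controller
```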


Author(s):  
Chafik Arar ◽  
Mohamed Salah Khireddine

The paper proposes a new reliable fault-tolerant scheduling algorithm for real-time embedded systems. The proposed algorithm is based on static scheduling, which allows it to take the execution cost of tasks and the data dependencies between them into account in its scheduling decisions. The algorithm targets multi-bus heterogeneous architectures in which multiple processors are linked by several shared buses. It tolerates a single bus fault, caused by a hardware fault and compensated by software redundancy, and uses both active and passive backup copies to minimize the schedule length of data communications on the buses. In the experiments, the proposed method is evaluated in terms of data scheduling length on a set of DSP benchmarks. The experimental results show the effectiveness of the technique.
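A minimal sketch of this kind of bus-level replication, assuming a greedy earliest-bus placement (the names and the placement policy are illustrative, not the paper's algorithm):

```python
# Hypothetical sketch: statically place each data transfer on the
# earliest-free bus, plus a replica on a different bus, so that any
# single bus fault leaves one intact copy of every transfer.
def schedule(transfers, n_buses):
    """transfers: list of (name, cost). Returns per-bus timelines."""
    free = [0.0] * n_buses                     # next free time on each bus
    plan = {b: [] for b in range(n_buses)}
    for name, cost in transfers:
        # primary copy on the earliest-available bus
        p = min(range(n_buses), key=free.__getitem__)
        plan[p].append((name, free[p], free[p] + cost))
        free[p] += cost
        # backup copy on a different bus; a *passive* copy would be sent
        # only after a fault is detected, shortening the fault-free schedule
        b = min((i for i in range(n_buses) if i != p), key=free.__getitem__)
        plan[b].append((name + "*", free[b], free[b] + cost))
        free[b] += cost
    return plan

print(schedule([("d1", 2), ("d2", 3), ("d3", 1)], n_buses=2))
```

The sketch only replicates actively; combining active with passive copies, as the paper does, reclaims the backup slots in the fault-free case and is what reduces the overall schedule length.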


Author(s):  
Srikanth M. Kannapan ◽  
Dean L. Taylor

Abstract Naive interpretations of concurrent engineering may expect extreme parallelization of tasks and simultaneous accommodation of multiple perspectives. In fact, from our efforts at modeling tasks in a MEMS (Micro-Electro-Mechanical Systems) pressure sensor design project, it appears that data dependencies due to the structure of tasks and the product itself result in scenarios of decision and action that must be carefully coordinated. This paper refines a previously described information model for defining evolving contexts of product model aspects and team member perspectives, with software agents acting on behalf of team members to execute tasks. The pressure sensor design project is analyzed in the framework of the information model. A scenario of decision and action for design of the pressure sensor is modeled as a design process plan. Conflict on a shared parameter occurs as a consequence of introducing some parallelism between the capacitance and deflection agents in the process. We present a technique for negotiating such conflicts by definition and propagation of utility functions on decision parameters and axiomatic negotiation.
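A toy sketch of the negotiation mechanism described above, assuming quadratic utility functions over a single shared parameter (the utilities, the parameter, and its range are invented for illustration):

```python
# Illustrative sketch: two agents hold utility functions over a shared
# design parameter -- here a membrane thickness t -- and a conflict is
# settled by picking the value that maximizes their joint utility.
def u_capacitance(t):   # prefers a thin membrane: peak utility at t = 2.0
    return -(t - 2.0) ** 2

def u_deflection(t):    # prefers a thick membrane: peak utility at t = 5.0
    return -(t - 5.0) ** 2

def negotiate(utilities, candidates):
    """Pick the candidate maximizing the sum of the propagated utilities."""
    return max(candidates, key=lambda t: sum(u(t) for u in utilities))

ts = [i / 10 for i in range(10, 81)]        # candidate thicknesses 1.0..8.0
t_star = negotiate([u_capacitance, u_deflection], ts)
print(f"agreed thickness: {t_star:.1f}")    # 3.5, between the two optima
```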


2009 ◽  
Vol 27 (4) ◽  
pp. 1377-1386 ◽  
Author(s):  
M. Antón ◽  
D. Loyola ◽  
M. López ◽  
J. M. Vilaplana ◽  
M. Bañón ◽  
...  

Abstract. The main objective of this article is to compare the total ozone data from the new Global Ozone Monitoring Experiment instrument (GOME-2/MetOp) with reliable ground-based measurements recorded by five Brewer spectroradiometers in the Iberian Peninsula. In addition, a similar comparison for the predecessor instrument GOME/ERS-2 is described. The period of study is a whole year, from May 2007 to April 2008. The results show that GOME-2/MetOp ozone data are already of very good quality: total ozone columns are on average 3.05% lower than Brewer measurements. This underestimation is higher than that obtained for GOME/ERS-2 (1.46%). However, the relative differences between GOME-2/MetOp and Brewer measurements show significantly lower variability than the differences between GOME/ERS-2 and Brewer data. Dependencies of these relative differences on the satellite solar zenith angle (SZA), the satellite scan angle, the satellite cloud cover fraction (CF), and the ground-based total ozone measurements are analyzed. For both GOME instruments, the differences show no significant dependence on SZA. However, GOME-2/MetOp data show a significant dependence on the satellite scan angle (+1.5%). In addition, GOME/ERS-2 differences present a clear dependence on CF and on ground-based total ozone; such dependences are minimized for GOME-2/MetOp. The comparison between the daily total ozone values provided by the two GOME instruments shows that GOME-2/MetOp ozone data are on average 1.46% lower than GOME/ERS-2 data, without any seasonal dependence. Finally, deviations of the a priori climatological ozone profile used by the satellite retrieval algorithm from the true ozone profile are analyzed. Although excellent agreement between a priori climatological and measured partial ozone values is found for the middle and upper stratosphere, relative differences greater than 15% are common in the troposphere and lower stratosphere.
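For clarity, the comparison metric used throughout (a mean relative difference in percent) can be sketched as follows; the numbers below are invented, not the article's data:

```python
# Hedged sketch of the comparison metric (variable names illustrative):
# relative difference of satellite vs. ground-based total ozone, in percent.
def relative_differences(satellite, ground):
    """Per-pair relative difference 100 * (sat - ground) / ground."""
    return [100.0 * (s - g) / g for s, g in zip(satellite, ground)]

sat    = [298.0, 305.0, 310.0]   # GOME-2 total ozone columns (DU), made up
ground = [308.0, 314.0, 320.0]   # coincident Brewer measurements (DU), made up
diffs = relative_differences(sat, ground)
mean = sum(diffs) / len(diffs)
print(f"mean relative difference: {mean:+.2f}%")  # negative -> underestimation
```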

