Embedded Functional Dependencies and Data-completeness Tailored Database Design

2021 ◽  
Vol 46 (2) ◽  
pp. 1-46
Author(s):  
Ziheng Wei ◽  
Sebastian Link

We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.
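
To make the central notion concrete, the following is a minimal Python sketch of checking an embedded functional dependency, under the reading suggested by the abstract: an embedded FD (E, X → Y) holds when the classical FD X → Y holds on the subrelation of tuples that are complete (null-free) on E. The relation, attribute names, and data below are hypothetical, not taken from the paper.

```python
def satisfies_efd(rows, E, X, Y):
    """Check an embedded FD (E, X -> Y): the classical FD X -> Y must hold
    on the subrelation of tuples that are complete (non-null) on E."""
    complete = [r for r in rows if all(r.get(a) is not None for a in E)]
    seen = {}
    for r in complete:
        lhs = tuple(r[a] for a in X)
        rhs = tuple(r[a] for a in Y)
        if seen.setdefault(lhs, rhs) != rhs:
            return False  # two E-complete tuples agree on X but differ on Y
    return True

# Hypothetical data: the tuple with a missing phone is ignored,
# because it is not complete on E = {dept, mgr, phone}.
rows = [
    {"emp": "a", "dept": "D1", "mgr": "m1", "phone": "111"},
    {"emp": "b", "dept": "D1", "mgr": "m1", "phone": None},
    {"emp": "c", "dept": "D2", "mgr": "m2", "phone": "222"},
]
print(satisfies_efd(rows, E=["dept", "mgr", "phone"], X=["dept"], Y=["mgr"]))  # True
```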

Author(s):  
Miroslav Hudec ◽  
Miljan Vučetić ◽  
Mirko Vujošević

Data mining methods based on fuzzy logic have been developed recently and have become an increasingly important research area. In this chapter, the authors examine possibilities for discovering potentially useful knowledge from relational databases by integrating fuzzy functional dependencies and linguistic summaries. Both methods use fuzzy logic tools for data analysis and for acquiring and representing expert knowledge. Fuzzy functional dependencies can detect whether a dependency between two examined attributes exists across the whole database. If a dependency exists only between parts of the examined attributes' domains, fuzzy functional dependencies cannot detect its character. Linguistic summaries are a convenient method for revealing this kind of dependency. Using fuzzy functional dependencies and linguistic summaries in a complementary way can mine valuable information from relational databases. Mining the intensities of dependencies between database attributes can support decision making, reduce the number of attributes in databases, and estimate missing values. The proposed approach is evaluated in case studies using real data from official statistics. Strengths and weaknesses of the described methods are discussed. At the end of the chapter, topics for further research are outlined.
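
As an illustration of the linguistic-summary side, here is a small Python sketch of a Yager-style summary of the form "most records have a low value of attribute A". The membership functions and the income data are invented for illustration; this sketches the general technique, not the authors' case studies.

```python
def mu_most(p):
    """Piecewise-linear membership for the fuzzy quantifier 'most'
    (hypothetical bounds: 0 below 30%, 1 above 80%)."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def mu_low_income(x):
    """Membership for the fuzzy predicate 'low income' (hypothetical bounds)."""
    if x <= 20000:
        return 1.0
    if x >= 35000:
        return 0.0
    return (35000 - x) / 15000

def summary_truth(values, mu_quantifier, mu_summarizer):
    """Yager-style truth degree of the summary 'Q objects are S'."""
    return mu_quantifier(sum(mu_summarizer(v) for v in values) / len(values))

incomes = [18000, 22000, 30000, 40000, 15000]
print(summary_truth(incomes, mu_most, mu_low_income))  # 0.68: 'most incomes are low' is fairly true
```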


Author(s):  
Tadeusz Pankowski

This chapter addresses the problem of data integration in a P2P environment, where each peer stores the schema of its local data, mappings between the schemas, and some schema constraints. The goal of the integration is to answer queries formulated against a chosen peer. The answer must consist of data stored in the queried peer as well as data of its direct and indirect partners. The chapter focuses on defining and using mappings, schema constraints, query propagation across the P2P system, and query answering in such a scenario. Schemas, mappings, constraints (functional dependencies), and queries are all expressed in a unified approach based on tree-pattern formulas. The chapter discusses how functional dependencies can be exploited to increase the information content of answers (by discovering missing values) and to control merging operations and propagation strategies. The chapter proposes algorithms for translating high-level specifications of mappings and queries into XQuery programs, and it shows how the discussed method has been implemented in the SixP2P (or 6P2P) system.
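
The following Python sketch illustrates the core idea of exploiting a functional dependency X → Y to discover missing values when answers from several peers are merged. It is a simplified illustration over hypothetical relational data, not the SixP2P implementation, which operates on XML via tree-pattern formulas and XQuery.

```python
def complete_with_fd(tuples, X, Y):
    """Use a functional dependency X -> Y to fill nulls in attribute Y:
    if two merged tuples agree on X and one has a non-null Y value,
    propagate that value to the tuple where Y is missing."""
    known = {}
    for t in tuples:
        if t[Y] is not None:
            known[tuple(t[a] for a in X)] = t[Y]
    for t in tuples:
        if t[Y] is None:
            # stays None if no peer contributed a value for this X
            t[Y] = known.get(tuple(t[a] for a in X))
    return tuples

# Hypothetical answers merged from two peers: the FD isbn -> year
# lets the second peer's year fill the first peer's null.
merged = [
    {"isbn": "123", "title": "Databases", "year": None},
    {"isbn": "123", "title": "Databases", "year": 2008},
]
print(complete_with_fd(merged, X=["isbn"], Y="year"))
```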


2020 ◽  
Vol 9 (2-3) ◽  
pp. 85-99
Author(s):  
Munqath Alattar ◽  
Attila Sali

Missing data values are a pervasive problem for both researchers and industrial developers. There are two general approaches to dealing with missing values in databases: they can be either ignored (removed) or imputed (filled in) with new values (Farhangfar et al., IEEE Trans Syst Man Cybern Part A: Syst Hum 37(5):692–709, 2007). For some SQL tables, a candidate key of the table may not be null-free, and this needs to be handled. Possible keys and certain keys were introduced to deal with this situation in Köhler et al. (VLDB J 25(4):571–596, 2016). In the present paper, we introduce an intermediate concept called strongly possible keys, which is based on a data mining approach that uses only information already contained in the SQL table. A strongly possible key is a key that holds for some possible world obtained by replacing any occurrences of nulls with values already appearing in the corresponding attributes. Implication among strongly possible keys is characterized, and Armstrong tables are constructed. An algorithm to verify a strongly possible key is given, applying bipartite matching. A connection between the matroid intersection problem and systems of strongly possible keys is established. For the cases where no strongly possible key holds, an approximation notion based on the $g_3$ measure is introduced to calculate how close any given set of attributes is to being a strongly possible key, and its component version $g_4$ is derived. Analytical comparisons are given between the two measures.
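
To illustrate the definition, the following Python sketch verifies a strongly possible key by brute force, trying every replacement of nulls with values from the visible domain of the same attribute. It is exponential and purely illustrative; the paper's verification algorithm uses bipartite matching instead. The table is hypothetical.

```python
from itertools import product

def is_spkey(rows, K):
    """Brute-force check whether K is a strongly possible key: some
    replacement of nulls with values already present in the same attribute
    (the visible domain) makes all K-projections pairwise distinct."""
    domains = {a: sorted({r[a] for r in rows if r[a] is not None}) for a in K}
    slots = [(i, a) for i, r in enumerate(rows) for a in K if r[a] is None]
    for choice in product(*[domains[a] for _, a in slots]):
        world = [dict(r) for r in rows]          # one possible world
        for (i, a), v in zip(slots, choice):
            world[i][a] = v
        projections = [tuple(r[a] for a in K) for r in world]
        if len(set(projections)) == len(projections):
            return True                           # K is a key in this world
    return False

rows = [{"A": 1, "B": None},
        {"A": 1, "B": 2},
        {"A": 2, "B": 3}]
print(is_spkey(rows, K=["A", "B"]))  # True: fill the null B with visible value 3
```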


2020 ◽  
Vol 19 ◽  

Databases play an important role in applied mathematics. Normalization of relational databases is important to avoid the anomalies that arise in relations that do not satisfy normal forms such as Third Normal Form. Normalization can be a difficult task, however, since database designers may not fully understand the domain of each attribute contained in a relation schema, or may not fully understand the concept of normalization itself. This paper presents an efficient method that uses the data stored in relations to check whether further normalization may be needed, based on possible functional dependencies between attributes in the relations. By checking possible functional dependencies, database designers can determine whether further normalization is needed and can improve the structure of the relation schemas. Experiments were performed on the example relational database found in the tutorial of MySQL, a relational database management system, and showed good results.
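
As a sketch of how stored data can suggest candidate dependencies, the following Python snippet enumerates single-attribute functional dependencies that are not refuted by the current relation instance. It mirrors the general idea described above rather than the paper's exact method, and the sample rows are hypothetical.

```python
from itertools import permutations

def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds in the stored data iff no two rows agree on lhs
    but differ on rhs."""
    mapping = {}
    for r in rows:
        if mapping.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True

def possible_fds(rows):
    """Enumerate single-attribute dependencies the data does not refute.
    These are only candidates: an instance cannot prove a dependency,
    so a designer must confirm each one against the domain semantics."""
    attrs = rows[0].keys()
    return [(a, b) for a, b in permutations(attrs, 2) if fd_holds(rows, a, b)]

rows = [
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Oslo", "zip": "0150",   "country": "NO"},
    {"city": "Oslo", "zip": "0151",   "country": "NO"},
]
# city -> zip is refuted (Oslo has two zips); zip -> city survives as a candidate.
print(possible_fds(rows))
```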


2018 ◽  
Vol 11 (8) ◽  
pp. 880-892 ◽  
Author(s):  
Laure Berti-Équille ◽  
Hazar Harmouch ◽  
Felix Naumann ◽  
Noël Novelli ◽  
Saravanan Thirumuruganathan

2021 ◽  
Author(s):  
Rehana Parvin

A challenge of working with traditional database systems holding large amounts of data is that decision making requires numerous comparisons. Health-related database systems are examples of such databases: they contain millions of data entries and require fast data processing to examine related information and make complex decisions. In this thesis, a fuzzy database system is developed by integrating a fuzzy inference system (FIS) with fuzzy schema design, and implementing it in SQL in three different health-care contexts: the assessment of heart disease, diabetes mellitus, and liver disorders. The fuzzy database system is designed to accommodate any form of data and is tested with different types of data values, including crisp, linguistic, and null (i.e., missing) data. The developed system can explore crisp and linguistic data with loosely defined boundary conditions for decision making. FIS- and neural-network-based solutions are implemented in MATLAB for the same contexts, for comparison and validation against the datasets used in published works.
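
To give a flavour of the rule-based reasoning a FIS performs, here is a minimal Mamdani-style sketch in Python with two hypothetical rules for heart-disease risk. The membership bounds, rules, and attributes are invented for illustration and do not reproduce the thesis's actual rule base or MATLAB implementation.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function rising on [a, b], flat on [b, c],
    falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def heart_risk(age, chol):
    """Two hypothetical rules, combined with min for AND:
      R1: IF age is old AND cholesterol is high THEN risk is high
      R2: IF age is young THEN risk is low
    Returns the firing strength of each risk level."""
    old = trapezoid(age, 45, 60, 120, 121)
    young = trapezoid(age, -1, 0, 30, 45)
    high_chol = trapezoid(chol, 200, 240, 500, 501)
    return {"high": min(old, high_chol), "low": young}

print(heart_risk(age=62, chol=250))  # {'high': 1.0, 'low': 0.0}
```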


Author(s):  
Wai Yin Mok

JSON (JavaScript Object Notation) is a lightweight data-interchange format for the Internet. JSON is built on two structures: (1) a collection of name/value pairs and (2) an ordered list of values (http://www.json.org/). Because of this simple approach, JSON is easy to use and has the potential to become the data interchange format of choice for the Internet. Similar to XML, JSON schemas allow nested structures to model hierarchical data. As data interchange over the Internet increases exponentially, due to cloud computing or otherwise, redundancy-free JSON data are an attractive form of communication because they improve the quality of data communication by eliminating update anomalies. Nested Normal Form, a normal form for hierarchical data, is a precise characterization of redundancy. A nested table, or a hierarchical schema, is in Nested Normal Form if and only if it is free of redundancy caused by multivalued and functional dependencies. Using Nested Normal Form as a guide, this paper introduces a JSON schema design methodology that begins with UML use case diagrams, communication diagrams, and class diagrams that model a system under study. Based on the use cases' execution frequencies and the data passed between the involved parties in the communication diagrams, the proposed methodology selects classes from the class diagrams to be the roots of JSON scheme trees and repeatedly adds classes from the class diagram to the scheme trees as long as the schemas satisfy Nested Normal Form. This process continues until all of the classes in the class diagram have been added to some JSON scheme tree.
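
To illustrate the kind of redundancy Nested Normal Form rules out, the following Python sketch nests a flat table whose only non-trivial dependency is the multivalued dependency course ↠ teacher | book: nesting the teachers and books under each course removes the repeated combinations. The data and attribute names are hypothetical, and the snippet illustrates the target shape rather than the paper's design algorithm.

```python
import json
from collections import defaultdict

def nest(rows):
    """Nest a flat table satisfying the MVD course ->> teacher | book
    into one JSON object per course, eliminating the crossed-out
    repetition of (teacher, book) pairs."""
    teachers, books = defaultdict(set), defaultdict(set)
    for r in rows:
        teachers[r["course"]].add(r["teacher"])
        books[r["course"]].add(r["book"])
    return [{"course": c,
             "teachers": sorted(teachers[c]),
             "books": sorted(books[c])} for c in sorted(teachers)]

flat = [  # every teacher is paired with every book: pure redundancy
    {"course": "DB", "teacher": "Ann", "book": "Ullman"},
    {"course": "DB", "teacher": "Ann", "book": "Date"},
    {"course": "DB", "teacher": "Bob", "book": "Ullman"},
    {"course": "DB", "teacher": "Bob", "book": "Date"},
]
print(json.dumps(nest(flat), indent=2))
```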

