Embedded Functional Dependencies and Data-completeness Tailored Database Design

2021 ◽  
Vol 46 (2) ◽  
pp. 1-46
Author(s):  
Ziheng Wei ◽  
Sebastian Link

We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.
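
To make the central notion concrete, the following is a minimal Python sketch of checking an embedded functional dependency, under the reading suggested by the abstract: an embedded FD (E, X → Y) holds when the classical FD X → Y holds on the subrelation of tuples that are complete (null-free) on E. The relation, attribute names, and data below are hypothetical, not taken from the paper.

```python
def satisfies_efd(rows, E, X, Y):
    """Check an embedded FD (E, X -> Y): the classical FD X -> Y must hold
    on the subrelation of tuples that are complete (non-null) on E."""
    complete = [r for r in rows if all(r.get(a) is not None for a in E)]
    seen = {}
    for r in complete:
        lhs = tuple(r[a] for a in X)
        rhs = tuple(r[a] for a in Y)
        if seen.setdefault(lhs, rhs) != rhs:
            return False  # two E-complete tuples agree on X but differ on Y
    return True

# Hypothetical data: the tuple with a missing phone is ignored,
# because it is not complete on E = {dept, mgr, phone}.
rows = [
    {"emp": "a", "dept": "D1", "mgr": "m1", "phone": "111"},
    {"emp": "b", "dept": "D1", "mgr": "m1", "phone": None},
    {"emp": "c", "dept": "D2", "mgr": "m2", "phone": "222"},
]
print(satisfies_efd(rows, E=["dept", "mgr", "phone"], X=["dept"], Y=["mgr"]))  # True
```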

Author(s):  
Miroslav Hudec ◽  
Miljan Vučetić ◽  
Mirko Vujošević

Data mining methods based on fuzzy logic have been developed recently and have become an increasingly important research area. In this chapter, the authors examine possibilities for discovering potentially useful knowledge from relational databases by integrating fuzzy functional dependencies and linguistic summaries. Both methods use fuzzy logic tools for data analysis and for acquiring and representing expert knowledge. Fuzzy functional dependencies can detect whether a dependency between two examined attributes exists across the whole database. If a dependency exists only between parts of the examined attributes' domains, fuzzy functional dependencies cannot detect its character. Linguistic summaries are a convenient method for revealing this kind of dependency. Using fuzzy functional dependencies and linguistic summaries in a complementary way can mine valuable information from relational databases. Mining the intensities of dependencies between database attributes can support decision making, reduce the number of attributes in databases, and estimate missing values. The proposed approach is evaluated in case studies using real data from official statistics. Strengths and weaknesses of the described methods are discussed. At the end of the chapter, topics for further research are outlined.
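
As an illustration of the linguistic-summary side, here is a small Python sketch of a Yager-style summary of the form "most records have a low value of attribute A". The membership functions and the income data are invented for illustration; this sketches the general technique, not the authors' case studies.

```python
def mu_most(p):
    """Piecewise-linear membership for the fuzzy quantifier 'most'
    (hypothetical bounds: 0 below 30%, 1 above 80%)."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def mu_low_income(x):
    """Membership for the fuzzy predicate 'low income' (hypothetical bounds)."""
    if x <= 20000:
        return 1.0
    if x >= 35000:
        return 0.0
    return (35000 - x) / 15000

def summary_truth(values, mu_quantifier, mu_summarizer):
    """Yager-style truth degree of the summary 'Q objects are S'."""
    return mu_quantifier(sum(mu_summarizer(v) for v in values) / len(values))

incomes = [18000, 22000, 30000, 40000, 15000]
print(summary_truth(incomes, mu_most, mu_low_income))  # 0.68: 'most incomes are low' is fairly true
```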


Author(s):  
Tadeusz Pankowski

This chapter addresses the problem of data integration in a P2P environment, where each peer stores the schema of its local data, mappings between the schemas, and some schema constraints. The goal of the integration is to answer queries formulated against a chosen peer. The answer must consist of data stored in the queried peer as well as data of its direct and indirect partners. The chapter focuses on defining and using mappings, schema constraints, query propagation across the P2P system, and query answering in such a scenario. Schemas, mappings, constraints (functional dependencies), and queries are all expressed in a unified approach based on tree-pattern formulas. The chapter discusses how functional dependencies can be exploited to increase the information content of answers (by discovering missing values) and to control merging operations and propagation strategies. The chapter proposes algorithms for translating high-level specifications of mappings and queries into XQuery programs, and it shows how the discussed method has been implemented in the SixP2P (or 6P2P) system.
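
The following Python sketch illustrates the core idea of exploiting a functional dependency X → Y to discover missing values when answers from several peers are merged. It is a simplified illustration over hypothetical relational data, not the SixP2P implementation, which operates on XML via tree-pattern formulas and XQuery.

```python
def complete_with_fd(tuples, X, Y):
    """Use a functional dependency X -> Y to fill nulls in attribute Y:
    if two merged tuples agree on X and one has a non-null Y value,
    propagate that value to the tuple where Y is missing."""
    known = {}
    for t in tuples:
        if t[Y] is not None:
            known[tuple(t[a] for a in X)] = t[Y]
    for t in tuples:
        if t[Y] is None:
            # stays None if no peer contributed a value for this X
            t[Y] = known.get(tuple(t[a] for a in X))
    return tuples

# Hypothetical answers merged from two peers: the FD isbn -> year
# lets the second peer's year fill the first peer's null.
merged = [
    {"isbn": "123", "title": "Databases", "year": None},
    {"isbn": "123", "title": "Databases", "year": 2008},
]
print(complete_with_fd(merged, X=["isbn"], Y="year"))
```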


2020 ◽  
Vol 9 (2-3) ◽  
pp. 85-99
Author(s):  
Munqath Alattar ◽  
Attila Sali

Missing data values are a pervasive problem for both researchers and industrial developers. There are two general approaches to dealing with missing values in databases: they can be either ignored (removed) or imputed (filled in) with new values (Farhangfar et al., IEEE Trans Syst Man Cybern Part A: Syst Hum 37(5):692–709, 2007). For some SQL tables, a candidate key of the table may not be null-free, and this needs to be handled. Possible keys and certain keys were introduced to deal with this situation in Köhler et al. (VLDB J 25(4):571–596, 2016). In the present paper, we introduce an intermediate concept called strongly possible keys, which is based on a data mining approach that uses only information already contained in the SQL table. A strongly possible key is a key that holds for some possible world obtained by replacing any occurrences of nulls with values already appearing in the corresponding attributes. Implication among strongly possible keys is characterized, and Armstrong tables are constructed. An algorithm to verify a strongly possible key is given, applying bipartite matching. A connection between the matroid intersection problem and systems of strongly possible keys is established. For the cases where no strongly possible key holds, an approximation notion based on the $g_3$ measure is introduced to calculate how close any given set of attributes is to being a strongly possible key, and its component version $g_4$ is derived. Analytical comparisons are given between the two measures.
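
To illustrate the definition, the following Python sketch verifies a strongly possible key by brute force, trying every replacement of nulls with values from the visible domain of the same attribute. It is exponential and purely illustrative; the paper's verification algorithm uses bipartite matching instead. The table is hypothetical.

```python
from itertools import product

def is_spkey(rows, K):
    """Brute-force check whether K is a strongly possible key: some
    replacement of nulls with values already present in the same attribute
    (the visible domain) makes all K-projections pairwise distinct."""
    domains = {a: sorted({r[a] for r in rows if r[a] is not None}) for a in K}
    slots = [(i, a) for i, r in enumerate(rows) for a in K if r[a] is None]
    for choice in product(*[domains[a] for _, a in slots]):
        world = [dict(r) for r in rows]          # one possible world
        for (i, a), v in zip(slots, choice):
            world[i][a] = v
        projections = [tuple(r[a] for a in K) for r in world]
        if len(set(projections)) == len(projections):
            return True                           # K is a key in this world
    return False

rows = [{"A": 1, "B": None},
        {"A": 1, "B": 2},
        {"A": 2, "B": 3}]
print(is_spkey(rows, K=["A", "B"]))  # True: fill the null B with visible value 3
```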


2020 ◽  
Vol 19 ◽  

Databases play an important role in applied mathematics. Normalization of relational databases is important to avoid the anomalies that arise in relations that do not satisfy normal forms such as Third Normal Form. Normalization can be a difficult task, however, since database designers may not fully understand the domain of each attribute contained in a relation schema, or may not fully understand the concept of normalization itself. This paper presents an efficient method that uses the data stored in relations to check whether further normalization may be needed, based on possible functional dependencies between attributes in the relations. By checking possible functional dependencies, database designers can determine whether further normalization is needed and can improve the structure of the relation schemas. Experiments were performed on the example relational database found in the tutorial of MySQL, a relational database management system, and showed good results.
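
As a sketch of how stored data can suggest candidate dependencies, the following Python snippet enumerates single-attribute functional dependencies that are not refuted by the current relation instance. It mirrors the general idea described above rather than the paper's exact method, and the sample rows are hypothetical.

```python
from itertools import permutations

def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds in the stored data iff no two rows agree on lhs
    but differ on rhs."""
    mapping = {}
    for r in rows:
        if mapping.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True

def possible_fds(rows):
    """Enumerate single-attribute dependencies the data does not refute.
    These are only candidates: an instance cannot prove a dependency,
    so a designer must confirm each one against the domain semantics."""
    attrs = rows[0].keys()
    return [(a, b) for a, b in permutations(attrs, 2) if fd_holds(rows, a, b)]

rows = [
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Oslo", "zip": "0150",   "country": "NO"},
    {"city": "Oslo", "zip": "0151",   "country": "NO"},
]
# city -> zip is refuted (Oslo has two zips); zip -> city survives as a candidate.
print(possible_fds(rows))
```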


2018 ◽  
Vol 11 (8) ◽  
pp. 880-892 ◽  
Author(s):  
Laure Berti-Équille ◽  
Hazar Harmouch ◽  
Felix Naumann ◽  
Noël Novelli ◽  
Saravanan Thirumuruganathan

2021 ◽  
Author(s):  
Rehana Parvin

A challenge of working with traditional database systems holding large amounts of data is that decision making requires numerous comparisons. Health-related database systems are examples of such databases: they contain millions of data entries and require fast data processing to examine related information and make complex decisions. In this thesis, a fuzzy database system is developed by integrating a fuzzy inference system (FIS) with fuzzy schema design, and implementing it in SQL in three different health-care contexts: the assessment of heart disease, diabetes mellitus, and liver disorders. The fuzzy database system is designed to accommodate any form of data and is tested with different types of data values, including crisp, linguistic, and null (i.e., missing) data. The developed system can explore crisp and linguistic data with loosely defined boundary conditions for decision making. FIS- and neural-network-based solutions are implemented in MATLAB for the same contexts, for comparison and validation against the datasets used in published works.
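
To give a flavour of the rule-based reasoning a FIS performs, here is a minimal Mamdani-style sketch in Python with two hypothetical rules for heart-disease risk. The membership bounds, rules, and attributes are invented for illustration and do not reproduce the thesis's actual rule base or MATLAB implementation.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function rising on [a, b], flat on [b, c],
    falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def heart_risk(age, chol):
    """Two hypothetical rules, combined with min for AND:
      R1: IF age is old AND cholesterol is high THEN risk is high
      R2: IF age is young THEN risk is low
    Returns the firing strength of each risk level."""
    old = trapezoid(age, 45, 60, 120, 121)
    young = trapezoid(age, -1, 0, 30, 45)
    high_chol = trapezoid(chol, 200, 240, 500, 501)
    return {"high": min(old, high_chol), "low": young}

print(heart_risk(age=62, chol=250))  # {'high': 1.0, 'low': 0.0}
```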


Author(s):  
Wai Yin Mok

JSON (JavaScript Object Notation) is a lightweight data-interchange format for the Internet. JSON is built on two structures: (1) a collection of name/value pairs and (2) an ordered list of values (http://www.json.org/). Because of this simple approach, JSON is easy to use and has the potential to become the data interchange format of choice for the Internet. Similar to XML, JSON schemas allow nested structures to model hierarchical data. As data interchange over the Internet increases exponentially, due to cloud computing or otherwise, redundancy-free JSON data are an attractive form of communication because they improve the quality of data communication by eliminating update anomalies. Nested Normal Form, a normal form for hierarchical data, is a precise characterization of redundancy. A nested table, or a hierarchical schema, is in Nested Normal Form if and only if it is free of redundancy caused by multivalued and functional dependencies. Using Nested Normal Form as a guide, this paper introduces a JSON schema design methodology that begins with UML use case diagrams, communication diagrams, and class diagrams that model a system under study. Based on the use cases' execution frequencies and the data passed between the involved parties in the communication diagrams, the proposed methodology selects classes from the class diagrams to be the roots of JSON scheme trees and repeatedly adds classes from the class diagram to the scheme trees as long as the schemas satisfy Nested Normal Form. This process continues until all of the classes in the class diagram have been added to some JSON scheme tree.
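
To illustrate the kind of redundancy Nested Normal Form rules out, the following Python sketch nests a flat table whose only non-trivial dependency is the multivalued dependency course ↠ teacher | book: nesting the teachers and books under each course removes the repeated combinations. The data and attribute names are hypothetical, and the snippet illustrates the target shape rather than the paper's design algorithm.

```python
import json
from collections import defaultdict

def nest(rows):
    """Nest a flat table satisfying the MVD course ->> teacher | book
    into one JSON object per course, eliminating the crossed-out
    repetition of (teacher, book) pairs."""
    teachers, books = defaultdict(set), defaultdict(set)
    for r in rows:
        teachers[r["course"]].add(r["teacher"])
        books[r["course"]].add(r["book"])
    return [{"course": c,
             "teachers": sorted(teachers[c]),
             "books": sorted(books[c])} for c in sorted(teachers)]

flat = [  # every teacher is paired with every book: pure redundancy
    {"course": "DB", "teacher": "Ann", "book": "Ullman"},
    {"course": "DB", "teacher": "Ann", "book": "Date"},
    {"course": "DB", "teacher": "Bob", "book": "Ullman"},
    {"course": "DB", "teacher": "Bob", "book": "Date"},
]
print(json.dumps(nest(flat), indent=2))
```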

