schema design
Recently Published Documents


Total documents: 143 (five years: 26)

H-index: 13 (five years: 2)

2022 ◽  
Vol 18 (1) ◽  
Author(s):  
Batya Kenig ◽  
Dan Suciu

Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degrees of satisfaction of the antecedents and the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Third, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
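
To make the information-theoretic "degree of satisfaction" concrete, here is a minimal Python sketch. It assumes the standard encoding in which a relation is viewed as a uniform distribution over its tuples, so that an FD X → Y holds exactly when the conditional entropy h(Y | X) is zero, and an MVD X ↠ Y | Z holds exactly when the conditional mutual information I(Y; Z | X) is zero. The function names and the sample relation are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of `rows` onto `attrs`,
    under the uniform distribution over the (distinct) tuples in `rows`."""
    n = len(rows)
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def fd_degree(rows, X, Y):
    """h(Y | X) = h(XY) - h(X); zero iff the FD X -> Y holds on `rows`."""
    return entropy(rows, X + Y) - entropy(rows, X)

def mvd_degree(rows, X, Y, Z):
    """I(Y; Z | X) = h(XY) + h(XZ) - h(XYZ) - h(X);
    zero iff the MVD X ->> Y | Z holds on `rows`."""
    return (entropy(rows, X + Y) + entropy(rows, X + Z)
            - entropy(rows, X + Y + Z) - entropy(rows, X))

# Hypothetical relation: A -> B holds, A -> C does not.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
print(fd_degree(rows, ["A"], ["B"]))  # zero: the FD holds exactly
print(fd_degree(rows, ["A"], ["C"]))  # positive: the FD is violated
```

Under this encoding, the exact implication {A → B, B → C} ⊨ A → C relaxes with factor 1 to the Shannon inequality h(C | A) ≤ h(B | A) + h(C | B), which can be checked numerically with `fd_degree` on any relation.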


AIChE Journal ◽  
2021 ◽  
Author(s):  
Hawley Helmbrecht ◽  
Nuo Xu ◽  
Rick Liao ◽  
Elizabeth Nance

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hetong Ma ◽  
Liu Shen ◽  
Haixia Sun ◽  
Zidu Xu ◽  
Li Hou ◽  
...  

Background: Coronavirus disease (COVID-19), a pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is highly contagious and still spreading globally, with more than one million confirmed cases and tens of thousands of deaths. Studies worldwide aim to understand the mechanism, transmission, and clinical features of COVID-19. A cross-language terminology of COVID-19 is essential for improving knowledge sharing and the dissemination of scientific discoveries. Methods: We developed a bilingual terminology of COVID-19, named COVID Term, that maps Chinese and English terms. The terminology was constructed in six steps: (1) classification schema design; (2) concept representation model building; (3) term source selection and term extraction; (4) hierarchical structure construction; (5) quality control; (6) web service. The terminology is open access, with search, browse, and download services. Results: COVID Term includes 10 categories: disease; anatomic site; clinical manifestation; demographic and socioeconomic characteristics; living organism; qualifiers; psychological assistance; medical equipment, instruments and materials; epidemic prevention and control; and diagnosis and treatment technique. In total, COVID Term covers 464 concepts with 724 Chinese terms and 887 English terms. All terms are openly available online (COVID Term URL: http://covidterm.imicams.ac.cn). Conclusions: COVID Term is a bilingual terminology focused on COVID-19, the epidemic pneumonia with a high risk of infection around the world. It provides updated bilingual terms for the disease to help health providers and medical professionals retrieve and exchange information and knowledge in multiple languages. COVID Term was released in machine-readable formats (e.g., XML and JSON), which should support information retrieval, machine translation, and the application of advanced intelligent techniques.


Author(s):  
Robin La Fontaine

"Which came first," begins an old joke. But the more interesting question might be, "does it even matter?" There are many obvious and several not-so-obvious ways in which the order of items (be they XML elements or attributes, or JSON maps or arrays) can be understood to be significant or insignificant. These are not new questions, and how they’re answered plays out across vocabulary design, schema design, and individual documents. They are important questions when it comes to deciding whether two documents are “the same” or “different”, and to what extent. This paper challenges the one-size-fits-all decree in XML that order needs to be preserved and reviews the implications of 'order'. When ordered elements can be moved, we have something that shares some common ground with orderless information. This paper establishes a continuum between ordered information and orderless information and proposes that these are not as far apart as they might at first appear.

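The question of whether two documents are "the same" when order may or may not matter can be illustrated with a small Python sketch for JSON values. It is not the paper's method: the `canonical` and `same_document` helpers and the `ordered_arrays` switch are hypothetical, and the sketch covers only the two ends of the continuum (arrays fully ordered, or arrays treated as orderless bags). Object keys are always treated as unordered.

```python
import json

def canonical(value, ordered_arrays=True):
    """Return a canonical JSON string for `value`.

    Object members are sorted by key, so key order never matters.
    Arrays keep their order unless `ordered_arrays` is False, in which
    case their items are sorted by canonical form (bag semantics)."""
    if isinstance(value, dict):
        members = sorted((k, canonical(v, ordered_arrays)) for k, v in value.items())
        return "{" + ",".join(json.dumps(k) + ":" + v for k, v in members) + "}"
    if isinstance(value, list):
        parts = [canonical(v, ordered_arrays) for v in value]
        if not ordered_arrays:
            parts.sort()
        return "[" + ",".join(parts) + "]"
    return json.dumps(value)

def same_document(a, b, ordered_arrays=True):
    """Compare two parsed JSON documents under the chosen order policy."""
    return canonical(a, ordered_arrays) == canonical(b, ordered_arrays)

# Hypothetical documents that differ only in key order and array order.
doc1 = {"title": "t", "authors": ["X", "Y"]}
doc2 = {"authors": ["Y", "X"], "title": "t"}
print(same_document(doc1, doc2))                        # False
print(same_document(doc1, doc2, ordered_arrays=False))  # True
```

A per-path policy (some arrays ordered, others not) would sit between these two extremes; the single flag is used here only to keep the sketch short.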

2021 ◽  
Vol 46 (2) ◽  
pp. 1-46
Author(s):  
Ziheng Wei ◽  
Sebastian Link

We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.
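
The abstract does not define an embedded functional dependency, so the following Python sketch rests on an assumption: that an embedded FD is written E : X → Y with X and Y contained in E, and that it is checked only on records with no missing value on any attribute of E. The function name, the use of `None` as the missing-value marker, and the sample records are all hypothetical.

```python
def satisfies_efd(rows, E, X, Y):
    """Check the embedded FD  E : X -> Y  on `rows` (a list of dicts).

    Assumption: only records that are complete on E (no None on any
    attribute of E) are in scope; among those, records that agree on X
    must also agree on Y."""
    if not (set(X) | set(Y)) <= set(E):
        raise ValueError("X and Y must be subsets of E")
    seen = {}
    for r in rows:
        if any(r.get(a) is None for a in E):
            continue  # record is incomplete on E, hence out of scope
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical records; None marks a missing value.
rows = [
    {"dept": "d1", "mgr": "m1", "phone": "555"},
    {"dept": "d1", "mgr": None, "phone": "556"},
    {"dept": "d1", "mgr": "m9", "phone": None},
    {"dept": "d2", "mgr": "m2", "phone": "557"},
]
print(satisfies_efd(rows, ["dept", "mgr"], ["dept"], ["mgr"]))           # False
print(satisfies_efd(rows, ["dept", "mgr", "phone"], ["dept"], ["mgr"]))  # True
```

Under this reading, enlarging E tightens the completeness requirement and shrinks the set of records the dependency must hold on, which is why the second check succeeds where the first fails.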


2021 ◽  
Author(s):  
Rehana Parvin

A challenge of working with traditional database systems that hold large amounts of data is that decision making requires numerous comparisons. Health-related database systems are examples: they contain millions of data entries and require fast processing of related information to support complex decisions. In this thesis, a fuzzy database system is developed by integrating a fuzzy inference system (FIS) with fuzzy schema design, and is implemented in SQL in three health-care contexts: the assessment of heart disease, diabetes mellitus, and liver disorders. The fuzzy database system is designed to accept any form of data and is tested with different types of data value, including crisp, linguistic, and null (i.e., missing) data. The developed system can explore crisp and linguistic data with loosely defined boundary conditions for decision-making. FIS and neural network-based solutions are implemented in MATLAB for the same contexts, for comparison and validation against the datasets used in published works.
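
How crisp, linguistic, and null values can be handled uniformly may be easier to see in a small sketch. The thesis implements its system in SQL and MATLAB; the Python below is only an illustration under assumptions: triangular membership functions, a generic 0 to 100 score, and three linguistic terms with hypothetical breakpoints that carry no clinical meaning.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms over a generic 0-100 score.
TERMS = {
    "low": (-50, 0, 50),
    "medium": (0, 50, 100),
    "high": (50, 100, 150),
}

def fuzzify(value):
    """Map a crisp number, a linguistic label, or None to term memberships.

    None (a missing value) yields None; a label yields full membership in
    that term; a number is evaluated against each term's triangle."""
    if value is None:
        return None
    if isinstance(value, str):
        return {term: 1.0 if term == value else 0.0 for term in TERMS}
    return {term: triangular(value, *abc) for term, abc in TERMS.items()}

print(fuzzify(25))      # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
print(fuzzify("high"))  # {'low': 0.0, 'medium': 0.0, 'high': 1.0}
print(fuzzify(None))    # None
```

The overlapping triangles are what give the "loosely defined boundary conditions": a score of 25 belongs partly to "low" and partly to "medium" rather than falling on one side of a hard threshold.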


