A TWO-LEVEL METADATA DICTIONARY APPROACH FOR SEMANTIC QUERY PROCESSING IN MULTIDATABASE SYSTEMS

Author(s):  
MIIN-JENG PAN ◽  
SHI-KUO CHANG ◽  
CHIEN-CHIAO YANG

A multidatabase system (MDBS) is a system that integrates several autonomous database systems and provides users with uniform access to all the databases. In this paper we develop a two-level active metadata dictionary approach for semantic query processing. To capture a global view of the data schemas of the participating databases, which may be heterogeneous, a Horn-clause data model is used. The lower-level metadata dictionaries (LLMDs) keep the metadata for each corresponding local database in the MDBS. The higher-level metadata dictionary (HLMD) integrates the metadata of all LLMDs. The database integration strategy includes two phases: schema translation and schema integration. It is a bottom-up approach that builds the integrated schema from the underlying database schemas. The evaluation strategy is a top-down approach. It starts with a query as a global goal to be achieved, unifies and optimizes the query to decompose the goal into subgoals that can be evaluated against the extensional databases, and then translates these subgoals into corresponding queries against the underlying DBMSs. To solve the control problem, we employ a G-net model for procedure control and inference control. An experimental implementation in Prolog is described.
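The top-down decomposition step lends itself to a small illustration. The sketch below is in Python rather than the authors' Prolog, and the mapping rules, database names, and relations are invented; it only shows how Horn-clause style rules in a higher-level dictionary can rewrite a global query into per-database subgoals that are checked against the local dictionaries.

```python
# Minimal sketch (not the authors' implementation): HLMD rules map a global
# predicate to subgoals on hypothetical local databases; each subgoal is
# validated against that database's LLMD before being shipped to the local DBMS.

# HLMD: global predicate -> list of (local_db, local_relation_goal) subgoals
HLMD_RULES = {
    "employee(Name, Dept, Salary)": [
        ("db1", "emp(Name, Dept)"),        # db1 stores the department assignment
        ("db2", "payroll(Name, Salary)"),  # db2 stores the salary
    ],
}

# LLMDs: per-database metadata, here just the relations each local DBMS exports
LLMD = {
    "db1": {"emp": ["name", "dept"]},
    "db2": {"payroll": ["name", "salary"]},
}

def decompose(global_goal):
    """Top-down evaluation: expand the global goal into per-database subgoals."""
    plan = {}
    for db, local_goal in HLMD_RULES.get(global_goal, []):
        relation = local_goal.split("(")[0]
        assert relation in LLMD[db], f"{relation} missing from {db}'s LLMD"
        plan.setdefault(db, []).append(local_goal)
    return plan

if __name__ == "__main__":
    # The global query becomes one subquery per underlying DBMS.
    print(decompose("employee(Name, Dept, Salary)"))
    # {'db1': ['emp(Name, Dept)'], 'db2': ['payroll(Name, Salary)']}
```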

2019 ◽  
Vol 5 (1) ◽  
pp. 65-79
Author(s):  
Yunhong Ji ◽  
Yunpeng Chai ◽  
Xuan Zhou ◽  
Lipeng Ren ◽  
Yajie Qin

Intra-query fault tolerance has increasingly become a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massively parallel processing (MPP) databases do not support intra-query fault tolerance. They may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by checkpointing, i.e., materializing the intermediate results of selected operators. Unlike existing approaches, SIFT aims to improve the query success rate within a given time. To achieve this goal, it needs to (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
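To make the trade-off between rerun time and checkpointing overhead concrete, here is a simplified, hypothetical sketch; it is not SIFT's or Greenplum's actual implementation, and the failure model, operator names, and costs are invented. It greedily materializes the operators whose checkpoints save the most expected rerun time per second of overhead, within an overhead budget.

```python
# Hypothetical illustration of operator-level checkpoint selection.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    run_time: float          # seconds to (re)compute this operator
    checkpoint_cost: float   # seconds to materialize its output

def pick_checkpoints(plan, failure_rate, budget):
    """Greedily checkpoint operators with the best expected-saving/overhead ratio."""
    prefix, candidates = 0.0, []
    for op in plan:
        prefix += op.run_time
        # Checkpointing op means work up to and including op need not be redone
        # after a later failure; weight that saving by an invented failure rate.
        saving = failure_rate * prefix
        candidates.append((saving / op.checkpoint_cost, op))
    chosen, spent = [], 0.0
    for _, op in sorted(candidates, key=lambda c: c[0], reverse=True):
        if spent + op.checkpoint_cost <= budget:
            chosen.append(op.name)
            spent += op.checkpoint_cost
    return chosen

if __name__ == "__main__":
    plan = [Operator("scan", 30, 5), Operator("join", 120, 20), Operator("agg", 10, 2)]
    print(pick_checkpoints(plan, failure_rate=0.1, budget=25))  # ['agg', 'join']
```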


2009 ◽  
pp. 2051-2058
Author(s):  
Luciano Caroprese ◽  
Cristian Molinaro ◽  
Irina Trubitsyna ◽  
Ester Zumpano

Integrating data from different sources involves two main steps: first, the various relations are merged together; second, some tuples are removed from (or inserted into) the resulting database in order to satisfy integrity constraints. There are several ways to integrate databases or possibly distributed information sources, but whatever integration architecture we choose, the heterogeneity of the sources to be integrated causes subtle problems. In particular, the database obtained from the integration process may be inconsistent with respect to integrity constraints, that is, one or more integrity constraints are not satisfied. Integrity constraints represent an important source of information about the real world. They are typically used to define constraints on data (functional dependencies, inclusion dependencies, etc.) and now have wide applicability in several contexts, such as semantic query optimization, cooperative query answering, database integration, and view update.
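As a minimal, hypothetical illustration of the kind of inconsistency integration can produce (the relation, attribute names, and tuples below are invented), the following sketch merges tuples from two sources and reports the groups that violate a functional dependency such as ssn → name:

```python
# Detect functional dependency violations in a relation merged from two sources.
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return groups of tuples that agree on `lhs` but disagree on `rhs`."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return {key: values for key, values in groups.items() if len(values) > 1}

# Relation obtained by merging two autonomous sources
employees = [
    {"ssn": "111", "name": "Ann"},    # from source 1
    {"ssn": "111", "name": "Anne"},   # from source 2: conflicting spelling
    {"ssn": "222", "name": "Bob"},
]

print(fd_violations(employees, "ssn", "name"))
# {'111': {'Ann', 'Anne'}} -> tuples to repair (remove or merge) during integration
```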


2000 ◽  
Vol 09 (03) ◽  
pp. 315-355 ◽  
Author(s):  
QIANG ZHU ◽  
P.-Å. LARSON

A multidatabase system (MDBS) integrates information from multiple pre-existing local databases. A major challenge for global query optimization in an MDBS is that some required information about the local database systems, such as local cost models, may not be available at the global level due to local autonomy. A feasible method to tackle this challenge is to group the local queries on a local database system into classes and then use the costs of sample queries from each query class to derive a cost formula for the class via regression analysis. This paper discusses how to classify local queries so that a good cost formula can be derived for each query class. Two classification approaches, bottom-up and top-down, are suggested, and the relationship between them is discussed. Classification rules that can be used in the approaches are identified. Problems regarding the composition and redundancy of classification rules are studied. Classification algorithms are given. To test the membership of a query in a class, an efficient algorithm based on ranks is introduced. In addition, a hybrid classification approach that combines the bottom-up and top-down ones is suggested. Experimental results demonstrate that the suggested query classification techniques can be used to derive good local cost formulas for global query optimization in an MDBS.
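The regression step can be illustrated with a toy example. The sketch below is not the paper's algorithm; the cost-model terms and sample numbers are invented. It fits a linear cost formula for one hypothetical query class from observed sample-query costs using ordinary least squares; in the approach described above, one such fit would be produced per query class, and query classification determines which formula the global optimizer applies to a given local query.

```python
# Fit cost ≈ c0 + c1 * input_cardinality + c2 * result_cardinality for one class.
import numpy as np

# Observed sample queries of one class: (input card, result card, measured cost in s)
samples = np.array([
    [1_000,     100, 0.021],
    [10_000,    800, 0.150],
    [50_000,  4_000, 0.760],
    [100_000, 9_500, 1.480],
])

X = np.column_stack([np.ones(len(samples)), samples[:, 0], samples[:, 1]])
y = samples[:, 2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares

c0, c1, c2 = coef
print(f"cost ≈ {c0:.4g} + {c1:.3g} * N_in + {c2:.3g} * N_out")
```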


2020 ◽  
Vol 38 (4) ◽  
pp. 795-817
Author(s):  
Jan Kossmann ◽  
Rainer Schlosser

Challenges for self-driving database systems, which tune their physical design and configuration autonomously, are manifold: such systems have to anticipate future workloads, find robust configurations efficiently, and incorporate knowledge gained from previous actions into later decisions. We present a component-based framework for self-driving database systems that enables database integration and the development of self-managing functionality with low overhead by relying on separation of concerns. Keeping the components of the framework reusable and exchangeable simplifies experiments, which promotes further research in this area. Moreover, to optimize multiple mutually dependent features, e.g., index selection and compression configurations, we propose a linear programming (LP) based algorithm that derives an efficient tuning order automatically. Finally, we demonstrate the applicability and scalability of our approach with reproducible examples.
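The paper's LP formulation is not reproduced here. As a hedged stand-in, the toy sketch below answers the same question by exhaustive search rather than linear programming, over invented tuning steps and benefit estimates: in which order should mutually dependent tuning steps run so that the estimated overall benefit is highest?

```python
# Toy tuning-order search (exhaustive, not the paper's LP). The payoff of each
# step depends on which other steps were already applied; all numbers are invented.
from itertools import permutations

# BENEFIT[step][frozenset(steps_done_before)] -> estimated gain of running `step`
BENEFIT = {
    "indexes":      {frozenset(): 10, frozenset({"compression"}): 7,
                     frozenset({"partitioning"}): 12, frozenset({"compression", "partitioning"}): 9},
    "compression":  {frozenset(): 6, frozenset({"indexes"}): 8,
                     frozenset({"partitioning"}): 6, frozenset({"indexes", "partitioning"}): 8},
    "partitioning": {frozenset(): 5, frozenset({"indexes"}): 4,
                     frozenset({"compression"}): 5, frozenset({"indexes", "compression"}): 4},
}

def best_order(steps):
    """Try every ordering and return the one with the highest total estimated gain."""
    best, best_gain = None, float("-inf")
    for order in permutations(steps):
        done, gain = set(), 0
        for step in order:
            gain += BENEFIT[step][frozenset(done)]
            done.add(step)
        if gain > best_gain:
            best, best_gain = order, gain
    return best, best_gain

print(best_order(list(BENEFIT)))  # (('partitioning', 'indexes', 'compression'), 25)
```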

