Automating the Schema Matching Process for Heterogeneous Data Warehouses

Author(s):  
Marko Banek ◽  
Boris Vrdoljak ◽  
A. Min Tjoa ◽  
Zoran Skočir

A federated data warehouse is a logical integration of data warehouses, applicable when physical integration is impossible due to privacy policies or legal restrictions. In healthcare systems, federated data warehouses are the most feasible source of data for deriving evidence-based medicine guidelines from the data of different participating institutions. To enable the translation of queries in a federated approach, the schemas of the federated warehouse and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels, and dimensional attributes. Similarities between warehouse-specific structures are computed using linguistic and structural comparison, and the calculated values are used to create the necessary mappings.
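
To illustrate the general idea of combining linguistic and structural comparison, the following minimal Python sketch scores a pair of multidimensional elements by their names and by the best matches among their child attributes. The weighting, the name normalization, and the example dimensions are illustrative assumptions, not the procedure described in the paper.

```python
# Minimal sketch: combine linguistic and structural similarity to match
# multidimensional schema elements (facts, dimensions, aggregation levels).
# Weights and similarity functions are illustrative assumptions.
from difflib import SequenceMatcher


def linguistic_similarity(name_a: str, name_b: str) -> float:
    """Compare element names after simple normalization."""
    a, b = name_a.lower().replace("_", " "), name_b.lower().replace("_", " ")
    return SequenceMatcher(None, a, b).ratio()


def structural_similarity(children_a: list[str], children_b: list[str]) -> float:
    """Compare two elements through the best matches of their child elements
    (e.g. the dimensional attributes of two aggregation levels)."""
    if not children_a or not children_b:
        return 0.0
    best = [max(linguistic_similarity(a, b) for b in children_b) for a in children_a]
    return sum(best) / len(best)


def element_similarity(elem_a, elem_b, w_ling=0.6, w_struct=0.4) -> float:
    """Weighted combination used to decide whether two warehouse elements map."""
    return (w_ling * linguistic_similarity(elem_a["name"], elem_b["name"])
            + w_struct * structural_similarity(elem_a["attrs"], elem_b["attrs"]))


# Example: a 'customer' dimension from a federated and a local warehouse.
fed = {"name": "Customer", "attrs": ["customer_id", "city", "country"]}
loc = {"name": "Client", "attrs": ["client_id", "town", "country"]}
print(element_similarity(fed, loc))  # mapping accepted if above a chosen threshold
```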


2018 ◽  
Vol 42 (1) ◽  
pp. 39-61 ◽  
Author(s):  
Marko Gulić ◽  
Marin Vuković

Ontology matching plays an important role in the integration of heterogeneous data sources that are described by ontologies. To determine correspondences between ontologies, a set of matchers can be used. After these matchers are executed and their results aggregated, a final alignment method is applied to select the appropriate correspondences between entities of the compared ontologies. The final alignment method is an important part of the ontology matching process because it directly determines the output of this process. In this paper we improve our iterative final alignment method by introducing an automatic adjustment of the final alignment threshold as well as a new rule for detecting false correspondences whose similarity values are greater than the adjusted threshold. An evaluation of the method is performed on the test ontologies of the OAEI evaluation contest, and a comparison with other final alignment methods is given.
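
The sketch below illustrates what a final alignment step of this kind can look like: matcher scores are aggregated per entity pair, a threshold is derived automatically from the score distribution, and correspondences are selected greedily under a 1:1 constraint. The statistics-based threshold rule is an assumption made for illustration; the paper defines its own adjustment and false-correspondence rules.

```python
# Illustrative final alignment step: derive a threshold from the score
# distribution, then greedily pick 1:1 correspondences above it.
from statistics import mean, stdev


def final_alignment(similarities: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    scores = list(similarities.values())
    # Assumed automatic threshold: mean plus half a standard deviation.
    threshold = mean(scores) + 0.5 * stdev(scores) if len(scores) > 1 else 0.5

    alignment, used_src, used_tgt = [], set(), set()
    # Greedy 1:1 selection: highest similarity first, skip already-aligned entities.
    for (src, tgt), sim in sorted(similarities.items(), key=lambda kv: -kv[1]):
        if sim < threshold or src in used_src or tgt in used_tgt:
            continue
        alignment.append((src, tgt))
        used_src.add(src)
        used_tgt.add(tgt)
    return alignment


sims = {("Person", "Human"): 0.91, ("Person", "Paper"): 0.40,
        ("Article", "Paper"): 0.88, ("Article", "Human"): 0.35}
print(final_alignment(sims))  # [('Person', 'Human'), ('Article', 'Paper')]
```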


2008 ◽  
pp. 3116-3141
Author(s):  
Shi-Ming Huang ◽  
David C. Yen ◽  
Hsiang-Yuan Hsueh

The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency reasons. When constructing a materialized data warehouse system, however, developers and users still face managerial problems, particularly in the area of view resource maintenance. Resource redundancy and data inconsistency among materialized views in a data warehouse system are problems that many developers and users struggle with. In this article, a space-efficient protocol for materialized view maintenance with a global data view on data warehouses with embedded proxies is proposed. In the protocol set, multilevel proxy-based protocols with a data compensating mechanism are provided to ensure the consistency and uniqueness of materialized data among data resources and materialized views. The authors also provide a set of evaluation experiences and derivations to verify the feasibility of the proposed protocols and mechanisms. With such protocols as proxy services, the performance and space utilization of the materialized view approach are improved. Furthermore, the consistency issue among materialized data warehouses and heterogeneous data sources can be properly addressed by applying a dynamic compensating and synchronization mechanism. The trade-off between efficiency, storage consumption, and data validity for view maintenance tasks can thus be properly balanced.
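
A much-simplified sketch of compensation-based maintenance is given below: a proxy logs source deltas, and the materialized view applies only the changes committed since its last refresh instead of being recomputed. All class and method names are invented for illustration; the article's multilevel protocols are considerably richer.

```python
# Simplified proxy-mediated view maintenance with compensation: the proxy logs
# source deltas, and the view applies only the deltas it has not yet seen.
class SourceProxy:
    def __init__(self):
        self.version = 0
        self.delta_log = []          # [(version, row, +1 insert / -1 delete)]

    def record(self, row, op):
        self.version += 1
        self.delta_log.append((self.version, row, op))

    def deltas_since(self, version):
        return [(r, op) for v, r, op in self.delta_log if v > version]


class MaterializedView:
    def __init__(self, proxy):
        self.proxy = proxy
        self.rows = set()
        self.refreshed_at = 0

    def compensate(self):
        """Apply only the deltas committed after the last refresh."""
        for row, op in self.proxy.deltas_since(self.refreshed_at):
            (self.rows.add if op > 0 else self.rows.discard)(row)
        self.refreshed_at = self.proxy.version


proxy = SourceProxy()
view = MaterializedView(proxy)
proxy.record(("sale", 42), +1)
proxy.record(("sale", 43), +1)
view.compensate()
proxy.record(("sale", 42), -1)
view.compensate()
print(view.rows)  # {('sale', 43)}
```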


2010 ◽  
Vol 2 (4) ◽  
pp. 51-64 ◽  
Author(s):  
Ejaz Ahmed ◽  
Nik Bessis ◽  
Peter Norrington ◽  
Yong Yue

Much work has been done in the area of data access and integration using various data mapping, matching, and loading techniques. One of the main concerns when integrating data from heterogeneous data sources is data redundancy, mainly due to the different business contexts and purposes for which the data systems were originally built. A common process for accessing data from integrated databases involves the use of each data source's own catalogue or metadata schema. In this article, the authors take the view that there is a greater chance of data inconsistencies, such as data redundancies, when integrating data within a grid environment than in traditional distributed paradigms. The importance of improving the data search and matching process is briefly discussed, and a partial service-oriented generic strategy is adopted to consolidate the distinct catalogue schemas of federated databases so that information can be accessed seamlessly. To this end, a matching strategy between structure objects and data values across federated databases in a grid environment is presented.
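
As a rough illustration of matching structure objects and data values across federated catalogues, the sketch below scores a pair of columns by field-name similarity combined with the overlap of sampled values. The weights and the Jaccard overlap are assumptions for illustration, not the authors' strategy.

```python
# Illustrative sketch: match catalogue entries across federated databases by
# combining field-name similarity with overlap of sampled data values.
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(vals_a: set, vals_b: set) -> float:
    """Jaccard overlap of sampled values from the two columns."""
    if not vals_a or not vals_b:
        return 0.0
    return len(vals_a & vals_b) / len(vals_a | vals_b)


def column_match(col_a, col_b, w_name=0.5, w_val=0.5) -> float:
    """High scores suggest the two columns describe the same data, which can
    also flag redundant copies of the same information at different sites."""
    return (w_name * name_score(col_a["name"], col_b["name"])
            + w_val * value_overlap(col_a["sample"], col_b["sample"]))


site1 = {"name": "cust_country", "sample": {"UK", "DE", "FR"}}
site2 = {"name": "customer_country", "sample": {"UK", "FR", "ES"}}
print(column_match(site1, site2))
```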


2020 ◽  
Vol 35 ◽  
Author(s):  
Jomar Da Silva ◽  
Kate Revoredo ◽  
Fernanda Baião ◽  
Jérôme Euzenat

Ontology matching aims at discovering mappings between the entities of two ontologies. It plays an important role in the integration of heterogeneous data sources that are described by ontologies. Interactive ontology matching involves domain experts in the matching process. In some approaches, the expert provides feedback about mappings between ontology entities: these approaches select mappings to present to the expert, who indicates which of them should be accepted or rejected, thus taking advantage of domain expert knowledge to find an alignment. In this paper, we present Alin, an interactive ontology matching approach which uses expert feedback not only to approve or reject selected mappings but also to dynamically improve the set of selected mappings, that is, to interactively include and exclude mappings from it. This additional use of expert answers aims at increasing the benefit brought by each answer. For this purpose, Alin uses four techniques. Two were used in previous versions of Alin to dynamically select concept and attribute mappings. Two new techniques are introduced in this paper: one to dynamically select relationship mappings and another to dynamically reject inconsistent selected mappings using anti-patterns. We compared Alin with state-of-the-art tools, showing that it generates alignments of comparable quality.
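
The sketch below captures only the interactive flavour of such an approach: candidate mappings are shown to an expert one at a time, an accepted mapping can pull related candidates into the selection, and candidates inconsistent with it are dropped. The propagation rules and the data are simplified stand-ins, not Alin's actual techniques.

```python
# Simplified interactive matching loop: expert answers both decide the current
# mapping and dynamically include/exclude other candidate mappings.
def interactive_matching(candidates, related, conflicts, ask_expert):
    """candidates: {mapping: similarity}; related/conflicts: mapping -> set of mappings."""
    alignment, pending = [], dict(candidates)
    while pending:
        mapping = max(pending, key=pending.get)       # most promising candidate first
        pending.pop(mapping)
        if ask_expert(mapping):
            alignment.append(mapping)
            # dynamically include mappings related to an accepted one
            for m in related.get(mapping, ()):
                pending.setdefault(m, 0.5)
            # dynamically exclude mappings inconsistent with the accepted one
            for m in conflicts.get(mapping, ()):
                pending.pop(m, None)
    return alignment


cands = {("Author", "Writer"): 0.9, ("Author", "Paper"): 0.3}
related = {("Author", "Writer"): {("Author.name", "Writer.fullName")}}
conflicts = {("Author", "Writer"): {("Author", "Paper")}}
oracle = lambda m: m in {("Author", "Writer"), ("Author.name", "Writer.fullName")}
print(interactive_matching(cands, related, conflicts, oracle))
```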


Author(s):  
Yongjie Zhu ◽  
Shenzhan Feng

In the process of data integration among heterogeneous databases, analyzing the identical attributes and characteristics of the databases is of great importance. However, existing data attribute matching models suffer from an oversized matching space and low matching precision. Therefore, after analyzing the attribute matching process of heterogeneous databases, this paper puts forward a heterogeneous data attribute matching model based on the fusion of a SOM and a BP network. The model first pre-matches the heterogeneous data attributes with the SOM network to determine the scope of the attribute centres to be matched. The BP network then carries out the accurate match against the standard attribute centres of the heterogeneous data. Finally, matching results on a real database show that this model effectively reduces the matching space in the case of complex patterns and achieves relatively high accuracy for large-scale data matching, with an average precision of 89.52% and an average recall of 100%.
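
The following sketch shows the two-stage shape of such a model, with scikit-learn's KMeans standing in for the SOM stage and MLPClassifier standing in for the BP network: coarse clustering narrows the candidate attribute centres, and the classifier then performs the precise match among them. The toy feature vectors and class labels are invented for the example.

```python
# Two-stage attribute matching sketch: coarse clustering (SOM stand-in) narrows
# the candidate attribute centres, a neural classifier (BP stand-in) refines.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy feature vectors for four standard attribute classes (20 samples each).
locs = {"name": 0.0, "date": 1.0, "price": 4.0, "zip": 5.0}
X_train = np.vstack([rng.normal(loc=l, scale=0.2, size=(20, 3)) for l in locs.values()])
y_train = np.repeat(list(locs), 20)

# Stage 1: coarse clustering narrows the candidate attribute centres.
coarse = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Stage 2: a neural classifier performs the precise match, restricted to the
# classes observed in the unknown attribute's cluster.
fine = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_train, y_train)

unknown = rng.normal(loc=1.0, scale=0.2, size=(1, 3))     # attribute to be matched
candidates = set(y_train[coarse.labels_ == coarse.predict(unknown)[0]])
proba = fine.predict_proba(unknown)[0]
best = max((p, c) for p, c in zip(proba, fine.classes_) if c in candidates)[1]
print(best)  # expected: 'date'
```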


2019 ◽  
Vol 27 (1) ◽  
pp. 109-118 ◽  
Author(s):  
Nicholas J Dobbins ◽  
Clifford H Spital ◽  
Robert A Black ◽  
Jason M Morrison ◽  
Bas de Veer ◽  
...  

Objective: Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.
Materials and Methods: Leaf utilizes a flexible biomedical concept system to define hierarchical concepts and ontologies. Each Leaf concept contains both textual representations and SQL query building blocks, exposed by a simple drag-and-drop user interface. Leaf generates abstract syntax trees which are compiled into dynamic SQL queries.
Results: Leaf is a successful production-supported tool at the University of Washington, which hosts a central Leaf instance querying an enterprise data warehouse with over 300 active users. Through the support of UW Medicine (https://uwmedicine.org), the Institute of Translational Health Sciences (https://www.iths.org), and the National Center for Data to Health (https://ctsa.ncats.nih.gov/cd2h/), Leaf source code has been released into the public domain at https://github.com/uwrit/leaf.
Discussion: Leaf allows the querying of single or multiple clinical databases simultaneously, even those of different data models. This enables fast installation without costly extraction or duplication.
Conclusions: Leaf differs from existing cohort discovery tools because it does not specify a required data model and is designed to seamlessly leverage existing user authentication systems and clinical databases in situ. We believe Leaf to be useful for health system analytics, clinical research data warehouses, precision medicine biobanks, and clinical studies involving large patient cohorts.
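
The sketch below illustrates the general concept-to-SQL idea in a hypothetical form: each concept pairs a user-facing label with SQL building blocks, and the selected concepts are compiled into a dynamic query. Table and column names are invented, and this is not Leaf's implementation; see the repository above for the actual source.

```python
# Hypothetical sketch: concepts carry SQL building blocks, and a panel of
# selected concepts is compiled into one dynamic SQL query.
from dataclasses import dataclass


@dataclass
class Concept:
    label: str            # text shown in the drag-and-drop UI
    sql_set: str          # FROM clause building block
    sql_where: str = ""   # optional WHERE clause building block


def compile_panel(concepts: list[Concept], person_id: str = "person_id") -> str:
    """AND the selected concepts together: patients appearing in every concept's set."""
    subqueries = [
        f"SELECT {person_id} FROM {c.sql_set}"
        + (f" WHERE {c.sql_where}" if c.sql_where else "")
        for c in concepts
    ]
    return "\nINTERSECT\n".join(subqueries)


diabetes = Concept("Type 2 diabetes", "dbo.diagnosis", "icd10 LIKE 'E11%'")
over_65 = Concept("Age >= 65", "dbo.patient", "age >= 65")
print(compile_panel([diabetes, over_65]))
```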

