Automating the Schema Matching Process for Heterogeneous Data Warehouses

Author(s):  
Marko Banek ◽  
Boris Vrdoljak ◽  
A. Min Tjoa ◽  
Zoran Skočir

A federated data warehouse is a logical integration of data warehouses, applicable when physical integration is impossible due to privacy policies or legal restrictions. In healthcare systems, federated data warehouses are the most feasible source of data for deriving evidence-based medicine guidelines from the data of different participating institutions. To enable the translation of queries in a federated approach, the schemas of the federated warehouse and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels, and dimensional attributes. Similarities between warehouse-specific structures are computed using linguistic and structural comparison, and the calculated values are used to create the necessary mappings.
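
To illustrate the general idea of combining linguistic and structural comparison, the following minimal Python sketch scores a pair of multidimensional elements by their names and by the best matches among their child attributes. The weighting, the name normalization, and the example dimensions are illustrative assumptions, not the procedure described in the paper.

```python
# Minimal sketch: combine linguistic and structural similarity to match
# multidimensional schema elements (facts, dimensions, aggregation levels).
# Weights and similarity functions are illustrative assumptions.
from difflib import SequenceMatcher


def linguistic_similarity(name_a: str, name_b: str) -> float:
    """Compare element names after simple normalization."""
    a, b = name_a.lower().replace("_", " "), name_b.lower().replace("_", " ")
    return SequenceMatcher(None, a, b).ratio()


def structural_similarity(children_a: list[str], children_b: list[str]) -> float:
    """Compare two elements through the best matches of their child elements
    (e.g. the dimensional attributes of two aggregation levels)."""
    if not children_a or not children_b:
        return 0.0
    best = [max(linguistic_similarity(a, b) for b in children_b) for a in children_a]
    return sum(best) / len(best)


def element_similarity(elem_a, elem_b, w_ling=0.6, w_struct=0.4) -> float:
    """Weighted combination used to decide whether two warehouse elements map."""
    return (w_ling * linguistic_similarity(elem_a["name"], elem_b["name"])
            + w_struct * structural_similarity(elem_a["attrs"], elem_b["attrs"]))


# Example: a 'customer' dimension from a federated and a local warehouse.
fed = {"name": "Customer", "attrs": ["customer_id", "city", "country"]}
loc = {"name": "Client", "attrs": ["client_id", "town", "country"]}
print(element_similarity(fed, loc))  # mapping accepted if above a chosen threshold
```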


2018 ◽  
Vol 42 (1) ◽  
pp. 39-61 ◽  
Author(s):  
Marko Gulić ◽  
Marin Vuković

Ontology matching plays an important role in the integration of heterogeneous data sources that are described by ontologies. To determine correspondences between ontologies, a set of matchers can be used. After these matchers are executed and their results aggregated, a final alignment method is applied to select the appropriate correspondences between entities of the compared ontologies. The final alignment method is an important part of the ontology matching process because it directly determines the output of this process. In this paper we improve our iterative final alignment method by introducing an automatic adjustment of the final alignment threshold as well as a new rule for detecting false correspondences whose similarity values are greater than the adjusted threshold. An evaluation of the method is performed on the test ontologies of the OAEI evaluation contest, and a comparison with other final alignment methods is given.
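
The sketch below illustrates what a final alignment step of this kind can look like: matcher scores are aggregated per entity pair, a threshold is derived automatically from the score distribution, and correspondences are selected greedily under a 1:1 constraint. The statistics-based threshold rule is an assumption made for illustration; the paper defines its own adjustment and false-correspondence rules.

```python
# Illustrative final alignment step: derive a threshold from the score
# distribution, then greedily pick 1:1 correspondences above it.
from statistics import mean, stdev


def final_alignment(similarities: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    scores = list(similarities.values())
    # Assumed automatic threshold: mean plus half a standard deviation.
    threshold = mean(scores) + 0.5 * stdev(scores) if len(scores) > 1 else 0.5

    alignment, used_src, used_tgt = [], set(), set()
    # Greedy 1:1 selection: highest similarity first, skip already-aligned entities.
    for (src, tgt), sim in sorted(similarities.items(), key=lambda kv: -kv[1]):
        if sim < threshold or src in used_src or tgt in used_tgt:
            continue
        alignment.append((src, tgt))
        used_src.add(src)
        used_tgt.add(tgt)
    return alignment


sims = {("Person", "Human"): 0.91, ("Person", "Paper"): 0.40,
        ("Article", "Paper"): 0.88, ("Article", "Human"): 0.35}
print(final_alignment(sims))  # [('Person', 'Human'), ('Article', 'Paper')]
```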


2008 ◽  
pp. 3116-3141
Author(s):  
Shi-Ming Huang ◽  
David C. Yen ◽  
Hsiang-Yuan Hsueh

The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency reasons. When constructing a materialized data warehouse system, however, developers and users still face managerial problems, particularly in the area of view resource maintenance. Resource redundancy and data inconsistency among materialized views in a data warehouse system are problems that many developers and users struggle with. In this article, a space-efficient protocol for materialized view maintenance with a global data view on data warehouses with embedded proxies is proposed. In the protocol set, multilevel proxy-based protocols with a data compensating mechanism are provided to ensure the consistency and uniqueness of materialized data among data resources and materialized views. The authors also provide a set of evaluation experiences and derivations to verify the feasibility of the proposed protocols and mechanisms. With such protocols as proxy services, the performance and space utilization of the materialized view approach are improved. Furthermore, the consistency issue among materialized data warehouses and heterogeneous data sources can be properly addressed by applying a dynamic compensating and synchronization mechanism. The trade-off between efficiency, storage consumption, and data validity for view maintenance tasks can thus be properly balanced.
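
A much-simplified sketch of compensation-based maintenance is given below: a proxy logs source deltas, and the materialized view applies only the changes committed since its last refresh instead of being recomputed. All class and method names are invented for illustration; the article's multilevel protocols are considerably richer.

```python
# Simplified proxy-mediated view maintenance with compensation: the proxy logs
# source deltas, and the view applies only the deltas it has not yet seen.
class SourceProxy:
    def __init__(self):
        self.version = 0
        self.delta_log = []          # [(version, row, +1 insert / -1 delete)]

    def record(self, row, op):
        self.version += 1
        self.delta_log.append((self.version, row, op))

    def deltas_since(self, version):
        return [(r, op) for v, r, op in self.delta_log if v > version]


class MaterializedView:
    def __init__(self, proxy):
        self.proxy = proxy
        self.rows = set()
        self.refreshed_at = 0

    def compensate(self):
        """Apply only the deltas committed after the last refresh."""
        for row, op in self.proxy.deltas_since(self.refreshed_at):
            (self.rows.add if op > 0 else self.rows.discard)(row)
        self.refreshed_at = self.proxy.version


proxy = SourceProxy()
view = MaterializedView(proxy)
proxy.record(("sale", 42), +1)
proxy.record(("sale", 43), +1)
view.compensate()
proxy.record(("sale", 42), -1)
view.compensate()
print(view.rows)  # {('sale', 43)}
```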


2010 ◽  
Vol 2 (4) ◽  
pp. 51-64 ◽  
Author(s):  
Ejaz Ahmed ◽  
Nik Bessis ◽  
Peter Norrington ◽  
Yong Yue

Much work has been done in the area of data access and integration using various data mapping, matching, and loading techniques. One of the main concerns when integrating data from heterogeneous data sources is data redundancy, mainly due to the different business contexts and purposes for which the data systems were originally built. A common process for accessing data from integrated databases involves the use of each data source's own catalogue or metadata schema. In this article, the authors take the view that there is a greater chance of data inconsistencies, such as data redundancies, when integrating data within a grid environment than in traditional distributed paradigms. The importance of improving the data search and matching process is briefly discussed, and a partial service-oriented generic strategy is adopted to consolidate the distinct catalogue schemas of federated databases so that information can be accessed seamlessly. To this end, a matching strategy between structure objects and data values across federated databases in a grid environment is presented.
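
As a rough illustration of matching structure objects and data values across federated catalogues, the sketch below scores a pair of columns by field-name similarity combined with the overlap of sampled values. The weights and the Jaccard overlap are assumptions for illustration, not the authors' strategy.

```python
# Illustrative sketch: match catalogue entries across federated databases by
# combining field-name similarity with overlap of sampled data values.
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(vals_a: set, vals_b: set) -> float:
    """Jaccard overlap of sampled values from the two columns."""
    if not vals_a or not vals_b:
        return 0.0
    return len(vals_a & vals_b) / len(vals_a | vals_b)


def column_match(col_a, col_b, w_name=0.5, w_val=0.5) -> float:
    """High scores suggest the two columns describe the same data, which can
    also flag redundant copies of the same information at different sites."""
    return (w_name * name_score(col_a["name"], col_b["name"])
            + w_val * value_overlap(col_a["sample"], col_b["sample"]))


site1 = {"name": "cust_country", "sample": {"UK", "DE", "FR"}}
site2 = {"name": "customer_country", "sample": {"UK", "FR", "ES"}}
print(column_match(site1, site2))
```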


2020 ◽  
Vol 35 ◽  
Author(s):  
Jomar Da Silva ◽  
Kate Revoredo ◽  
Fernanda Baião ◽  
Jérôme Euzenat

Ontology matching aims at discovering mappings between the entities of two ontologies. It plays an important role in the integration of heterogeneous data sources that are described by ontologies. Interactive ontology matching involves domain experts in the matching process. In some approaches, the expert provides feedback about mappings between ontology entities: these approaches select mappings to present to the expert, who indicates which of them should be accepted or rejected, thus taking advantage of domain expert knowledge to find an alignment. In this paper, we present Alin, an interactive ontology matching approach which uses expert feedback not only to approve or reject selected mappings but also to dynamically improve the set of selected mappings, that is, to interactively include and exclude mappings from it. This additional use of expert answers aims at increasing the benefit brought by each answer. For this purpose, Alin uses four techniques. Two were used in previous versions of Alin to dynamically select concept and attribute mappings. Two new techniques are introduced in this paper: one to dynamically select relationship mappings and another to dynamically reject inconsistent selected mappings using anti-patterns. We compared Alin with state-of-the-art tools, showing that it generates alignments of comparable quality.
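
The sketch below captures only the interactive flavour of such an approach: candidate mappings are shown to an expert one at a time, an accepted mapping can pull related candidates into the selection, and candidates inconsistent with it are dropped. The propagation rules and the data are simplified stand-ins, not Alin's actual techniques.

```python
# Simplified interactive matching loop: expert answers both decide the current
# mapping and dynamically include/exclude other candidate mappings.
def interactive_matching(candidates, related, conflicts, ask_expert):
    """candidates: {mapping: similarity}; related/conflicts: mapping -> set of mappings."""
    alignment, pending = [], dict(candidates)
    while pending:
        mapping = max(pending, key=pending.get)       # most promising candidate first
        pending.pop(mapping)
        if ask_expert(mapping):
            alignment.append(mapping)
            # dynamically include mappings related to an accepted one
            for m in related.get(mapping, ()):
                pending.setdefault(m, 0.5)
            # dynamically exclude mappings inconsistent with the accepted one
            for m in conflicts.get(mapping, ()):
                pending.pop(m, None)
    return alignment


cands = {("Author", "Writer"): 0.9, ("Author", "Paper"): 0.3}
related = {("Author", "Writer"): {("Author.name", "Writer.fullName")}}
conflicts = {("Author", "Writer"): {("Author", "Paper")}}
oracle = lambda m: m in {("Author", "Writer"), ("Author.name", "Writer.fullName")}
print(interactive_matching(cands, related, conflicts, oracle))
```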


Author(s):  
Yongjie Zhu ◽  
Shenzhan Feng

In the process of data integration among heterogeneous databases, analyzing the identical attributes and characteristics of the databases is of great importance. However, existing data attribute matching models suffer from an oversized matching space and low matching precision. Therefore, after analyzing the attribute matching process of heterogeneous databases, this paper puts forward a heterogeneous data attribute matching model based on the fusion of a SOM and a BP network. The model first pre-matches the heterogeneous data attributes with the SOM network to determine the scope of the attribute centres to be matched. The BP network then carries out the accurate match against the standard attribute centres of the heterogeneous data. Finally, matching results on a real database show that this model effectively reduces the matching space in the case of complex patterns and achieves relatively high accuracy for large-scale data matching, with an average precision of 89.52% and an average recall of 100%.
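
The following sketch shows the two-stage shape of such a model, with scikit-learn's KMeans standing in for the SOM stage and MLPClassifier standing in for the BP network: coarse clustering narrows the candidate attribute centres, and the classifier then performs the precise match among them. The toy feature vectors and class labels are invented for the example.

```python
# Two-stage attribute matching sketch: coarse clustering (SOM stand-in) narrows
# the candidate attribute centres, a neural classifier (BP stand-in) refines.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy feature vectors for four standard attribute classes (20 samples each).
locs = {"name": 0.0, "date": 1.0, "price": 4.0, "zip": 5.0}
X_train = np.vstack([rng.normal(loc=l, scale=0.2, size=(20, 3)) for l in locs.values()])
y_train = np.repeat(list(locs), 20)

# Stage 1: coarse clustering narrows the candidate attribute centres.
coarse = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Stage 2: a neural classifier performs the precise match, restricted to the
# classes observed in the unknown attribute's cluster.
fine = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_train, y_train)

unknown = rng.normal(loc=1.0, scale=0.2, size=(1, 3))     # attribute to be matched
candidates = set(y_train[coarse.labels_ == coarse.predict(unknown)[0]])
proba = fine.predict_proba(unknown)[0]
best = max((p, c) for p, c in zip(proba, fine.classes_) if c in candidates)[1]
print(best)  # expected: 'date'
```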


2019 ◽  
Vol 27 (1) ◽  
pp. 109-118 ◽  
Author(s):  
Nicholas J Dobbins ◽  
Clifford H Spital ◽  
Robert A Black ◽  
Jason M Morrison ◽  
Bas de Veer ◽  
...  

Objective: Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.
Materials and Methods: Leaf utilizes a flexible biomedical concept system to define hierarchical concepts and ontologies. Each Leaf concept contains both textual representations and SQL query building blocks, exposed by a simple drag-and-drop user interface. Leaf generates abstract syntax trees which are compiled into dynamic SQL queries.
Results: Leaf is a successful production-supported tool at the University of Washington, which hosts a central Leaf instance querying an enterprise data warehouse with over 300 active users. Through the support of UW Medicine (https://uwmedicine.org), the Institute of Translational Health Sciences (https://www.iths.org), and the National Center for Data to Health (https://ctsa.ncats.nih.gov/cd2h/), Leaf source code has been released into the public domain at https://github.com/uwrit/leaf.
Discussion: Leaf allows the querying of single or multiple clinical databases simultaneously, even those of different data models. This enables fast installation without costly extraction or duplication.
Conclusions: Leaf differs from existing cohort discovery tools because it does not specify a required data model and is designed to seamlessly leverage existing user authentication systems and clinical databases in situ. We believe Leaf to be useful for health system analytics, clinical research data warehouses, precision medicine biobanks, and clinical studies involving large patient cohorts.
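
The sketch below illustrates the general concept-to-SQL idea in a hypothetical form: each concept pairs a user-facing label with SQL building blocks, and the selected concepts are compiled into a dynamic query. Table and column names are invented, and this is not Leaf's implementation; see the repository above for the actual source.

```python
# Hypothetical sketch: concepts carry SQL building blocks, and a panel of
# selected concepts is compiled into one dynamic SQL query.
from dataclasses import dataclass


@dataclass
class Concept:
    label: str            # text shown in the drag-and-drop UI
    sql_set: str          # FROM clause building block
    sql_where: str = ""   # optional WHERE clause building block


def compile_panel(concepts: list[Concept], person_id: str = "person_id") -> str:
    """AND the selected concepts together: patients appearing in every concept's set."""
    subqueries = [
        f"SELECT {person_id} FROM {c.sql_set}"
        + (f" WHERE {c.sql_where}" if c.sql_where else "")
        for c in concepts
    ]
    return "\nINTERSECT\n".join(subqueries)


diabetes = Concept("Type 2 diabetes", "dbo.diagnosis", "icd10 LIKE 'E11%'")
over_65 = Concept("Age >= 65", "dbo.patient", "age >= 65")
print(compile_panel([diabetes, over_65]))
```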

