Exploring a framework for identity and attribute linking across heterogeneous data systems

Author(s):  
Nathan Wilder ◽  
Jared M. Smith ◽  
Audris Mockus
1999 ◽  
Vol 33 (3) ◽  
pp. 55-66 ◽  
Author(s):  
L. Charles Sun

An interactive data access and retrieval system, developed at the U.S. National Oceanographic Data Center (NODC) and available at http://www.nodc.noaa.gov, is presented in this paper. The purposes of this paper are: (1) to illustrate the procedures for quality control and loading of oceanographic data into the NODC ocean databases and (2) to describe the development of a system to manage, visualize, and disseminate the NODC data holdings over the Internet. The objective of the system is to provide easy access to the data required by data assimilation models. With advances in the scientific understanding of ocean dynamics, data assimilation models require the synthesis of data from a variety of sources. Modern intelligent data systems usually involve integrating distributed heterogeneous data and information sources. As the repository for oceanographic data, NOAA's National Oceanographic Data Center (NODC) is in a unique position to develop such a data system. In support of data assimilation needs, NODC has developed a system to facilitate browsing of the oceanographic environmental data and information available on-line at NODC. Users may select oceanographic data based on geographic area, time period, and measured parameters. Once the selection is complete, users may produce a station location plot, produce plots of the parameters, or retrieve the data.
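As a rough illustration of the selection workflow described above, the sketch below filters station records by bounding box, time period, and measured parameter before retrieval. All names, the record layout, and the sample data are assumptions for illustration; this is not the actual NODC interface.

```python
# Hypothetical sketch of the kind of selection the NODC system offers:
# filter station records by geographic area, time period, and measured
# parameter, then retrieve the matches. Record layout is an assumption.
from dataclasses import dataclass
from datetime import date

@dataclass
class Station:
    station_id: str
    lat: float
    lon: float
    observed: date
    parameters: dict  # e.g. {"temperature": 12.4, "salinity": 35.1}

def select_stations(stations, bbox, start, end, parameter):
    """Return stations inside bbox, within [start, end], measuring `parameter`.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        s for s in stations
        if min_lat <= s.lat <= max_lat
        and min_lon <= s.lon <= max_lon
        and start <= s.observed <= end
        and parameter in s.parameters
    ]

# Example: all salinity stations in a North Atlantic box for 1998.
stations = [
    Station("NODC-001", 41.2, -66.5, date(1998, 6, 1), {"salinity": 35.1}),
    Station("NODC-002", 10.0, -30.0, date(1998, 7, 9), {"temperature": 27.8}),
]
hits = select_stations(stations, (35.0, -75.0, 45.0, -60.0),
                       date(1998, 1, 1), date(1998, 12, 31), "salinity")
print([s.station_id for s in hits])  # -> ['NODC-001']
```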


2010 ◽  
Vol 2 (4) ◽  
pp. 51-64 ◽  
Author(s):  
Ejaz Ahmed ◽  
Nik Bessis ◽  
Peter Norrington ◽  
Yong Yue

Much work has been done in the area of data access and integration using various data mapping, matching, and loading techniques. One of the main concerns when integrating data from heterogeneous data sources is data redundancy, largely because the data systems were originally built for different business contexts and purposes. A common process for accessing data from integrated databases involves the use of each data source's own catalogue or metadata schema. In this article, the authors take the view that there is a greater chance of data inconsistencies, such as data redundancies, when integrating data within a grid environment than in traditional distributed paradigms. The importance of improving the data search and matching process is briefly discussed, and a partial service-oriented generic strategy is adopted to consolidate the distinct catalogue schemas of federated databases so that information can be accessed seamlessly. To this end, a matching strategy between structural objects and data values across federated databases in a grid environment is presented.
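To make the idea concrete, here is a minimal sketch of one way such a matching strategy could combine schema-level evidence (column names) with instance-level evidence (sampled data values) across two catalogue schemas. The scoring weights, threshold, and catalogue contents are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative column matching across two federated catalogues: combine
# name similarity with overlap of sampled values. Weights and threshold
# are assumptions for demonstration only.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(sample_a, sample_b) -> float:
    """Jaccard overlap of values sampled from each column."""
    set_a, set_b = set(sample_a), set(sample_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def match_columns(catalogue_a, catalogue_b, threshold=0.5):
    """Each catalogue maps column name -> list of sampled values."""
    matches = []
    for col_a, sample_a in catalogue_a.items():
        for col_b, sample_b in catalogue_b.items():
            # Combine schema-level and instance-level evidence.
            score = (0.5 * name_similarity(col_a, col_b)
                     + 0.5 * value_overlap(sample_a, sample_b))
            if score >= threshold:
                matches.append((col_a, col_b, round(score, 2)))
    return matches

cat_a = {"cust_name": ["Alice", "Bob"], "cust_id": ["C1", "C2"]}
cat_b = {"customer_name": ["Alice", "Carol"], "order_id": ["O7", "O9"]}
print(match_columns(cat_a, cat_b))
# e.g. [('cust_name', 'customer_name', 0.58)]
```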


Big Data refers to huge amounts of heterogeneous data, from both traditional and new sources, growing at a higher rate than ever. Because of this heterogeneity, it is a challenge to build systems that centrally and efficiently process and analyze such data, which may be internal or external to an organization. A Big Data architecture describes the blueprint of a system handling massive volumes of data during storage, processing, analysis, and visualization. Several architectures belonging to different categories have been proposed by academia and industry, but the field still lacks benchmarks. A detailed analysis of the characteristics of the existing architectures is therefore required to ease the choice between architectures for specific use cases or industry requirements. The types of data sources, the hardware requirements, the maximum tolerable latency, the suitability for a given industry, and the amount of data to be handled are some of the factors to consider carefully before choosing an architecture for a Big Data system; the wrong choice can seriously damage a company's reputation and business. This paper reviews the most prominent existing Big Data architectures, their advantages and shortcomings, their hardware requirements, their open-source and proprietary software requirements, and some of their real-world use cases in each industry. The purpose of this body of work is to equip Big Data architects with the resources needed to make better-informed choices when designing Big Data systems.
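As a toy illustration of how such selection factors might be weighed, the sketch below maps latency, volume, and source type to broad architecture families (stream-first Kappa-style, hybrid Lambda-style, batch). The rules are deliberately simplistic assumptions for demonstration, not the paper's analysis.

```python
# A deliberately simplistic first-pass filter over candidate Big Data
# architecture families, driven by the selection factors the paper lists.
# The rules and family labels are illustrative assumptions.
def suggest_architecture(max_latency_s: float, daily_volume_tb: float,
                         has_streaming_sources: bool) -> str:
    if has_streaming_sources and max_latency_s < 1.0:
        # Stream-first designs (e.g. Kappa-style) favour low-latency pipelines.
        return "stream-first (Kappa-style)"
    if has_streaming_sources:
        # Mixed batch + speed layers (e.g. Lambda-style) trade extra
        # complexity for both historical accuracy and near-real-time views.
        return "hybrid batch/stream (Lambda-style)"
    if daily_volume_tb > 1.0:
        return "distributed batch (data lake + batch processing)"
    return "conventional warehouse / single-node batch"

print(suggest_architecture(0.5, 10.0, True))   # -> stream-first (Kappa-style)
print(suggest_architecture(3600, 0.2, False))  # -> conventional warehouse / single-node batch
```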


Digital technology has been changing rapidly in recent years, and with this change the number of data systems, sources, and formats has also increased exponentially. The process of extracting data from these multiple source systems and transforming it to suit various analytics processes is therefore rapidly gaining importance. For Big Data, transformation is particularly challenging because data generation is a continuous process. In this paper, we extract data from various heterogeneous sources on the web and transform it into a form widely used in data warehousing, so that it caters to the analytical needs of the machine learning community.
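A minimal sketch of that extract-and-transform step might look as follows: two heterogeneous sources (one JSON, one CSV) are normalized into a single flat, warehouse-friendly schema. The sources, field names, and unit conversion are illustrative assumptions.

```python
# Sketch of extracting records from heterogeneous sources and normalising
# them into one flat schema suitable for loading into a warehouse table.
# Sources and field names are invented for illustration.
import csv, io, json

JSON_SOURCE = '[{"id": 1, "temp_c": 21.5, "city": "Oslo"}]'
CSV_SOURCE = "id,temperature_f,location\n2,70.2,Austin\n"

def extract_json(text):
    for rec in json.loads(text):
        yield {"id": rec["id"], "temp_c": rec["temp_c"], "city": rec["city"]}

def extract_csv(text):
    for row in csv.DictReader(io.StringIO(text)):
        # Transform: unify units and column names with the JSON source.
        yield {"id": int(row["id"]),
               "temp_c": round((float(row["temperature_f"]) - 32) * 5 / 9, 1),
               "city": row["location"]}

# The load step would append these uniform rows to a warehouse fact table.
rows = list(extract_json(JSON_SOURCE)) + list(extract_csv(CSV_SOURCE))
print(rows)
# [{'id': 1, 'temp_c': 21.5, 'city': 'Oslo'},
#  {'id': 2, 'temp_c': 21.2, 'city': 'Austin'}]
```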


2020 ◽  
Vol 34 (3) ◽  
pp. 329-353 ◽  
Author(s):  
Thomas Schneider ◽  
Mantas Šimkus

Abstract
Information systems have to deal with an increasing amount of data that is heterogeneous, unstructured, or incomplete. In order to align and complete data, systems may rely on taxonomies and background knowledge that are provided in the form of an ontology. This survey gives an overview of research work on the use of ontologies for accessing incomplete and/or heterogeneous data.
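A toy example of the core idea, ontology-mediated query answering, is sketched below: background taxonomy knowledge completes answers that the raw, incomplete data alone would miss. The taxonomy and facts are invented for illustration.

```python
# Toy ontology-mediated query answering: a subclass taxonomy lets a query
# for a general class find individuals recorded only under specific
# subclasses. Taxonomy and facts are invented examples.
SUBCLASS_OF = {            # child -> parent
    "ResearchVessel": "Vessel",
    "Tanker": "Vessel",
    "Vessel": "Craft",
}

FACTS = [("rv_atlantis", "ResearchVessel"), ("ss_meridian", "Tanker")]

def superclasses(cls):
    """Yield cls and every ancestor reachable in the taxonomy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def instances_of(query_class):
    # Without the ontology, a query for "Vessel" would return nothing,
    # because the data records only the most specific class of each
    # individual.
    return [ind for ind, cls in FACTS if query_class in superclasses(cls)]

print(instances_of("Vessel"))  # -> ['rv_atlantis', 'ss_meridian']
```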


2020 ◽  
Vol 59 (02/03) ◽  
pp. 096-103 ◽
Author(s):  
Belén Prados-Suárez ◽  
Carlos Molina Fernández ◽  
Carmen Peña Yañez

Abstract
Background: Integration of health data systems is an open problem. Most active initiatives are based on the use of standards. However, achieving widespread and generalized compliance with such standards still seems a costly task that will take a long time to complete. Moreover, most standards are proposed for a specific use, without integrating other needs.
Objectives: We propose an alternative way to get a unified view of health-related data, valid for several uses, that unites heterogeneous data sources.
Methods: Our proposal integrates developments made so far to automatically learn how to extract and convert data from different health-related systems. It enables the creation of a single multipurpose point of access.
Results: We present the EHRagg notion and its related concepts. EHRagg is defined as a middleware that, following the FAIR principles, integrates health data sources and offers a unified view over them.
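To illustrate the middleware idea, the sketch below uses per-source adapters that convert each system's data into one unified record shape, exposed through a single point of access. Class and field names are assumptions for illustration, not the EHRagg interface.

```python
# Adapter-based middleware sketch: each source converts its records to a
# common shape; the aggregator is the single multipurpose access point.
# Names and record shapes are illustrative assumptions.
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, patient_id: str) -> dict:
        """Return source data converted to the unified record shape."""

class HospitalHL7Adapter(SourceAdapter):
    def fetch(self, patient_id):
        # In reality: parse HL7/FHIR messages; here, a canned conversion.
        return {"patient": patient_id, "source": "hospital",
                "observations": [{"code": "heart_rate", "value": 72}]}

class LabCSVAdapter(SourceAdapter):
    def fetch(self, patient_id):
        return {"patient": patient_id, "source": "lab",
                "observations": [{"code": "hba1c", "value": 5.4}]}

class Aggregator:
    """Single unified view over heterogeneous health data sources."""
    def __init__(self, adapters):
        self.adapters = adapters

    def unified_view(self, patient_id):
        record = {"patient": patient_id, "observations": []}
        for adapter in self.adapters:
            record["observations"] += adapter.fetch(patient_id)["observations"]
        return record

agg = Aggregator([HospitalHL7Adapter(), LabCSVAdapter()])
print(agg.unified_view("p-001"))
```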

