Exploring a framework for identity and attribute linking across heterogeneous data systems

Author(s):  
Nathan Wilder ◽  
Jared M. Smith ◽  
Audris Mockus
1999 ◽  
Vol 33 (3) ◽  
pp. 55-66 ◽  
Author(s):  
L. Charles Sun

An interactive data access and retrieval system, developed at the U.S. National Oceanographic Data Center (NODC) and available at http://www.nodc.noaa.gov, is presented in this paper. The purposes of this paper are: (1) to illustrate the procedures for quality control and loading of oceanographic data into the NODC ocean databases and (2) to describe the development of a system to manage, visualize, and disseminate the NODC data holdings over the Internet. The objective of the system is to provide easy access to the data required by data assimilation models. With advances in the scientific understanding of ocean dynamics, data assimilation models require the synthesis of data from a variety of sources. Modern intelligent data systems usually involve integrating distributed heterogeneous data and information sources. As the repository for oceanographic data, NOAA's National Oceanographic Data Center (NODC) is in a unique position to develop such a data system. In support of data assimilation needs, NODC has developed a system to facilitate browsing of the oceanographic environmental data and information available on-line at NODC. Users may select oceanographic data based on geographic area, time period, and measured parameters. Once the selection is complete, users may produce a station location plot, produce plots of the parameters, or retrieve the data.
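As a rough illustration of the selection workflow described above, the sketch below filters station records by bounding box, time period, and measured parameter before retrieval. All names, the record layout, and the sample data are assumptions for illustration; this is not the actual NODC interface.

```python
# Hypothetical sketch of the kind of selection the NODC system offers:
# filter station records by geographic area, time period, and measured
# parameter, then retrieve the matches. Record layout is an assumption.
from dataclasses import dataclass
from datetime import date

@dataclass
class Station:
    station_id: str
    lat: float
    lon: float
    observed: date
    parameters: dict  # e.g. {"temperature": 12.4, "salinity": 35.1}

def select_stations(stations, bbox, start, end, parameter):
    """Return stations inside bbox, within [start, end], measuring `parameter`.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        s for s in stations
        if min_lat <= s.lat <= max_lat
        and min_lon <= s.lon <= max_lon
        and start <= s.observed <= end
        and parameter in s.parameters
    ]

# Example: all salinity stations in a North Atlantic box for 1998.
stations = [
    Station("NODC-001", 41.2, -66.5, date(1998, 6, 1), {"salinity": 35.1}),
    Station("NODC-002", 10.0, -30.0, date(1998, 7, 9), {"temperature": 27.8}),
]
hits = select_stations(stations, (35.0, -75.0, 45.0, -60.0),
                       date(1998, 1, 1), date(1998, 12, 31), "salinity")
print([s.station_id for s in hits])  # -> ['NODC-001']
```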


2010 ◽  
Vol 2 (4) ◽  
pp. 51-64 ◽  
Author(s):  
Ejaz Ahmed ◽  
Nik Bessis ◽  
Peter Norrington ◽  
Yong Yue

Much work has been done in the area of data access and integration using various data mapping, matching, and loading techniques. One of the main concerns when integrating data from heterogeneous data sources is data redundancy, largely because the data systems were originally built for different business contexts and purposes. A common process for accessing data from integrated databases involves the use of each data source's own catalogue or metadata schema. In this article, the authors take the view that there is a greater chance of data inconsistencies, such as data redundancies, when integrating data within a grid environment than in traditional distributed paradigms. The importance of improving the data search and matching process is briefly discussed, and a partial service-oriented generic strategy is adopted to consolidate the distinct catalogue schemas of federated databases so that information can be accessed seamlessly. To this end, a matching strategy between structural objects and data values across federated databases in a grid environment is presented.
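To make the idea concrete, here is a minimal sketch of one way such a matching strategy could combine schema-level evidence (column names) with instance-level evidence (sampled data values) across two catalogue schemas. The scoring weights, threshold, and catalogue contents are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative column matching across two federated catalogues: combine
# name similarity with overlap of sampled values. Weights and threshold
# are assumptions for demonstration only.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(sample_a, sample_b) -> float:
    """Jaccard overlap of values sampled from each column."""
    set_a, set_b = set(sample_a), set(sample_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def match_columns(catalogue_a, catalogue_b, threshold=0.5):
    """Each catalogue maps column name -> list of sampled values."""
    matches = []
    for col_a, sample_a in catalogue_a.items():
        for col_b, sample_b in catalogue_b.items():
            # Combine schema-level and instance-level evidence.
            score = (0.5 * name_similarity(col_a, col_b)
                     + 0.5 * value_overlap(sample_a, sample_b))
            if score >= threshold:
                matches.append((col_a, col_b, round(score, 2)))
    return matches

cat_a = {"cust_name": ["Alice", "Bob"], "cust_id": ["C1", "C2"]}
cat_b = {"customer_name": ["Alice", "Carol"], "order_id": ["O7", "O9"]}
print(match_columns(cat_a, cat_b))
# e.g. [('cust_name', 'customer_name', 0.58)]
```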


Big Data refers to huge amounts of heterogeneous data, from both traditional and new sources, growing at a higher rate than ever. Because of this heterogeneity, it is a challenge to build systems that centrally and efficiently process and analyze such data, which may be internal or external to an organization. A Big Data architecture describes the blueprint of a system handling massive volumes of data during storage, processing, analysis, and visualization. Several architectures belonging to different categories have been proposed by academia and industry, but the field still lacks benchmarks. A detailed analysis of the characteristics of the existing architectures is therefore required to ease the choice between architectures for specific use cases or industry requirements. The types of data sources, the hardware requirements, the maximum tolerable latency, the suitability for a given industry, and the amount of data to be handled are some of the factors to consider carefully before choosing an architecture for a Big Data system; the wrong choice can seriously damage a company's reputation and business. This paper reviews the most prominent existing Big Data architectures, their advantages and shortcomings, their hardware requirements, their open-source and proprietary software requirements, and some of their real-world use cases in each industry. The purpose of this body of work is to equip Big Data architects with the resources needed to make better-informed choices when designing Big Data systems.
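As a toy illustration of how such selection factors might be weighed, the sketch below maps latency, volume, and source type to broad architecture families (stream-first Kappa-style, hybrid Lambda-style, batch). The rules are deliberately simplistic assumptions for demonstration, not the paper's analysis.

```python
# A deliberately simplistic first-pass filter over candidate Big Data
# architecture families, driven by the selection factors the paper lists.
# The rules and family labels are illustrative assumptions.
def suggest_architecture(max_latency_s: float, daily_volume_tb: float,
                         has_streaming_sources: bool) -> str:
    if has_streaming_sources and max_latency_s < 1.0:
        # Stream-first designs (e.g. Kappa-style) favour low-latency pipelines.
        return "stream-first (Kappa-style)"
    if has_streaming_sources:
        # Mixed batch + speed layers (e.g. Lambda-style) trade extra
        # complexity for both historical accuracy and near-real-time views.
        return "hybrid batch/stream (Lambda-style)"
    if daily_volume_tb > 1.0:
        return "distributed batch (data lake + batch processing)"
    return "conventional warehouse / single-node batch"

print(suggest_architecture(0.5, 10.0, True))   # -> stream-first (Kappa-style)
print(suggest_architecture(3600, 0.2, False))  # -> conventional warehouse / single-node batch
```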


Digital technology has been changing rapidly in recent years, and with this change the number of data systems, sources, and formats has also increased exponentially. The process of extracting data from these multiple source systems and transforming it to suit various analytics processes is therefore rapidly gaining importance. For Big Data, transformation is particularly challenging because data generation is a continuous process. In this paper, we extract data from various heterogeneous sources on the web and transform it into a form widely used in data warehousing, so that it caters to the analytical needs of the machine learning community.
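A minimal sketch of that extract-and-transform step might look as follows: two heterogeneous sources (one JSON, one CSV) are normalized into a single flat, warehouse-friendly schema. The sources, field names, and unit conversion are illustrative assumptions.

```python
# Sketch of extracting records from heterogeneous sources and normalising
# them into one flat schema suitable for loading into a warehouse table.
# Sources and field names are invented for illustration.
import csv, io, json

JSON_SOURCE = '[{"id": 1, "temp_c": 21.5, "city": "Oslo"}]'
CSV_SOURCE = "id,temperature_f,location\n2,70.2,Austin\n"

def extract_json(text):
    for rec in json.loads(text):
        yield {"id": rec["id"], "temp_c": rec["temp_c"], "city": rec["city"]}

def extract_csv(text):
    for row in csv.DictReader(io.StringIO(text)):
        # Transform: unify units and column names with the JSON source.
        yield {"id": int(row["id"]),
               "temp_c": round((float(row["temperature_f"]) - 32) * 5 / 9, 1),
               "city": row["location"]}

# The load step would append these uniform rows to a warehouse fact table.
rows = list(extract_json(JSON_SOURCE)) + list(extract_csv(CSV_SOURCE))
print(rows)
# [{'id': 1, 'temp_c': 21.5, 'city': 'Oslo'},
#  {'id': 2, 'temp_c': 21.2, 'city': 'Austin'}]
```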


2020 ◽  
Vol 34 (3) ◽  
pp. 329-353 ◽  
Author(s):  
Thomas Schneider ◽  
Mantas Šimkus

Abstract
Information systems have to deal with an increasing amount of data that is heterogeneous, unstructured, or incomplete. In order to align and complete data, systems may rely on taxonomies and background knowledge that are provided in the form of an ontology. This survey gives an overview of research work on the use of ontologies for accessing incomplete and/or heterogeneous data.
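A toy example of the core idea, ontology-mediated query answering, is sketched below: background taxonomy knowledge completes answers that the raw, incomplete data alone would miss. The taxonomy and facts are invented for illustration.

```python
# Toy ontology-mediated query answering: a subclass taxonomy lets a query
# for a general class find individuals recorded only under specific
# subclasses. Taxonomy and facts are invented examples.
SUBCLASS_OF = {            # child -> parent
    "ResearchVessel": "Vessel",
    "Tanker": "Vessel",
    "Vessel": "Craft",
}

FACTS = [("rv_atlantis", "ResearchVessel"), ("ss_meridian", "Tanker")]

def superclasses(cls):
    """Yield cls and every ancestor reachable in the taxonomy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def instances_of(query_class):
    # Without the ontology, a query for "Vessel" would return nothing,
    # because the data records only the most specific class of each
    # individual.
    return [ind for ind, cls in FACTS if query_class in superclasses(cls)]

print(instances_of("Vessel"))  # -> ['rv_atlantis', 'ss_meridian']
```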


2020 ◽  
Vol 59 (02/03) ◽  
pp. 096-103 ◽
Author(s):  
Belén Prados-Suárez ◽  
Carlos Molina Fernández ◽  
Carmen Peña Yañez

Abstract
Background: Integration of health data systems is an open problem. Most active initiatives are based on the use of standards. However, achieving widespread and generalized compliance with such standards still seems a costly task that will take a long time to complete. Moreover, most standards are proposed for a specific use, without integrating other needs.
Objectives: We propose an alternative way to get a unified view of health-related data, valid for several uses, that unites heterogeneous data sources.
Methods: Our proposal integrates developments made so far to automatically learn how to extract and convert data from different health-related systems. It enables the creation of a single multipurpose point of access.
Results: We present the EHRagg notion and its related concepts. EHRagg is defined as a middleware that, following the FAIR principles, integrates health data sources and offers a unified view over them.
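To illustrate the middleware idea, the sketch below uses per-source adapters that convert each system's data into one unified record shape, exposed through a single point of access. Class and field names are assumptions for illustration, not the EHRagg interface.

```python
# Adapter-based middleware sketch: each source converts its records to a
# common shape; the aggregator is the single multipurpose access point.
# Names and record shapes are illustrative assumptions.
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, patient_id: str) -> dict:
        """Return source data converted to the unified record shape."""

class HospitalHL7Adapter(SourceAdapter):
    def fetch(self, patient_id):
        # In reality: parse HL7/FHIR messages; here, a canned conversion.
        return {"patient": patient_id, "source": "hospital",
                "observations": [{"code": "heart_rate", "value": 72}]}

class LabCSVAdapter(SourceAdapter):
    def fetch(self, patient_id):
        return {"patient": patient_id, "source": "lab",
                "observations": [{"code": "hba1c", "value": 5.4}]}

class Aggregator:
    """Single unified view over heterogeneous health data sources."""
    def __init__(self, adapters):
        self.adapters = adapters

    def unified_view(self, patient_id):
        record = {"patient": patient_id, "observations": []}
        for adapter in self.adapters:
            record["observations"] += adapter.fetch(patient_id)["observations"]
        return record

agg = Aggregator([HospitalHL7Adapter(), LabCSVAdapter()])
print(agg.unified_view("p-001"))
```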

