Data System and Data Management in a Federation of HPC/Cloud Centers

Author(s):  
Johannes Munke ◽  
Mohamad Hayek ◽  
Martin Golasowski ◽  
Rubén J. García-Hernández ◽  
Frédéric Donnat ◽  
...  
1986 ◽  
Vol 56 ◽  
pp. 31-44 ◽  
Author(s):  
W CAMPBELL ◽  
P SMITH ◽  
R PRICE ◽  
L ROELOFS

2013 ◽  
Vol 2 (1) ◽  
pp. 165-176 ◽  
Author(s):  
T. A. Boden ◽  
M. Krassovski ◽  
B. Yang

Abstract. The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL), USA, has provided scientific data management support for the US Department of Energy and international climate change science since 1982. Among the many data archived and available from CDIAC are collections from long-term measurement projects. One current example is the AmeriFlux measurement network. AmeriFlux provides continuous measurements from forests, grasslands, wetlands, and croplands in North, Central, and South America and offers important insights into carbon cycling in terrestrial ecosystems. To successfully manage AmeriFlux data and support climate change research, CDIAC has designed flexible data systems using proven technologies and standards blended with new, evolving technologies and standards. The AmeriFlux data system, composed primarily of a relational database, a PHP-based data interface, and an FTP server, offers a broad suite of AmeriFlux data. The data interface allows users to query the AmeriFlux collection in a variety of ways and then subset, visualize, and download the data. From the perspective of data stewardship, the system is also designed so that CDIAC can easily control database content, automate data movement, track data provenance, manage metadata content, and handle frequent additions and corrections. CDIAC and researchers in the flux community developed data submission guidelines to enhance the AmeriFlux data collection, enable automated data processing, and promote standardization across regional networks. Both the continuous flux and meteorological data and the irregular biological data collected at AmeriFlux sites are carefully scrutinized by CDIAC using established quality-control algorithms before being ingested into the AmeriFlux data system. Other tasks at CDIAC include reformatting and standardizing the diverse, heterogeneous datasets received from individual sites into a uniform and consistent network database; generating high-level derived products to meet current demands from a broad user group; and developing new products in anticipation of future needs. In this paper, we share our approaches to meeting the challenges of standardizing, archiving, and delivering quality, well-documented AmeriFlux data worldwide, to benefit others facing similar challenges in handling diverse climate change data, to further heighten awareness and use of an outstanding ecological data resource, and to highlight expanded software engineering applications being used for climate change measurement data.
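The abstract describes a production stack of a PHP interface over a relational database. Purely as an illustration of the kind of schema and subsetting query such a system implies, here is a minimal, self-contained Python/SQLite sketch; every table, column, and function name is hypothetical, not CDIAC's actual schema or interface.

```python
import sqlite3

# Minimal illustrative schema: one table of sites, one of timestamped
# flux/meteorological records. Names are hypothetical, not the real
# AmeriFlux database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    site_id   TEXT PRIMARY KEY,   -- e.g. a network site code
    name      TEXT,
    ecosystem TEXT                -- forest, grassland, wetland, cropland
);
CREATE TABLE flux_record (
    site_id   TEXT REFERENCES site(site_id),
    ts        TEXT,               -- ISO-8601 timestamp
    nee       REAL,               -- net ecosystem exchange
    air_temp  REAL,
    qc_flag   INTEGER             -- 0 = passed quality control
);
""")

def subset(conn, site_id, start, end):
    """Return QC-passed records for one site over a date range,
    the shape of query a subset/download request could translate to."""
    cur = conn.execute(
        "SELECT ts, nee, air_temp FROM flux_record "
        "WHERE site_id = ? AND ts BETWEEN ? AND ? AND qc_flag = 0 "
        "ORDER BY ts",
        (site_id, start, end),
    )
    return cur.fetchall()
```

Keying records to a site identifier and a timestamp, with a stored quality-control flag, is what lets a single interface serve querying, subsetting, and automated corrections over one consistent network database.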


2021 ◽  
Author(s):  
Stephan Hachinger ◽  
Jan Martinovič ◽  
Olivier Terzo ◽  
Marc Levrier ◽  
Alberto Scionti ◽  
...  


2015 ◽  
Vol 4 (2) ◽  
pp. 203-213 ◽  
Author(s):  
M. B. Krassovski ◽  
J. S. Riggs ◽  
L. A. Hook ◽  
W. R. Nettles ◽  
P. J. Hanson ◽  
...  

Abstract. Ecosystem-scale manipulation experiments represent large science investments that require well-designed data acquisition and management systems to provide reliable, accurate information to project participants and third-party users. The SPRUCE project (Spruce and Peatland Responses Under Climatic and Environmental Change, http://mnspruce.ornl.gov) is such an experiment, funded by the US Department of Energy's (DOE) Office of Science, Terrestrial Ecosystem Science (TES) Program. The SPRUCE experimental mission is to assess ecosystem-level biological responses of vulnerable, high-carbon terrestrial ecosystems to a range of climate warming manipulations and an elevated-CO2 atmosphere. SPRUCE provides a platform for testing mechanisms controlling the vulnerability of organisms, biogeochemical processes, and ecosystems to climatic change (e.g., thresholds for organism decline or mortality, limitations to regeneration, biogeochemical limitations to productivity, and the cycling and release of CO2 and CH4 to the atmosphere). The SPRUCE experiment will generate a wide range of continuous and discrete measurements. To successfully manage SPRUCE data collection, achieve SPRUCE science objectives, and support broader climate change research, the research staff has designed a flexible data system using proven network technologies and software components. The primary SPRUCE data system components are (1) the data acquisition and control system, hardware and software that retrieve biological and engineering data from sensors, collect sensor status information, and distribute feedback to control components; (2) the data collection system, hardware and software that deliver data to a central depository for storage and further processing; and (3) the data management plan, the plans, policies, and practices that control consistency, protect data integrity, and deliver data. This publication presents our approach to meeting the challenges of designing and constructing an efficient data system for managing high-volume sources of in situ observations in a remote, harsh environmental location. The approach covers the data flow from the sensors to the archival/distribution points, discusses the types of hardware and software used, examines the design considerations used to choose them, and describes the data management practices chosen to control and enhance the value of the data.
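As a rough illustration of the collection component, the Python sketch below time-stamps, quality-flags, and delivers sensor readings to a central store. The channel names, plausibility bounds, and file-based "depository" are assumptions made for the example; the abstract does not specify SPRUCE's actual QC rules or transport.

```python
import csv
import datetime

# Hypothetical plausibility bounds per sensor channel; the project's
# real quality-control rules are not given in the abstract.
BOUNDS = {"air_temp_c": (-45.0, 45.0), "co2_ppm": (300.0, 1200.0)}

def qc_flag(channel, value):
    """Return 0 if the reading is within its plausible range, else 1."""
    lo, hi = BOUNDS[channel]
    return 0 if lo <= value <= hi else 1

def collect(readings, path="central_depository.csv"):
    """Append time-stamped, QC-flagged readings to the central store."""
    with open(path, "a", newline="") as f:
        w = csv.writer(f)
        for channel, value in readings:
            w.writerow([
                datetime.datetime.now(datetime.timezone.utc).isoformat(),
                channel, value, qc_flag(channel, value),
            ])

# Example: the second reading is out of range and arrives flagged,
# so downstream processing can exclude it without losing the raw value.
collect([("air_temp_c", 21.3), ("co2_ppm", 9999.0)])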


1984 ◽  
Vol 106 (4) ◽  
pp. 297-303 ◽  
Author(s):  
W. C. Mosley

Certain aspects of the space station data system set it apart from any other spacecraft data system ever developed: it is serviceable and repairable to a degree never before possible; its operational lifetime spans generations of technology; it is dynamically extendible to accommodate evolutionary growth; and unprecedented levels of complexity, data storage, and onboard processing capacity will occur early in the life cycle. These factors, together with other known top-level space station requirements, form the basis from which derived data management system requirements (again top-level) are assembled. From these requirements, the architectural properties that need emphasis in the architectural selection process are identified. Other issues that may influence the architecture are also identified, and an architectural baseline is established. From this base, an example of controlled growth helps to identify further technology issues, so the architectural discussion ends with a summary of the technology issues that arise from the requirements and from the need for tools and techniques to support controlled growth. The architecture discussion is followed by a discussion of generic and related technology areas and of the challenges to initial space station development. Technology areas that promise enhanced capabilities in the future are also identified. This paper is neither a comprehensive assessment of data management system technology nor an argument for a particular approach; its purpose is to illuminate design and technology areas for further discussion and formal investigation.
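Of the properties listed, "dynamically extendible" is the most amenable to a concrete toy sketch. The Python fragment below is not from the paper; it only illustrates the general idea of a core dispatcher to which subsystems can be attached or replaced at runtime, so capacity can grow across technology generations without rewriting the core.

```python
from typing import Callable, Dict

# Toy illustration (not the paper's architecture): a registry-based bus
# where new subsystems claim named data streams at runtime.
class DataManagementBus:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[bytes], None]] = {}

    def register(self, stream: str, handler: Callable[[bytes], None]) -> None:
        """Attach (or replace) the subsystem owning a named data stream."""
        self._handlers[stream] = handler

    def publish(self, stream: str, payload: bytes) -> None:
        """Route a payload to whichever subsystem currently owns the stream."""
        if stream in self._handlers:
            self._handlers[stream](payload)

bus = DataManagementBus()
bus.register("telemetry", lambda p: print("stored", len(p), "bytes"))
bus.publish("telemetry", b"\x00" * 64)  # a later upgrade could re-register
```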

