Challenges of Research Data Management for High Performance Computing

Author(s): Björn Schembera, Thomas Bönisch
Author(s): Reiner Anderl, Orkun Yaman

High Performance Computing (HPC) has become ubiquitous for simulations in the industrial context. To identify the requirements for integrating HPC-relevant data and processes, a survey was conducted among German car manufacturers and service and component suppliers. This contribution presents the results of the evaluation and suggests an architecture concept for integrating data and workflows related to CAE and HPC facilities into PLM. It describes the state of the art of HPC applications within the simulation domain. Intensive efforts are currently being invested in CAE data management. However, a systematic approach to HPC data management does not exist. This study states the importance of an integrating approach to data management for HPC applications and develops an architectural framework for implementing HPC data management in the existing PLM landscape. Requirements for key functionalities and interfaces are defined, and a framework for a reference information model is conceptualized.


2014, Vol 9 (2), pp. 17-27
Author(s): Ritu Arora, Maria Esteva, Jessica Trelogan

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers contemporaneously. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time needed to accomplish critical data management tasks and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding how to use the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.


2013, Vol 8 (1), pp. 279-287
Author(s): Damien Lecarpentier, Peter Wittenburg, Willem Elbers, Alberto Michelini, Riam Kanso, ...

The EUDAT project is a pan-European data initiative that started in October 2011. The project brings together a unique consortium of 25 partners – including research communities, national data and high performance computing (HPC) centres, technology providers, and funding agencies – from 13 countries. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure that provides a set of shared services for accessing and preserving research data.

