Dynamic Data Management in Workflow Execution

Author(s): Ewa Deelman, Ann Chervenak

Scientific applications in astronomy, earthquake science, gravitational-wave physics, and other domains have embraced workflow technologies to conduct large-scale science. Workflows enable researchers to collaboratively design and manage analyses that involve hundreds of thousands of steps, access terabytes of data, and generate comparable amounts of intermediate and final data products. Although workflow systems can automate the generation of data products, many issues remain to be addressed, and they take different forms across the workflow lifecycle. This chapter describes a workflow lifecycle consisting of a workflow generation phase, where the analysis is defined; a workflow planning phase, where resources needed for execution are selected; a workflow execution phase, where the actual computations take place; and a final phase, where results, metadata, and provenance are stored. The authors discuss the data management issues that arise at each step of the workflow lifecycle, describe challenge problems, and illustrate them in the context of real-life applications. They examine the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure, with particular emphasis on the management of data throughout the workflow lifecycle.
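The four lifecycle phases described above can be sketched as a minimal pipeline. This is an illustrative sketch only: the task names, the round-robin site mapping, and the function names are invented here and are not part of any workflow system discussed in the chapter.

```python
# Sketch of the workflow lifecycle: generation (define tasks and data
# dependencies), planning (map tasks to execution sites), execution (run
# tasks in dependency order), and recording (store provenance).
from graphlib import TopologicalSorter

def generate():
    # Abstract workflow: task -> set of tasks it depends on (hypothetical tasks).
    return {"extract": set(), "calibrate": {"extract"}, "combine": {"calibrate"}}

def plan(workflow, sites):
    # Toy planner: map tasks to sites round-robin; real planners weigh
    # data location, queue depth, and resource availability.
    return {task: sites[i % len(sites)] for i, task in enumerate(workflow)}

def execute(workflow, mapping):
    # Run tasks in dependency order and record where each one ran.
    provenance = []
    for task in TopologicalSorter(workflow).static_order():
        provenance.append((task, mapping[task]))
    return provenance

workflow = generate()
mapping = plan(workflow, ["siteA", "siteB"])
provenance = execute(workflow, mapping)
```

The provenance list built in the last phase is exactly the kind of record that the storing phase would persist alongside results and metadata.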


2017, Vol 898, pp. 062012
Author(s): T Beermann, M Lassnig, M Barisits, C Serfon, V Garonne, ...

2019, Vol 214, pp. 07009
Author(s): Frank Berghaus, Tobias Wegner, Mario Lassnig, Marcus Ebert, Cedric Serfon, ...

Input data for applications that run in cloud computing centres can be stored at remote repositories, typically with multiple copies of the most popular data held at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. In this approach, the closest copy of the data is used, based on geographical or other information. Currently, we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry-standard interfaces, such as Amazon S3, Microsoft Azure, and HTTP with WebDAV extensions. It functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, who needs only to provide an X.509 certificate. We have set up an instance of Dynafed and integrated it into Rucio, the ATLAS distributed data management system. We report on the challenges faced during the installation and integration.
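The "closest copy" idea can be illustrated with a small sketch that picks the geographically nearest replica of a dataset. Dynafed's actual selection logic and interfaces are not shown here; the replica catalogue, site coordinates, and function names below are invented for illustration.

```python
import math

# Hypothetical replica catalogue: dataset name -> list of (site, lat, lon).
REPLICAS = {
    "dataset.root": [
        ("CERN", 46.23, 6.05),
        ("TRIUMF", 49.25, -123.23),
        ("BNL", 40.87, -72.87),
    ],
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

def closest_replica(name, client_lat, client_lon):
    """Return the site holding the geographically nearest copy of the dataset."""
    return min(
        REPLICAS[name],
        key=lambda r: haversine_km(client_lat, client_lon, r[1], r[2]),
    )[0]
```

For example, a client near Geneva would be directed to the CERN copy, while one near Vancouver would be directed to TRIUMF. A real federation would also weigh network topology, load, and availability, not geography alone.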


2019, Vol 1 (1), pp. 35-44
Author(s): Ali Ahmadinia

Dynamic data management for multiprocessor systems in the absence of an operating system (OS) is a challenging area of research. An OS is typically used to shield developers from the details of managing dynamic data at runtime. However, because of the many different types of multiprocessors available, an OS is not always present, making the management of dynamic data a difficult task. In this article, we present a hardware/software co-design methodology for managing dynamic data in multiprocessor system-on-chip (MPSoC) development environments without an OS. We compare the proposed method of sharing dynamic data between cores with standard methods as well as with static data management methods, and find that it can improve the performance of dynamic memory operations by up to 72.94% with negligible power and resource overhead.
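To make the problem concrete: without an OS heap, dynamic data is often served from a fixed, pre-reserved memory arena managed by a lightweight allocator. The toy free-list pool below illustrates that general idea only; it is not the co-design methodology from the article, and all names are invented for illustration.

```python
# Toy fixed-pool allocator: hands out fixed-size blocks from a
# pre-reserved arena using a free list, the kind of lightweight scheme
# a bare-metal MPSoC runtime might implement instead of an OS heap.
class PoolAllocator:
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.arena = bytearray(num_blocks * block_size)  # pre-reserved memory
        self.free_list = list(range(num_blocks))         # indices of free blocks

    def alloc(self):
        """Return the byte offset of a free block, or None if exhausted."""
        if not self.free_list:
            return None
        return self.free_list.pop() * self.block_size

    def free(self, offset):
        """Return a previously allocated block to the pool."""
        self.free_list.append(offset // self.block_size)

pool = PoolAllocator(num_blocks=4, block_size=64)
a = pool.alloc()
b = pool.alloc()
pool.free(a)
```

Because every allocation and release is a constant-time list operation on a statically sized structure, such schemes are predictable in both time and memory, which is what makes them attractive when no OS is available.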


Author(s): Friedhelm Meyer auf der Heide, Berthold Vöcking
