Workflow Systems for Science: Concepts and Tools

2013 ◽  
Vol 2013 ◽  
pp. 1-15 ◽  
Author(s):  
Domenico Talia

The wide availability of high-performance computing systems, Grids, and Clouds has allowed scientists and engineers to implement more and more complex applications that access and process large data repositories and run scientific experiments in silico on distributed computing platforms. Most of these applications are designed as workflows that include data analysis, scientific computation methods, and complex simulation techniques. Scientific applications require tools and high-level mechanisms for designing and executing complex workflows. For this reason, in the past years, many efforts have been devoted to the development of distributed workflow management systems for scientific applications. This paper discusses basic concepts of scientific workflows and presents workflow system tools and frameworks used today for the implementation of applications in science and engineering on high-performance computers and distributed systems. In particular, the paper reports on a selection of workflow systems widely used for solving scientific problems and discusses some open issues and research challenges in the area.
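For illustration, the following minimal sketch (not tied to any particular workflow system; the task names are invented) expresses such an application as a directed acyclic graph of tasks and runs them in dependency order:

```python
# Minimal sketch: a scientific workflow as a DAG of tasks, executed in
# dependency order; a real WMS would dispatch each task to Grid/Cloud/HPC resources.
from graphlib import TopologicalSorter

def fetch_data():      print("fetching input from data repository")
def clean_data():      print("cleaning and filtering records")
def run_simulation():  print("running in-silico experiment")
def analyse_results(): print("analysing and plotting outputs")

# Each key lists the tasks it depends on.
workflow = {
    clean_data:      {fetch_data},
    run_simulation:  {clean_data},
    analyse_results: {run_simulation},
}

for task in TopologicalSorter(workflow).static_order():
    task()
```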

Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed-system technologies now allow the integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal of distributed systems, and in this context the sharing of stored data appears as the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high-performance computing experiments are pressing in terms of ever larger volumes of data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges that need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast-changing datasets, and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how well the existing distributed computing infrastructure supports the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss recent research efforts dealing with the challenges of data retrieval, replication, and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies such as workflow management is also addressed.
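As a toy illustration of the replication and consistency concerns mentioned above (the node and dataset names are invented, not from the chapter), the sketch below copies a dataset to several storage nodes and checks that all replicas agree via content hashes:

```python
# Illustrative sketch only: replicate a dataset to several storage nodes and
# verify the copies stay consistent by comparing content hashes.
import hashlib

class StorageNode:
    def __init__(self, name):
        self.name, self.blobs = name, {}
    def put(self, key, data: bytes):
        self.blobs[key] = data
    def checksum(self, key):
        return hashlib.sha256(self.blobs[key]).hexdigest()

def replicate(key, data, nodes):
    for node in nodes:
        node.put(key, data)                 # push an up-to-date copy everywhere

def consistent(key, nodes):
    digests = {n.checksum(key) for n in nodes}
    return len(digests) == 1                # all replicas hold identical content

nodes = [StorageNode(f"site-{i}") for i in range(3)]
replicate("dataset-42", b"sensor readings ...", nodes)
print(consistent("dataset-42", nodes))      # True
```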


Author(s):  
Gina Brander ◽  
Colleen Pawliuk

Program objective: To advance the methodology and improve the data management of the scoping review through the integration of two health librarians onto the clinical research team. Participants and setting: Two librarians were embedded on a multidisciplinary, geographically dispersed pediatric palliative and end-of-life research team conducting a scoping review headquartered at the British Columbia Children's Hospital Research Institute. Program: The team's embedded librarians guided and facilitated all stages of a scoping review of 180 conditions and 10 symptoms. Outcomes: The scoping review was enhanced in quality and efficiency through the integration of librarians onto the team. Conclusions: Health librarians embedded on clinical research teams can help guide and facilitate the scoping review process to improve workflow management and overall methodology. Librarians are particularly well equipped to solve challenges arising from large data sets, broad research questions with a high level of specificity, and geographically dispersed team members. Knowledge of emerging and established citation-screening and bibliographic software and review tools can help librarians to address these challenges and provide efficient workflow management.


2011 ◽  
Vol 1 (2) ◽  
pp. 17-38 ◽  
Author(s):  
Madjid Tavana ◽  
Timothy E. Busch ◽  
Eleanor L. Davis

Military operations are highly complex workflow systems that require careful planning and execution. The interactive complexity and tight coupling between people and technological systems have been increasing in military operations, leading both to improved efficiency and to a greater vulnerability of mission accomplishment to attack or system failure. Although the ability to resist and recover from failure is important to many systems and processes, the robustness and resiliency of workflow management systems have received little attention in the literature. The authors propose a novel workflow modeling framework using high-level Petri nets (PNs). The proposed framework is capable of both modeling structure and providing a wide range of qualitative and quantitative analysis. The concepts of self-protecting and self-healing systems are captured by the robustness and resiliency measures proposed in this study. The proposed measures are plotted in a Cartesian coordinate system; a classification scheme with four quadrants (i.e., possession, preservation, restoration, and devastation) is proposed to show the state of the system in terms of robustness and resiliency. The authors introduce an overall sustainability index for the system based on the theory of displaced ideals. The application of the methodology is demonstrated in the evaluation of an air tasking order generation system at the United States Air Force.
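Purely as a rough illustration of such a quadrant classification (the threshold, the mapping of quadrant labels to score combinations, and the distance-to-ideal index below are assumptions for the sketch, not the authors' formulation):

```python
# Hypothetical sketch: classify a system by thresholding robustness and
# resiliency scores, plus a simple distance-to-ideal "sustainability" index.
from math import hypot

def quadrant(robustness, resiliency, threshold=0.5):
    if robustness >= threshold and resiliency >= threshold:
        return "possession"     # assumed: both self-protecting and self-healing
    if robustness >= threshold:
        return "preservation"   # assumed: resists failure but recovers poorly
    if resiliency >= threshold:
        return "restoration"    # assumed: recovers well but is easily disrupted
    return "devastation"        # assumed: neither robust nor resilient

def sustainability_index(robustness, resiliency, ideal=(1.0, 1.0)):
    # closer to the ideal point (1, 1) -> index closer to 1
    return 1 - hypot(ideal[0] - robustness, ideal[1] - resiliency) / hypot(*ideal)

print(quadrant(0.8, 0.3), round(sustainability_index(0.8, 0.3), 2))
```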


2012 ◽  
Vol 157-158 ◽  
pp. 839-842 ◽  
Author(s):  
Ya Li ◽  
Hai Rui Wang ◽  
Xiong Tong ◽  
Li Zhang

The paper addresses the problem of flexible Workflow Management Systems (WFMS) in distributed environments. Given the serious lack of flexibility in current workflow systems, we describe how our workflow system meets the requirements of interoperability, scalability, flexibility, dependability, and adaptability. With an additional route engine, the execution path is adjusted dynamically according to the execution conditions, improving the flexibility and dependability of the system. A dynamic registration mechanism for domain engines is introduced to improve the scalability and adaptability of the system. The system is general purpose and open: it has been designed and implemented as a set of CORBA services. The system serves as an example of the use of middleware technologies to provide a fault-tolerant execution environment for long-running distributed applications. The system also provides a mechanism for communication between distributed components in order to support inter-organizational WFMS.
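To make the registration-and-routing idea concrete (the engine names and interfaces here are invented for illustration, not the paper's CORBA interfaces), a sketch might look like this: domain engines register themselves at runtime, and a simple route step re-routes a task when its preferred engine is unavailable.

```python
# Hedged sketch: dynamic registration of domain engines plus a route step that
# adjusts the execution path when an engine goes offline.
class EngineRegistry:
    def __init__(self):
        self.engines = {}                       # domain -> callable engine
    def register(self, domain, engine):
        self.engines[domain] = engine           # engines can join dynamically
    def unregister(self, domain):
        self.engines.pop(domain, None)

def route(task, registry, fallback_domain="generic"):
    # pick the engine matching the task's domain, fall back if it is gone
    engine = registry.engines.get(task["domain"]) or registry.engines[fallback_domain]
    return engine(task)

registry = EngineRegistry()
registry.register("generic", lambda t: f"generic engine ran {t['name']}")
registry.register("imaging", lambda t: f"imaging engine ran {t['name']}")
print(route({"name": "segment-scan", "domain": "imaging"}, registry))
registry.unregister("imaging")                  # engine goes offline
print(route({"name": "segment-scan", "domain": "imaging"}, registry))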


2007 ◽  
Vol 16 (02) ◽  
pp. 155-175 ◽  
Author(s):  
H. A. REIJERS ◽  
S. POELMANS

The image of workflow systems as context-insensitive technology that hinders rather than supports people in performing their work may still persist today. This impression is also raised in the well-known and often-cited case study within Establishment Printers. Using this case as a starting point, this paper presents an analysis of more recent workflow implementations to support the view that modern workflow systems are widely applied in the services industry and are considered useful by performers in supporting their way of working. In cases where the introduction of workflow technology initially disrupted the flow of work, a wide range of configuration options was available to mend such situations. A detailed analysis of a workflow implementation in a Belgian financial organization clearly shows that re-configuration decisions, such as a finer step granularity, can transform a pre-structured, production-type workflow system into a flexible application that allows and supports a smooth flow of work.


Author(s):  
Hartwig Anzt ◽  
Erik Boman ◽  
Rob Falgout ◽  
Pieter Ghysels ◽  
Michael Heroux ◽  
...  

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
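To give a concrete flavour of the data structures involved (this tiny example is not taken from the ECP solver libraries), a sparse matrix in compressed sparse row (CSR) form and a row-wise matrix-vector product look like the sketch below; each row update is independent, which is the kind of concurrency exascale solvers must expose at scale:

```python
# Illustrative sketch: a 3x3 sparse matrix in CSR form and a mat-vec over it.
values  = [10.0, 2.0, 3.0, 9.0, 7.0]   # non-zero entries
col_idx = [0,    2,   1,   2,   0  ]   # column of each non-zero
row_ptr = [0, 2, 3, 5]                 # start of each row in values/col_idx

def csr_matvec(values, col_idx, row_ptr, x):
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):                      # rows could run in parallel
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

print(csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 16.0]
```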


Symmetry ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1029
Author(s):  
Anabi Hilary Kelechi ◽  
Mohammed H. Alsharif ◽  
Okpe Jonah Bameyi ◽  
Paul Joan Ezra ◽  
Iorshase Kator Joseph ◽  
...  

Power-consuming entities such as high-performance computing (HPC) sites and large data centers are growing with the advance of information technology. In business, HPC is used to shorten product delivery times, reduce production costs, and decrease the time it takes to develop a new product. Today's high level of computing power from supercomputers comes at the expense of consuming large amounts of electric power. To minimize the energy utilized by HPC entities, it is necessary to reduce the energy required by the computing systems and by the resources needed to operate them. A database can support system energy efficiency by sampling the power consumption of all components at regular intervals and storing the readings; the information stored in the database then serves as input data for energy-efficiency optimization. Device workload information and other usage metrics are stored in the database as well. There has been strong momentum in the area of artificial intelligence (AI) as a tool for optimization and process automation by leveraging already existing information. This paper discusses ideas for improving energy efficiency for HPC using AI.
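A minimal sketch of such a collector is shown below (the component names, the random readings, and the table layout are placeholders; a real collector would query sensors such as RAPL or IPMI): it samples per-component power draw at a fixed interval and stores the readings in a database for a later optimizer to consume.

```python
# Hypothetical sketch: sample per-component power draw at regular intervals
# and store it in SQLite as input data for energy-efficiency optimization.
import sqlite3, time, random

db = sqlite3.connect("power_samples.db")
db.execute("""CREATE TABLE IF NOT EXISTS samples
              (ts REAL, component TEXT, watts REAL, utilization REAL)""")

def read_power(component):
    # placeholder for a real sensor query (RAPL, IPMI, PDU, ...)
    return random.uniform(50, 300), random.uniform(0, 1)

def sample_once(components):
    now = time.time()
    for c in components:
        watts, util = read_power(c)
        db.execute("INSERT INTO samples VALUES (?, ?, ?, ?)", (now, c, watts, util))
    db.commit()

for _ in range(3):                       # a real collector would loop indefinitely
    sample_once(["cpu0", "gpu0", "network"])
    time.sleep(1)                        # regular sampling interval
```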


2021 ◽  
Vol 14 (6) ◽  
pp. 1019-1032
Author(s):  
Yuanyuan Sun ◽  
Sheng Wang ◽  
Huorong Li ◽  
Feifei Li

Data confidentiality is one of the biggest concerns that hinder enterprise customers from moving their workloads to the cloud. Thanks to trusted execution environments (TEEs), it is now feasible to build encrypted databases inside an enclave that can process customers' data while keeping it confidential from the cloud. Although some enclave-based encrypted databases have emerged recently, there remains a largely unexplored design space regarding how confidentiality can be achieved in different ways and what consequences those choices imply. In this paper, we first provide a broad exploration of possible design choices in building encrypted database storage engines, rendering trade-offs in security, performance, and functionality. We observe that choices along different dimensions can be made independently, and their combination determines the overall trade-off of the entire storage engine. We then propose Enclage, an encrypted storage engine that makes practical trade-offs. It adopts many enclave-native designs, such as page-level encryption, reduced enclave interaction, and a hierarchical memory buffer, which offer a high level of security and high performance at the same time. To make better use of the limited enclave memory, we derive the optimal page size in the enclave and adopt delta decryption to access large data pages at low cost. Our experiments show that Enclage outperforms the baseline, a common storage design in many encrypted databases, by over 13x in throughput and about 5x in storage savings.
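As a hedged sketch of the page-level encryption idea (this is not Enclage's actual implementation; the 4 KB page size is an assumption, and the example uses the third-party `cryptography` package), each fixed-size page is encrypted independently so it can be decrypted on demand when pulled into the trusted buffer:

```python
# Hypothetical sketch: page-level encryption of fixed-size database pages with
# AES-GCM, binding each ciphertext to its page id via associated data.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PAGE_SIZE = 4096                     # assumed page size, not the paper's derived optimum
key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)

def encrypt_page(page_id: int, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)
    aad = page_id.to_bytes(8, "little")          # tie ciphertext to its page id
    return nonce + aead.encrypt(nonce, plaintext, aad)

def decrypt_page(page_id: int, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, page_id.to_bytes(8, "little"))

page = b"row data".ljust(PAGE_SIZE, b"\0")
stored = encrypt_page(7, page)                   # what the untrusted host keeps
assert decrypt_page(7, stored) == page           # what the enclave works with
```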


2019 ◽  
Vol 12 (7) ◽  
pp. 3001-3015 ◽  
Author(s):  
Shahbaz Memon ◽  
Dorothée Vallot ◽  
Thomas Zwinger ◽  
Jan Åström ◽  
Helmut Neukirchen ◽  
...  

Abstract. Scientific computing applications involving complex simulations and data-intensive processing are often composed of multiple tasks forming a workflow of computing jobs. Scientific communities running such applications on computing resources often find it cumbersome to manage and monitor the execution of these tasks and their associated data. These workflow implementations usually add overhead by introducing unnecessary input/output (I/O) for coupling the models and can lead to sub-optimal CPU utilization. Furthermore, running these workflow implementations in different environments requires significant adaptation efforts, which can hinder the reproducibility of the underlying science. High-level scientific workflow management systems (WMS) can be used to automate and simplify complex task structures by providing tooling for the composition and execution of workflows – even across distributed and heterogeneous computing environments. The WMS approach allows users to focus on the underlying high-level workflow and avoid low-level pitfalls that would lead to non-optimal resource usage, while still allowing the workflow to remain portable between different computing environments. As a case study, we apply the UNICORE workflow management system to enable the coupling of a glacier flow model and a calving model, which involves many tasks and dependencies, ranging from pre-processing and data management to repetitive executions in heterogeneous high-performance computing (HPC) resource environments. Using the UNICORE workflow management system, the composition, management, and execution of the glacier modelling workflow become easier with respect to usage, monitoring, maintenance, reusability, portability, and reproducibility in different environments and by different user groups. Last but not least, the workflow helps to speed up the runs by reducing model-coupling I/O overhead, and it optimizes CPU utilization by avoiding idle CPU cores and running the models in a distributed way on the HPC cluster that best fits the characteristics of each model.
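Purely as an illustration of the coupling pattern described above (the model functions are stand-ins, not the actual glacier flow or calving codes used in the study), a coupling loop that passes state between the two models in memory rather than through intermediate files might look like this:

```python
# Illustrative sketch: alternate a flow-model step and a calving-model step,
# exchanging geometry in memory instead of via intermediate files.
def flow_model(geometry, dt):
    # stand-in: advance the ice front by one time step
    return {"front_position": geometry["front_position"] + 0.5 * dt}

def calving_model(geometry):
    # stand-in: retreat the front where calving is predicted
    return {"front_position": geometry["front_position"] - 0.2}

geometry = {"front_position": 0.0}
for step in range(5):                      # a WMS would run each model on the
    geometry = flow_model(geometry, dt=1)  # HPC resources that suit it best
    geometry = calving_model(geometry)
print(geometry)
```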



