Executing cyclic scientific workflows in the cloud

Author(s):  
Michel Krämer ◽  
Hendrik M. Würz ◽  
Christian Altenhofen

Abstract: We present an algorithm and a software architecture for a cloud-based system that executes cyclic scientific workflows whose structure may change during run time. Existing approaches either rely on workflow definitions based on directed acyclic graphs (DAGs) or require workarounds to implement cyclic structures. In contrast, our system supports cycles natively, avoids workarounds, and as such reduces the complexity of workflow modelling and maintenance. Our algorithm traverses workflow graphs and transforms them iteratively into linear sequences of executable actions. We call these sequences process chains. Our software architecture distributes the process chains to multiple compute nodes in the cloud and oversees their execution. We evaluate our approach by applying it to two practical use cases from the domains of astronomy and engineering. We also compare it with two existing workflow management systems. The evaluation demonstrates that our algorithm is able to execute dynamically changing workflows with cycles and that the design and maintenance of complex workflows are easier than with existing solutions. It also shows that our software architecture can run process chains on multiple compute nodes in parallel to significantly speed up the workflow execution. An implementation of our algorithm and the software architecture is available with the Steep Workflow Management System, which we released under an open-source license. The resources for the first practical use case are also available as open source for reproduction.
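The core idea can be illustrated with a small sketch (a toy Python illustration under assumed names, not Steep's actual code or API): the workflow graph is re-inspected after every pass, and whatever actions are currently executable are linearised into a process chain, which is how cycles can be resolved at run time.

# Toy illustration of "process chains": the (possibly cyclic) workflow is not
# compiled into a DAG up front; instead, the set of actions is re-inspected
# after every iteration and the currently executable ones are linearised.
# All names and the example workflow below are hypothetical.

def generate_process_chain(actions, available):
    """Linearise the actions that can run now into a process chain."""
    chain = []
    for a in actions:
        if a["runs"] < a["max_runs"] and all(i in available for i in a["inputs"]):
            chain.append(a)
    return chain

def execute(action):
    # Placeholder: a real system would run the action on a cloud compute node.
    print(f"run {action['name']} (iteration {action['runs'] + 1})")
    action["runs"] += 1
    return set(action["outputs"])

actions = [
    {"name": "preprocess", "inputs": ["raw"],   "outputs": ["clean"],  "runs": 0, "max_runs": 1},
    {"name": "refine",     "inputs": ["clean"], "outputs": ["clean"],  "runs": 0, "max_runs": 3},  # cyclic step
    {"name": "publish",    "inputs": ["clean"], "outputs": ["result"], "runs": 0, "max_runs": 1},
]

available = {"raw"}
while True:
    chain = generate_process_chain(actions, available)
    if not chain:
        break
    for action in chain:          # a real scheduler would distribute these to parallel nodes
        available |= execute(action)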

2020 ◽  
Author(s):  
Maria Luiza Mondelli ◽  
Marcelo Monteiro Galheigo ◽  
Vivian Medeiros ◽  
Bruno F. Bastos ◽  
Antônio Tadeu Azevedo Gomes ◽  
...  

Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high-performance computing and produce huge quantities of data. They also require different programs to be executed in a certain order, which allows the experiments to be modeled as workflows. However, users do not always have the infrastructure needed to perform these experiments. Our contribution is the integration of scientific workflow management systems and grid-enabled scientific gateways, providing the user with a transparent way to run these workflows on geographically distributed computing resources. Making the workflows available through the gateway improves the usability of these experiments.
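As a rough, purely hypothetical illustration of this integration pattern (none of the names below come from the paper), a gateway request can be thought of as being forwarded to a grid-enabled workflow engine that hides resource selection and data staging from the user.

# Hypothetical sketch of a science gateway delegating a workflow run to a
# grid-enabled back end. GatewayRequest and GridWorkflowEngine are invented
# placeholders, not the paper's actual middleware.

from dataclasses import dataclass

@dataclass
class GatewayRequest:
    workflow_name: str        # e.g. a sequence-alignment pipeline
    parameters: dict          # user-supplied inputs collected by the gateway UI

class GridWorkflowEngine:
    """Stand-in for the workflow management system running on grid resources."""
    def submit(self, workflow_name, parameters):
        job_id = f"{workflow_name}-0001"       # the grid middleware would assign this
        print(f"submitted {workflow_name} with {parameters} -> {job_id}")
        return job_id

def gateway_run(request: GatewayRequest, engine: GridWorkflowEngine) -> str:
    # The gateway hides resource selection and data staging from the user.
    return engine.submit(request.workflow_name, request.parameters)

job = gateway_run(GatewayRequest("rna-seq", {"reads": "sample.fastq"}), GridWorkflowEngine())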


2020 ◽  
Author(s):  
Wellington Oliveira ◽  
Paolo Missier ◽  
Daniel De Oliveira ◽  
Vanessa Braganholo

Scientific workflows rely on provenance to be understandable, reproducible, and trustworthy. Nowadays, there is a growing demand for interoperability between provenance data generated by heterogeneous workflow management systems. To address this issue, some provenance models have been proposed that extend PROV to support specific requirements of scientific workflows. In this paper, we present two prominent provenance models for scientific workflows, PROV-Wf and ProvOne, which are specializations of PROV, and compare their elements and relationships. Our goal is to provide an overview of each model and to support the choice of the most suitable one for a specific context.
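Both models specialise the same PROV core of entities, activities, and usage/generation relations; the following toy sketch (plain Python data classes, not either model's actual vocabulary) illustrates that shared core.

# Minimal, generic sketch of the PROV core that both PROV-Wf and ProvOne
# specialise: entities (data), activities (task executions) and the
# used / wasGeneratedBy relations between them. Illustrative only.

from dataclasses import dataclass, field

@dataclass
class Entity:
    id: str                     # e.g. an input or output file

@dataclass
class Activity:
    id: str                     # e.g. one execution of a workflow task
    used: list = field(default_factory=list)        # PROV "used" relation
    generated: list = field(default_factory=list)   # PROV "wasGeneratedBy" (inverted)

# Record the provenance of one task execution.
reads = Entity("ex:input.fastq")
aligned = Entity("ex:aligned.bam")
align = Activity("ex:align-run-1", used=[reads], generated=[aligned])

# A specialised model would additionally link "align" to the abstract workflow
# step it instantiates (prospective provenance), which is where PROV-Wf and
# ProvOne differ in their extra elements and relationships.
print(align)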


2003 ◽  
Vol 12 (04) ◽  
pp. 411-440 ◽  
Author(s):  
Roberto Silveira Silva Filho ◽  
Jacques Wainer ◽  
Edmundo R. M. Madeira

Workflow management systems are usually designed as client-server systems. The central server is responsible for coordinating the workflow execution and, in some cases, may manage the activities database. This centralized control architecture may represent a single point of failure, which compromises the availability of the system. We propose a fully distributed and configurable architecture for workflow management systems. It is based on the idea that the activities of a case (an instance of the process) migrate from host to host, executing the workflow tasks while following a process plan. This core architecture is improved with the addition of other distributed components so that requirements for workflow management systems beyond scalability are also addressed. The components of the architecture were tested in different distributed and centralized configurations. The ability to configure the location of components and the use of dynamic allocation of tasks proved effective for the implementation of load-balancing policies.
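The migrating-case idea can be sketched roughly as follows (a hypothetical toy, not the paper's implementation): the case object carries its data and a process plan of (task, host) pairs and is executed host by host without a central coordinator.

# Toy sketch of a migrating case. Host names and task functions are invented;
# real migration would serialise the case and send it over the network.

def collect_data(case):   case["data"]["form"] = "filled"
def approve(case):        case["data"]["approved"] = True
def archive(case):        case["data"]["archived"] = True

process_plan = [          # (task, host) pairs; no central coordinator
    (collect_data, "host-a"),
    (approve,      "host-b"),
    (archive,      "host-c"),
]

case = {"id": "case-42", "data": {}}
for task, host in process_plan:
    print(f"case {case['id']} migrates to {host} to run {task.__name__}")
    task(case)            # executed locally on the target host
print(case["data"])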


2005 ◽  
Vol 14 (01) ◽  
pp. 1-24 ◽  
Author(s):  
GWAN-HWAN HWANG ◽  
YUNG-CHUAN LEE ◽  
BOR-YIH WU

In this paper, we propose a new failure-recovery model for workflow management systems (WfMSs). The model is supported by a new language, called the workflow failure-handling (WfFH) language, which allows the workflow designer to write programs that use data-flow analysis to guide failure recovery during workflow execution. With the WfFH language, the end compensation point and the compensation set for failure recovery can be computed at run time, according to the execution results and status of the workflow activities. In addition, failure-recovery definitions programmed with the WfFH language can be kept independent of the workflow process definitions, thereby dramatically reducing the maintenance overhead of workflow processes. A prototype was built in a Java-based object-oriented workflow management system called JOO-WfMS. We also report our experiences in constructing this prototype.
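A rough illustration of the run-time computation described here (not the WfFH language itself; activity names and dependencies are invented) is to derive the compensation set from recorded data-flow dependencies when an activity fails.

# Toy illustration: compute a compensation set from data-flow dependencies.
# Which activities produced data that each activity consumes (hypothetical).
data_flow = {
    "B": {"A"},          # B reads data written by A
    "C": {"A"},
    "D": {"B", "C"},
}
executed = ["A", "B", "C"]     # execution history so far
failed = "D"

def compensation_set(failed, data_flow, executed):
    """Transitively collect executed activities the failed one depends on."""
    todo, result = set(data_flow.get(failed, ())), set()
    while todo:
        act = todo.pop()
        if act in executed and act not in result:
            result.add(act)
            todo |= data_flow.get(act, set())
    return result

to_compensate = compensation_set(failed, data_flow, executed)
# Compensate in reverse execution order; the earliest member marks the
# "end compensation point" from which the workflow can be resumed.
plan = [a for a in reversed(executed) if a in to_compensate]
print(plan)                    # ['C', 'B', 'A']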


2021 ◽  
Vol 251 ◽  
pp. 03019
Author(s):  
Francis Pham ◽  
David Dossett ◽  
Martin Sevior

In 2019, a Python plugin package called b2cal, based on the Apache Airflow workflow management platform, was developed to automate the calibration at Belle II. It uses Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job-submission web pages. The system was hosted in Melbourne, Australia, and submitted calibration jobs to the High Energy Accelerator Research Organization (KEK) in Japan. In 2020, b2cal was dockerised and deployed at the Deutsches Elektronen-Synchrotron (DESY) laboratory in Germany. Improvements have been implemented that allow jobs to be submitted to multiple calibration centers and greatly reduce the amount of required human interaction. All job submissions and validation of calibration constants now occur as soon as possible. In this paper, we describe these upgrades to the automated calibration at Belle II.
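The multi-center submission idea might look roughly like the following toy dispatcher (an assumption-laden sketch, not actual b2cal code; center names, capacities, and the submit() stub are made up).

# Hypothetical dispatcher: send a calibration job to whichever configured
# calibration center currently has free slots.

calibration_centres = [
    {"name": "KEK",  "free_slots": 0},
    {"name": "DESY", "free_slots": 4},
]

def submit(job, centre):
    print(f"submitting {job} to {centre['name']}")
    centre["free_slots"] -= 1

def dispatch(job):
    for centre in calibration_centres:
        if centre["free_slots"] > 0:
            submit(job, centre)
            return centre["name"]
    raise RuntimeError("no calibration centre available, retry later")

dispatch("beamspot-calibration-exp12")      # -> DESY in this toy setup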


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Michael Kluge ◽  
Marie-Sophie Friedl ◽  
Amrei L Menzel ◽  
Caroline C Friedel

Abstract
Background: Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows.
Implementation: Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensure reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, two new execution modes were implemented. The first allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks that are not affected by changes. The second allows detaching from and reattaching to workflow execution on a local computer while tasks continue running on computer clusters.
Conclusions: Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.
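The resume mode can be illustrated with a small sketch (hypothetical bookkeeping, not Watchdog's actual log format): tasks whose definition and inputs are unchanged since the last successful run are skipped.

# Toy resume logic: skip previously successful tasks whose fingerprint
# (definition + parameters) is unchanged. Task names are invented.

import hashlib, json

def fingerprint(task):
    """Stable hash over the task definition and its input parameters."""
    return hashlib.sha256(json.dumps(task, sort_keys=True).encode()).hexdigest()

previous_log = {}                      # task name -> fingerprint of last success

def run_workflow(tasks):
    for task in tasks:
        fp = fingerprint(task)
        if previous_log.get(task["name"]) == fp:
            print(f"skip {task['name']} (unchanged, already succeeded)")
            continue
        print(f"run  {task['name']}")   # real execution would happen here
        previous_log[task["name"]] = fp # record success for the next resume

tasks = [{"name": "map", "params": {"genome": "hg38"}},
         {"name": "count", "params": {"min_qual": 20}}]
run_workflow(tasks)                     # first run executes both
tasks[1]["params"]["min_qual"] = 30     # modify one task
run_workflow(tasks)                     # resumed run re-executes only "count"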


2018 ◽  
Vol 7 (2) ◽  
Author(s):  
Itana Maria De Souza Gimenes ◽  
Fabrício Ricardo Lazilha ◽  
Edson Alves De Oliveira Junior ◽  
Leonor Barroca

This paper presents a component-based product line for workflow management systems. The process followed to design the product line was based on the Catalysis method, with extensions to represent variability across the process. The domain of workflow management systems has been shown to be appropriate for the product-line approach, as a standard architecture and models have been established by a regulatory board, the Workflow Management Coalition, and there is demand for similar workflow management systems with somewhat different features. The product-line architecture was evaluated with Rapide simulation tools. The evaluation was based on selected scenarios, thus avoiding implementation issues. The strategy used to populate the architecture and experiment with the product line is shown. In particular, the design of the workflow execution manager component is described.
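The variability idea behind such a product line can be sketched as follows (hypothetical variation points and variants, not the paper's Catalysis models): a concrete workflow management system is assembled by choosing one variant per variation point.

# Illustrative product-line variability check; all names are invented.

variation_points = {
    "execution_manager": {"centralised", "distributed"},
    "worklist_handler":  {"web", "desktop"},
    "persistence":       {"relational", "in_memory"},
}

def assemble_product(choices):
    """Validate a configuration against the product line's variation points."""
    for point, variant in choices.items():
        allowed = variation_points.get(point, set())
        if variant not in allowed:
            raise ValueError(f"{variant!r} is not a valid variant of {point}")
    return choices

# One member of the product family.
product = assemble_product({
    "execution_manager": "distributed",
    "worklist_handler":  "web",
    "persistence":       "relational",
})
print(product)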


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Monsef Boughrous ◽  
Hanan El Bakkali

Workflow management systems are very important for organizations to model and manage complex business processes. However, significant work is needed to keep a workflow resilient and secure. Organizations therefore apply strict security policies and enforce access-control constraints. As a result, the number of users who are available and authorized for the workflow execution decreases drastically. In many cases, this leads to a workflow deadlock, where no authorized user-task assignments are available for critical tasks and the workflow execution cannot be accomplished. This problem has gained the interest of security researchers in recent years and is known in the literature as the workflow satisfiability problem (WSP). In this paper, we propose a new approach to bypass the WSP and to ensure workflow resiliency and security. For this purpose, we define workflow criticality, which can be used as a metric at run time to prevent the WSP. We believe that the workflow criticality value will help workflow managers to make decisions and start a mitigation solution in case of a critical workflow. Moreover, we propose a delegation process (DP) algorithm as a mitigation solution that uses workflow-instance criticality, delegation, and priority concepts to find authorized and suitable users to perform critical tasks with low security risk.
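A rough sketch of these two ingredients (a run-time criticality metric and a risk-aware delegation step) is given below; users, tasks, risk scores, and the formula are assumptions for illustration, not the paper's exact definitions.

# Hypothetical criticality metric and delegation step.

remaining_tasks = {
    "approve_payment": {"authorised": {"alice"}, "available": set()},   # blocked
    "archive_invoice": {"authorised": {"bob", "carol"}, "available": {"bob"}},
}

def criticality(tasks):
    """Fraction of remaining tasks with no available authorised user."""
    blocked = sum(1 for t in tasks.values() if not (t["authorised"] & t["available"]))
    return blocked / len(tasks)

# Candidate delegates for the blocked task, with an assumed security-risk score.
delegation_candidates = {"dave": 0.4, "erin": 0.2}

def delegate(task_name, candidates):
    user = min(candidates, key=candidates.get)      # lowest-risk candidate
    print(f"delegating {task_name} to {user}")
    return user

if criticality(remaining_tasks) > 0.3:              # threshold chosen for the toy
    delegate("approve_payment", delegation_candidates)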


2020 ◽  
Vol 245 ◽  
pp. 02016
Author(s):  
David Dossett ◽  
Martin Sevior

The Belle II detector began collecting data from e+e− collisions at the SuperKEKB electron-positron collider in March 2019. Belle II aims to collect a data sample 50 times larger than that of the previous generation of B-factories. For Belle II analyses to be competitive, it is crucial that calibration payloads for these data are calculated promptly prior to data reconstruction. To accomplish this goal, a Python plugin package has been developed based on the open-source Apache Airflow package, using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job-submission web pages. DAGs for calibration process submission, monitoring of incoming data files, and validation of calibration payloads have all been created to help automate the calibration procedure. Flask plugin classes have been developed to extend the built-in Airflow administration and monitoring web pages. Authentication is handled through the pre-existing X.509 grid certificates of Belle II users.
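The Airflow pattern referred to here might look roughly like the following sketch (task names and callables are hypothetical; this is not the actual Belle II calibration DAG).

# Minimal Airflow-style sketch: a DAG orders the calibration steps and
# dependencies are expressed with the >> operator.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def check_new_files():      print("look for newly arrived data files")
def submit_calibration():   print("submit calibration jobs to the grid")
def validate_payloads():    print("validate the produced calibration payloads")

with DAG(dag_id="calibration_sketch",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None,        # triggered manually / by upstream DAGs
         catchup=False) as dag:
    monitor   = PythonOperator(task_id="monitor_data",      python_callable=check_new_files)
    calibrate = PythonOperator(task_id="run_calibration",   python_callable=submit_calibration)
    validate  = PythonOperator(task_id="validate_payloads", python_callable=validate_payloads)

    monitor >> calibrate >> validate    # DAG edges define the execution order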

