Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution

Abstract Background Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows. Implementation Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensures reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, 2 new execution modes were implemented. One allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks not affected by changes. The second one allows detaching and reattaching to workflow execution on a local computer while tasks continue running on computer clusters. Conclusions Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.

Download Full-text

A Fully Distributed Architecture for Large Scale Workflow Enactment

International Journal of Cooperative Information Systems ◽

10.1142/s0218843003000802 ◽

2003 ◽

Vol 12 (04) ◽

pp. 411-440 ◽

Cited By ~ 7

Author(s):

Roberto Silveira Silva Filho ◽

Jacques Wainer ◽

Edmundo R. M. Madeira

Keyword(s):

Large Scale ◽

Single Point ◽

Workflow Management ◽

Management Systems ◽

Centralized Control ◽

Dynamic Allocation ◽

Workflow Management Systems ◽

Client Server ◽

Workflow Execution ◽

Distributed Components

Standard client-server workflow management systems are usually designed as client-server systems. The central server is responsible for the coordination of the workflow execution and, in some cases, may manage the activities database. This centralized control architecture may represent a single point of failure, which compromises the availability of the system. We propose a fully distributed and configurable architecture for workflow management systems. It is based on the idea that the activities of a case (an instance of the process) migrate from host to host, executing the workflow tasks, following a process plan. This core architecture is improved with the addition of other distributed components so that other requirements for Workflow Management Systems, besides scalability, are also addressed. The components of the architecture were tested in different distributed and centralized configurations. The ability to configure the location of components and the use of dynamic allocation of tasks were effective for the implementation of load balancing policies.

Download Full-text

Research and Design on P2P Business Process Execution Framework for Workflow Management Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.4771 ◽

2014 ◽

Vol 989-994 ◽

pp. 4771-4774

Author(s):

Tao Wu

Keyword(s):

Business Process ◽

Large Scale ◽

Workflow Management ◽

Workflow Engine ◽

Management Systems ◽

Workflow Management Systems ◽

Workflow Execution ◽

Distributed Framework ◽

Process Execution ◽

Business Process Execution

Efficient business workflow management in large-scale areas is in great demand. However, current business workflow management systems are short of distributed workflow execution support. In our paper, we design and implement a distributed framework called PeerODE for Apache ODE (Orchestration Director Engine) [1], an open-sourced business workflow engine. PeerODE presents a scalable approach to P2P business process execution. The scheduling experiment on PeerODE shows that the framework handles the distributed business process execution effectively.

Download Full-text

A FLEXIBLE FAILURE-RECOVERY MODEL FOR WORKFLOW MANAGEMENT SYSTEMS

International Journal of Cooperative Information Systems ◽

10.1142/s0218843005000992 ◽

2005 ◽

Vol 14 (01) ◽

pp. 1-24 ◽

Cited By ~ 5

Author(s):

GWAN-HWAN HWANG ◽

YUNG-CHUAN LEE ◽

BOR-YIH WU

Keyword(s):

Flow Analysis ◽

Workflow Management ◽

Object Oriented ◽

Failure Recovery ◽

Management Systems ◽

Recovery Model ◽

Workflow Management Systems ◽

Data Flow Analysis ◽

Workflow Execution ◽

Failure Handling

In this paper, we propose a new failure-recovery model for workflow management systems (WfMSs). This model is supported with a new language, called the workflow failure-handling (WfFH) language, which allows the workflow designer to write programs so that he can use data-flow analysis technology to guide the failure recovery in workflow execution. With the WfFH language, the computation of the end compensation point and the compensation set for failure recovery can proceed during the workflow process run-time according to the execution results and status of workflow activities. Also, the failure-recovery definitions programmed with the WfFH language can be independent, thereby dramatically reducing the maintenance overhead of workflow processes. A prototype is built in a Java-based object-oriented workflow management system, called JOO-WfMS. We also report our experiences in constructing this prototype.

Download Full-text

A distributed execution environment for large-scale workflow management systems with subnets and server migration

Proceedings of CoopIS 97: 2nd IFCIS Conference on Cooperative Information Systems ◽

10.1109/coopis.1997.613807 ◽

2002 ◽

Cited By ~ 14

Author(s):

T. Bauer ◽

P. Dadam

Keyword(s):

Large Scale ◽

Workflow Management ◽

Management Systems ◽

Workflow Management Systems ◽

Distributed Execution ◽

Execution Environment

Download Full-text

Network integration of data and analysis of oncology interest

Journal of Integrative Bioinformatics ◽

10.1515/jib-2006-21 ◽

2006 ◽

Vol 3 (1) ◽

pp. 45-55

Author(s):

P. Romano ◽

G. Bertolini ◽

F. De Paoli ◽

M. Fattore ◽

D. Marra ◽

...

Keyword(s):

Web Services ◽

Large Scale ◽

New Technologies ◽

Workflow Management ◽

Genome Project ◽

Management Systems ◽

Workflow Management Systems ◽

The Creation ◽

The Human Genome Project ◽

Integrate Data

Summary The Human Genome Project has deeply transformed biology and the field has since then expanded to the management, processing, analysis and visualization of large quantities of data from genomics, proteomics, medicinal chemistry and drug screening. This huge amount of data and the heterogeneity of software tools that are used implies the adoption on a very large scale of new, flexible tools that can enable researchers to integrate data and analysis on the network. ICT technology standards and tools, like Web Services and related languages, and workflow management systems, can support the creation and deployment of such systems. While a number of Web Services are appearing and personal workflow management systems are also being more and more offered to researchers, a reference portal enabling the vast majority of unskilled researchers to take profit from these new technologies is still lacking. In this paper, we introduce the rationale for the creation of such a portal and present the architecture and some preliminary results for the development of a portal for the enactment of workflows of interest in oncology.

Download Full-text

Management of Global Large-Scale Projects through a Federation of Multiple Web-Based Workflow Management Systems

Project Management Journal ◽

10.1177/875697280303400306 ◽

2003 ◽

Vol 34 (3) ◽

pp. 40-47 ◽

Cited By ~ 9

Author(s):

Yuosre F. Badir ◽

Rémi Founou ◽

Claude Stricker ◽

Vincent Bourquin

Keyword(s):

Large Scale ◽

Workflow Management ◽

Management Systems ◽

Workflow Management Systems ◽

Web Based

Download Full-text

The integration of product data and workflow management systems in a large scale engineering database application

Proceedings. IDEAS'98. International Database Engineering and Applications Symposium (Cat. No.98EX156) ◽

10.1109/ideas.1998.694389 ◽

2002 ◽

Cited By ~ 6

Author(s):

R. McClatchey ◽

Z. Kovacs ◽

F. Estrella ◽

J.-M. Le Goff ◽

G. Chevenier ◽

...

Keyword(s):

Large Scale ◽

Workflow Management ◽

Management Systems ◽

Workflow Management Systems ◽

Product Data ◽

Database Application ◽

Engineering Database

Download Full-text

A Component-based Product Line for Workflow Management Systems

CLEI electronic journal ◽

10.19153/cleiej.7.2.5 ◽

2018 ◽

Vol 7 (2) ◽

Author(s):

Itana Maria De Souza Gimenes ◽

Fabrício Ricardo Lazilha ◽

Edson Alves De Oliveira Junior ◽

Leonor Barroca

Keyword(s):

Workflow Management ◽

Product Line ◽

Management Systems ◽

Workflow Management Systems ◽

Simulation Tools ◽

Workflow Execution ◽

Product Line Architecture ◽

Workflow Management Coalition

This paper presents a component-based product line for workflow management systems. The process followed to design the product line was based on the Catalysis method. Extensions were made to represent variability across the process. The domain of workflow management systems has been shown to be appropriate to the application of the product line approach as there are a standard architecture and models established by a regulatory board, the Workflow Management Coalition. In addition, there is a demand for similar workflow management systems but with some different features. The product line architecture was evaluated with Rapide simulation tools. The evaluation was based on selected scenarios, thus, avoiding implementation issues. The strategy that has been used to populate the architecture and experiment with the product line is shown. In particular, the design of the workflow execution manager component is described.

Download Full-text

Integrating Open Grid Services Architecture Data Access and Integration with computational Grid workflows

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2009.0040 ◽

2009 ◽

Vol 367 (1897) ◽

pp. 2521-2532 ◽

Cited By ~ 1

Author(s):

Tamas Kukla ◽

Tamas Kiss ◽

Peter Kacsuk ◽

Gabor Terstyanszky

Keyword(s):

Workflow Management ◽

Data Access ◽

Computational Grid ◽

Management Systems ◽

Grid Services ◽

Integration Technique ◽

Workflow Management Systems ◽

Workflow Execution ◽

Reference Implementation ◽

Open Grid Services Architecture

Although many scientific applications rely on data stored in databases, most workflow management systems are not capable of establishing database connections during workflow execution. For this reason, e-Scientists have to use different tools before workflow submission to access their datasets and gather the required data on which they want to carry out computational experiments. Open Grid Services Architecture Data Access and Integration (OGSA-DAI) is a good candidate to use as middleware providing access to several structured and semi-structured database products through Web/Grid services. The integration technique and its reference implementation described in this paper enable e-Scientists to reach databases via OGSA-DAI within their scientific workflows at run-time and give a general solution that can be adopted by any workflow management system.

Download Full-text

Bioinformatics pipeline using JUDI: Just Do It!

Bioinformatics ◽

10.1093/bioinformatics/btz956 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2572-2574

Author(s):

Soumitra Pal ◽

Teresa M Przytycka

Keyword(s):

Large Scale ◽

Workflow Management ◽

Supplementary Information ◽

Management Systems ◽

Modular Approach ◽

Workflow Management Systems ◽

Bioinformatics Pipeline ◽

Large Scale Data ◽

Optimal Execution ◽

Scale Data

Abstract Summary Large-scale data analysis in bioinformatics requires pipelined execution of multiple software. Generally each stage in a pipeline takes considerable computing resources and several workflow management systems (WMS), e.g. Snakemake, Nextflow, Common Workflow Language, Galaxy, etc. have been developed to ensure optimum execution of the stages across two invocations of the pipeline. However, when the pipeline needs to be executed with different settings of parameters, e.g. thresholds, underlying algorithms, etc. these WMS require significant scripting to ensure an optimal execution. We developed JUDI on top of DoIt, a Python based WMS, to systematically handle parameter settings based on the principles of database management systems. Using a novel modular approach that encapsulates a parameter database in each task and file associated with a pipeline stage, JUDI simplifies plug-and-play of the pipeline stages. For a typical pipeline with n parameters, JUDI reduces the number of lines of scripting required by a factor of O(n). With properly designed parameter databases, JUDI not only enables reproducing research under published values of parameters but also facilitates exploring newer results under novel parameter settings. Availability and implementation https://github.com/ncbi/JUDI Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text