Towards an Internet of Science

Abstract: Big data and complex analysis workflows (pipelines) are common issues in data-driven science such as bioinformatics. A large number of computational tools are available for data analysis, and many workflow management systems have been developed to piece such tools together into data analysis pipelines. For example, more than 50 computational tools for read mapping are available, representing a large amount of duplicated effort. Furthermore, it is unclear whether these tools are correct, and only a few have a user base large enough to have encountered and reported most of the potential problems. Bringing together many largely untested tools in a computational pipeline must lead to unpredictable results. Yet this is the current state. While data analysis is presently performed on personal computers, workstations, and clusters, the future will see development and analysis shift to the cloud. None of the workflow management systems is ready for this transition. This presents the opportunity to build a new system that will overcome current duplications of effort, introduce proper testing, allow for development and analysis in public and private clouds, and include reporting features leading to interactive documents.

Design considerations for workflow management systems use in production genomics research and the clinic

10.1101/2021.04.03.437906 ◽ 2021

Author(s): Azza E Ahmed ◽ Joshua Allen ◽ Tajesvi Bhat ◽ Prakruthi Burra ◽ Christina E Fliege ◽ ...

Keyword(s): Complex Analysis ◽ Workflow Management ◽ Variant Calling ◽ Management Systems ◽ Systematic Evaluation ◽ Workflow Management Systems ◽ Healthcare Settings ◽ Genomics Research ◽ Bioinformatics Application ◽ Big Data Technologies

Background: The changing landscape of genomics research and clinical practice has created a need for computational pipelines capable of efficiently orchestrating complex analysis stages while handling large volumes of data across heterogeneous computational environments. Workflow Management Systems (WfMSs) are the software components employed to fill this gap.

Results: This work provides an approach and systematic evaluation of key features of popular bioinformatics WfMSs in use today: Nextflow, CWL, and WDL and some of their executors, along with Swift/T, a workflow manager commonly used in high-scale physics applications. We employed two use cases, a variant-calling genomic pipeline and a scalability-testing framework, both of which were run locally, on an HPC cluster, and in the cloud. This allowed for evaluation of those four WfMSs in terms of language expressiveness, modularity, scalability, robustness, reproducibility, interoperability, and ease of development, along with adoption and usage in research labs and healthcare settings. This article aims to answer the question: "Which WfMS should be chosen for a given bioinformatics application, regardless of analysis type?"

Conclusions: The choice of a given WfMS is a function of both its intrinsic language and engine features. Within bioinformatics, where analysts are a mix of dry and wet lab scientists, the choice is also governed by collaborations and adoption within large consortia and by the technical support provided by the WfMS team/community. As the community and its needs continue to evolve along with computational infrastructure, WfMSs will also evolve, especially those with permissive licenses that allow commercial use. In much the same way as the dataflow paradigm and containerization are now well understood to be very useful in bioinformatics applications, we will continue to see innovations in tools and utilities for other purposes, like big data technologies, interoperability, and provenance.

Design considerations for workflow management systems use in production genomics research and the clinic

Scientific Reports ◽ 10.1038/s41598-021-99288-8 ◽ 2021 ◽ Vol 11 (1)

Author(s): Azza E. Ahmed ◽ Joshua M. Allen ◽ Tajesvi Bhat ◽ Prakruthi Burra ◽ Christina E. Fliege ◽ ...

Keyword(s): Complex Analysis ◽ Workflow Management ◽ Variant Calling ◽ Management Systems ◽ Systematic Evaluation ◽ Workflow Management Systems ◽ Healthcare Settings ◽ Genomics Research ◽ Bioinformatics Application ◽ Big Data Technologies

Abstract: The changing landscape of genomics research and clinical practice has created a need for computational pipelines capable of efficiently orchestrating complex analysis stages while handling large volumes of data across heterogeneous computational environments. Workflow Management Systems (WfMSs) are the software components employed to fill this gap. This work provides an approach and systematic evaluation of key features of popular bioinformatics WfMSs in use today: Nextflow, CWL, and WDL and some of their executors, along with Swift/T, a workflow manager commonly used in high-scale physics applications. We employed two use cases, a variant-calling genomic pipeline and a scalability-testing framework, both of which were run locally, on an HPC cluster, and in the cloud. This allowed for evaluation of those four WfMSs in terms of language expressiveness, modularity, scalability, robustness, reproducibility, interoperability, and ease of development, along with adoption and usage in research labs and healthcare settings. This article aims to answer the question: "Which WfMS should be chosen for a given bioinformatics application, regardless of analysis type?" The choice of a given WfMS is a function of both its intrinsic language and engine features. Within bioinformatics, where analysts are a mix of dry and wet lab scientists, the choice is also governed by collaborations and adoption within large consortia and by the technical support provided by the WfMS team/community. As the community and its needs continue to evolve along with computational infrastructure, WfMSs will also evolve, especially those with permissive licenses that allow commercial use. In much the same way as the dataflow paradigm and containerization are now well understood to be very useful in bioinformatics applications, we will continue to see innovations in tools and utilities for other purposes, like big data technologies, interoperability, and provenance.

Using rapid prototyping to choose a bioinformatics workflow management system

10.1101/2020.08.04.236208 ◽ 2020

Author(s): Michael J. Jackson ◽ Edward Wallace ◽ Kostas Kavoussanakis

Keyword(s): Data Analysis ◽ Rapid Prototyping ◽ Management System ◽ Low Cost ◽ Workflow Management ◽ Workflow Management System ◽ Management Systems ◽ Workflow Management Systems ◽ The Right ◽ Selection Of

Abstract: Workflow management systems represent, manage, and execute multi-step computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy – the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing systems and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on the content of these workflows, their data analyses, and their science.

RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing, which recommend the use of build tools to automate workflows and the re-use of code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool and Toil (implementations of the Common Workflow Language) and Nextflow. An evaluation of each candidate, via rapid prototyping of a subset of the RiboViz workflow, was performed and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow best satisfied the authors’ requirements. This use of rapid prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.

Author summary: Data analysis involves many steps, as data are wrangled, processed, and analysed using a succession of unrelated software packages. Running all the right steps, in the right order, with the right outputs in the right places is a major source of frustration. Workflow management systems require that each data analysis step be “wrapped” in a structured way, describing its inputs, parameters, and outputs. By writing these wrappers the scientist can focus on the meaning of each step, which is the interesting part. The system uses these wrappers to decide what steps to run and how to run them, and takes charge of running the steps, including reporting on errors. This makes it much easier to repeatedly run the analysis and to run it transparently upon different computers. To select a workflow management system, we surveyed available tools and selected three for “rapid prototype” implementations to evaluate their suitability for our project. We advocate this rapid prototyping as a low-cost (both time and effort) way of making an informed selection of a system for use within a project. We conclude that many similar multi-step data analysis workflows can be rewritten in a workflow management system.
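The idea of "wrapping" each step with declared inputs and outputs, and of selectively re-executing only the affected steps, can be illustrated with a small in-memory sketch. This is not the API of Nextflow, Snakemake, or any CWL implementation; the `Step` and `Workflow` classes and the `trim`, `count`, and `report` steps are hypothetical names invented for illustration, and real systems track files and run external tools rather than Python values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """A wrapped analysis step: declared inputs, declared outputs, and the action."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    run: Callable[..., tuple]  # takes the input values, returns one value per output


class Workflow:
    """Hypothetical toy runner: orders steps by their declarations and caches results."""

    def __init__(self, steps: List[Step]):
        self.steps = steps
        # step name -> (input values seen last time, output values produced)
        self.cache: Dict[str, tuple] = {}

    def _ordered(self, available: set) -> List[Step]:
        """Order steps so that each one runs only after its inputs exist."""
        pending, ordered, have = list(self.steps), [], set(available)
        while pending:
            ready = [s for s in pending if set(s.inputs) <= have]
            if not ready:
                names = sorted(s.name for s in pending)
                raise ValueError(f"unsatisfiable inputs for steps: {names}")
            for s in ready:
                ordered.append(s)
                have.update(s.outputs)
                pending.remove(s)
        return ordered

    def execute(self, data: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
        """Run the workflow; return all values and the names of steps actually run."""
        values, executed = dict(data), []
        for step in self._ordered(set(data)):
            args = tuple(values[k] for k in step.inputs)
            cached = self.cache.get(step.name)
            if cached is not None and cached[0] == args:
                results = cached[1]  # inputs unchanged: reuse the earlier outputs
            else:
                results = step.run(*args)
                self.cache[step.name] = (args, results)
                executed.append(step.name)
            values.update(zip(step.outputs, results))
        return values, executed


# Steps are deliberately listed out of order; the declarations determine the order.
steps = [
    Step("report", ("n_reads", "label"), ("summary",),
         lambda n, label: (f"{label}: {n} reads",)),
    Step("count", ("trimmed",), ("n_reads",), lambda t: (len(t),)),
    Step("trim", ("reads", "min_len"), ("trimmed",),
         lambda reads, m: (tuple(r for r in reads if len(r) >= m),)),
]
wf = Workflow(steps)
reads = ("ACGT", "AC", "ACGTAC")

first, ran_first = wf.execute({"reads": reads, "min_len": 3, "label": "run1"})
again, ran_again = wf.execute({"reads": reads, "min_len": 3, "label": "run1"})
relabel, ran_relabel = wf.execute({"reads": reads, "min_len": 3, "label": "run2"})

print(ran_first, ran_again, ran_relabel)
```

In this sketch the first call runs every step, an identical second call runs none, and changing only `label` re-runs only `report`, which is the incremental behaviour the abstract describes.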

The Case Handling Case

International Journal of Cooperative Information Systems ◽ 10.1142/s0218843003000784 ◽ 2003 ◽ Vol 12 (03) ◽ pp. 365-391 ◽ Cited By ~ 70

Author(s): H. A. Reijers ◽ J. H. M. Rigter ◽ W. M. P. van der Aalst

Keyword(s): Management System ◽ Workflow Management ◽ Workflow Management System ◽ Critical Assessment ◽ Data Driven ◽ Management Systems ◽ Dutch Government ◽ Workflow Management Systems ◽ Alternative Approach ◽ Data Driven Approach

On the Dutch workflow market, a new and interesting paradigm named "case handling" is emerging. The goal of case handling is to overcome the limitations of existing workflow management systems. By using a data-driven approach combined with implicit routing and carefully avoiding context tunneling, awareness and flexibility are improved. Currently, many organizations are considering case handling systems such as FLOWer (Pallas Athena) rather than the more traditional workflow management systems. This paper provides a critical assessment of this development. The goal is to show the pros and cons of case handling. Moreover, based on this assessment, an alternative approach using slightly extended workflow management systems is proposed. This approach is being pursued by the Dutch government in a project involving the workflow management system Staffware. Based on our experiences thus far, we provide guidelines for selecting the proper technology.

Landscape Analysis for the Specimen Data Refinery

Research Ideas and Outcomes ◽ 10.3897/rio.6.e57602 ◽ 2020 ◽ Vol 6 ◽ Cited By ~ 1

Author(s): Stephanie Walton ◽ Laurence Livermore ◽ Olaf Bánki ◽ Robert Cubey ◽ Robyn Drinkwater ◽ ...

Keyword(s): Natural History ◽ Software Development ◽ State Of The Art ◽ Workflow Management ◽ Management Systems ◽ Workflow Management Systems ◽ Current State ◽ Development Teams ◽ Software Development Teams ◽ Automated Tools

This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

Automation of in-silico data analysis processes through workflow management systems

Briefings in Bioinformatics ◽ 10.1093/bib/bbm056 ◽ 2007 ◽ Vol 9 (1) ◽ pp. 57-68 ◽ Cited By ~ 39

Author(s): P. Romano

Keyword(s): Data Analysis ◽ In Silico ◽ Workflow Management ◽ Management Systems ◽ Workflow Management Systems

PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

The International Journal of High Performance Computing Applications ◽ 10.1177/1094342015594515 ◽ 2016 ◽ Vol 31 (1) ◽ pp. 4-18 ◽ Cited By ~ 14

Author(s): Ewa Deelman ◽ Christopher Carothers ◽ Anirban Mandal ◽ Brian Tierney ◽ Jeffrey S Vetter ◽ ...

Keyword(s): Data Analysis ◽ Performance Optimization ◽ Scientific Discovery ◽ Research Question ◽ Workflow Management ◽ Scientific Workflows ◽ Management Systems ◽ Workflow Management Systems ◽ Extreme Scale ◽ Systems Simulation

Computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Thus, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

Multi-criteria task assignment in workflow management systems

Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003 ◽ 10.1109/hicss.2003.1174458 ◽ 2003 ◽ Cited By ~ 8

Author(s): Minxin Shen ◽ Gwo-Hshiung Tzeng ◽ Duen-Ren Liu

Keyword(s): Workflow Management ◽ Task Assignment ◽ Management Systems ◽ Workflow Management Systems

Specifying and Executing User Agents in an Environment of Reasoning and RESTful Systems Using the Guard-Stage-Milestone Approach

Journal on Data Semantics ◽ 10.1007/s13740-021-00123-0 ◽ 2021

Author(s): Tobias Käfer ◽ Benjamin Jochum ◽ Nico Aßfalg ◽ Leonard Nürnberg

Keyword(s): Linked Data ◽ Operational Semantics ◽ Workflow Management ◽ Management Systems ◽ Workflow Management Systems ◽ The Web

Abstract: For Read-Write Linked Data, an environment of reasoning and RESTful interaction, we investigate the use of the Guard-Stage-Milestone approach for specifying and executing user agents. We present an ontology to specify user agents. Moreover, we give operational semantics to the ontology in a rule language that allows for executing user agents on Read-Write Linked Data. We evaluate our approach formally and regarding performance. Our work shows that despite different assumptions of this environment in contrast to the traditional environment of workflow management systems, the Guard-Stage-Milestone approach can be transferred and successfully applied on the web of Read-Write Linked Data.

Workflow management systems in radiology

10.1117/12.319772 ◽ 1998 ◽ Cited By ~ 1

Author(s): Thomas Wendler ◽ Kirsten Meetz ◽ Joachim Schmidt

Keyword(s): Workflow Management ◽ Management Systems ◽ Workflow Management Systems
