Distributed HPC Resources Orchestration for Supporting Large-Scale Workflow Execution

Author(s):
Alberto Scionti
Paolo Viviani
Giacomo Vitali
Chiara Vercellino
Olivier Terzo
...

Author(s):
Ewa Deelman
Ann Chervenak

Scientific applications such as those in astronomy, earthquake science, gravitational-wave physics, and others have embraced workflow technologies to do large-scale science. Workflows enable researchers to collaboratively design, manage, and obtain results from analyses that involve hundreds of thousands of steps, access terabytes of data, and generate similar amounts of intermediate and final data products. Although workflow systems facilitate the automated generation of data products, many issues remain to be addressed, and they take different forms across the workflow lifecycle. This chapter describes a workflow lifecycle consisting of a workflow generation phase, where the analysis is defined; a workflow planning phase, where the resources needed for execution are selected; a workflow execution phase, where the actual computations take place; and a final phase in which results, metadata, and provenance are stored. The authors discuss the issues related to data management at each step of the workflow lifecycle. They describe challenge problems and illustrate them in the context of real-life applications, and they discuss the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure. They particularly emphasize the issues related to the management of data throughout the workflow lifecycle.
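As a rough illustration of the lifecycle just described (generation, planning, execution, and storing of results, metadata, and provenance), the following Python sketch models the four phases over a toy workflow; all names and the site-mapping logic are hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    tasks: list                                     # analysis steps defined at generation time
    site_map: dict = field(default_factory=dict)    # task -> execution site chosen at planning time
    outputs: dict = field(default_factory=dict)     # task -> data product produced at execution time
    provenance: list = field(default_factory=list)  # records kept in the final storing phase

def generate():
    # Workflow generation phase: the analysis is defined as a set of steps.
    return Workflow(tasks=["extract", "transform", "analyze"])

def plan(wf, sites):
    # Workflow planning phase: each task is mapped onto an available resource.
    for i, task in enumerate(wf.tasks):
        wf.site_map[task] = sites[i % len(sites)]

def execute(wf):
    # Workflow execution phase: the actual computations take place.
    for task in wf.tasks:
        wf.outputs[task] = f"{task}.out"

def store(wf):
    # Result, metadata, and provenance storing phase.
    for task in wf.tasks:
        wf.provenance.append(f"{task} ran at {wf.site_map[task]} -> {wf.outputs[task]}")

wf = generate()
plan(wf, sites=["cluster-a", "cluster-b"])
execute(wf)
store(wf)
print("\n".join(wf.provenance))
```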


2020
Author(s):
Victor S. Bursztyn
Jonas Dias
Marta Mattoso

One major challenge in large-scale experiments is the analytical capacity to contrast ongoing results with domain knowledge. We approach this challenge by constructing a domain-specific knowledge base, which is queried during workflow execution. We introduce K-Chiron, an integrated solution that combines a state-of-the-art automatic knowledge base construction (KBC) system with Chiron, a well-established workflow engine. In this work we experiment in the context of Political Sciences to show how KBC may be used to improve human-in-the-loop (HIL) support in scientific experiments. While traditional HIL through domain-expert supervision is done offline, in K-Chiron it is done online, i.e., at runtime. We achieve results in less laborious ways, to the point of enabling a class of experiments that would be infeasible with traditional HIL. Finally, we show how provenance data could be leveraged with KBC to enable further experimentation in more dynamic settings.
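The core idea, checking fresh results against a domain knowledge base while the workflow is still running, can be sketched as follows; the knowledge-base contents, the query helper, and the task-completion hook are all hypothetical and do not reflect K-Chiron's actual API.

```python
# Hypothetical domain facts: entity -> expected attribute range.
knowledge_base = {
    "candidate_A": {"vote_share": (0.30, 0.60)},
    "candidate_B": {"vote_share": (0.10, 0.40)},
}

def query_kb(entity, attribute):
    return knowledge_base.get(entity, {}).get(attribute)

def on_task_completed(entity, attribute, value):
    expected = query_kb(entity, attribute)
    if expected is None:
        return
    low, high = expected
    if not (low <= value <= high):
        # Online human-in-the-loop: flag divergence from domain knowledge at runtime.
        print(f"[HIL alert] {entity}.{attribute}={value} outside expected range {expected}")

# Simulated workflow execution producing intermediate results.
for entity, value in [("candidate_A", 0.45), ("candidate_B", 0.72)]:
    on_task_completed(entity, "vote_share", value)
```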


2003
Vol 12 (04)
pp. 411-440
Author(s):
Roberto Silveira Silva Filho
Jacques Wainer
Edmundo R. M. Madeira

Standard workflow management systems are usually designed as client-server systems, where a central server is responsible for coordinating workflow execution and, in some cases, for managing the activities database. This centralized control architecture may represent a single point of failure, which compromises the availability of the system. We propose a fully distributed and configurable architecture for workflow management systems. It is based on the idea that the activities of a case (an instance of the process) migrate from host to host, executing the workflow tasks while following a process plan. This core architecture is extended with other distributed components so that further requirements of workflow management systems, beyond scalability, are also addressed. The components of the architecture were tested in different distributed and centralized configurations. The ability to configure the location of components and the use of dynamic allocation of tasks proved effective for implementing load-balancing policies.
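A minimal sketch of the migrating-case idea described above, with hypothetical names and a simulated migration (the paper's actual architecture involves real inter-host communication rather than an in-process loop):

```python
from dataclasses import dataclass

@dataclass
class Step:
    task: str
    host: str     # where this task should execute according to the process plan

@dataclass
class Case:
    case_id: int
    plan: list    # ordered list of Step
    history: list

    def run(self):
        for step in self.plan:
            # "Migration" is simulated: control moves to the step's host,
            # so no central server coordinates the execution.
            self.history.append(f"case {self.case_id}: '{step.task}' executed on {step.host}")

case = Case(
    case_id=1,
    plan=[Step("fill-form", "host-a"), Step("approve", "host-b"), Step("archive", "host-c")],
    history=[],
)
case.run()
print("\n".join(case.history))
```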


Information
2019
Vol 10 (5)
pp. 169
Author(s):
Na Wu
Decheng Zuo
Zhan Zhang

Improving reliability is one of the major concerns of scientific workflow scheduling in clouds. The ever-growing computational complexity and data size of workflows present challenges to fault-tolerant workflow scheduling. It is therefore essential to design a cost-effective fault-tolerant scheduling approach for large-scale workflows. In this paper, we propose a dynamic fault-tolerant workflow scheduling (DFTWS) approach with hybrid spatial and temporal re-execution schemes. First, DFTWS calculates the time attributes of tasks and identifies the critical path of the workflow in advance. Then, DFTWS assigns an appropriate virtual machine (VM) to each task according to task urgency and budget quota in the initial resource allocation phase. Finally, DFTWS performs online scheduling, making real-time fault-tolerant decisions based on failure type and task criticality throughout workflow execution. The proposed algorithm is evaluated on real-world workflows, and the factors that affect the performance of DFTWS are analyzed. The experimental results demonstrate that DFTWS achieves a trade-off between the high-reliability and low-cost objectives in cloud computing environments.
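To make one ingredient concrete, the sketch below computes task time attributes (earliest finish times) and recovers the critical path of a small workflow DAG; the example graph and runtimes are invented, and DFTWS's VM selection and online fault handling are not shown.

```python
from functools import lru_cache

# Hypothetical workflow DAG: task -> estimated runtime, and task -> successors.
tasks = {"t1": 4, "t2": 2, "t3": 3, "t4": 1}
edges = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"], "t4": []}

def predecessors(task):
    return [p for p, succs in edges.items() if task in succs]

@lru_cache(maxsize=None)
def earliest_finish(task):
    # Time attribute: longest path from any entry task up to and including 'task'.
    start = max((earliest_finish(p) for p in predecessors(task)), default=0)
    return start + tasks[task]

def critical_path():
    # Walk back from the task that finishes last, always via the latest-finishing predecessor.
    path, current = [], max(tasks, key=earliest_finish)
    while current is not None:
        path.append(current)
        preds = predecessors(current)
        current = max(preds, key=earliest_finish) if preds else None
    return list(reversed(path))

print("makespan:", max(earliest_finish(t) for t in tasks))   # 8
print("critical path:", critical_path())                     # ['t1', 't3', 't4']
```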


GigaScience
2020
Vol 9 (6)
Author(s):
Michael Kluge
Marie-Sophie Friedl
Amrei L Menzel
Caroline C Friedel

Background: Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows. Implementation: Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensures reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, 2 new execution modes were implemented. One allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks not affected by changes. The second one allows detaching and reattaching to workflow execution on a local computer while tasks continue running on computer clusters. Conclusions: Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.
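The resume mode mentioned above, skipping tasks that already finished successfully unless their inputs changed, can be illustrated with the following sketch; the log-file format, fingerprinting scheme, and task table are hypothetical and do not reflect Watchdog's actual implementation.

```python
import hashlib
import json
import os

LOG_FILE = "execution_log.json"   # hypothetical log location

def fingerprint(paths):
    """Hash the content of a task's input files."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        if os.path.exists(path):
            with open(path, "rb") as fh:
                digest.update(fh.read())
    return digest.hexdigest()

def load_log():
    if os.path.exists(LOG_FILE):
        with open(LOG_FILE) as fh:
            return json.load(fh)
    return {}

def run_workflow(tasks):
    """tasks: name -> (input_paths, callable). Skip tasks whose logged fingerprint matches."""
    log = load_log()
    for name, (inputs, action) in tasks.items():
        fp = fingerprint(inputs)
        if log.get(name) == fp:
            print(f"skip {name}: finished earlier and inputs unchanged")
            continue
        action()                 # rerun only tasks affected by changes
        log[name] = fp
    with open(LOG_FILE, "w") as fh:
        json.dump(log, fh)

run_workflow({
    "align": (["reads.fastq"], lambda: print("running align")),
    "count": (["aligned.bam"], lambda: print("running count")),
})
```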


2016
Vol 15 (1)
pp. 19-27
Author(s):
Sucha SMANCHAT
Kanchana VIRIYAPANT

Scientific workflows have been employed to automate large-scale scientific experiments by leveraging computational power provided on demand by cloud computing platforms. Among these workflows, a parallel loop workflow is used to study the effects of different input values on a scientific experiment. Because its loop iterations are independent, a parallel loop workflow can be dynamically executed as parallel workflow instances to accelerate execution. Such dynamic execution invalidates the workflow traversal that existing works use to estimate execution time and cost during scheduling in order to maintain time and cost constraints. In this paper, we propose a novel scheduling technique that handles dynamic parallel loop workflow execution through a new method for evaluating execution progress, together with a workflow instance arrival control and a cloud resource adjustment mechanism. The proposed technique, which aims at meeting a workflow deadline while reducing cost, is tested using three existing task scheduling heuristics as its task mapping strategies. The simulation results show that the proposed technique is practical and performs better when the time constraint is more relaxed. It also favors task scheduling heuristics that allow for a more accurate progress evaluation.
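A hedged sketch of the arrival-control idea: admit a new parallel-loop instance only while the completion time projected from measured progress still meets the deadline. The progress model, per-instance task count, and numbers below are invented for illustration and are not the paper's formulation.

```python
PER_INSTANCE_TASKS = 50     # hypothetical number of tasks added by each new loop instance

def projected_finish(elapsed, completed, pending):
    """Extrapolate the completion time from the measured rate of progress."""
    if completed == 0:
        return float("inf")
    rate = completed / elapsed                  # tasks finished per time unit so far
    return elapsed + pending / rate

def admit_new_instance(elapsed, completed, pending, deadline):
    # Arrival control: accept another instance only if, at the current rate,
    # all work including the new instance's tasks still meets the deadline.
    return projected_finish(elapsed, completed, pending + PER_INSTANCE_TASKS) <= deadline

# Example: 120 tasks done after 60 time units, 200 tasks still pending, deadline at t = 300.
print(admit_new_instance(elapsed=60, completed=120, pending=200, deadline=300))  # True
```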


2017
Author(s):
Sergei Yakneen
Sebastian M. Waszak
Michael Gertz
Jan O. Korbel

We present Butler, a computational framework developed in the context of the international Pan-cancer Analysis of Whole Genomes (PCAWG) project [1] to overcome the challenges of orchestrating analyses of thousands of human genomes on the cloud. Butler operates equally well on public and academic clouds. This highly flexible framework facilitates management of virtual cloud infrastructure, software configuration, and genomics workflow development, and it provides unique capabilities in workflow execution management. By comprehensively collecting and analysing metrics and logs, and by performing anomaly detection, notification, and cluster self-healing, Butler enables large-scale analytical processing of human genomes with 43% higher throughput than prior setups. Butler was key to delivering the germline genetic variant call-sets in the 2,834 cancer genomes analysed by PCAWG [1].
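The monitoring loop described above (collect metrics, detect anomalies, self-heal) can be caricatured in a few lines; the metric, threshold, and healing action are hypothetical, and Butler's real implementation relies on dedicated monitoring, log-aggregation, and configuration-management tooling.

```python
import statistics

def detect_anomalies(metrics, z_threshold=1.5):
    """Flag hosts whose metric deviates strongly from the fleet average."""
    values = list(metrics.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0
    return [host for host, v in metrics.items() if abs(v - mean) / stdev > z_threshold]

def self_heal(host):
    # Placeholder healing action, e.g. restarting a stuck worker service.
    print(f"restarting worker on {host}")

cpu_iowait = {"node-1": 3.0, "node-2": 2.5, "node-3": 41.0, "node-4": 3.2}  # % iowait per node
for host in detect_anomalies(cpu_iowait):
    self_heal(host)   # flags and "heals" node-3
```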


2014
Vol 989-994
pp. 4771-4774
Author(s):
Tao Wu

Efficient business workflow management in large-scale settings is in great demand, yet current business workflow management systems lack support for distributed workflow execution. In this paper, we design and implement a distributed framework called PeerODE for Apache ODE (Orchestration Director Engine) [1], an open-source business workflow engine. PeerODE presents a scalable approach to P2P business process execution. A scheduling experiment on PeerODE shows that the framework handles distributed business process execution effectively.
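As a rough illustration of dispatching process activities across peer engines rather than through a central orchestrator, the following sketch assigns each activity to the least-loaded peer; the peer names and load model are hypothetical and unrelated to PeerODE's actual scheduling.

```python
peers = {"peer-1": 0, "peer-2": 0, "peer-3": 0}   # peer -> number of activities assigned so far

def dispatch(activity):
    target = min(peers, key=peers.get)            # pick the least-loaded peer engine
    peers[target] += 1
    print(f"activity '{activity}' executed on {target}")

for activity in ["receive-order", "check-stock", "bill-customer", "ship-order"]:
    dispatch(activity)
```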


