Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud

Scientific workflows are a means of conducting in silico experiments in modern computing infrastructures for e-Science, often built on top of Grids. Monitoring of Grid scientific workflows is essential not only for performance analysis but also for collecting provenance data and gathering feedback useful for future decisions, e.g., related to optimizing resource usage. In this paper, basic problems related to monitoring Grid scientific workflows are discussed. Because they are highly distributed, loosely coupled in space and time, heterogeneous, and heavily reliant on legacy code, workflows are exceptionally challenging to monitor. We propose a Grid monitoring architecture for scientific workflows. The monitoring data correlation problem is described, and an algorithm for on-line distributed collection of monitoring data is proposed. We demonstrate a prototype implementation of the proposed workflow monitoring architecture, the GEMINI monitoring system, and its use for monitoring a real-life scientific workflow.
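The following minimal Python sketch makes the monitoring data correlation problem concrete. It is not GEMINI's actual design.

It assumes the following:

* Each monitoring event carries hypothetical `workflow_id` and `activity_id` fields.
* Each site's collector emits a stream of events that is already ordered by time.

Merging those streams and grouping the events by identifier reconstructs a per-activity trace. A real system must also cope with clock skew between sites, which this sketch ignores.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(order=True)
class MonitoringEvent:
    # Events compare by timestamp only; the identifiers are excluded from ordering.
    timestamp: float
    workflow_id: str = field(compare=False)   # hypothetical workflow-run identifier
    activity_id: str = field(compare=False)   # hypothetical activity identifier
    site: str = field(compare=False)          # site whose collector emitted the event
    kind: str = field(compare=False)          # e.g. "started", "finished"


def correlate(streams):
    """Merge per-site, time-sorted event streams and group events by (workflow, activity)."""
    traces = defaultdict(list)
    # heapq.merge lazily merges already-sorted iterables into one sorted sequence.
    for event in heapq.merge(*streams):
        traces[(event.workflow_id, event.activity_id)].append(event)
    return dict(traces)
```

Each trace ends up time-ordered across sites. This is what allows, for example, an activity's start and end events to be paired even when different collectors observed them.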


A Hybrid Algorithm for Multi-Objective Scientific Workflow Scheduling in IaaS Cloud

IEEE Access, Vol 7, pp. 125783-125795, 2019
DOI: 10.1109/access.2019.2939294
Cited By ~ 1

Author(s): Yongqiang Gao, Shuyun Zhang, Jiantao Zhou

Keyword(s): Hybrid Algorithm, Scientific Workflow, Workflow Scheduling, Multi Objective


A Survey of Modern Scientific Workflow Scheduling Algorithms and Systems in the Era of Big Data

2020 IEEE International Conference on Services Computing (SCC), 2020
DOI: 10.1109/scc49832.2020.00026

Author(s): Junwen Liu, Shiyong Lu, Dunren Che

Keyword(s): Big Data, Scheduling Algorithms, Scientific Workflow, Workflow Scheduling


Efficient scientific workflow scheduling for deadline-constrained parallel tasks in cloud computing environments

Information Sciences, Vol 531, pp. 31-46, 2020
DOI: 10.1016/j.ins.2020.04.039
Cited By ~ 1

Author(s): Longxin Zhang, Liqian Zhou, Ahmad Salah

Keyword(s): Cloud Computing, Scientific Workflow, Workflow Scheduling, Parallel Tasks, Computing Environments


A Novel Approach to Location-Aware Scheduling of Workflows Over Edge Computing Resources

International Journal of Web Services Research, Vol 17 (3), pp. 56-68, 2020
DOI: 10.4018/ijwsr.2020070104

Author(s): Yin Li, Yuyin Ma, Ziyang Zeng

Keyword(s): Real World, Service Level Agreement, Service Level, Scientific Workflow, Edge Computing, Workflow Scheduling, Location Aware, Novel Approach, Negative Impacts, Multiple Metrics

Edge computing is pushing the frontier of computing applications, data, and services away from centralized nodes to the logical extremes of a network. A major technological challenge for workflow scheduling in the edge computing environment is reducing cost under service-level-agreement (SLA) constraints on performance and quality-of-service requirements. Real-world workflow applications are constantly subject to negative impacts, e.g., network congestion, unexpectedly long message delays, and the shrinking coverage range of edge servers due to battery depletion. To address this concern, we propose a novel approach to location-aware and proximity-constrained multi-workflow scheduling with edge computing resources. The proposed approach minimizes monetary cost while meeting user-required workflow completion deadlines. It employs an evolutionary algorithm (the discrete firefly algorithm) to generate near-optimal scheduling decisions. For validation, we show that the proposed approach outperforms traditional peers in terms of multiple metrics. The evaluation uses a real-world dataset of edge resource locations and multiple well-known scientific workflow templates.
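The scheduling step can be illustrated with a simplified discrete firefly algorithm. This is a hypothetical sketch, not the authors' implementation.

It makes these simplifying assumptions:

* Tasks are independent, with no precedence or data-transfer edges.
* Each edge server is modeled by a speed and a price per time unit.
* Location/proximity constraints are a per-task list of allowed servers.
* The deadline is enforced through a large penalty term.

A firefly moves toward a brighter (cheaper) one by copying each of its assignments with probability beta0 * exp(-gamma * r^2), where r is the Hamming distance between the two assignments. With small probability alpha it instead picks a random allowed server. All parameter values are assumptions.

```python
import math
import random


def evaluate(assign, lengths, speeds, prices, deadline, penalty=1e6):
    """Monetary cost of a task->server assignment, plus a penalty if the makespan misses the deadline."""
    busy = [0.0] * len(speeds)
    cost = 0.0
    for task, server in enumerate(assign):
        runtime = lengths[task] / speeds[server]
        busy[server] += runtime              # tasks on the same server run one after another
        cost += runtime * prices[server]     # pay per unit of busy time
    return cost + (penalty if max(busy) > deadline else 0.0)


def discrete_firefly(lengths, speeds, prices, allowed, deadline,
                     n_fireflies=20, iters=100, beta0=1.0, gamma=0.1, alpha=0.1, seed=0):
    """Search for a cheap, deadline-feasible assignment; allowed[t] lists the servers near task t."""
    rng = random.Random(seed)
    n = len(lengths)
    swarm = [[rng.choice(allowed[t]) for t in range(n)] for _ in range(n_fireflies)]
    fit = [evaluate(f, lengths, speeds, prices, deadline) for f in swarm]
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] < fit[i]:  # firefly j is brighter (lower penalized cost)
                    r = sum(a != b for a, b in zip(swarm[i], swarm[j]))  # Hamming distance
                    beta = beta0 * math.exp(-gamma * r * r)              # attractiveness
                    moved = []
                    for t in range(n):
                        gene = swarm[j][t] if rng.random() < beta else swarm[i][t]
                        if rng.random() < alpha:
                            gene = rng.choice(allowed[t])  # random step within the proximity set
                        moved.append(gene)
                    swarm[i] = moved
                    fit[i] = evaluate(moved, lengths, speeds, prices, deadline)
    best = min(range(n_fireflies), key=fit.__getitem__)
    return swarm[best], fit[best]
```

Only fireflies dimmer than some other firefly move, so the best penalized cost in the swarm never gets worse. Precedence constraints, data transfers, and multiple concurrent workflows would all enter through a richer `evaluate`.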


Makespan–Cost–Reliability-Optimized Workflow Scheduling Using Evolutionary Techniques in Clouds

Journal of Circuits, Systems and Computers, Vol 29 (10), pp. 2050167, 2019
DOI: 10.1142/s0218126620501674

Author(s): Xiumin Zhou, Gongxuan Zhang, Tian Wang, Mingyue Zhang, Xiji Wang, ...

Keyword(s): Quality Of Service, Energy Consumption, System Reliability, Scientific Workflow, Workflow Scheduling, Workflow Systems, Multi Objective, QoS Parameters, Crossover And Mutation

Most popular scientific workflow systems can now support the deployment of tasks to the cloud. Executing workflows on the cloud has become a multi-objective scheduling problem, since users' needs span many aspects. Cost and makespan are considered the two most important objectives. In addition, there are other Quality-of-Service (QoS) parameters, including system reliability and energy consumption. Here, we focus on three objectives: cost, makespan, and system reliability. We propose a Multi-objective Evolutionary Algorithm on the Cloud (MEAC). For it, we design several novel schemes, including a problem-specific encoding and evolutionary operations such as crossover and mutation. Simulations on real-world and random workflows show that MEAC achieves, on average, about 5% higher hypervolume than several other workflow scheduling algorithms.
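The following sketch shows two building blocks that a multi-objective evolutionary algorithm relies on:

* evaluating the three objectives of a schedule;
* comparing schedules by Pareto dominance.

The cost and makespan model and the exponential reliability model are common simplifying assumptions, not necessarily MEAC's. Under that reliability model, a VM with failure rate λ survives a busy time t with probability exp(-λt). Reliability is negated so that all three objectives are minimized. Hypervolume computation and MEAC's encoding and operators are not shown.

```python
import math


def objectives(assign, lengths, speeds, prices, failure_rates):
    """Return (cost, makespan, -reliability) for a task->VM assignment; all three are minimized."""
    busy = [0.0] * len(speeds)
    cost = 0.0
    for task, vm in enumerate(assign):
        runtime = lengths[task] / speeds[vm]
        busy[vm] += runtime
        cost += runtime * prices[vm]
    # Assumed exponential failure model: P(VM survives busy time t) = exp(-rate * t),
    # and the system succeeds only if every VM survives (independent failures).
    reliability = math.exp(-sum(rate * t for rate, t in zip(failure_rates, busy)))
    return cost, max(busy), -reliability


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the non-dominated objective vectors (order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

An evolutionary loop would apply `objectives` to every individual and favor those on the current `pareto_front` during selection. Hypervolume then scores how much of objective space that front covers.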


The MODE Algorithm for Solving the Workflow Scheduling Problem

Research and Development on Information and Communication Technology, pp. 51, 2017
DOI: 10.32913/rd-ict.vol1.no37.254

Author(s): Phan Thanh Toàn

Keyword(s): Information And Communication Technology, Communication Technology, Large Scale, Scientific Workflow, Workflow Scheduling, Challenging Problem, On Demand, Local Extrema, Information And Communication, And Storage

Cloud computing is a new trend in information and communication technology that enables resource distribution and sharing at a large scale. The Cloud consists of a collection of virtual machines that promise to provision on-demand computational and storage resources when needed. End-users can access these resources via the Internet and pay only for their usage. Scheduling scientific workflow applications on the Cloud is a challenging problem that has been the focus of many researchers for many years. In this work, we propose a novel workflow scheduling algorithm derived from the Opposition-based Differential Evolution method. This algorithm not only ensures fast convergence but also avoids getting trapped in local extrema. Our CloudSim-based simulations show that our algorithm is superior to its predecessors. Moreover, the deviation of its solution from the optimal one is negligible.
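Opposition-based Differential Evolution can be sketched as follows. This is a generic ODE-style minimizer, not the paper's algorithm: it uses opposition-based initialization followed by classic DE/rand/1/bin, and it omits the generation-jumping step of full ODE.

The opposite of a point x in [lo, hi] is lo + hi - x. Seeding the population with the better half of the random points and their opposites is the mechanism credited with faster convergence. The hypothetical `decode` helper shows one possible way to map continuous genes to VM indices for workflow scheduling.

```python
import random


def obde(f, dim, lo, hi, pop_size=20, gens=200, F=0.5, CR=0.9, seed=0):
    """Minimize f over [lo, hi]^dim: opposition-based initialization, then DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    # Opposition-based learning: evaluate each random point's opposite lo + hi - x too,
    # and keep the pop_size fittest of the combined set.
    opposite = [[lo + hi - x for x in ind] for ind in pop]
    pop = sorted(pop + opposite, key=f)[:pop_size]
    fit = [f(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # DE/rand/1 mutation
                    v = min(max(v, lo), hi)                      # clip back into bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def decode(genes, n_vms):
    """Hypothetical encoding: gene g in [0, n_vms] assigns its task to VM min(int(g), n_vms - 1)."""
    return [min(int(g), n_vms - 1) for g in genes]
```

For scheduling, `f` would decode the genes and return, e.g., the makespan or cost of the resulting assignment, with `lo=0` and `hi=n_vms`.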


The PBase Scientific Workflow Provenance Repository

International Journal of Digital Curation, Vol 9 (2), pp. 28-38, 2014
DOI: 10.2218/ijdc.v9i2.332
Cited By ~ 16

Author(s): Víctor Cuevas-Vicenttín, Parisa Kianmajd, Bertram Ludäscher, Paolo Missier, Fernando Chirigati, ...

Keyword(s): Scientific Workflow, Scientific Workflows, Data Reuse, Data Intensive, Research Collaborations, Provenance Data, Scientific Experiments, History Of, Scientific Results, User Friendly

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. The advantages scientific workflows offer include rapid and easy workflow design, software and data reuse, scalable execution, and sharing and collaboration. Together, these facilitate "reproducible science". In this context, provenance (information about the origin, context, derivation, ownership, or history of some artifact) plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, performing such analyses on scientific results as part of extended research collaborations requires an adequate environment and tools. Concretely, the need arises for a repository that facilitates sharing scientific workflows and their associated execution traces in an interoperable manner, while also enabling querying and visualization. Furthermore, such functionality should be supported with performance and scalability in mind. With this purpose, we introduce PBase: a scientific workflow provenance repository implementing the ProvONE proposed standard, which extends the emerging W3C PROV standard for provenance data with workflow-specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user-friendly interface tailored for visualizing scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.
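A lineage query asks which steps and inputs an artifact was derived from. PBase answers this kind of provenance question declaratively over Neo4j, and it amounts to walking the provenance graph upstream from the artifact.

The sketch below is an in-memory stand-in, not PBase's schema or API. It uses only the W3C PROV relations `used` and `wasGeneratedBy` and assumes entity and activity names are distinct. The workflow steps and file names are hypothetical.

```python
from collections import deque

# Hypothetical provenance trace expressed with two W3C PROV relations.
# used: activity -> entities it read
used = {
    "align": ["reads.fq", "ref.fa"],
    "call_variants": ["aligned.bam", "ref.fa"],
}
# wasGeneratedBy: entity -> activity that produced it
was_generated_by = {
    "aligned.bam": "align",
    "variants.vcf": "call_variants",
}


def lineage(node):
    """Return every entity and activity upstream of `node`, found breadth-first."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        # An entity's parent is the activity that generated it; an activity's parents are its inputs.
        if current in was_generated_by:
            parents = [was_generated_by[current]]
        else:
            parents = used.get(current, [])
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

In a graph database the same traversal would typically be a single variable-length path pattern in the query language rather than an explicit loop. That declarative style of querying is what the abstract highlights.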


BlockFlow: Trust in Scientific Provenance Data

2019
DOI: 10.5753/bresci.2019.10033

Author(s): Raiane Coelho, Regina Braga, José Maria David, Fernanda Campos, Victor Ströele

Keyword(s): Data Sharing, Scientific Collaboration, Scientific Workflow, Collaborative Environments, Scientific Experimentation, Provenance Data, Scientific Experiments

In scientific collaboration, data sharing and the exchange of ideas and results are crucial to promoting knowledge and accelerating the development of science. Trust is extremely important in this context, as is reproducibility. Provenance has been the basis for reproducibility in scientific workflows, but collaborative environments also need to ensure the integrity and trustworthiness of this provenance data. One of the technologies that has emerged and can help address these issues is blockchain. A blockchain-based provenance system for collaborative scientific experiments could lead to a trustworthy environment for scientific experimentation. In this vein, this paper presents the specification of an architecture, named BlockFlow, that provides trust for distributed provenance data.
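Blockchain-based provenance builds on an integrity guarantee that a hash chain can illustrate. This is a minimal sketch, not BlockFlow's architecture.

Each hypothetical provenance record stores its payload, the previous record's hash, and a SHA-256 hash over both. Altering any earlier record therefore breaks verification. A real blockchain additionally replicates the ledger across participants and uses a consensus protocol, both omitted here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(payload, prev_hash):
    """SHA-256 over a canonical JSON encoding of the payload and the previous hash."""
    data = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a provenance record linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": record_hash(payload, prev)})


def verify(chain):
    """Recompute every link; any edited payload or broken link makes verification fail."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != record_hash(record["payload"], prev):
            return False
        prev = record["hash"]
    return True
```

Because each hash covers the previous one, a collaborator who rewrites an old provenance record would also have to rewrite every later record. In a replicated ledger, the other participants' copies would expose that rewrite.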
