Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud

Author(s):  
Ji Liu ◽  
Esther Pacitti ◽  
Patrick Valduriez ◽  
Marta Mattoso
2008 ◽  
Vol 16 (2-3) ◽  
pp. 205-216
Author(s):  
Bartosz Balis ◽  
Marian Bubak ◽  
Bartłomiej Łabno

Scientific workflows are a means of conducting in silico experiments in modern computing infrastructures for e-Science, often built on top of Grids. Monitoring of Grid scientific workflows is essential not only for performance analysis but also to collect provenance data and gather feedback useful in future decisions, e.g., related to optimization of resource usage. In this paper, basic problems related to monitoring of Grid scientific workflows are discussed. Being highly distributed, loosely coupled in space and time, heterogeneous, and heavily reliant on legacy codes, workflows are exceptionally challenging from the monitoring point of view. We propose a Grid monitoring architecture for scientific workflows. The monitoring data correlation problem is described, and an algorithm for on-line distributed collection of monitoring data is proposed. We demonstrate a prototype implementation of the proposed workflow monitoring architecture, the GEMINI monitoring system, and its use for monitoring a real-life scientific workflow.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 125783-125795 ◽  
Author(s):  
Yongqiang Gao ◽  
Shuyun Zhang ◽  
Jiantao Zhou

2020 ◽  
Vol 17 (3) ◽  
pp. 56-68
Author(s):  
Yin Li ◽  
Yuyin Ma ◽  
Ziyang Zeng

Edge computing is pushing the frontier of computing applications, data, and services away from centralized nodes to the logical extremes of a network. A major technological challenge for workflow scheduling in the edge computing environment is cost reduction under service-level-agreement (SLA) constraints in terms of performance and quality-of-service requirements, because real-world workflow applications are constantly subject to negative impacts (e.g., network congestion, unexpectedly long message delays, and the shrinking coverage range of edge servers due to battery depletion). To address this concern, we propose a novel approach to location-aware and proximity-constrained multi-workflow scheduling with edge computing resources. The proposed approach is capable of minimizing monetary costs while meeting user-required workflow completion deadlines. It employs an evolutionary algorithm (i.e., the discrete firefly algorithm) to generate near-optimal scheduling decisions. For validation, we show that the proposed approach outperforms traditional peers in terms of multiple metrics, based on a real-world dataset of edge resource locations and multiple well-known scientific workflow templates.


2019 ◽  
Vol 29 (10) ◽  
pp. 2050167
Author(s):  
Xiumin Zhou ◽  
Gongxuan Zhang ◽  
Tian Wang ◽  
Mingyue Zhang ◽  
Xiji Wang ◽  
...  

Most popular scientific workflow systems can now support the deployment of tasks to the cloud. Executing workflows on the cloud has become a multi-objective scheduling problem, because users' needs must be met in many respects. Cost and makespan are considered the two most important objectives. In addition, there are other Quality-of-Service (QoS) parameters, including system reliability and energy consumption. Here, we focus on three objectives: cost, makespan, and system reliability. In this paper, we propose a Multi-objective Evolutionary Algorithm on the Cloud (MEAC). In the algorithm, we design some novel schemes, including problem-specific encoding and evolutionary operations such as crossover and mutation. Simulations on real-world and random workflows are conducted, and the results show that MEAC achieves, on average, about 5% higher hypervolume than some other workflow scheduling algorithms.
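
The hypervolume indicator mentioned in this abstract can be illustrated with a minimal sketch. It assumes only two minimized objectives (cost and makespan) rather than the three the paper optimizes, and the names `dominates`, `pareto_front`, and `hypervolume_2d` are hypothetical helpers, not part of MEAC.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, makespan), both to be minimized (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    front = sorted(
        set(p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1])
    )
    area = 0.0
    prev_makespan = ref[1]
    # Sorted by ascending cost, so makespan strictly decreases along the front.
    for cost, makespan in front:
        area += (ref[0] - cost) * (prev_makespan - makespan)
        prev_makespan = makespan
    return area
```

For instance, `hypervolume_2d([(1, 3), (2, 2), (3, 1), (3, 3)], (4, 4))` evaluates to `6.0`: the point `(3, 3)` is dominated by `(2, 2)` and contributes nothing. A larger value means the front covers more of the objective space below the reference point.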


Author(s):  
Phan Thanh Toàn

Cloud computing is a new trend in information and communication technology that enables resource distribution and sharing at a large scale. The Cloud consists of a collection of virtual machines that provision computational and storage resources on demand. End-users can access these resources via the Internet and pay only for their usage. Scheduling scientific workflow applications on the Cloud is a challenging problem that has been the focus of many researchers for many years. In this work, we propose a novel algorithm for workflow scheduling that is derived from the Opposition-based Differential Evolution method. This algorithm not only ensures fast convergence but also avoids getting trapped in local extrema. Our CloudSim-based simulations show that our algorithm is superior to its predecessors. Moreover, the deviation of its solution from the optimal one is negligible.
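
As a rough illustration of the idea behind opposition-based differential evolution, the sketch below combines an opposition-based initialization (each random candidate x in [low, high] is paired with its opposite low + high - x, and the fitter half is kept) with one generation of a standard DE/rand/1/bin update. It works on continuous vectors with a generic fitness function; the paper's workflow-to-VM encoding is not given in the abstract, so all function names and parameters here are assumptions for illustration only.

```python
import random
from typing import Callable, List

Vector = List[float]


def opposite(x: Vector, low: float, high: float) -> Vector:
    """Opposite point of x inside the box [low, high]^n."""
    return [low + high - xi for xi in x]


def obl_init(fitness: Callable[[Vector], float], dim: int, pop_size: int,
             low: float, high: float, rng: random.Random) -> List[Vector]:
    """Random population plus its opposites; keep the pop_size fittest (minimization)."""
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    candidates = pop + [opposite(x, low, high) for x in pop]
    return sorted(candidates, key=fitness)[:pop_size]


def de_step(pop: List[Vector], fitness: Callable[[Vector], float],
            low: float, high: float, rng: random.Random,
            f: float = 0.5, cr: float = 0.9) -> List[Vector]:
    """One DE/rand/1/bin generation with greedy selection (needs pop_size >= 4)."""
    dim = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        others = [p for j, p in enumerate(pop) if j != i]
        a, b, c = rng.sample(others, 3)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
        trial = []
        for j in range(dim):
            if rng.random() < cr or j == j_rand:
                value = a[j] + f * (b[j] - c[j])
                trial.append(min(max(value, low), high))  # clamp to bounds
            else:
                trial.append(target[j])
        # Keep the trial only if it is at least as good as the target.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop
```

Because selection is greedy, the best fitness in the population never gets worse from one generation to the next. Applying this to workflow scheduling would additionally require mapping each vector to a task-to-resource assignment, which this sketch leaves out.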


2014 ◽  
Vol 9 (2) ◽  
pp. 28-38 ◽  
Author(s):  
Víctor Cuevas-Vicenttín ◽  
Parisa Kianmajd ◽  
Bertram Ludäscher ◽  
Paolo Missier ◽  
Fernando Chirigati ◽  
...  

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. The advantages scientific workflows offer include rapid and easy workflow design, software and data reuse, scalable execution, sharing and collaboration, and other advantages that altogether facilitate “reproducible science”. In this context, provenance – information about the origin, context, derivation, ownership, or history of some artifact – plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, in order to perform such analyses on scientific results as part of extended research collaborations, an adequate environment and tools are required. Concretely, the need arises for a repository that will facilitate the sharing of scientific workflows and their associated execution traces in an interoperable manner, also enabling querying and visualization. Furthermore, such functionality should be supported while taking performance and scalability into account. With this purpose in mind, we introduce PBase: a scientific workflow provenance repository implementing the ProvONE proposed standard, which extends the emerging W3C PROV standard for provenance data with workflow-specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user-friendly interface tailored for the visualization of scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.
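
One typical provenance query is lineage: which activities and data items did a given result depend on? The sketch below answers it with a plain in-memory traversal over two W3C PROV relations, `wasGeneratedBy` (entity to activity) and `used` (activity to entities). It is only an illustration of the kind of query a provenance repository supports; it does not use Neo4j or PBase's actual schema, and the function name and dictionary layout are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def lineage(node: str,
            was_generated_by: Dict[str, str],
            used: Dict[str, List[str]]) -> Set[str]:
    """Return every activity and entity that `node` transitively depends on."""
    seen: Set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        parents: List[str] = []
        if current in was_generated_by:      # entity -> generating activity
            parents.append(was_generated_by[current])
        parents.extend(used.get(current, []))  # activity -> input entities
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

With `was_generated_by = {"plot.png": "render", "table.csv": "aggregate"}` and `used = {"render": ["table.csv"], "aggregate": ["raw.dat"]}`, calling `lineage("plot.png", was_generated_by, used)` returns the set containing `render`, `table.csv`, `aggregate`, and `raw.dat`. In a graph database, the same question would usually be expressed as a variable-length path query instead of a hand-written traversal.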


2019 ◽  
Author(s):  
Raiane Coelho ◽  
Regina Braga ◽  
José Maria David ◽  
Fernanda Campos ◽  
Victor Ströele

In scientific collaboration, data sharing and the exchange of ideas and results are crucial to promoting knowledge and accelerating the development of science. Trust is extremely important in this context, as is reproducibility. Although provenance has been the basis for reproducibility in scientific workflows, in collaborative environments it is also necessary to ensure the integrity and trustworthiness of this provenance data. One of the technologies that has emerged and can help to address these issues is blockchain. A blockchain-based provenance system for collaborative scientific experiments could lead to a trustworthy environment for scientific experimentation. In this vein, this paper presents the specification of an architecture, named BlockFlow, that provides trust for distributed provenance data.
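
The integrity property that a blockchain contributes can be sketched with a simple hash chain: each provenance record stores the hash of the previous one, so altering any earlier record invalidates everything after it. This is a minimal single-process illustration that assumes SHA-256 and JSON-serializable records; it omits consensus, distribution, and everything specific to the BlockFlow architecture, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(body: Dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: List[Dict], data: Dict) -> None:
    """Append a provenance record linked to the hash of the previous one."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    chain.append({**body, "hash": record_hash(body)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every hash and check that each link points to its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        body = {"index": block["index"], "data": block["data"],
                "prev_hash": block["prev_hash"]}
        if (block["index"] != i or block["prev_hash"] != prev_hash
                or block["hash"] != record_hash(body)):
            return False
        prev_hash = block["hash"]
    return True
```

After appending a few records, `verify(chain)` returns `True`; changing the `data` field of any stored record makes it return `False`, which is the tamper-evidence that collaborators rely on when they exchange provenance.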

