An Approach to Configuration Management of Scientific Workflows

2017 ◽  
Vol 9 (2) ◽  
pp. 20-46 ◽  
Author(s):  
Tassio Ferenzini Martins Sirqueira ◽  
Regina Braga ◽  
Marco Antônio P. Araújo ◽  
José Maria N. David ◽  
Fernanda Campos ◽  
...  

A scientific software ecosystem aims to integrate all stages of an experiment and its related workflows in order to solve complex problems. To ensure the proper execution of an experiment, any modification that occurs must be propagated to the associated workflows, which must be maintained and evolved for the successful conduction of the research. One way to achieve this control is through configuration management based on data provenance. In this work, the authors use data provenance concepts and models, together with ontologies, to provide an architecture for storing and querying scientific experiment information. Based on this architecture, a proof of concept was conducted using workflows extracted from the myExperiment repository, and the results are presented throughout the paper.
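The storage-and-query idea behind provenance-based configuration management can be illustrated with a toy triple store. This is a minimal sketch, not the authors' architecture: the `ProvenanceStore` class and its `record`/`query` methods are hypothetical names, and the PROV-style predicates are only borrowed for flavor.

```python
# Toy provenance store: record PROV-style triples about a workflow run,
# then query them to trace which artifacts a run produced.
# All names here are illustrative, not the authors' API.

class ProvenanceStore:
    def __init__(self):
        self.triples = set()

    def record(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Match triples against a partial pattern (None = wildcard)."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]

store = ProvenanceStore()
store.record("run-42", "prov:used", "genome.fasta")
store.record("run-42", "prov:wasAssociatedWith", "workflow-v2")
store.record("report.txt", "prov:wasGeneratedBy", "run-42")

# Which artifacts were generated by run-42?
outputs = [s for s, p, o in store.query(predicate="prov:wasGeneratedBy",
                                        obj="run-42")]
```

A real system would back this with an ontology-aware store (e.g. an RDF database) so that queries can exploit class hierarchies, but the query-by-pattern shape is the same.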

Author(s):  
Lenita M. Ambrósio ◽  
José Maria N. David ◽  
Regina Braga ◽  
Fernanda Campos ◽  
Victor Ströele ◽  
...  


Author(s):  
Vincent Breton ◽  
Eddy Caron ◽  
Frederic Desprez ◽  
Gael Le Mahec

As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics is starting to be ported to large-scale platforms. The BLAST kernel, one of the main cornerstones of high-performance genomics, was one of the first applications ported to such platforms. However, while a simple parallelization was enough for the first proof of concept, its use on production platforms required more optimized algorithms. In this chapter, we review existing parallelization and “gridification” approaches, as well as related issues such as data management and replication, and present a case study using the DIET middleware over the Grid’5000 experimental platform.
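The simple parallelization mentioned above is data-parallel: the query set is split into chunks, each chunk is searched independently on a separate node, and the results are merged. A minimal sketch of the splitting step follows; `parse_fasta` and `chunk` are illustrative helpers, not DIET middleware or BLAST APIs.

```python
# Sketch of the data-parallel idea behind grid BLAST:
# split the FASTA query set into chunks, ship each chunk to a worker.

def parse_fasta(text):
    """Split FASTA text into (header, sequence) records."""
    records, header, seq = [], None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def chunk(records, n):
    """Partition records into n roughly equal chunks for parallel workers."""
    return [records[i::n] for i in range(n)]

fasta = """>q1
ACGT
>q2
GGCC
>q3
TTAA
"""
chunks = chunk(parse_fasta(fasta), 2)
# Each chunk would be submitted to a separate grid node running BLAST;
# production systems also replicate and partition the reference database.
```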


Author(s):  
Khalid Belhajjame ◽  
Paolo Missier ◽  
Carole Goble

Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data provenance in scientific workflows using illustrative examples taken from real-world workflows. The characterisation takes the form of a taxonomy that is used for comparing and analysing provenance capabilities supplied by existing scientific workflow systems.
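One common capability in the systems such a taxonomy compares is step-level lineage capture: for each workflow step, record what it consumed and what it produced, so any result can be traced back to its sources. The sketch below is a toy illustration of that idea, not any particular workflow system; `run_workflow` and its record format are hypothetical.

```python
# Toy step-level provenance capture for a linear workflow:
# each step's inputs and outputs are logged alongside execution.

def run_workflow(steps, inputs):
    """steps: list of (name, function); each function maps a dict to a dict
    of newly produced values. Returns final data plus a provenance trace."""
    provenance = []
    data = dict(inputs)
    for name, fn in steps:
        consumed = sorted(data)          # keys visible to this step
        produced = fn(data)
        provenance.append({"step": name,
                           "inputs": consumed,
                           "outputs": sorted(produced)})
        data.update(produced)
    return data, provenance

steps = [
    ("normalize", lambda d: {"norm": [x / max(d["raw"]) for x in d["raw"]]}),
    ("summarize", lambda d: {"mean": sum(d["norm"]) / len(d["norm"])}),
]
result, trace = run_workflow(steps, {"raw": [2.0, 4.0]})
# trace now records that "mean" was produced by "summarize" from "norm",
# which was itself produced by "normalize" from "raw".
```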


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Leonardo Ramos ◽  
Fabio Porto ◽  
Daniel De Oliveira

Scientific research based on computer simulations is complex, since it may involve managing the enormous volumes of data and metadata produced during the life cycle of a scientific experiment, from the formulation of hypotheses to their final evaluation. This wealth of data needs to be structured and managed in a way that makes sense to scientists, so that relevant knowledge can be extracted to contribute to the scientific research process. In addition, a scientific project as a whole may be associated with several different scientific experiments, which in turn may require executions of different scientific workflows, making the task rather arduous. All of this can become even more difficult if we consider that project tasks must be associated with the execution of such simulations (which may take hours or even days), that the hypotheses of a phenomenon need validation and replication, and that the project team may be geographically dispersed. This article presents an approach called PhenoManager that aims at helping scientists manage their scientific projects and the cycle of the scientific method as a whole. PhenoManager can assist the scientist in structuring, validating, and reproducing hypotheses of a phenomenon through configurable computational models. The approach was evaluated using SciPhy, a scientific workflow in the field of bioinformatics; the evaluation concluded that the proposed approach brings gains without considerable performance losses.
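The core idea of tying a hypothesis to the workflow executions that validate it can be sketched in a few lines. This is a rough illustration under assumed names (`Hypothesis`, `add_execution`, `validated`), not PhenoManager's actual model.

```python
# Toy model linking a scientific hypothesis to workflow executions,
# so it can be checked for support and replicated. Illustrative only.

class Hypothesis:
    def __init__(self, statement):
        self.statement = statement
        self.executions = []   # (workflow_run, observed_result) pairs

    def add_execution(self, workflow_run, result):
        self.executions.append((workflow_run, result))

    def validated(self, expected):
        """Supported only if at least one execution exists and every
        recorded execution reproduced the expected result."""
        return bool(self.executions) and all(
            r == expected for _, r in self.executions)

h = Hypothesis("Model X predicts phenotype Y")
h.add_execution("SciPhy-run-1", "Y")
h.add_execution("SciPhy-run-2", "Y")   # replication run
ok = h.validated("Y")
```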


2010 ◽  
Vol 7 (2) ◽  
pp. 65-86 ◽  
Author(s):  
Anton Michlmayr ◽  
Florian Rosenberg ◽  
Philipp Leitner ◽  
Schahram Dustdar

In general, provenance describes the origin and well-documented history of a given object. This notion has been applied in information systems, mainly to provide data provenance of scientific workflows. Similarly, provenance in Service-oriented Computing has also focused on data provenance. However, the authors argue that in service-centric systems the origin and history of services are equally important. This paper presents an approach that addresses service provenance. The authors show how service provenance information can be collected and retrieved, and how security mechanisms guarantee integrity and access to this information while also providing user-specific views on provenance. Finally, the paper gives a performance evaluation of the authors’ approach, which has been integrated into the VRESCo Web service runtime environment.
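Service provenance, as opposed to data provenance, means recording the lifecycle of the services themselves. A toy illustration: log timestamped events per service and replay any one service's history. The event names and the `ServiceRegistry` class are hypothetical, not the VRESCo event model.

```python
# Toy service provenance log: append lifecycle events for services,
# then retrieve one service's history in order. Illustrative only.

import datetime

class ServiceRegistry:
    def __init__(self):
        self.events = []

    def log(self, service, event, detail=""):
        self.events.append({
            "service": service,
            "event": event,
            "detail": detail,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })

    def history(self, service):
        """Return the recorded lifecycle of one service, in order."""
        return [e for e in self.events if e["service"] == service]

reg = ServiceRegistry()
reg.log("PaymentService", "published", "v1.0")
reg.log("PaymentService", "rebound", "endpoint moved")
reg.log("OtherService", "published", "v2.1")
lifecycle = [e["event"] for e in reg.history("PaymentService")]
```

A production system would additionally sign or hash each event for integrity and filter `history` per user, which is roughly where the paper's security mechanisms and user-specific views come in.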


2017 ◽  
Vol 28 (1) ◽  
pp. 43-62 ◽  
Author(s):  
Jun Liu ◽  
Sudha Ram

Provenance is becoming increasingly important as more and more people are using data that they themselves did not generate. In the last decade, significant efforts have been directed toward developing generic, shared data provenance ontologies that support the interoperability of provenance across systems. An issue that is impeding the use of such provenance ontologies is that a generic provenance ontology, no matter how complete it is, is insufficient for capturing the diverse, complex provenance requirements of different domains. In this paper, the authors propose a novel approach to adapting and extending the W7 model, a well-known generic ontology of data provenance. Relying on various knowledge expansion mechanisms provided by the Conceptual Graph formalism, the authors' approach enables the development of domain ontologies of provenance in a disciplined yet flexible way.
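The W7 model characterizes provenance by seven interrogatives: what, when, where, how, who, which, and why. A minimal record type makes the shape concrete; the field names follow the model, but the `W7Record` class itself is an illustrative sketch, not the paper's Conceptual Graph encoding.

```python
# Minimal record type mirroring the seven elements of the W7
# provenance model. Illustrative sketch only.

from dataclasses import dataclass, asdict

@dataclass
class W7Record:
    what: str    # the event that affected the data
    when: str    # time of the event
    where: str   # location of the event
    how: str     # action leading to the event
    who: str     # agents involved in the event
    which: str   # instruments or devices used
    why: str     # reasons for the event

rec = W7Record(
    what="update",
    when="2016-05-01T10:00:00Z",
    where="lab server",
    how="manual correction",
    who="curator",
    which="spreadsheet editor",
    why="fix transcription error",
)
fields = sorted(asdict(rec))
```

Domain adaptation, in the paper's sense, would specialize these generic elements (e.g. refining *who* into domain-specific agent types) rather than replacing the model.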

