On the Anonymization of Workflow Provenance without Compromising the Transparency of Lineage

2022 ◽  
Vol 14 (1) ◽  
pp. 1-27
Author(s):  
Khalid Belhajjame

Workflows have been adopted in several scientific fields as a tool for the specification and execution of scientific experiments. In addition to automating the execution of experiments, workflow systems often include capabilities to record provenance information, which contains, among other things, data records used and generated both by the workflow as a whole and by its component modules. It is widely recognized that provenance information can be useful for the interpretation, verification, and re-use of workflow results, justifying its sharing and publication among scientists. However, workflow execution in some branches of science can manipulate sensitive datasets that contain information about individuals. To address this concern, we investigate in this article the problem of anonymizing the provenance of workflows. In doing so, we consider a popular class of workflows in which component modules use and generate collections of data records as a result of their invocation, as opposed to a single data record. The solution we propose offers guarantees of confidentiality without compromising lineage information, which provides transparency as to the relationships between the data records used and generated by the workflow modules. We provide algorithmic solutions that show how the provenance of a single module and an entire workflow can be anonymized and present the results of experiments that we conducted for their evaluation.
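As a rough illustration of the idea in the abstract above (not the article's algorithm, which the abstract does not spell out), the sketch below generalizes one sensitive attribute of a module's input collection while leaving record identifiers, and therefore lineage links, untouched. It assumes a k-anonymity-style guarantee; the record layout, the `age` attribute, and the names `age_range`, `anonymize_ages`, `inputs`, and `lineage` are all hypothetical.

```python
from collections import Counter


def age_range(age, width):
    """Replace an exact age with a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_ages(records, k, widths=(5, 10, 20, 40)):
    """Generalize the hypothetical 'age' attribute so that every label
    is shared by at least k records; record ids are kept unchanged.

    Widths are tried from finest to coarsest. If none works, every age
    is suppressed to '*' (which only meets the bound when there are at
    least k records).
    """
    for width in widths:
        labels = [age_range(r["age"], width) for r in records]
        if all(count >= k for count in Counter(labels).values()):
            break
    else:
        labels = ["*"] * len(records)
    return [{"id": r["id"], "age": label} for r, label in zip(records, labels)]


# Hypothetical provenance of one module invocation: an input collection
# and a lineage map from each output record to the inputs it derives from.
inputs = [
    {"id": "in1", "age": 34},
    {"id": "in2", "age": 37},
    {"id": "in3", "age": 52},
    {"id": "in4", "age": 58},
]
lineage = {"out1": ["in1", "in2"], "out2": ["in3", "in4"]}

anon = anonymize_ages(inputs, k=2)
anon_ids = {r["id"] for r in anon}

# Lineage stays intact: every input referenced by an output still exists.
lineage_ok = all(src in anon_ids for srcs in lineage.values() for src in srcs)
```

Here the attribute values become coarser, but `lineage` is never rewritten, so a reader of the published provenance can still see which input records contributed to which output.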

Author(s):  
Ewa Deelman ◽  
Ann Chervenak

Scientific applications such as those in astronomy, earthquake science, gravitational-wave physics, and others have embraced workflow technologies to do large-scale science. Workflows enable researchers to collaboratively design, manage, and obtain results that involve hundreds of thousands of steps, access terabytes of data, and generate similar amounts of intermediate and final data products. Although workflow systems are able to facilitate the automated generation of data products, many issues still remain to be addressed. These issues exist in different forms in the workflow lifecycle. This chapter describes a workflow lifecycle as consisting of a workflow generation phase where the analysis is defined, the workflow planning phase where resources needed for execution are selected, the workflow execution part, where the actual computations take place, and the result, metadata, and provenance storing phase. The authors discuss the issues related to data management at each step of the workflow cycle. They describe challenge problems and illustrate them in the context of real-life applications. They discuss the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure. They particularly emphasize the issues related to the management of data throughout the workflow lifecycle.


Author(s):  
Khalid Belhajjame ◽  
Paolo Missier ◽  
Carole Goble

Data provenance is key to understanding and interpreting the results of scientific experiments. This chapter introduces and characterises data provenance in scientific workflows using illustrative examples taken from real-world workflows. The characterisation takes the form of a taxonomy that is used for comparing and analysing provenance capabilities supplied by existing scientific workflow systems.


2020 ◽  
Author(s):  
Lenita Ambrósio ◽  
José Maria David ◽  
Regina Braga ◽  
Fernanda Campos ◽  
Victor Ströele ◽  
...  

Managing contextual and provenance information plays a key role in the scientific domain. Activities carried out in this domain are often collaborative and distributed. Thus, to examine and audit results already obtained, researchers need to be aware of the actions taken by other members of the group. Contextual and provenance information are essential to enhance the reproducibility and reuse of experiments. The goal of this work is to present a conceptual framework that provides guidelines capable of supporting the modeling of provenance and context in a software ecosystem platform to support scientific experimentation. Preliminary results are also presented when the proposed solution is used to design software ecosystem platform components.


2020 ◽  
Vol 16 (4) ◽  
pp. 427-449
Author(s):  
Kan Ngamakeur ◽  
Sira Yongchareon

Purpose: The paper aims to study realization requirements for the flexible enactment of artifact-centric business processes in a dynamic, collaborative environment and to develop a workflow execution framework that can effectively address those requirements. Design/methodology/approach: This study proposed a framework and contract-based, event-driven architecture design and implementation that can directly realize collaborative artifact-centric business processes in service-oriented architecture (SOA) without any model conversion. Findings: The results show that the approach is feasible in presenting several key benefits over the use of existing workflow systems to run artifact-centric processes. Originality/value: Most of the existing approaches require an artifact-centric model to be transformed into executable workflow languages to run on existing workflow management systems. This study argues that the model conversion can incur losses of information and affect traceability and monitoring ability of workflows, especially in an SOA where a workflow can span across multiple inter-business entities.


2006 ◽  
Vol 14 (3-4) ◽  
pp. 209-216 ◽  
Author(s):  
J.D. Blower ◽  
A.B. Harrison ◽  
K. Haines

The service-oriented approach to performing distributed scientific research is potentially very powerful but is not yet widely used in many scientific fields. This is partly due to the technical difficulties involved in creating services and workflows and the inefficiency of many workflow systems with regard to handling large datasets. We present the Styx Grid Service, a simple system that wraps command-line programs and allows them to be run over the Internet exactly as if they were local programs. Styx Grid Services are very easy to create and use and can be composed into powerful workflows with simple shell scripts or more sophisticated graphical tools. An important feature of the system is that data can be streamed directly from service to service, significantly increasing the efficiency of workflows that use large data volumes. The status and progress of Styx Grid Services can be monitored asynchronously using a mechanism that places very few demands on firewalls. We show how Styx Grid Services can interoperate with Web Services and WS-Resources using suitable adapters.


2009 ◽  
Vol 23 (2) ◽  
pp. 117-127 ◽  
Author(s):  
Astrid Wichmann ◽  
Detlev Leutner

Seventy-nine students from three science classes conducted simulation-based scientific experiments. They received one of three kinds of instructional support in order to encourage scientific reasoning during inquiry learning: (1) basic inquiry support, (2) advanced inquiry support including explanation prompts, or (3) advanced inquiry support including explanation prompts and regulation prompts. Knowledge test as well as application test results show that students with regulation prompts significantly outperformed students with explanation prompts (knowledge: d = 0.65; application: d = 0.80) and students with basic inquiry support only (knowledge: d = 0.57; application: d = 0.83). The results are in line with a theoretical focus on inquiry learning according to which students need specific support with respect to the regulation of scientific reasoning when developing explanations during experimentation activities.


2015 ◽  
Vol 14 (4) ◽  
pp. 165-181 ◽  
Author(s):  
Sarah Dudenhöffer ◽  
Christian Dormann

Abstract. The purpose of this study was to replicate the dimensions of the customer-related social stressors (CSS) concept across service jobs, to investigate their consequences for service providers’ well-being, and to examine emotional dissonance as a mediator. Data from 20 studies comprising different service jobs (N = 4,199) were integrated into a single data set and meta-analyzed. Confirmatory factor analyses and explorative principal component analysis confirmed four CSS scales: disproportionate expectations, verbal aggression, ambiguous expectations, and disliked customers. These CSS scales were associated with burnout and job satisfaction. Most of the effects were partially mediated by emotional dissonance. Further analyses revealed that differences among jobs exist with regard to the factor solution. However, associations between CSS and outcomes are mainly invariant across service jobs.


1865 ◽  
Vol 13 (3) ◽  
pp. 32-35
Author(s):  
James Glaisher

2017 ◽  
Vol 13 (1) ◽  
pp. 4522-4534
Author(s):  
Armando Tomás Canero

This paper presents sound propagation based on a transverse wave model which does not conflict with the interpretation of physical events based on the longitudinal wave model, but responds to the correspondence principle and allows interpreting a significant number of scientific experiments that do not follow the longitudinal wave model. Among the problems that are solved are: the interpretation of the location of nodes and antinodes in a Kundt tube of classical mechanics, the translation of phonons in the vacuum interparticle of quantum mechanics and gravitational waves in relativistic mechanics.

