Probabilistic Inference of Fine-Grained Data Provenance

Author(s):  
Mohammad Rezwanul Huq ◽  
Peter M. G. Apers ◽  
Andreas Wombacher
2020 ◽  
Vol 14 (4) ◽  
pp. 485-497
Author(s):  
Nan Zheng ◽  
Zachary G. Ives

Data provenance tools aim to facilitate reproducible data science and auditable data analyses by tracking the processes and inputs responsible for each result of an analysis. Fine-grained provenance further enables sophisticated reasoning about why individual output results appear or fail to appear. However, for reproducibility and auditing, we need a provenance archival system that is tamper-resistant and efficiently stores provenance for computations performed over time (i.e., it compresses repeated results). We study this problem, developing solutions for storing fine-grained provenance in relational storage systems while both compressing and protecting it via cryptographic hashes. We experimentally validate our proposed solutions using both scientific and OLAP workloads.
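
To make the two requirements in this abstract concrete (storing repeated results once, and protecting the archive with cryptographic hashes), here is a minimal Python sketch. It is not the authors' system: the class `ProvenanceLog`, its record fields, and the chaining scheme are hypothetical and chosen only for illustration. The sketch also keeps everything in memory, whereas the abstract targets relational storage.

```python
import hashlib
import json


def digest(obj) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class ProvenanceLog:
    """Hypothetical append-only log: deduplicated records plus a hash chain."""

    GENESIS = "0" * 64  # assumed starting value for the chain

    def __init__(self):
        self.records = {}  # record digest -> record; each distinct record is stored once
        self.chain = []    # one (record digest, chained digest) pair per appended entry

    def append(self, output_id, operator, input_ids):
        """Record that output_id was derived from input_ids by operator."""
        record = {"output": output_id, "operator": operator, "inputs": sorted(input_ids)}
        rec_hash = digest(record)
        # A repeated derivation maps to the same digest, so it is not stored again.
        self.records.setdefault(rec_hash, record)
        prev = self.chain[-1][1] if self.chain else self.GENESIS
        # Each link commits to the previous link, so edits to history are detectable.
        self.chain.append((rec_hash, digest([prev, rec_hash])))
        return rec_hash

    def verify(self) -> bool:
        """Recompute every digest and link; return False on any mismatch."""
        prev = self.GENESIS
        for rec_hash, link in self.chain:
            record = self.records.get(rec_hash)
            if record is None or digest(record) != rec_hash:
                return False
            if digest([prev, rec_hash]) != link:
                return False
            prev = link
        return True
```

Appending the same derivation twice adds two chain entries but only one stored record, and changing a stored record afterwards causes `verify()` to return `False`. Hash chaining of this kind makes tampering detectable rather than impossible; a real archive would also need to protect the latest chain digest.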


2021 ◽  
Vol 30 (1) ◽  
pp. 3-24
Author(s):  
Pingcheng Ruan ◽  
Tien Tuan Anh Dinh ◽  
Qian Lin ◽  
Meihui Zhang ◽  
Gang Chen ◽  
...  

2019 ◽  
Vol 12 (9) ◽  
pp. 975-988
Author(s):  
Pingcheng Ruan ◽  
Gang Chen ◽  
Tien Tuan Anh Dinh ◽  
Qian Lin ◽  
Beng Chin Ooi ◽  
...  

2019 ◽  
Vol 118 ◽  
pp. 134-145
Author(s):  
Raphael Spiekermann ◽  
Ben Jolly ◽  
Alexander Herzig ◽  
Tom Burleigh ◽  
David Medyckyj-Scott

2015 ◽  
Vol 26 (2) ◽  
pp. 32-47
Author(s):  
Salmin Sultana ◽  
Elisa Bertino

Existing provenance systems operate at a single layer of abstraction (workflow/process/OS) at which they record and store provenance. However, the provenance captured from different layers provides the highest benefit when integrated through a unified provenance framework. The first step in building such a framework is a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real-world data processing systems. They further introduce a distributed data provenance middleware system composed of several components and services that capture provenance according to their model and securely store it in a central repository. As part of this middleware, the authors present a thin stackable file system, called FiPS, for capturing local provenance in a portable manner. FiPS is able to capture provenance at various degrees of granularity, transform provenance records into secure information, and direct the resulting provenance data to various persistent storage systems.

