Provenance Information
Recently Published Documents

TOTAL DOCUMENTS: 121 (five years: 45)
H-INDEX: 8 (five years: 2)

2022 ◽  
Vol 14 (1) ◽  
pp. 1-27
Author(s):  
Khalid Belhajjame

Workflows have been adopted in several scientific fields as a tool for the specification and execution of scientific experiments. In addition to automating the execution of experiments, workflow systems often include capabilities to record provenance information, which contains, among other things, the data records used and generated by the workflow as a whole as well as by its component modules. It is widely recognized that provenance information can be useful for the interpretation, verification, and re-use of workflow results, justifying its sharing and publication among scientists. However, workflow executions in some branches of science can manipulate sensitive datasets that contain information about individuals. To address this concern, we investigate in this article the problem of anonymizing the provenance of workflows. In doing so, we consider a popular class of workflows in which component modules use and generate collections of data records as a result of their invocation, as opposed to a single data record. The solution we propose offers guarantees of confidentiality without compromising lineage information, which provides transparency as to the relationships between the data records used and generated by the workflow modules. We provide algorithmic solutions that show how the provenance of a single module and an entire workflow can be anonymized, and present the results of experiments that we conducted for their evaluation.
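The core tension the abstract describes, hiding sensitive record values while keeping lineage edges intact, can be sketched in a few lines. This is an illustrative toy, not the authors' algorithm: it replaces record values with deterministic salted pseudonyms so that the module-level input/output relationships survive anonymization. All names (`SALT`, the `dedupe` module, the example records) are made up.

```python
import hashlib

# Secret salt shared within the study; without it, pseudonyms cannot be reversed
# by dictionary attack on common values (illustrative choice, not a full scheme).
SALT = b"per-study-secret"

def pseudonymize(value: str) -> str:
    """Deterministic pseudonym: identical records map to the same token,
    so lineage relationships between records remain visible."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

def anonymize_module_provenance(prov):
    """prov: list of (input_record, module, output_record) lineage triples.
    Values are replaced; the graph structure is untouched."""
    return [(pseudonymize(i), module, pseudonymize(o)) for i, module, o in prov]

prov = [("alice@mail", "dedupe", "rec-17"), ("bob@mail", "dedupe", "rec-18")]
anon = anonymize_module_provenance(prov)
# The two dedupe edges are preserved; the raw identifying values are not.
```

Because the pseudonyms are deterministic, a reader of the anonymized provenance can still see that the same record flowed through several modules, which is the lineage transparency the abstract refers to.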


Author(s):  
Oliver Reinhardt ◽  
Tom Warnke ◽  
Adelinde M. Uhrmacher

Abstract Conducting simulation studies within a model-based framework is a complex process in which many different concerns must be considered. Central tasks include the specification of the simulation model, the execution of simulation runs, the conduct of systematic simulation experiments, and the management and documentation of the model's context. In this chapter, we look into how these concerns can be separated and handled by applying domain-specific languages (DSLs), that is, languages that are tailored to specific tasks in a specific application domain. We demonstrate and discuss the features of the approach using the modelling language ML3, the experiment specification language SESSL, and PROV, a graph-based standard for describing the provenance information underlying the multi-stage process of model development.
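The separation of concerns the chapter advocates can be illustrated with a minimal sketch. SESSL is a Scala-embedded DSL; here a declarative Python dict plays the same role purely for illustration, keeping the experiment specification (replications, stop condition, observables) apart from both the model and the execution machinery. The toy model and all field names are assumptions, not ML3 or SESSL syntax.

```python
import random

# Declarative experiment specification, separate from the model definition.
experiment = {
    "replications": 3,
    "stop_time": 10.0,
    "observe": ["population"],
    "seed": 42,
}

def model_step(state):
    """Toy stochastic model standing in for a real ML3 model."""
    state["population"] += random.choice([-1, 1])
    state["time"] += 1.0
    return state

def run(experiment):
    """Generic runner: interprets the experiment spec against the model."""
    random.seed(experiment["seed"])
    results = []
    for _ in range(experiment["replications"]):
        state = {"time": 0.0, "population": 100}
        while state["time"] < experiment["stop_time"]:
            state = model_step(state)
        results.append({k: state[k] for k in experiment["observe"]})
    return results

results = run(experiment)
```

The point of the pattern is that the experiment block can be stored, shared, and recorded as provenance (e.g. as a PROV entity) independently of the model it was applied to.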


GigaScience ◽  
2021 ◽  
Vol 10 (12) ◽  
Author(s):  
Matthias Lange ◽  
Blaise T F Alako ◽  
Guy Cochrane ◽  
Mehmood Ghaffar ◽  
Martin Mascher ◽  
...  

Abstract Background Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD and to infer trends in scientific knowledge gain at the global level. Findings We extracted and linked records from the European Nucleotide Archive (ENA) to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging, and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. Conclusions The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text-mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.
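The linking step described in the Findings can be pictured as a join between two flat tables: accession records carrying geographical provenance, and accession citations mined from full texts. The sketch below uses invented rows and field names (not the ENA or Europe PMC schemas) to show the shape of the join and of the resulting summary statistics.

```python
from collections import Counter

# Hypothetical flat tables; accession IDs and countries are illustrative only.
accessions = [
    {"accession": "ERS000001", "country": "Germany"},
    {"accession": "ERS000002", "country": "Kenya"},
]
citations = [
    {"pmcid": "PMC123", "accession": "ERS000001"},
    {"pmcid": "PMC456", "accession": "ERS000002"},
    {"pmcid": "PMC456", "accession": "ERS000001"},
]

# Index accessions, then join each mined citation to its provenance record.
by_accession = {a["accession"]: a for a in accessions}
linked = [
    {**c, "country": by_accession[c["accession"]]["country"]}
    for c in citations
    if c["accession"] in by_accession
]

# Example summary statistic: linked citations per provenance country.
per_country = Counter(row["country"] for row in linked)
```

At scale this is the same operation a data warehouse performs; flat tables like `linked` are what make ad hoc exploration and country-level visualizations possible.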


2021 ◽  
Vol 10 (47) ◽  
Author(s):  
Briana Benton ◽  
Stephen King ◽  
Samuel R. Greenfield ◽  
Nikhita Puthuveetil ◽  
Amy L. Reese ◽  
...  

Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal ( https://genomes.atcc.org ) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.


2021 ◽  
Author(s):  
Benjamin Philip Palmer

An increasing number of products are exclusively digital items, such as media files, licenses, services, or subscriptions. In many cases customers do not purchase these items directly from the originator of the product but through a reseller instead. Examples of some well-known resellers include GoDaddy, the iTunes music store, and Amazon. This thesis considers the concept of provenance of digital items in reseller chains. Provenance is defined as the origin and ownership history of an item. In the context of digital items, the origin of the item refers to the supplier that created it, and the ownership history establishes a chain of ownership from the supplier to the customer. While customers and suppliers are concerned with the provenance of the digital items, resellers will not want the details of the transactions they have taken part in made public. Resellers will require the provenance information to be anonymous and unlinkable, to prevent third parties building up large amounts of information on the transactions of resellers. This thesis develops security mechanisms that provide customers and suppliers with assurances about the provenance of a digital item, even when the reseller is untrusted, while providing anonymity and unlinkability for resellers. The main contribution of this thesis is the design, development, and analysis of the tagged transaction protocol. A formal description of the problem and the security properties for anonymously providing provenance for digital items in reseller chains are defined. A thorough security analysis using proofs by contradiction shows the protocol fulfils the security requirements. This security analysis is supported by modelling the protocol and security requirements using Communicating Sequential Processes (CSP) and the Failures Divergences Refinement (FDR) model checker.
An extended version of the tagged transaction protocol is also presented that provides revocable anonymity for resellers that try to conduct a cloning attack on the protocol. As well as an analysis of the security of the tagged transaction protocol, a performance analysis is conducted providing complexity results as well as empirical results from an implementation of the protocol.


2021 ◽  
pp. 1-42
Author(s):  
Aline Menin ◽  
Franck Michel ◽  
Fabien Gandon ◽  
Raphaël Gazzotti ◽  
Elena Cabrio ◽  
...  

Abstract The unprecedented mobilization of scientists, consequent of the COVID-19 pandemics, has generated an enormous number of scholarly articles that is impossible for a human being to keep track and explore without appropriate tool support. In this context, we created the Covid-on-the-Web project, which aims to assist the access, querying, and sense making of COVID-19 related literature by combining efforts from semantic web, natural language processing, and visualization fields. Particularly, in this paper, we present (i) an RDF dataset, a linked version of the “COVID-19 Open Research Dataset” (CORD-19), enriched via entity linking and argument mining, and (ii) the “Linked Data Visualizer” (LDViz), 28 which assists the querying and visual exploration of the referred dataset. The LDViz tool assists the exploration of different views of the data by combining a querying management interface, which enables the definition of meaningful subsets of data through SPARQL queries, and a visualization interface based on a set of six visualization techniques integrated in a chained visualization concept, which also supports the tracking of provenance information. We demonstrate the potential of our approach to assist biomedical researchers in solving domain-related tasks, as well as to perform exploratory analyses through use case scenarios.
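The idea of defining a meaningful subset of an RDF dataset through a query can be sketched without a triple store. The pure-Python pattern matcher below mimics a basic SPARQL graph pattern (`SELECT ?s WHERE { ?s ex:mentions ex:ACE2 }`); the triples, predicate names, and entity IRIs are invented for illustration and are not the Covid-on-the-Web vocabulary.

```python
# Toy triple set standing in for the CORD-19-derived RDF dataset.
triples = [
    ("paper:1", "mentions", "entity:ACE2"),
    ("paper:1", "year", "2020"),
    ("paper:2", "mentions", "entity:Spike"),
    ("paper:2", "year", "2021"),
]

def match(triples, s=None, p=None, o=None):
    """Basic-graph-pattern analogue: None acts as a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which papers mention ACE2?" -- the subset a visualization would then render.
papers_on_ace2 = [s for s, _, _ in match(triples, p="mentions", o="entity:ACE2")]
```

In LDViz-style tooling, the result set of such a query is what gets handed to the visualization layer, and the query text itself is part of the provenance trail of the resulting view.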


2021 ◽  
pp. 147387162110326
Author(s):  
Deokgun Park ◽  
Mohamed Suhail ◽  
Minsheng Zheng ◽  
Cody Dunne ◽  
Eric Ragan ◽  
...  

Tracking the sensemaking process is a well-established practice in many data analysis tools, and many visualization tools facilitate overview and recall during and after exploration. However, the resulting communication materials such as presentations or infographics often omit provenance information for the sake of simplicity. This unfortunately limits later viewers from engaging in further collaborative sensemaking or discussion about the analysis. We present a design study where we introduced visual provenance and analytics to urban transportation planning. Maintaining the provenance of all analyses was critical to support collaborative sensemaking among the many and diverse stakeholders. Our system, STORYFACETS, exposes several different views of the same analysis session, each view designed for a specific audience: (1) the trail view provides a data flow canvas that supports in-depth exploration + provenance (expert analysts); (2) the dashboard view organizes visualizations and other content into a space-filling layout to support high-level analysis (managers); and (3) the slideshow view supports linear storytelling via interactive step-by-step presentations (laypersons). Views are linked so that when one is changed, provenance is maintained. Visual provenance is available on demand to support iterative sensemaking for any team member.


2021 ◽  
Vol 17 (8) ◽  
pp. e1009227
Author(s):  
Kai Budde ◽  
Jacob Smith ◽  
Pia Wilsdorf ◽  
Fiete Haack ◽  
Adelinde M. Uhrmacher

For many biological systems, a variety of simulation models exist. A new simulation model is rarely developed from scratch, but rather revises and extends an existing one. A key challenge, however, is to decide which model might be an appropriate starting point for a particular problem, and why. To answer this question, we need to identify entities and activities that contributed to the development of a simulation model. Therefore, we exploit the provenance data model, PROV-DM, of the World Wide Web Consortium and, building on previous work, continue developing a PROV ontology for simulation studies. Based on a case study of 19 Wnt/β-catenin signaling models, we identify crucial entities and activities as well as useful metadata to both capture the provenance information from individual simulation studies and relate them, forming a family of models. The approach is implemented in WebProv, a web application for inserting and querying provenance information. Our specialization of PROV-DM contains the entities Research Question, Assumption, Requirement, Qualitative Model, Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data, as well as activities referring to building, calibrating, validating, and analyzing a simulation model. We show that most Wnt simulation models are connected to other Wnt models by using (parts of) these models. However, the overlap, especially regarding the Wet-lab Data used for calibration or validation of the models, is small. Making these aspects of developing a model explicit and queryable is an important step for assessing and reusing simulation models more effectively. Exposing this information helps to integrate a new simulation model within a family of existing ones and may lead to the development of more robust and valid simulation models. We hope that our approach becomes part of a standardization effort and that modelers adopt the benefits of provenance when considering or creating simulation models.
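The PROV-DM pattern the abstract describes (entities connected by activities via "used" and "wasGeneratedBy" relations) can be sketched in a few dataclasses. The entity kinds follow the paper's specialization; the class design, the query helper, and the example names (e.g. `WntModel_v1`) are illustrative, not the WebProv implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    kind: str  # e.g. "Simulation Model", "Wet-lab Data", "Simulation Data"

@dataclass
class Activity:
    name: str                              # e.g. "calibrating", "validating"
    used: list = field(default_factory=list)       # prov:used
    generated: list = field(default_factory=list)  # prov:wasGeneratedBy (inverse)

# A tiny provenance graph: a new model version generated by calibrating a
# predecessor against wet-lab data (all names hypothetical).
wetlab = Entity("Lee2003_timecourse", "Wet-lab Data")
model_v1 = Entity("WntModel_v1", "Simulation Model")
model_v2 = Entity("WntModel_v2", "Simulation Model")
calibration = Activity("calibrating", used=[wetlab, model_v1], generated=[model_v2])

def provenance_of(entity, activities):
    """Which entities was `entity` derived from, via recorded activities?"""
    return [u for a in activities if entity in a.generated for u in a.used]

sources = provenance_of(model_v2, [calibration])
```

Queries like `provenance_of` are what make a model family navigable: following "used" edges backwards reveals whether two models share wet-lab data, which the study found to be rare.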


2021 ◽  
Author(s):  
Kerstin Gierend ◽  
Frank Krüger ◽  
Dagmar Waltemath ◽  
Maximilian Fünfgeld ◽  
Atinkut Alamirrew Zeleke ◽  
...  

BACKGROUND Provenance supports the understanding of data genesis, and it is a key factor in ensuring the trustworthiness of digital objects containing (sensitive) scientific data. Provenance information contributes to a better understanding of scientific results and fosters collaboration on existing data as well as data sharing. This encompasses defining comprehensive concepts and standards for transparency and traceability, reproducibility, validity, and quality assurance during clinical and scientific data workflows and research. OBJECTIVE The aim of this scoping review is to investigate approaches and challenges for provenance tracking and to disclose current knowledge gaps in the area. The review covers modeling aspects as well as metadata frameworks for capturing meaningful and usable provenance information during the creation, collection, and processing of (sensitive) scientific biomedical data. The objective of the review also includes the examination of quality aspects of provenance criteria. METHODS The scoping review will follow the methodological framework by Arksey and O'Malley. Relevant publications will be obtained by querying PubMed and Web of Science. All articles published in English between 2006 and 23 March 2021 will be included. Database retrieval will be accompanied by a manual search for grey literature. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterwards, the obtained set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 4 independent reviewers, and a majority vote is required to agree on the eligibility of articles based on defined inclusion and exclusion criteria.
Full-text reading will be performed independently by 2 reviewers, and in the last step key information will be extracted into a template that has been evaluated by the reviewers beforehand. If agreement cannot be reached, the conflict will be resolved by a domain expert. Charted data will be analyzed by categorizing and summarizing the individual data items based on the research questions. Tabular or graphical overviews will be given where applicable. RESULTS The reporting follows the extension of the PRISMA statement for scoping reviews (PRISMA-ScR). Electronic database searches in PubMed and Web of Science resulted in 469 matches after deduplication. As of June 2021, the scoping review is in the full-text screening stage. Data extraction using the pretested charting template will follow the full-text screening stage. We expect the scoping review report to be completed by the end of 2021. CONCLUSIONS Information about the origin of healthcare data has a major impact on the quality and reusability of scientific results as well as follow-up activities. This scoping review will provide information about current approaches, challenges, and knowledge gaps in provenance tracking in the biomedical sciences.

