data provenance
Recently Published Documents


Total documents: 491 (five years: 170)
H-index: 25 (five years: 7)

2022 ◽  
pp. 167-187
Author(s):  
James Cheney ◽  
Adriane Chapman ◽  
Joy Davidson ◽  
Alistair B. Forbes

2021 ◽  
Author(s):  
David A Yarmosh ◽  
Juan G Lopera ◽  
Nikhita P Puthuveetil ◽  
Patrick Ford Combs ◽  
Amy L Reese ◽  
...  

The quality and traceability of microbial genomics data in public databases are deteriorating as the databases rapidly expand and struggle to cope with data curation challenges. While the availability of public genomic data has become essential for modern life sciences research, the curation of the data is a growing area of concern that has significant real-world impacts on public health epidemiology, drug discovery, and environmental biosurveillance research. While public microbial genome databases such as NCBI's RefSeq database leverage the scalability of crowdsourcing for growth, they do not require data provenance to the original biological source materials or accurate descriptions of how the data were produced. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full data provenance. Over 98% of these ATCC Standard Reference Genomes (ASRGs) are superior to assemblies for comparable strains found in NCBI's RefSeq database. Comparative genomics analysis revealed significant issues in RefSeq bacterial genome assemblies related to genome completeness, mutations, structural differences, metadata errors, and gaps in traceability to the original biological source materials. For example, nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. We suggest that the quality of genomic metadata, the traceability of the data, and the methods used to produce them are intrinsically connected to the quality of the resulting genome assemblies themselves. Our results highlight common problems with "reference genomes" and underscore the importance of data provenance for precision science and reproducibility.
These gaps in metadata accuracy and data provenance represent an "elephant in the room" for microbial genomics research, but addressing these issues would require raising the level of accountability for data depositors and our own expectations of data quality.


Computing ◽  
2021 ◽  
Author(s):  
Carlos Sáenz-Adán ◽  
Francisco J. García-Izquierdo ◽  
Beatriz Pérez ◽  
Trung Dong Huynh ◽  
Luc Moreau

Data provenance is a form of knowledge graph providing an account of what a system performs, describing the data involved and the processes carried out over them. It is crucial to ascertaining the origin of data, validating their quality, auditing applications' behaviours, and, ultimately, making them accountable. However, instrumenting applications, especially legacy ones, to track the provenance of their operations remains a significant technical hurdle, hindering the adoption of provenance technology. UML2PROV is a software-engineering methodology that facilitates the instrumentation of provenance recording in applications designed with UML diagrams. It automates the generation of (1) templates for the provenance to be recorded and (2) the code to capture values required to instantiate those templates from an application at run time, both from the application’s UML diagrams. By so doing, UML2PROV frees application developers from manual instrumentation of provenance capturing while ensuring the quality of recorded provenance. In this paper, we present in detail UML2PROV’s approach to generating application code for capturing provenance values by means of a Bindings Generation Module (BGM). In particular, we propose a set of requirements for BGM implementations and describe an event-based design of BGM that relies on the Aspect-Oriented Programming (AOP) paradigm to automatically weave the generated code into an application. Finally, we present three different BGM implementations following the above design and analyze their pros and cons in terms of computing/storage overheads and implications for provenance consumers.
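The idea of weaving provenance capture around application operations, as described in the abstract above, can be illustrated with a minimal sketch. This is not UML2PROV's generated code or its BGM design: a Python decorator stands in for AOP weaving, and the names `BINDINGS`, `capture_provenance`, `enroll`, and the template name are all hypothetical.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory store of captured bindings (assumption: a real
# system would persist these and later expand them against templates).
BINDINGS = []


def capture_provenance(template_name):
    """Wrap a function so that every call records one set of bindings."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc).isoformat()
            result = func(*args, **kwargs)
            ended = datetime.now(timezone.utc).isoformat()
            # The values below are what a provenance template would need
            # in order to be instantiated for this particular call.
            BINDINGS.append({
                "template": template_name,
                "operation": func.__name__,
                "inputs": {"args": list(args), "kwargs": dict(kwargs)},
                "output": result,
                "started": started,
                "ended": ended,
            })
            return result
        return wrapper
    return decorator


@capture_provenance("operation_template")  # hypothetical template name
def enroll(student, course):
    # Application logic stays free of provenance-specific code.
    return f"{student} enrolled in {course}"


enroll("alice", "databases")
```

The application function itself contains no provenance logic; the capture is attached from outside, which is the property that AOP-style instrumentation aims for.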


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7994
Author(s):  
Mpyana Mwamba Merlec ◽  
Youn Kyu Lee ◽  
Seng-Phil Hong ◽  
Hoh Peter In

A massive amount of sensitive personal data is being collected and used by scientists, businesses, and governments. This has led to unprecedented threats to privacy rights and the security of personal data. There are few solutions that empower individuals to provide systematic consent agreements on distinct personal information and control who can collect, access, and use their data for specific purposes and periods. Individuals should be able to delegate consent rights, access consent-related information, and withdraw their given consent at any time. We propose a smart-contract-based dynamic consent management system, backed by blockchain technology, targeting personal data usage under the general data protection regulation. Our user-centric dynamic consent management system allows users to control their personal data collection and consent to its usage throughout the data lifecycle. Transaction history and logs are recorded in a blockchain that provides trusted tamper-proof data provenance, accountability, and traceability. A prototype of our system was designed and implemented to demonstrate its feasibility. The acceptability and reliability of the system were assessed by experimental testing and validation processes. We also analyzed the security and privacy of the system and evaluated its performance.
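The consent lifecycle described above (grant for a specific purpose and period, check, withdraw at any time, with every action logged) can be sketched as follows. This is a plain in-memory Python stand-in, not a smart contract and not the authors' system; `Consent`, `ConsentRegistry`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Consent:
    subject: str        # the individual the data is about
    controller: str     # who may collect or use the data
    purpose: str        # the specific purpose consented to
    expires_at: datetime
    withdrawn: bool = False


class ConsentRegistry:
    """Assumption: an append-only log stands in for on-chain transactions."""

    def __init__(self):
        self._consents = {}
        self.log = []

    def grant(self, subject, controller, purpose, valid_for, now):
        key = (subject, controller, purpose)
        self._consents[key] = Consent(subject, controller, purpose, now + valid_for)
        self.log.append(("grant", key, now))

    def withdraw(self, subject, controller, purpose, now):
        key = (subject, controller, purpose)
        consent = self._consents.get(key)
        if consent is not None and not consent.withdrawn:
            consent.withdrawn = True
            self.log.append(("withdraw", key, now))

    def is_permitted(self, subject, controller, purpose, now):
        consent = self._consents.get((subject, controller, purpose))
        return (
            consent is not None
            and not consent.withdrawn
            and now < consent.expires_at
        )


# Usage with fixed timestamps so the behaviour is reproducible.
t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
registry = ConsentRegistry()
registry.grant("alice", "lab", "research", timedelta(days=30), t0)
```

Keying consent on the (subject, controller, purpose) triple keeps each permission distinct, so withdrawing consent for one purpose leaves the others untouched.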


2021 ◽  
Vol 10 (47) ◽  
Author(s):  
Briana Benton ◽  
Stephen King ◽  
Samuel R. Greenfield ◽  
Nikhita Puthuveetil ◽  
Amy L. Reese ◽  
...  

Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal (https://genomes.atcc.org) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.


Author(s):  
Jonathan Oakley ◽  
Carl Worley ◽  
Lu Yu ◽  
Richard Brooks ◽  
Ilker Ozcelik ◽  
...  

Clinical trials are a multi-billion-dollar industry. One of the biggest challenges facing the clinical trial research community is satisfying Part 11 of Title 21 of the Code of Federal Regulations and ISO 27789. These controls provide audit requirements that guarantee the reliability of the data contained in the electronic records. Context-aware smart devices and wearable IoT devices have become increasingly common in clinical trials. Electronic Data Capture (EDC) and Clinical Data Management Systems (CDMS) do not currently address the new challenges introduced by the use of these devices. The healthcare digital threat landscape is continually evolving, and the prevalence of sensor fusion and wearable devices compounds the growing attack surface. We propose Scrybe, a permissioned blockchain, as a method of storing proof of clinical trial data provenance. We illustrate how Scrybe addresses each control and the limitations of Ethereum-based blockchains. Finally, we provide a proof-of-concept integration with REDCap to show tamper resistance.
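The tamper resistance mentioned above rests on a general property of hash-linked records, which a minimal sketch can show. This is not Scrybe's design and uses no REDCap API: it is a single-process hash chain with no consensus or permissioning, and the function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry's payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain):
    """Return True only if every link and every stored hash is consistent."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical audit records for two data-capture events.
chain = []
append(chain, {"record": "visit-1", "field": "heart_rate", "value": 72})
append(chain, {"record": "visit-2", "field": "heart_rate", "value": 75})
```

Because each hash covers the previous one, editing any earlier payload invalidates that entry's stored hash, and recomputing it would break the link to every later entry.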


2021 ◽  
Author(s):  
Arnob Zahid ◽  
Stephen C. Wingreen ◽  
Ravishankar Sharma

BACKGROUND: The current digital health context is incapable of supporting the future need for data security and storage in digital health services. A robust, interoperable, and scalable data storage and security solution is required to address this future need. Blockchain is an emerging information technology that can support this industry's timely needs. Therefore, a clear foundational understanding of Blockchain affordances for digital health is essential to harnessing its full potential.
OBJECTIVE: This paper presents a comprehensive review of Blockchain affordances for digital health. The review aims to: 1) identify the perceived Blockchain affordances and 2) explore the recent Blockchain research in digital health (actualized).
METHODS: We applied the Systematic Literature Review (SLR) methodology to review the extant literature. Furthermore, we applied the affordance theory lens to define and defend our findings on Blockchain affordances.
RESULTS: A total of 3627 relevant papers were identified and analysed in this review study. Of these, 90 were probed deeply. Our analysis identified 14 Blockchain affordances (Access control, Interoperability, Security, Tamper-resistance, Traceability, Anonymity, Data Provenance, Identity, Immutability, Integrity, Privacy, Transparency, and Trust) which are perceived and actualized in digital health. Our study also discovered several constraints in Blockchain implementation, such as security and privacy, interoperability, scalability, and infrastructural support, that require further research attention.
CONCLUSIONS: We believe this study will guide further Blockchain research in the digital health domain and informatively contribute to eliminating (decreasing) the dark side of digital health and improving (increasing) the bright side for the future.

