Compact, tamper-resistant archival of fine-grained provenance

Data provenance tools aim to facilitate reproducible data science and auditable data analyses by tracking the processes and inputs responsible for each result of an analysis. Fine-grained provenance further enables sophisticated reasoning about why individual output results appear or fail to appear. However, for reproducibility and auditing, we need a provenance archival system that is tamper-resistant and that efficiently stores provenance for computations repeated over time (i.e., it compresses repeated results). We study this problem, developing solutions for storing fine-grained provenance in relational storage systems while both compressing and protecting it via cryptographic hashes. We experimentally validate our proposed solutions using both scientific and OLAP workloads.
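
To make the combination of compression and hash-based protection concrete, here is a minimal Python sketch. It is not the authors' design: the `ProvenanceArchive` class, the table layout, and the record fields (`output`, `inputs`, `op`) are all hypothetical, chosen only for illustration. It assumes each provenance record can be serialized to canonical JSON. Identical records are stored once under their SHA-256 digest, so repeated results take no extra space, and each run is chained to the previous run's hash, so later modification becomes detectable.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # placeholder "previous hash" for the first archived run


def digest(payload: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ProvenanceArchive:
    """Hypothetical archive: deduplicated records plus a hash chain over runs."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Records are keyed by their own hash, so duplicates collapse to one row.
        self.db.execute(
            "CREATE TABLE record (hash TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE run (seq INTEGER PRIMARY KEY, record_hashes TEXT NOT NULL,"
            " prev_hash TEXT NOT NULL, run_hash TEXT NOT NULL)"
        )

    def archive_run(self, records):
        """Store one run's provenance records and return the run's chained hash."""
        hashes = []
        for rec in records:
            body = json.dumps(rec, sort_keys=True)  # canonical serialization
            h = digest(body)
            self.db.execute("INSERT OR IGNORE INTO record VALUES (?, ?)", (h, body))
            hashes.append(h)
        row = self.db.execute(
            "SELECT run_hash FROM run ORDER BY seq DESC LIMIT 1"
        ).fetchone()
        prev = row[0] if row else GENESIS
        listing = json.dumps(hashes)
        run_hash = digest(prev + listing)  # binds this run to all earlier runs
        self.db.execute(
            "INSERT INTO run (record_hashes, prev_hash, run_hash) VALUES (?, ?, ?)",
            (listing, prev, run_hash),
        )
        return run_hash

    def verify(self):
        """Recompute every hash; return False if any record or run was altered."""
        prev = GENESIS
        runs = self.db.execute(
            "SELECT record_hashes, prev_hash, run_hash FROM run ORDER BY seq"
        ).fetchall()
        for listing, stored_prev, run_hash in runs:
            if stored_prev != prev or digest(prev + listing) != run_hash:
                return False
            for h in json.loads(listing):
                row = self.db.execute(
                    "SELECT body FROM record WHERE hash = ?", (h,)
                ).fetchone()
                if row is None or digest(row[0]) != h:
                    return False
            prev = run_hash
        return True
```

In this sketch, archiving the same record in two runs stores it once, and editing any stored body makes `verify()` fail. A real system would also need to keep the latest run hash somewhere the archive's operator cannot rewrite, which this sketch does not attempt.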

In or Out?

10.1093/oso/9780198793380.003.0003 ◽

2018 ◽

Author(s):

Catherine E. De Vries

Keyword(s):

Public Opinion ◽

Status Quo ◽

Data Sources ◽

Fine Grained ◽

Alternative State ◽

Empirical Measurement ◽

The Status ◽

People’s Attitudes ◽

Over Time ◽

The EU

This chapter introduces a benchmark theory of public opinion towards European integration. Rather than relying on generic labels like support or scepticism, the chapter suggests that public opinion towards the EU is both multidimensional and multilevel in nature. People’s attitudes towards Europe are essentially based on a comparison between the benefits of the status quo of membership and those associated with an alternative state, namely one’s country being outside the EU. This comparison is coined the ‘EU differential’. When comparing these benefits, people rely on both their evaluations of the outcomes (policy evaluations) and the system that produces them (regime evaluations). This chapter presents a fine-grained conceptualization of what it means to be an EU supporter or Eurosceptic; it also designs a careful empirical measurement strategy to capture variation, both cross-nationally and over time. The chapter cross-validates these measures against a variety of existing and newly developed data sources.

Citizen sociolinguists scaling back

Applied Linguistics Review ◽

10.1515/applirev-2019-0133 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Betsy Rymes ◽

Gareth Smail

Keyword(s):

Social Meaning ◽

Social Significance ◽

Communicative Practices ◽

Everyday Language ◽

Fine Grained ◽

Multilingual Communication ◽

The Social ◽

Linguistic Practices ◽

Over Time

This paper examines the different ways that professional experts and everyday language users engage in scaling practices to claim authority when they talk about multilingual practices and the social significance they assign to them. Specifically, we compare sociolinguists’ use of the term translanguaging to describe multilingual and multimodal practices to the diverse observations of amateur online commentators, or citizen sociolinguists. Our analysis focuses on commentary on cross-linguistic communicative practices in Wales, or “things Welsh people say.” We ultimately argue that by calling practices “translanguaging” and defaulting to scaled-up interpretations of multilingual communication, sociolinguists are increasingly missing out on analyses of how the social meaning of (cross)linguistic practices accrues and evolves within specific communities over time. By contrast, the fine-grained perceptions of “citizen sociolinguists” as they discuss their own communicative practices in context may have something unique and underexamined to offer us as researchers of communicative diversity.

In the eye of the recipient

Scientific Study of Literature ◽

10.1075/ssol.4.2.05rie ◽

2014 ◽

Vol 4 (2) ◽

pp. 211-232 ◽

Cited By ~ 3

Author(s):

Katrin Riese ◽

Mareike Bayer ◽

Gerhard Lauer ◽

Annekathrin Schacht

Keyword(s):

Pupil Diameter ◽

Methodological Approach ◽

Narrative Fiction ◽

Data Analyses ◽

Different Dimensions ◽

Literary Classics ◽

The 19th Century ◽

Physiological Indicator ◽

Over Time ◽

Cognitive Dimensions

Plot suspense is one of the most important components of narrative fiction that motivate recipients to follow fictional characters through their worlds. The present study investigates the dynamic development of narrative suspense in excerpts of literary classics from the 19th century in a multi-methodological approach. For two texts, differing in suspense as judged by a large independent sample, we collected (a) data from questionnaires, indicating different affective and cognitive dimensions of receptive engagement, (b) continuous ratings of suspense during text reception from both experts and lay recipients, and (c) registration of pupil diameter as a physiological indicator of changes in emotional arousal and attention during reception. Data analyses confirmed differences between the two texts on different dimensions of receptive engagement and, importantly, revealed significant correlations between pupil diameter and the course of suspense over time. Our findings demonstrate that changes in pupil diameter provide a reliable ‘online’ indicator of suspense.

Disparity and Dynamics of Social Distancing Behaviors in Japan: An Investigation of mobile phone mobility data (Preprint)

10.2196/preprints.31557 ◽

2021 ◽

Author(s):

Zeyu Lyu ◽

Hiroki Takikawa

Keyword(s):

Large Scale ◽

Age Groups ◽

Mitigation Strategies ◽

Social Distancing ◽

Demographic Groups ◽

Fine Grained ◽

Mobility Data ◽

The Social ◽

The Government ◽

Over Time

BACKGROUND The availability of large-scale, fine-grained aggregated mobility data has allowed researchers to observe the dynamics of social distancing behaviors at high spatial and temporal resolutions. Despite the increasing attention paid to this research agenda, few studies have focused on the demographic factors related to mobility, and the dynamics of social distancing behaviors have not been fully investigated. OBJECTIVE This study aims to assist in the design and implementation of public health policies by exploring social distancing behaviors among various demographic groups over time. METHODS We combined several data sources, including mobile tracking data and geographical statistics, to estimate the visiting population of entertainment venues across demographic groups, which can be considered a proxy for social distancing behaviors. We then employed time series analysis methods to investigate how voluntary and policy-induced social distancing behaviors shift over time across demographic groups. RESULTS Our findings demonstrate distinct patterns of social distancing behaviors and their dynamics across age groups. The population in entertainment venues consisted mainly of individuals aged 20–40 years, while, according to the dynamics of the mobility index and the policy-induced behavior, the reduction in the frequency of visiting entertainment venues during the pandemic was generally greatest among younger individuals. Our results also indicate the importance of implementing social distancing policy promptly to limit the spread of COVID-19. However, it should be noted that although the policy intervention during the second wave in Japan appeared to increase awareness of the severity of the pandemic and concerns regarding COVID-19, its direct impact decreased substantially and lasted only a short time.
CONCLUSIONS At the time of writing, the number of daily confirmed cases in Japan was continuously increasing. Thus, this study provides a timely reference for decision makers about the current situation of policy-induced compliance behaviors. On the one hand, age-dependent disparity requires targeted mitigation strategies to increase the intention of elderly individuals to adopt mobility restriction behaviors. On the other hand, considering the decreasing impact of self-restriction recommendations, the government should employ policy interventions that limit the resurgence of cases, especially by imposing stronger, stricter social distancing interventions, as they are necessary to promote social distancing behaviors and mitigate the transmission of COVID-19. CLINICALTRIAL None

Probabilistic Inference of Fine-Grained Data Provenance

Lecture Notes in Computer Science - Database and Expert Systems Applications ◽

10.1007/978-3-642-32600-4_22 ◽

2012 ◽

pp. 296-310 ◽

Cited By ~ 1

Author(s):

Mohammad Rezwanul Huq ◽

Peter M. G. Apers ◽

Andreas Wombacher

Keyword(s):

Probabilistic Inference ◽

Data Provenance ◽

Fine Grained

Introducing Advanced Fine-grained Security in dCache-SRM for PetaByte-scale Storage Systems on Global Data Grids: gPLAZMA `grid-aware PLuggable AuthoriZation MAnagement System'

2006 IEEE Nuclear Science Symposium Conference Record ◽

10.1109/nssmic.2006.356233 ◽

2006 ◽

Cited By ~ 1

Author(s):

Abhishek Singh Rana ◽

Frank Wurthwein ◽

Timur Perelmutov ◽

Robert Kennedy ◽

Jon Bakken ◽

...

Keyword(s):

Management System ◽

Storage Systems ◽

Data Grids ◽

Fine Grained ◽

Authorization Management ◽

Global Data

Replication of Data Analyses

Stepping in the Same River Twice ◽

10.12987/yale/9780300209549.003.0014 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emery R. Boose ◽

Barbara S. Lerner

Keyword(s):

Original Work ◽

Statistical Tests ◽

Scientific Paper ◽

Scientific Data ◽

Data Sources ◽

Narrative Form ◽

Data Provenance ◽

General Description ◽

Data Set ◽

Data Analyses

The metadata that describe how scientific data are created and analyzed are typically limited to a general description of data sources, software used, and statistical tests applied and are presented in narrative form in the methods section of a scientific paper or a data set description. Recognizing that such narratives are usually inadequate to support reproduction of the analysis of the original work, a growing number of journals now require that authors also publish their data. However, finer-scale metadata that describe exactly how individual items of data were created and transformed and the processes by which this was done are rarely provided, even though such metadata have great potential to improve data set reliability. This chapter focuses on the detailed process metadata, called “data provenance,” required to ensure reproducibility of analyses and reliable re-use of the data.

QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science

10.7287/peerj.preprints.27295v2 ◽

2018 ◽

Cited By ~ 90

Author(s):

Evan Bolyen ◽

Jai Ram Rideout ◽

Matthew R Dillon ◽

Nicholas A Bokulich ◽

Christian Abnet ◽

...

Keyword(s):

Data Science ◽

Temporal Analysis ◽

Data Provenance ◽

Policy Makers ◽

Shotgun Metagenomics ◽

Spatial And Temporal Analysis ◽

Microbiome Research ◽

Scientists And Engineers ◽

Visualization Tools ◽

Microbiome Data

We present QIIME 2, an open-source microbiome data science platform accessible to users spanning the microbiome research ecosystem, from scientists and engineers to clinicians and policy makers. QIIME 2 provides new features that will drive the next generation of microbiome research. These include interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.

Fine-grained lineage for safer notebook interactions

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447712 ◽

2021 ◽

Vol 14 (6) ◽

pp. 1093-1101

Author(s):

Stephen Macke ◽

Hongpu Gong ◽

Doris Jung-Lin Lee ◽

Andrew Head ◽

Doris Xin ◽

...

Keyword(s):

Static Analysis ◽

Intermediate State ◽

Data Science ◽

Safety Issues ◽

Fine Grained ◽

Potential Safety

Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates, making execution behavior difficult to reason about, and leading to errors and lack of reproducibility. We present nbsafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. nbsafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate nbsafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, nbsafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that nbsafety identified as resolving safety issues were more than 7X more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using nbsafety and were therefore not influenced by its suggestions.
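
The hidden-state problem described above can be illustrated with a small Python sketch. It is not nbsafety's implementation or API, and it does no runtime tracing: it only walks each cell's syntax tree with the standard `ast` module and compares logical timestamps. The `Notebook` class and its methods are hypothetical names introduced for this example. A cell is flagged as stale when a name it reads has been reassigned since the cell last ran.

```python
import ast


def defs_and_uses(source):
    """Return (names assigned, names read) in a cell, from a static AST walk."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return defined, used


class Notebook:
    """Hypothetical lineage tracker; cells are recorded, not actually executed."""

    def __init__(self):
        self.clock = 0         # logical time, advanced on every cell run
        self.defined_at = {}   # symbol name -> time it was last assigned
        self.cells = {}        # cell id -> (time last run, names the cell reads)

    def run(self, cell_id, source):
        """Record a (re)run of a cell and update symbol timestamps."""
        self.clock += 1
        defined, used = defs_and_uses(source)
        for name in defined:
            self.defined_at[name] = self.clock
        self.cells[cell_id] = (self.clock, used)

    def stale_cells(self):
        """Ids of cells that read a symbol reassigned after they last ran."""
        return sorted(
            cell_id
            for cell_id, (ran_at, used) in self.cells.items()
            if any(self.defined_at.get(name, 0) > ran_at for name in used)
        )


nb = Notebook()
nb.run(1, "x = 1")
nb.run(2, "y = x + 1")
nb.run(1, "x = 5")        # cell 1 is edited and rerun
print(nb.stale_cells())   # cell 2 still reflects the old x
```

This sketch tracks only plain variable names at the top level of a cell. Attribute mutation, aliasing, and function bodies would need the kind of runtime information the abstract attributes to tracing.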

Big Data, Data Science, and Career Pathways

Career Pathways ◽

10.1093/oso/9780190907785.003.0014 ◽

2020 ◽

pp. 239-254

Author(s):

David W. Dorsey

Keyword(s):

Big Data ◽

Data Science ◽

Career Pathways ◽

Unstructured Data ◽

Future Application ◽

The Internet ◽

The Future ◽

Skill Requirements ◽

Enormous Number ◽

Over Time

With the rise of the internet and the related explosion in the amount of data that are available, the field of data science has expanded rapidly, and analytic techniques designed for use in “big data” contexts have become popular. These include techniques for analyzing both structured and unstructured data. This chapter explores the application of these techniques to the development and evaluation of career pathways. For example, data scientists can analyze online job listings and resumes to examine changes in skill requirements and careers over time and to examine job progressions across an enormous number of people. Similarly, analysts can evaluate whether information on career pathways accurately captures realistic job progressions. Within organizations, the increasing amount of data makes it possible to pinpoint the specific skills, behaviors, and attributes that maximize performance in specific roles. The chapter concludes with ideas for the future application of big data to career pathways.
