Privacy protected graphical functionality in DataSHIELD

ABSTRACT Objectives: In several disciplines, such as biomedicine and the social sciences, the analysis of individual-level data or the co-analysis of data from different studies requires pooling and sharing those data. However, sharing and combining sensitive individual-level data is often prohibited by ethico-legal constraints and hindered by other barriers, such as the need to maintain control over the data and the sheer size of the samples. Graphical display of microdata is also often forbidden because it can expose sensitive, identifying information. For example, a standard scatterplot is disclosive because it reveals the exact values of two measurements for every individual. Approach: DataSHIELD (www.datashield.ac.uk) is a novel approach that allows the analysis of sensitive individual-level data, and the co-analysis of such data from several studies simultaneously, without physically pooling the data. Results: DataSHIELD provides a range of functions that allow data to be analysed flexibly with different statistical techniques. Part of this environment is a set of graphical functions for illustrating the statistical properties of variables and the relationships between them. We give an overview of the graphical functions in DataSHIELD (ds.histogram, ds.heatmapPlot, ds.contourPlot) and demonstrate a number of new functions, including ds.scatterPlot and ds.boxPlot, which are built on computational approaches such as the k-Nearest Neighbours algorithm and ensure privacy-protected analysis. Conclusion: The DataSHIELD graphical functionality has methodological features that represent the relationships between variables while preserving their statistical properties and protecting data privacy. These graphical approaches can be used or enhanced for application in various areas where confidentiality and information sensitivity are a concern, for example in longitudinal data and survival analysis, epidemiological studies, geospatial analysis and several others.

UNDER THE WATCHFUL EYE: USERS’ PERCEPTIONS OF ONLINE PRIVACY AND SURVEILLANCE

AoIR Selected Papers of Internet Research ◽

10.5210/spir.v2020i0.11335 ◽

2020 ◽

Author(s):

Alecea Irene Standlee

Keyword(s):

Data Privacy ◽

The United States ◽

Group Affiliation ◽

Individual Level ◽

Internet Users ◽

Level Data ◽

Qualitative Work ◽

The Impact ◽

Depth Interviews ◽

Fundamental Shift

This project seeks to contribute to the question, “How do internet users navigate data privacy in a digitally surveilled online world?” I augment this ongoing discussion by examining the perceptions and practices concerning privacy and self-representation in digital spaces among young adults aged 18-22. This qualitative work utilizes in-depth interviews of college students in the United States to collect both behavioral and attitudinal patterns. Specifically, I consider the impact of the strategic interventions of corporate and governmental platforms to collect, distribute, and utilize individual-level data on research participants’ information consumption, individual identity representation, and group affiliation. A preliminary analysis of the data finds that participants engage in narrative rationalizations to help them navigate the cultural expectations of online engagement within a surveilled environment. Patterns of strategic self-representation are shaped by such rationalizations and justifications, including a fundamental shift in what the concept "privacy" means in an online world.

A software package for the application of probabilistic anonymisation to sensitive individual-level data: a proof of principle with an example from the ALSPAC birth cohort study

Longitudinal and Life Course Studies ◽

10.14301/llcs.v9i4.478 ◽

2018 ◽

Vol 9 (4) ◽

pp. 433-446 ◽

Cited By ~ 1

Author(s):

Demetris Avraam ◽

Andy Boyd ◽

Harvey Goldstein ◽

Paul Burton

Keyword(s):

Cohort Study ◽

Birth Cohort ◽

Software Package ◽

Birth Cohort Study ◽

Individual Level ◽

Level Data ◽

Sensitive Individual ◽

Proof Of Principle

Variant-specific inflation factors for assessing population stratification at the phenotypic variance level

Nature Communications ◽

10.1038/s41467-021-23655-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Tamar Sofer ◽

Xiuwen Zheng ◽

Cecelia A. Laurie ◽

Stephanie M. Gogarten ◽

Jennifer A. Brody ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Statistical Power ◽

Epidemiological Studies ◽

Phenotypic Variance ◽

Whole Genome ◽

Association Analyses ◽

Individual Level ◽

Level Data ◽

The Impact

Abstract: In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term ‘variance stratification’. Unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.

Linking Individual-level Facebook Posts with Psychological and Health Data in an Epidemiologic Cohort: A Feasibility Study (Preprint)

10.2196/preprints.32423 ◽

2021 ◽

Author(s):

Peter James ◽

Claudia Trudel-Fitzgerald ◽

Harold H Lee ◽

Hayami K Koga ◽

Laura D Kubzansky ◽

...

Keyword(s):

Social Media ◽

Older Women ◽

Psychological Factors ◽

Data Privacy ◽

Third Party ◽

Health Study ◽

Participation Rates ◽

Individual Level ◽

Exit Survey ◽

Level Data

BACKGROUND Psychological factors (e.g., depression, optimism) and related biological and behavioral responses are associated with numerous physical health outcomes. The majority of research in this area relies on self-reported assessments of psychological factors, which are difficult to scale because they may be expensive to administer and time-consuming to complete. Investigators are increasingly interested in using social media as a novel and convenient platform for obtaining information rapidly in large populations. OBJECTIVE We evaluated the feasibility of obtaining Facebook data from a large ongoing cohort of midlife and older women which may be used to assess psychological functioning efficiently with low cost. METHODS This protocol was conducted with participants in the Nurses’ Health Study II (NHSII) which was started in 1989 with biennial follow-ups. Facebook does not share data readily; therefore, we developed procedures to enable women to download and transfer their Facebook data to the cohort servers (for linkage with other study data they have provided). Since privacy is a critical concern when collecting individual-level data, we partnered with a third-party software developer, Digi.me, to enable participants to obtain their own Facebook data and to send it securely to our research team. In 2020, we invited a subset of the 18,519 NHSII participants (aged 56-73 years) via email to participate. Women were selected if they reported on the 2017-2018 questionnaire that they regularly posted to Facebook and were still active cohort participants. We included an exit survey for those who chose not to participate to gauge reasons for non-participation. RESULTS We invited 309 women to participate. Few women signed the consent form (N=52) and only three used the Digi.me app to download and transfer their Facebook data. 
These low participation rates were observed despite modifying our protocol between waves of recruitment, including by 1) excluding active healthcare workers, who might be less available to participate due to the pandemic; 2) developing a Frequently Asked Questions factsheet to provide more information regarding the protocol; and 3) simplifying the instructions for using the Digi.me app. On our exit survey, the most commonly reported reasons for not participating were concerns regarding data privacy and hesitation about sharing personal Facebook posts. The low participation rates suggest that obtaining individual-level Facebook data in a cohort of middle-aged and older women may be challenging. CONCLUSIONS In this cohort of midlife and older women who had been actively participating for over three decades, we were largely unable to obtain permission to access individual-level data from participants’ Facebook accounts. Despite working with a third party to customize an app to implement safeguards for privacy, data privacy remained a key concern for these women. Future studies aiming to leverage individual-level social media data should explore alternate populations or means of sharing social media data.

Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008880 ◽

2021 ◽

Vol 17 (3) ◽

pp. e1008880

Author(s):

Yannick Marcon ◽

Tom Bishop ◽

Demetris Avraam ◽

Xavier Escriba-Montagut ◽

Patricia Ryser-Welch ◽

...

Keyword(s):

Big Data ◽

Epidemiological Studies ◽

Individual Level ◽

Data Analyses ◽

Level Data ◽

Data Integration System ◽

R Packages ◽

Legal Constraints ◽

Online Book ◽

The Individual

Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers’ ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture (“resources”) for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).

Alternatives to Social Science One

Political Science and Politics ◽

10.1017/s1049096520000438 ◽

2020 ◽

Vol 53 (4) ◽

pp. 710-711

Author(s):

Margaret Levi ◽

Betsy Rajala

Keyword(s):

Social Science ◽

Administrative Data ◽

Third Party ◽

Social Scientists ◽

Social Good ◽

Individual Level ◽

Level Data ◽

Highly Sensitive ◽

Research Facilities ◽

Sensitive Individual

ABSTRACT This article responds to King and Persily’s (2019) proposal for a new model of industry–academic partnership using an independent third party to mediate between firms and academics. We believe this is a reasonable proposal for highly sensitive individual-level data, but it may not be appropriate for all types of data. We explore alternative options to their proposal, including Administrative Data Research Facilities, Data Collaboratives at GovLab, and Tech Data for Social Good Initiative at the Center for Advanced Study in the Behavioral Sciences. We believe social scientists should continue to explore, evaluate, and scale a variety of industry–academic data-sharing models.

Islam and Female Education: Evidence from Individual-level Data

SSRN Electronic Journal ◽

10.2139/ssrn.366340 ◽

2003 ◽

Author(s):

Mandana Hajj ◽

Ugo G. Panizza

Keyword(s):

Female Education ◽

Individual Level ◽

Level Data

Review of Associations between Built Environment Characteristics and Severe Acute Respiratory Syndrome Coronavirus 2 Infection Risk

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18147561 ◽

2021 ◽

Vol 18 (14) ◽

pp. 7561

Author(s):

Jingjing Wang ◽

Xueying Wu ◽

Ruoyu Wang ◽

Dongsheng He ◽

Dongying Li ◽

...

Keyword(s):

Severe Acute Respiratory Syndrome ◽

Built Environment ◽

Longitudinal Research ◽

Empirical Studies ◽

Infection Risk ◽

Design Guidelines ◽

Individual Level ◽

Transit Accessibility ◽

Level Data ◽

Transmission Pathways

The coronavirus disease 2019 pandemic has stimulated intensive research interest in its transmission pathways and infection factors, e.g., socioeconomic and demographic characteristics, climatology, baseline health conditions or pre-existing diseases, and government policies. Meanwhile, some empirical studies suggested that built environment attributes may be associated with the transmission mechanism and infection risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, no review has been conducted to explore the effect of built environment characteristics on the infection risk. This research gap prevents government officials and urban planners from creating effective urban design guidelines to contain SARS-CoV-2 infections and face future pandemic challenges. This review summarizes evidence from 25 empirical studies and provides an overview of the effect of built environment on SARS-CoV-2 infection risk. Virus infection risk was positively associated with the density of commercial facilities, roads, and schools and with public transit accessibility, whereas it was negatively associated with the availability of green spaces. This review recommends several directions for future studies, namely using longitudinal research design and individual-level data, considering multilevel factors and extending to diversified geographic areas.

The Role of Snack Choices, Body Weight Stereotypes and Smoking Behavior in Assessing Risk Factors for Adolescent Overweight and Obesity

Foods ◽

10.3390/foods10030557 ◽

2021 ◽

Vol 10 (3) ◽

pp. 557

Author(s):

Elena Raptou

Keyword(s):

Body Weight ◽

Smoking Behavior ◽

Negative Influence ◽

Overweight And Obesity ◽

Individual Level ◽

Level Data ◽

Prevention Interventions ◽

Frequent Consumption ◽

Adolescent Overweight ◽

Relationship Of

This study investigated the relationship of behavioral factors, such as snack choices, obesity stereotypes and smoking, with adolescents’ body weight. Individual-level data for 1254 Greek youths were selected via a formal questionnaire. Snack choices seem to be gender specific, with girls showing a stronger preference for healthier snacks. Frequent consumption of high-calorie and more filling snacks was found to increase Body Mass Index (BMI) in both genders. Fruit/vegetable snacks were associated with lower body weight in females, whereas cereal/nut snacks had a negative influence on males’ BMI. The majority of participants expressed anti-fat attitudes, and more boys than girls assigned positive attributes to lean peers. The endorsement of the thin ideal was positively associated with the BMI of both adolescent boys and girls. This study also revealed that neglecting potential endogeneity issues can lead to biased estimates of smoking. Gender may be a crucial moderator of smoking–BMI relationships. Male smokers presented a higher obesity risk, whereas female smokers were more likely to be underweight. Nutrition professionals should focus on increasing the acceptance of healthy snack options. Gender differences in the influence of weight stereotypes and smoking on BMI should be considered in order to enhance the efficacy of obesity prevention interventions.

When Marriage Gets Hard: Intra-Coalition Conflict and Electoral Accountability

Comparative Political Studies ◽

10.1177/00104140211024307 ◽

2021 ◽

pp. 001041402110243

Author(s):

Carolina Plescia ◽

Sylvia Kritzinger

Keyword(s):

Economic Performance ◽

Political Representation ◽

Economic Evaluations ◽

Responsibility Attribution ◽

Electoral Accountability ◽

Individual Level ◽

Coalition Governments ◽

Level Data ◽

European Election ◽

Election Studies

Combining individual-level with event-level data across 25 European countries and three sets of European Election Studies, this study examines the effect of conflict between parties in coalition government on electoral accountability and responsibility attribution. We find that conflict increases punishment for poor economic performance precisely because it helps clarify to voters parties’ actions and responsibilities while in office. The results indicate that under conditions of conflict, the punishment is equal for all coalition partners when they share responsibility for poor economic performance. When there is no conflict within a government, the effect of poor economic evaluations on vote choice is rather low, with slightly more punishment targeted to the prime minister’s party. These findings have important implications for our understanding of electoral accountability and political representation in coalition governments.
