Private information leakage from functional genomics data: Quantification with calibration experiments and reduction via data sanitization protocols

2018 ◽  
Author(s):  
Gamze Gürsoy ◽  
Prashant Emani ◽  
Charlotte M. Brannon ◽  
Otto A. Jolanki ◽  
Arif Harmanci ◽  
...  

Abstract: The generation of functional genomics datasets is surging, as they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intention of functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to share raw reads for better analyses and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, thus enabling principled privacy-utility trade-offs. It works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA-sequencing. The procedure depends on quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.

Cell ◽  
2020 ◽  
Vol 183 (4) ◽  
pp. 905-917.e16
Author(s):  
Gamze Gürsoy ◽  
Prashant Emani ◽  
Charlotte M. Brannon ◽  
Otto A. Jolanki ◽  
Arif Harmanci ◽  
...  

2019 ◽  
Author(s):  
Gamze Gürsoy ◽  
Charlotte M. Brannon ◽  
Fabio C.P. Navarro ◽  
Mark Gerstein

Abstract: Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying the privacy leakage by genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. FANCY can predict the cumulative number of leaking SNVs with a 0.95 average R² for all independent test sets. Because accurate prediction matters even when the number of leaked variants is low, we developed a special version of the model that makes more accurate predictions when only a few variants leak. A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org.


Author(s):  
Gamze Gürsoy ◽  
Charlotte M Brannon ◽  
Fabio C P Navarro ◽  
Mark Gerstein

Abstract. Motivation: Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. Results: FANCY can predict the cumulative number of leaking SNVs with an average 0.95 R² for all independent test sets. We realize the importance of accurate prediction when the number of leaked variants is low. Thus, we develop a special version of the model, which can make predictions with higher accuracy when the number of leaking variants is low. Availability and implementation: A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
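
The general idea of predicting a leakage count from a summary sequencing statistic, and scoring the prediction with R² on held-out data, can be sketched as follows. This is only an illustration and not FANCY's actual model or feature set: the single feature (`depth`), the synthetic relationship between depth and leaked variants, and the helper names `fit_line` and `r_squared` are all assumptions made up for the example, and plain one-feature least squares stands in for whatever regression the tool uses.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data, invented purely for illustration (not real measurements).
rng = random.Random(0)
depth = [rng.uniform(1, 50) for _ in range(200)]         # hypothetical feature
leaks = [120.0 * d + rng.gauss(0, 300) for d in depth]   # hypothetical target

# Fit on the first 150 samples, evaluate on the remaining 50.
train_x, test_x = depth[:150], depth[150:]
train_y, test_y = leaks[:150], leaks[150:]
intercept, slope = fit_line(train_x, train_y)
preds = [intercept + slope * x for x in test_x]
r2 = r_squared(test_y, preds)
print(f"slope={slope:.1f} intercept={intercept:.1f} held-out R^2={r2:.3f}")
```

Evaluating on samples that were not used for fitting mirrors the abstract's emphasis on independent test sets; a real application would use several sequencing statistics as features rather than one.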


2019 ◽  
Vol 32 (6) ◽  
pp. 1679-1703 ◽  
Author(s):  
Le Wang ◽  
Zao Sun ◽  
Xiaoyong Dai ◽  
Yixin Zhang ◽  
Hai-hua Hu

Purpose: The purpose of this paper is to facilitate understanding of how to mitigate the privacy concerns of users who have experienced privacy invasions. Design/methodology/approach: Drawing on the communication privacy management theory, the authors developed a model suggesting that privacy concerns form through a cognitive process involving threat-coping appraisals, institutional privacy assurances and privacy experiences. The model was tested using data from an empirical survey with 913 randomly selected social media users. Findings: Privacy concerns are jointly determined by perceived privacy risks and privacy self-efficacy. The perceived effectiveness of institutional privacy assurances in terms of established privacy policies and privacy protection technology influences the perceptions of privacy risks and privacy self-efficacy. More specifically, privacy invasion experiences are negatively associated with the perceived effectiveness of institutional privacy assurances. Research limitations/implications: Privacy concerns are conceptualized as general concerns that reflect an individual’s worry about the possible loss of private information. The specific types of private information were not differentiated. Originality/value: This paper is among the first to clarify the specific mechanisms through which privacy invasion experiences influence privacy concerns. Privacy concerns have long been viewed as resulting from individual actions. The study contributes to the literature by linking privacy concerns with institutional privacy practice.


2021 ◽  
Vol 10 (1) ◽  
pp. 25
Author(s):  
Junghwan Kim ◽  
Mei-Po Kwan

This paper examines people’s privacy concerns, perceptions of social benefits, and acceptance of various COVID-19 control measures that harness location information using data collected through an online survey in the U.S. and South Korea. The results indicate that people have higher privacy concerns for methods that use more sensitive and private information. The results also reveal that people’s perceptions of social benefits are low when their privacy concerns are high, indicating a trade-off relationship between privacy concerns and perceived social benefits. Moreover, the acceptance by South Koreans for most mitigation methods is significantly higher than that by people in the U.S. Lastly, the regression results indicate that South Koreans (compared to people in the U.S.) and people with a stronger collectivist orientation tend to have higher acceptance for the control measures because they have lower privacy concerns and perceive greater social benefits for the measures. These findings advance our understanding of the important role of geographic context and culture as well as people’s experiences of the mitigation measures applied to control a previous pandemic.


2021 ◽  
Vol 24 (3) ◽  
pp. 1-36
Author(s):  
Meisam Mohammady ◽  
Momen Oqaily ◽  
Lingyu Wang ◽  
Yuan Hong ◽  
Habib Louafi ◽  
...  

As network security monitoring grows more sophisticated, there is an increasing need for outsourcing such tasks to third-party analysts. However, organizations are usually reluctant to share their network traces due to privacy concerns over sensitive information, e.g., network and system configuration, which may potentially be exploited for attacks. In cases where data owners are convinced to share their network traces, the data are typically subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces real IP addresses with prefix-preserving pseudonyms. However, most such techniques either are vulnerable to adversaries with prior knowledge about some network flows in the traces or require heavy data sanitization or perturbation, which may result in a significant loss of data utility. In this article, we aim to preserve both privacy and utility through shifting the trade-off from between privacy and utility to between privacy and computational cost. The key idea is for the analysts to generate and analyze multiple anonymized views of the original network traces: Those views are designed to be sufficiently indistinguishable even to adversaries armed with prior knowledge, which preserves the privacy, whereas one of the views will yield true analysis results privately retrieved by the data owner, which preserves the utility. We formally analyze the privacy of our solution and experimentally evaluate it using real network traces provided by a major ISP. The experimental results show that our approach can significantly reduce the level of information leakage (e.g., less than 1% of the information leaked by CryptoPAn) with comparable utility.
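
The "prefix-preserving" property mentioned above can be illustrated with a toy sketch: two addresses that share their first k bits are mapped to pseudonyms that also share exactly their first k bits. This is not CryptoPAn itself and not the multi-view scheme of the article. HMAC-SHA256 is used here as a stand-in pseudorandom function, and the function names and the key are hypothetical choices for the example.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving pseudonymization of an IPv4 address.

    Bit i of the output is bit i of the input XORed with one pseudorandom
    bit derived from the key and the preceding i input bits. Addresses with
    a common prefix therefore get the same flips over that prefix.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        prefix = bits[:i].encode()
        digest = hmac.new(key, prefix, hashlib.sha256).digest()
        flip = digest[0] & 1  # one pseudorandom bit per prefix
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"example-key-for-illustration-only"  # hypothetical key
a, b = "10.1.2.3", "10.1.200.7"
pa, pb = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
print(a, "->", pa)
print(b, "->", pb)
print("shared prefix bits before/after:",
      common_prefix_len(a, b), common_prefix_len(pa, pb))
```

Because subnet structure survives the mapping, an adversary who already knows a few real flows can use them to unmask related addresses, which is the weakness the abstract attributes to such techniques.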


2013 ◽  
Vol 53 (8) ◽  
pp. 796 ◽  
Author(s):  
Karl Behrendt ◽  
Oscar Cacho ◽  
James M. Scott ◽  
Randall Jones

This study addresses the problem of balancing the trade-offs between the need for animal production, profit, and the goal of achieving persistence of desirable species within grazing systems. The bioeconomic framework applied in this study takes into account the impact of climate risk and the management of pastures and grazing rules on the botanical composition of the pasture resource, a factor that affects livestock production and economic returns over time. The framework establishes the links between inputs, the state of the pasture resource and outputs, to identify optimal pasture development strategies. The analysis is based on the application of a dynamic pasture resource development simulation model within a seasonal stochastic dynamic programming framework. This enables the derivation of optimum decisions within complex grazing enterprises, over both short-term tactical (such as grazing rest) and long-term strategic (such as pasture renovation) time frames and under climatic uncertainty. The simulation model is parameterised using data and systems from the Cicerone Project farmlet experiment. Results indicate that the strategic decision of pasture renovation should only be considered when pastures are in a severely degraded state, whereas the tactical use of grazing rest or low stocking rates should be considered as the most profitable means of maintaining adequate proportions of desirable species within a pasture sward. The optimal stocking rates identified reflected a pattern which may best be described as a seasonal saving and consumption cycle. The optimal tactical and strategic decisions at different pasture states, based on biomass and species composition, vary both between seasons and in response to the imposed soil fertility regime. Implications of these findings at the whole-farm level are discussed in the context of the Cicerone Project farmlets.


2014 ◽  
Vol 281 (1796) ◽  
pp. 20141733 ◽  
Author(s):  
Alexandra Alvergne ◽  
Virpi Lummaa

The negative wealth–fertility relationship brought about by market integration remains a puzzle to classic evolutionary models. Evolutionary ecologists have argued that this phenomenon results from both stronger trade-offs between reproductive and socioeconomic success in the highest social classes and the comparison of groups rather than individuals. Indeed, studies in contemporary low fertility settings have typically used aggregated samples that may mask positive wealth–fertility relationships. Furthermore, while much evidence attests to trade-offs between reproductive and socioeconomic success, few studies have explicitly tested the idea that such constraints are intensified by market integration. Using data from Mongolia, a post-socialist nation that underwent mass privatization, we examine wealth–fertility relationships over time and across a rural–urban gradient. Among post-reproductive women, reproductive fitness is the lowest in urban areas, but increases with wealth in all regions. After liberalization, a demographic–economic paradox emerges in urban areas: while educational attainment negatively impacts female fertility in all regions, education uniquely provides socioeconomic benefits in urban contexts. As market integration progresses, socio-economic returns to education increase and women who limit their reproduction to pursue education get wealthier. The results support the view that selection favoured mechanisms that respond to opportunities for status enhancement rather than fertility maximization.

