Differential Privacy under Dependent Tuples - The Case of Genomic Privacy

2019 ◽  
Author(s):  
Nour Almadhoun ◽  
Erman Ayday ◽  
Özgür Ulusoy

Abstract
Motivation: The rapid progress in genome sequencing has led to high availability of genomic data. However, due to growing privacy concerns about participants' sensitive information, access to the results and data of genomic studies is restricted to trusted individuals only. On the other hand, paving the way to biomedical discoveries requires granting open access to genomic databases. Privacy-preserving mechanisms can be a solution for granting wider access to such data while protecting their owners. In particular, there has been growing interest in applying the concept of differential privacy (DP) when sharing summary statistics about genomic data. DP provides a mathematically rigorous approach, but it does not consider the dependence between tuples in a database, which may degrade the privacy guarantees offered by DP.
Results: In this work, focusing on genomic databases, we show this drawback of DP and propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an inference attack on differentially private query results by utilizing the correlations between the tuples in the dataset. The results show that an adversary can infer sensitive genomic data about a user from differentially private query results by exploiting correlations between the genomes of family members. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets that attains privacy guarantees while taking into account the dependence between tuples. By evaluating our mechanism on different genomic datasets, we empirically demonstrate that it can achieve up to 50% better privacy than traditional DP-based solutions.
Availability: https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack
Supplementary information: Supplementary data are available at Bioinformatics online.
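As a rough illustration of the kind of query such an attack targets, the sketch below (Python, not the authors' implementation) releases a minor allele frequency through the Laplace mechanism; the `dependence_factor` parameter is a hypothetical knob showing how the noise would have to grow if correlated family members were treated as one logical record.

```python
import numpy as np

def dp_minor_allele_frequency(genotypes, epsilon, dependence_factor=1.0):
    """Release a differentially private minor allele frequency (MAF).

    genotypes: array of 0/1/2 minor-allele counts, one entry per participant.
    epsilon: privacy budget spent on this single query.
    dependence_factor: >= 1; a hypothetical knob that inflates the effective
        sensitivity when tuples are correlated (e.g. k family members treated
        as one logical record), illustrating why correlation forces more noise.
    """
    n = len(genotypes)
    true_maf = np.sum(genotypes) / (2 * n)
    # Replacing one independent participant's genotype changes the MAF by at
    # most 2 / (2n) = 1/n (bounded-DP view of the sensitivity).
    sensitivity = dependence_factor * (1.0 / n)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_maf + noise

# Example: 1000 simulated participants, epsilon = 0.5.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.2, size=1000)
print(dp_minor_allele_frequency(genotypes, epsilon=0.5, dependence_factor=2.0))
```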

2019 ◽  
Author(s):  
Dennis Grishin ◽  
Jean Louis Raisaro ◽  
Juan Ramón Troncoso-Pastoriza ◽  
Kamal Obbad ◽  
Kevin Quinn ◽  
...  

Abstract
The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal-genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a novel approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. This approach provides strong security guarantees against realistic threat models by empowering individual citizens to decide who can query and access their genomic data and by ensuring end-to-end data confidentiality. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. Our work opens a path towards multi-functional, privacy-preserving genomic-data analysis.
One Sentence Summary: A citizen-centered open-source response to the privacy concerns that hinder population genomics, based on modern cryptography.
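To make the data flow concrete, here is a minimal, self-contained sketch of additive secret sharing, one of the building blocks behind secure multi-party computation; the three-site counting scenario and the field size are illustrative assumptions, not the paper's protocol.

```python
import random

PRIME = 2**61 - 1  # field modulus for additive secret sharing (illustrative)

def share(value, n_parties):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three sites each hold a private count of carriers of a given variant.
site_counts = [12, 7, 30]
n_parties = 3

# Each site secret-shares its count; computing party i sums the i-th shares.
all_shares = [share(c, n_parties) for c in site_counts]
partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(n_parties)]

# Only the aggregate count is reconstructed; per-site counts stay hidden.
print(reconstruct(partial_sums))  # 49
```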


Author(s):  
Mete Akgün ◽  
Ali Burak Ünal ◽  
Bekir Ergüner ◽  
Nico Pfeifer ◽  
Oliver Kohlbacher

Abstract
Motivation: The use of genome data for diagnosis and treatment is becoming increasingly common. Researchers need access to as many genomes as possible to interpret the patient genome, to obtain statistical patterns and to reveal disease–gene relationships. The sensitive information contained in genome data and the high risk of re-identification increase the privacy and security concerns associated with sharing such data. In this article, we present an approach to identify disease-associated variants and genes while ensuring patient privacy. The proposed method uses secure multi-party computation to find disease-causing mutations under specific inheritance models without sacrificing the privacy of individuals. It discloses only the variants or genes obtained as a result of the analysis; the vast majority of patient data can thus be kept private.
Results: Our prototype implementation performs analyses on thousands of genomes in milliseconds, and the runtime scales logarithmically with the number of patients. We present the first inheritance-model-based (recessive, dominant and compound heterozygous) privacy-preserving analysis of genomic data for finding disease-causing mutations. Furthermore, we re-implement the privacy-preserving methods (MAX, SETDIFF and INTERSECTION) proposed in a previous study. Our MAX, SETDIFF and INTERSECTION implementations are 2.5, 1122 and 341 times faster, respectively, than the corresponding operations of the state-of-the-art protocol.
Availability and implementation: https://gitlab.com/DIFUTURE/privacy-preserving-genomic-diagnosis
Supplementary information: Supplementary data are available at Bioinformatics online.
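For readers unfamiliar with inheritance-model filtering, the toy example below shows, in plaintext Python, the recessive-model check that such a pipeline would evaluate obliviously under secure multi-party computation; the trio genotypes and variant names are made up for illustration.

```python
# Genotypes encoded as minor-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
trio = {
    "child":  {"var1": 2, "var2": 1, "var3": 2},
    "mother": {"var1": 1, "var2": 1, "var3": 0},
    "father": {"var1": 1, "var2": 0, "var3": 2},
}

def recessive_candidates(trio):
    """Variants where the affected child is hom-alt and both parents are het."""
    return [
        v for v in trio["child"]
        if trio["child"][v] == 2
        and trio["mother"][v] == 1
        and trio["father"][v] == 1
    ]

print(recessive_candidates(trio))  # ['var1']
```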


2020 ◽  
Vol 10 (18) ◽  
pp. 6396
Author(s):  
Jong Wook Kim ◽  
Su-Mee Moon ◽  
Sang-ug Kang ◽  
Beakcheol Jang

The popularity of wearable devices equipped with a variety of sensors that can measure users' health status and monitor their lifestyle has been increasing. In fact, healthcare service providers have been utilizing these devices as a primary means of collecting considerable health data from users. Although the health data collected via wearable devices are useful for providing healthcare services, the indiscriminate collection of an individual's health data raises serious privacy concerns, because the data measured and monitored by wearable devices contain sensitive information about the wearer's personal health and lifestyle. Therefore, we propose a method to aggregate health data obtained from users' wearable devices in a privacy-preserving manner. The proposed method leverages local differential privacy, which is a de facto standard for privacy-preserving data processing and aggregation, to collect sensitive health data. In particular, to mitigate the error incurred by the perturbation mechanism of local differential privacy, the proposed scheme first samples a small number of salient data points that best represent the original health data, after which it collects the sampled salient data instead of the entire set of health data. Our experimental results show that the proposed sampling-based collection scheme achieves a significant improvement in estimation accuracy compared with straightforward solutions. Furthermore, the experimental results verify that an effective trade-off between the level of privacy protection and the accuracy of aggregate statistics can be achieved with the proposed approach.
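A minimal sketch of the two ingredients described above, assuming a bounded Laplace perturbation for the local randomizer and a simple distance-from-the-mean heuristic for picking salient points (the paper's actual sampling strategy may differ):

```python
import numpy as np

def ldp_perturb(value, epsilon, lower, upper):
    """Bounded Laplace-style local perturbation of a single sensor reading."""
    sensitivity = upper - lower
    noisy = value + np.random.laplace(0.0, sensitivity / epsilon)
    return float(np.clip(noisy, lower, upper))

def sample_salient(readings, k):
    """Keep the k readings farthest from the series mean as 'salient' points."""
    readings = np.asarray(readings, dtype=float)
    order = np.argsort(np.abs(readings - readings.mean()))[::-1]
    return sorted(int(i) for i in order[:k])

# A day's heart-rate trace: perturb and report a few salient samples
# instead of all 1440 minute-level readings.
rng = np.random.default_rng(1)
heart_rate = 70 + 10 * np.sin(np.linspace(0, 6, 1440)) + rng.normal(0, 2, 1440)
salient_idx = sample_salient(heart_rate, k=24)
report = [(i, ldp_perturb(heart_rate[i], epsilon=1.0, lower=40, upper=180))
          for i in salient_idx]
print(report[:3])
```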


2021 ◽  
Author(s):  
Jude TCHAYE-KONDI ◽  
Yanlong Zhai ◽  
Liehuang Zhu

We address privacy and latency issues in the edge/cloud computing environment while training a centralized AI model. In our particular case, the edge devices are the only data source for the model trained on the central server. Current solutions for preserving privacy and reducing network latency rely on a pre-trained feature extractor deployed on the devices to extract only the important features from the sensitive dataset. However, finding a pre-trained model or a public dataset from which to build a feature extractor for certain tasks may be very challenging. With the large amount of data generated by edge devices, the edge environment does not really lack data, but improper access to it may lead to privacy concerns. In this paper, we present DeepGuess, a new privacy-preserving and latency-aware deep-learning framework. DeepGuess uses a new learning mechanism enabled by the AutoEncoder (AE) architecture, called inductive learning, which makes it possible to train a central neural network using the data produced by end devices while preserving their privacy. With inductive learning, sensitive data remain on the devices and are not explicitly involved in any backpropagation process. The AE's encoder is deployed on the devices to extract and transfer important features to the server. To enhance privacy, we propose a new local differentially private algorithm that allows the edge devices to apply random noise to the features extracted from their sensitive data before they are transferred to an untrusted server. The experimental evaluation of DeepGuess demonstrates its effectiveness and ability to converge on a series of experiments.
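The following sketch illustrates the general shape of this idea, assuming a toy linear encoder in place of a trained autoencoder encoder and an illustrative Laplace noise scale; it is not the DeepGuess implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

class DeviceEncoder:
    """Toy linear encoder standing in for the on-device half of an autoencoder."""

    def __init__(self, in_dim, latent_dim):
        self.W = rng.normal(0, 0.1, size=(in_dim, latent_dim))

    def encode(self, x, epsilon, clip=1.0):
        z = x @ self.W                       # extract features on the device
        norm = np.linalg.norm(z)
        if norm > clip:                      # clip to bound each feature vector
            z = z * (clip / norm)
        # Illustrative Laplace noise; a real deployment would calibrate the
        # scale to the clipped features' actual L1 sensitivity.
        noise = rng.laplace(0.0, 2 * clip / epsilon, size=z.shape)
        return z + noise                     # only noisy features leave the device

encoder = DeviceEncoder(in_dim=32, latent_dim=8)
private_sample = rng.normal(size=32)         # sensitive record, never transmitted
noisy_features = encoder.encode(private_sample, epsilon=1.0)
print(noisy_features.round(3))
```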


Author(s):  
Shuo Han ◽  
George J. Pappas

Many modern dynamical systems, such as smart grids and traffic networks, rely on user data for efficient operation. These data often contain sensitive information that the participating users do not wish to reveal to the public. One major challenge is to protect the privacy of participating users when utilizing user data. Over the past decade, differential privacy has emerged as a mathematically rigorous approach that provides strong privacy guarantees. In particular, differential privacy has several useful properties, including resistance to both postprocessing and the use of side information by adversaries. Although differential privacy was first proposed for static-database applications, this review focuses on its use in the context of control systems, in which the data under processing often take the form of data streams. Through two major applications—filtering and optimization algorithms—we illustrate the use of mathematical tools from control and optimization to convert a nonprivate algorithm to its private counterpart. These tools also enable us to quantify the trade-offs between privacy and system performance.
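As a concrete, if simplified, example of privatizing a data stream, the sketch below adds Laplace noise to a running mean of bounded user data; the flat per-step budget is an assumption for illustration and ignores the composition accounting a real streaming mechanism requires.

```python
import numpy as np

def private_running_mean(stream, epsilon, bound=1.0):
    """Release a noisy running mean of a bounded data stream.

    Each element lies in [-bound, bound]. Noise is added to every released
    value; the flat per-step epsilon used here is illustrative and ignores
    the composition accounting a real streaming mechanism needs.
    """
    outputs, total = [], 0.0
    for t, x in enumerate(stream, start=1):
        total += x
        sensitivity = 2 * bound / t   # one user's data moves the mean by at most this
        noise = np.random.laplace(0.0, sensitivity / epsilon)
        outputs.append(total / t + noise)
    return outputs

# E.g. normalized per-user electricity demand arriving as a stream.
demand = np.clip(np.random.default_rng(2).normal(0, 0.3, 100), -1, 1)
print(private_running_mean(demand, epsilon=0.5)[-5:])
```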


Author(s):  
Ferdinando Fioretto ◽  
Lesia Mitridati ◽  
Pascal Van Hentenryck

This paper introduces a differentially private (DP) mechanism to protect the information exchanged during the coordination of sequential and interdependent markets. This coordination represents a classic Stackelberg game and relies on the exchange of sensitive information between the system agents. The paper is motivated by the observation that the perturbation introduced by traditional DP mechanisms fundamentally changes the underlying optimization problem and can even lead to unsatisfiable instances. To remedy this limitation, the paper introduces the Privacy-Preserving Stackelberg Mechanism (PPSM), a framework that enforces the notions of feasibility and fidelity (i.e. near-optimality) of the privacy-preserving information with respect to the original problem objective. PPSM complies with the notion of differential privacy and ensures that the outcomes of the privacy-preserving coordination mechanism are close to optimal for each agent. Experimental results on several gas and electricity market benchmarks based on a real case study demonstrate the effectiveness of the proposed approach. A full version of this paper [Fioretto et al., 2020b] contains complete proofs and additional discussion of the motivating application.
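A loose sketch of the perturb-then-repair idea, assuming placeholder box and total constraints rather than the paper's market model:

```python
import numpy as np

def perturb_and_repair(values, epsilon, lower, upper, total):
    """Illustrative two-step release: Laplace perturbation, then a repair step.

    The repair projects the noisy vector back onto simple feasibility
    constraints (box bounds and a fixed total), mimicking the idea of keeping
    the released data usable by a downstream optimization. The constraints
    are placeholders, not the paper's market model.
    """
    sensitivity = upper - lower
    noisy = values + np.random.laplace(0.0, sensitivity / epsilon, size=len(values))
    repaired = np.clip(noisy, lower, upper)       # restore box feasibility
    s = repaired.sum()
    if s > 0:                                     # restore the shared total
        repaired = np.clip(repaired * (total / s), lower, upper)
    return repaired

demands = np.array([30.0, 45.0, 25.0])            # sensitive per-node quantities
print(perturb_and_repair(demands, epsilon=1.0, lower=0.0, upper=60.0, total=100.0))
```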


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i136-i145
Author(s):  
Nour Almadhoun ◽  
Erman Ayday ◽  
Özgür Ulusoy

Abstract
Motivation: The rapid decrease in sequencing technology costs has led to a revolution in medical research and clinical care. Today, researchers have access to large genomic datasets to study associations between variants and complex traits. However, the availability of such genomic datasets also results in new privacy concerns about the personal information of participants in genomic studies. Differential privacy (DP) is one of the rigorous privacy concepts that has received widespread interest for sharing summary statistics from genomic datasets while protecting the privacy of participants against inference attacks. However, DP has a known drawback: it does not consider the correlation between dataset tuples. Therefore, the privacy guarantees of DP-based mechanisms may degrade if the dataset includes dependent tuples, which is a common situation for genomic datasets due to the inherent correlations between the genomes of family members.
Results: In this article, using two real-life genomic datasets, we show that exploiting the correlation between dataset participants leads to significant information leakage from differentially private results of complex queries. We formulate this as an attribute inference attack and show the privacy loss in minor allele frequency (MAF) and chi-square queries. Our results show that, using the results of differentially private MAF queries and the dependency between tuples, an adversary can reveal up to 50% more sensitive information about the genome of a target (compared with the original privacy guarantees of standard DP-based mechanisms), while differentially private chi-square queries can reveal up to 40% more sensitive information. Furthermore, we show that the adversary can use the genomic data inferred through the attribute inference attack to infer the membership of a target in another genomic dataset (e.g. one associated with a sensitive trait). Using a log-likelihood-ratio test, our results also show that the inference power of the adversary can be significantly high in such an attack, even when using inferred (and hence partially incorrect) genomes.
Availability and implementation: https://github.com/nourmadhoun/Inference-Attacks-Differential-Privacy
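The membership-inference step can be pictured with a standard log-likelihood-ratio test under a binomial genotype model; the allele frequencies and genome below are synthetic, and this is a sketch rather than the paper's exact statistic.

```python
import numpy as np

def llr_membership(target_genotypes, pool_mafs, population_mafs):
    """Log-likelihood ratio that a target genome belongs to a pool vs. the population.

    target_genotypes: minor-allele counts (0/1/2) per SNP, possibly inferred
        (and partially incorrect) from differentially private query results.
    pool_mafs / population_mafs: per-SNP minor allele frequencies.
    A large positive score suggests membership in the pool.
    """
    g = np.asarray(target_genotypes, dtype=float)
    p, q = np.asarray(pool_mafs), np.asarray(population_mafs)
    eps = 1e-9  # guard against log(0)
    ll_pool = g * np.log(p + eps) + (2 - g) * np.log(1 - p + eps)
    ll_pop = g * np.log(q + eps) + (2 - g) * np.log(1 - q + eps)
    return float(np.sum(ll_pool - ll_pop))

rng = np.random.default_rng(3)
pop_maf = rng.uniform(0.05, 0.45, 200)                              # reference population
pool_maf = np.clip(pop_maf + rng.normal(0, 0.03, 200), 0.01, 0.99)  # case dataset
member = rng.binomial(2, pool_maf)                                  # genome drawn from the pool
print(llr_membership(member, pool_maf, pop_maf))
```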


Author(s):  
Neelu Khare ◽  
Kumaran U.

The tremendous growth of social networking systems enables the active participation of a wide variety of users. This has led to an increased probability of security and privacy concerns. To address this issue, the article defines a secure and privacy-preserving approach to protect user data across Cloud-based online social networks. The proposed approach models a social network as a directed graph, such that a user can share sensitive information with another user only if there exists a directed edge from one user to the other. Data shared between connected users is protected using attribute-based encryption (ABE) with different data access levels. The proposed ABE technique makes use of a trapdoor function to re-encrypt the data without the use of proxy re-encryption techniques. Experimental evaluation shows that the proposed approach provides comparatively better results than existing techniques.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Kok-Seng Wong ◽  
Myung Ho Kim

Advances in both sensor technologies and network infrastructures have encouraged the development of smart environments to enhance people's lives and lifestyles. However, collecting and storing users' data in smart environments poses severe privacy concerns because these data may contain sensitive information about the subject. Hence, privacy protection is now an emerging issue that we need to consider, especially when data sharing is essential for analysis purposes. In this paper, we consider the case where two agents in a smart environment want to measure the similarity of their collected or stored data. We use the similarity coefficient function FSC as the measurement metric for the comparison under the differential privacy model. Unlike existing solutions, our protocol can facilitate more than one request to compute FSC without modifying the protocol. Our solution ensures privacy protection for both the inputs and the computed FSC results.
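Since FSC is a generic similarity coefficient, the sketch below uses Jaccard similarity as a stand-in and adds Laplace noise to the released value; the sensitivity bound in the comment is a crude illustration, not the protocol from the paper.

```python
import numpy as np

def dp_similarity(record_a, record_b, epsilon):
    """Jaccard-style similarity between two event sets, released with Laplace noise.

    FSC in the paper is a generic similarity coefficient; Jaccard is used here
    only as a concrete stand-in. The sensitivity below is a crude, illustrative
    bound on how much one differing element can move the coefficient.
    """
    a, b = set(record_a), set(record_b)
    true_sim = len(a & b) / len(a | b)
    sensitivity = 1.0 / max(len(a | b), 1)
    return true_sim + np.random.laplace(0.0, sensitivity / epsilon)

sensor_a = {"temp_high", "motion_kitchen", "door_open", "light_on"}
sensor_b = {"temp_high", "motion_kitchen", "light_on", "window_open"}
print(dp_similarity(sensor_a, sensor_b, epsilon=1.0))
```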

