Exploring the impact of LSH parameters in privacy-preserving personalization

2014 ◽  
Vol 18 (4) ◽  
pp. 33-44 ◽  
Author(s):  
Armen Aghasaryan ◽  
Makram Bouzid ◽  
Dimitre Kostadinov ◽  
Animesh Nandi
Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1754
Author(s):  
Depeng Chen ◽  
Carlos Borrego ◽  
Guillermo Navarro-Arribas

This paper focuses on the problem of providing anonymous communications in opportunistic networks. To that end, we propose a relatively simple approach based on Mix networks. Opportunistic networks present constraints that make the deployment of typical network anonymity solutions difficult or infeasible. Using simulations based on real mobility traces, we show that the proposed solution is feasible in some scenarios, at the cost of a tolerable penalty in message delay and delivery. To investigate the impact of routing strategies, we offer two different methods for selecting Mix nodes. The experimental results show the trade-off between network performance and security.


2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Zihao Shao ◽  
Huiqiang Wang ◽  
Yifan Zou ◽  
Zihan Gao ◽  
Hongwu Lv

Mobile Crowdsensing (MCS) has evolved into an effective and valuable paradigm for engaging mobile users to sense and collect urban-scale information. However, users risk their location privacy when reporting data with their actual sensing locations. Existing location privacy-preserving works are primarily based on single-region location information; they rely on a trusted, centralized sensing platform and ignore the impact of regional differences on users' privacy-preserving demands. To tackle this issue, we propose a Location Difference-Based Privacy-Preserving Framework (LDPF), which leverages powerful edge servers deployed between users and the sensing platform to hide and manage users according to regional user characteristics. More specifically, for popular regions, we propose a Coordinate Transformation and Bit Commitment (CTBC) privacy-preserving method, based on the edge servers and the k-anonymity algorithm, that effectively guarantees the privacy of location data without relying on a trusted sensing platform. For remote regions, based on a more realistic distance calculation mode, we design a Paillier Encryption Data Coding (PDC) privacy-preserving method that realizes secure computation over users' locations and prevents deception by malicious users. The theoretical analysis and simulation results demonstrate the security and efficiency of the proposed framework for location difference-based privacy preservation.
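The PDC method's coding scheme is not spelled out in this abstract, so the following is only a generic toy sketch of the Paillier property it builds on: ciphertexts can be added, and scaled by plaintext constants, without decryption. That is enough for a server holding its own coordinate y to compute an encrypted squared distance to a user's hidden coordinate x. The helper names (`keygen`, `encrypted_sq_distance`, etc.) and the tiny primes are our own illustrative assumptions; real deployments use primes of well over 1000 bits and a vetted library.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Toy Paillier key generation with generator g = n + 1 (tiny primes, NOT secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael function of n
    mu = pow(lam, -1, n)          # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2

def decrypt(n: int, sk, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts: E(a) * E(b) = E(a + b)."""
    return c1 * c2 % (n * n)

def scalar_mul(n: int, c: int, k: int) -> int:
    """Raising a ciphertext to k scales the plaintext: E(a)^k = E(k * a mod n)."""
    return pow(c, k % n, n * n)

def encrypted_sq_distance(n: int, enc_x_sq: int, enc_x: int, y: int) -> int:
    """Server side: from E(x^2), E(x) and its own plaintext y, build E((x - y)^2)."""
    t = add(n, enc_x_sq, scalar_mul(n, enc_x, -2 * y))  # E(x^2 - 2xy)
    return add(n, t, encrypt(n, y * y))                  # E(x^2 - 2xy + y^2)
```

For example, with `n, sk = keygen(1009, 1013)`, a user encrypts x = 120 as `encrypt(n, 120**2)` and `encrypt(n, 120)`; a server with y = 75 calls `encrypted_sq_distance` and only the key holder can decrypt the result, (120 - 75)^2 = 2025. For a 2-D location, the per-coordinate results would be combined with `add`.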


2010 ◽  
Vol 143-144 ◽  
pp. 38-42
Author(s):  
Cai Mei Wang ◽  
Yi Min Guo ◽  
Ya Jun Guo ◽  
Yan Hua Guo

This paper proposes a novel entropy-based metric for evaluating silent cascade, a prevalent trajectory privacy-preserving method in location-based services (LBSs). In this measure, trajectory privacy is quantified as the probability of linking the user's pseudonyms before and after each mix-zone. After a period of time, the tracked user may have taken any of many potential trajectories from the adversary's perspective, and the user's trajectory privacy level is calculated using information entropy. The most distinctive aspect of the measure is that it takes the adversary's background knowledge into account. We develop methods to describe and quantify this background knowledge. Simulation results reflect the impact of background knowledge on the privacy level in the metric and show that the metric effectively and correctly measures the user's trajectory privacy level even when the adversary's background knowledge varies.
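A minimal sketch of the entropy idea described above, under our own simplifying assumptions: the adversary's knowledge at each mix-zone is a probability distribution over outgoing pseudonyms, zones are independent, so a candidate trajectory's probability is the product of its per-zone link probabilities. Background knowledge is modelled here simply as a skewed distribution; the paper's own methods for quantifying it are not reproduced. Function names are hypothetical.

```python
import math
from itertools import product

def trajectory_probs(mix_zone_links):
    """Probability of each candidate trajectory through a sequence of mix-zones.

    mix_zone_links[i][j] is the adversary's probability that the tracked user left
    mix-zone i under outgoing pseudonym j. Zones are treated as independent
    (a simplifying assumption), so a trajectory's probability is the product.
    """
    return [math.prod(choice) for choice in product(*mix_zone_links)]

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p) over candidate trajectories, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

With two mix-zones and no background knowledge (`[[0.5, 0.5], [0.5, 0.5]]`) there are four equally likely trajectories and the privacy level is 2 bits; if background knowledge skews the first zone to `[0.9, 0.1]`, the entropy drops to about 1.47 bits, illustrating how adversarial knowledge lowers the measured privacy.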


2019 ◽  
Vol 9 (15) ◽  
pp. 3034
Author(s):  
Mohamed Ben Haj Frej ◽  
Julius Dichter ◽  
Navarun Gupta

Cloud computing is securing its position in the market as the next disruptive utility paradigm. It is founded on the pay-as-you-use model. Cloud computing is changing the way information technology (IT) operates for individuals as well as for companies. It comes with different offerings to accommodate diverse applications, and with many successful adoption stories as well as a few unfortunate ones related to security breaches. Security concerns are what make many companies reluctant to fully embrace the cloud. To enhance trust and encourage adoption between cloud clients (CC) and cloud service providers (CSP), a new paradigm involving a third-party auditor (TPA) has been introduced. However, a TPA-based solution takes its toll in terms of trust and processing overhead. A lightweight security protocol that gives the CC extra control, with tools to audit both the TPA and the CSP, is therefore paramount. In this paper, we introduce a novel protocol: the lightweight accountable privacy-preserving (LAPP) protocol. Our proposed protocol is lightweight in terms of processing and communication costs. It is based on a newly introduced mathematical model along with two algorithms. We conducted simulation experiments to measure the impact of our method, comparing LAPP to the most prominent privacy-preserving methods in the cloud research field using the open-source cloud computing simulator GreenCloud. Our simulation results showed that LAPP performs better in terms of time complexity, accuracy, and auditing computation time. The time complexity and auditing computation time simulations aim to measure the lightweight aspect of our proposed protocol as well as to improve the quality of service.


Author(s):  
Mona Mohamed ◽  
Sahar Ghanem ◽  
Magdy Nagi

Privacy-preserving data publishing has been studied widely on static data. However, many recent applications generate data streams that are real-time, unbounded, rapidly changing, and distributed in nature. Recently, a few works have addressed k-anonymity and l-diversity for data streams. Their model implied that a distributed stream is first collected at a central site for anonymization. In this paper, we propose a novel distributed model in which distributed streams are first anonymized by the distributed (collecting) sites before being merged and released. Our approach extends Continuously Anonymizing STreaming data via adaptive cLustEring (CASTLE) [4], a cluster-based approach that provides both k-anonymity and l-diversity for centralized data streams. The main idea is for each site to construct its local clustering model and exchange this local view with other sites so that all sites construct approximately the same global clustering view. The approach is heuristic in the sense that not every update to the local view is sent; instead, selected triggering events cause cluster information to be exchanged. Extensive experiments on a real data set are performed to study the introduced Information Loss (IL) under different settings. First, the impact of the different parameters on IL is quantified. Then k-anonymity and l-diversity are compared in terms of messaging cost and IL. Finally, the effectiveness of the proposed distributed model is studied by comparing its IL to the IL of the centralized model (as a lower bound) and to that of a distributed model with no communication (as an upper bound). The experimental results show that the main contributing factor to IL is the number of attributes in the quasi-identifier (50%-75%), while the number of sites contributes only about 1%, which demonstrates the scalability of the proposed approach. In addition, providing l-diversity is shown to increase IL by about 25% compared to k-anonymity. Moreover, a 35% reduction in IL is achieved at a messaging cost (in bytes) of about 0.3% of the data set size.
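The release condition and information-loss notion behind cluster-based stream anonymization can be sketched as follows. This is a simplified illustration, not CASTLE itself or the distributed extension: it handles only numeric quasi-identifiers, uses a normalised interval width as the loss (one common choice, assumed here), and uses the "distinct" variant of l-diversity. The distributed view exchange between sites is not modelled; function names are hypothetical.

```python
def generalize(cluster):
    """Generalise each numeric quasi-identifier to the cluster's [min, max] interval.

    cluster: list of (quasi_identifier_dict, sensitive_value) tuples.
    """
    attrs = cluster[0][0].keys()
    return {a: (min(qi[a] for qi, _ in cluster), max(qi[a] for qi, _ in cluster))
            for a in attrs}

def info_loss(cluster, domains):
    """Mean normalised interval width: 0 = exact values, 1 = whole attribute domain."""
    widths = [(hi - lo) / (domains[a][1] - domains[a][0])
              for a, (lo, hi) in generalize(cluster).items()]
    return sum(widths) / len(widths)

def releasable(cluster, k, l=1):
    """k-anonymity: at least k tuples; distinct l-diversity: at least l sensitive values."""
    return len(cluster) >= k and len({s for _, s in cluster}) >= l
```

For a cluster of three tuples with ages 30, 34, 32, zip codes 1000, 1010, 1004 and sensitive values flu, cold, flu, the cluster is releasable for k = 3, l = 2 but not for l = 3, which mirrors why requiring l-diversity forces larger (and lossier) clusters than k-anonymity alone.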


Author(s):  
Xin Wu ◽  
Hao Zheng ◽  
Zuochao Dou ◽  
Feng Chen ◽  
Jieren Deng ◽  
...  

Genome-wide association studies (GWAS) have been widely used to identify potential risk variants in various diseases. A statistically meaningful GWAS typically requires a large sample size to detect disease-associated single nucleotide polymorphisms (SNPs). However, a single institution usually possesses only a limited number of samples, so cross-institutional partnerships are required to increase sample size and statistical power. Such partnerships pose significant challenges, a major one being data privacy. For example, people's privacy awareness, the impact of data privacy leakages, and privacy-related risks are becoming increasingly important, while no de-identification standard is available to safeguard genomic data sharing. In this paper, we introduce a novel privacy-preserving federated GWAS framework (iPRIVATES). Equipped with privacy-preserving federated analysis, iPRIVATES enables multiple institutions to jointly perform GWAS analysis without leaking patient-level genotyping data; only aggregated local statistics are exchanged within the study network. We evaluate the performance of iPRIVATES on both simulated data and a real-world application: identifying potential risk variants in ankylosing spondylitis (AS). The experimental results showed that the strongest AS-associated SNP signals reside mostly around the human leukocyte antigen (HLA) regions. The proposed iPRIVATES framework achieved results equivalent to those of a traditional centralized implementation, demonstrating its great potential for driving collaborative genomic research on different diseases while preserving data privacy.
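Why exchanging only aggregated local statistics can reproduce a centralized result is easy to see for count-based tests. The sketch below is our own illustration, not the iPRIVATES protocol: each site reduces its genotypes for one SNP to four allele counts, a coordinator sums them, and a standard allelic chi-square test is computed on the pooled table. iPRIVATES additionally protects the exchanged statistics, which this plain-aggregation sketch does not attempt; the choice of test and all function names are assumptions.

```python
def local_counts(genotypes, is_case):
    """Per-site allele counts for one SNP: [case_alt, case_ref, ctrl_alt, ctrl_ref].

    genotypes: number of alternate alleles (0, 1 or 2) for each individual.
    """
    counts = [0, 0, 0, 0]
    for g, case in zip(genotypes, is_case):
        base = 0 if case else 2
        counts[base] += g          # alternate alleles
        counts[base + 1] += 2 - g  # reference alleles (diploid)
    return counts

def aggregate(site_counts):
    """Coordinator sums the sites' count vectors; no individual genotypes are shared."""
    return [sum(c[i] for c in site_counts) for i in range(4)]

def allelic_chi2(counts):
    """Pearson chi-square (1 df) on the 2x2 case/control by alt/ref allele table."""
    a, b, c, d = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0
```

Because the pooled counts equal the counts computed on the merged raw data, `allelic_chi2(aggregate([...]))` is identical to the centralized statistic, which is the sense in which a federated analysis can match a centralized implementation.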


Author(s):  
Rainer Schnell ◽  
Christian Borgs

Introduction: Diagnostic codes, such as ICD-10 codes, may be considered sensitive information. If such codes are encoded using current methods for data linkage, all hierarchical information given by the code positions is lost. We present a technique (HPBFs) for preserving the hierarchical information of the codes while protecting privacy. The new method modifies a widely used Privacy-preserving Record Linkage (PPRL) technique based on Bloom filters for use with hierarchical codes.
Objectives and Approach: Assessing the similarity of hierarchical codes requires considering the positions of two codes in a given diagnostic hierarchy. The hierarchical similarities of the original diagnostic code pairs should correspond closely to the similarities of the encoded pairs of the same codes. Furthermore, to assess the hierarchy-preserving properties of an encoding, the impact of differing code positions at every level of the code hierarchy on similarity measures can be evaluated. A full match of codes should yield a higher similarity than a partial match. Finally, the new method is tested against ad-hoc solutions as an addition to a standard PPRL setup, using real-world mortality data from two databases with known link status.
Results: In all applications for encoded ICD codes requiring categorical discrimination, relational similarity, or linkage quality in a PPRL setting, HPBFs outperform other known methods. Lower mean differences and smaller confidence intervals between clear-text code pairs and encrypted code pairs were observed, indicating better preservation of hierarchical similarities. Finally, these techniques allow much better hierarchical discrimination for partial matches.
Conclusion: The new technique yields better linkage results than all other known methods for encrypting hierarchical codes. In all tests comparing categorical discrimination, relational similarity, and PPRL linkage quality, HPBFs outperformed the methods currently in use.
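The Bloom-filter PPRL technique that HPBFs modify can be sketched as follows: each token is hashed by k keyed hash functions (here via double hashing) into a bit array of length m, and encoded records are compared with the Dice coefficient. The HPBF construction itself is not described in this abstract; the prefix tokenisation below is a hypothetical stand-in, chosen only to show how a partial match such as E11.9 vs E11.6 can keep more shared bits than an unrelated code. The key, m, k and function names are illustrative assumptions.

```python
import hashlib
import hmac

def bloom_encode(tokens, m=1000, k=20, key=b"demo-key"):
    """Encode tokens into a Bloom filter, returned as the set of bit positions set to 1.

    Uses double hashing with two keyed hashes: position_i = (h1 + i * h2) mod m.
    """
    bits = set()
    for t in tokens:
        data = t.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits

def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two filters."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

def prefix_tokens(code):
    """Hypothetical tokenisation: every prefix of the code, e.g. E11.9 -> E, E1, E11, E119."""
    c = code.replace(".", "")
    return [c[:i] for i in range(1, len(c) + 1)]
```

Encoding "E11.9", "E11.6" and "I21.0" this way, the first pair shares three of four tokens and so obtains a much higher Dice similarity than the unrelated pair, whose only shared bits come from random hash collisions.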


Author(s):  
Rainer Schnell ◽  
Christian Borgs

Background: Data on newborns are regularly linked for epidemiological research. However, hospital data are often incomplete. We report on a linkage of two population-covering administrative health databases containing neonatal and perinatal data without unique personal identifiers and with incomplete information in standard patient identifiers.
Goal: To study the effects of a policy-induced change from linking a national database without standard patient identifiers to a privacy-preserving Record Linkage method, we compare the linkage system in use to clear-text and privacy-preserving Record Linkage techniques. We expected large proportions of missing identifiers, since these identifiers are not needed for clinical practice, and therefore expected links to be missed because of them. To study the impact of these missing identifiers on linkage success, we compared several linkage methods. Furthermore, we study the variation in linkage success between hospitals.
Methods: Perinatal and neonatal data from population-covering real-world administrative databases were linked using several variants of state-of-the-art methods, including Privacy-preserving Record Linkage (PPRL) techniques such as multiple match keys and Bloom filter methods.
Results: We report on the variation of linkage results between hospitals and give possible explanations for the differences. The resulting linkage success is reported for each method, and the impact of incomplete data on linkage success is documented for each method. Finally, we report on the relative performance of the modified techniques compared to standard linkage procedures used in practice.
Conclusion: Implementing a record linkage system based on identifiers not required for clinical practice caused a large number of missing identifiers. Since this information is essential for successful clear-text and privacy-preserving linkage, emphasizing the need to document patient identifiers is of central importance for implementing a privacy-preserving Record Linkage system, especially where auxiliary information (such as stable addresses, dates of birth, or health insurance numbers) is missing.
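The multiple-match-key approach named in the Methods can be sketched to show why missing identifiers matter so much: each match key is a keyed hash of a fixed combination of identifiers, two records are linked if any key agrees, and a single missing field disables every key that uses it. The attribute combinations, the shared secret and the normalisation (trim, lower-case) are hypothetical choices for illustration, not the configuration used in the study.

```python
import hashlib
import hmac

# Hypothetical identifier combinations; each one yields one match key.
MATCH_KEYS = [
    ("first_name", "last_name", "dob"),
    ("last_name", "dob", "sex"),
    ("first_name", "dob", "zip"),
]

def match_keys(record, secret=b"shared-secret"):
    """Keyed hashes (HMAC-SHA256) of each identifier combination present in the record."""
    keys = set()
    for i, attrs in enumerate(MATCH_KEYS):
        values = [record.get(a) for a in attrs]
        if any(v in (None, "") for v in values):
            continue  # a missing identifier disables every match key that uses it
        msg = f"{i}|" + "|".join(str(v).strip().lower() for v in values)
        keys.add(hmac.new(secret, msg.encode("utf-8"), hashlib.sha256).hexdigest())
    return keys

def is_link(rec_a, rec_b, secret=b"shared-secret"):
    """Link two records if at least one of their match keys agrees."""
    return bool(match_keys(rec_a, secret) & match_keys(rec_b, secret))
```

A record missing its last name can still be linked through the third key if first name, date of birth and zip code are present, whereas a record holding only a date of birth produces no keys at all and cannot be linked.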


2021 ◽  
Vol 11 (16) ◽  
pp. 7360
Author(s):  
Andreea Bianca Popescu ◽  
Ioana Antonia Taca ◽  
Cosmin Ioan Nita ◽  
Anamaria Vizitiu ◽  
Robert Demeter ◽  
...  

Data privacy is a major concern when accessing and processing sensitive medical data. A promising approach among privacy-preserving techniques is homomorphic encryption (HE), which allows computations to be performed on encrypted data. Currently, HE still faces practical limitations related to high computational complexity, noise accumulation, and applicability only at the level of bits or small integer values. We propose herein an encoding method that enables typical HE schemes to operate on real-valued numbers of arbitrary precision and size. The approach is evaluated on two real-world scenarios relying on EEG signals: seizure detection and prediction of predisposition to alcoholism. A supervised machine learning-based approach is formulated, and training is performed using a direct (non-iterative) fitting method that requires a fixed and deterministic number of steps. Experiments on synthetic data of varying size and complexity are performed to determine the impact on runtime and error accumulation. The computational time for training the models increases but remains manageable, while the inference time remains on the order of milliseconds. The prediction performance of the models operating on encoded and encrypted data is comparable to that of standard models operating on plaintext data.
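To make the "small integer values" limitation concrete, the sketch below shows the simplest common workaround, fixed-point encoding: scale a real number by 2^FRAC_BITS, round, and map it into the integer plaintext space Z_T of a hypothetical HE scheme, using the upper half of Z_T for negatives. This is not the authors' encoding (which the abstract says supports arbitrary precision and size); it only illustrates the baseline idea, and it shows the limits such an encoding has: rounding error, a doubled scale after each multiplication, and overflow once values exceed T/2. FRAC_BITS, T and the function names are assumptions, and no encryption is performed.

```python
FRAC_BITS = 16   # hypothetical number of fractional bits
T = 2**61 - 1    # hypothetical plaintext modulus of an integer HE scheme

def encode(x: float) -> int:
    """Scale a real number to a fixed-point integer and map it into Z_T."""
    return round(x * (1 << FRAC_BITS)) % T

def decode(v: int, frac_bits: int = FRAC_BITS) -> float:
    """Map Z_T back to a signed integer (upper half = negative) and undo the scaling."""
    if v > T // 2:
        v -= T
    return v / (1 << frac_bits)

def add_encoded(a: int, b: int) -> int:
    """Addition in Z_T; the scale stays at FRAC_BITS."""
    return (a + b) % T

def mul_encoded(a: int, b: int) -> int:
    """Multiplication in Z_T; the result carries 2 * FRAC_BITS fractional bits."""
    return a * b % T
```

For instance, `decode(add_encoded(encode(1.5), encode(-2.25)))` recovers -0.75, and the product decoded with `2 * FRAC_BITS` fractional bits recovers -3.375, while a value such as 0.1 comes back only approximately because of rounding to 16 fractional bits.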

