Deconvoluting kernel density estimation and regression for locally differentially private data

2020 ◽ Vol 10 (1) ◽ Author(s): Farhad Farokhi

Abstract: Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of the privacy-preserving data (no matter how many samples we gather) is always flatter than the density of the original data points, due to convolution with the density of the privacy-preserving noise. The effect is especially pronounced with slow-decaying privacy-preserving noise, such as Laplace noise, and can result in under- or over-estimation of heavy hitters. This is an important challenge facing social scientists given the use of differential privacy in the 2020 United States Census. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
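
For intuition, the sketch below implements a deconvoluting kernel density estimator for Laplace-perturbed observations in Python. It uses the standard closed form for the deconvoluting kernel under Laplace noise, K*(u) = K(u) - (b/h)^2 K''(u) with a Gaussian kernel K; the function name, bandwidth, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def deconvoluting_kde(w, x_grid, h, b):
    """Deconvoluting kernel density estimate from Laplace-perturbed samples.

    w      : noisy observations w_j = x_j + Laplace(0, b)
    x_grid : points at which to evaluate the density estimate
    h      : bandwidth
    b      : scale of the Laplace privacy noise
    """
    u = (x_grid[:, None] - w[None, :]) / h            # shape (grid, samples)
    gauss = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel K(u)
    # For Laplace noise, the deconvoluting kernel has the closed form
    # K*(u) = K(u) - (b/h)^2 * K''(u), with K''(u) = (u^2 - 1) K(u).
    k_star = gauss * (1.0 - (b / h)**2 * (u**2 - 1.0))
    return k_star.mean(axis=1) / h                    # (1/(n h)) sum_j K*((x - w_j)/h)

# Toy example: standard-normal data, Laplace noise with scale b (sensitivity / epsilon)
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
b = 1.0
w = x + rng.laplace(scale=b, size=x.size)
grid = np.linspace(-4, 4, 200)
f_hat = deconvoluting_kde(w, grid, h=0.6, b=b)
```

Because of the deconvolution step the estimate can dip below zero; in practice it is usually truncated at zero and renormalized.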

2016 ◽ Vol 91 (1-2) ◽ pp. 141-159 ◽ Author(s): Arthur Charpentier, Emmanuel Flachaire

Standard kernel density estimation methods are very often used in practice to estimate density functions, and they work well in numerous cases. However, they are known not to work well with skewed, multimodal and heavy-tailed distributions. Such features are common in income distributions, which are defined over the positive support. In this paper, we show that a preliminary logarithmic transformation of the data, combined with standard kernel density estimation methods, can provide a much better fit of the density.
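
A minimal sketch of the transformation idea, assuming a Gaussian KDE on the log scale and a change of variables f_X(x) = f_Y(log x) / x to map the estimate back; the helper name and toy data are illustrative, not the authors' code.

```python
import numpy as np
from scipy.stats import gaussian_kde

def log_transform_kde(x, grid):
    """KDE after a log transform: fit on log(x), map back to the original scale."""
    y = np.log(x)                    # data must be strictly positive
    kde_y = gaussian_kde(y)          # standard Gaussian KDE on the log scale
    # Change of variables: f_X(t) = f_Y(log t) / t
    return kde_y(np.log(grid)) / grid

# Toy income-like data: heavy-tailed, positive support
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.0, sigma=1.0, size=2000)
grid = np.linspace(income.min(), income.max(), 500)
density = log_transform_kde(income, grid)
```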


Author(s): Mohammad Reza Ebrahimi Dishabi, Mohammad Abdollahi Azgomi

Most existing privacy-preserving clustering (PPC) algorithms do not consider worst-case privacy guarantees and are based on heuristic notions. In addition, these algorithms do not run efficiently on high-dimensional data. In this paper, to alleviate these challenges, we propose a new PPC algorithm, which is based on the Daubechies-2 wavelet transform (D2WT) and preserves the differential privacy notion. Differential privacy is a strong notion of privacy that provides worst-case privacy guarantees. On the other hand, most existing differential-privacy-based PPC algorithms generate data with poor utility: if differential privacy is applied directly to the original raw data, the resulting data offers lower quality of clustering (QOC) during the clustering analysis. Therefore, we use the D2WT to preprocess the original data before adding noise. By applying the D2WT, the resulting data not only has lower dimensionality than the original data but can also provide a differential privacy guarantee with high QOC, since less noise needs to be added. The proposed algorithm has been implemented and evaluated on several well-known datasets, and compared with recently introduced algorithms in terms of utility and privacy.
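
The sketch below illustrates only the general idea: compress each record with a Daubechies-2 DWT (via PyWavelets) and add Laplace noise to the lower-dimensional approximation coefficients. The noise calibration and the choice to keep only the approximation band are simplifying assumptions, not the paper's exact algorithm.

```python
import numpy as np
import pywt

def db2_private_features(x, epsilon, sensitivity=1.0):
    """Rough sketch: compress each record with a Daubechies-2 DWT, then add
    Laplace noise to the (lower-dimensional) approximation coefficients."""
    rng = np.random.default_rng()
    private = []
    for row in x:
        approx, _detail = pywt.dwt(row, 'db2')        # keep approximation band only
        noise = rng.laplace(scale=sensitivity / epsilon, size=approx.shape)
        private.append(approx + noise)
    return np.asarray(private)

# Toy usage: 100 records with 16 attributes each, privacy budget epsilon = 1
data = np.random.default_rng(2).random((100, 16))
noisy_low_dim = db2_private_features(data, epsilon=1.0)
```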


2012 ◽ Vol 2012 ◽ pp. 1-24 ◽ Author(s): Long Yu, Zhongqing Su

The present work concerns the estimation of the probability density function (p.d.f.) of measured data in Lamb wave-based damage detection. Although a number of studies have focused on consensus algorithms for combining the results of individual sensors, the p.d.f. of the measured data, which is the fundamental ingredient of probability-based methods, has so far been chosen empirically. An analysis of the noise-induced errors in the measured data shows that the type of distribution is related to the noise level. In the case of weak noise, the p.d.f. of the measured data can be treated as a normal distribution, and empirical methods give satisfactory estimates. In the case of strong noise, however, the p.d.f. is complex and does not belong to any common family of distributions, so nonparametric methods are needed. Kernel density estimation, the most popular nonparametric method, is therefore introduced. To demonstrate the performance of kernel density estimation, a numerical model was built to generate Lamb wave signals, and three levels of white Gaussian noise were intentionally added to the simulated signals. The estimation results show that the nonparametric methods outperform the empirical methods in terms of accuracy.
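
A minimal sketch of the comparison the abstract describes, with a bimodal toy signal standing in for a strongly noisy damage index; the mixture parameters and grid are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(3)
# Stand-in for a measured damage index under strong noise: a bimodal mixture
measurements = np.concatenate([rng.normal(0.2, 0.05, 300),
                               rng.normal(0.6, 0.10, 200)])

grid = np.linspace(0, 1, 400)
empirical_fit = norm(measurements.mean(), measurements.std()).pdf(grid)  # normal assumption
kde_fit = gaussian_kde(measurements)(grid)                               # nonparametric estimate
```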


Author(s): Quanming Yao, Xiawei Guo, James Kwok, Weiwei Tu, Yuqiang Chen, ...

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably degrades the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done with either sample-based or feature-based partitioning. However, we prove that, for the same privacy budget, feature-based partitioning requires fewer samples than sample-based partitioning and thus is likely to have better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, MNIST and NEWS20, but also apply it to a real application, cross-organizational diabetes prediction on the RUIJIN data set, where privacy is a significant concern.
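
One plausible reading of feature-based partitioning with stacking is sketched below: train one logistic regression per feature block, perturb each block's coefficients (a simplified output-perturbation step whose noise calibration is an assumption, not the paper's mechanism), and stack the block predictions with a meta-learner. A faithful implementation would also spend part of the privacy budget on the meta-learner.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dp_stacked_lr(X, y, n_blocks=4, epsilon=1.0, noise_scale=None):
    """Sketch of feature-partitioned stacking: one logistic regression per
    feature block, coefficients perturbed with Laplace noise, then a
    meta-learner stacked on top of the block predictions."""
    rng = np.random.default_rng(4)
    blocks = np.array_split(np.arange(X.shape[1]), n_blocks)
    scale = noise_scale if noise_scale is not None else 1.0 / (epsilon * len(y))
    meta_features = []
    for cols in blocks:
        base = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        base.coef_ += rng.laplace(scale=scale, size=base.coef_.shape)   # perturb weights
        meta_features.append(base.predict_proba(X[:, cols])[:, 1])
    meta_X = np.column_stack(meta_features)
    meta = LogisticRegression(max_iter=1000).fit(meta_X, y)             # stacking layer
    return meta

# Toy usage on random data with 20 features
Xtoy = np.random.default_rng(5).random((500, 20))
ytoy = (Xtoy[:, 0] + Xtoy[:, 1] > 1.0).astype(int)
model = dp_stacked_lr(Xtoy, ytoy)
```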


2020 ◽ Author(s): Fatima Zahra Errounda, Yan Liu

Abstract: Location and trajectory data are routinely collected to generate valuable knowledge about users' behavior patterns. However, releasing location data may jeopardize the privacy of the individuals involved. Differential privacy is a powerful technique that prevents an adversary from inferring the presence or absence of an individual in the original data solely from the observed data. The first challenge in applying differential privacy to location data is that a release usually involves a single user, which shifts the adversary's target to the user's locations rather than their presence or absence in the original data. The second challenge is that the inherent correlation between location data points, due to the regularity and predictability of people's movements, gives the adversary an advantage in inferring information about individuals. In this paper, we review differentially private approaches that tackle these challenges. Our goal is to help newcomers to the field better understand the state of the art by providing a research map that highlights the different challenges in designing differentially private frameworks for location data. We find that, in protecting an individual's location privacy, the attention of differential privacy mechanisms shifts to preventing the adversary from inferring the original location from the observed one. Moreover, we find that privacy-preserving mechanisms make use of the predictability and regularity of users' movements to design and protect users' privacy in trajectory data. Finally, we explore how well the presented frameworks succeed in protecting users' locations and trajectories against well-known privacy attacks.
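
A common building block in this literature is the planar Laplace mechanism from geo-indistinguishability, which perturbs the observed location rather than hiding presence or absence. The sketch below is a generic illustration, not tied to any specific framework in this review, and it treats latitude/longitude as planar coordinates for simplicity.

```python
import numpy as np

def planar_laplace(lat_lon, epsilon):
    """Perturb a 2-D location with planar Laplace noise (geo-indistinguishability):
    a uniform angle and a radius drawn from Gamma(shape=2, scale=1/epsilon)."""
    rng = np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * np.pi)
    r = rng.gamma(shape=2.0, scale=1.0 / epsilon)    # radial displacement
    return lat_lon + r * np.array([np.cos(theta), np.sin(theta)])

noisy_point = planar_laplace(np.array([45.5017, -73.5673]), epsilon=0.1)
```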


2008 ◽ pp. 87 ◽ Author(s): Richard V. Burkhauser, Takashi Oshio, Ludmila Rovba

Using kernel density estimation, we find that over the 1990s business cycles in the United States and Great Britain the entire distribution of after-tax (disposable) income moved to the right while inequality declined. In contrast, Germany and Japan experienced less growth, a rise in inequality, and a decline in the middle mass of their distributions, which spread mostly to the right, much as in the United States over its 1980s business cycle. Inequality fell within the older populations of all four countries; it also fell within the younger populations of the United States and Great Britain, but rose substantially in Germany and Japan.


2014 ◽ Vol 2014 ◽ pp. 1-10 ◽ Author(s): Haoran Li, Li Xiong, Lucila Ohno-Machado, Xiaoqian Jiang

Data sharing is challenging but important for healthcare research. Methods for privacy-preserving data dissemination based on the rigorous differential privacy standard have been developed, but they do not consider the characteristics of biomedical data or make full use of the available information, which often results in too much noise in the final outputs. We hypothesized that this situation can be alleviated by leveraging a small portion of open-consented data to improve utility without sacrificing privacy. We developed a hybrid differentially private support vector machine (SVM) model that uses public and private data together. Our model leverages the RBF kernel and can handle nonlinearly separable cases. Experiments showed that this approach outperforms two baselines: (1) SVMs that use only public data, and (2) differentially private SVMs built from private data alone. Our method achieves performance very close to that of nonprivate SVMs trained on the private data.
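
One way to sketch the public-plus-private idea: use public records as RBF landmarks (which costs no privacy budget), train a linear SVM on the private data in that lifted feature space, and perturb the learned weights. The noise calibration and landmark construction here are simplifying assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

def hybrid_dp_svm(X_pub, X_priv, y_priv, epsilon=1.0, gamma=0.5):
    """Sketch: public records serve as RBF landmarks, a linear SVM is trained on
    the private data in that feature space, and its weights are perturbed with
    Laplace noise (simplified output perturbation)."""
    rng = np.random.default_rng(6)
    features = rbf_kernel(X_priv, X_pub, gamma=gamma)   # nonlinear lift via public landmarks
    clf = LinearSVC(C=1.0, max_iter=5000).fit(features, y_priv)
    clf.coef_ += rng.laplace(scale=1.0 / (epsilon * len(y_priv)), size=clf.coef_.shape)
    return clf

def hybrid_predict(clf, X_pub, X_new, gamma=0.5):
    return clf.predict(rbf_kernel(X_new, X_pub, gamma=gamma))

# Toy usage with random public and private records
rng = np.random.default_rng(6)
X_pub, X_priv = rng.random((50, 5)), rng.random((300, 5))
y_priv = (X_priv[:, 0] > 0.5).astype(int)
clf = hybrid_dp_svm(X_pub, X_priv, y_priv)
```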


2021 ◽ Vol 2021 ◽ pp. 1-13 ◽ Author(s): Yang Bai, Yu Li, Mingchuang Xie, Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and use the server's computational capability to train models, or directly query models trained by the MLaaS provider. Unfortunately, recent works reveal that both the curious server (which trains the model on users' sensitive local data and is curious about information on individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) pose privacy risks. Adversarial methods, one typical mitigation, have been studied in several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they treat the data owner and the model provider as a single role. Under this assumption, the privacy leakage risks posed by the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data; nonetheless, they can heavily decrease the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on adversarial methods to defend against both the curious server and the malicious MLaaS user. The framework can work with several adversarial algorithms to generate adversarial examples directly from data owners' original data, thereby hiding sensitive information about the original data. We then explore the constraint conditions of this framework, which help us find a balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against membership inference attacks (MIA), and with the FGSM method it can protect sensitive data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
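
A minimal FGSM-style sketch of the data-owner-side perturbation, written in NumPy against a simple linear model the owner is assumed to hold; it is not the paper's AdvGAN pipeline, and the model, step size, and toy data are illustrative assumptions.

```python
import numpy as np

def fgsm_perturb(X, y, w, b, eps=0.1):
    """FGSM-style perturbation of the owner's data before release.
    Uses the gradient of the logistic loss w.r.t. the inputs of a simple
    linear model (w, b) and moves each record one step of size eps."""
    logits = X @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))          # sigmoid
    grad_x = (p - y)[:, None] * w[None, :]     # d(loss)/d(x) for logistic loss
    return X + eps * np.sign(grad_x)

# Toy usage with a hypothetical pre-trained linear model
rng = np.random.default_rng(7)
X = rng.random((200, 10))
y = (X[:, 0] > 0.5).astype(float)
w, b = rng.normal(size=10), 0.0
X_released = fgsm_perturb(X, y, w, b, eps=0.05)
```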

