Generalized random rotation perturbation for vertically partitioned data sets

Objective: To develop an accurate logistic regression (LR) algorithm to support federated data analysis of vertically partitioned distributed data sets.

Material and Methods: We propose a novel technique that solves the binary LR problem by dual optimization to obtain a global solution for vertically partitioned data. We evaluated this new method, VERTIcal Grid lOgistic regression (VERTIGO), in artificial and real-world medical classification problems in terms of the area under the receiver operating characteristic curve, calibration, and computational complexity. We assumed that the institutions could “align” patient records (through patient identifiers or hashed “privacy-protecting” identifiers), and also that they both had access to the values for the dependent variable in the LR model (e.g., that if the model predicts death, both institutions would have the same information about death).

Results: The solution derived by VERTIGO has the same estimated parameters as the solution derived by applying classical LR. The same is true for discrimination and calibration over both simulated and real data sets. In addition, the computational cost of VERTIGO is not prohibitive in practice.

Discussion: There is a technical challenge in scaling up federated LR for vertically partitioned data. When the number of patients m is large, our algorithm has to invert a large Hessian matrix. This is an expensive operation of time complexity O(m³) that may require large amounts of memory for storage and exchange of information. The algorithm may also not work well when the number of observations in each class is highly imbalanced.

Conclusion: The proposed VERTIGO algorithm can generate accurate global models to support federated data analysis of vertically partitioned data.
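The abstract does not spell out how a dual formulation helps with vertically partitioned data. One property that dual methods with a linear kernel can rely on is that the m × m matrix of patient-by-patient dot products decomposes into a sum of matrices that each institution computes from its own columns. The sketch below only checks that identity on made-up data; the names `X_a`, `X_b`, `gram`, and `add` are illustrative assumptions, and this is not the VERTIGO implementation.

```python
import random

def gram(rows):
    """Return the m x m matrix of dot products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

random.seed(0)
m = 5  # number of aligned patient records (hypothetical)

# Each institution holds different features for the same m patients.
X_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(m)]
X_b = [[random.gauss(0, 1) for _ in range(2)] for _ in range(m)]

# Each site computes a local m x m matrix; only these would be combined.
K_shared = add(gram(X_a), gram(X_b))

# Reference: the matrix computed from the pooled feature columns.
X_pooled = [ra + rb for ra, rb in zip(X_a, X_b)]
K_pooled = gram(X_pooled)

max_diff = max(
    abs(x - y)
    for row_s, row_p in zip(K_shared, K_pooled)
    for x, y in zip(row_s, row_p)
)
```

The combined matrix has one row and one column per patient, which is consistent with the Discussion's remark that cost grows quickly with m.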

Fast and Secure Back-Propagation Learning Using Vertically Partitioned Data with IoT

2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW) ◽ 10.1109/candarw.2019.00085 ◽ 2019

Author(s): Hirofumi Miyajima ◽ Hiromi Miyajima ◽ Norio Shiratori

Keyword(s): Back Propagation ◽ Partitioned Data ◽ Vertically Partitioned Data

Equally contributory privacy-preserving k-means clustering over vertically partitioned data

Information Systems ◽ 10.1016/j.is.2012.06.001 ◽ 2013 ◽ Vol 38 (1) ◽ pp. 97-107 ◽ Cited By ~ 24

Author(s): Xun Yi ◽ Yanchun Zhang

Keyword(s): Privacy Preserving ◽ Partitioned Data ◽ Vertically Partitioned Data

The Relationship between Harvest and Survival Rates of Mallards: A Straightforward Approach with Partitioned Data Sets

Journal of Wildlife Management ◽ 10.2307/3808506 ◽ 1983 ◽ Vol 47 (2) ◽ pp. 334 ◽ Cited By ~ 13

Author(s): James D. Nichols ◽ James E. Hines

Keyword(s): Survival Rates ◽ Data Sets ◽ Partitioned Data ◽ The Relationship ◽ Straightforward Approach

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Cybernetics and Information Technologies ◽ 10.1515/cait-2017-0015 ◽ 2017 ◽ Vol 17 (2) ◽ pp. 44-55 ◽ Cited By ~ 1

Author(s): M. Antony Sheela ◽ K. Vijayalakshmi

Keyword(s): Data Mining ◽ Threshold Level ◽ Third Party ◽ Distributed Data Mining ◽ Distributed Data ◽ Data Perturbation ◽ Private Data ◽ Partitioned Data ◽ Vertically Partitioned Data ◽ The Mean

Abstract: Data mining on vertically or horizontally partitioned datasets carries the overhead of protecting private data. Perturbation is a technique that protects data from being revealed. This paper proposes a perturbation and anonymization technique that is applied to vertically partitioned data. A third-party coordinator recursively partitions the data among the parties. When the specified threshold level is reached, the parties perturb the data by finding the mean. The perturbation maintains the statistical relationships among attributes.
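The abstract gives only an outline of the procedure, so the following is a loose, single-process sketch of one way "partition recursively, then perturb by the mean at a threshold" could look. The splitting rule (median split on attributes in rotation), the function name `perturb`, and the interpretation of the threshold as a maximum group size are all assumptions made for illustration, not details taken from the paper. It also ignores the coordinator and the distribution across parties.

```python
from statistics import mean

def perturb(records, threshold, depth=0):
    """Recursively split records; replace each small group by its column means.

    Assumption: `threshold` is the largest group size that gets averaged.
    The output is ordered by group, not by the original record order.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(records) <= threshold:
        # Threshold reached: every record in the group becomes the group mean.
        means = [mean(column) for column in zip(*records)]
        return [list(means) for _ in records]
    # Hypothetical splitting rule: rotate through attributes, split at the median.
    attr = depth % len(records[0])
    ordered = sorted(records, key=lambda r: r[attr])
    mid = len(ordered) // 2
    return (perturb(ordered[:mid], threshold, depth + 1)
            + perturb(ordered[mid:], threshold, depth + 1))

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked = perturb(data, threshold=2)

# Column means before and after perturbation.
before = [mean(col) for col in zip(*data)]
after = [mean(col) for col in zip(*masked)]
```

Because each group is replaced by its own mean, the overall column means are unchanged in this sketch, which is one simple sense in which aggregate statistics survive the perturbation.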
