Scalable Privacy-Preserving Distributed Learning

Abstract In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design spindle (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N −1 colluding parties. spindle uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate spindle for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.

Download Full-text

Accelerating privacy-preserving momentum federated learning for industrial cyber-physical systems

Complex & Intelligent Systems ◽

10.1007/s40747-021-00519-2 ◽

2021 ◽

Author(s):

Linlin Zhang ◽

Zehui Zhang ◽

Cong Guan

Keyword(s):

Convergence Rate ◽

Performance Improvement ◽

Gradient Descent ◽

Homomorphic Encryption ◽

Cyber Physical Systems ◽

Privacy Preserving ◽

Fully Homomorphic Encryption ◽

Physical Systems ◽

Cloud Server ◽

And Performance

AbstractFederated learning (FL) is a distributed learning approach, which allows the distributed computing nodes to collaboratively develop a global model while keeping their data locally. However, the issues of privacy-preserving and performance improvement hinder the applications of the FL in the industrial cyber-physical systems (ICPSs). In this work, we propose a privacy-preserving momentum FL approach, named PMFL, which uses the momentum term to accelerate the model convergence rate during the training process. Furthermore, a fully homomorphic encryption scheme CKKS is adopted to encrypt the gradient parameters of the industrial agents’ models for preserving their local privacy information. In particular, the cloud server calculates the global encrypted momentum term by utilizing the encrypted gradients based on the momentum gradient descent optimization algorithm (MGD). The performance of the proposed PMFL is evaluated on two common deep learning datasets, i.e., MNIST and Fashion-MNIST. Theoretical analysis and experiment results confirm that the proposed approach can improve the convergence rate while preserving the privacy information of the industrial agents.

Download Full-text

Homomorphic Model Selection for Data Analysis in an Encrypted Domain

Applied Sciences ◽

10.3390/app10186174 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6174

Author(s):

Mi Yeon Hong ◽

Joon Soo Yoo ◽

Ji Won Yoon

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Model Selection ◽

Goodness Of Fit ◽

Linear Models ◽

Homomorphic Encryption ◽

Matrix Multiplication ◽

Secure Computation ◽

Encrypted Data ◽

The Matrix

Secure computation, a methodology of computing on encrypted data, has become a key factor in machine learning. Homomorphic encryption (HE) enables computation on encrypted data without leaking any information to untrusted servers. In machine learning, the model selection method is a crucial algorithm that determines the performance and reduces the fitting problem. Despite the importance of finding the optimal model, none of the previous studies have considered model selection when performing data analysis through the HE scheme. The HE-based model selection we proposed finds the optimal complexity that best describes given data that is encrypted and whose distribution is unknown. Since this process requires a matrix calculation, we constructed the matrix multiplication and inverse of the matrix based on the bitwise operation. Based on these, we designed the model selection of the HE cross-validation approach and the HE Bayesian approach for homomorphic machine learning. Our focus was on evidence approximation for linear models to find goodness-of-fit that maximizes the evidence. We conducted an experiment on a dataset of age and Body Mass Index (BMI) from Kaggle to compare the capabilities and our model showed that encrypted data can regress homomorphically without decrypting it.

Download Full-text

PRIVACY PRESERVING CLASSIFICATION OVER ENCRYPTED DATA USING FULLY HOMOMORPHIC ENCRYPTION TECHNIQUE

i-manager s Journal on Digital Signal Processing ◽

10.26634/jdp.6.2.15590 ◽

2018 ◽

Vol 6 (2) ◽

pp. 36

Author(s):

MONDAY JUBRIN ABDULLAHI ◽

ONOMZA WAZIRI VICTOR ◽

BASHIR ABDULLAHI MUHAMMAD ◽

ISMAILA IDRIS ◽

◽

...

Keyword(s):

Homomorphic Encryption ◽

Privacy Preserving ◽

Fully Homomorphic Encryption ◽

Encrypted Data

Download Full-text

Achieving Lightweight Verifiable Privacy Preserving Search Over Encrypted Data

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.3.3.267 ◽

2019 ◽

Vol 3 (3) ◽

Author(s):

Selasi Kwame Ocansey ◽

Charles Fynn Oduro

Keyword(s):

Service Provider ◽

Homomorphic Encryption ◽

Bloom Filter ◽

Cloud Service ◽

Privacy Preserving ◽

Data Sets ◽

Cloud Service Provider ◽

Encrypted Data ◽

Security Proofs

When cloud clients outsource their database to the cloud, they entrust management operations to a cloud service provider who is expected to answer the client’s queries on the cloud where database is located. Efficient techniques can ensure critical requirements for outsourced data’s integrity and authenticity. A lightweight privacy preserving verifiable scheme for outsourcingdatabase securely is proposed, our scheme encrypts data before outsourcing and returned query results are verified with parameters of correctness and completeness. Our scheme is projected on lightweight homomorphic encryption technique and bloom filter which are efficiently authenticated to guarantee the outsourced database’s integrity, authenticity, and confidentiality. An ordering challenge technique is proposed for verifying top-k query results. We conclude by detailing our analysis of security proofs, privacy, verifiability and the performance efficiency of our scheme. Our proposed scheme’s proof and evaluation analysis show its security and efficiency for practical deployment. We also evaluate our scheme’s performances over two UCI data sets.

Download Full-text

Hail the Closest Driver on Roads: Privacy-Preserving Ride Matching in Online Ride Hailing Services

Security and Communication Networks ◽

10.1155/2020/4948387 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Haining Yu ◽

Hongli Zhang ◽

Xiangzhan Yu

Keyword(s):

Service Provider ◽

Road Network ◽

Location Privacy ◽

Homomorphic Encryption ◽

Privacy Preserving ◽

Smartphone App ◽

Network Embedding ◽

Encrypted Data ◽

Short Notice ◽

Road Distance

Online ride hailing (ORH) services enable a rider to request a driver to take him wherever he wants through a smartphone app on short notice. To use ORH services, users have to submit their ride information to the ORH service provider to make ride matching, such as pick-up/drop-off location. However, the submission of ride information may lead to the leakages of users’ privacy. In this paper, we focus on the issue of protecting the location information of both riders and drivers during ride matching and propose a privacy-preserving online ride matching scheme, called pRMatch. It enables an ORH service provider to find the closest available driver for an incoming rider over a city-scale road network, while protecting the location privacy of both riders and drivers against the ORH service provider and other unauthorized participants. In pRMatch, we compute the shortest road distance over encrypted data by using road network embedding and partially homomorphic encryption and further efficiently compare encrypted distances by using ciphertext packing and shuffling. The theoretical analysis and experimental results demonstrate that pRMatch is accurate and efficient, yet preserving users’ location privacy.

Download Full-text

Privacy Preserving Classification of EEG Data Using Machine Learning and Homomorphic Encryption

Applied Sciences ◽

10.3390/app11167360 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7360

Author(s):

Andreea Bianca Popescu ◽

Ioana Antonia Taca ◽

Cosmin Ioan Nita ◽

Anamaria Vizitiu ◽

Robert Demeter ◽

...

Keyword(s):

Machine Learning ◽

Data Privacy ◽

Homomorphic Encryption ◽

Synthetic Data ◽

Privacy Preserving ◽

Supervised Machine Learning ◽

Computational Time ◽

Small Integer ◽

Encrypted Data ◽

The Impact

Data privacy is a major concern when accessing and processing sensitive medical data. A promising approach among privacy-preserving techniques is homomorphic encryption (HE), which allows for computations to be performed on encrypted data. Currently, HE still faces practical limitations related to high computational complexity, noise accumulation, and sole applicability the at bit or small integer values level. We propose herein an encoding method that enables typical HE schemes to operate on real-valued numbers of arbitrary precision and size. The approach is evaluated on two real-world scenarios relying on EEG signals: seizure detection and prediction of predisposition to alcoholism. A supervised machine learning-based approach is formulated, and training is performed using a direct (non-iterative) fitting method that requires a fixed and deterministic number of steps. Experiments on synthetic data of varying size and complexity are performed to determine the impact on runtime and error accumulation. The computational time for training the models increases but remains manageable, while the inference time remains in the order of milliseconds. The prediction performance of the models operating on encoded and encrypted data is comparable to that of standard models operating on plaintext data.

Download Full-text