Privacy-Preserving Federated Learning Using Homomorphic Encryption

2022 ◽  
Vol 12 (2) ◽  
pp. 734
Author(s):  
Jaehyoung Park ◽  
Hyuk Lim

Federated learning (FL) is a machine learning technique that enables distributed devices to train a learning model collaboratively without sharing their local data. FL-based systems can achieve much stronger privacy preservation because the distributed devices deliver only local model parameters trained on local data to a centralized server. However, there remains the possibility that the centralized server or attackers could infer or extract sensitive private information from the structure and parameters of the local learning models. To protect the model parameters, we propose employing a homomorphic encryption (HE) scheme that can perform arithmetic operations directly on ciphertexts without decryption. Using the HE scheme, the proposed privacy-preserving federated learning (PPFL) algorithm enables the centralized server to aggregate encrypted local model parameters without decrypting them. Furthermore, the proposed algorithm allows each node to use a different HE private key within the same FL-based system by means of a distributed cryptosystem. The performance of the proposed PPFL algorithm is analyzed and evaluated in various cloud-computing-based FL service scenarios.
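
As a rough illustration of the aggregation step described above (not the authors' implementation), the following Python sketch uses a toy Paillier cryptosystem, whose additive homomorphism lets the server multiply ciphertexts to add the underlying plaintext parameters. The key sizes, fixed-point encoding, and all names are our own illustrative choices.

```python
# Toy Paillier cryptosystem (illustrative key sizes only, never production).
import math
import random

def keygen(p=2357, q=2551):            # small primes for the sketch
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)               # valid because we take g = n + 1
    return n, (lam, mu, n)

def encrypt(n, m):
    r = random.randrange(1, n)         # assumed coprime to n (overwhelmingly likely)
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

# Each client encrypts one fixed-point model parameter; the server multiplies
# the ciphertexts, which adds the underlying plaintexts without decryption.
n, priv = keygen()
SCALE, OFFSET = 10_000, 1_000_000      # fixed-point encoding, offset for negatives
local_params = [0.12, -0.05, 0.31]     # one local parameter per client
cts = [encrypt(n, int(w * SCALE) + OFFSET) for w in local_params]

agg = 1
for c in cts:
    agg = agg * c % (n * n)            # homomorphic addition of plaintexts

total = decrypt(priv, agg) - OFFSET * len(cts)
print("sum of local parameters:", total / SCALE)   # 0.38
```

A distributed variant, as in the paper, would additionally split the private key among the nodes so that no single party can decrypt on its own.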

2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Jie Wang ◽  
Hongtao Li ◽  
Feng Guo ◽  
Wenyin Zhang ◽  
Yifeng Cui

As a novel and promising technology for 5G networks, device-to-device (D2D) communication has garnered significant research interest because of its rapid sharing, highly accurate delivery, and variety of applications and services. Big data technology offers unprecedented opportunities but also poses a daunting challenge to D2D communication and sharing: the data often contain private information concerning users or organizations and are thus at risk of being leaked. Privacy preservation is necessary for D2D services but has not been extensively studied. In this paper, we propose an (α, k)-anonymity privacy-preserving framework for D2D big data deployed on MapReduce. We first present a framework for D2D big data sharing and analyze the threat model. We then adopt (α, k)-anonymity as the privacy-preserving model and use distributed MapReduce to classify and group the data in massive datasets. Experimental results and theoretical analysis show that our privacy-preserving algorithm deployed on MapReduce is effective for D2D big data privacy protection, with low information loss and computing time.
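
For intuition, here is a minimal sketch of the (α, k) check in a map/reduce style, assuming a toy schema of (age, ZIP code, disease); the generalization rule, thresholds, and all data are illustrative, not the paper's.

```python
# Minimal (alpha, k)-anonymity check, map/reduce style (plain-Python stand-ins).
from collections import defaultdict

K, ALPHA = 3, 0.7   # each class needs >= K records, and no sensitive value
                    # may make up more than ALPHA of a class

def generalize(record):
    age, zipcode, disease = record
    return (age // 10 * 10, zipcode[:3] + "**"), disease   # QI generalization

records = [
    (34, "10027", "flu"), (38, "10032", "flu"), (31, "10021", "cancer"),
    (52, "20005", "flu"), (55, "20009", "flu"), (57, "20001", "flu"),
]

# Map phase: emit (generalized quasi-identifier, sensitive value)
groups = defaultdict(list)
for rec in records:
    qi, sensitive = generalize(rec)
    groups[qi].append(sensitive)

# Reduce phase: release a class only if it satisfies both k and alpha
for qi, vals in groups.items():
    k_ok = len(vals) >= K
    alpha_ok = all(vals.count(v) / len(vals) <= ALPHA for v in set(vals))
    status = "release" if (k_ok and alpha_ok) else "suppress/generalize further"
    print(qi, vals, "->", status)
```

The second group fails the α test (a single disease covers 100% of it), which is exactly the homogeneity leak that plain k-anonymity misses.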


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2474
Author(s):  
Guoying Qiu ◽  
Yulong Shen ◽  
Ke Cheng ◽  
Lingtong Liu ◽  
Shuiguang Zeng

The increasing popularity of smartphones and location-based services (LBS) has brought a new experience of mobile crowdsourcing, marked by network interconnection and information sharing. However, mobile crowdsourcing applications suffer from various inferential attacks based on mobile behavioral factors such as location semantics and spatiotemporal correlation. Unfortunately, most existing techniques protect a participant's location privacy based on actual trajectories; once the protection fails, the data leakage directly threatens the participant's location-related private information. This raises the question of how to participate in mobile crowdsourcing services without disclosing actual locations. In this paper, we propose a mobility-aware trajectory-prediction solution, TMarkov, for achieving privacy-preserving mobile crowdsourcing. Specifically, we introduce a time-partitioning concept into the Markov model to overcome its traditional limitations, and we construct a new transfer model that records a mobile user's time-varying behavioral patterns. Because the observed data are incomplete, unbiased estimation is then performed with the Gibbs sampling method. The resulting TMarkov model characterizes the participant's dynamic mobile behaviors. With TMarkov in place, a mobility-aware spatiotemporal trajectory is predicted for the mobile user to participate in the crowdsourcing application. Extensive experiments on a real-world dataset demonstrate that TMarkov balances the trade-off between privacy preservation and data usability well.
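
A minimal sketch of the time-partitioned idea, under our own simplifying assumptions (a handful of discrete time slots and symbolic locations; the paper's Gibbs-sampling estimation step is omitted):

```python
# Time-partitioned Markov transitions: one transition table per time slot,
# so the same user can have different mobility patterns at 9am and 7pm.
import random
from collections import defaultdict

N_SLOTS = 4                      # partition the day into 4 time slots

# counts[slot][current][next] accumulates observed transitions per slot
counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def observe(trajectory):         # trajectory: list of (hour, location)
    for (h1, l1), (h2, l2) in zip(trajectory, trajectory[1:]):
        counts[h1 * N_SLOTS // 24][l1][l2] += 1

def predict_next(hour, location):
    slot = hour * N_SLOTS // 24
    nxt = counts[slot][location]
    if not nxt:                  # unseen state: fall back to staying put
        return location
    locs, weights = zip(*nxt.items())
    return random.choices(locs, weights=weights)[0]   # sample, not argmax

observe([(8, "home"), (9, "cafe"), (10, "office"), (18, "gym"), (20, "home")])
observe([(8, "home"), (9, "office"), (18, "gym"), (19, "home")])
print(predict_next(9, "home"))   # a plausible synthetic next location
```

The predicted (rather than actual) trajectory is what gets submitted to the crowdsourcing platform, which is how the scheme keeps the real locations private.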


Author(s):  
Artrim Kjamilji

Nowadays, many different entities collect data of the same nature but in slightly different environments: different hospitals collect data about their patients' symptoms and the corresponding disease diagnoses, different banks collect the transactions of their customers' accounts, multiple cyber-security companies collect log files and the corresponding attacks, and so on. It has been shown that if these entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, the trained model would often outperform the human experts of the corresponding fields in terms of prediction accuracy. However, there is a drawback: owing to privacy concerns, backed by laws and ethical considerations, no entity is willing to share its privately collected data with others. The same problem appears in the classification setting over an already trained ML model. On the one hand, a user with an unclassified query (record) does not want to share with the server that owns the trained model either the content of the query (which might contain private data such as a credit card number or an IP address) or the final prediction (classification) of the query. On the other hand, the owner of the trained model does not want to leak any parameter of the model to the user. To overcome these shortcomings, several cryptographic and probabilistic techniques have been proposed in recent years to enable both privacy-preserving training and privacy-preserving classification. These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, private information retrieval (PIR), oblivious transfer (OT), garbled circuits, and homomorphic encryption, to name a few. Theoretical analyses and experimental results show that current privacy-preserving schemes are suitable for real-world deployment, while the accuracy of most of them differs little, if at all, from that of schemes that operate in a non-privacy-preserving fashion.
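
As a taste of one building block from this list, the following sketch shows additive secret sharing, the core primitive behind many MPC protocols; the field size, party count, and values are arbitrary illustrative choices.

```python
# 3-party additive secret sharing over a prime field: each value is split
# into random shares that sum to it, and linear operations (here, addition)
# can be performed by the parties on their shares alone.
import random

P = 2**61 - 1                       # a Mersenne prime field

def share(x, n=3):                  # split x into n random additive shares
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two private inputs are shared; parties add their shares locally, so the
# sum is computed without any party seeing either input in the clear.
a_shares, b_shares = share(41), share(1)
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))      # 42, yet no single share reveals anything
```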


2021 ◽  
Vol 2021 (4) ◽  
pp. 225-248
Author(s):  
Aditya Hegde ◽  
Helen Möllering ◽  
Thomas Schneider ◽  
Hossein Yalame

Clustering is a popular unsupervised machine learning technique that groups similar input elements into clusters. It is used in many areas, ranging from business analysis to health care. In many of these applications, sensitive information is clustered and should not be leaked. Moreover, it is nowadays often necessary to combine data from multiple sources to increase the quality of the analysis, as well as to outsource complex computation to powerful cloud servers. This calls for efficient privacy-preserving clustering. In this work, we systematically analyze the state of the art in privacy-preserving clustering. We implement and benchmark today's four most efficient fully private clustering protocols, by Cheon et al. (SAC'19), Meng et al. (ArXiv'19), Mohassel et al. (PETS'20), and Bozdemir et al. (ASIACCS'21), with respect to communication, computation, and clustering quality. We compare them, assess their limitations for practical use in real-world applications, and conclude with open challenges.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Pawan R. Bhaladhare ◽  
Devesh C. Jinwala

In privacy-preserving data mining, the l-diversity and k-anonymity models are the most widely used for preserving an individual's sensitive private information. Of the two, the l-diversity model provides better privacy and less information loss than the k-anonymity model. In addition, numerous clustering algorithms have been proposed in data mining, namely k-means, PSO, ACO, and BFO. Among them, the BFO algorithm is more stable and faster than all the others except k-means; however, it suffers from poor convergence behavior compared to other optimization algorithms. We also observe that the current literature lacks approaches that apply BFO with the l-diversity model to realize privacy preservation in data mining. Motivated by this observation, we propose an approach that uses fractional calculus (FC) in the chemotaxis step of the BFO algorithm, where FC is used to boost the algorithm's computational performance. We evaluate our proposed FC-BFO and BFO algorithms empirically, focusing on information loss and execution time as the vital metrics. The experimental evaluation shows that our proposed FC-BFO algorithm derives better clusters than both the original BFO algorithm and existing clustering algorithms.
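
For intuition, a hedged one-dimensional sketch of a fractional-calculus chemotaxis step is given below; it follows the usual Grünwald-Letnikov-style fractional update used in related swarm algorithms and is our illustrative reading, not the authors' code.

```python
# Fractional chemotaxis sketch: the bacterium's next move keeps a
# Grunwald-Letnikov-weighted memory of its previous displacements instead
# of depending only on the latest random tumble.
import random

ALPHA = 0.6        # fractional order, 0 < ALPHA < 1
STEP = 0.1         # chemotactic step size C(i)

# first four Grunwald-Letnikov coefficients for order ALPHA
GL = [ALPHA,
      ALPHA * (1 - ALPHA) / 2,
      ALPHA * (1 - ALPHA) * (2 - ALPHA) / 6,
      ALPHA * (1 - ALPHA) * (2 - ALPHA) * (3 - ALPHA) / 24]

def fc_chemotaxis(pos, history):
    """One tumble move with fractional memory of past displacements."""
    tumble = random.uniform(-1, 1)                 # random direction in 1-D
    memory = sum(c * d for c, d in zip(GL, reversed(history)))
    delta = memory + STEP * tumble
    history.append(delta)
    return pos + delta, history[-4:]               # keep the last 4 moves

pos, hist = 0.0, []
for _ in range(5):
    pos, hist = fc_chemotaxis(pos, hist)
print("position after 5 fractional chemotaxis steps:", pos)
```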


2017 ◽  
Vol 59 ◽  
pp. 311-349 ◽  
Author(s):  
Tamir Tassa ◽  
Tal Grinshpoun ◽  
Roie Zivan

One of the basic motivations for solving distributed constraint optimization problems (DCOPs) is maintaining agents' privacy. Thus, researchers have evaluated the privacy loss of DCOP algorithms and defined corresponding notions of privacy preservation for secure DCOP algorithms. However, no secure protocol had been proposed for Max-Sum, which is among the most studied DCOP algorithms. As part of the ongoing effort of designing secure DCOP algorithms, we propose P-Max-Sum, the first private algorithm based on Max-Sum. The proposed algorithm has multiple agents performing the role of each node in the factor graph on which the Max-Sum algorithm operates. P-Max-Sum preserves three types of privacy: topology privacy, constraint privacy, and assignment/decision privacy. By allowing a single call to a trusted coordinator, P-Max-Sum also preserves agent privacy. The two main cryptographic means that enable this privacy preservation are secret sharing and homomorphic encryption. In addition, we design privacy-preserving implementations of four variants of Max-Sum. We conclude by analyzing the price of privacy in terms of runtime overhead, both theoretically and through extensive experimentation.
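
A small sketch of why secret sharing fits Max-Sum: the addition of incoming messages is linear and can therefore be carried out directly on additive shares (the max operation, not shown, is where the paper's homomorphic-encryption machinery is needed). The field size and message values below are illustrative.

```python
# Element-wise addition of two Max-Sum messages over additive shares:
# each of the two agents holds one share of each message and adds locally.
import random

P = 2**31 - 1

def share(v):                       # 2-party additive sharing of one value
    s0 = random.randrange(P)
    return s0, (v - s0) % P

# Two incoming Max-Sum messages over a 3-valued variable, held as shares.
msg_a = [4, 7, 1]
msg_b = [2, 5, 9]
a0, a1 = zip(*(share(v) for v in msg_a))
b0, b1 = zip(*(share(v) for v in msg_b))

# Each agent adds its shares locally...
sum0 = [(x + y) % P for x, y in zip(a0, b0)]
sum1 = [(x + y) % P for x, y in zip(a1, b1)]

# ...and reconstruction yields the element-wise message sum.
print([(x + y) % P for x, y in zip(sum0, sum1)])   # [6, 12, 10]
```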


Author(s):  
Yuancheng Li ◽  
Jiawen Yu

Background: In the power Internet of Things (IoT), power consumption data face the risk of privacy leakage. Traditional privacy-preserving schemes cannot ensure data privacy across the system, because the secret key pairs are shared among all interior nodes, so a single leak compromises them all. In addition, such schemes generally support only summation, which limits their extensibility. Objective: To preserve the privacy of power consumption data, protect the secret keys, and support multiple data processing methods, we propose an improved privacy-preserving scheme for power consumption data. Method: First, we establish a power IoT architecture based on edge computing. The data are then encrypted with a multi-key fully homomorphic encryption algorithm, enabling computation on ciphertexts without restrictions on the type of calculation. Through an improved decryption algorithm, ciphertexts that can be decrypted separately at cloud nodes are generated, which reduces communication costs and prevents data leakage. Results: The experimental results show that our scheme is more efficient than traditional privacy-preserving schemes. According to the variance calculation results, the proposed scheme meets the application standard in terms of computational cost and is feasible for practical operation. Discussion: In the future, we plan to adopt a scheme based on secure multi-party computation so that data can be managed locally with homomorphic encryption, ensuring data privacy.
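
As a simplified illustration of going beyond plain summation (with the encryption elided; the paper uses multi-key fully homomorphic encryption), the following shows how a statistic such as the variance can be derived from aggregated sums of x and x², so the cloud never needs individual readings:

```python
# Each node would report Enc(x) and Enc(x*x); the server aggregates the
# ciphertexts homomorphically. Here we show only the arithmetic those
# ciphertexts carry, on invented sample readings.
readings = [3.2, 4.1, 2.8, 5.0, 3.9]            # one consumption value per node

n = len(readings)
sum_x = sum(readings)                           # homomorphic sum of Enc(x)
sum_x2 = sum(x * x for x in readings)           # homomorphic sum of Enc(x*x)

mean = sum_x / n
variance = sum_x2 / n - mean * mean             # Var[X] = E[X^2] - E[X]^2
print(f"mean={mean:.3f}, variance={variance:.3f}")
```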


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5369
Author(s):  
Qiannan Wang ◽  
Haibing Mu

Edge computing has been introduced into the Internet of Things (IoT) to meet the requirements of IoT applications. At the same time, data aggregation is widely used in data processing to reduce the communication overhead and energy consumption in the IoT. Most existing schemes aggregate all of the data without filtering. Aggregation schemes also face major challenges, such as preserving the privacy of individual IoT devices' data and meeting fault-tolerance and lightweight requirements. In this paper, we present a privacy-preserving and lightweight selective aggregation scheme with fault tolerance (PLSA-FT) for edge-computing-enhanced IoT. In PLSA-FT, selective aggregation is achieved by constructing Boolean responses and numerical responses according to the specific query conditions of the cloud center. Furthermore, we modify the basic Paillier homomorphic encryption scheme to guarantee data privacy and to tolerate malfunctioning IoT devices. An online/offline signature mechanism is utilized to reduce computation costs. Analysis of the system's characteristics shows that the PLSA-FT scheme achieves confidentiality, privacy preservation, source authentication, integrity verification, fault tolerance, and dynamic membership management. Moreover, the performance evaluation results show that PLSA-FT is lightweight, with low computation costs and communication overheads.
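
The selective-aggregation encoding can be sketched as follows (our reading, with encryption elided; device names, values, and the query predicate are invented for illustration): each device reports a Boolean match bit b and the product b·x, so their sums give the count and total of the matching values only.

```python
# Selective aggregation via paired Boolean/numerical responses. In the real
# scheme both components are Paillier-encrypted and summed homomorphically;
# fault tolerance lets the sum proceed despite an offline device.
devices = [("sensor1", 21.5), ("sensor2", 35.0), ("sensor3", 28.2),
           ("sensor4", 0.0)]                     # sensor4 is offline

def respond(value, predicate):
    b = 1 if predicate(value) else 0             # Boolean response
    return b, b * value                          # numerical response

query = lambda v: v > 25.0                       # cloud center's query condition
responses = [respond(v, query) for _, v in devices[:3]]   # faulty node skipped

count = sum(b for b, _ in responses)             # how many devices matched
total = sum(s for _, s in responses)             # sum over matching devices only
print(f"{count} matching devices, mean of matches = {total / count:.2f}")
```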


2022 ◽  
Author(s):  
Sinem Sav ◽  
Jean-Philippe Bossuat ◽  
Juan R. Troncoso-Pastoriza ◽  
Manfred Claassen ◽  
Jean-Pierre Hubaux

Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, infeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem with a novel privacy-preserving federated learning approach, PriCell, for complex machine learning models such as convolutional neural networks. PriCell relies on multiparty homomorphic encryption and enables the collaborative training of encrypted neural networks across multiple healthcare institutions. We preserve the confidentiality of each institution's input data, of any intermediate values, and of the trained model parameters. We efficiently replicate the training of a published state-of-the-art convolutional neural network architecture in a decentralized and privacy-preserving manner. Our solution achieves accuracy comparable to that of the centralized solution, with an improvement of at least one order of magnitude in execution time over prior secure solutions. Our work guarantees patient privacy and ensures data utility for efficient multi-center studies involving complex healthcare data.
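
As a simplified stand-in for the aggregation step (PriCell itself uses multiparty homomorphic encryption; here additive secret sharing plays the analogous role), the sketch below averages the institutions' model updates so that no single party ever sees an individual update:

```python
# Collaborative averaging of model updates over additive secret shares:
# every institution splits its fixed-point update among all parties, each
# party publishes only the sum of the share vectors it holds.
import random

P = 2**61 - 1
SCALE = 10_000                                   # fixed-point for weights

def share(xs, n):
    """Split a fixed-point vector into n additive share vectors."""
    shares = [[random.randrange(P) for _ in xs] for _ in range(n - 1)]
    last = [(x - sum(col)) % P for x, col in zip(xs, zip(*shares))]
    return shares + [last]

updates = [[0.10, -0.20], [0.30, 0.40], [-0.10, 0.20]]   # 3 institutions
fixed = [[int(w * SCALE) for w in u] for u in updates]

n = len(updates)
held = [[0, 0] for _ in range(n)]                # running share sums per party
for u in fixed:
    for party, sh in enumerate(share(u, n)):
        held[party] = [(a + b) % P for a, b in zip(held[party], sh)]

agg = [sum(col) % P for col in zip(*held)]       # reconstruct only the total
avg = [(a if a < P // 2 else a - P) / SCALE / n for a in agg]
print("averaged update:", avg)                   # [0.1, ~0.1333]
```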

