Privacy Preserving Machine Learning with Homomorphic Encryption and Federated Learning

As machine learning has achieved great success, privacy protection has become an important concern. This paper proposes a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients that have been encrypted with homomorphic encryption. Experiments show that the model trained with PFMLP achieves almost the same accuracy, with a deviation of less than 1%. To reduce the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that speeds up training by 25–28%. Comparisons across encryption key length, learning network structure, number of learning clients, and other factors are also discussed in detail.
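The core idea above is that parties exchange only homomorphically encrypted gradients, which the aggregator can still sum. The minimal sketch below illustrates this with textbook Paillier (g = n + 1), not the paper's improved variant, whose details are not given here. The toy primes (2^31 - 1 and 2^61 - 1), the fixed-point scale and all helper names are assumptions for illustration. The primes are far too small to be secure, and one process holds the private key, whereas a real deployment would keep decryption away from the aggregating server.

```python
import math
import secrets

# Toy key material (assumption): two known primes, 2^31 - 1 and 2^61 - 1.
# Real deployments use primes of 1024 bits or more; this is illustration only.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function lambda(n)
MU = pow(LAM, -1, N)          # valid because we use g = n + 1
SCALE = 10**6                 # hypothetical fixed-point scale for real gradients


def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def encode(x: float) -> int:
    """Fixed-point encoding; negative values wrap into the upper half of Z_n."""
    return round(x * SCALE) % N


def decode(m: int) -> float:
    if m > N // 2:
        m -= N
    return m / SCALE


def aggregate(client_grads: list[list[float]]) -> list[float]:
    """Sum encrypted gradient vectors element-wise, then decrypt the mean."""
    encrypted = [[encrypt(encode(g)) for g in grads] for grads in client_grads]
    summed = encrypted[0]
    for enc in encrypted[1:]:
        summed = [add_encrypted(a, b) for a, b in zip(summed, enc)]
    return [decode(decrypt(c)) / len(client_grads) for c in summed]
```

For example, `aggregate([[0.5, -1.25], [0.25, 0.75], [-0.125, 0.5]])` returns the element-wise mean of three hypothetical gradient vectors, approximately `[0.2083, 0.0]`, while the summation step only ever touches ciphertexts.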

Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources

PLoS Computational Biology, 2021, Vol 17 (7), pp. e1009244. DOI: 10.1371/journal.pcbi.1009244

Author(s): Maximilian Hanussek, Felix Bartusch, Jens Krüger

Keyword(s): Machine Learning, Virtual Environments, High Performance, Biological Data, Scaling Behavior, Bare Metal, Learning Framework, Speed Up, Clustal Omega, Performance Computing

The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the field of bioinformatics. Access to high-performance computing resources is also necessary to achieve results in reasonable time. To speed up applications and use available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known bioinformatics applications with respect to scaling, different virtual environments and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application that uses the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The resulting performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from using larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users find the best amount of resources for their analysis.
Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community’s awareness of the efficient use of computing resources.
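The overhead and scaling figures described above can be derived from measured wall-clock runtimes. The helpers below are a minimal sketch under stated assumptions: the function names, the 0.5 parallel-efficiency threshold used to pick a thread count, and the runtimes in the usage example are hypothetical and not taken from the study.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, threads: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / threads


def overhead_percent(t_virtual: float, t_bare_metal: float) -> float:
    """Relative slowdown (in percent) of a virtualized run versus bare metal."""
    return 100.0 * (t_virtual - t_bare_metal) / t_bare_metal


def best_thread_count(runtimes: dict[int, float], min_efficiency: float = 0.5) -> int:
    """Largest thread count whose parallel efficiency still meets the threshold.

    `runtimes` maps thread count -> wall-clock seconds and must include 1.
    The default threshold of 0.5 is a hypothetical choice.
    """
    t1 = runtimes[1]
    return max(p for p, tp in runtimes.items() if efficiency(t1, tp, p) >= min_efficiency)
```

With hypothetical runtimes `{1: 100.0, 2: 52.0, 4: 27.0, 8: 20.0, 16: 18.0}`, efficiency falls to about 0.35 at 16 threads, so `best_thread_count` returns 8; `overhead_percent(112.0, 100.0)` gives 12.0. The threshold encodes how much wasted capacity a user or resource provider is willing to accept.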

Web-Based Privacy-Preserving Multicenter Medical Data Analysis Tools Via Threshold Homomorphic Encryption: Design and Development Study

Journal of Medical Internet Research, 2020, Vol 22 (12), pp. e22555. DOI: 10.2196/22555

Author(s): Yao Lu, Tianshu Zhou, Yu Tian, Shiqiang Zhu, Jingsong Li

Keyword(s): Machine Learning, Logistic Regression, Logistic Regression Model, Cross Validation, Homomorphic Encryption, Privacy Preserving, Medical Data, Multiple Sources, Model Training, Fold Cross Validation

Background: Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in the development of secure machine learning frameworks based on homomorphic encryption in recent years; however, nearly all such frameworks use a single secret key and lack a description of how to securely evaluate the trained model, which makes them impractical for multicenter medical applications. Objective: The aim of this study is to provide a privacy-preserving machine learning protocol (eg, for logistic regression) for multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model. Methods: We adapted a novel threshold homomorphic encryption scheme to meet these privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth, and new model training strategies for simultaneously training multiple models through x-fold cross-validation. Results: Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively. Conclusions: We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption.
Our protocol is practical for real-world use and may promote multicenter medical research to some extent.
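The motivation above is that single-key frameworks let one party decrypt everything. In a threshold (here n-of-n) scheme, each center holds only a key share, and a ciphertext opens only when every center contributes a partial decryption. The relinearization keys mentioned above suggest a lattice-based scheme, which is too involved for a short sketch, so the toy below swaps in the much simpler additively homomorphic exponential ElGamal with n-of-n key sharing, purely to show the key-sharing idea. It is not the authors' scheme. The group parameters (p = 2039, q = 1019, g = 4), the function names and the example counts are assumptions, and the parameters are far too small to be secure.

```python
import secrets

# Toy group (assumption): safe prime P = 2*Q + 1 with Q = 1019 prime;
# G = 4 = 2^2 generates the subgroup of prime order Q. Not secure.
P, Q, G = 2039, 1019, 4

Ciphertext = tuple[int, int]


def keygen_share() -> tuple[int, int]:
    """One center's secret key share and its public part g^share."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def joint_public_key(public_parts: list[int]) -> int:
    """pk = g^(sk_1 + ... + sk_n); no single center knows the full secret."""
    pk = 1
    for h in public_parts:
        pk = pk * h % P
    return pk


def encrypt(m: int, pk: int) -> Ciphertext:
    """Exponential ElGamal: (g^r, g^m * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P


def add(c: Ciphertext, d: Ciphertext) -> Ciphertext:
    """Component-wise product encrypts the sum of the plaintexts."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def partial_decrypt(c: Ciphertext, sk: int) -> int:
    """Each center releases c1^sk_i; useless without the other centers' partials."""
    return pow(c[0], sk, P)


def combine(c: Ciphertext, partials: list[int]) -> int:
    """Remove pk^r using all partial decryptions, then brute-force the small m."""
    mask = 1
    for d in partials:
        mask = mask * d % P
    gm = c[1] * pow(mask, -1, P) % P
    for m in range(Q):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("plaintext outside the searchable range")


def threshold_sum(values: list[int]) -> int:
    """Each center encrypts one value; the sum opens only with every share."""
    shares = [keygen_share() for _ in values]
    pk = joint_public_key([h for _, h in shares])
    total = encrypt(values[0], pk)
    for v in values[1:]:
        total = add(total, encrypt(v, pk))
    return combine(total, [partial_decrypt(total, sk) for sk, _ in shares])
```

For example, `threshold_sum([12, 30, 7])` returns 49 for three hypothetical per-center counts. Plaintexts must stay below Q because decryption ends in a brute-force discrete logarithm, a standard limitation of exponential ElGamal.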

Techniques and Challenges while Applying Machine Learning Algorithms in Privacy Preserving Fashion

Proceeding International Conference on Science and Engineering, 2020, Vol 3, p. xix. DOI: 10.14421/icse.v3.600

Author(s): Artrim Kjamilji

Keyword(s): Machine Learning, Private Information, Cyber Security, Credit Card, Differential Privacy, Homomorphic Encryption, Privacy Preserving, Machine Learning Algorithms, Garbled Circuits, Private Data

Nowadays, many different entities collect data of the same nature, but in slightly different environments. For example, different hospitals collect data about their patients’ symptoms and the corresponding disease diagnoses, different banks collect transactions from their customers’ accounts, and multiple cyber-security companies collect log files and data about the corresponding attacks. It has been shown that if these entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, they would often obtain a model that outperforms human experts in the corresponding fields in terms of prediction accuracy. However, there is a drawback: due to privacy concerns, reinforced by laws and ethical considerations, no entity is willing to share its privately collected data with others. The same problem arises at classification time with an already trained ML model. On one hand, a user with an unclassified query (record) wants to reveal to the server that owns the trained model neither the content of the query (which might contain private data such as a credit card number or IP address) nor the final prediction (classification) of the query. On the other hand, the owner of the trained model does not want to leak any of the model’s parameters to the user. To overcome these shortcomings, several cryptographic and probabilistic techniques have been proposed in recent years to enable both privacy-preserving training and privacy-preserving classification. These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, Private Information Retrieval (PIR), Oblivious Transfer (OT), garbled circuits, and homomorphic encryption, to name a few.
Theoretical analyses and experimental results show that current privacy-preserving schemes are suitable for real-world deployment, and that the accuracy of most of them differs little or not at all from that of their non-privacy-preserving counterparts.
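Of the techniques listed above, secure multiparty computation is among the easiest to illustrate briefly. The sketch below uses additive secret sharing, one basic MPC building block, to let several entities (for example, hospitals) learn the sum of their private values without revealing any individual value. The modulus, the function names and the simulated message flow (all parties run in one process here, assumed honest-but-curious) are assumptions for illustration.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus; all shares live in Z_MODULUS


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random shares that sum to value (mod MODULUS).

    Any n - 1 of the shares are uniformly random and reveal nothing about value.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties computing the sum of their private inputs.

    Party i sends its j-th share to party j; each party adds the shares it
    received and publishes only that partial sum.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # row i: party i's shares
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS
```

With three hypothetical inputs, `secure_sum([120, 45, 300])` returns 465. The result is correct as long as the inputs are non-negative and their true sum stays below `MODULUS`.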

A Survey on Privacy-Preserving Machine Learning with Fully Homomorphic Encryption

Communications in Computer and Information Science - High Performance Computing, 2021, pp. 115-129. DOI: 10.1007/978-3-030-68035-0_9

Author(s): Luis Bernardo Pulido-Gaytan, Andrei Tchernykh, Jorge M. Cortés-Mendoza, Mikhail Babenko, Gleb Radchenko

Keyword(s): Machine Learning, Homomorphic Encryption, Privacy Preserving, Fully Homomorphic Encryption

Web-Based Privacy-Preserving Multicenter Medical Data Analysis Tools Via Threshold Homomorphic Encryption: Design and Development Study (Preprint)

2020. DOI: 10.2196/preprints.22555

Author(s): Yao Lu, Tianshu Zhou, Yu Tian, Shiqiang Zhu, Jingsong Li

Machine Learning Enabled Adaptive Optimization of a Transonic Compressor Rotor With Pre-Compression

Volume 2C: Turbomachinery, 2018. DOI: 10.1115/gt2018-77098

Author(s): Michael Joly, Soumalya Sarkar, Dhagash Mehta

Keyword(s): Machine Learning, Design Space Exploration, Surrogate Models, Adaptive Optimization, Transonic Compressor, Learning Framework, Compressor Rotor, Self Tuning, Speed Up, The Stability

In aerodynamic design, accurate and robust surrogate models are important for accelerating computationally expensive CFD-based optimization. In this paper, a machine learning framework is presented to speed up the design optimization of a highly loaded transonic compressor rotor. The approach is three-fold: (1) dynamic selection and self-tuning among several surrogate models; (2) classification to anticipate failure of the performance evaluation; and (3) adaptive selection of new candidates for CFD evaluation to update the surrogate, which facilitates design space exploration and reduces surrogate uncertainty. The framework is demonstrated with a multi-point optimization of the transonic NASA rotor 37, yielding increased compressor efficiency in less than 48 hours on 100 CPU cores. The optimized rotor geometry features pre-compression that relocates and attenuates the shock, without the stability penalty or undesired reacceleration usually observed in the literature.
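The three-fold approach above (surrogate self-tuning, failure classification, adaptive candidate selection) can be illustrated on a one-dimensional toy problem. Everything in this sketch is a hypothetical stand-in, not the paper's method. `expensive_eval` replaces a CFD run and returns `None` for a failed evaluation. The surrogate family is inverse-distance weighting with its exponent tuned by leave-one-out error, failure is predicted by a 1-nearest-neighbour rule, and each new candidate maximizes predicted value plus a distance-based exploration bonus. The paper's actual surrogates and selection criteria are not specified here.

```python
from typing import Optional

Sample = tuple[float, float]  # (design variable, objective value)


def expensive_eval(x: float) -> Optional[float]:
    """Hypothetical stand-in for one CFD run; None marks a failed evaluation."""
    if x > 0.9:
        return None  # pretend the solver diverges in this region
    return -(x - 0.6) ** 2  # objective to maximize


def idw(samples: list[Sample], x: float, power: float) -> float:
    """Inverse-distance-weighted surrogate prediction at x."""
    num = den = 0.0
    for xi, yi in samples:
        d = abs(x - xi)
        if d == 0.0:
            return yi
        weight = d ** -power
        num += weight * yi
        den += weight
    return num / den


def tune_power(samples: list[Sample], powers: tuple[float, ...] = (1.0, 2.0, 4.0)) -> float:
    """(1) Self-tuning: pick the surrogate variant with lowest leave-one-out error."""
    if len(samples) < 2:
        return powers[0]

    def loo_error(p: float) -> float:
        return sum((idw(samples[:i] + samples[i + 1:], x, p) - y) ** 2
                   for i, (x, y) in enumerate(samples))

    return min(powers, key=loo_error)


def predicts_failure(x: float, ok: list[Sample], failed: list[float]) -> bool:
    """(2) 1-nearest-neighbour classifier over past successes and failures."""
    if not failed:
        return False
    d_ok = min((abs(x - xi) for xi, _ in ok), default=float("inf"))
    d_fail = min(abs(x - xf) for xf in failed)
    return d_fail < d_ok


def optimize(budget: int, kappa: float = 0.05) -> Sample:
    """Adaptive surrogate-based search over a fixed grid of candidate designs."""
    grid = [i / 20 for i in range(21)]
    ok: list[Sample] = []
    failed: list[float] = []

    def evaluate(x: float) -> None:
        y = expensive_eval(x)
        if y is None:
            failed.append(x)
        else:
            ok.append((x, y))

    for x in (0.0, 0.5, 1.0):  # initial design of experiments
        evaluate(x)
    while len(ok) + len(failed) < budget:
        seen = {x for x, _ in ok} | set(failed)
        # (2) skip unevaluated candidates the classifier expects to fail
        pool = [x for x in grid if x not in seen and not predicts_failure(x, ok, failed)]
        if not pool:
            break
        power = tune_power(ok)

        def score(x: float) -> float:  # (3) exploit prediction + explore gaps
            return idw(ok, x, power) + kappa * min(abs(x - s) for s in seen)

        evaluate(max(pool, key=score))
    return max(ok, key=lambda s: s[1])
```

`optimize(budget=8)` returns the best successful (design, objective) pair found within eight evaluations. The exploration weight `kappa` trades off refining near the current best against reducing surrogate uncertainty elsewhere.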

Privacy-Preserving Federated Neural Network Learning for Disease-Associated Cell Classification

2022. DOI: 10.1101/2022.01.10.475610

Author(s): Sinem Sav, Jean-Philippe Bossuat, Juan R. Troncoso-Pastoriza, Manfred Claassen, Jean-Pierre Hubaux

Keyword(s): Neural Network, Machine Learning, Neural Networks, Network Architecture, Homomorphic Encryption, Privacy Preserving, Model Parameters, Patient Privacy, Learning Models, Machine Learning Models

Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, infeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem with a novel privacy-preserving federated learning-based approach, PriCell, for complex machine learning models such as convolutional neural networks. PriCell relies on multiparty homomorphic encryption and enables the collaborative training of encrypted neural networks across multiple healthcare institutions. We preserve the confidentiality of each institution’s input data, of any intermediate values, and of the trained model parameters. We efficiently replicate the training of a published state-of-the-art convolutional neural network architecture in a decentralized and privacy-preserving manner. Our solution achieves accuracy comparable to that of the centralized solution, with an improvement of at least one order of magnitude in execution time with respect to prior secure solutions. Our work guarantees patient privacy and ensures data utility for efficient multi-center studies involving complex healthcare data.
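Stripped of the encryption, collaborative training of the kind described above follows a familiar federated pattern. Each site updates the shared model on its own data, and the updates are combined into a new global model. The plaintext sketch below shows that pattern with federated averaging on a one-feature logistic regression. It is an assumption-laden stand-in, not PriCell's protocol, which works on ciphertexts under multiparty homomorphic encryption and trains convolutional networks; whether PriCell combines updates exactly this way is not stated above. The data, learning rate and round counts are hypothetical.

```python
import math

Data = list[tuple[float, float]]  # (feature, label) pairs held by one site


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(w: list[float], data: Data, lr: float = 0.5, epochs: int = 5) -> list[float]:
    """A few full-batch gradient steps of logistic regression on one site's data."""
    b, a = w
    for _ in range(epochs):
        gb = ga = 0.0
        for x, y in data:
            err = sigmoid(b + a * x) - y
            gb += err
            ga += err * x
        b -= lr * gb / len(data)
        a -= lr * ga / len(data)
    return [b, a]


def fed_avg(models: list[list[float]], sizes: list[int]) -> list[float]:
    """Average site models, weighting each by its number of samples."""
    total = sum(sizes)
    return [sum(m[k] * n for m, n in zip(models, sizes)) / total
            for k in range(len(models[0]))]


def train_federated(sites: list[Data], rounds: int = 20) -> list[float]:
    w = [0.0, 0.0]  # global [bias, weight]
    for _ in range(rounds):
        local_models = [local_update(w, d) for d in sites]  # raw data stays local
        w = fed_avg(local_models, [len(d) for d in sites])
    return w


def log_loss(w: list[float], data: Data) -> float:
    b, a = w
    return -sum(y * math.log(sigmoid(b + a * x)) + (1 - y) * math.log(1 - sigmoid(b + a * x))
                for x, y in data) / len(data)
```

Only model parameters leave each site in this sketch; in a secure protocol those parameters would themselves be encrypted before aggregation.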

SEAL-Embedded: A Homomorphic Encryption Library for the Internet of Things

IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021, pp. 756-779. DOI: 10.46586/tches.v2021.i3.756-779

Author(s): Deepika Natarajan, Wei Dai

Keyword(s): Internet Of Things, High Performance, Homomorphic Encryption, Privacy Preserving, Security And Privacy, The Internet, Embedded Devices, Computational Overhead, The Internet Of Things, Memory Efficient

The growth of the Internet of Things (IoT) has led to concerns over the lack of security and privacy guarantees afforded by IoT systems. Homomorphic encryption (HE) is a promising privacy-preserving solution that allows devices to securely share data with a cloud backend; however, its high memory consumption and computational overhead have limited its use on resource-constrained embedded devices. To address this problem, we present SEAL-Embedded, the first HE library targeted at embedded devices, featuring the CKKS approximate homomorphic encryption scheme. SEAL-Embedded employs several computational and algorithmic optimizations along with a detailed memory re-use scheme to achieve memory-efficient, high-performance CKKS encoding and encryption on embedded devices without any sacrifice of security. We additionally provide an “adapter” server module to convert data encrypted by SEAL-Embedded to be compatible with the Microsoft SEAL library for homomorphic encryption, enabling an end-to-end solution for building privacy-preserving applications. For a polynomial ring degree of 4096, using RNS primes of 30 or fewer bits, our library can be configured to use 64–137 KB of RAM and 1–264 KB of flash data, depending on developer-selected configurations and tradeoffs. Using these parameters, we evaluate SEAL-Embedded on two different IoT platforms with high-performance, memory-efficient, and balanced configurations of the library for asymmetric and symmetric encryption. With 136 KB of RAM, SEAL-Embedded can perform asymmetric encryption of 2048 single-precision numbers in 77 ms on the Azure Sphere Cortex-A7 and 737 ms on the Nordic nRF52840 Cortex-M4.
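Before encryption, CKKS encodes a vector of real numbers into a single ring polynomial. With ring degree N it packs N/2 values into slots, which is why a ring degree of 4096 matches the 2048 numbers quoted above. The sketch below shows only that encoding step, done naively in O(N²) with Python complex numbers. It interpolates a real polynomial through the slot values at the primitive 2N-th roots of unity ω^(5^j), scales by Δ and rounds to integer coefficients. There is no encryption or RNS here, a practical library would use a fast Fourier-style transform instead, and the scale Δ = 2^20 and the small N in the test are assumptions.

```python
import cmath


def slot_roots(n: int) -> list[complex]:
    """One primitive 2n-th root of unity per slot: omega^(5^j mod 2n), j < n/2."""
    return [cmath.exp(1j * cmath.pi * pow(5, j, 2 * n) / n) for j in range(n // 2)]


def encode(values: list[float], n: int, scale: float) -> list[int]:
    """Coefficients c_k = round(scale * (2/n) * Re(sum_j z_j * root_j^-k))."""
    if len(values) != n // 2:
        raise ValueError("CKKS packs exactly n/2 values per plaintext")
    roots = slot_roots(n)
    coeffs = []
    for k in range(n):
        acc = sum(z * r ** (-k) for z, r in zip(values, roots))
        coeffs.append(round(scale * 2 * acc.real / n))
    return coeffs


def decode(coeffs: list[int], n: int, scale: float) -> list[float]:
    """Evaluate the polynomial at each slot root and undo the scaling."""
    return [sum(c * r ** k for k, c in enumerate(coeffs)).real / scale
            for r in slot_roots(n)]


def add_plain(a: list[int], b: list[int]) -> list[int]:
    """Adding polynomials coefficient-wise adds the encoded slot values."""
    return [x + y for x, y in zip(a, b)]
```

Rounding to integers introduces an error of roughly N/(2Δ) per slot, which is why CKKS is called approximate: a larger scale buys precision at the cost of more modulus budget.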

Privacy Preserving Classification of EEG Data Using Machine Learning and Homomorphic Encryption

Applied Sciences, 2021, Vol 11 (16), pp. 7360. DOI: 10.3390/app11167360

Author(s): Andreea Bianca Popescu, Ioana Antonia Taca, Cosmin Ioan Nita, Anamaria Vizitiu, Robert Demeter, ...

Keyword(s): Machine Learning, Data Privacy, Homomorphic Encryption, Synthetic Data, Privacy Preserving, Supervised Machine Learning, Computational Time, Small Integer, Encrypted Data, The Impact

Data privacy is a major concern when accessing and processing sensitive medical data. A promising approach among privacy-preserving techniques is homomorphic encryption (HE), which allows computations to be performed on encrypted data. Currently, HE still faces practical limitations related to high computational complexity, noise accumulation, and applicability only at the level of bits or small integer values. We propose herein an encoding method that enables typical HE schemes to operate on real-valued numbers of arbitrary precision and size. The approach is evaluated on two real-world scenarios relying on EEG signals: seizure detection and prediction of predisposition to alcoholism. A supervised machine learning-based approach is formulated, and training is performed using a direct (non-iterative) fitting method that requires a fixed and deterministic number of steps. Experiments on synthetic data of varying size and complexity are performed to determine the impact on runtime and error accumulation. The computational time for training the models increases but remains manageable, while the inference time remains on the order of milliseconds. The prediction performance of the models operating on encoded and encrypted data is comparable to that of standard models operating on plaintext data.
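The limitation described above is that many HE schemes operate only on integers. A generic way to carry real values through integer-only arithmetic is a fixed-point representation: each real number becomes an integer plus a public decimal exponent, additions first align exponents, and multiplications add them. This sketch shows that general technique under stated assumptions; it is not necessarily the encoding proposed in the paper. In an actual HE scheme only the integer part would be encrypted, it must stay below the scheme's plaintext modulus, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedPoint:
    """Integer m with public decimal exponent e, representing m / 10**e."""
    m: int
    e: int

    @staticmethod
    def encode(x: float, digits: int) -> "FixedPoint":
        return FixedPoint(round(x * 10**digits), digits)

    def align(self, e: int) -> "FixedPoint":
        # Multiplying by a public constant is an operation HE schemes support.
        if e < self.e:
            raise ValueError("can only raise the exponent")
        return FixedPoint(self.m * 10 ** (e - self.e), e)

    def __add__(self, other: "FixedPoint") -> "FixedPoint":
        e = max(self.e, other.e)
        return FixedPoint(self.align(e).m + other.align(e).m, e)

    def __mul__(self, other: "FixedPoint") -> "FixedPoint":
        # Integer product; the exponents (scales) add up.
        return FixedPoint(self.m * other.m, self.e + other.e)

    def decode(self) -> float:
        return self.m / 10**self.e
```

For example, a dot product of hypothetical features `[0.125, -3.5, 2.0]` and weights `[1.5, 0.25, -0.75]`, each encoded with four decimal digits, accumulates to `FixedPoint(-218750000, 8)`, which decodes to -2.1875. The growing exponent shows why repeated multiplications eventually exhaust the plaintext space unless values are rescaled.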

A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods

Current Drug Targets, 2019, Vol 20 (5), pp. 540-550. DOI: 10.2174/1389450119666181002143355. Cited by ~11

Author(s): Jiu-Xin Tan, Hao Lv, Fang Wang, Fu-Ying Dao, Wei Chen, ...

Keyword(s): Machine Learning, Catalytic Mechanism, Biological Function, Learning Methods, Biochemical Processes, Machine Learning Methods, Enzyme Family, The Family, Speed Up, Family Classification

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductases; EC-2: transferases; EC-3: hydrolases; EC-4: lyases; EC-5: isomerases; and EC-6: ligases (synthetases). Different enzymes have different biological functions and substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about its biological function. With large numbers of protein sequences flowing into databanks in the post-genomic era, annotating the family of each enzyme has become very important. Because experimental methods are costly, bioinformatics tools can be of great help in accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.
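Machine learning predictors of the kind surveyed above typically start by turning a protein sequence into a fixed-length numeric feature vector. A simple, widely used choice is amino acid composition, the relative frequency of each of the 20 standard residues. The toy below pairs that feature with a nearest-centroid classifier. The feature choice, classifier, function names, sequences and EC labels are illustrative assumptions, not methods or data from the review.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> list[float]:
    """Amino acid composition: relative frequency of each standard residue."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def train_centroids(labeled: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average composition vector per enzyme class label."""
    groups: dict[str, list[list[float]]] = {}
    for seq, label in labeled:
        groups.setdefault(label, []).append(composition(seq))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids: dict[str, list[float]], seq: str) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = composition(seq)
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))
```

Trained on a handful of hypothetical sequences labeled "EC-1" and "EC-3", `predict` assigns a query to the class whose average composition is closest. Practical predictors generally combine richer sequence features with stronger classifiers.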
