Privacy-Preserving Federated Learning Framework with General Aggregation and Multiparty Entity Matching

2021, Vol 2021, pp. 1-14
Author(s):  
Zhou Zhou ◽  
Youliang Tian ◽  
Changgen Peng

The twin requirements of data sharing and privacy have brought increasing attention to federated learning. However, existing aggregation models are too specialized and pay little attention to the issue of user withdrawal. Moreover, protocols for multiparty entity matching are rarely covered. Thus, there is no systematic framework for performing federated learning tasks. In this paper, we propose a privacy-preserving federated learning framework (PFLF). We first construct a general secure aggregation model for federated learning scenarios by combining Shamir secret sharing with homomorphic encryption, ensuring that the aggregated value can be decrypted correctly only when the number of participants is greater than t. Furthermore, we propose a multiparty entity matching protocol that employs secure multiparty computation to solve entity alignment problems, and a logistic regression algorithm that achieves privacy-preserving model training and supports user withdrawal in vertical federated learning (VFL) scenarios. Finally, security analyses prove that PFLF preserves data privacy in the honest-but-curious model, and experimental evaluations show that PFLF attains accuracy consistent with the original model and is practically feasible.
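
As a loose illustration of the threshold property described above (not the authors' construction, which also involves homomorphic encryption), the sketch below shows Shamir secret sharing over a prime field: a secret split into n shares can be reconstructed only once at least t of them are combined. The field prime, share count, and threshold are illustrative assumptions.

import random

PRIME = 2**61 - 1  # illustrative Mersenne prime defining the finite field

def split_secret(secret, n_shares, t):
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    # Lagrange interpolation at x = 0; recovers the secret only with >= t shares.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, n_shares=5, t=3)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice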

Author(s):  
Harsh Kasyap ◽  
Somanath Tripathy

Clinical trials and drug discovery would not be effective without collaboration among institutions, yet such collaboration has historically come at the cost of individual privacy, and several pacts and compliance regimes have been enforced to avoid data breaches. Because collaboration is indispensable for research advances, existing schemes collect participants' data into a central repository to learn predictions. The current COVID pandemic has put a question mark over this setup, and the existing data repositories have proved obsolete; there is a need for contemporary data collection, processing, and learning. The smartphones and devices held by even the most remote members of society make every individual a potential contributor. This demands a distributed, decentralized collaborative learning system that draws knowledge from every data point. Federated Learning [21], proposed by Google, introduces in-place model training that keeps data on the device. Although privacy-preserving by design, it is susceptible to inference, poisoning, and Sybil attacks. Blockchain is a decentralized programming paradigm that gives broader control over the system and makes it attack resistant, but it poses challenges of high computing power, storage, and latency. Together, these emerging technologies can realize the desired learning system, provided their security and efficiency issues are addressed. This article systematizes the security issues in Federated Learning, the corresponding mitigation strategies, and the challenges of Blockchain. Further, a Blockchain-based Federated Learning architecture with two layers of participation is presented, which improves global model accuracy and guarantees participant privacy. It leverages Blockchain's channel mechanism for parallel model training and distribution, and it uses the Blockchain to establish decentralized trust between participants and gateways, helping to retain only honest participants.
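
A minimal federated-averaging sketch of the in-place training idea mentioned above (a generic FedAvg illustration with an assumed linear model and synthetic data, not the two-layer Blockchain architecture proposed in the article): each participant trains on its own data, and only model weights are shared for aggregation.

import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # One participant: a few gradient steps of linear regression on local data only.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(global_w, clients):
    # Server side: average the returned weights (weighted by local sample counts)
    # without ever seeing the participants' raw data.
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):                  # ten communication rounds
    w = federated_round(w, clients)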


2022, Vol 25 (3), pp. 18-22
Author(s):  
Ticao Zhang ◽  
Shiwen Mao

With growing concern over data privacy and security, it is undesirable to collect data from all users in order to perform machine learning tasks. Federated learning, a decentralized learning framework, was proposed to construct a shared prediction model while keeping owners' data on their own devices. This paper presents an introduction to the emerging federated learning standard and discusses its various aspects, including i) an overview of federated learning, ii) types of federated learning, iii) major concerns and performance evaluation criteria for federated learning, and iv) associated regulatory requirements. The purpose of this paper is to provide an understanding of the standard and to facilitate its use in building models across organizations while meeting privacy and security concerns.


Author(s):  
Wanlu Zhang ◽  
Qigang Wang ◽  
Mei Li

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially the privacy of patient medical data, is receiving increasing attention. Objective: To strengthen the protection of private data while still enabling model training, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis, with which researchers from different medical institutions can collaborate to train models without exchanging sensitive patient data. Method: A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. Through peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task on another Blockchain. In addition, after collaborative training, personalized models are trained for the different medical institutions. Results: The experimental results show that our method achieves performance similar to the centralized approach of training on the pooled datasets of all participants while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain has also been shown to effectively accelerate model convergence and improve model accuracy, especially when local data are scarce. The personalization step further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to train models collaboratively without disclosing their private data.
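
The partial-parameter-update idea can be sketched roughly as follows (the shared fraction, the top-k selection rule, and the averaging merge are assumptions for illustration, not the article's exact procedure): each institution publishes only a subset of its locally updated parameters, so the full model state never leaves the institution.

import numpy as np

def select_partial_update(local_delta, shared_fraction=0.25):
    # Share only the fraction of parameters with the largest local change;
    # the remaining parameters stay inside the institution (rule is illustrative).
    flat = local_delta.ravel()
    k = max(1, int(shared_fraction * flat.size))
    idx = np.argsort(np.abs(flat))[-k:]          # top-k entries by magnitude
    return idx, flat[idx]

def merge_partial_update(global_params, idx, values):
    # A peer merges the received subset into its own copy of the model.
    merged = global_params.ravel().copy()
    merged[idx] = (merged[idx] + values) / 2.0   # simple averaging merge
    return merged.reshape(global_params.shape)

local_delta = np.random.randn(4, 8)              # stand-in for one weight update
idx, vals = select_partial_update(local_delta)
updated = merge_partial_update(np.zeros((4, 8)), idx, vals)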


2021, Vol 4 (1)
Author(s):  
Qi Dou ◽  
Tiffany Y. So ◽  
Meirui Jiang ◽  
Quande Liu ◽  
Varut Vardhanabhuti ◽  
...  

Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions, a prerequisite for robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19-related CT abnormalities, with external validation on patients from a multinational study. We recruited 132 patients from seven centers in multiple countries: three internal hospitals in Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden in hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries, overcoming the burden of centrally aggregating large amounts of sensitive data.


2021, Vol 13 (4), pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

Privacy protection has become an important concern amid the great success of machine learning. This paper proposes a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted under homomorphic encryption. In experiments, the model trained by PFMLP has almost the same accuracy as the model trained without encryption, with a deviation of less than 1%. To address the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that can speed up training by 25–28%. Moreover, the effects of encryption key length, learning network structure, number of learning clients, etc., are also discussed in detail in the paper.
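
A toy sketch of the additive homomorphism that lets a server aggregate encrypted gradients without decrypting them (the tiny hardcoded primes and integer-encoded gradients are assumptions for readability; this is textbook Paillier, not the paper's improved variant):

from math import gcd
import random

# Toy Paillier keypair with tiny fixed primes (illustration only, not secure).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)      # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)       # inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: the product of ciphertexts decrypts to the sum of the
# plaintexts, so a server can aggregate encrypted gradients it cannot read.
grads = [17, 42, 5]                                # integer-encoded gradients
aggregate = 1
for gradient in grads:
    aggregate = (aggregate * encrypt(gradient)) % n2
assert decrypt(aggregate) == sum(grads)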


2018, Vol 2018, pp. 1-10
Author(s):  
Hua Dai ◽  
Hui Ren ◽  
Zhiye Chen ◽  
Geng Yang ◽  
Xun Yi

Outsourcing data to the cloud is being adopted by more and more companies and individuals because of the benefits of data sharing and parallel, elastic, on-demand computing. However, it forces data owners to relinquish control of their own data, which raises privacy concerns for sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query, and implementing privacy-preserving sorting over encrypted data without leaking sensitive information is challenging. In this paper, we propose privacy-preserving sorting algorithms based on the logistic map. Secure comparable codes are constructed from logistic map functions and can be used to compare the corresponding encrypted data items without knowing their plaintext values. Data owners first encrypt their data, generate the corresponding comparable codes, and then outsource both to the cloud. Cloud servers can then sort the outsourced encrypted data according to the comparable codes using the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms protect data privacy while providing efficient sorting over encrypted data.
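
As a rough, simplified illustration of how a keyed logistic map sequence could yield codes that are comparable without revealing plaintexts (this is not the paper's construction; the seed, the increment rule, and the cumulative-sum encoding are assumptions):

def logistic_stream(seed, r=3.99):
    # Keyed chaotic sequence from the logistic map x_{k+1} = r * x_k * (1 - x_k).
    x = seed
    while True:
        x = r * x * (1 - x)
        yield x

def comparable_code(value, seed):
    # Toy order-preserving code: the cumulative sum of strictly positive,
    # key-dependent increments, so code(a) < code(b) whenever a < b.
    stream = logistic_stream(seed)
    code = 0.0
    for _ in range(value):
        code += 0.5 + next(stream)
    return code

seed = 0.37                                   # shared secret in (0, 1)
codes = {v: comparable_code(v, seed) for v in (12, 7, 30)}
# A cloud server can sort by the codes alone, never seeing the plaintext values.
assert sorted(codes, key=codes.get) == [7, 12, 30]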


2021, Vol 2083 (3), pp. 032059
Author(s):  
Qiang Chen ◽  
Meiling Deng

Regression algorithms are commonly used in machine learning. Building on encryption and privacy-protection methods, this paper studies currently popular regression algorithms together with homomorphic encryption and proposes a PPLAR-based algorithm. The correlation between data items is obtained with the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve cluster computing speed while maintaining the algorithm's mean absolute error.
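
A single-machine sketch of the map/reduce split that such a parallelized logistic regression could follow (the partition layout, learning rate, and synthetic data are assumptions, and the PPLAR privacy layer is omitted): each partition computes its local gradient contribution, and a reduce step sums them before the descent update.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_gradient(w, X, y):
    # "Map" phase: each data partition computes its local gradient contribution.
    return X.T @ (sigmoid(X @ w) - y)

def reduce_step(w, partitions, lr=0.1):
    # "Reduce" phase: sum the per-partition gradients, then take one descent step.
    total_grad = sum(map_gradient(w, X, y) for X, y in partitions)
    n = sum(len(y) for _, y in partitions)
    return w - lr * total_grad / n

rng = np.random.default_rng(1)
partitions = [(rng.normal(size=(100, 4)), rng.integers(0, 2, size=100))
              for _ in range(3)]
w = np.zeros(4)
for _ in range(200):            # iterative gradient descent over all partitions
    w = reduce_step(w, partitions)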


2022, Vol 40 (3), pp. 1-29
Author(s):  
Peijie Sun ◽  
Le Wu ◽  
Kun Zhang ◽  
Yu Su ◽  
Meng Wang

Review-based recommendation utilizes both users' rating records and the associated reviews. Recently, with the growing demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, such auxiliary data is unavailable in most scenarios and may suffer from data privacy problems. In this article, we argue that reviews contain abundant semantic information expressing users' feelings about various aspects of items, yet this information is not fully exploited in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since aspects are hidden and unlabeled. It is also very challenging to inject aspect information when generating explanation text from noisy review input. To address these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn an aspect-aware representation of each review sentence, so that users and items can be represented in the aspect space based on their historical associated reviews. We then detail how to better predict ratings and generate explanation text using the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and we jointly optimize rating prediction accuracy and explanation text generation quality within a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model in both recommendation accuracy and explainability.
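
A back-of-the-envelope sketch of the sentence weighting and joint objective described above (the normalization, the balance coefficient alpha, and the plain weighted sum are assumptions; the neural aspect extractor and the generation model are not reproduced):

import numpy as np

def sentence_weights(aspect_word_counts, sentence_lengths):
    # Review sentences with a larger proportion of aspect words get larger weights.
    ratios = np.asarray(aspect_word_counts, dtype=float) / np.asarray(sentence_lengths)
    return ratios / ratios.sum()

def multi_task_loss(rating_loss, generation_losses, weights, alpha=0.5):
    # Joint objective: rating-prediction loss plus the weighted generation loss.
    weighted_generation = float(np.dot(weights, generation_losses))
    return alpha * rating_loss + (1.0 - alpha) * weighted_generation

w = sentence_weights(aspect_word_counts=[3, 0, 1], sentence_lengths=[10, 8, 12])
loss = multi_task_loss(rating_loss=0.42, generation_losses=[2.1, 3.0, 2.6], weights=w)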

